W3C

RDFa for HTML Authors

Steven Pemberton, W3C/CWI

Version date: 2009-05-14

Introduction

RDFa is a thin layer of markup you can add to your web pages that makes them understandable for machines as well as people. You could describe it as a CSS for meaning. When you add it, browsers, search engines and other software can understand more about your pages, and so offer more services or better results to the user. For instance, if a browser knows that a page is about an event such as a conference, it can offer to add it to your calendar, show it on a map, locate hotels or flights, or any number of other things.

This document introduces RDFa and gives examples of its use.


Metadata in HTML

If you know HTML markup, you will know that you can add metadata to an HTML document by adding <meta> and <link> elements in the head. For instance:

<meta name="description" content="A site about fish" />

gives a description of the current document. You could say that the current page has a description property, whose value is "A site about fish".

Similarly you can say:

<link rel="next" href="thecod.html" />

which says that if you consider this page as one in a series of pages, the next one is thecod.html. In other words, this page has a next relation to thecod.html.

There is a smattering of other places in HTML where you can add metadata, such as the <title> element, the title attribute on some elements, and the cite attribute on <blockquote> and others, but that is about it.

In passing, you might wonder why you can't say

<meta name="description">A site about fish</meta>

and the answer is simply that when this feature was added to HTML, some browsers would incorrectly have displayed the text inside the meta element, even though it was in the <head>. To prevent that, the content was put in an attribute instead (this, by the way, is being fixed in XHTML2).

The Use of Metadata

Typically the metadata in a document is used for several purposes:

and so on.

RDF: A Generalised Way of Representing Metadata

In the time since the meta element was added to HTML, W3C has defined a generalised way of representing metadata. This is called RDF, the Resource Description Framework ('resource' roughly means 'document' here, but later you'll see examples of things other than documents).

RDF is a very simple framework. Essentially all knowledge is gathered as assertions of the form:

URI — property — value

where 'URI' is the URI of the thing being described, 'property' is (the URI of) a property, and 'value' is the value of that property: either another URI, a literal string, or a chunk of XML.

So assuming the example document above has a URL of http://www.example.com/home.html, then the RDF assertion, or triple as it is often called, for the description property is

http://www.example.com/home.html — [html:description] — "A site about fish"

and the RDF triple for the next relation would be

http://www.example.com/home.html — [html:next] — http://www.example.com/thecod.html

Here [html:next] means "the URI that represents the HTML next property", and is written as a Compact URI, or CURIE for short. More on those later.
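As a rough illustration, and not part of RDFa itself, the two assertions above can be held as plain (subject, property, value) tuples. This Python sketch makes a few assumptions. The `URI`, `Triple` and `values_of` names are hypothetical helpers. `URI` only marks which values are URIs rather than literal strings. The properties are kept as the unexpanded CURIEs used above.

```python
from typing import List, NamedTuple, Union


class URI(str):
    """Hypothetical marker: a string to be read as a URI, not as literal text."""


class Triple(NamedTuple):
    subject: URI             # the thing being described
    prop: str                # the property (here still written as a CURIE)
    value: Union[URI, str]   # another URI, or a literal string


home = URI("http://www.example.com/home.html")

triples: List[Triple] = [
    Triple(home, "[html:description]", "A site about fish"),
    Triple(home, "[html:next]", URI("http://www.example.com/thecod.html")),
]


def values_of(subject: str, prop: str, graph: List[Triple]) -> list:
    """Return every value that `subject` has for `prop`."""
    return [t.value for t in graph if t.subject == subject and t.prop == prop]


for v in values_of(home, "[html:next]", triples):
    kind = "URI" if isinstance(v, URI) else "literal"
    print(kind, v)   # prints: URI http://www.example.com/thecod.html
```

Real RDF libraries have their own, richer types for URIs and literals. The point here is only that every statement has the same three-part shape.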

Extending Metadata in XHTML

RDFa extends the possibilities of metadata in XHTML in two ways. First, it generalises the attributes of meta and link, allowing them on any element rather than just on meta and link (so you can now have metadata in the body of the page as well as in the head). Second, it defines how those attributes are interpreted as RDF.

To take a simple example, many people add a number of so-called Dublin Core properties to their pages, such as title and author (which is called creator in Dublin Core, since the properties can be used with other things, such as paintings):

<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://dublincore.org/documents/dcq-html/">
   <title>John Smith's Fish of the World</title>
   <meta name="DC.title" content="Fish of the World"/>
   <meta name="DC.creator" content="John Smith"/>
</head>
<body>
   <h1>Fish of the World</h1>
   <p>by John Smith</p>

The Dublin Core Metadata Initiative defined these and other properties for describing books, works of art and so on. You can see in the example above that they duplicate information in the document itself. A nice thing about RDFa is that you can attach the properties to the document text instead:

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
<head>
   <title>John Smith's Fish of the World</title>
</head>
<body>
   <h1 property="dc:title">Fish of the World</h1>
   <p>by <span property="dc:creator">John Smith</span></p>

What this does is declare that we are going to use the Dublin Core properties, and prefix them with dc:. It then attaches the Dublin Core properties title and creator to the relevant parts of the text. Of course, a major advantage of this is that the visible versions in the text don't get out of sync with the metadata versions.

Compact URIs

In the last example we had property="dc:title". This says "the property called title from the vocabulary identified by dc:". But we also said earlier that a property was kept as a URI. A form such as dc:title is called a Compact URI, or CURIE for short. The URI it represents is just the concatenation of the URI in the declaration of the prefix (in this case xmlns:dc="http://purl.org/dc/elements/1.1/") and whatever follows the colon. So in this case dc:title is a short form of the full URI http://purl.org/dc/elements/1.1/title. (You can now probably see why CURIEs are nice to have.)

When there is no prefix (as in something like rel="index"), a default prefix is used. For XHTML that default is http://www.w3.org/1999/xhtml/vocab#.
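The expansion rule is simple enough to sketch in a few lines of Python. This is an illustration under simplifying assumptions:

- The prefix declarations (the xmlns: attributes) are passed in as a dictionary.
- As described above, every unprefixed value gets the XHTML default prefix. The RDFa specification is stricter: it restricts unprefixed rel and rev values to a fixed list of reserved words such as next, prev and license.
- The function name `expand_curie` is hypothetical.

```python
from __future__ import annotations

# Default prefix for unprefixed values, as given in the text above.
XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"


def expand_curie(curie: str, prefixes: dict[str, str]) -> str:
    """Expand a CURIE by concatenating its prefix's URI with the part after the colon."""
    prefix, colon, reference = curie.partition(":")
    if not colon:
        # No prefix at all, as in rel="index": use the default vocabulary.
        return XHTML_VOCAB + curie
    if prefix not in prefixes:
        raise KeyError(f"prefix {prefix!r} has not been declared")
    return prefixes[prefix] + reference


# Stands in for xmlns:dc="http://purl.org/dc/elements/1.1/"
prefixes = {"dc": "http://purl.org/dc/elements/1.1/"}
print(expand_curie("dc:title", prefixes))  # http://purl.org/dc/elements/1.1/title
print(expand_curie("index", prefixes))     # http://www.w3.org/1999/xhtml/vocab#index
```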

Using rel

Using the property attribute like this gives you the equivalent of the meta element, but in the body text of your page. To get the equivalent of a link element, you use the rel attribute. For instance, pages often have a clickable "Next" link that takes you to the next page:

<a href="thecod.html">Next</a>

It can be expressed like this:

<a href="thecod.html" rel="next">Next</a>

Similarly,

<a href="page2.html">Back</a>

can be written

<a href="page2.html" rel="prev">Back</a>

Another typical use of rel is to point to the copyright or licensing information of a page. Instead of:

<a href="copyright.html">Copyright</a>

You can write

<a href="copyright.html" rel="copyright">Copyright</a>

(By the way, it doesn't matter what order you put the href and rel in.)

Of course, you could already do this in HTML. What is new is that it is now defined how to interpret this as RDF, and, as you will later see, you can apply it to more than just <a> elements.
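To see how a processor might read these rel attributes as RDF, here is a minimal Python sketch using the standard library's ElementTree. It makes several simplifying assumptions:

- It only handles unprefixed rel values, mapped to the XHTML default vocabulary.
- It takes the page's own URL as the subject and resolves each href against that URL.
- The function name `rel_triples` is hypothetical.
- The page URL is the example document used earlier.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"


def rel_triples(fragment: str, page_url: str) -> list:
    """Turn every element carrying rel and href into (page, relation, target) triples."""
    triples = []
    for elem in ET.fromstring(fragment).iter():
        rel, href = elem.get("rel"), elem.get("href")
        if rel is None or href is None:
            continue
        target = urljoin(page_url, href)   # relative hrefs are resolved against the page
        for value in rel.split():          # rel may hold several space-separated values
            triples.append((page_url, XHTML_VOCAB + value, target))
    return triples


page = """<div>
  <a href="thecod.html" rel="next">Next</a>
  <a rel="prev" href="page2.html">Back</a>
  <a href="copyright.html" rel="copyright">Copyright</a>
</div>"""

for triple in rel_triples(page, "http://www.example.com/home.html"):
    print(triple)
```

Note that the attribute order on each <a> makes no difference to the result, since the sketch looks attributes up by name.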

Talking about Other Documents

Most of the metadata in HTML only lets you talk about the document itself, and in all the examples so far we have been giving metadata about the page in question. But you may want to talk about things other than the current document (you will see more examples of this shortly).

For this you can use the about attribute to specify what it is the information applies to. For instance, suppose you link to some data:

Here is a plot of the data: <img src="plot.png" alt="Rainfall 1900-1999"/>.
The <a href="rainfall.csv">raw data</a> is available.

and you want to include the licensing conditions of that data:

The data is available under <a href="license.html">these conditions</a>.

then you can say this:

The data is available under
<a about="rainfall.csv" rel="license" href="license.html">these conditions</a>.

If you use about on a container element, like a <p>, then it applies to all the relations contained in it:

<p about="rainfall.csv">
   The data <strong property="dc:title">Rainfall 1900-1999</strong>
   is the property of <em property="dc:creator">Data Be We, Inc</em>
   and is available under
   <a rel="license" href="license.html">these conditions</a>.
</p>
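One way to picture how about carries down to contained elements is a recursive walk in which each element inherits its parent's subject unless it sets its own. This Python sketch is a simplification, not the full RDFa processing rules:

- Prefixes come from a dictionary.
- A property value is the content attribute if present, otherwise the element's text with any markup ignored.
- The page URL http://www.example.com/data.html is a hypothetical base for resolving relative URIs.
- The names `expand` and `walk` are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"
PREFIXES = {"dc": "http://purl.org/dc/elements/1.1/"}   # stands in for xmlns:dc


def expand(curie: str) -> str:
    prefix, colon, ref = curie.partition(":")
    return PREFIXES[prefix] + ref if colon else XHTML_VOCAB + curie


def walk(elem, subject, base, out):
    # An about attribute sets the subject for this element and everything inside it.
    if elem.get("about") is not None:
        subject = urljoin(base, elem.get("about"))
    if elem.get("property") is not None:
        text = "".join(elem.itertext()).strip()
        out.append((subject, expand(elem.get("property")), elem.get("content", text)))
    if elem.get("rel") is not None and elem.get("href") is not None:
        out.append((subject, expand(elem.get("rel")), urljoin(base, elem.get("href"))))
    for child in elem:
        walk(child, subject, base, out)   # children inherit the current subject


base = "http://www.example.com/data.html"   # hypothetical page URL
fragment = """<p about="rainfall.csv">
   The data <strong property="dc:title">Rainfall 1900-1999</strong>
   is the property of <em property="dc:creator">Data Be We, Inc</em>
   and is available under
   <a rel="license" href="license.html">these conditions</a>.
</p>"""

triples = []
walk(ET.fromstring(fragment), base, base, triples)   # default subject: the page itself
for t in triples:
    print(t)
```

All three triples come out with http://www.example.com/rainfall.csv as their subject, even though about appears only once, on the <p>.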

Using URIs and CURIEs in the about attribute

Note that the about attribute contains a URI. It can point to anything on the Web:

<p about="http://www.w3.org/TR/rdfa-syntax">The title of the RDFa specification is
  <em property="dc:title">RDFa in XHTML: Syntax and Processing</em>...</p>

Occasionally you may want to use a CURIE instead of a URI in about (as you will see shortly), and so to distinguish a CURIE from a URI in those cases, you enclose a CURIE in square brackets. For instance, suppose you had defined xmlns:tr="http://www.w3.org/TR/", then you could write the above in the following way:

<p about="[tr:rdfa-syntax]">The title of the RDFa specification is
  <em property="dc:title">RDFa in XHTML: Syntax and Processing</em>...</p>
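Thanks to the square brackets, a processor can tell the two forms apart just by looking at the first and last characters. Here is a minimal Python sketch under three assumptions. Prefix declarations are supplied as a dictionary. Values without brackets are resolved as (possibly relative) URIs against a hypothetical page URL. The function name `resolve_about` is hypothetical.

```python
from urllib.parse import urljoin


def resolve_about(value: str, base: str, prefixes: dict) -> str:
    """Interpret an about value: [prefix:ref] is a CURIE, anything else is a URI."""
    if value.startswith("[") and value.endswith("]"):
        prefix, _, reference = value[1:-1].partition(":")
        return prefixes[prefix] + reference
    return urljoin(base, value)


prefixes = {"tr": "http://www.w3.org/TR/"}    # stands in for xmlns:tr
base = "http://www.example.com/home.html"     # hypothetical page URL

# Both spellings name the same specification.
print(resolve_about("[tr:rdfa-syntax]", base, prefixes))
print(resolve_about("http://www.w3.org/TR/rdfa-syntax", base, prefixes))
# Without brackets, "tr:rdfa-syntax" is read as a URI whose scheme is "tr".
print(resolve_about("tr:rdfa-syntax", base, prefixes))
```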

Talking about People, Places and Things

Up to now we have been talking about assigning properties to things with URIs. But there is a problem: not everything that you might want to talk about has a URI. The city of Amsterdam doesn't have a URI. Nor does a person, or an object like a car, or a concept like love. Of course, these things have pages about them, but that is different. It is important not to confuse a website about something with that thing itself.

To take an example to explain the difference, suppose we want to say that T.S. Eliot is the author of the poem The Waste Land. Well, we might do a search for the poem, and find http://en.wikipedia.org/wiki/The_Waste_Land. You might then be tempted to say:

<span about="http://en.wikipedia.org/wiki/The_Waste_Land"
      property="dc:creator">T.S. Eliot</span>

Unfortunately, this says that T.S. Eliot created the Wikipedia page, which is patently not true. So what do we do?

Well, RDFa has a notation that allows you to create a local name for something that doesn't have a URI (or that has a URI that you don't know), and say something about it anyway:

<link about="[_:TheWasteLand]" rel="foaf:isPrimaryTopicOf"
      href="http://en.wikipedia.org/wiki/The_Waste_Land" />

The "_:" is a reserved prefix for this notation. You can put any identifier after the colon. What this says is "There is something (which we shall call 'TheWasteLand') which is the primary topic of the page at http://en.wikipedia.org/wiki/The_Waste_Land."

Now that we have uniquely identified the poem, we can record that its creator was 'T.S. Eliot':

<span about="[_:TheWasteLand]" property="dc:creator">T.S. Eliot</span>

(By the way, the foaf properties are identified by xmlns:foaf="http://xmlns.com/foaf/0.1/").
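To see why the identifying link matters, here is a Python sketch over the triples that the two snippets above would produce. They are written out by hand, with the dc and foaf prefixes expanded. A consumer can find the poem through the Wikipedia page whose primary topic it is. The page itself, correctly, has no creator. The helper names are hypothetical.

```python
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"
WIKI = "http://en.wikipedia.org/wiki/The_Waste_Land"

# Triples from the two snippets above; "_:TheWasteLand" is the locally named thing.
triples = [
    ("_:TheWasteLand", FOAF + "isPrimaryTopicOf", WIKI),
    ("_:TheWasteLand", DC + "creator", "T.S. Eliot"),
]


def topics_of(page):
    """Things that are declared to be the primary topic of `page`."""
    return [s for s, p, o in triples if p == FOAF + "isPrimaryTopicOf" and o == page]


def creators_of(thing):
    return [o for s, p, o in triples if s == thing and p == DC + "creator"]


for poem in topics_of(WIKI):
    print(poem, creators_of(poem))   # _:TheWasteLand ['T.S. Eliot']
print(creators_of(WIKI))             # prints []; nothing claims Eliot wrote the page
```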

In this way we can mint all sorts of names for people, places, organizations and other things that haven't got URIs, and uniquely identify them. A person:

<link about="[_:StevenPemberton]"
      rel="foaf:isPrimaryTopicOf" href="http://www.cwi.nl/~steven/" />

A place:

<link about="[_:Amsterdam]"
      rel="foaf:isPrimaryTopicOf" href="http://www.amsterdam.nl/" />

An organization:

<link about="[_:W3C]"
      rel="foaf:isPrimaryTopicOf" href="http://www.w3.org/" />

And then we can use those names in order to talk about them:

<a about="[_:W3C]" rel="foaf:homepage" href="http://www.w3.org/">W3C</a>

These special CURIEs beginning "_:" are called blank nodes or bnodes. Note that they are local to a document, so you have to redeclare them in each document in which you use them.
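Blank node names only mean something inside the document that uses them. So software that combines data from several pages has to keep them apart: _:StevenPemberton on one page is not automatically the same thing as _:StevenPemberton on another.

Here is a minimal Python sketch of such a merge. It assumes triples are (subject, property, value) tuples in which blank nodes are strings starting with "_:". This is a simplification: a real RDF library keeps blank nodes as a separate type. The two page URLs and the second mailbox are hypothetical, as are the names `rename` and `merge`.

```python
FOAF = "http://xmlns.com/foaf/0.1/"


def rename(term: str, n: int) -> str:
    """Qualify a blank node label with the number of the document it came from."""
    return f"_:doc{n}_{term[2:]}" if term.startswith("_:") else term


def merge(documents: dict) -> list:
    """Combine triples from several documents without letting their blank nodes collide."""
    merged = []
    for n, triples in enumerate(documents.values()):
        merged.extend((rename(s, n), p, rename(o, n)) for s, p, o in triples)
    return merged


documents = {   # two hypothetical pages that both use the label _:StevenPemberton
    "http://www.example.com/a.html": [
        ("_:StevenPemberton", FOAF + "mbox", "mailto:steven@w3.org"),
    ],
    "http://www.example.com/b.html": [
        ("_:StevenPemberton", FOAF + "mbox", "mailto:someone@example.com"),
    ],
}

for t in merge(documents):
    print(t)
```

Deciding that two such renamed nodes are in fact the same person then depends on a uniquely identifying property, such as a shared foaf:mbox.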

By the way, the important thing with blank nodes is to uniquely identify them by some means if you can. foaf:isPrimaryTopicOf is one way, but any property that is unique will work. For instance:

<link about="[_:StevenPemberton]" rel="foaf:mbox" href="mailto:steven@w3.org" />

is just as good, since there is only one person who has that email address, and so we have uniquely identified that person.

Note that since an empty URI "" means 'the current page', on your own home page you can add code like

<link about="[_:StevenPemberton]" rel="foaf:isPrimaryTopicOf" href=""/>

which says "The thing we call StevenPemberton is the primary topic of this page".

Overriding the Content

Sometimes the content contains information that needs to be tagged, but not in the form you need. For instance:

<p>Amsterdam is located
    at latitude 52°22'23"N and longitude 4°53'32"E</p>

There are properties for recording latitude and longitude, but they expect the values to be decimal numbers. So we can write this:

<p about="[_:Amsterdam]"><span property="foaf:name">Amsterdam</span> is located
   at latitude <span property="geo:lat" content="52.373">52°22'23"N</span>
   and longitude <span property="geo:long" content="4.892">4°53'32"E</span></p>

This is, of course, the same content attribute you know from the meta element. Its value overrides whatever is in the content of the element.

(The geo properties are identified by xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#".)
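The decimal values in the content attributes can be derived from the degrees, minutes and seconds in the text: degrees + minutes/60 + seconds/3600, negated for south or west. Here is a small Python sketch of that conversion. The regular expression assumes exactly the notation used above, and the function name is hypothetical.

```python
import re

# Matches the notation used above, e.g. 52°22'23"N
DMS = re.compile(r"""(\d+)°(\d+)'(\d+)"([NSEW])""")


def dms_to_decimal(text: str) -> float:
    """Convert degrees/minutes/seconds to decimal degrees (south and west are negative)."""
    match = DMS.fullmatch(text)
    if match is None:
        raise ValueError(f"not in degrees/minutes/seconds form: {text!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + int(seconds) / 3600
    return -value if hemisphere in ("S", "W") else value


print(round(dms_to_decimal('52°22\'23"N'), 3))   # 52.373
print(round(dms_to_decimal('4°53\'32"E'), 3))    # 4.892
```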

Swapping subject and object

A lesser-used but nevertheless useful attribute in HTML is rev, the reverse relationship. It works like rel, but reverses the direction of the relationship. For instance, if a document doc.html is indexed by the page index.html, then doc.html can record this fact with the link:

<link rel="index" href="index.html"/>

However, index.html can also record the relationship:

<link rev="index" href="doc.html"/>

which says "this page is the index for doc.html".

You can use rev similarly in RDFa. All it does is swap the subject (the 'about') with the object (the 'href'). For instance, suppose we have a set of data about a person:

<p about="[_:StevenPemberton]">
   Name: <span property="foaf:name">Steven Pemberton</span>
   Mail: <a rel="foaf:mbox" href="mailto:steven@w3.org">steven@w3.org</a>
</p>

Now, foaf has a property depicts that says that a particular image is a picture of something, such as a person. But the relationship goes from the picture to the person. What we would like to say is:

<link about="Steven.jpg" rel="foaf:depicts" resource="[_:StevenPemberton]"/>

except that at the moment we are talking about the person, and not the image. So if we want to add this information to the block above, we just reverse the relationship with rev:

<p about="[_:StevenPemberton]">
   Name: <span property="foaf:name">Steven Pemberton</span>
   Mail: <a rel="foaf:mbox" href="mailto:steven@w3.org">steven@w3.org</a>
   Mugshot: <a rev="foaf:depicts" href="Steven.jpg">Photo</a>
</p>

Note that you can have (if you want) both rel and rev on an element:

<a rel="next" rev="prev" href="page2.html">Next</a>

(Not that this example gives you very much in terms of extra information!)
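The effect of rev can be stated compactly in code. For the same subject and href, rel produces (subject, relation, href) and rev produces (href, relation, subject). Here is a minimal Python sketch applied to the rel/rev example above. It assumes unprefixed values map to the XHTML default vocabulary and uses a hypothetical page URL as the subject. The function name `link_triples` is hypothetical.

```python
from urllib.parse import urljoin

XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"


def link_triples(subject: str, attrs: dict, base: str) -> list:
    """Triples for one element's rel and rev attributes (unprefixed values only)."""
    obj = urljoin(base, attrs["href"])
    # rel: subject -> relation -> object
    triples = [(subject, XHTML_VOCAB + v, obj) for v in attrs.get("rel", "").split()]
    # rev: the same relation with subject and object swapped
    triples += [(obj, XHTML_VOCAB + v, subject) for v in attrs.get("rev", "").split()]
    return triples


page = "http://www.example.com/page1.html"   # hypothetical URL of the current page
for t in link_triples(page, {"rel": "next", "rev": "prev", "href": "page2.html"}, page):
    print(t)
```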

Advanced topics

You now know enough to use RDFa day to day, but there are a few extras you might find useful.

The resource attribute

Alongside the href attribute, there is also a resource attribute with the same purpose, but usable when you don't want the link to be clickable, or you want to use a CURIE (since you can't use a CURIE in href):

The photo is entitled
  <em about="Steven.jpg" rel="foaf:depicts" resource="[_:StevenPemberton]">Steven in London</em>

Note in passing that you may have more than one relation on an element. So we could also say:

The photo is entitled
  <em about="Steven.jpg" rel="foaf:depicts" resource="[_:StevenPemberton]"
      property="dc:title">Steven in London</em>
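The choice between href and resource for the object of rel or rev can be sketched in Python. The sketch assumes prefix declarations come from a dictionary and uses a hypothetical page URL. It follows the rule that href takes precedence over resource, and it accepts a bracketed CURIE (including a _: blank node) only in resource. The function name `rel_object` is hypothetical.

```python
from typing import Optional
from urllib.parse import urljoin


def rel_object(attrs: dict, base: str, prefixes: dict) -> Optional[str]:
    """Pick the object of rel/rev: href if present, otherwise resource."""
    if "href" in attrs:
        return urljoin(base, attrs["href"])      # href is always an ordinary URI
    value = attrs.get("resource")
    if value is None:
        return None
    if value.startswith("[") and value.endswith("]"):
        prefix, _, ref = value[1:-1].partition(":")
        if prefix == "_":
            return "_:" + ref                    # a blank node
        return prefixes[prefix] + ref
    return urljoin(base, value)


base = "http://www.example.com/photos.html"      # hypothetical page URL
print(rel_object({"resource": "[_:StevenPemberton]"}, base, {}))    # _:StevenPemberton
print(rel_object({"href": "a.html", "resource": "b.html"}, base, {}))  # http://www.example.com/a.html
```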

Packaging a group of relations

Often a group of properties together makes up a whole. For instance, an event can have a title, a description, a location, and a start and end date. If you want to say that a section of markup contains such a group of properties, you can use the typeof attribute. For instance, to mark up a conference:

<div xmlns:event="http://www.w3.org/2002/12/cal#" typeof="event:Vevent">
     <h3 property="event:summary">WWW 2009</h3>
     <p property="event:description">18th International World Wide Web Conference</p>
     <p>To be held from
        <span property="event:dtstart" content="2009-04-20">20th April 2009</span>
        until <span property="event:dtend" content="2009-04-24">24th April</span>,
        in <span property="event:location">Madrid, Spain</span>.</p>
</div>

or a TV program:

<div xmlns:event="http://www.w3.org/2002/12/cal#" typeof="event:Vevent">
    <h3 property="event:summary">Have I Got Old News For You</h3>
    <p property="event:location">BBC2</p>
    <p><span property="event:dtstart" content="2008-06-28T21:00:00">Saturday 28 June,
       9pm</span>-<span property="event:dtend" content="2008-06-28T21:30:00">9.30pm</span></p>
    <p property="event:description">Team captains Paul Merton and Ian
       Hislop are joined by returning guest host Jeremy Clarkson and panellists
       Danny Baker and Germaine Greer for the topical news quiz.
       <abbr title="in stereo">[S]</abbr></p>
</div>

Note the use of content here to get the dates and times into a machine-readable format.
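The following simplified Python sketch shows roughly what typeof does. When an element has typeof (and, in this sketch, no about), a fresh blank node is created. The node is given the type with rdf:type and becomes the subject for the properties inside the element. Assumptions:

- Prefixes come from a dictionary.
- Blank nodes are named _:b0, _:b1, ... by a counter.
- A property value is the content attribute if present, otherwise the element's text with markup dropped.
- The page URL passed as the starting subject is hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PREFIXES = {"event": "http://www.w3.org/2002/12/cal#"}


def expand(curie: str) -> str:
    prefix, _, ref = curie.partition(":")
    return PREFIXES[prefix] + ref


def walk(elem, subject, out, counter):
    if elem.get("typeof") is not None:
        subject = f"_:b{next(counter)}"            # a new blank node for this group
        out.append((subject, RDF_TYPE, expand(elem.get("typeof"))))
    if elem.get("property") is not None:
        text = "".join(elem.itertext()).strip()
        out.append((subject, expand(elem.get("property")), elem.get("content", text)))
    for child in elem:
        walk(child, subject, out, counter)         # properties inside share the node


fragment = """<div xmlns:event="http://www.w3.org/2002/12/cal#" typeof="event:Vevent">
  <h3 property="event:summary">WWW 2009</h3>
  <p>To be held from
     <span property="event:dtstart" content="2009-04-20">20th April 2009</span>
     until <span property="event:dtend" content="2009-04-24">24th April</span>,
     in <span property="event:location">Madrid, Spain</span>.</p>
</div>"""

triples = []
walk(ET.fromstring(fragment), "http://www.example.com/events.html", triples, itertools.count())
for t in triples:
    print(t)
```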

Data Types

Occasionally you may want to specify that a particular property is of a certain data type. The datatype attribute is precisely for this purpose:

<span property="event:dtend" datatype="xsd:date" content="2009-04-24">24th April</span>

This requires the declaration xmlns:xsd="http://www.w3.org/2001/XMLSchema#", so that xsd:date expands to http://www.w3.org/2001/XMLSchema#date.
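A datatype tells software how to read the literal. Here is a sketch of what a consumer might do; this is an assumption, not something RDFa prescribes. The Python code keeps a literal as its text plus the full datatype URI, and converts xsd:date values to Python dates. The names `Literal` and `to_python` are hypothetical.

```python
from datetime import date
from typing import NamedTuple, Optional

XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


class Literal(NamedTuple):
    text: str                        # the content attribute or the element text
    datatype: Optional[str] = None   # full URI of the datatype, if one was given


def to_python(lit: Literal):
    """Convert a literal to a Python value when its datatype is known."""
    if lit.datatype == XSD_DATE:
        return date.fromisoformat(lit.text)
    return lit.text                  # untyped (or unknown type): leave as text


end = Literal("2009-04-24", XSD_DATE)        # from the dtend example above
print(to_python(end))                       # 2009-04-24
print(to_python(end) - date(2009, 4, 20))   # 4 days, 0:00:00
```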

Validating

To make sure your pages validate, put the following at the top of each document (before the <html> tag):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">

The validator at http://validator.w3.org/ will check your pages.

Looking at the results

There are a number of online services that will extract all the properties from an RDFa-enabled page and tell you what they are, for instance the RDFa Distiller at http://www.w3.org/2007/08/pyRdfa/.

Summary of Attributes

about
Specifies the subject of a relationship. If not given, the subject is inherited from the enclosing element, and at the outermost level it is the current document.
rel
Defines a relation between the subject and a URI given by either href or resource. The subject is specified by the closest about or src attribute; if there is none, it is the current document.
rev
The same as the rel attribute, except that subject and object are reversed.
property
Defines a relationship between the subject and either a string (if the content attribute is present) or a piece of markup otherwise (the content of the element that the property attribute is on).
content
Specifies a string to use as an object for the property attribute
href
Specifies an object URI for the rev and rel attributes. Takes precedence over the resource attribute.
resource
Specifies an object URI for the rev and rel attributes if href is not present.
src
Specifies the subject of a relationship.
datatype
Specifies the datatype of the object of the property attribute (either in the content attribute, or in the content of the element that the datatype attribute is on). By default, data in the content attribute is of type string, and element content that contains markup is of type rdf:XMLLiteral. If datatype="" is used, the element content is stripped of markup for the RDF, and is of type string.
typeof
Asserts that the subject has the given RDF type. If no about (or src) is given, it first creates a blank node, which becomes the subject of the relationships contained in the element.

Examples

There are many vocabularies available across the web (called taxonomies by some), and there are more being created all the time. Here is a selection:

See the RDFa Wiki list of vocabularies and RDFa examples in the wild for some more.

Further Reading

RDFa Specification - not written for beginners, and therefore hard going, but the final arbiter on RDFa

RDFa Primer - another introduction to RDFa

rdfa.info - news and information about developments.

RDFa Wiki - community meeting place for RDFa.