Difference between revisions of "Linked Data Basics for Techies"

From OpenOrg
Jump to: navigation, search
(Open Linked Data)
Line 211: Line 211:
This site is used by people to ask where to find open data:  http://getthedata.org/
This site is used by people to ask where to find open data:  http://getthedata.org/
= Open Linked Data =
= Open Linked Data =

Revision as of 16:49, 20 January 2011


Intended Audience

This is intended to be a crash course for a techie/programmer who needs to learn the basics ASAP. It is not intended as an introduction for managers or policy makers (I suggest looking at Tim Berners-Lee's TED talks if you want the executive summary).

It's primarily aimed at people who're tasked with creating RDF and don't have time to faff around. It will also be useful to people who want to work with RDF data. RDF is a data structure perfect for people creating mash-ups!

Corrections and Improvements

If you are new to RDF & find that something isn't explained very well, please drop me a line so I can improve it. cjg@ecs.soton.ac.uk - this wiki is *not* editable by the public, it's just a wiki to make it easy for a few of us to edit.


Some things in this guide are deliberately over-simplified. It is intended to get you started really fast, rather than cover every facet of the subject.

RDF Data

RDF & Triples

RDF is a way of structuring information to be very interoperable and extendable. The most simple unit of information is called a 'triple'. A triple consists of three parts;

  1. the ID of a thing,
  2. the ID of a property that relates that thing to another thing or a value
  3. the ID of the thing it relates to OR a value, like text, a number or date.

For example:

<Person23> <hasDateOfBirth> "1969-05-23" .
<Person23> <name> "Marvin Fenderson" .
<Person23> <memberOf> <Group003> .

The first thing is called the Subject'. The second is sometimes called a Predicate,Property or Relation, the last bit is the Object. If the last bit is a value rather than the ID of a thing it's called a Literal. ID's may represent absolutely anything, but we use web addresses for them. These are called URIs (note that URI and URL are slightly different. It can be confusing at first because http://webscience.org/person/6 refers to a person, not a webpage, but it's a very handy way to ensure that these IDs are globally unique.

One little caveat, a literal can have a datatype (like integer or string, also represented by a URI, but we still call this a "Triple" (yes that's dumb)).

The neat thing about this structure is that you can represent almost any other kind of data using it. It's not great at doing ordered lists of values.


A URI represents a single concept or thing, but many URIs can represent the same thing.

If you resolve a URI it's considered good practice to return some useful triples about the concept the URI represents, but don't lose sleep over doing that -- it's an optional bonus feature of RDF.

All URLs are URIs. Not all URIs are URLs.

URI: Universal Resource Indicator - this identifies something uniquely.

URL: Universal Resource Location - this not only identifies something, but also describes where it is located.


<http://dbpedia.org/resource/Julius_Caesar> is a URI for Julius Caesar.

<http://en.wikipedia.org/wiki/Julius_Caesar> is a URL (and therefore also a URI) for a web page about Julius Caesar.

There is no URL for Julius Caesar as you can't download him via the web as he's dead and also not a string of ones and zeros!

Two URIs may indicate the same concept, just like two URLs may return exactly the same document. <http://www4.wiwiss.fu-berlin.de/gutendata/resource/people/Caesar_Julius_100_BC-44_BC> is another URI for Julius Caesar, created by a different organisation. You can choose if you treat two URIs as referring to the same concept or not. It depends on the problem you're trying to solve. There are no objective truths!

RDF Documents


There are several ways of writing RDF triples into a file. The most common is called RDF+XML (which people often just called RDF). It usually looks something like this:

 <foaf:Person rdf:about='http://webscience.org/person/7'>
    <foaf:name>Christopher Gutteridge</foaf:name>

If you want to produce RDF+XML See this Guide. To parse RDF+XML just find and use a library, there's one in most popular langauges!

This wiki uses a simple subset of RDF+XML for examples.

Turtle (and N3)

N3 is quite complicated so some bright person defined a cut-down version called Turtle which is really easy to read and write, but is sadly not as widely supported as RDF+XML.

Turtle looks something like this:

 <http://webscience.org/person/7> a foaf:Person ;
    foaf:name "Christopher Gutteridge" .

Note that this is NOT XML. The angle brackets just go around URIs which are not abbreviated using a prefix. (see later in this guide)


RDFa is a way to embed triples into an HTML document. It can be confusing for beginners, but some software tools generate valid RDFa which is fine, but don't try to create it by hand until you get some experience!

Other Serialisations

There's one for JSON, and N-Triples which just writes out triples, one per line (and is a subset of Turtle).

rdf:type and classes

The most common predicate (property) is 'rdf:type' to relate a thing to a class. For example, relating me to the fact I'm of rdf:type foaf:Person. Things can have any number of types.

The 'object' of rdf:type is often referred to as a class.


For URIs it's common to define a bunch of related concepts in the same "namespace". A namespace is bit like a directory on a filesystem; it usually ends with either "/" and "#" and the IDs in the namespace generally don't have "/" or "#" in as that confuses things.

You will probably define your own namespace for your own concepts, such as your organistations members, or the bus stops nearby, but for classes and predicates you'll often use other people's namespaces. in RDF+XML and Turtle it's common to use a namespace prefix to make things more readable. In RDF+XML you must use namespace prefixes for predicates. The following examples mean exactly the same thing:

Example 1 (in RDF+XML you have to use a namespace for the predicates to make them legal XML tags)

<?xml version="1.0" encoding="utf-8"?>
  <rdf:Description rdf:about="http://webscience.org/person/7">
     <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
     <foaf:name>Christopher Gutteridge</foaf:name>

Same as Example 2 (Turtle, but represents the same data as above). Turtle auto defines 'rdf:' so you don't have to (unlike RDF+XML where you always have to define it)

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix wsperson: <http://webscience.org/person/> .
wsperson:7 rdf:type foaf:Person .
wsperson:7 foaf:name "Christopher Gutteridge" .

Same as Example 3 (N-Triples, but the same actual data)

<http://webscience.org/person/7> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://webscience.org/person/7> <http://xmlns.com/foaf/0.1/name> "Christopher Gutteridge" .
  • See also the prefix.cc service listed in the "Tools and Services" section of this guide.

Some Common Namespaces

Here's a quick summary of the most common namespaces.

rdf - has the core parts of RDF, but usually you'll only see rdf:type.

rdfs - used for making statements about predicates and classes, also has rdfs:label and rdfs:comment which are good basic ways of giving something a label and making comments about it.

owl - used for making much more complex statements about predicates and classes. This is cool, but don't worry about it too much when you're just getting started. Also defines owl:sameAs to indicate two URIs represent the same thing (in your opinion).

dcterms - Dublin Core terms. Very useful generic properties for making statements about resources -- who created them, when, who published it, title, description etc. An older version is called 'Dublin Core Elements' and this can be confusing. In general always use dcterms. Some people use "dc" as the abbreviation, but this is confusing as it's not obvious if it's dc-terms or dc-elements, so don't do that.

foaf - Friend of a Friend. This is good for describing the facts from a person or organisations 'profile', things like their email address, phone numbers, names, what groups they are a member of etc.

geo - Allows you to specify a latitude & longitude of something. Even of it's a big thing, then you can still give a useful reference point to navigate to.

dbpedia - DBPedia defines a URI for the primary of every page of wikipedia. So http://dbpedia.org/resource/Southampton is a URI representing the city of Southampton.

Also: skos, sioc, void, dcat, geonames, doap, vcard, org, event, prog, bibo, aiiso, vann, scovo

For a list of the namespaces on prefix.cc see; http://prefix.cc/popular/all

Semantics (Schemas, OWL, Ontologies etc)

The semantic bit of the semantic web is that if you resolve the URI for a class or predicate you often get back some rdfs: and/or owl: describing what it means, and some semantics about it. This lets you do clever reasoning like knowing that 'ancestor' is a transative property so the ancestor of an ancestor of X is therefore also an ancestor of X.

This is complicated, and not required to get started. If it confuses you, don't worry too much about it.

Purists insist you should write schemas, just ignore them until YOU find a need to write a schema (eventually it turns out to be useful, but there's no hurry).


You can do lists by saying something like,

<Person> <hasToDoList> <List0001> .
<List0001> <label> "Marvin's TODO List" .
<List0001> rdf:type rdf:Seq .
<List0001> rdf:_1 "Buy Milk" .
<List0001> rdf:_2 "Walk Dog" .
<List0001> rdf:_3 "Drink Milk" .

Tools and Services


You can get a list of the standard abbreviations for common namespaces using http://prefix.cc/

If you're really lazy, you can get the stub of an XML document out of it, for example http://prefix.cc/foaf,dcterms,geo.rdf

Triple Stores and SPARQL

A triple store is a bit like an SQL database, but optimised to just import, store, and query a huge pile of triples. Triple stores are queried using a language called SPARQL.

They are funky because rather than deal with triples document by document you can query over any facets of the data in the "SPARQL Endpoint". If you have the staff resources to do so, it's good practice to provide a SPARQL endpoint, but don't lose sleep if you don't.

These are useful and powerful but not required to produce and work with RDF and Open Linked Data.


It's quite easy to write valid RDF serialisations, but you can check them using the W3C Validator Service, what is also useful is to eyeball your data to make sure it looks sane. I use my own RDF Browser but you may prefer others. The Graphite Browser will show triples from any of N3, RDF+XML and RDFa.

Command Line Tools

There's a C-Library and unix/linux command line tool called "rapper" which can parse and validate various formats.

Guide to how to produce datasets

Well, this website is for that!

Other sites you should know about

Semantic Overflow


Allows you to ask questions about this stuff, and see existing questions and answeres.



This site is a comprehensive index of sources of data sets on the web (many, but not all, in RDF). You should consider registering your datasets if you want other people to find them.


This site is used by people to ask where to find open data: http://getthedata.org/

Open Linked Data

Tim BL, inventer of the web, says this is cool and you should be doing it. See if you agree...

Linked Data

What is Linked Data? It's when you either:

  • Use URIs for Subjects and/or Objects from other peoples datasets.
  • Use owl:sameAs to link your identifiers to other peoples.

This lets people do really cool mashups using data from multiple sources. There's over 100 known sites in the world who link to other dataasets. See The Linked Data Cloud to see how they interlink.

It doesn't have to be RDF, but it usually is.

Open Data

Open data is data with an open license which makes it easy for people to reuse it with confidence, and available online, to all, without restrictions.

Open Linked Data

Is the combination of the two, obviously.