Difference between revisions of "Linked Data Basics for Techies"

From OpenOrg

Revision as of 17:31, 15 January 2011

INTRO: This needs to be a simple list of useful tips. We need to avoid overloading people.

Intended Audience

This is intended to be a crash course for a techie/programmer who needs to learn the basics ASAP. It is not intended as an introduction for managers or policy makers (I suggest looking at Tim Berners-Lee's TED talks if you want the executive summary).

RDF & Triples

RDF is a way of structuring information to be very interoperable and extendable. The simplest unit of information is called a 'triple'. A triple consists of three parts:

  1. the ID of a thing,
  2. the ID of a property that relates that thing to another thing or a value,
  3. the ID of the thing it relates to OR a value, like text, a number or a date.

For example:

<Person23> <hasDateOfBirth> "1969-05-23" .
<Person23> <name> "Marvin Fenderson" .
<Person23> <memberOf> <Group003> .

The first part is called the Subject. The property is sometimes called a Predicate or Relation, and the last part is the Object. If the last part is a value rather than the ID of a thing, it's called a Literal. IDs may represent absolutely anything, but we use web addresses for them. These are called URIs. It can be confusing at first because http://webscience.org/person/6 refers to a person, not a webpage, but it's a very handy way to ensure that these IDs are globally unique.
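As a rough sketch of the idea (not how any particular RDF library stores things), the three example triples can be modelled in Python as plain tuples. The `Literal` wrapper and the `describe` helper are made up for illustration, and the bare IDs like `Person23` stand in for full URIs.

```python
from typing import List, NamedTuple, Tuple, Union


class Literal(NamedTuple):
    """A plain value (text, date, number) rather than the ID of a thing."""
    value: str


# A triple is (subject, predicate, object); IDs are plain strings here.
Triple = Tuple[str, str, Union[str, Literal]]

triples: List[Triple] = [
    ("Person23", "hasDateOfBirth", Literal("1969-05-23")),
    ("Person23", "name", Literal("Marvin Fenderson")),
    ("Person23", "memberOf", "Group003"),
]


def describe(triple: Triple) -> str:
    """Say whether the object of a triple is a literal or another thing."""
    subject, predicate, obj = triple
    kind = "literal" if isinstance(obj, Literal) else "thing"
    return f"{subject} --{predicate}--> {kind}"


for t in triples:
    print(describe(t))
```

The only structural difference between the triples is the last slot: either another ID or a literal value.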

The neat thing about this structure is that you can represent almost any other kind of data using it, although it's not great at doing ordered lists of values.

A URI represents a single concept or thing, but many URIs can represent the same thing.

If someone resolves one of your URIs, it's considered good practice to return some useful triples about the concept the URI represents, but don't lose sleep over doing that -- it's an optional bonus feature of RDF.

One little caveat: a literal can have a type (like integer or string), which is also represented by a URI, but we still call this a "triple" (yes, that's dumb).
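To make that concrete, here is a small sketch that writes a typed literal in the `"value"^^<datatype URI>` notation used by N-Triples and Turtle. The `typed_literal` helper is hypothetical, and the XML Schema datatype namespace is an assumption brought in for the example; it isn't mentioned elsewhere on this page. The helper does no escaping of quotes inside the value, so treat it as illustration only.

```python
# The standard XML Schema datatype namespace (an assumption for this example).
XSD = "http://www.w3.org/2001/XMLSchema#"


def typed_literal(value: str, datatype: str) -> str:
    """Format a value plus its datatype URI as "value"^^<datatype>."""
    return f'"{value}"^^<{datatype}>'


# Still one subject, one predicate, one object: the type rides along with the literal.
line = f'<Person23> <hasDateOfBirth> {typed_literal("1969-05-23", XSD + "date")} .'
print(line)
```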

Lists

You can do lists by saying something like:

<Person> <hasToDoList> <List0001> .
<List0001> <label> "Marvin's TODO List" .
<List0001> rdf:_1 "Buy Milk" .
<List0001> rdf:_2 "Walk Dog" .
<List0001> rdf:_3 "Drink Milk" .
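Triples have no inherent order, so the order lives in the numbered predicates. A minimal sketch of getting the items back out in order, assuming the triples are held as Python tuples and that `rdf:` expands to the usual RDF namespace (the `list_items` helper is made up for illustration):

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Deliberately shuffled: a triple store gives no ordering guarantee.
triples = [
    ("List0001", "label", "Marvin's TODO List"),
    ("List0001", RDF + "_3", "Drink Milk"),
    ("List0001", RDF + "_1", "Buy Milk"),
    ("List0001", RDF + "_2", "Walk Dog"),
]

# Matches rdf:_1, rdf:_2, ... and captures the position number.
MEMBER = re.compile(re.escape(RDF) + r"_(\d+)$")


def list_items(triples, list_id):
    """Return the members of list_id ordered by their rdf:_N position."""
    numbered = []
    for subject, predicate, obj in triples:
        match = MEMBER.match(predicate)
        if subject == list_id and match:
            numbered.append((int(match.group(1)), obj))
    return [obj for _, obj in sorted(numbered)]


print(list_items(triples, "List0001"))
```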

RDF Documents

RDF+XML

There are several ways of writing RDF triples into a file. The most common is called RDF+XML (which people often just call RDF). It usually looks something like this:

 <foaf:Person rdf:about='http://webscience.org/person/7'>
    <foaf:name>Christopher Gutteridge</foaf:name>
 </foaf:Person>

If you want to produce RDF+XML, see this Guide. To parse RDF+XML, just find and use a library; there's one for most popular languages!

This wiki uses a simple subset of RDF+XML for examples.

Turtle (aka N3)

N3 is quite complicated, so some bright person defined a cut-down version called Turtle, which is really easy to read and write but is sadly not as widely supported as RDF+XML.

Turtle looks something like this:

 <http://webscience.org/person/7> a foaf:Person ;
    foaf:name "Christopher Gutteridge" .

rdf:type and classes

The most common predicate (property) is 'rdf:type', which relates a thing to a class. For example, I am related by rdf:type to the class foaf:Person. Things can have any number of types.

The 'object' of rdf:type is often referred to as a class.

Namespaces

For URIs it's common to define a bunch of related concepts in the same "namespace". A namespace is a bit like a directory on a filesystem; it usually ends with either "/" or "#", and the IDs in the namespace generally don't have "/" or "#" in them, as that confuses things.

You will probably define your own namespace for your own concepts, such as your organisation's members, or the bus stops nearby, but for classes and predicates you'll often use other people's namespaces. In RDF+XML and Turtle it's common to use a namespace prefix to make things more readable. In RDF+XML you must use namespace prefixes for predicates. The following examples mean exactly the same thing:

Example 1

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://webscience.org/person/7">
     <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
     <foaf:name>Christopher Gutteridge</foaf:name>
  </rdf:Description>
</rdf:RDF>

Same as Example 2

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix wsperson: <http://webscience.org/person/> .
wsperson:7 rdf:type foaf:Person .
wsperson:7 foaf:name "Christopher Gutteridge" .

Same as Example 3

<http://webscience.org/person/7> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://webscience.org/person/7> <http://xmlns.com/foaf/0.1/name> "Christopher Gutteridge" .
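A prefix is nothing more than string substitution. The sketch below expands the prefixed names from Example 2 into the full URIs of Example 3. The `expand` and `expand_triple` helpers are hypothetical, and they only cope with the simple cases shown here; a real Turtle parser handles far more.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "wsperson": "http://webscience.org/person/",
}


def expand(term: str) -> str:
    """Turn a prefixed name like foaf:name into <full URI>; pass literals through."""
    if term.startswith('"'):
        return term  # a quoted literal, not a prefixed name
    prefix, local = term.split(":", 1)
    return f"<{PREFIXES[prefix]}{local}>"


def expand_triple(subject: str, predicate: str, obj: str) -> str:
    """Write one triple out with every prefixed name expanded."""
    return " ".join(expand(t) for t in (subject, predicate, obj)) + " ."


print(expand_triple("wsperson:7", "rdf:type", "foaf:Person"))
print(expand_triple("wsperson:7", "foaf:name", '"Christopher Gutteridge"'))
```

The two printed lines should match Example 3 exactly.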

http://prefix.cc/

You can get a list of the standard abbreviations for common namespaces using http://prefix.cc/

If you're really lazy, you can get the stub of an XML document out of it, for example http://prefix.cc/foaf,dcterms,geo.rdf

Common Namespaces

Here's a quick summary of the most common namespaces.

rdf - has the core parts of RDF, but usually you'll only see rdf:type.

rdfs - used for making statements about predicates and classes, also has rdfs:label and rdfs:comment which are good basic ways of giving something a label and making comments about it.

owl - used for making much more complex statements about predicates and classes. This is cool, but don't worry about it too much when you're just getting started. Also defines owl:sameAs to indicate two URIs represent the same thing (in your opinion).

dcterms - Dublin Core terms. Very useful generic properties for making statements about resources -- who created them, when, who published them, title, description etc. An older version is called 'Dublin Core Elements' and this can be confusing. In general always use dcterms. Some people use "dc" as the abbreviation, but this is confusing as it's not obvious if it's dc-terms or dc-elements, so don't do that.

foaf - Friend of a Friend. This is good for describing the facts from a person's or organisation's 'profile', things like their email address, phone numbers, names, what groups they are a member of etc.

geo - Useful for giving a latitude & longitude of something. If it's a big thing, then you can use this to give a useful reference point to navigate to.

dbpedia - DBpedia defines a URI for the primary topic of every page of Wikipedia. So http://dbpedia.org/resource/Southampton is a URI representing the city of Southampton.

Semantics

The semantic bit of the semantic web is that if you resolve the URI for a class or predicate you often get back some rdfs: and/or owl: triples describing what it means, and some semantics about it. This lets you do clever reasoning like knowing that 'ancestor' is a transitive property, so the ancestor of an ancestor of X is therefore also an ancestor of X.

This is complicated, and not required to get started. If it confuses you, don't worry too much about it.
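If you're curious what that reasoning amounts to, here is a naive sketch. It hard-codes the fact that 'ancestor' is transitive in a `TRANSITIVE` set; a real reasoner would instead learn this from the vocabulary itself (OWL has owl:TransitiveProperty for the purpose). The IDs and the `infer` helper are made up for illustration.

```python
# Stated facts: each triple says the object is an ancestor of the subject.
triples = {
    ("X", "ancestor", "Parent"),
    ("Parent", "ancestor", "Grandparent"),
}

# Assumption: we already know which predicates are transitive.
TRANSITIVE = {"ancestor"}


def infer(triples):
    """Keep adding implied triples for transitive predicates until nothing changes."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, p, b) in list(inferred):
            if p not in TRANSITIVE:
                continue
            for (c, q, d) in list(inferred):
                # a -p-> b and b -p-> d together imply a -p-> d
                if q == p and c == b and (a, p, d) not in inferred:
                    inferred.add((a, p, d))
                    changed = True
    return inferred


for triple in sorted(infer(triples)):
    print(triple)
```

The result contains one triple nobody stated directly: Grandparent is an ancestor of X.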

Triple Stores and SPARQL

A triple store is a bit like an SQL database, but optimised to just import, store, and query a huge pile of triples. Triple stores are queried using a language called SPARQL.

They are funky because, rather than dealing with triples document by document, you can query over any facet of the data in the "SPARQL Endpoint". If you have the staff resources to do so, it's good practice to provide a SPARQL endpoint, but don't lose sleep if you don't.

These are useful and powerful but not required to produce and work with RDF and Open Linked Data.
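To give a feel for what querying a pile of triples means, here is a toy sketch in plain Python: a pattern with some slots filled in and others left as wildcards. It is not SPARQL and not how a real triple store works internally; the `match` helper and `Person24` are invented for the example.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


store = [
    ("Person23", "name", "Marvin Fenderson"),
    ("Person23", "memberOf", "Group003"),
    ("Person24", "memberOf", "Group003"),  # hypothetical second member
]

# Roughly what the SPARQL pattern { ?s <memberOf> <Group003> } asks for.
members = [s for (s, _, _) in match(store, predicate="memberOf", obj="Group003")]
print(members)
```

A real SPARQL query can join several such patterns together, which is where triple stores earn their keep.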