SPARQL: A query platform for Web 2.0 and the Semantic Web
Takeaway: This document explains how SPARQL might be used for querying information in a Web 2.0 environment and how the SPARQL Protocol works for remote queries.
Web 2.0 and the Semantic Web are two popular ideas that are shaping the future of the Web. It is not yet clear which one will prevail, but most likely we will get a platform that combines the best ideas from both. Many experts claim that Web 2.0 is just a "marketing" name for the Semantic Web, although some differences may still exist. The main principle shared by both paradigms is the ability to extract and query information across an information space that includes Web sites, documents, databases, Web services, libraries and repositories. The Semantic Web has introduced a new computing paradigm based on the notion of unambiguous metadata descriptions that can describe not only things you can find on the Web but also things that reside in enterprise data stores and even physical objects. The World Wide Web Consortium standardized these metadata descriptions as the Resource Description Framework (RDF) as early as 1999.
The SPARQL Protocol and RDF Query Language (SPARQL, pronounced "sparkle") is a query language designed to meet the requirements and design objectives described in the "RDF Data Access Use Cases". It provides facilities to:
- extract information in the form of URIs, blank nodes, plain and typed literals
- extract RDF subgraphs, and
- construct new RDF graphs based on information in the queried graphs.
As a data access language, SPARQL is suitable for both local and remote use. Using SPARQL locally is straightforward, but remote use needs a more stringent agreement between client and server, and that is what the SPARQL Protocol for RDF provides. This protocol is an interface for conveying SPARQL queries from clients to query processors, and HTTP and SOAP bindings have been defined to provide connectivity.
In this document I will explain how SPARQL can be used for querying information and how the SPARQL Protocol works for remote queries. The reader is expected to be familiar with RDF concepts.
Evolution of objectives
Although several standards cover storing and defining RDF data, no work had been done on standards for querying or accessing it. Likewise, there was no formal, publicly standardized data access protocol for interacting with remote or local RDF storage servers. As a result, developers in commercial and open source projects created their own query languages for RDF data, over 20 at last count. A full list of query language implementations can be seen at http://www.w3.org/2001/11/13-RDF-Query-Rules/. But these languages lack both a common syntax and a common semantics. In fact, the existing query languages cover a significant semantic range: from declarative, SQL-like languages, to path languages, to rule or production-like systems. SPARQL was designed to fill this gap.
SPARQL provides Web 2.0 users with a query language in much the same fashion as SQL provides relational database users with a query language.
The following requirements were taken into consideration when SPARQL was designed:
- Graph pattern matching ability – the query language must include the capability to restrict matches on a queried graph by providing a graph pattern, which consists of one or more RDF triple patterns, to be satisfied in a query;
- Variable binding results – It must be possible for queries to return zero or more bindings of variables. Each set of bindings is one way that the query can be satisfied by the queried graph;
- Subgraph results – It must be possible for query results to be returned as a subgraph of the original queried graph;
- Supportable local queries – The query language must be suitable for use in accessing local RDF data - that is, from the same machine or same system process;
- Result limits – It must be possible to specify an upper bound on the number of query results returned;
- Streaming results – It must be possible, when returning multiple unordered results, for the client to request that results be streamed. When the client requests streaming results, all the data in one result must be available to the client before all the data for the next result.
- WSDL support – The protocol – including its interfaces, their operations, results, and types – must be described using WSDL. This is essential for remote queries.
SPARQL requirements have now stabilized, and the SPARQL query language is a W3C Candidate Recommendation, which means it is on track to become a standard (a W3C Recommendation) once the remaining review stages are complete.
How to write SPARQL queries
An RDF graph is a set of triples; each triple consists of a subject, a predicate and an object. These triples can come from a variety of sources. The SPARQL query language is based on matching graph patterns. The simplest graph pattern is the triple pattern, which is like an RDF triple but may have a variable instead of an RDF term (an IRI, a literal or a blank node) in the subject, predicate or object position. Combining triple patterns gives a basic graph pattern: a solution must match every triple pattern in it, with each variable bound to the same term throughout.
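To make pattern matching concrete, here is a minimal Python sketch of basic graph pattern matching. It is not how any particular SPARQL engine works; it assumes a toy representation in which every RDF term is a plain string (prefixed names such as foaf:name are not expanded) and variables are strings starting with "?" or "$". The data are the name/mailbox triples from Listing E below.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    # Variables start with "?" or "$"; any other string is a fixed RDF term.
    return term[:1] in ("?", "$")


def match_triple(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            name = p[1:]  # "?x" and "$x" are the same variable "x"
            if result.get(name, t) != t:
                return None  # already bound to a different term
            result[name] = t
        elif p != t:
            return None  # a fixed term does not match
    return result


def match_bgp(patterns: List[Triple], graph: Iterable[Triple]) -> List[Binding]:
    """Return every binding that satisfies all triple patterns at once."""
    triples = list(graph)
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in triples
            if (extended := match_triple(pattern, triple, binding)) is not None
        ]
    return solutions


# Listing E data, with terms kept as plain strings.
graph = [
    ("_:a", "foaf:name", '"John Hijacker"'),
    ("_:a", "foaf:mbox", "<mailto:jh@example.com>"),
    ("_:b", "foaf:name", '"DmitryPovarenko"'),
    ("_:b", "foaf:mbox", "<mailto:dmitry@example.org>"),
]

# WHERE { ?x foaf:name ?name . ?x foaf:mbox ?mbox }
rows = match_bgp([("?x", "foaf:name", "?name"), ("?x", "foaf:mbox", "?mbox")], graph)
for row in rows:
    print(row["name"], row["mbox"])
```

Each additional triple pattern narrows the partial solutions further, and shared variables such as ?x are what join the patterns together.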
The example below shows a SPARQL query to find the author of a book from the information in the given RDF graph. Let's take the following RDF information (example1.rdf):
<http://example.org/book/book1> <http://purl.org/dc/elements/1.1/author> "Peter Mikhalenko" .
The query consists of two parts, the SELECT clause and the WHERE clause. The SELECT clause identifies the variables to appear in the query results, and the WHERE clause has one triple pattern (example1.sparql.txt):
Listing A
SELECT ?author
WHERE
{
<http://example.org/book/book1> <http://purl.org/dc/elements/1.1/author> ?author .
}
This is what we will get from this simplest query:
author
------------------
"Peter Mikhalenko"
The terms delimited by "<>" are IRI references (Internationalized Resource Identifiers, described by RFC 3987). IRIs are a generalization of URIs and are fully compatible with URIs and URLs. SPARQL provides two abbreviation mechanisms for IRIs: prefixed names and relative IRIs.
- Prefixed names: The PREFIX keyword associates a prefix label with an IRI. A prefixed name is a prefix label and a local part, separated by a colon ":". It is mapped to an IRI by concatenating the local part to the IRI corresponding to the prefix.
- Relative IRIs: The BASE keyword defines the Base IRI used to resolve relative IRIs.
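Both abbreviation mechanisms can be sketched in a few lines of Python. The prefix table below is hypothetical (it mirrors the dc: and empty prefixes used in Listing B). BASE resolution is delegated to urllib.parse.urljoin, which implements RFC 3986 URI resolution rather than the full IRI rules of RFC 3987, so treat it as an approximation.

```python
from urllib.parse import urljoin

# Hypothetical prefix table, as PREFIX declarations might populate it.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "": "http://example.org/book/",  # the empty prefix, used as ":book1"
}


def expand_prefixed_name(name: str, prefixes: dict) -> str:
    """Map "prefix:local" to an IRI by appending the local part to the prefix IRI."""
    prefix, colon, local = name.partition(":")
    if not colon or prefix not in prefixes:
        raise ValueError(f"undeclared prefix in {name!r}")
    return prefixes[prefix] + local


def resolve_relative(base: str, relative: str) -> str:
    """Resolve a relative reference against the BASE IRI."""
    return urljoin(base, relative)


print(expand_prefixed_name("dc:author", PREFIXES))
print(resolve_relative("http://example.org/book/", "book1"))
```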
The general syntax for literals is a string (enclosed in quotes, either double quotes "" or single quotes '') with either an optional language tag (introduced by @) or an optional datatype IRI or prefixed name (introduced by ^^).
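As an illustration of that syntax, the following Python sketch splits a literal into its lexical form, language tag and datatype. It is a simplified, hypothetical parser: it ignores escape sequences, the triple-quoted long-string forms and the unquoted numeric shorthands, so it is no substitute for a real SPARQL parser.

```python
import re
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    lexical: str
    lang: Optional[str] = None      # e.g. "fr" for "chat"@fr
    datatype: Optional[str] = None  # an <IRI> or a prefixed name such as xsd:integer


LITERAL_RE = re.compile(
    r'^(?:"(?P<dq>[^"]*)"'                           # "double-quoted" lexical form
    r"|'(?P<sq>[^']*)')"                             # or 'single-quoted'
    r"(?:@(?P<lang>[A-Za-z]+(?:-[A-Za-z0-9]+)*)"     # then an optional @language tag
    r"|\^\^(?P<dt><[^>]*>|[A-Za-z_][\w.-]*:\w*))?$"  # or an optional ^^datatype
)


def parse_literal(text: str) -> Literal:
    m = LITERAL_RE.match(text)
    if m is None:
        raise ValueError(f"not a simple quoted literal: {text!r}")
    lexical = m.group("dq") if m.group("dq") is not None else m.group("sq")
    return Literal(lexical, m.group("lang"), m.group("dt"))


print(parse_literal('"chat"@fr'))
print(parse_literal("'42'^^xsd:integer"))
```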
Variables in SPARQL queries have global scope: a given variable name denotes the same variable everywhere it appears in the query. Variables are indicated by "?"; the "?" does not form part of the variable name. "$" is an alternative to "?", so in a query $var and ?var are the same variable.
Putting all of this together, the files example2.sparql.txt, example3.sparql.txt and example4.sparql.txt show three equivalent ways of writing the same query.
The same RDF data (example1.rdf) can also be written in the Turtle format, which allows URIs to be abbreviated with prefixes (example1.rdf.turtle.txt):
Listing B
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix : <http://example.org/book/> .
:book1 dc:author "Peter Mikhalenko" .
The term "binding" refers to a pair (variable, RDF term). In a result table, however, not every variable has to be bound in every row. Parts of a graph pattern can be made optional by applying the OPTIONAL keyword to a graph pattern, as the following example shows.
Let’s take a piece of data (example2.rdf.turtle.txt):
Listing C
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
_:a rdf:type foaf:Person .
_:a foaf:name "Peter" .
_:a foaf:mbox <mailto:test@peter.com> .
_:a foaf:mbox <mailto:peter@gmail.com> .
_:b rdf:type foaf:Person .
_:b foaf:name "Mary" .
The query with OPTIONAL pattern will look like this (example5.sparql.txt):
Listing D
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE { ?x foaf:name ?name .
OPTIONAL { ?x foaf:mbox ?mbox }
}
And the result will be the following:
name       mbox
--------   ------------------------
"Peter"    <mailto:test@peter.com>
"Peter"    <mailto:peter@gmail.com>
"Mary"
There is no value of mbox in the solution where the name is "Mary"; it is unbound. This query finds the names of people in the data. If there is a triple with the predicate mbox and the same subject, a solution will contain the object of that triple as well. In the example, only a single triple pattern is given in the optional part of the query, but in general it can be any graph pattern. The whole optional graph pattern must match for it to add anything to the query solution.
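One way to picture OPTIONAL is as a left join: each solution of the required part is extended with every compatible solution of the optional part, and kept unchanged when there is none. The Python sketch below applies that idea to the Listing C data. The single-pattern helper and the plain-string representation of terms are simplifications assumed for illustration, and the sketch does not handle FILTERs inside OPTIONAL.

```python
from typing import Dict, List, Tuple

Binding = Dict[str, str]

# Listing C data, with prefixed names and blank nodes kept as plain strings.
GRAPH: List[Tuple[str, str, str]] = [
    ("_:a", "rdf:type", "foaf:Person"),
    ("_:a", "foaf:name", '"Peter"'),
    ("_:a", "foaf:mbox", "<mailto:test@peter.com>"),
    ("_:a", "foaf:mbox", "<mailto:peter@gmail.com>"),
    ("_:b", "rdf:type", "foaf:Person"),
    ("_:b", "foaf:name", '"Mary"'),
]


def solutions_of(predicate: str, subj_var: str, obj_var: str) -> List[Binding]:
    """Solutions of the single triple pattern { ?subj <predicate> ?obj }."""
    return [{subj_var: s, obj_var: o} for s, p, o in GRAPH if p == predicate]


def compatible(a: Binding, b: Binding) -> bool:
    # Two solutions are compatible when they agree on every shared variable.
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def left_join(required: List[Binding], optional: List[Binding]) -> List[Binding]:
    """OPTIONAL: extend each required solution where possible, else keep it as-is."""
    result: List[Binding] = []
    for r in required:
        extensions = [{**r, **o} for o in optional if compatible(r, o)]
        result.extend(extensions or [r])
    return result


# SELECT ?name ?mbox WHERE { ?x foaf:name ?name . OPTIONAL { ?x foaf:mbox ?mbox } }
rows = left_join(solutions_of("foaf:name", "x", "name"),
                 solutions_of("foaf:mbox", "x", "mbox"))
for row in rows:
    print(row["name"], row.get("mbox", ""))  # ?mbox is unbound for Mary
```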
Results can also be returned in XML using the SPARQL Variable Binding Results XML Format; we will examine it later, together with the SPARQL Protocol.
The result of a query is the set of all pattern solutions that match the query pattern, giving all the ways the query can match the graph being queried. Each result is one solution to the query, and a query may have zero, one or multiple results. Say, for example, we have the following data (example3.rdf.turtle.txt):
Listing E
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
_:a foaf:name "John Hijacker" .
_:a foaf:mbox <mailto:jh@example.com> .
_:b foaf:name "DmitryPovarenko" .
_:b foaf:mbox <mailto:dmitry@example.org> .
Then the query (example6.sparql.txt):
Listing F
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE
{ ?x foaf:name ?name .
?x foaf:mbox ?mbox }
Will give the following result:
name                 mbox
-----------------    ----------------------------
"John Hijacker"      <mailto:jh@example.com>
"DmitryPovarenko"    <mailto:dmitry@example.org>
The results enumerate the RDF terms to which the selected variables can be bound in the query pattern. There are also a number of syntactic forms that abbreviate some common sequences of triples; see the SPARQL specification for details.
An RDF literal is written in SPARQL as a string containing the lexical form of the literal, followed by an optional language tag or an optional datatype. There are convenience forms for numeric literals of type xsd:integer, xsd:decimal and xsd:double, and also for xsd:boolean. The data in example4.rdf.turtle.txt contains a number of RDF literals. The pattern { ?v ?p 42 } has a solution with variable v bound to :x, because 42 is syntax for "42"^^<http://www.w3.org/2001/XMLSchema#integer>. The following pattern has a solution with variable v bound to :y:
{ ?v ?p "abc"^^<http://example.org/datatype#specialDatatype> }
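The convenience forms amount to a mapping from an unquoted token to a typed literal. This Python sketch uses simplified regular expressions modelled loosely on the SPARQL grammar (they are assumptions, not the grammar itself) and returns the lexical form together with the full datatype IRI, so that 42 becomes the same pair as "42"^^<http://www.w3.org/2001/XMLSchema#integer>. The two-triple graph is hypothetical, since the data file is not reproduced here.

```python
import re
from typing import Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"


def expand_shorthand(token: str) -> Tuple[str, str]:
    """Return (lexical form, datatype IRI) for an unquoted numeric or boolean token."""
    if token in ("true", "false"):
        return token, XSD + "boolean"
    if re.fullmatch(r"[+-]?\d+", token):
        return token, XSD + "integer"
    if re.fullmatch(r"[+-]?\d*\.\d+", token):
        return token, XSD + "decimal"
    if re.fullmatch(r"[+-]?(\d+\.?\d*|\.\d+)[eE][+-]?\d+", token):
        return token, XSD + "double"
    raise ValueError(f"not a numeric or boolean shorthand: {token!r}")


# Hypothetical triples whose objects are (lexical form, datatype IRI) pairs.
graph = [
    (":x", ":p", ("42", XSD + "integer")),
    (":y", ":p", ("abc", "http://example.org/datatype#specialDatatype")),
]

# { ?v ?p 42 } : the shorthand 42 matches the typed literal on :x only.
matches = [s for s, p, o in graph if o == expand_shorthand("42")]
print(matches)  # [':x']
```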
Graph pattern matching creates bindings of variables. Solutions can be restricted further by constraining the allowable bindings of variables to RDF terms. Value constraints, introduced by the FILTER keyword, take the form of boolean-valued expressions; the language also allows application-specific constraints on the values in a solution. Take the following data (example5.rdf.turtle.txt) and query (example7.sparql.txt). The result of the query is:
title                price
------------------   -----
"The Semantic Web"   23
Because of the constraint on the "price" variable, only book2 matches the query: the constraint restricts the allowable values of "price". Constraints can also be given in an optional graph pattern, as the next example shows (the same data, example5.rdf.turtle.txt, with the query example8.sparql.txt). The result is:
title                price
------------------   -----
"SPARQL Tutorial"
"The Semantic Web"   23
No price appears for
the book with title "SPARQL Tutorial" because the optional graph
pattern did not lead to a solution involving the variable "price".
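The difference between the two queries is where the constraint sits. The Python sketch below contrasts a FILTER on the main pattern with a FILTER inside OPTIONAL. Because the data file and the queries are not reproduced here, the prices (42 and 23) and the constraint price < 30 are hypothetical values chosen only to be consistent with the two result tables above.

```python
from typing import Callable, Dict, List

Binding = Dict[str, object]

# Hypothetical stand-in for example5.rdf.turtle.txt: book -> (title, price).
BOOKS = {
    "book1": ("SPARQL Tutorial", 42),
    "book2": ("The Semantic Web", 23),
}


def cheap(b: Binding) -> bool:
    # Hypothetical constraint, in SPARQL terms FILTER (?price < 30).
    return b["price"] < 30


def filtered(condition: Callable[[Binding], bool]) -> List[Binding]:
    """Price required, FILTER on the main pattern: failing books vanish entirely."""
    rows = [{"title": t, "price": p} for t, p in BOOKS.values()]
    return [r for r in rows if condition(r)]


def optional_filtered(condition: Callable[[Binding], bool]) -> List[Binding]:
    """FILTER inside OPTIONAL: a failing price is dropped, but the title row survives."""
    result: List[Binding] = []
    for title, price in BOOKS.values():
        row: Binding = {"title": title}
        candidate = {**row, "price": price}
        result.append(candidate if condition(candidate) else row)
    return result


print(filtered(cheap))
print(optional_filtered(cheap))
```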
SPARQL provides a means of combining graph patterns so that one of several alternative graph patterns may match. If more than one of the alternatives matches, all the possible pattern solutions are found. Pattern alternatives are written with the UNION keyword. For example, for this data (example6.rdf.turtle.txt) the query (example9.sparql.txt) gives the following result:
title
--------------------------------
"SPARQL Protocol Tutorial"
"SPARQL"
"SPARQL (updated)"
"SPARQL Query Language Tutorial"
This query finds the titles of the books in the data, whether the title is recorded using the Dublin Core properties (a standardized set of document metadata properties) from version 1.0 or from version 1.1.
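In terms of solutions, UNION simply collects the solutions of each alternative. In this Python sketch the four titles come from the result table above, but which book uses which Dublin Core version is a hypothetical assignment, since the data file is not reproduced here.

```python
from typing import Dict, List, Tuple

DC10 = "http://purl.org/dc/elements/1.0/"
DC11 = "http://purl.org/dc/elements/1.1/"

# Hypothetical data: the assignment of titles to DC versions is assumed.
GRAPH: List[Tuple[str, str, str]] = [
    ("_:a", DC11 + "title", "SPARQL Protocol Tutorial"),
    ("_:b", DC11 + "title", "SPARQL"),
    ("_:c", DC11 + "title", "SPARQL (updated)"),
    ("_:d", DC10 + "title", "SPARQL Query Language Tutorial"),
]


def titles_via(predicate: str) -> List[Dict[str, str]]:
    """Solutions of the triple pattern { ?book <predicate> ?title }."""
    return [{"book": s, "title": o} for s, p, o in GRAPH if p == predicate]


def union(left: List[Dict[str, str]], right: List[Dict[str, str]]) -> List[Dict[str, str]]:
    # UNION keeps every solution from both branches; nothing is merged or deduplicated.
    return left + right


# { ?book dc10:title ?title } UNION { ?book dc11:title ?title }
solutions = union(titles_via(DC10 + "title"), titles_via(DC11 + "title"))
for s in solutions:
    print(s["title"])
```

As with any SPARQL result, the order of these solutions is not significant unless ORDER BY is used.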
Query patterns generate an unordered collection of solutions. These solutions are then treated as a sequence, initially in no specific order; any sequence modifiers are then applied to create another sequence. The DISTINCT keyword modifies the solution sequence so that every combination of variable bindings (i.e. each solution) in the sequence is unique. For example, with the data example7.rdf.turtle.txt and the query example10.sparql.txt the result will be the following:
-------
"Alice"
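DISTINCT can be sketched as removing any solution whose complete set of bindings has already been seen. In this Python sketch the variable name "name" and the duplicated rows are hypothetical, since example10.sparql.txt is not reproduced here.

```python
from typing import Dict, List

Solution = Dict[str, str]


def distinct(solutions: List[Solution]) -> List[Solution]:
    """Keep the first occurrence of each distinct set of variable bindings."""
    seen = set()
    result = []
    for s in solutions:
        key = frozenset(s.items())  # a solution is identified by all its bindings
        if key not in seen:
            seen.add(key)
            result.append(s)
    return result


# Hypothetical: two different people who are both named "Alice".
rows = [{"name": "Alice"}, {"name": "Alice"}]
print(distinct(rows))  # [{'name': 'Alice'}]
```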
The ORDER BY clause takes a solution sequence and applies ordering conditions to it. An ordering condition can be a variable or a function call. The direction of ordering is ascending by default; it can be set explicitly by enclosing the condition in ASC() or DESC(). If multiple conditions are given, they are applied in turn until one of them determines the order of two solutions.
The LIMIT form puts an upper bound on the number of solutions returned: if the number of actual solutions is greater than the limit, at most the limit number of solutions will be returned. OFFSET causes the returned solutions to start after the specified number of solutions.
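These solution modifiers can be imitated on a Python list of solutions. This sketch assumes every ordering variable is bound and its values are mutually comparable (SPARQL itself defines an ordering for unbound variables and mixed term types, which is not modelled here); the book titles and prices are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Solution = Dict[str, object]


def order_by(solutions: List[Solution], conditions: Sequence[Tuple[str, bool]]) -> List[Solution]:
    """Sort by (variable, descending) pairs; later conditions only break ties."""
    result = list(solutions)
    # list.sort is stable, so sorting by the last condition first and the first
    # condition last is equivalent to comparing the conditions in turn.
    for var, descending in reversed(conditions):
        result.sort(key=lambda s: s[var], reverse=descending)
    return result


def limit_offset(solutions: List[Solution], limit: Optional[int] = None,
                 offset: int = 0) -> List[Solution]:
    """OFFSET skips solutions first; LIMIT then caps how many are returned."""
    end = None if limit is None else offset + limit
    return solutions[offset:end]


# Hypothetical solutions of some query binding ?title and ?price.
books = [
    {"title": "Book A", "price": 30},
    {"title": "Book B", "price": 10},
    {"title": "Book C", "price": 30},
]

# ORDER BY DESC(?price) ?title  LIMIT 1 OFFSET 1
ordered = order_by(books, [("price", True), ("title", False)])
page = limit_offset(ordered, limit=1, offset=1)
print([s["title"] for s in ordered], page)
```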
The SELECT form of results returns the variable bindings directly. The syntax SELECT * is an abbreviation that selects all of the variables. For example, for the data example8.rdf.turtle.txt and the query example11.sparql.txt the result will be the following:
nameX     nameY     nickY
-------   -------   -----
"Alice"   "Bob"
"Alice"   "Clare"   "CT"
Results can be thought of as a table or result set, with one row per query solution. Some cells may be empty because a variable is not bound in that particular solution. Result sets can be accessed through a local API, but they can also be serialized into either XML or an RDF graph. In XML, the same result set looks like this (example11.result.sparql.xml):
Listing G
<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
<head>
<variable name="nameX"/>
<variable name="nameY"/>
<variable name="nickY"/>
</head>
<results>
<result>
<binding name="nameX">
<literal>Alice</literal>
</binding>
<binding name="nameY">
<literal>Bob</literal>
</binding>
</result>
<result>
<binding name="nameX">
<literal>Alice</literal>
</binding>
<binding name="nameY">
<literal>Clare</literal>
</binding>
<binding name="nickY">
<literal>CT</literal>
</binding>
</result>
</results>
</sparql>
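On the client side, this XML is straightforward to consume. The Python sketch below uses the standard xml.etree.ElementTree module to turn a result document into a list of variable-to-value dictionaries. It keeps only the text of each uri, literal or bnode element and discards the distinction between them, which a real client would usually preserve. The sample document is Listing G in compacted form.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

NS = {"sr": "http://www.w3.org/2005/sparql-results#"}


def parse_sparql_xml(text: str) -> Tuple[List[str], List[Dict[str, str]]]:
    """Return the head variables and one {variable: value} dict per result."""
    root = ET.fromstring(text.strip())  # the XML declaration must come first
    variables = [v.get("name") for v in root.findall("sr:head/sr:variable", NS)]
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            value = binding[0]  # the uri, literal or bnode child element
            row[binding.get("name")] = value.text or ""
        rows.append(row)
    return variables, rows


LISTING_G = """
<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
<head><variable name="nameX"/><variable name="nameY"/><variable name="nickY"/></head>
<results>
<result>
<binding name="nameX"><literal>Alice</literal></binding>
<binding name="nameY"><literal>Bob</literal></binding>
</result>
<result>
<binding name="nameX"><literal>Alice</literal></binding>
<binding name="nameY"><literal>Clare</literal></binding>
<binding name="nickY"><literal>CT</literal></binding>
</result>
</results>
</sparql>
"""

variables, rows = parse_sparql_xml(LISTING_G)
for row in rows:
    # Unbound variables (nickY in the first row) are simply missing from the dict.
    print([row.get(v, "") for v in variables])
```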
SPARQL Protocol
The SPARQL Protocol is specified in two ways: first, as an abstract interface independent of any concrete realization, implementation, or binding to another protocol; second, as HTTP and SOAP bindings of this interface.
The SPARQL Protocol is described abstractly with WSDL 2.0 in terms of a Web service that implements its interface, types, faults, and operations, as well as by HTTP and SOAP bindings. The current WSDL description of the SPARQL Protocol is available at the following address and can be used by any Web service processor or other application: http://www.w3.org/TR/rdf-sparql-protocol/sparql-protocol-query.wsdl.
Let's take a simple query (example12.sparql.txt) and look at how it is sent over an HTTP connection. This is the HTTP GET request that a SPARQL client sends to a SPARQL Web service located, say, at http://sparql.service.com/sparql:
Listing H
GET /sparql/?query=PREFIX+dc%3A+%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Felements%2F1.1%2F%3E%0ASELECT+%3Fbook+%3Fwho%0AWHERE+%7B+%3Fbook+dc%3Acreator+%3Fwho+%7D HTTP/1.1
Host: sparql.service.com
User-agent: sparql-client/0.1
The GET request carries a URL-encoded SPARQL query: spaces are replaced by the '+' symbol, newline characters are encoded as %0A (the hexadecimal code of the line feed character), and reserved characters such as ':', '/', '<', '>', '?', '{' and '}' are percent-encoded as well. An HTTP server will return the following for a handled query:
Listing I
HTTP/1.1 200 OK
Date: Fri, 06 May 2005 20:55:12 GMT
Server: Apache/1.3.29 (Unix) PHP/4.3.4 DAV/1.0.3
Connection: close
Content-Type: application/sparql-results+xml

<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
<head>
<variable name="book"/>
<variable name="who"/>
</head>
<results distinct="false" ordered="false">
<result>
<binding name="book"><uri>http://www.example/book/book5</uri></binding>
<binding name="who"><bnode>r29392923r2922</bnode></binding>
</result>
...
<result>
<binding name="book">
<uri>http://www.example/book/book6</uri></binding>
<binding name="who"><bnode>r8484882r49593</bnode></binding>
</result>
</results>
</sparql>
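The client side of this exchange can be sketched with Python's standard library. The endpoint URL is the hypothetical one from the example above, and the Accept header is an assumption about what a client would typically send; the sketch only builds the request and does not contact any server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Hypothetical endpoint from the example above; replace with a real service.
ENDPOINT = "http://sparql.service.com/sparql/"

QUERY = """PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?book ?who
WHERE { ?book dc:creator ?who }"""


def build_query_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) an HTTP GET request carrying `query`."""
    # urlencode percent-encodes reserved characters, turns spaces into '+'
    # and newlines into %0A.
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+xml"})


req = build_query_request(ENDPOINT, QUERY)
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would send it, which of course requires a live SPARQL endpoint; the response body could then be parsed like any SPARQL results XML document.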
A query can also be sent over SOAP. The file example13.sparql.soap.txt contains an example of a SOAP query sent in an HTTP POST request, and example13.sparql.result.soap.txt contains the corresponding SOAP response.
An evolving protocol
This article is only an introduction to the SPARQL query language and its protocol bindings. SPARQL has already evolved into a rich query language suitable for Web 2.0 and Semantic Web platforms, and it is impossible to cover every aspect of the language and protocol here; for further details, see the SPARQL specifications. SPARQL actually consists of three separate specifications: the query language specification, the SPARQL data access Protocol, and the XML format for query results. There are a number of issues that SPARQL does not address yet; most notably, SPARQL is read-only and cannot modify an RDF dataset.