
SquishQL: The simplest RDF query language

Tags: XML, Peter V. Mikhalenko, RDF, query language, SquishQL, job


Takeaway: SquishQL is the simplest query language you can use for accessing RDF data. Using several code examples, this document shows you how SquishQL works.

In the introductory article about SPARQL, I mentioned that several query languages are being developed for accessing RDF data. RDF is a core standard for defining and storing metadata in a Web 2.0 environment. However, most of these query languages are incomplete and fairly complex, and people often need something simple to start their trek along the learning curve. Accessing RDF data is straightforward with SquishQL, an RDF query language with SQL-like notation.

The name Squish can be read as "SQL-ish": the query syntax is designed to resemble the basic structure of SQL. We ask a database for possible values of a selection of variables, given some constraining expression. In Squish, this constraining expression can be thought of as a list of RDF statements in which some parts of each statement have missing values (indicated by '?' variables in place of URIs or string literals).

SquishQL is not the only RDF query language with SQL-based syntax. The Jena RDF framework had a similar query language called RDQL, but later moved to SPARQL. Both SquishQL and RDQL were based on R. V. Guha's rdfDB query language.

As for the future of SquishQL, it seems unlikely to become an industry standard, since the more powerful and more complex SPARQL is supported by modern full-featured Java RDF frameworks (have a look at Jena, Joseki, or Sesame). However, it remains the simplest query language for beginners in the RDF query world, much like Pascal in the world of programming languages.

Note: All of the examples contained in this article are available in text format from the download version.

An initial look

This is an example of a typical SquishQL query (example1.txt):

Example 1


SELECT ?item, ?job, ?orghome, ?salary, ?currency

   WHERE (job::advertises ?item ?job)
         (rdf::type ?job wordnet::Job)
         (job::salary ?job ?salary)
         (job::currency ?job ?currency)
         (job::orgHomepage ?job ?orghome)

   USING job FOR http://ilrt.org/discovery/2000/11/rss-query/jobvocab.rdf#
         rdf FOR http://www.w3.org/1999/02/22-rdf-syntax-ns#
         wordnet FOR http://xmlns.com/wordnet/1.6/

The answer to a query like this can be represented as a tabular result set, where columns correspond to the variables in the query ("?salary", etc.) and rows correspond to states of affairs represented in the RDF data in which the variables match values from the dataset. This is very similar to the ODBC/JDBC model familiar from the relational database world. In addition, the result set can be viewed as another RDF dataset, i.e. the data graph corresponding to all the nodes and arcs involved in answering the query.

For each row in the result set, there will be a concrete value given for each named variable such as "?item" that is specified in the SELECT clause. The variable itself is a placeholder rather than a specific Web resource. Some of the properties of the resource the variable is 'standing in for' are specified by the constraints in the WHERE clause.
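The way rows of bindings arise from a list of patterns can be sketched in a few lines of Python. This is only an illustration of the idea, not how any SquishQL engine is actually implemented: the toy triples, the blank-node labels such as "_:j1", and the helper names match, solve, and select are all assumptions made for this example. Terms are stored predicate-first to mirror the pattern order in Example 1.

```python
# Toy data as (predicate, subject, object), mirroring Example 1's pattern order.
# All values here are invented for illustration.
TRIPLES = [
    ("job::advertises", "job1.html", "_:j1"),
    ("rdf::type", "_:j1", "wordnet::Job"),
    ("job::salary", "_:j1", "100000"),
    ("job::advertises", "job2.html", "_:j2"),
    ("rdf::type", "_:j2", "wordnet::Job"),
    ("job::salary", "_:j2", "150000"),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def solve(patterns, triples):
    """Return every binding that satisfies all patterns (a conjunction)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for t in triples
            if (extended := match(pattern, t, b)) is not None
        ]
    return bindings


def select(variables, patterns, triples):
    """Project each binding onto the SELECT variables, giving one row each."""
    return [tuple(b[v] for v in variables) for b in solve(patterns, triples)]


rows = select(
    ["?item", "?salary"],
    [
        ("job::advertises", "?item", "?job"),
        ("rdf::type", "?job", "wordnet::Job"),
        ("job::salary", "?job", "?salary"),
    ],
    TRIPLES,
)
```

Each tuple in rows corresponds to one row of the result table, with one concrete value per selected variable. The variable ?job is used only to join the patterns and is dropped by the projection.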

Here is another simple query. First we present the SQL-ish query, followed by a prose translation (example2.txt):

Example 2


SELECT ?x, ?t, ?c, ?o

WHERE   (dc::title ?x, ?t)
        (dc::creator ?x, ?c)
        (eg::homePage ?c http://purl.org/net/eric/)
        (eg::worksFor ?c, ?o)

USING dc FOR http://purl.org/dc/1.1/
      eg FOR http://example.com/vocab/foaf/

This is what we're trying to say with this query:

"find me the dc title (we'll call it 't') of any resource (we'll call it 'x') that has a dc creator 'c' with a homepage 'http://purl.org/net/eric/', and tell me who they work for ('o')".

The answer is (just as in the SQL world) a table, with columns corresponding to the things we asked for, i.e. 'x', 't', 'c', 'o'. Each row supplies one set of values from the database that match the constraints in the WHERE clause of the query. Here's a tabular representation of a possible result set for our first query, Example 1 (results1.txt):

Results1


item      job              orghome                 salary currency
--------- ---------------- ----------------------- ------ --------
job1.html job 1 title here http://www.ukoln.ac.uk/ 100000 USD
job2.html job 2 title here http://ilrt.org/        150000 EUR

How it works

In SquishQL, there are two classes of constraints: patterns and filter expressions. Patterns are generative, i.e. they create bindings, and filters are restrictive, i.e. they remove possibilities. SquishQL separates these into the WHERE clause (generative) and the AND clause (restrictive). The order of terms inside a pattern depends on the implementation. Some query systems follow the tradition of putting the predicate first, as Examples 1 and 2 do. The RDQL-style syntax used in Example 3 instead mimics the N-Triples syntax and specifies triple patterns as subject-predicate-object.
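The difference between the two kinds of constraint can be shown with a small Python sketch. It assumes a made-up list of (subject, predicate, object) triples and a made-up salary threshold; it illustrates the generative and restrictive roles only and is not taken from any SquishQL implementation.

```python
# Hypothetical data: (subject, predicate, object) triples.
TRIPLES = [
    ("job1", "salary", 100000),
    ("job2", "salary", 150000),
    ("job3", "salary", 90000),
    ("job1", "title", "Developer"),
]

# WHERE step (generative): every triple that fits the pattern creates a binding.
bindings = [
    {"?job": s, "?salary": o}
    for (s, p, o) in TRIPLES
    if p == "salary"
]

# AND step (restrictive): a Boolean test removes bindings and never adds any.
filtered = [b for b in bindings if b["?salary"] >= 100000]
```

The filter can only shrink the set produced by the pattern, which is why the two are kept in separate clauses.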

In SQL, a database is a closed world: the FROM clause identifies the tables in the database, and the WHERE clause identifies constraints and can be extended with AND. By analogy, the Semantic Web is the database and the FROM clause identifies the RDF models. Variables are introduced by a leading "?" and URIs are quoted with "<>". Unquoted URIs can be used where there is no ambiguity.

These are the main elements of a query:

  • SELECT Clause: Identifies the variables to be returned to the application. If not all the variables are needed by the application, then specifying the required results can reduce the amount of memory needed for the results set as well as providing information to a query optimizer.
  • FROM Clause: The FROM clause specifies the model by URI.
  • WHERE Clause: Specifies the graph pattern as the conjunction of the list of triple patterns.
  • AND Clause: Specifies Boolean expressions over the values of URIs and literals. These can include arithmetic comparisons as well as disjunction and negation.
  • USING Clause: A way to shorten the length of URIs. As SquishQL is likely to be written by people, this mechanism helps make for an easier to understand syntax. This is not a namespace mechanism; instead, it is simply an abbreviation mechanism for long URIs by defining a string prefix.
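Because the USING clause is plain string abbreviation, it can be modelled as a simple lookup and concatenation. The sketch below assumes the "prefix::name" form used in Examples 1 and 2 and takes its prefix URIs from Example 1; the function name expand is hypothetical.

```python
# Prefix table as it would be declared in a USING clause (URIs from Example 1).
PREFIXES = {
    "job": "http://ilrt.org/discovery/2000/11/rss-query/jobvocab.rdf#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "wordnet": "http://xmlns.com/wordnet/1.6/",
}


def expand(term, prefixes):
    """Replace a declared 'prefix::' with its URI string; leave other terms alone."""
    prefix, sep, local = term.partition("::")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return term


full_type = expand("rdf::type", PREFIXES)
```

Variables, full URIs, and undeclared prefixes pass through unchanged, which reflects the point that this is an abbreviation mechanism rather than a namespace mechanism.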

The RDF specification defines the form of containers and of reification. There is no explicit syntax for these in SquishQL. As shown in the examples, this does not affect retrieving data from containers, but the query can become cumbersome. Similarly, with reification, the lack of syntactic support can make expressing some queries awkward.

This is how the contents of an RDF bag can be extracted (example3.txt):

Example 3


SELECT ?y
WHERE (<http://somewhere.com/aBag>, ?x, ?y)
AND ! ( ?x eq <rsyn:type> && ?y eq <rsyn:Bag>)
USING
  rsyn FOR http://www.w3.org/1999/02/22-rdf-syntax-ns#
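To see what this query selects, here is a rough Python equivalent run over a toy graph. It assumes the bag's members are attached with the standard RDF membership properties (rdf:_1, rdf:_2, and so on); the member values and the function name bag_members are invented for the illustration.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BAG = "http://somewhere.com/aBag"

# Toy graph as (subject, predicate, object); member values are made up.
TRIPLES = [
    (BAG, RDF + "type", RDF + "Bag"),
    (BAG, RDF + "_1", "apple"),
    (BAG, RDF + "_2", "banana"),
    ("http://somewhere.com/other", RDF + "_1", "cherry"),
]


def bag_members(bag, triples):
    """Return ?y for every (bag, ?x, ?y) except the statement typing it as a Bag."""
    return [
        o
        for (s, p, o) in triples
        if s == bag and not (p == RDF + "type" and o == RDF + "Bag")
    ]


members = bag_members(BAG, TRIPLES)
```

The WHERE pattern matches every statement about the bag, and the negated AND expression then discards the single rdf:type statement, leaving only the members.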

How it works with Inkling

Inkling is a Java implementation of SquishQL created to be API and database-independent for testing the usefulness of SquishQL for comparatively small-scale projects. The aim was to have a query engine that could be used with almost any RDF database implementation written in Java, and which could be used for experimenting with the SquishQL query language.

For Inkling to be able to talk to an RDF database or service, the service just has to implement an extremely basic interface consisting of a single method. This method is a three-place search method:

queryDatabase(subject, predicate, object)

Any argument can be null, in which case it matches anything. This was the lowest common denominator of the methods supported by the different APIs that were examined.
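A three-place search of this kind is easy to provide over almost any store. The following Python sketch shows the idea with None playing the role of null; Inkling itself is written in Java, and the class name TinyStore, the method name query_database, and the sample triples are all hypothetical.

```python
class TinyStore:
    """An in-memory triple store exposing a single three-place search method."""

    def __init__(self, triples):
        self.triples = list(triples)

    def query_database(self, subject=None, predicate=None, obj=None):
        """Return triples matching the given terms; None matches any value."""
        wanted = (subject, predicate, obj)
        return [
            t
            for t in self.triples
            if all(w is None or w == v for w, v in zip(wanted, t))
        ]


store = TinyStore([
    ("job1", "salary", "100000"),
    ("job1", "currency", "USD"),
    ("job2", "salary", "150000"),
])

salaries = store.query_database(predicate="salary")
```

A query engine can evaluate each triple pattern by passing its constant terms to this method and leaving the variable positions as None.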

Inkling also uses the JDBC interfaces to make SquishQL queries. This keeps the implementation fairly independent of the database being searched, and it means that Java programmers will already be familiar with the way queries are issued and results are read.

The second implementation of SquishQL was RDQL, part of the Jena RDF toolkit, which combines query with manipulation of the RDF graph at a fine-grained level through the Jena RDF API. RDQL is now obsolete and replaced by SPARQL. The third implementation of SquishQL is RDFStore, which implements SquishQL to query RDF repositories directly from the Perl language.

Starter pack

It's already evident that SquishQL will not be the mainstream RDF query language in the industry. However, thanks to its simplicity, it can be used as a starter pack for an RDF beginner.
