SquishQL: The simplest RDF query language
Takeaway: SquishQL is the simplest query language you can use for accessing RDF data. Using several code examples, this document shows you how SquishQL works.
In the introductory article about SPARQL, I mentioned that there are several query languages being developed for accessing RDF data. RDF is a key standard for defining and storing metadata in a Web 2.0 environment. However, most of these query languages are incomplete and fairly complex. People often need something simple to start their trek along the learning curve. Accessing RDF data is simple when you use SquishQL, an RDF query language with SQL-like notation.
The name Squish can be read as "SQL-ish": the query syntax is designed to resemble the basic structure of SQL. We ask a database for possible values for a selection of variables, given some constraining expression. In Squish, this constraining expression can be thought of as a list of RDF statements in which some parts of each statement have missing values (indicated by '?' variables in place of URIs or string literals).
SquishQL is not the only RDF query language with SQL-based syntax. The Jena RDF framework had a similar query language called RDQL, but later moved to SPARQL. Both SquishQL and RDQL were based on R.V. Guha's rdfDB query language.
As for the future of SquishQL, it seems unlikely to become an industry standard, since the more powerful and complex SPARQL is supported by the modern full-featured Java RDF frameworks (have a look at Jena, Joseki, or Sesame). However, it remains the simplest query language for beginners in the RDF query world, much like Pascal in the world of programming languages.
Note: All of the examples contained in this article are available in text format from the download version.
An initial look
This is an example of a typical SquishQL query (example1.txt):
Example 1
SELECT ?item, ?job, ?orghome, ?salary, ?currency
WHERE (job::advertises ?item ?job)
(rdf::type ?job wordnet::Job)
(job::salary ?job ?salary)
(job::currency ?job ?currency)
(job::orgHomepage ?job ?orghome)
USING job FOR http://ilrt.org/discovery/2000/11/rss-query/jobvocab.rdf#
rdf FOR http://www.w3.org/1999/02/22-rdf-syntax-ns#
wordnet FOR http://xmlns.com/wordnet/1.6/
The answer to queries like this can be represented as a tabular result set, where columns correspond to the variables in the query ("?salary" etc.), and rows correspond to states of affairs represented in the RDF data in which the variables match values from the dataset. This is very similar to the ODBC/JDBC model familiar from the relational database world. In addition, the result set can be viewed as another RDF dataset, i.e. the data graph made up of all the nodes and arcs involved in answering the query.
For each row in the result set, there will be a concrete value given for each named variable such as "?item" that is specified in the SELECT clause. The variable itself is a placeholder rather than a specific Web resource. Some of the properties of the resource the variable is 'standing in for' are specified by the constraints in the WHERE clause.
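To make the idea of variables being bound row by row more concrete, here is a minimal Python sketch of how a list of triple patterns could be matched against a small in-memory dataset. It is not how any SquishQL engine is actually implemented. The sample triples, the blank-node labels such as "_:job1", and the function names are all assumptions made for illustration, and the patterns are stored in subject, predicate, object order even though Example 1 writes the predicate first.

```python
# Hypothetical data: (subject, predicate, object) triples loosely modelled on Example 1.
JOB = "http://ilrt.org/discovery/2000/11/rss-query/jobvocab.rdf#"

TRIPLES = [
    ("job1.html", JOB + "advertises", "_:job1"),
    ("_:job1", JOB + "salary", "100000"),
    ("_:job1", JOB + "currency", "USD"),
    ("job2.html", JOB + "advertises", "_:job2"),
    ("_:job2", JOB + "salary", "150000"),
    ("_:job2", JOB + "currency", "EUR"),
]


def is_var(term):
    """Variables are written with a leading '?'."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new_binding = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            # Bind the variable, or check it against an earlier binding.
            if new_binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new_binding


def evaluate(patterns, triples):
    """Treat the patterns as a conjunction: every pattern must match."""
    bindings = [{}]  # start with a single empty binding
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                candidate = match_pattern(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return bindings


def select(variables, patterns, triples):
    """Project each complete binding onto the selected variables."""
    return [tuple(b[v] for v in variables) for b in evaluate(patterns, triples)]


patterns = [
    ("?item", JOB + "advertises", "?job"),
    ("?job", JOB + "salary", "?salary"),
    ("?job", JOB + "currency", "?currency"),
]
rows = select(["?item", "?salary", "?currency"], patterns, TRIPLES)
for row in rows:
    print(row)
```

Each tuple in rows plays the role of one row of the result set, with one concrete value per selected variable.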
Here is another simple query. First we present the SQL-ish query, followed by a prose translation (example2.txt):
Example 2
SELECT ?x, ?t, ?c, ?o
WHERE (dc::title ?x, ?t)
(dc::creator ?x, ?c)
(eg::homePage ?c http://purl.org/net/eric/)
(eg::worksFor ?c, ?o)
USING dc FOR http://purl.org/dc/1.1/
eg FOR http://example.com/vocab/foaf/
This is what we're trying to say with this query:
"find me the dc title (we'll call it 't') of any resource (we'll call it 'x') that has a dc creator 'c' with a homepage 'http://purl.org/net/eric/', and tell me who they work for ('o')".
The answer is (just as in the SQL world) a table, with columns corresponding to the things we asked for, i.e. 't', 'x', 'c', 'o'. Each row supplies one set of values from the database that match the constraints in the 'WHERE' clause of the query. Here's a tabular representation of a possible result set for Example 1 (results1.txt):
Results1
item      job              orghome                 salary currency
--------- ---------------- ----------------------- ------ --------
job1.html job 1 title here http://www.ukoln.ac.uk/ 100000 USD
job2.html job 2 title here http://ilrt.org/        150000 EUR
How it works
In SquishQL, there are two classes of constraints: patterns and filter expressions. Patterns are generative, i.e. they create bindings, and filters are restrictive, i.e. they remove possibilities. SquishQL separates these into the WHERE clause (generative) and the AND clause (restrictive). Some query systems have followed the tradition of putting the predicate first, and Examples 1 and 2 above are written that way. The other convention mimics the N-Triples syntax and specifies triple patterns as subject-predicate-object, as in Example 3 below.
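The split between generative patterns and restrictive filters can be sketched in a few lines of Python. This is only an illustration: the bindings below are hypothetical values that a WHERE clause might have produced, and the filter is an ordinary Python function standing in for an AND expression, not a parsed SquishQL expression.

```python
# Hypothetical bindings, as a WHERE clause might have generated them.
bindings = [
    {"?item": "job1.html", "?salary": 100000, "?currency": "USD"},
    {"?item": "job2.html", "?salary": 150000, "?currency": "EUR"},
]


def apply_filter(rows, predicate):
    """Keep only the rows for which the Boolean filter expression holds."""
    return [row for row in rows if predicate(row)]


# An illustrative restriction: salary above 120000 and currency not USD.
filtered = apply_filter(
    bindings,
    lambda row: row["?salary"] > 120000 and not row["?currency"] == "USD",
)
print(filtered)
```

The filter never adds a row or a variable; it can only discard bindings that the patterns have already produced.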
In SQL, a database is a closed world: the FROM clause identifies the tables in the database, and the WHERE clause identifies constraints and can be extended with AND. By analogy, the Semantic Web is the database and the FROM clause identifies the RDF models. Variables are introduced by a leading "?" and URIs are quoted with "<>"; unquoted URIs can be used where there is no ambiguity.
These are the main elements of a query:
- SELECT Clause: Identifies the variables to be returned to the application. If the application does not need all the variables, specifying only the required ones can reduce the memory needed for the result set and gives useful information to a query optimizer.
- FROM Clause: The FROM clause specifies the model by URI.
- WHERE Clause: Specifies the graph pattern as the conjunction of the list of triple patterns.
- AND Clause: Specifies Boolean expressions over the values of URIs and literals. These can include arithmetic comparisons as well as disjunction and negation.
- USING Clause: A way to shorten long URIs. Because SquishQL queries are likely to be written by people, this mechanism makes the syntax easier to read. It is not a namespace mechanism; it simply abbreviates long URIs by defining a string prefix.
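Since the USING clause is described as plain string abbreviation, its effect can be sketched as simple prefix substitution. The Python function below is a hypothetical helper, not part of any SquishQL implementation, and it only handles the "prefix::local" form used in Examples 1 and 2.

```python
# Prefix table corresponding to the USING clause of Example 1.
PREFIXES = {
    "job": "http://ilrt.org/discovery/2000/11/rss-query/jobvocab.rdf#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "wordnet": "http://xmlns.com/wordnet/1.6/",
}


def expand(term, prefixes):
    """Expand 'prefix::local' to a full URI; leave any other term unchanged."""
    prefix, sep, local = term.partition("::")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return term


print(expand("job::salary", PREFIXES))
print(expand("?job", PREFIXES))
```

Variables and full URIs pass through untouched, which matches the idea that the clause is an abbreviation aid rather than a namespace mechanism.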
The RDF specification defines the form of containers and of reification. There is no explicit syntax for these in SquishQL. As Example 3 below shows, this does not prevent retrieving data from containers, but the query can become cumbersome. Similarly, with reification, the lack of syntactic support can make some queries awkward to express.
This is how the contents of an RDF bag can be extracted (example3.txt):
Example 3
SELECT ?y
WHERE (<http://somewhere.com/aBag>, ?x, ?y)
AND ! ( ?x eq <rsyn:type> && ?y eq <rsyn:Bag>)
USING
rsyn FOR http://www.w3.org/1999/02/22-rdf-syntax-ns#
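The logic of Example 3 can be sketched in Python as follows: take every statement whose subject is the bag, then drop the one statement that merely says the resource is an rdf:Bag. The sample triples are hypothetical, and the function name bag_members is my own.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BAG = "http://somewhere.com/aBag"

# Hypothetical triples describing a bag with two members.
TRIPLES = [
    (BAG, RDF + "type", RDF + "Bag"),
    (BAG, RDF + "_1", "first member"),
    (BAG, RDF + "_2", "second member"),
    ("http://somewhere.com/other", RDF + "_1", "not in the bag"),
]


def bag_members(triples, bag):
    """Return the objects of all statements about `bag`, except its type."""
    members = []
    for subject, predicate, obj in triples:
        if subject != bag:
            continue  # WHERE: the subject must be the bag itself
        if predicate == RDF + "type" and obj == RDF + "Bag":
            continue  # AND: exclude the statement that types the bag
        members.append(obj)
    return members


print(bag_members(TRIPLES, BAG))
```

Note that, like the query, this sketch returns the object of any other statement about the bag, not only those using membership properties such as rdf:_1.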
How it works with Inkling
Inkling is a Java implementation of SquishQL, designed to be independent of any particular API or database so that the usefulness of SquishQL could be tested on comparatively small-scale projects. The aim was to have a query engine that could be used with almost any RDF database implementation written in Java, and which could be used for experimenting with the SquishQL query language.
For Inkling to be able to talk to an RDF database or service, the service just has to implement an extremely basic interface consisting of a single method. This method is a three-place search method:
queryDatabase(subject, predicate, object)
Any argument can be null. This was the lowest common denominator of the methods supported by the different APIs that were examined.
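A three-place search of this kind is easy to sketch. The Python version below is not Inkling's Java interface; it only mirrors the idea, with None playing the role of null as a wildcard. The extra triples parameter and the sample data are assumptions made so that the example is self-contained.

```python
def query_database(triples, subject=None, predicate=None, obj=None):
    """Return every triple that matches; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Hypothetical data for illustration.
TRIPLES = [
    ("doc1", "dc:title", "First document"),
    ("doc1", "dc:creator", "person1"),
    ("doc2", "dc:creator", "person1"),
]

print(query_database(TRIPLES, predicate="dc:creator"))
print(query_database(TRIPLES, subject="doc1", predicate="dc:title"))
```

A query engine can answer each triple pattern by calling such a method with the known terms filled in and the variables left as wildcards.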
Inkling also uses the JDBC interfaces to make SquishQL queries. This keeps the implementation fairly independent of the database being searched, and it means that Java programmers will already be familiar with the way queries are issued and results are read.
The second implementation of SquishQL was RDQL, part of the Jena RDF toolkit, which combines query with manipulation of the RDF graph at a fine-grained level through the Jena RDF API. RDQL is now obsolete and replaced by SPARQL. The third implementation of SquishQL is RDFStore, which implements SquishQL to query RDF repositories directly from the Perl language.
Starter pack
It's already evident that SquishQL will not be the mainstream RDF query language in the industry. However, thanks to its simplicity, it can serve as a starter pack for an RDF beginner.

