rdfDB Query Language

rdfDB uses a high-level, SQL-like query language. The data is modelled as a directed labelled graph (RDF). Nodes in the graph can be:
  1. Resources : Every Resource is identified by a URI (e.g., foo, http://dmoz.org/#Top). Resources are written as URIs. The Resource whose URI is mailto:guha@guha.com is referred to (in the query language) as mailto:guha@guha.com.
  2. Integers. Integers are written as such (e.g., 42, 9, 18).
  3. Strings : Strings are UTF-8, enclosed in single quotes ('), e.g., 'foo', 'foo bar'.
Other datatypes such as floats and dates are coming soon.

All operations revolve around the concept of a "triple". A triple is intended to model the concept of an object with a property value. It consists of an arc label (the property), an object (the source node) and a property value (the target node).

The triple is written using the predicate logic syntax : (<arc-label> <object> <property-value>).
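
To make the notation concrete, here is a minimal Python sketch that renders a triple in the syntax above. The helper names (format_node, format_triple) are hypothetical and not part of rdfDB. The sketch also assumes that string literals contain no single quote, since the escaping rule is not described here.

```python
from typing import Union

Node = Union[str, int]


def format_node(value: Node, is_string: bool = False) -> str:
    """Render one node: a resource URI, an integer, or a quoted string."""
    if isinstance(value, bool):
        raise TypeError("booleans are not rdfDB nodes")
    if isinstance(value, int):
        return str(value)  # integers are written as such
    if is_string:
        if "'" in value:
            # Assumption: escaping is unspecified, so this sketch refuses.
            raise ValueError("single quotes inside strings are not handled")
        return "'" + value + "'"
    return value  # resources are written as their URI


def format_triple(arc_label: str, obj: str, value: Node,
                  value_is_string: bool = False) -> str:
    """Render (<arc-label> <object> <property-value>)."""
    return "(%s %s %s)" % (arc_label, obj, format_node(value, value_is_string))


print(format_triple("name", "DanB", "Dan Brickley", value_is_string=True))
```

The value_is_string flag exists because a resource such as foo and a string such as 'foo' are both Python strings; only the caller knows which one is meant.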

A collection of triples forms a database. There are no constraints on the set of triples that constitutes the database. (Some other RDF implementations refer to the concept of a database as a "model".)

Database operations are divided into the following categories: database creation, loading files, namespace commands, inserts and deletes, and queries.

Database Creation

Result Codes :
  1. 0 : success
  2. -10 : database could not be deleted. The most likely cause is incorrect file permissions. Make sure that rdfDB is allowed to write into the directory RDFDB_DIR. There is no return value.

Loading Files

rdfDB is designed to act as a cache for RDF, RSS, edge-labelled XML and other data out on the network. To facilitate this, it can load the contents of a URL (one that points to an RDF, RSS ... file) into the database.

Result Codes:
  1. 0 : success
  2. -2 : syntax error
  3. -5 : database does not exist
  4. -6 : could not access the url
  5. -9 : unknown file format

Namespace Commands

RDF vocabularies may come from different namespaces. When parsing XML files, rdfDB creates URIs by concatenating the namespace URI (of an element's namespace) with the character '#' and the element name. So, if we have
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dmoz="http://dmoz.org/rdf">

<rdf:Description rdf:about="http://dmoz.org/Auto/CarSeats">
  <dmoz:relatedTo rdf:resource="http://dmoz.org/Children/Safety"/>
</rdf:Description>
</rdf:RDF>

The triple that is added to the database is
(http://dmoz.org/rdf#relatedTo  http://dmoz.org/Auto/CarSeats  http://dmoz.org/Children/Safety)
In order to simplify the statement of queries, one can set a namespace prefix to correspond to a namespace URI.

Result Codes:
  1. 0 : success
  2. -2 : syntax error
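
The URI construction rule described in this section can be sketched in a few lines of Python. The function name expand_qname and the prefix table are hypothetical. The namespace URI http://dmoz.org/rdf is inferred from the resulting triple shown above, not taken from a declaration.

```python
def expand_qname(qname: str, namespaces: dict) -> str:
    """Expand 'prefix:name' to <namespace URI> + '#' + <element name>."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in namespaces:
        raise KeyError("undeclared namespace prefix in %r" % qname)
    # The rule from the text: namespace URI, then '#', then the element name.
    return namespaces[prefix] + "#" + local


# Assumed prefix table for the dmoz example.
NAMESPACES = {"dmoz": "http://dmoz.org/rdf"}

print(expand_qname("dmoz:relatedTo", NAMESPACES))
```

This follows the concatenation rule literally. How rdfDB treats a namespace URI that already ends in '#' is not covered here.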

Inserts & Deletes

The following two commands are used to add and remove triples.
  1. insert into [database_name] (arc1 source1 target1), (arc2 source2 target2)...
    e.g., insert into dmoz (narrow http://dmoz.org/Top FlyingPizzas)</>

  2. delete from [database_name] (arc1 source1 target1), (arc2 source2 target2)...
    e.g., delete from foo (narrow http://dmoz.org/Top FlyingPizzas)</>
Result Codes :
  1. 0 : success
  2. -2 : general syntax error
  3. -3 : malformed literal
  4. -5 : database does not exist
  5. -10 : wrong file permissions (could not open DB)
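
When commands are generated by a program, the two statements above can be assembled as in the following sketch. build_update is a hypothetical helper. It expects each triple as three already formatted terms and appends the "</>" terminator used in the examples.

```python
TERMINATOR = "</>"


def build_update(command: str, database: str, triples: list) -> str:
    """Build an 'insert into' or 'delete from' statement."""
    if command not in ("insert into", "delete from"):
        raise ValueError("command must be 'insert into' or 'delete from'")
    if not triples:
        raise ValueError("at least one triple is required")
    for triple in triples:
        if len(triple) != 3:
            raise ValueError("a triple needs an arc, a source and a target")
    # Triples are comma separated, each written as (arc source target).
    body = ", ".join("(%s %s %s)" % tuple(t) for t in triples)
    return "%s %s %s %s" % (command, database, body, TERMINATOR)


print(build_update("insert into", "dmoz",
                   [("narrow", "http://dmoz.org/Top", "FlyingPizzas")]))
```

The same helper produces the delete form by passing "delete from" as the command.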


Queries

There is one query command, which has the syntax : select [variable1, variable2, ...] from [database] where [constraint1, constraint2, ...] </>. It returns a set of variable bindings for [variable1, variable2, ...] such that the triples in [database] satisfy [constraint1, constraint2, ...] under those variable substitutions.

Variables are syntactically designated by symbols starting with the character '?'. e.g., ?name, ?foo.

A constraint is of the form (arc-label source target), where any one or more of arc-label, source or target can be a variable or a resource. The target can also be an integer or a string. The same variable can appear in multiple constraints.

e.g., select ?x ?y from dmoz where (title ?x ?y), (createdBy ?x RichSkrenta), (type ?x Topic)
This lists the IDs and titles of all objects of type Topic created by RichSkrenta.
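
A query like the one above can be assembled programmatically. The sketch below is illustrative only: build_select is a hypothetical helper, variables are separated by spaces as in the example, and the check that every selected variable occurs in some constraint is my reading of the "unconstrained variable" result code, not a documented rule.

```python
def build_select(variables: list, database: str, constraints: list) -> str:
    """Build a select statement from variables and (arc, source, target) constraints."""
    # Collect every ?variable that occurs in at least one constraint.
    used = {str(term) for c in constraints for term in c
            if str(term).startswith("?")}
    missing = [v for v in variables if v not in used]
    if missing:
        raise ValueError("unconstrained variable(s): " + ", ".join(missing))
    where = ", ".join("(%s %s %s)" % tuple(c) for c in constraints)
    return "select %s from %s where %s </>" % (" ".join(variables), database, where)


print(build_select(
    ["?x", "?y"], "dmoz",
    [("title", "?x", "?y"), ("createdBy", "?x", "RichSkrenta"), ("type", "?x", "Topic")],
))
```

Catching the problem on the client side saves a round trip, but the server remains the authority on what counts as a valid query.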

The query can optionally specify the output format by appending output [output-format] to the query. The supported output formats are "tab-limited" and "variable-list". I hope to add "javascript" and "rdf-xml" as supported output formats. The default is "variable-list".

Result structure : The result contains zero or more lines of answers followed by the result code line. The syntax of the answer line depends on the chosen output format. In the case of the "variable-list" format (the default), there is one line per variable binding set, of the form seen in the sample session (e.g., ?x = DanC ?y = 'Dan Connolly').
In the case of "tab-limited", there is one line per variable binding set, where the order of the values is in the order of the variables in the query.
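
For the default "variable-list" format, an answer line can be turned into a dictionary as sketched below. The line shape is inferred from the sample session (lines such as ?x = DanC ?y = 'Dan Connolly'), and parse_binding_line is a hypothetical name. The sketch assumes that quoted strings contain no single quote.

```python
import re

# One binding: ?name = value, where value is a quoted string or a bare token.
_BINDING = re.compile(r"(\?\w+)\s*=\s*('[^']*'|\S+)")


def parse_binding_line(line: str) -> dict:
    """Return {variable: value} for one answer line."""
    return {name: value for name, value in _BINDING.findall(line)}


print(parse_binding_line("?x = DanB ?y = 'Dan Brickley'"))
```

Values keep their quotes, so a caller can still tell the string 'foo' from the resource foo.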

Result Codes:

  1. 0 : success
  2. -2 : syntax error
  3. -3 : malformed literal error
  4. -4 : general error
  5. -5 : database does not exist
  6. -6 : could not access data
  7. -8 : unconstrained variable

Sample Session

This is a simple sample session with rdfDB. Queries are terminated with "</>". The query returns any answers (as applicable) and an error code. The error code 0 is returned for successful operations.
telnet 7001
Connected to govinda.guha.meer.net (
Escape character is '^]'.
create database test1 </>
0 </>
insert into test1 (type DanB Person), (name DanB 'Dan Brickley') </>
0 </>
insert into test1 (worksFor DanB W3C)  (worksFor DanC W3C) </>
0 </>
insert into test1 (name DanC 'Dan Connolly') </>
0 </>
select ?x from test1 where (worksFor ?x W3C) (name ?x ?y) </>
?x = DanC ?y = 'Dan Connolly'
?x = DanB ?y = 'Dan Brickley'
0 </>
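
The same exchange can be driven from a program instead of telnet. The following is a rough sketch under several assumptions: the server speaks plain text over TCP, each response ends with a result code line followed by "</>" as in the session above, and the function names are hypothetical. The host name is a placeholder, and the port 7001 is taken from the telnet line.

```python
import socket

TERMINATOR = "</>"


def split_response(text: str):
    """Split a raw response into (answer_lines, result_code)."""
    body = text.strip()
    if not body.endswith(TERMINATOR):
        raise ValueError("response is not terminated with </>")
    lines = body[:-len(TERMINATOR)].strip().splitlines()
    if not lines:
        raise ValueError("response has no result code line")
    code = int(lines[-1].strip())  # the last line carries the result code
    answers = [line.strip() for line in lines[:-1] if line.strip()]
    return answers, code


def run_query(host: str, port: int, query: str):
    """Send one terminated query and read until the response terminator."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall((query.rstrip() + "\n").encode("utf-8"))
        data = b""
        while not data.rstrip().endswith(TERMINATOR.encode("utf-8")):
            chunk = conn.recv(4096)
            if not chunk:  # server closed the connection
                break
            data += chunk
    return split_response(data.decode("utf-8"))


# Hypothetical usage against a running server:
# answers, code = run_query("localhost", 7001, "create database test1 </>")
```

Reading until the terminator, and not for a fixed number of bytes, keeps the client correct when a result spans several TCP segments.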

Here is a simple www front end browser for the dmoz hierarchy based on rdfDB DBI. Here is the code behind that.

Result Codes

Here is the complete list of result codes.
  1. 0 : success
  2. -1 : unknown query type
  3. -2 : general syntax error
  4. -3 : malformed literal error
  5. -4 : misc
  6. -5 : database does not exist
  7. -6 : could not access data
  8. -7 : unauthorized access
  9. -8 : unconstrained variable
  10. -9 : unknown file format
  11. -10 : wrong file permissions
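
For client code, the list above maps naturally onto a lookup table. This is only a convenience sketch. The names RESULT_CODES and describe_result are hypothetical, and the messages are copied from the list.

```python
RESULT_CODES = {
    0: "success",
    -1: "unknown query type",
    -2: "general syntax error",
    -3: "malformed literal error",
    -4: "misc",
    -5: "database does not exist",
    -6: "could not access data",
    -7: "unauthorized access",
    -8: "unconstrained variable",
    -9: "unknown file format",
    -10: "wrong file permissions",
}


def describe_result(code: int) -> str:
    """Return the documented message for a result code."""
    return RESULT_CODES.get(code, "unrecognized result code")


print(describe_result(-5))
```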