Ruby and the Semantic Web
This evening, I gave a talk on using Ruby RDF.rb and assorted gems at the Lotico San Francisco Semantic Meetup. I’ve uploaded the slides to SlideShare.
I also showed a simple demo using the GitHub API to create FOAF and DOAP records for accounts and repositories, and to do some simple navigation. The demo is running at http://greggkellogg.net/github-lod, and source is (of course) available on GitHub.
The demo is not intended to be a complete application, but it shows some basic capabilities of Ruby LinkedData (http://rubygems.org/gems/linkeddata) for generating RDF in a variety of formats from Active Record models (which cache the GitHub API calls). The web pages are, of course, marked up with RDFa, and you can use content negotiation, or append an appropriate extension to the URLs, to retrieve the data in alternative RDF formats.
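As a rough sketch of the two retrieval styles mentioned above, the following builds (but does not send) a request that asks for Turtle through the Accept header, and derives the extension-based URL. The helper names rdf_request and with_extension are made up for illustration, and the "ttl" extension is an assumption about what the demo accepts.

```ruby
require "net/http"
require "uri"

# Build (but do not send) a GET request asking for a specific RDF media type.
# Hypothetical helper, not part of any gem.
def rdf_request(url, media_type)
  request = Net::HTTP::Get.new(URI(url))
  request["Accept"] = media_type
  request
end

# Alternative: ask for a format by appending a file extension to the URL.
# Which extensions the demo honours is an assumption here.
def with_extension(url, extension)
  "#{url.chomp('/')}.#{extension}"
end

demo_url   = "http://greggkellogg.net/github-lod"
request    = rdf_request(demo_url, "text/turtle")
turtle_url = with_extension(demo_url, "ttl")

# Sending the request would look something like:
#   Net::HTTP.start(request.uri.host, request.uri.port) { |http| http.request(request) }
```

Either route should lead to the same data; content negotiation keeps a single URL per resource, while the extension form is easier to paste into a browser.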
Sea of Cortez
I just returned from a week on the Rocio del Mar diving the northern Sea of Cortez (aka the Gulf of California). We dove along the Midriff Islands and had great encounters with Sperm Whales and Whale Sharks, not to mention the often over-exuberant Sea Lions.
Check out the photos.
SPARQL 1.0 for Ruby
I’ve just released version 0.0.2 of the Ruby sparql gem. This version is based on earlier work by Pius and Arto and incorporates work from SPARQL Grammar and SPARQL Algebra. Further documentation is available here.
This gem integrates with RDF.rb and uses rdf-xsd to provide additional literal semantics.
Why release SPARQL for Ruby? Probably not because of the killer performance, at least right now. However, I believe it’s important that Ruby have a complete tool chain for manipulating Linked Data (including RDF and SPARQL), and this was the remaining piece.
In spite of the 0.0.2 release number, it is a fully functioning implementation of SPARQL 1.0 semantics and passes all the DAWG data-r2 test cases. The gem makes use of RDF::Query to perform basic BGP operations on RDF::Queryable objects (such as RDF::Repository). The gem has some support for query optimization, but this remains largely unimplemented and will be addressed in future releases. I’d also like to support SPARQL 1.1 queries and updates at some point.
This is a pure Ruby implementation and does not directly rely on any native libraries (although, some RDF readers such as RDFa and RDF/XML presently do).
The basic strategy is to parse SPARQL and transform it into an S-Expression-based algebra, pretty close to that used by Jena ARQ (SPARQL S-Expressions, or SSE). This allows SSE to be used directly for performing queries, or to parse SPARQL grammar to SSE.
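To make the idea of an S-Expression-based algebra a little more concrete, here is a toy evaluator for a single operator, a basic graph pattern (BGP). It is a sketch only: nested Ruby arrays stand in for S-Expressions, symbols stand in for variables, and the names match_pattern and evaluate are invented for this example. It is not the sparql gem's API, nor how RDF::Query is implemented.

```ruby
# Toy data: each triple is [subject, predicate, object].
TRIPLES = [
  ["ex:alice", "foaf:knows", "ex:bob"],
  ["ex:bob",   "foaf:knows", "ex:carol"],
  ["ex:alice", "foaf:name",  "Alice"],
].freeze

# Match one pattern against one triple, extending the bindings in `solution`.
# Symbols are variables; anything else must match exactly.
# Returns the extended bindings, or nil when the triple does not match.
def match_pattern(pattern, triple, solution)
  bindings = solution.dup
  pattern.zip(triple).each do |term, value|
    if term.is_a?(Symbol)
      return nil if bindings.key?(term) && bindings[term] != value
      bindings[term] = value
    elsif term != value
      return nil
    end
  end
  bindings
end

# Evaluate an expression of the form [:bgp, [:triple, s, p, o], ...].
# Each pattern narrows (or extends) the solutions produced by the previous one.
def evaluate(sse, triples)
  operator, *operands = sse
  raise ArgumentError, "unsupported operator #{operator}" unless operator == :bgp

  operands.reduce([{}]) do |solutions, (_tag, *pattern)|
    solutions.flat_map do |solution|
      triples.map { |triple| match_pattern(pattern, triple, solution) }.compact
    end
  end
end

# "Who knows someone who knows someone else?"
query = [:bgp,
         [:triple, :a, "foaf:knows", :b],
         [:triple, :b, "foaf:knows", :c]]
solutions = evaluate(query, TRIPLES)
```

Representing the query as data is what makes it possible to skip the SPARQL grammar entirely and hand an algebra expression straight to the evaluator.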
The linkeddata gem has also been updated to have a soft reference to SPARQL, in addition to new processors for RDF::Turtle, JSON::LD, and RDF::Microdata.
The gem is tested on Ruby 1.8.7, 1.9.2 and JRuby. (JRuby has some spec issues, probably due to Nokogiri differences.)
Many thanks to Pius Uzamere for helping to make this release happen, and to Arto Bendiken for the work in RDF.rb, SPARQL::Algebra and SPARQL::Grammar that preceded this.
RDF.rb 0.3.4 released
After several months of gathering updates for RDF.rb, we’ve released version 0.3.4 with several new features:
- Update to BGP query model to support SPARQL semantics,
- Expandable Literal support, to allow further implementation of XSD datatypes outside of RDF.rb (see RDF::XSD gem),
- More advanced content type detection to allow better selection of the appropriate reader from those available on the client. (Includes selecting among HTML types, such as Microdata and RDFa)
- Improved CLI with the rdf executable providing access to all loaded readers and writers for cross-language serialization and deserialization.
As an example of format detection, consider the following:
require 'linkeddata'
RDF::Graph.load("http://greggkellogg.net/foaf.ttl")
should load Turtle or N3 readers if installed. This becomes more important for ambiguous file types, such as HTML, which could be either RDFa or Microdata, and application/xml, which could be TriX, RDF/XML or even RDFa.
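The tie-breaking problem for ambiguous types can be sketched as follows. This is a hypothetical illustration, not RDF.rb's actual detection code: the CANDIDATES and SNIFFERS tables, the regular expressions and the detect_format helper are all assumptions made for the example.

```ruby
# Content types mapped to the formats that might be served under them.
# (Illustrative table only; real readers register their own types.)
CANDIDATES = {
  "text/turtle"     => [:turtle],
  "text/html"       => [:rdfa, :microdata],
  "application/xml" => [:rdfxml, :trix, :rdfa],
}.freeze

# Sniffers look at the start of the document to break ties.
SNIFFERS = {
  microdata: ->(sample) { sample.match?(/\bitemscope\b/) },
  rdfa:      ->(sample) { sample.match?(/\b(?:typeof|property|vocab)\s*=/) },
  rdfxml:    ->(sample) { sample.include?("<rdf:RDF") },
  trix:      ->(sample) { sample.include?("<TriX") },
  turtle:    ->(_sample) { true },
}.freeze

# Pick a format for a content type, sniffing the sample only when the
# content type alone is ambiguous. Returns nil when nothing fits.
def detect_format(content_type, sample)
  candidates = CANDIDATES.fetch(content_type, [])
  return candidates.first if candidates.size <= 1
  candidates.find { |format| SNIFFERS.fetch(format).call(sample) }
end

microdata_html = '<div itemscope itemtype="http://schema.org/Person">'
rdfa_html      = '<div vocab="http://schema.org/" typeof="Person">'
html_formats   = [detect_format("text/html", microdata_html),
                  detect_format("text/html", rdfa_html)]
```

The content type narrows the field first, and the document itself is only inspected when more than one reader could plausibly apply.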
See documentation for more specifics on this version of RDF.rb. Note that I’ve attempted to incorporate suggestions for improving the documentation.
Most of the reader/writer gems have been updated to match this release, in particular JSON::LD, RDF::Microdata, RDF::N3, RDF::RDFa, RDF::RDFXML, and RDF::Turtle.
A future update to the linkeddata gem should reference the latest versions of each, but a simple gem update will work too.
There is a slight semantic change for repositories to support SPARQL: a context of false should not match a variable context. This is straight out of SPARQL semantics. Repository implementors who have provided custom implementations of #query_pattern should check behavior against rdf-spec version 0.3.4 to verify correct operation.
Next up is a release of SPARQL implemented in pure Ruby. This gem provides full support for SPARQL 1.0 queries.
RDF::RDFa update with vocabulary expansion, RDF collections and more
I’ve updated RDF::RDFa with updates from recent changes to RDF Core:
- Deprecate explicit use of @profile
- Add rdfa:hasVocabulary when encountering @vocab
- Implemented Reader#expand to perform vocabulary expansion using RDFS rules 5, 7, 9 and 11.
Additionally, experimental support for RDF Collections (lists) has been added, based on RDF Webapps working group Wiki notes.
Remove RDFa Profiles
RDFa Profiles were a mechanism added to allow groups of terms and prefixes to be defined in an external resource and loaded to affect the processing of an RDFa document. This introduced a problem for some implementations needing to perform a cross-origin GET in order to retrieve the profiles. The working group elected to drop support for user-defined RDFa Profiles (the default profiles defined by RDFa Core and host languages still apply) and replace it with an inference regime using vocabularies. Parsing of @profile has been removed from this version.
Vocabulary Expansion
One of the issues with vocabularies was that they discourage re-use of existing vocabularies when terms from several vocabularies are used at the same time. Because it is common (and encouraged) for RDF vocabularies to form sub-class and/or sub-property relationships with well-defined vocabularies, the RDFa vocabulary expansion mechanism takes advantage of this.
As an optional part of RDFa processing, an RDFa processor will perform limited RDFS entailment, specifically rules rdfs5, 7, 9 and 11. For each type or property IRI that is declared as a sub-class or sub-property, this adds the corresponding statement using the super-class or super-property to the output graph.
RDF::RDFa::Reader implements this using the #expand method, which looks for rdfa:hasVocabulary properties within the output graph and performs such expansion. See an example in the usage section.
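The four entailment rules can be sketched in a few lines of plain Ruby. This is only an illustration over arrays of strings: it is not the code behind Reader#expand, it skips the rdfa:hasVocabulary lookup and vocabulary loading entirely, and the rdfs_expand name and the ex: terms are invented for the example.

```ruby
require "set"

SUB_CLASS    = "rdfs:subClassOf"
SUB_PROPERTY = "rdfs:subPropertyOf"
TYPE         = "rdf:type"

# Apply rules rdfs5, rdfs7, rdfs9 and rdfs11 repeatedly until no new
# triples appear. Triples are [subject, predicate, object] arrays.
def rdfs_expand(triples)
  graph = Set.new(triples)
  loop do
    inferred = Set.new
    graph.each do |s, p, o|
      graph.each do |s2, p2, o2|
        # rdfs5 / rdfs11: subPropertyOf and subClassOf are transitive.
        if [SUB_PROPERTY, SUB_CLASS].include?(p) && p2 == p && o == s2
          inferred << [s, p, o2]
        end
        # rdfs7: a statement using a sub-property also holds for the super-property.
        inferred << [s2, o, o2] if p == SUB_PROPERTY && p2 == s
        # rdfs9: an instance of a sub-class is also an instance of the super-class.
        inferred << [s2, TYPE, o] if p == SUB_CLASS && p2 == TYPE && o2 == s
      end
    end
    break if inferred.subset?(graph)
    graph.merge(inferred)
  end
  graph
end

input = [
  ["ex:knowsWell", SUB_PROPERTY, "foaf:knows"],
  ["ex:Diver",     SUB_CLASS,    "foaf:Person"],
  ["foaf:Person",  SUB_CLASS,    "foaf:Agent"],
  ["#me",          "ex:knowsWell", "#you"],
  ["#me",          TYPE,           "ex:Diver"],
]
expanded = rdfs_expand(input)
```

With this input, the expanded graph additionally says that #me foaf:knows #you and that #me is a foaf:Person and a foaf:Agent, which is the kind of statement a consumer that only understands FOAF would be looking for.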
RDF Collections (lists)
One significant RDF feature missing from RDFa was support for ordered collections, or lists. RDF supports this with the special properties rdf:first and rdf:rest and the resource rdf:nil, but other RDF languages have first-class support for this concept. For example, in Turtle, a list can be defined as follows:
[ a schema:MusicPlaylist;
  schema:name "Classic Rock Playlist";
  schema:numTracks 5;
  schema:tracks (
    [ a schema:MusicRecording; schema:name "Sweet Home Alabama"; schema:byArtist "Lynard Skynard"]
    [ a schema:MusicRecording; schema:name "Shook you all Night Long"; schema:byArtist "AC/DC"]
    [ a schema:MusicRecording; schema:name "Sharp Dressed Man"; schema:byArtist "ZZ Top"]
    [ a schema:MusicRecording; schema:name "Old Time Rock and Roll"; schema:byArtist "Bob Seger"]
    [ a schema:MusicRecording; schema:name "Hurt So Good"; schema:byArtist "John Cougar"]
  )
]
defines a playlist with an ordered set of tracks. RDFa adds the @member attribute, which is used to identify values (object or literal) that are to be placed in a list. The same playlist might be defined in RDFa as follows:
<div vocab="http://schema.org/" typeof="MusicPlaylist">
<span property="name">Classic Rock Playlist</span>
<meta property="numTracks" content="5"/>
<div rel="tracks" member="">
<div typeof="MusicRecording">
1.<span property="name">Sweet Home Alabama</span> -
<span property="byArtist">Lynard Skynard</span>
</div>
<div typeof="MusicRecording">
2.<span property="name">Shook you all Night Long</span> -
<span property="byArtist">AC/DC</span>
</div>
<div typeof="MusicRecording">
3.<span property="name">Sharp Dressed Man</span> -
<span property="byArtist">ZZ Top</span>
</div>
<div typeof="MusicRecording">
4.<span property="name">Old Time Rock and Roll</span>
<span property="byArtist">Bob Seger</span>
</div>
<div typeof="MusicRecording">
5.<span property="name">Hurt So Good</span>
<span property="byArtist">John Cougar</span>
</div>
</div>
</div>
This basically does the same thing, but places each track in an rdf:List in the defined order.
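For readers who have not seen what an rdf:List looks like underneath, the sketch below turns an ordered Ruby array into the rdf:first/rdf:rest chain that both the Turtle and the RDFa above stand for. The rdf_list helper and the blank node labels (_:l0, _:l1, ...) are made up for illustration; a real processor generates its own blank nodes.

```ruby
# Turn an ordered array into rdf:first / rdf:rest triples.
# Returns [head, triples], where head is the node to use as the object
# of the property (schema:tracks in the playlist example).
def rdf_list(values)
  return ["rdf:nil", []] if values.empty?

  nodes = values.each_index.map { |i| "_:l#{i}" }
  triples = values.each_with_index.flat_map do |value, i|
    [
      [nodes[i], "rdf:first", value],
      # The last node points at rdf:nil to terminate the list.
      [nodes[i], "rdf:rest", nodes[i + 1] || "rdf:nil"],
    ]
  end
  [nodes.first, triples]
end

head, triples = rdf_list(["Sweet Home Alabama", "Shook you all Night Long"])
```

The order survives because each node points explicitly at the next one; nothing depends on the order in which the triples themselves are stored.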
You can try both these and other RDF gems at the distiller.
RDF::N3 no longer accepts text/turtle or :ttl
With the release of RDF::Turtle, starting with version 0.3.5, RDF::N3 no longer asserts that it is a reader for Turtle. This covers the MIME types text/turtle, application/turtle and application/x-turtle, the .ttl extension, and the :ttl and :turtle formats. Of course, N3 remains reasonably compatible with Turtle, but the recent RDF 1.1 Working Group publication of the Turtle Specification has caused some divergence.
Most notably, in Turtle, the empty prefix (‘:’) is no longer a synonym for <#>. In fact, the empty prefix is no longer defined by default.
RDF::Turtle defines MIME Types text/turtle, text/rdf+turtle, application/turtle and application/x-turtle. The officially submitted MIME Type for Turtle is text/turtle with default content coding of UTF-8.
As usual, you can try both these and other RDF gems at the distiller. At some point, RDF::Turtle will make it into the linkeddata gem.
Channel Islands July/August 2011
Recently returned from a 3-day trip on the Horizon out of San Diego. We had two great days at San Clemente Island, and spent the last day outside San Diego, with two dives on the Yukon, a former Canadian destroyer sunk as an artificial reef and dive site. Unfortunately, it went down early and landed on its side; it's now known as "Milt's Tilt".
Enjoy the photos.
Things people get wrong in RDFa markup
Lately, I’ve been looking at a lot of both RDFa and Microdata formatted HTML. There are a number of things that authors (even experts) regularly get wrong:
@src and @rel attributes create reverse relation
Having code such as the following:
<img rel="image" src="image.jpg" />
...
You’d think that this would indicate that the icon for the document is
<> xhv:image <image.jpg>
but it actually says:
<image.jpg> xhv:image <> .
The why of this is lost in the haze of history, but people regularly get this wrong. To get what you need, consider something like the following markup:
<span rel="image"><img src="image.jpg" /></span>
...
@rel and @typeof and/or @about shouldn’t be on the same element
Another common mistake is markup such as the following:
<div rel="mainContentOfPage" about="#me" typeof="Person">
<p>
Name: <span property="name">Gregg Kellogg</span></p>
<p>
Knows: <a href="http://greggkellogg.net/#me" rel="knows">Myself</a></p>
</div>
Placing @rel and @about or @typeof on the same element means that @about/@typeof identify the subject, not the object, of the relation. To get the desired effect, use @resource (or @href); however, this does not let you set the type of the object resource. Alternatively, use the following type of markup:
<div rel="mainContentOfPage">
<div about="#me" typeof="Person">
<p>
Name: <span property="name">Gregg Kellogg</span></p>
<p>
Knows: <a href="http://greggkellogg.net/#me" rel="knows">Myself</a>
</p>
</div>
</div>
Another area of common misunderstanding is that the document order of statements within an HTML document is not significant when creating a list of resources. Consider the following example from schema.org/MusicPlaylist:
<div itemscope="" itemtype="http://schema.org/MusicPlaylist">
<span itemprop="name">Classic Rock Playlist</span>
<div itemprop="tracks" itemscope="" itemtype="http://schema.org/MusicRecording">
1. <span itemprop="name">Sweet Home Alabama</span> - <span itemprop="byArtist">Lynard Skynard</span>
</div>
<div itemprop="tracks" itemscope="" itemtype="http://schema.org/MusicRecording">
2. <span itemprop="name">Shook you all Night Long</span> - <span itemprop="byArtist">AC/DC</span>
</div>
...
</div>
You would think that this describes a track ordering, but it does not (at least in RDF). Doing this requires RDF List constructs missing from both Microdata and RDFa. In Turtle, you could do it as follows:
@prefix : <http://schema.org/> .
[ a :MusicPlaylist;
:name "Classic Rock Playlist";
:numTracks 5;
:tracks (
[ a :MusicRecording; :name "Sweet Home Alabama"; :byArtist "Lynard Skynard"]
[ a :MusicRecording; :name "Shook you all Night Long"; :byArtist "AC/DC"]
...
)
]
It would seem obvious that an HTML ordered list could be used to generate an RDF List, but the idea failed to attract enough interest to make it through.
These are just a couple of things that are confusing about RDFa, and offer good fodder for Microdata proponents to complain about the complexity of RDFa markup. It’s important to note that a core goal of RDFa 1.1 is to be compatible with RDFa 1.0 (RDFa in XHTML), in which these decisions were established. Perhaps a reconciliation between Microdata and RDFa could take the best of both:
- Craft RDF friendly URIs from terms (such as schema:Person above),
- Reduce amount of document structure needed to describe common use cases,
- Better intuitive generation of RDF output,
- Ability to avoid RDF generation and go straight to JSON (perhaps JSON-LD),
- Use common URI prefixes,
- RDF Lists,
- Promote better HTML readability.
That’s my 2 cents (for now).
Update
The RDFa Working Group recently decided to change the behavior of @src in RDFa Core 1.1 to be the same as @href. This means that
<img rel="image" src="image.jpg" />
...
actually does now generate the following:
<> xhv:image <image.jpg>
Recent updates to Microdata to RDF processing now do place multiple items in a list, but this is subject to further specification.
In RDFa, this can now be done with the @inlist attribute, which places values in an RDF Collection (rdf:List).
<div vocab="http://schema.org/" typeof="MusicPlaylist">
<span property="name">Classic Rock Playlist</span>
<div rel="tracks" inlist="">
1. <div typeof="MusicRecording">
<span property="name">Sweet Home Alabama</span> - <span property="byArtist">Lynard Skynard</span>
</div>
2. <div typeof="MusicRecording">
<span property="name">Shook you all Night Long</span> - <span property="byArtist">AC/DC</span>
</div>
...
</div>
</div>
This now generates the following Turtle:
@prefix : <http://schema.org/> .
[ a :MusicPlaylist;
:name "Classic Rock Playlist";
:tracks (
[ a :MusicRecording; :name "Sweet Home Alabama"; :byArtist "Lynard Skynard"]
[ a :MusicRecording; :name "Shook you all Night Long"; :byArtist "AC/DC"]
...
)
]
Microdata parser for RDF.rb
I've just released version 0.1.0 of the RDF::Microdata gem (rubygems, github) for the RDF.rb suite. This version contains only a reader (parser). Writing is not supported at this point.
Use it pretty much like any other RDF.rb reader:
graph = RDF::Graph.load("etc/foaf.html", :format => :microdata)
Send feedback either to me or to the public-rdf-ruby mailing list.
Point Lobos June 2011
Two great dives at Point Lobos with the Pinnacles crew. First dives with my new Canon T2i/Aquatica setup. Pictures here.