Sunday, September 30, 2007

The semantic web is not just today's web with special sauce

Alex Iskold writes on Read/Write web that the semantic web can be achieved today, to some degree, if we create services that can do some basic semantic processing on extant web content. For example, the Spock search engine is optimised to find information about people and relationships. Iskold's basic point is that some end-user value can be derived from a semantics-driven approach without having every web site owner re-engineer their site to use RDF and OWL. While that's not false, to my mind it starts from a faulty set of assumptions.

A basic, recurring problem is that many people assume that the semantic web equals today's web, with some extra semantic goodness added on top. The WWW plus special sauce. Personally I don't think that's a helpful way to approach it. Today's web is highly optimised for human interaction. This is a good thing, but it does rather limit what we can do with machine processing to assist those human interactions: human brains are very capable of processing vague, ambiguous, sometimes noisy content that relies on social constructs to interpret. We can't do that with machine-based processing yet. Better to ask what else we can offer human users, rather than take the current interaction modalities and fiddle with them.

So if the semantic web is not about tweaking the current, human-facing World Wide Web, why is it called the semantic web at all? I guess Tim Berners-Lee is the person to answer that definitively, since it was his term originally. To my mind, it's all about applying the metaphor of the web to machine-based information processing. To explain: the web brought about a revolution in human information handling thanks to some basic design features:

  • open and distributed were foundational design assumptions
  • simple, resilient protocols that quickly became ubiquitous
  • no central point of failure or control
  • dramatically lower barriers to entry than pre-Internet publishing

There are probably others; my point is not to be definitive, but to draw out some of the features that produced a democratization of information publishing. Anyone can say anything now, and potentially be heard around the planet. Is this uniformly a good thing? No, there are dark corners of the web we might wish were not there. Is this on balance a good thing? Yes. OK, so the web democratized information publishing for humans. What's the relevance to the semantic web? The metaphor is that, just as the web freed human-processed information from newspapers, books and TV shows, so the semantic web aims to free machine-processed information from databases and documents, on a massive distributed scale, with no central point of control, and so on.

Ultimately, though, we produce information systems for people to use, to satisfy some need or desire. So the value of the semantic web, of allowing machines to do some of the information processing legwork, is the extent to which it either helps people do the things they do today more effectively (cheaper, faster, easier, ...) or enables people to do things that they can't do today. The key, it seems to me, is automation. When I'm driving a car, changing from manual transmission to automatic gives me one less task to do, but doesn't fundamentally change my engagement with the task of driving. Whereas an automated highway would let me read the newspaper for part of my journey, even though I'm ostensibly the driver.

If it comes about, the semantic web could be as big a transition as the one from the pre-web world to the web. What's difficult to see, I suppose, is an obvious smooth transition from here to where we want to be. Iskold might be right that taking baby steps will keep the idea alive while we work on the hard problems in the lab, but there's a real danger that those steps dilute the vision without achieving any significant progress towards the underlying goal.

Saturday, September 29, 2007

RDF in Ruby

I've been meaning to have a play with Ruby for a while now, and I have a project in mind that a dynamic language would be perfectly suited for. The trouble is, it's an ontology processing project. I don't really want to go to the trouble of building the supporting infrastructure myself (been there, done that). So I've been looking for ontology-handling, or at least RDF-handling, libraries for Ruby. It's not exactly a large field. There are some largely moribund projects, and two active projects I could find: ActiveRDF and Redland. Redland is Dave Beckett's C API for RDF, which comes with bindings to several other languages, including Perl, Python and Ruby. It is, by design, just an RDF API: OWL processing will have to be built on top. ActiveRDF is a meta-wrapper: it provides a common Ruby API to other stores, including Redland, Sesame and Jena (in JRuby only). I probably should spend some more time with ActiveRDF, but some of the "Why do you and why don’t you …" answers on the FAQ mean that my application isn't going to fit all that well with their assumptions. So it looks like my choices are to use the Redland API from generic Ruby, or stick with JRuby and call the Jena API.
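
To make the "just an RDF API, with OWL processing built on top" distinction a bit more concrete, here is a rough pure-Ruby sketch. It is emphatically not the Redland or ActiveRDF API: TinyGraph, inferred_types and the ex: names are all made up for illustration. The bottom layer only stores triples and matches patterns against them; the function above it adds one small piece of schema-style reasoning (following rdfs:subClassOf transitively), which is the sort of thing an ontology layer would have to supply.

```ruby
# Hypothetical names throughout: this is NOT the Redland or ActiveRDF API,
# just an illustration of the two layers discussed above.
Triple = Struct.new(:subject, :predicate, :object)

# Bottom layer: a bare triple store with wildcard pattern matching.
class TinyGraph
  def initialize
    @triples = []
  end

  # Add a statement, ignoring exact duplicates. Returns self for chaining.
  def add(subject, predicate, object)
    triple = Triple.new(subject, predicate, object)
    @triples << triple unless @triples.include?(triple)
    self
  end

  # Return every triple matching the pattern; nil acts as a wildcard.
  def find(subject = nil, predicate = nil, object = nil)
    @triples.select do |t|
      (subject.nil? || t.subject == subject) &&
        (predicate.nil? || t.predicate == predicate) &&
        (object.nil? || t.object == object)
    end
  end
end

RDF_TYPE = "rdf:type"
SUB_CLASS_OF = "rdfs:subClassOf"

# Upper layer: all classes of a resource, found by following
# rdfs:subClassOf links transitively from its asserted types.
def inferred_types(graph, resource)
  pending = graph.find(resource, RDF_TYPE).map(&:object)
  seen = []
  until pending.empty?
    klass = pending.shift
    next if seen.include?(klass)
    seen << klass
    pending.concat(graph.find(klass, SUB_CLASS_OF).map(&:object))
  end
  seen
end

graph = TinyGraph.new
graph.add("ex:rex", RDF_TYPE, "ex:Dog")
graph.add("ex:Dog", SUB_CLASS_OF, "ex:Mammal")
graph.add("ex:Mammal", SUB_CLASS_OF, "ex:Animal")

# The plain triple API only knows what was asserted.
puts graph.find("ex:rex", RDF_TYPE).map(&:object).inspect
# The layer on top also reports the implied superclasses.
puts inferred_types(graph, "ex:rex").inspect
```

The first line printed lists only the asserted type, ex:Dog; the second adds ex:Mammal and ex:Animal. A real ontology layer has far more to do than this, of course, which is exactly why I'd rather not build it myself.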

I decided to have a go with Redland, since I know Dave and it will be interesting to see up close how another RDF API works. The first hurdle, then, was sorting out the install. I'm working on Fedora 7 at home. This does ship with a version of Redland (though neither of the RPM versions available on Dave's web site), but not the Ruby bindings. Trying to install results in version incompatibility with the Fedora versions of Redland. Trying to install redland-1.0.6-1.i386.rpm results in unmet dependencies with and (Fedora 7 has .4 and .5, respectively). However, installing the source RPMs from and rebuilding the binary RPM files solved that problem. Next issue: redland-bindings would not build correctly from source, reporting this problem:
error: Installed (but unpackaged) file(s) found:
However, once I'd got the updated RPMs built for redland, rasqal and raptor, I could simply install Dave's pre-built redland-ruby- Phew. OK, so this is side-stepping rather than solving the underlying problem, but hey, life is short.

The good news is that the demo program example.rb worked first time, and seemed quite nippy without the overhead of starting up a JVM. Right, now time to get on with some coding!

ruby, rdf, semantic-web