Monday, November 12, 2007

Jena tip: importing ontologies from the database

Today I wrote a short tutorial on using the Jena OntDocumentManager to allow OWL imports to be resolved from ontologies already stored in a local database.
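
I won't repeat the tutorial here, but the shape of the solution is roughly as follows. This is a sketch only - the connection details and ontology URI are placeholders, not taken from the tutorial - showing an OntModelSpec given an import ModelMaker backed by the database, so that the document manager satisfies owl:imports from models already stored there:

import com.hp.hpl.jena.db.DBConnection;
import com.hp.hpl.jena.db.IDBConnection;
import com.hp.hpl.jena.ontology.OntModel;
import com.hp.hpl.jena.ontology.OntModelSpec;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.ModelMaker;

// placeholder connection details - substitute your own database
IDBConnection conn = new DBConnection(
    "jdbc:mysql://localhost/jena", "user", "password", "MySQL" );

// a model maker that opens models stored in the database
ModelMaker maker = ModelFactory.createModelRDBMaker( conn );

// ask for imports to be resolved against the database, not the web
OntModelSpec spec = new OntModelSpec( OntModelSpec.OWL_MEM );
spec.setImportModelMaker( maker );

// owl:imports in anything this model reads will now be satisfied
// from models already stored in the database, where they exist
OntModel m = ModelFactory.createOntologyModel( spec, null );
m.read( "http://example.com/ontologies/base" );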

del.icio.us: jena, java, semantic-web, how-to.

Wednesday, November 07, 2007

RESTful web services book

I've tended to assume that I understood the REST vs SOAP distinction in web services, though as it has turned out most of the WS work I've done up to now has been in SOAP/WSDL territory. So REST was something I'd read blog posts about, but not used seriously. Today I got a copy of Leonard Richardson and Sam Ruby's book RESTful web services. Julie, our admin, dropped the newly-delivered book off at my desk about 9.30. As I usually do, I flicked through the contents to get a sense of the book before parking it on The Pile of things Intended To Be Read (eventually). A few hours later I'd read the first four chapters thoroughly and skimmed the rest of the book. This is a really well-written book, striking – for me – the perfect balance between explaining the principles and showing the techniques. Too many other CS textbooks either give you recipes to follow slavishly and leave you to reverse-engineer the principles, or else they part company with the ground and never return. Anyway, big kudos to Richardson and Ruby for a job well done.

So, I now get REST. I don't know that I completely buy all of the arguments, but only because I have a research interest in automation and agent systems, and I think those need more scaffolding. It seems to me that REST works particularly well – and the book makes a compelling case that it does – when you have a team of human developers trying to build usable, useful applications to deploy today. That's a big and important aim, no doubt, and good luck to them. However, if you want to build an autonomous system over a web services foundation, I think you probably need some of the things that SOAP and WS-* were reaching for. The problems with Big Web Services, as I see them, lie in the imperfect execution of the design and specification processes, rather than in the approach being fundamentally a blind alley. I haven't (yet) caught the meme that says REST good, Old School web services bad-in-principle. That being said, I think it would make an interesting investigation to think about layering an agent architecture over a RESTful base. Something to work on. Oh, and WADL looks interesting – something else to follow up on.

Richardson and Ruby's book is O'Reilly, so they have an animal picture on the front cover. In this case, a vulpine phalanger. Is that obscure or what? Note to self: write book for O'Reilly before all the cute animals are taken, and you're left with a choice of slug, horsefly or amoeba!

del.icio.us: web-services, rest, book, ruby, web, architecture.

Friday, November 02, 2007

JavaFX in Eclipse: mini-HowTo

I've been meaning to explore JavaFX for some time now, and decided to spend a couple of days on it this week, as part of an investigation for an ongoing semantic web UI project. My Java development environment is Eclipse, and since I had to jump through a couple of hoops to get JavaFX working in Eclipse, I thought I'd document the steps here.

  1. Install JavaFX locally. I downloaded the .tgz archive from the OpenJFX site. Beware that there's no top level directory in the archive, so create a directory first, for example /usr/share/java/openjfx, and cd there before un-tar'ing. I haven't checked that the .zip file has the same issue for Windows users, but I'm guessing it does. I'll refer to this directory later as $JFX.
  2. Install the JavaFX plugin for Eclipse in the usual way from this auto-update site:
    http://download.java.net/general/openjfx/plugins/eclipse/site.xml
    (see also step-by-step instructions)
  3. Create a new Java project (there's no JavaFX-specific entry in the New context menu for new projects, but that's OK). In the source folder, create a new JavaFX file from the New context menu (right-click » New » Other » JavaFX » JavaFX file).
  4. Add some source code to the file. My test file is, imaginatively, test1.fx. I added one of the standard tutorial examples:
    import javafx.ui.*;
    Frame {
        title: "Hello World JavaFX"
        width: 200
        height: 50
        content: Label {
            text: "Hello World"
        }
        visible: true
    }
  5. Note the syntax error: Frame is not recognized (well, it wasn't for me, anyway). Solution: add the JavaFX libraries to the build properties. The plug-in adds a library to the User libraries of Eclipse; I guess that in future releases the library will be automatically added. Also, if your downloaded copy of JavaFX is newer than the plugin's, you could reference the libs in $JFX/trunk/lib here.
    Update: Don't remove JavaFX and add a different library (for example JavaFX-local) that references different versions of the JavaFX lib .jars. It turns out that the current version of the plugin will automatically add a JavaFX library if there's not one in the build path when you try to launch the app. Having two sets of JavaFX jars makes things break in an ugly way. I've reported this as a bug.
  6. To run the example, there's no 'run as JavaFX' entry in the context menu of test1.fx. Instead, click Open Run Dialog..., create a new entry under JavaFX Application, and enter the name of the .fx file to run as an argument to the JavaFX runner. Save the run config, then run the application.

del.icio.us: javafx, java, eclipse, howto

Tuesday, October 30, 2007

AI programming tips from Halo 3

I don't have time to play computer games very often, and in truth I've yet to play any of the Halo games. Nevertheless, I found this list of 42 AI tricks to assist your game, from the Bungie team responsible for Halo 3, very interesting. Each of the shortcuts the article lists could be the starting point for an agent research project: how would we really like an autonomous system to do that? Another interesting aspect was the predictability of the behaviours: in some cases, the AI system was de-tuned or simplified so that players would have more chance of predicting what the AI characters were going to do, both to make the game more playable and to increase the sense of realism. Interesting stuff.

Kudos to Bourbaki in #ai on Freenode IRC for the link.

del.icio.us: AI, computer-games, programming

Friday, October 26, 2007

Stardust

I have a moderate number of blogs on my Bloglines subscription list (ok, 101 presently). All but one of them are technology or work related. The one blog that I read just for fun is writer extraordinaire and all-round good bloke Neil Gaiman's journal. Neil wrote an excellent novel called Stardust, which is now a movie which he co-produced. I took the family to see the movie last night, and, well, wow. I'd been expecting a good show from some of the comments floating around the blogosphere, and we weren't disappointed. Nice one. Kudos to all concerned.

Sunday, September 30, 2007

The semantic web is not just today's web with special sauce

Alex Iskold writes on Read/Write web that the semantic web can be achieved today, to some degree, if we create services that can do some basic semantic processing on extant web content. For example, the Spock search engine is optimised to find information about people and relationships. Iskold's basic point is that some end-user value can be derived from a semantics-driven approach without having every web site owner re-engineer their site to use RDF and OWL. While that's not false, to my mind it starts from a faulty set of assumptions.

A basic, recurring problem is that many people assume that the semantic web equals today's web, with some extra semantic goodness added on top. The WWW plus special sauce. Personally I don't think that's a helpful way to approach it. Today's web is highly optimised for human interaction. This is a good thing, but it does rather limit what we can do with machine processing to assist those human interactions: human brains are very capable of processing vague, ambiguous, sometimes noisy content that relies on social constructs to interpret. We can't do that with machine-based processing yet. Better to ask what else we can offer human users, rather than take the current interaction modalities and fiddle with them.

So if the semantic web is not about tweaking the current, human-facing, world-wide-web, why is it called the semantic web at all? I guess Tim Berners-Lee is the person to answer that definitively, since it was his term originally. To my mind, it's all about applying the metaphor of the web to machine-based information processing. To explain. The web brought about a revolution in human information handling thanks to some basic design features:

  • open and distributed were foundational design assumptions
  • simple, resilient protocols that quickly became ubiquitous
  • no central point of failure or control
  • dramatically lower barriers to entry than pre-Internet publishing

There are probably others; my point is not to try to be definitive but to draw out some of the features that produced a democratization of information publishing. Anyone can say anything now, and potentially be heard around the planet. Is this uniformly a good thing? No: there are dark corners of the web we might wish were not there. Is this on balance a good thing? Yes. OK, so the web democratized information publishing for humans. What's the relevance to the semantic web? The metaphor is that, just as the web freed human-processed information from newspapers, books and TV shows, so the semantic web aims to free machine-processed information from databases and documents. On a massive distributed scale, with no central point of control, etc.

Ultimately, though, we produce information systems for people to use, to satisfy some need or desire. So the value of the semantic web, of allowing machines to do some of the information processing legwork, is the extent to which it either helps people do the things they do today more effectively (cheaper, faster, easier, ...) or enables people to do things that they can't do today. The key, it seems to me, is automation. When I'm driving a car, changing from manual transmission to automatic gives me one less task to do, but doesn't fundamentally change my engagement with the task of driving. Whereas an automated highway would let me read the newspaper for part of my journey, even though I'm ostensibly the driver.

If it comes about, the semantic web could be as big a transition as the pre-web to the web. What's difficult to see, I suppose, is an obvious smooth transition from here to where we want to be. Iskold might be right that taking baby steps will keep the idea alive while we work on the hard problems in the lab, but there's a real danger that they dilute the vision without achieving any significant progress to the underlying goal.

Saturday, September 29, 2007

RDF in Ruby

I've been meaning to have a play with Ruby for a while now, and I have a project in mind that a dynamic language would be perfectly suited for. The trouble is, it's an ontology processing project. I don't really want to go to the trouble of building the supporting infrastructure myself (been there, done that). So I've been looking for ontology handling, or at least RDF handling, libraries for Ruby. It's not exactly a large field. There are some largely moribund projects, and two active projects I could find: ActiveRDF and Redland. Redland is Dave Beckett's C API for RDF, which comes with bindings to several other languages, including Perl, Python and Ruby. It is, by design, just an RDF API: OWL processing will have to be built on top. ActiveRDF is a meta-wrapper: it provides a common Ruby API to other stores, including Redland, Sesame and Jena (in jRuby only). I probably should spend some more time with ActiveRDF, but some of the "Why do you and why don't you …" answers on the FAQ mean that my application isn't going to fit all that well with their assumptions. So it looks like my choices are to use the Redland API from generic Ruby, or stick with jRuby and call the Jena API.

I decided to have a go with Redland, since I know Dave and it will be interesting to see up-close how another RDF API works. First hurdle, then, was sorting out the install. I'm working on Fedora 7 at home. This does ship with a version of Redland (though neither of the RPM versions available on Dave's web site), but not the Ruby bindings. Trying to install http://download.librdf.org/binaries/redhat/fc6/redland-ruby-1.0.6.1-1.i386.rpm results in a version incompatibility with the Fedora versions of Redland. Trying to install redland-1.0.6-1.i386.rpm results in unmet dependencies on libcurl.so.3 and libpq.so.4 (Fedora 7 has .4 and .5, respectively). However, installing the source RPM's from http://download.librdf.org/source/ and rebuilding the binary RPM files solved that problem. Next issue: redland-bindings would not build correctly from source, reporting this problem:
error: Installed (but unpackaged) file(s) found:
/usr/lib/python2.5/site-packages/RDF.pyc
/usr/lib/python2.5/site-packages/RDF.pyo
However, once I'd got the updated RPM's built for redland, rasqal and raptor I could simply install Dave's pre-built redland-ruby-1.0.6.1-1.i386.rpm. Phew. OK, so this is side-stepping rather than solving the underlying problem, but hey, life is short.

The good news is that the demo program example.rb worked first time, and seemed quite nippy without the overhead of starting up a JVM. Right, now time to get on with some coding!

del.icio.us: ruby, rdf, semantic-web

Tuesday, August 14, 2007

Semantic Technologies Jena Tutorial: source code

At the Jena tutorial I gave at Semantic Technologies 2007, I promised to make the tutorial source code available on my web site. My plan was to spend a little time cleaning up the archive, partly to remove duplication of the Jena libraries (they are in the tree three times: once for the main code, once for Joseki and once for Eyeball). It is still my intention to do that, but I've recognised that other priorities keep intervening. So I've – finally – released the code as-is. It's a big archive, 67MB, so don't download it via your cell phone! And apologies to anyone who has been waiting a long time for me to get around to doing this.

del.icio.us: jena, java, semantic-web, tutorial

Tuesday, August 07, 2007

When collaborative filtering goes bad

Amazon, bless them, have let their CF algorithms get a bit wayward recently. I just received this missive:

Hello, Ian Dickinson,

We've noticed that customers who have purchased or rated Harry Potter: Years 1-4 (4 Disc Box Set) have also purchased Classic Farm & Agricultural Machinery (3 x DVD) [2007] on DVD. For this reason, you might like to know that Classic Farm & Agricultural Machinery (3 x DVD) [2007] will be released on 13 August 2007.

Leaving aside the small matter that people who have bought one thing by definition haven't bought another thing that hasn't been released yet, anyone human looking at the correlation between Harry Potter (I bought the DVD's for the kids, honest) and Classic Farm Machinery is going to do what I did: laugh out loud. Which makes me think that research into the understanding of humour by computers - some of which has been in the press recently - may have a purpose beyond illuminating our understanding of human behaviour. If we create computers that can get jokes, they might be better able to spot stupid errors than automatons that just follow the numbers.

Tuesday, July 24, 2007

Subversive 1.1.3 update for Eclipse on Linux

I prefer Polarion's SVN plugin for Eclipse to the default. It's always one of the first things I install when I upgrade or reinstall Eclipse. The auto-update feature notified me last week that version 1.1.3 is available, so I updated, only to have SVN access break. After I emailed a bug report to Polarion, Alexander Gurov kindly responded, pointing me to this note. Starting with 1.1.3, non-Windows installations require an extra installation step (install SVNKit) and an extra configuration step (set the SVN client to SVNKit).

del.icio.us: eclipse, java, subversion

Monday, July 23, 2007

Jena tutorial on DevX

I should have blogged this at the time, but DevX recently published a Jena tutorial article I wrote for them. Specifically, it focuses on Jena's Model abstraction, including the commonly used extensions and variants. It was fun writing for an audience other than the academic research community, which is more what I'm used to.

del.icio.us: jena, semantic-web, rdf, tutorial

Saturday, July 14, 2007

Save the Tara complex

In 1992, I had a wonderful holiday in Ireland. The Irish are warm, welcoming people, the countryside is fantastic and they have a wealth of truly wonderful archaeological sites. Preeminent among these is the Tara complex, a huge area including dozens of prehistoric henges and other sites, including Newgrange and the Hill of Tara. It's comparable in some ways to Stonehenge, since Stonehenge is also at the centre of a wide area of barrows, henges and other early sites of settlement and human activity. The British government is looking to seal a place in history by moving or hiding the busy A-class road that runs through the Stonehenge complex. The Irish government, I found out today, is looking to seal its place in history by building a motorway (a multi-lane highway for high-speed, high volume traffic) over the Tara site. Unbelievable.

Naturally, there's a campaign to save the site, which has been running for quite a while. To those campaigners, good on you! We from outside the country can help by signing the Save Tara petition. Please do so. And go there: it's a great place to vacation.

Sunday, June 24, 2007

Upgrading to Fedora Core 7

I was forced into an emergency upgrade to FC7 this weekend. I ran my periodic 'update everything with yum' exercise on FC5. Clearly FC5 is somewhat behind the curve now, but it was working for me and I didn't want to waste the time upgrading my system when I have (many) other more urgent things to do. Once the updates were done, Gnome went truly weird. All of the text labels in Gnome itself, including menus, button labels, tooltips, etc, disappeared. Gone. Which doesn't, it has to be said, make for a very usable UI. Interestingly, many of the apps themselves were OK. The peculiar thing was that I had text in terminal windows (but no menus in those windows), and FireFox was working; there was just no Gnome text. I thought about trying to roll back the changes to find out what broke, but there's no easy way to do that (afaik). Instead, I decided to upgrade to FC7 and hope that would fix the problem. Which it did, but caused pain along the way.

I expect that some of the problems I had would not be problems if I'd installed rather than upgraded. Maybe. I found Mauriat Miranda's Personal Fedora 7 Installation Guide very helpful. Thanks Mauriat! Especially useful was the tip, which I didn't discover right away, not to use Nvidia's own installer to provide the graphics acceleration. Better to use the Livna module, as Mauriat says. The difference is this: one way works and GDM starts, the other way doesn't work and GDM does not start.

As far as I can tell, there's no equivalent to kernel-smp in FC7. In FC5 (I don't know about FC6, I skipped that one), to get the benefit of a hyper-threaded CPU (e.g. Pentium 4) you have to use the multiprocessor kernel, which you install as kernel-smp. There is no kernel-smp in the FC7 repository. However, the default kernel shows me two CPU's in gkrellm, so I guess the SMP support has been folded into the standard kernel package. Cool.

Perl blew up on me. I noticed this during the installation of some packages, and when I tried to run Perl-based applications, I'd see:

# vmware-config.pl
/usr/bin/perl: error while loading shared libraries: libperl.so: 
cannot open shared object file: No such file or directory

It turns out that the upgrade process somehow loses perl-libs, so you need to yum install perl-libs and everything is peachy.

I now have a mostly-working FC7 system again, albeit on an unplanned schedule. Mounting my NTFS partition isn't working yet, but I'm setting that aside for the time being. First impressions of FC7 are favourable so far. It's nice finally to have FireFox 2.x installed as the default.

Thursday, June 21, 2007

Kaspersky Kudos

I've just had to rebuild the Windows XP machine that the kids use for game-playing in the rec room. I backed up the hard drive to an external USB drive using Windows Backup, zapped, re-installed, all fine. When it came time to re-install my virus checker of choice, Kaspersky AV 6.0, I couldn't find the .key file that the downloadable version of KAV wanted in order to recognise my existing license. Windows Backup has just the worst interface (why can't you resize the window, for pity's sake?!). So I called Kaspersky tech support in the UK. What a surprise! No endless phone menu or 45+ minute wait on hold (Vodafone, I'm looking at you). In fact, I was speaking to a tech support engineer within about 15 seconds of dialling, explained the problem in a matter of minutes, and he emailed me a new activation code. No fuss, no muss. Exactly how tech support should be done.

Saturday, June 02, 2007

Jena tip: dealing with reflexive class and property iterators

Under the semantics of OWL, every class is a sub-class of itself. Let's assume we have three classes: A, B and C. C is a sub-class of B, and B is a sub-class of A. According to OWL, the sub-classes of A are therefore A, B and C.

In Jena, the reasoners, both built-in and external (like Pellet), will correctly infer the expected triples:

:A rdfs:subClassOf :A .
:B rdfs:subClassOf :A .
:C rdfs:subClassOf :A .

However, oftentimes that correct conformance to the spec can be a nuisance when programming. Suppose we are generating the TreeModel for a Swing JTree directly from our Jena triple store. We really don't want each node in the tree to have itself as a child. This was a sufficiently common user request that, in the Jena ontology API - a convenience API for handling ontology terms - the OntClass Java class doesn't report itself as a sub-class when listing the sub-classes through listSubClasses(). The triple is still there in the model (assuming the appropriate degree of inference is turned on), but is filtered out from the return value of the listSubClasses() method.
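
To illustrate, here's a minimal sketch using the A, B and C classes from above (the namespace is a placeholder, and we assume an inferencing model spec such as OWL_MEM_MICRO_RULE_INF):

String NS = "http://example.com/test#";
OntModel m = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM_MICRO_RULE_INF );
OntClass a = m.createClass( NS + "A" );
OntClass b = m.createClass( NS + "B" );
OntClass c = m.createClass( NS + "C" );
a.addSubClass( b );
b.addSubClass( c );

// prints B and C, but not A itself, even though the triple
// :A rdfs:subClassOf :A is entailed in the model
for (Iterator i = a.listSubClasses(); i.hasNext(); ) {
  System.out.println( "sub-class of A: " + i.next() );
}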

It has recently been pointed out to me that listSubProperties() in OntProperty does not behave the same way. The theory is the same - every property is a sub-property of itself - but the method does not automatically filter out the reflexive case. This is an accident of history: until now, very few users have requested that feature in OntProperty. But I can see the argument that the two list... methods are inconsistent in their behaviour.

Fortunately, there is an easy workaround, which applies to this case and indeed any other where filtering out the reflexive case would be handy (e.g. when listing equivalent classes). The iterators returned by the various list methods in the Ont API are Jena ExtendedIterators, which have a number of features including a filter hook. Calling filterKeep or filterDrop on an extended iterator returns a new iterator whose values are limited to those that match a given Filter object (or those that don't match, in the case of filterDrop). So, to skip over the reflexive case and not report that a property is its own sub-property, we do:

/** Filter that matches any single object by equality */
public class EqFilter implements Filter
{
  private Object m_x;
  public EqFilter( Object x ) { m_x = x; }
  public boolean accept( Object x ) { return m_x.equals(x); }
}

// in the application code:
OntModel m = ... the Jena model ... ;
OntProperty p = ... the property of interest ... ;
Filter reflex = new EqFilter( p );

ExtendedIterator subP = p.listSubProperties()
                         .filterDrop( reflex );

I don't know whether to change the default behaviour of listSubProperties. We generally like Jena to stick to the standards it is based on, in this case the OWL semantics. On the other hand, the point of the ontology API is to be a convenience layer on top of the raw RDF triples. Convenience is in the eye of the beholder. What I definitely don't want to do is add yet another Boolean flag to the method call. I'm open to suggestions!

Sunday, May 27, 2007

Semantic Technologies 2007 wrap-up

Back home now after Semantic Technologies 2007. It was a good few days, instructive on different levels. The two Jena sessions I facilitated went reasonably well from my point of view, despite a rather shaky start to the tutorial (I was nervous, and the preparation I thought I'd done enough of turned out to be insufficient ... I'll know better next time). The delegates are asked to fill in evaluations for each session; I've yet to receive formal feedback from the conference organizers, which I'd expect to come in the next few weeks.

This was my second year at ST, and both times it was a very well organized event. The days were perhaps a little on the long side. Sessions started at 8.00 or earlier, and continued to around 6 pm. On the plus side there were lots of breaks, which was a good chance to meet and mingle. The venue (Fairmont Hotel) is very nice and the food was actually quite good.

The conference sessions I attended were a little mixed, but included some very good talks. It's interesting contrasting this conference, which essentially has a business orientation, with the more academic conferences like ISWC. The style of presentation is different, lacking the organising pattern of 'problem-hypothesis-solution-validation', but not necessarily worse for that. I also think that I missed some good talks (there were eight parallel tracks): somehow I need to get better at mining good candidates out of the conference programme. The talks were generally longer than the common academic pattern of twenty-minutes-plus-five-for-questions, which had an interesting effect: most presenters left lots of time for questions. These question sessions were often the best bits of the talks.

There was a trade show too, with (at a guess) around 25 exhibitors, from large companies (Oracle) to small one-or-two person outfits. Lots of tools, including ontology development (e.g. TopBraid Composer, SandPiper) and semantic application development (Metatomix, TopBraid). Quite a few approaches to extracting structured semantics from unstructured text sources. Being a geek (sorry: technically-minded person), I gravitated towards the more techy-looking stands. The good ones were very good at explaining their pitch, while at others (who shall be nameless) I met some marketing person with that rabbit-in-the-headlights smile who answered all my questions with "you need to talk to the technical guy". Sigh. Overall I think the trade show was larger and more active than last year. There was some very good technology and tools, but no applications that made my jaw drop. Or even hinge downwards a bit.

There were several panel sessions, one of which I missed most of due to illness. On the whole, they were well organized but not very informative. I think they made a mistake in making the panels too large, which meant that each panelist had too little time to develop much of a theme. The investors and analysts panel was especially disappointing: I didn't get much insight there at all. I had been looking forward to hearing Nova Spivack speak, but he didn't get much air-time from the moderator and didn't say anything about Radar Networks except "we're in stealth mode".

Would I go again? Yes. Let's hope Wilshire invite me back next year!

Now with added RDF

Finally got around to configuring FeedBurner to deliver RDF (i.e. RSS 1.0) from this blog. Sorry it took so long!

Wednesday, May 23, 2007

SemTech conference session notes: keynote panel building the semantic industry

Notes from Building the Semantic Technology Industry: A Conversation with Entrepreneurs and Investors.

Safa Rashtchy

Analyst with Piper Jaffray (10 years). Consumer behaviour is changing. Google makes it easy to find things with minimal effort (avg. 1.2 keywords per search). People becoming "lazy"? Do consumers really give much knowledge in searches? Barriers/priorities: wave will be consumer web, money will come from advertising. People are too used to getting things for free. New effect is smart matching of advertisers to users. But advertising mustn't be too much in your face.

Russell Glass

Zoominfo semantic search engine. Drivers? Consumers don't care about semantic web. But are voracious consumers of content. Would require thousands of people to create the content that we can automatically aggregate today. Barriers/priorities: disagree about advertising only. Trends suggest ad/subscription hybrids are growing. Rich semantic models allow businesses to determine how to partition business model between subscription and advertising.

Mark Greaves

Vulcan Inc - venture capital fund founded by Paul Allen. Drivers? Data integration is a long-term need. Maybe we'll be successful this time! Web 2.0 is the first computer architecture that came from the people, not from the enterprise. People are social animals, want to be heard and have their disparate needs met. Consumer drivers will accelerate past enterprise.

Jamie Taylor

Metaweb - producer of Freebase. Responsible for community building. Drivers? Have to reduce cost or increase value. Mash-ups increase cost because they are disorganised. Data will become more organized to decrease cost. Open data is the key. Data as a community good. Semantic technology will sneak in on the back. Barriers/priorities: long tail - can micro-apps have value to other people? Change in mindset: open my data, brings value both to other people and therefore back to me.

Mills Davis

Project10X. Facilitating. Has a report on the state of the market available from the semtech web site. Questions: 1. what are the drivers? 2. what are the barriers and priorities?

Bradley Allen

Founder of Siderean (formerly of Inference Corp). Drivers: lots of information, vast growth. Need metadata to manage data overload. Barriers/priorities: are consumer and enterprise models merging?

Nova Spivack

Radar Networks. User facing semweb app, still in stealth mode. Derives from semantic desktop research. How can we integrate our various scattered filesystems (email, desktop, laptop, online, etc)? Drivers? Web 3.0 - coming decade of web innovation. Cyclical pattern. Pre web-1.0, making the pc usable. Web 1.0 - backend innovation. Web 2.0 - front end, wisdom of crowds etc. Web 3.0 = "dataweb". Mainstream adoption of semweb is still several years away. Advanced reasoning, OWL etc, is web 4.0? Agents etc - a decade away.

Eghosa Omoigui

Intel Capital. Largest VC company in the world. 27 countries. Has three focus areas: consumer internet, search, semantic technology (note: not nec. semantic web). Was "underwhelmed" by web 2.0 expo. Drivers? Web is becoming a social medium - making friends on the internet, not in real space. Does this raise expectations? People want everything at high speed. What we do has to fit in with what people are doing anyway. Barriers/priorities: unintended consequences. Security/privacy? Who will see my open data and what will/can they do with it?

comments/questions from audience

Shouldn't the first wave of development be moving existing apps to the new platforms? Where are the semantic versions of existing apps? Nova: it's still very expensive to build apps - the tools just aren't there yet. Scalability. Jamie: what's the LAMP for semantic apps? People do component replacement, e.g. semantic stores.

Is there a land grab in the semantic web? What is it? Also, what education needs to happen? Mark: land to be grabbed is content and ontologies. Content is king. Zoominfo and Freebase are building presence in knowledge about people and semi-structured content. Nova: land-grab for users' attention. Bradley: land-grab for vocabulary. FOAF etc. Education - we need to understand what's essential and what is not. Put aside reasoning, focus on block-and-tackle issues like basic vocabulary. Mills: land-grab for executable knowledge.

Comment: actually the government has all the data you need (from a government representative!)

SemTech conference session notes: semantic query

Notes from conference session Semantic Query: Solving the Needs of a Net-Centric Data Sharing Environment, Matt Fisher and Mike Dean. Want to pull in information from multiple sources, federated queries. Want to deliver information as a single response, timely, trustworthy, from all relevant sources. No human assistance, don't want to have to have intimate knowledge of the data source. Data spread over more than a single repository (db, excel, files, local access db), or in multiple formats - maybe proprietary. Traditional solutions: data warehousing, multi-dimensional databases, business intelligence approaches. Risks replicating the problems but on a larger scale.

Asio distributed query solution. Bridges to existing sources, initially relational db's and soap web service endpoints. Use swrl rules to map from domain ontology to data-source ontology. For RDB's, use D2RQ to map the db schema to an ontology. Semantic query decomposition - determine which db to send which parts of the query to. Not addressing "data deconfliction" in this project - mapping rules determine the golden data source for a given query element.

Use SWRL to map individuals between datasources. Translate SWRL rules to Jena rules via SweetRules. Automapper uses JDBC to introspect the db schema. Each table becomes an OWL class. Columns become properties. Based on D2RQ, not using :join or :AdditionProperty, but added :constraint.
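
To make that automapping idea concrete, here's a sketch of my own - illustrative only, not Asio's actual code, and the connection details are placeholders - that introspects a schema via JDBC and emits a class per table and a datatype property per column:

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class AutomapSketch
{
  public static void main( String[] args ) throws Exception {
    Connection conn = DriverManager.getConnection(
        "jdbc:mysql://localhost/mydb", "user", "password" );
    DatabaseMetaData md = conn.getMetaData();

    // one OWL class per table
    ResultSet tables = md.getTables( null, null, "%", new String[] {"TABLE"} );
    while (tables.next()) {
      String table = tables.getString( "TABLE_NAME" );
      System.out.println( ":" + table + " a owl:Class ." );

      // one datatype property per column, with the table's class as domain
      ResultSet cols = md.getColumns( null, null, table, "%" );
      while (cols.next()) {
        String col = cols.getString( "COLUMN_NAME" );
        System.out.println( ":" + table + "_" + col
            + " a owl:DatatypeProperty ; rdfs:domain :" + table + " ." );
      }
      cols.close();
    }
    conn.close();
  }
}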

SBWS - Semantic Bridge for Web Services. General tendency for web services to mediate access to data sources. Gives data owners more control, comfort, but makes it hard to understand the schema and hence generate the mappings. SBWS is being (will be?) adapted to include REST, using WADL to describe REST-ful services.

SemTech conference session notes: semantic user experience

Notes from Semantic User Experience, Ross Centers, James Huckenpahler, Rob Bauman. General discussion of user experience design. Claim: semantic technologies will make it easier to maintain a brand more consistently. Web 2.0 - perpetual beta is a value. Constant updating and development, faster release cycles. So what's the role for interaction design? Designers will need to think about how components/widgets will fit in other contexts. Claim (not expanded): semantics will make it easier to build mashups.

Tags vs ontologies (built by anyone/crowds vs. built by "teams of dwarves locked in mines"). Could wiki bridge the gap? Semantic wiki as a way for crowds to refine a shared ontology. E.g: Semantic Assistants - Research Assistant (screenshot of an apparent product, but not visible via google).

Wish list for designer:

  • ontology of visual representations
  • ontology of user interaction
  • semantically-enabled design tools - label the affordances of designed elements ("this button does this ... ")
  • a detailed model of the user to relate semantic products to

Rob Bauman - "world's first semantic game". Treasure Hunt. Built on ontologies; 2 FTE for 13 months (1 engineer and 1 modeler). Budget CN$300K, not CN$300M as I first wrote (thanks for the correction Rob). Built a reusable game/simulation platform. Ontologies for game play, economics, 3d modelling, game resource, ... others. Technologies: visual knowledge, open croquet. Built a model of croquet using an ontology representation, so that the game engine can drive the 3d engine. The user interface shows connections between concepts and "agents" representing active behaviours. Includes decisions and inference steps. Claim that it will scale to tens of millions of agents.

SemTech conference session notes: migrating from relational world to semantics

Notes from Migrations: Moving From a Relational World to a Semantic One, Barbara McGlamery. Case study: redesign of the web site for Entertainment Weekly. Wanted to make more use of the data collected, and to make the site more scalable. Category Tool, year 2000, successful but not flexible (very hierarchical categories) and not scalable. Topics - semantic web tool, 2005, built on realsimple.com.

Category Tool: 78K categories, plus relations, built on Vignette story server. Everything is a category (both classes and instances). Topics: built on Sybase DB. RDF and OWL. More flexible structure, represents individuals and relationships directly in OWL. Goals included making the data more portable, supporting poly-hierarchy (Will Smith is an actor and a musician), and supporting multipart relationships.

Ontology design driven by business needs and the application, but not re-using existing ontologies. Ontology design issues:

  • dates would sometimes be imprecise (May 12 2007 vs. Spring 2007)
  • multipart relationships: critic Owen gave grade B to movie M. Used specific properties, e.g. :gaveGradeA, :gaveGradeAMinus

Also data clean-up problems. Homonyms, relationships in the wrong direction (film is lead in actor). Done manually by interns using spreadsheets!

Did the migration in three phases: development, QA and complete. Each phase had to correct mistakes or add cleanup, but still very manual. Other sites in the Time web presence now also use the same named entities.

SemTech conference session notes: semantic SOA

Notes from Semantic SOA: Aligning IT with Business Operations, Larry Lafferty. SSOA - consortium of companies to provide a semantic services framework. Autogeneration of forms from soa interfaces. Integrates various vendor components: Siderean, AgentLogic (event distribution), Kapow ("mash-up service"), ISL, SoftPro CommandLink ... (I missed some).

Claimed an insight that workflows should include human interaction, rather than be just machine based. Built two demos: pilot recovery planning for downed plane, and information fusion for identifying suspicious activity in shipping. Some of the info for the info fusion demo comes from Google and Wikipedia!

Only partial semantic descriptions of services to date. Need a process editor for end-users to create their own workflows. How well can users really cope with the complexity of real workflows?

Question from the audience: you described this talk as being about Semantic SOA, but you didn't talk about OWL-S, SAWSDL, ... , etc. Can you talk about those now? Answer: no I can't, sorry.

Tuesday, May 22, 2007

SemTech conference session notes: building the practical semantic web

Notes from Building the Practical Semantic Web With Focus on Reasoning, Lars Hard. semweb presents barriers to web workers - heavyweight documents and standards, albeit that they provide a good foundation. We need better tools to hide the complexity from ordinary developers. Other barriers include:

  • creating rdf ontologies is too hard
  • automated knowledge extraction is not usually possible
  • "usable" reasoning, scale and complexity of computation
  • can be hard to show value over non-semantic approaches

Most semweb examples are corporate and dull. Not enough cool apps for general audience (cue a list of the usual suspects).

Need new tools - simple and fun. One click publishing, includes a SOAP interface. Target demographic: 15-17 year olds! Example based programming, feeding a machine learning algorithm. Strong growth will come from many networked small-scale applications.

Example applications: tyre recommendation based on slider inputs for price, performance, etc. Wii game selection based also on slider inputs, or on games that are similar/related. Increase the degree of discovery to raise the number of games in the recommendation set. FelixGames.com - flash games vertical search.

SemTech conference session notes: relational navigation

Notes from Relational Navigation: Delivering on the Promise of a Semantic Web, Bradley Allen. Missed the intro section. The user experience: context, relationships, participation. Focus on emerging de facto standard ontologies like foaf, dc, etc. "Unanticipated queries" is the key differentiator from standard navigation patterns. Scalability and integration are key issues: 10^9 triples. The question for me, though, is not whether we can scale the underlying store but whether we can scale the UI. No interaction design can cope with trying to show more data than the user can comprehend, so how do we get users to see the forest instead of the trees?

Siderean has twenty-six customers to date: media, federal, aerospace/defence, ... Helping people know what's available to them.

SemTech conference session notes: machine-to-machine intelligence m2mi

Notes from The Semantics of Simplicity - M2M Intelligence and Complex Adaptive Systems, Geoff Brown. Can we withstand events like Katrina - what happens when civilian and command-and-control infrastructure collapses? Can we build a really resilient infrastructure that can withstand such events? Thesis: the OSI seven-layer stack is brittle and static; we want more flexibility about the locus of control, and new layers. Specifically, an eighth layer for valued information at the right time (VIRT), and a zeroth layer for M2M intelligence. Supposedly, this will allow a machine to reconfigure the stack and protocols dynamically.

I found it all very high level, abstract, and not very convincing [note: edited to improve the tone].

SemTech conference session notes: a hole in the ground

Notes from A Hole in the Ground: 12,476 ways to describe an oil well, d'Armond Speers. IHS is an information aggregator for various industries, with lots of separate systems and applications. E.g. 68 info processing apps for the energy industry. Goal is to create a common data repository within the company (not a single container, but a consistent view), with a common data access API. There are very many proprietary and industry-standard data formats; there has been some effort to come up with a common XML format. Relational model with many thousands of columns, hundreds of tables. XML format has around 1300 elements. Would equal roughly 200 billion triples.

Archaeology: "sifting through the ruins" for insights into application formats. Anthropology: getting "tribal knowledge" from old hands. Total of 12,476 attributes in the collective models for describing features of an oil well. Problem: converting impoverished data from one region to the very rich model used by another region (e.g. company data, stratigraphy).

Aim to build a domain ontology to describe the common model. Tag the source data using the terms from the ontology. Some terms are common, many are very different. E.g: well codes in IRIS = 22, well codes in PIDM = 331. Many thousands of such lookup tables. Information can come from well operators, or indirectly through government agencies, but is not consistently identified. Needs duplicate detection. Migrating the oil well model is 25% done.

Looking to apply SOA to customer applications. Anticipate a need for semantics exposed in the SOA. Some customers already mine data from delivered output to do their own integration and processing.

Started with Protege, but have now moved to using internally-developed ontology tools. How to deal with terms that are "doctrinal" rather than objectively factual. E.g: what does "deep" mean in different localities?

SemTech conference session notes: related search using semantics

Semantic Technology Conference session: Related Search using Semantics: A Case Study from CNET, Tim Musgrove. CNet is the 10th largest global web site, with many web brands (news.com, shopping.com, etc). Collaborative filtering recommendations can throw up anomalous recommendations (e.g. see also 'ipod' when searching for 'hp laptop'). Click-through rate for CF alternative searches is about 3%, but coverage is only about 4% of incoming searches. The problem is lack of data for statistical approaches. Have to use the whole query, since word-by-word query decomposition introduces too much ambiguity.

Word sense disambiguation, e.g. "ultralight" to "ultraportable", is an alternative to CF. Case study with CNet: integration took one day with one engineer. Has been up on the CNet site for 10 days, so only early untuned performance. Results: 3.9% CF coverage, semantic equivalence 19.1%. Click-through for SE only slightly lower, but best results for click-through and coverage when the methods are combined.

Adding named entities to the lexical background knowledge could increase coverage, allow lowering of the quality threshold. Next step: expand search using hypernym, hyponym, etc.

Saturday, May 19, 2007

Vegetarian dining in the SF Bay Area

I'm in the Bay Area this week, preparing to give a Jena tutorial at the Semantic Technologies 2007 Conference in San Jose next week. While I was visiting with colleagues at HP Labs, the most excellent Brett Bausk put together a Google map of vegetarian and vegan restaurants in the locality. What a cool way for people with the kind of knowledgeable insight that Brett has to share it with other folks. Ace!

Thank you Brett.

del.icio.us: vegetarian, dining, restaurant, san-francisco, bay-area.

Tuesday, April 24, 2007

Open Cygwin Shell Here

I always install cygwin on every Windows machine I use. Invaluable. I found myself wishing today for a Cygwin equivalent for the "command prompt here" Windows Explorer extension. And there is one. Kudos to David A. Mellis.

del.icio.us: cygwin

Saturday, March 17, 2007

Mozilla Thunderbird: check all IMap folders

I use Mozilla Thunderbird to access my corporate email when I'm outside our firewall, through a secure IMap gateway, and inside the firewall when I'm working on my Linux systems. It's a very nice email app. The one thing that always catches me out is that, by default, Thunderbird will only check subscribed IMap folders for new mail. I have a bunch of rules that run on the Exchange server that sort my mail into folders, so checking the folders for new messages is important to me. Every time I re-install Thunderbird, I have to re-remember how to change the default behaviour, so I'm writing it down here for future reference!

First, get the advanced config window up from the Tools | Options... menu (Windows) or Edit | Preferences... (Linux). On the Advanced options tab, click the Advanced Configuration button to bring up the about:config window. Put folder into the filter box.

Double-click mail.check_all_imap_folders_for_new to set it to true. Done.

This works on Thunderbird 1.5.x. I don't know what will change when 2.0 comes out.

del.icio.us: thunderbird, imap, configuration.

Saturday, February 10, 2007

Quantum Computer 'Orion'

I didn't realise that quantum computing was so close to being realised, but according to Dr Dobb's Journal it's going commercial now. If true, that's seriously amazing! Lots of applications in AI and reasoning are bounded by the intractability of the algorithms. D-Wave's Orion computer is claimed to solve NP-complete problems quickly (the article only says "in record time"). Clearly super-cooled quantum devices won't be fitted to desktop pc's any time soon, and anyway the current device takes time to be configured. Nevertheless, given the rate at which hardware progresses, software researchers need to start thinking now about what we're going to do as all that computational power - a paradigm changer for sure - comes on stream. Hmm.

Wednesday, February 07, 2007

CSS - Conditional comments

Like anyone who looks after web sites, I find CSS rendering-engine bugs a bane of my life. Tweaking the CSS code to look OK in all visitors' browsers is a right pain. The usual way to approach the problem is either to try to get a single stylesheet that looks adequate in all browsers, while possibly not optimal in any, or to use various CSS hacks such as partially formed comments. These exploit known bugs in, e.g., the IE 5.5 CSS parser to present different styles to different browsers. Thanks to the nice people at Virgin Radio, today I came across a rather cleaner solution: CSS conditional comments. Very useful. Thanks ant!

Tuesday, January 30, 2007

del.icio.us bookmarks extension for FireFox

I'm an occasional, rather than compulsive, user of del.icio.us bookmarks. Nonetheless, whenever I reinstall FireFox, the del.icio.us toolbar is one of the few extensions I install automatically. I did this last week after reinstalling FireFox on one of my Linux machines. The installer seemed to have been updated from last time I went through this operation, but hey, stuff changes. Clicked OK on the buttons yada yada. Got the new del.icio.us Firefox extension, and boy has it changed.

So what does it do? Well, it uploads all of your FF bookmarks to your del.icio.us account, then replaces the FF bookmarks menu and sidebar with its new, better version. All of your bookmarks are still in there, somewhere, but they appear as del.icio.us bookmarked entries. As a side-effect, if a given URL is already tagged by other users, you'll get those tags for free. So that's kind-of nice. What totally sucks is that the process throws away your existing category structure. Now, I'm the first to admit that I tend to the overly analytical. My Myers-Briggs is INTJ. My bookmarks are nested in folders up to four deep. The frequently used ones are near the top of the hierarchy, but, most of all, I know where they are. Once I'd del.icio.us-toolbar'ed my bookmarks, all that structure was gone. I was left with two flat lists: one of 980 bookmark URL's, and one of several hundred tags that apply to those bookmarks. There is a category structure you can build (del.icio.us calls them bundles), but it only applies to tags, not bookmarks. So there was no way to manually re-create my nested structure.

From a usability point of view, the toolbar is nicely implemented, but, for me at least, completely unusable. I imagine that for a user whose bookmarks are already in one thumping great flat list, the auto-tagging feature will add some discernible value. For me, though, it was a massive step backwards. It's a shame, because I believe in the value of metadata. The problem is that this approach really makes people work to find stuff they've added. I don't want to have to type in search terms to locate the bookmark for my local weather forecast.

A saving grace is that uninstalling the extension gives a dialogue that offers to restore your bookmarks back to the way they were. This worked seamlessly. Apparently, the "classic" version of the toolbar is still available, so I'm going to try to track that down now.

del.icio.us: broken.

Ant problem: no class def SerializerTrace

I've started on a new project at work, where existing work has been done by colleagues doing Java development on Windows. Since I want to stick to my Linux environment, I'm trying to use the existing build scripts on Linux. The ant script works fine under Windows, but when I try to run the Ant xslt task on my FC5 machine (or RHEL 4, same thing happens), I get:

build.xml:74: The following error occurred while executing this line:
java.lang.NoClassDefFoundError: org/apache/xml/serializer/SerializerTrace

Now my Ant is installed via JPackage, and has all of its dependencies up-to-date. Double check, and, yup, xalan is installed and should provide all of the dependencies for the xslt Ant task. Google for answers, and it seems that there has been a packaging change in xalan 2.7, thereby confusing Ant (ants do have very small brains). This wiki page from the SipX project gives the necessary clue and recipe. Ant needs to be told to look for xalan-j2-serializer.jar, which can be achieved by dropping a file into /etc/ant.d. I'm not entirely clear what /etc/ant.d is doing, but this works for me:

root@rowan-8 ~
# cd /etc/ant.d

root@rowan-8 /etc/ant.d
# echo xalan-j2-serializer.jar > xslt

It doesn't matter what the file is called, it just needs to have the jar name in it. Presumably at some point the dependency issue will get sorted out cleanly. In the meantime, that's another minor but irritating roadblock out of the way.

Wednesday, January 17, 2007

Editable houses?

Consider: edit the config file ... refactor your house. Cool. May not work if you live in a high-rise tower! I want one, especially if it can do plumbing too. Via Kurzweil.

Jena tip: get the OWL class of a Resource or Individual

This is a frequently asked question that I'm going to address here and feed to future seekers after truth by the power of Google. In an OWL ontology (or, equivalently, RDFS or DAML, but I'm just going to talk about OWL here for simplicity), you might find something like this:

<ex:Dog rdf:ID="deputy_dawg">
  <foaf:name>Deputy</foaf:name>
</ex:Dog>

We have an OWL class ex:Dog (not shown), and one instance of that class: an individual whose URI is the XML base with deputy_dawg appended. So, if the xml:base is http://example.com/crimefighters#, the URI will be http://example.com/crimefighters#deputy_dawg. Now, suppose using Jena we have a reference to that resource. The frequently-asked question is: "how do I get the OWL classes for the resource?". Note that it really is OWL classes plural. We'll come back to this point in a bit. First let's do some setup.

An RDF resource is some thing, identified by zero or more URI's, about which we can make statements in a binary predicate form (e.g. "thing has-name Deputy"). In Jena, RDF resources are represented by the Resource class. The main RDF API, com.hp.hpl.jena.rdf.model, provides classes and methods for manipulating resources and their properties. However, when working with OWL ontologies, there's a convenience API that extends the capabilities of the core RDF API. This is in com.hp.hpl.jena.ontology, and for convenience we'll call it the OntAPI. OntAPI has a Java class OntResource for representing general resources, and specialisations (Java sub-classes) of OntResource for resources that have special roles. So a resource denoting an OWL class has a convenience Java class OntClass, while a resource denoting an individual (such as our hapless canine lawman) is represented by the Java class Individual. So our question can be reformulated: get the OntClass resources denoting the OWL classes describing a given OntResource.

The primary method for doing this is listRDFTypes(). Why RDF types, not listOWLClasses()? The rationale is that the link from an individual to its OWL class, RDFS class, etc, is via the RDF property rdf:type. However, this has proved confusing to some users, so I may add listOWLClasses() as an alias in a future release of Jena. Why listRDFTypes rather than getRDFType? In fact, getRDFType does exist, but isn't as useful as we might expect, for reasons discussed below. So let's see an example:

String NS = "http://example.com/crimefighters#";
OntModel m = ...   // assume this is the Jena model we're using

Individual dd = m.getIndividual( NS + "deputy_dawg" );

for (Iterator i = dd.listRDFTypes(false); i.hasNext(); ) {
    Resource cls = (Resource) i.next();
    System.out.println( "deputy_dawg has rdf:type " + cls );
}

This produces the output:

deputy_dawg has rdf:type http://example.com/crimefighters#Dog
deputy_dawg has rdf:type http://www.w3.org/2002/07/owl#Thing

Why does owl:Thing appear there? Simply put, it's an entailment added by the reasoner. All OWL classes are subclasses of owl:Thing, so any resource that has rdf:type T also has rdf:type owl:Thing by subsumption. In general, listRDFTypes will list all of the type statements in the model, no matter whether they were asserted or inferred. Unless the model was configured without a reasoner, there will normally be more than one rdf:type per resource, sometimes quite a lot. This is the reason why listRDFTypes is preferred: getRDFType will non-deterministically pick one of the available rdf:type's if there is more than one, and the caller has no control over which will be picked.

It's often convenient to have only the most specific type of an individual. This is what the Boolean direct flag denotes: if set to true, only the direct (most immediate) type statements are returned. We'll illustrate this by modifying the example slightly, starting with the ontology itself:

<owl:Class rdf:ID="Ally" />
<owl:Class rdf:ID="Dog">
  <rdfs:subClassOf>
    <owl:Class rdf:ID="FaithfulFriend" />
  </rdfs:subClassOf>
</owl:Class>

<ex:Dog rdf:ID="deputy_dawg">
  <foaf:name>Deputy</foaf:name>
  <rdf:type rdf:resource="#Ally" />
</ex:Dog>

Now the code:

for (Iterator i = dd.listRDFTypes( false ); i.hasNext(); ) {
  Resource cls = (Resource) i.next();
  System.out.println( "deputy_dawg has non-direct rdf:type " + cls );
}
for (Iterator i = dd.listRDFTypes( true ); i.hasNext(); ) {
  Resource cls = (Resource) i.next();
  System.out.println( "deputy_dawg has direct rdf:type " + cls );
}

Which produces the following output:

deputy_dawg has non-direct rdf:type http://example.com/crimefighters#Ally
deputy_dawg has non-direct rdf:type http://example.com/crimefighters#Dog
deputy_dawg has non-direct rdf:type http://www.w3.org/2002/07/owl#Thing
deputy_dawg has non-direct rdf:type http://example.com/crimefighters#FaithfulFriend

deputy_dawg has direct rdf:type http://example.com/crimefighters#Ally
deputy_dawg has direct rdf:type http://example.com/crimefighters#Dog

because only Ally and Dog are immediate (i.e. non-subsumed) types.

OntResource defines a number of convenience methods for manipulating the OWL class of a resource. Besides listing and getting the type (listRDFTypes and getRDFType, as above), there are also methods for adding a new type (addRDFType), replacing all existing type assertions (note: assertions, not entailments) with a new type (setRDFType) and testing the type (hasRDFType). For full details, please see the Javadoc.
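
For example, here's a short sketch re-using m and NS from the examples above:

Individual dd = m.getIndividual( NS + "deputy_dawg" );

// assert an additional type for the individual
dd.addRDFType( m.getResource( NS + "Ally" ) );

// prints true, since Ally is now among deputy_dawg's types
System.out.println( dd.hasRDFType( m.getResource( NS + "Ally" ) ) );

// replace all asserted rdf:type statements with a single new one
dd.setRDFType( m.getResource( NS + "Dog" ) );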

del.icio.us: jena, tutorial, semanticweb.

Wednesday, January 10, 2007

WebS 2007

I'm on the programme committee for the 6th International Workshop on Web Semantics (WebS 2007), which has just made its call for papers. Submissions are due March 2nd 2007; the conference is in Sept 2007. Semantic web, agents and ontologies – what more could be wanted! Submit ye all.

Thursday, January 04, 2007

jga: Generic Algorithms for Java

A useful find, via Elliotte Rusty Harold: jga: Generic Algorithms for Java is a library for defining functor-style generic algorithms in Java. Examples here. I used SML on one project, years ago, and I've been enamoured of the functor style of coding ever since. Of course, it's not nearly so fun if you can't write lambda-expressions and partial applications, but jga does look like a useful tool that shouldn't have much, if anything, in the way of hidden costs.
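
To give a flavour of the functor style in Java, here's a sketch of my own devising - the names below are illustrative of the idiom only, not jga's actual API (see the jga docs for that):

// a unary functor: a function from A to R, reified as an object
interface UnaryFunctor<A, R>
{
  R fn( A arg );
}

public class FunctorDemo
{
  // compose two functors, yielding a functor that computes g(f(x))
  static <A, B, C> UnaryFunctor<A, C> compose(
      final UnaryFunctor<A, B> f, final UnaryFunctor<B, C> g )
  {
    return new UnaryFunctor<A, C>() {
      public C fn( A arg ) { return g.fn( f.fn( arg ) ); }
    };
  }

  public static void main( String[] args ) {
    UnaryFunctor<String, Integer> length = new UnaryFunctor<String, Integer>() {
      public Integer fn( String s ) { return s.length(); }
    };
    UnaryFunctor<Integer, Boolean> isEven = new UnaryFunctor<Integer, Boolean>() {
      public Boolean fn( Integer i ) { return i % 2 == 0; }
    };
    // prints true: "word" has an even number of characters
    System.out.println( compose( length, isEven ).fn( "word" ) );
  }
}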