Thursday, December 14, 2006

Peer review

Tuesday 12th was PhD graduation day, so that eight-year chapter of my life is finally closed. Got to shake hands with Lord Owen, Chancellor of the university and former Foreign Secretary – the first time I've had the chance to meet and greet with a Peer of the Realm. Nice chap. The Honorary Graduand was Professor Sir Alec Jeffreys FRS, inventor of genetic fingerprinting and saviour of TV cop show script-writers. He gave a very good speech, actually. Anyone who accidentally manufactures HCN gas with their home chemistry set at age 12 is alright in my book!

Thanks Bryn for taking the picture.

Friday, December 01, 2006

AAMAS '07 Industrial Track

I'm a reviewer on the programme committee for the industrial track of AAMAS'07. The Senior Programme Committee have just announced that they have extended the deadline for submissions to December 8th, so still time to get those papers in! The venue is, well, see for yourselves:

picture of Hawaii, venue for AAMAS 07

Unfortunately, I've been too busy on another big project (soon to be finished) to do much agent development this year. Resolution for 2007: fix!

Monday, November 20, 2006

Robotic introspection

One of the topics I got very interested in while I was working on my PhD, though I didn't have time to pursue it very far, was agent introspection. How does a computing device notice things about itself and, especially, detect deviations from a stable norm? I came across this problem when trying to devise non-brittle strategies for detecting and repairing failed interactions and dialogue turns, but it features quite prominently in any attempt to visualise how a truly autonomous agent would react. It's a topic I'd like to get back to working on, some day soon. Anyway, via Ray Kurzweil's newsletter: Lipson, Zykov and Bongard, of Cornell University and the University of Vermont, have created a robot that uses a vision system to build a concept of itself, including its normal range of motion, and then detects deviations from that norm and figures out compensating actions. For example, if one of the legs is shortened, the robot apparently learns, by itself, to limp. That is very, very impressive.

One of the marks, it is said, of a really good program is that when you look at what it does you can't say, immediately, "I know how they did that". Kudos to the team. I'll have to go read their paper now! AI, robotics, introspection,

Thursday, November 02, 2006

Jena tip: renaming RDF resources

We rather frequently get asked, on the Jena support list, something along the lines of "how do I rename an RDF resource?" or "how do I change the local-name of an OWL class?" or something similar. We get asked this often enough that we made a FAQ entry about it. That FAQ entry is fairly terse, because many users are happy to read the Javadoc, figure out how to apply renameResource in their application, and off they go. For inexperienced Jena users, especially those without much Java experience either, a more detailed explanation is probably in order, however.

For a variety of good design reasons, the Java class representing RDF resources in Jena, com.hp.hpl.jena.rdf.model.Resource, is immutable - once created, it can never be changed. This means that there is not, and never will be, a method on Resource called something like rename(). This also means that all the classes that extend Resource, such as OntClass, also can't be renamed in place. Suppose you are developing a semantic web editor. It's a perfectly natural requirement to change the name of a class as your understanding of the ontology grows. So, what to do?

It's important to note that the Resource object in a Jena model doesn't contain any state information. All the RDF object state is held in the RDF Model (and, internally within the Model, in a Graph object, but that's another topic). Resource, therefore, isn't really an information container but an accessor for information that is contained in the model. Moreover, Model itself doesn't contain Resources per se: Model only contains RDF subject-predicate-object triples. A resource is in a model only to the extent that it appears in one of the triples in that model. This, then, gives us a way to achieve the effect of renaming a resource: to rename a resource from A0 to A1, simply replace every triple that contains A0 with one that contains A1. A quick example should make this clearer. Suppose we have the following information in the model (I'm using N3/Turtle syntax for compactness; it would be exactly the same with RDF/XML):

@prefix owl: <>.
@prefix eg: <>.

      rdf:type owl:Class
      rdf:type owl:Class
    ; rdfs:subClassOf eg:Device
      rdf:type owl:Class
    ; rdfs:subClassOf eg:MobilePhone

This model fragment contains five triples. To rename eg:MobilePhone to eg:CellPhone, we need to replace three of those triples, to produce the following model:

      rdf:type owl:Class
      rdf:type owl:Class
    ; rdfs:subClassOf eg:Device
      rdf:type owl:Class
    ; rdfs:subClassOf eg:CellPhone
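The replace-every-triple idea can be sketched without Jena at all. The following is a minimal illustration only, not how Jena implements it: triples are modelled as plain lists of three strings, RenameSketch and its rename method are hypothetical names, and eg:CameraPhone is a made-up subclass used purely to give the example something to rewrite. The sketch only looks at the subject and object positions.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a triple is a list of [subject, predicate, object] strings.
// This is a stand-in for illustration, not Jena's Statement or Model API.
class RenameSketch {

    // Return a new set of triples in which every occurrence of oldName,
    // in the subject or object position, has been replaced by newName.
    static Set<List<String>> rename(Set<List<String>> triples, String oldName, String newName) {
        Set<List<String>> result = new LinkedHashSet<>();
        for (List<String> t : triples) {
            String s = t.get(0).equals(oldName) ? newName : t.get(0);
            String o = t.get(2).equals(oldName) ? newName : t.get(2);
            result.add(Arrays.asList(s, t.get(1), o));
        }
        return result;
    }

    public static void main(String[] args) {
        Set<List<String>> model = new LinkedHashSet<>();
        model.add(Arrays.asList("eg:Device", "rdf:type", "owl:Class"));
        model.add(Arrays.asList("eg:MobilePhone", "rdf:type", "owl:Class"));
        model.add(Arrays.asList("eg:MobilePhone", "rdfs:subClassOf", "eg:Device"));
        // eg:CameraPhone is an assumed name, used only for this illustration
        model.add(Arrays.asList("eg:CameraPhone", "rdf:type", "owl:Class"));
        model.add(Arrays.asList("eg:CameraPhone", "rdfs:subClassOf", "eg:MobilePhone"));

        for (List<String> t : rename(model, "eg:MobilePhone", "eg:CellPhone")) {
            System.out.println(t);
        }
    }
}
```

Triples that don't mention the old name pass through untouched, which is why only some of the triples in the model change.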

This can be done manually by the calling Jena program, and that's fine. However, we do provide a helper method to assist with this, in ResourceUtils. The following code fragment shows one way of achieving the above example. Assume that m is a Java variable containing an already-loaded Jena model which, in turn, contains the resource we want to rename:

Model m = .... ; // source model, already initialised

String NS = "";
Resource cls = m.getResource( NS + "MobilePhone" );

// this is the step where eg:MobilePhone becomes eg:CellPhone
ResourceUtils.renameResource( cls, NS + "CellPhone" );

// it's important that we forget about, or re-assign,
// cls, since as yet it's not pointing to the 
// revised definition
cls = m.getResource( NS + "CellPhone" );

// now we can, for example, look at the updated definition
for (StmtIterator i = cls.listProperties(); i.hasNext();) {
    System.out.println( " m contains triple " + i.nextStatement() );
}

And that's all there is to it. jena, semanticweb, java,

Tuesday, October 10, 2006

Eclipse 3.2.1 busy-wait problem

Once upon a time I used to do nearly all of my development work in xemacs. Now, I do nearly all of my development work in Eclipse. So, it's a real pain when Eclipse starts playing up. I have to say that Eclipse is a great product, and it doesn't fail very often. However, since the latest batch of auto-updates (to version 3.2.1), I've had four crashes where Eclipse has gone into a 100%-cpu "busy-wait" state, and I've had to kill the process to get it back. This is on two different computers. I have a number of Eclipse plugins (who doesn't?), and ijuma on #eclipse in IRC suggested that I try increasing the MaxPermSize setting for my JVM, since it can, apparently, cause problems if the VM runs out of a certain kind of heap space. So, my eclipse.ini now looks like:


Since it's an intermittent failure, I can't say for sure yet whether this tweak has solved my problem, but ijuma seemed to think that increasing the MaxPermSize was a good idea anyway. I'll post again if I have some further data.

Monday, September 25, 2006

Embedding one JavaCC grammar in another

As part of planning the next version of Nuin, I've been looking at embedding one JavaCC grammar in another, in this case SPARQL into my agent script parser. I've written up the results in a short article. java, javacc, software-development, sparql.

Thursday, September 14, 2006

Fedora Core 5 install woes

I've done a few installations of FC5 now, including one laptop. Most have gone smoothly. The most recent machine, though, had an NVidia graphics card (FX5200 to be precise). Woe. The installer hangs when trying to format the disk partitions. Googling, it turns out to be a fairly common problem, but none of the suggested remedies (using text mode, etc) worked for me. In the end, I installed FC4 on the machine, since the FC4 version of Anaconda is less smart about detecting and optimising for the graphics subsystem. FC4 installed fine, so then I tried to use the FC5 disks to upgrade the installation. More woe. This time, Anaconda itself crashed with a python script error, also a problem that other people have seen. In the end, doing the upgrade over the internet using yum went very smoothly, following these instructions. The end-to-end process, though, was frustrating and slow.

Fedora is a good distro: I tried, and abandoned, Ubuntu in favour of Fedora Core. But their report card should be marked "must try harder" on the QC for the install process.

Wednesday, September 06, 2006

Joel on recruitment

The perennially entertaining and makes-you-think (in a good way) Joel has a new service on his site: a rather selective jobs listing, with, most usefully, an RSS feed. See Joel's explanation of why the service is selective and how to get the great developers. Personally, I'm not actually looking for a job at the moment. But I have got Joel's new feed in my RSS reader. Why? Because it's actually quite an interesting way to get a view on the zeitgeist in the software industry. Perhaps I'll get bored of it after a while (especially if the number of postings grows too large), but for now: cool!

Tuesday, August 22, 2006

Remote access to a PostgreSQL server

A quick note to myself in case I have to do this again, or in case it helps someone else get through the same task more quickly. Gotchas to watch out for when enabling access to a PostgreSQL server from another machine:

  • The line in postgresql.conf that says listen_addresses = 'localhost' needs to be changed to the actual hostname of the server, because localhost only enables listening on the loopback interface (i.e. the server can accept connections from clients running locally, but not from any other machine)
  • To allow connection from any host on the same network, add a line to pg_hba.conf saying: host all all 16.0.0.0/8 password
    This says: allow access from any (remote) host to any database (first all) by any user (second all) if identified by password, provided the IP address of the remote host begins 16. - counting the first eight (from the /8) binary digits as significant. The non-matched digits have to be zero, hence the 0.0.0 at the end.
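To make the /8 business concrete, here is a small sketch of how that kind of prefix match works for IPv4 addresses. It is an illustration of the arithmetic only, not PostgreSQL's own code; CidrSketch, toInt and matches are hypothetical names, and the sample addresses in main are made up.

```java
// Hypothetical sketch of IPv4 prefix matching, as used by rules like 16.0.0.0/8.
class CidrSketch {

    // Pack a dotted-quad IPv4 address such as "16.1.2.3" into a 32-bit int.
    static int toInt(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + dotted);
        }
        int value = 0;
        for (String p : parts) {
            value = (value << 8) | (Integer.parseInt(p) & 0xFF);
        }
        return value;
    }

    // True if address agrees with network on the first prefixLen binary digits.
    static boolean matches(String address, String network, int prefixLen) {
        // e.g. prefixLen 8 gives the mask 255.0.0.0
        int mask = prefixLen == 0 ? 0 : -1 << (32 - prefixLen);
        return (toInt(address) & mask) == (toInt(network) & mask);
    }

    public static void main(String[] args) {
        // sample addresses are assumptions, chosen only to show both outcomes
        System.out.println(matches("16.25.3.7", "16.0.0.0", 8));
        System.out.println(matches("17.25.3.7", "16.0.0.0", 8));
    }
}
```

With a prefix length of 8 only the first octet takes part in the comparison, which is why any address beginning 16. is accepted.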

Friday, August 18, 2006

Oracle error messages: could try harder

I've not been able to find a way of loading a complete Oracle PL/SQL script via a JDBC connection (i.e. analogous to using @ in sqlplus). So, least worst thing to do (it's a small script) is to load the script into a string, and execute that string with Statement.executeUpdate. One. Statement. At. A. Time. Sigh.

Nevertheless, things are going OK until I take a line of my script which works when invoked by @, but from JDBC I see:

java.sql.SQLException: ORA-00911: invalid character
 at oracle.jdbc.driver.DatabaseError.throwSqlException(
 at oracle.jdbc.driver.T4CTTIoer.processError(
 at oracle.jdbc.driver.T4CTTIoer.processError(
 at oracle.jdbc.driver.T4C8Oall.receive(

"ORA-00911 Invalid character." That's all you get. Which character for pity's sake? And whereabouts in the string? Surely that's not asking too much?

The answer, by the way, is that the trailing semi-colon that statements require in sqlplus is an error when executing statements via JDBC.
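For the record, the one-statement-at-a-time workaround might look something like the sketch below. It is deliberately naive: it splits on every semicolon, so it will mangle PL/SQL blocks and any statement with a semicolon inside a string literal, and is only suitable for a small script of simple statements. ScriptRunnerSketch and its method names are hypothetical; the JDBC calls are the standard java.sql ones.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: run a simple SQL script over JDBC, one statement at a time.
class ScriptRunnerSketch {

    // Naively split the script on ';', trimming whitespace and dropping empty
    // fragments. The returned statements carry no trailing semicolon.
    static List<String> splitStatements(String script) {
        List<String> result = new ArrayList<>();
        for (String part : script.split(";")) {
            String sql = part.trim();
            if (!sql.isEmpty()) {
                result.add(sql);
            }
        }
        return result;
    }

    // Execute each statement in turn on the given connection.
    static void runScript(Connection conn, String script) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            for (String sql : splitStatements(script)) {
                stmt.executeUpdate(sql);
            }
        }
    }
}
```

Because the split consumes the semicolons, each string handed to executeUpdate is already free of the trailing character that provokes ORA-00911.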

Monday, July 31, 2006

IJCAI-07 - Workshop Programme

I hadn't realised that IJCAI was coming around again so soon. IJCAI-07 is being held in January in Hyderabad, I guess because it will be too hot there any other time of year! The main conference CFP has already closed, but the workshop programme has been announced, with some interesting topics (though my buzzword filter almost exploded when I read of agent technologies for web 2.0). Submission dates for all of these workshops appear to be September 25th, so now's the time to get writing.

Friday, June 30, 2006

Still fightin'

Most of my working life at the moment is taken up with semantic web research (well, development rather than research just at the moment). Even so, once in a while I get queries about my agent research, as happened today. By pure random coincidence, I also today came across a link to the quintessentially-timewasting-but-nonetheless-funny GoogleFight. So, the obvious question is: "Agent technology" vs. "semantic web" smackdown? The agents still have it, just:

Monday, May 22, 2006

No falls, but one submission

thesis.doc document statistics

Tonight I've finally ... finally ... finished the version of my PhD thesis that will go to be examined. I am about to submit! After, ahem, more years than I care to admit. It seems unreal. I always imagined that when I got to this stage I'd feel, well if not elated at least a certain sense of relief. Sadly, right now it just feels like ticking off one checkbox from a never-ending todo list. Maybe I'll feel better when the defence is out of the way ... assuming I pass of course! Maybe I should just go and get drunk.

Thursday, April 13, 2006

Busy, busy, busy

Lots going on at the moment, hence little time to post blog entries! Ah well. The Jena User Conference programme is now up. Should be a good couple of days. We're going to try a couple of self organising sessions to run alongside parts of the main programme. It's an experiment, and I'm looking forward to seeing the results. Apart from organising the JUC, I'm also very busy on a big project at the moment. This entails, among other things, learning much more about J2EE and web services than I ever have before. One brain hardly seems enough to hold all the complex details of the platform, let alone start solving the user problem! However, one particularly positive aspect of the project is that we're working with team members outside the usual suspects at HPLabs, which has been very stimulating.

Monday, March 27, 2006

I have a question ...

I have fond memories of growing up watching the TV show Spitting Image. It was a wonderful satire show that, at its best, was poisonously apt and brilliantly, side-splittingly funny. I'm reminded of this show whenever I'm in a situation where I feel I ought to ask a question, and don't know quite what to say. In 1986, Michael Heseltine resigned from Margaret Thatcher's government over the Westland Affair, essentially an arms-deal-gone-wrong. Heseltine and Thatcher disagreed, rather publicly, and Heseltine resigned. I vividly remember scenes from the Spitting Image treatment of this event. Heseltine is shown at a news conference in front of a crowd of journalists (all, incidentally, pigs in trenchcoats and trilby hats), as he explains that his decision was quite unpremeditated and spontaneous ... as explained in his new book, which he then pulls out from under the desk. Thatcher is shown, in parliament, I think, saying "I have here Mr Heseltine's resignation letter. It reads: 'Dear Michael you're fired, be out of the building in ten minutes'." And finally, I remember the chaotic Q&A session with the journalists, one of whom asks "Mr Heseltine! Mr Heseltine! I don't have a question, but I want my editor to see me so I can claim expenses."

Saturday, March 18, 2006

Which sci-fi crew?

From Richard's blog, I discovered the which sci-fi crew would you fit in with? quiz. My results:

Babylon 5 (Babylon 5)
Nebuchadnezzar (The Matrix)
Deep Space Nine (Star Trek)
Andromeda Ascendant (Andromeda)
Moya (Farscape)
Enterprise D (Star Trek)
Serenity (Firefly)
Millennium Falcon (Star Wars)
FBI's X-Files Division (The X-Files)
Galactica (Battlestar: Galactica)
SG-1 (Stargate)
Bebop (Cowboy Bebop)


Very different results to Richard's. I know Richard, and I wouldn't have said we were that different - I wonder what the key factors were? Fun stuff anyway. Makes me think I should some day get around to watching Bab5 all the way through - I've seen about three episodes in my life. It must be available on DVD by now.

Wednesday, March 08, 2006

Semantic Technology Conference - 2

Ben Adida's talk mentioned, a meta-search tool for tagged items. Looks interesting, this is a note to myself to look at it some more.

Tuesday, March 07, 2006

Semantic Technology Conference - 1

I'm in San Jose at the Semantic Technology Conference. Jim Hendler and Ora Lassila's keynote address this morning: reviewing five years of progress on the semantic web. Not much to disagree with. One of Ora's closing slides mentioned lack of progress in agents (see picture). It's a little hard to read, but the last line is "little progress on agents". The commentary from Jim and Ora was fairly muted about agents, even though they claimed (particularly Ora) that having agents act on the user's behalf was one of their key motivations originally. My view: a lot of the rhetoric about where we really want to go with the semantic web depends on a representation of intention, and agents are exactly the expression of intention in computational form.

Thursday, February 23, 2006

Installing ProCite 5 with MS Word 2003

I recently got myself a very nice Compaq nw8240 laptop from work. Of course, that means the tedious process of reinstalling everything. One of the tools I use a lot with MS Word documents is ProCite 5. I still prefer ProCite over the other citation tools I've tried, even though it hasn't been updated for ages. Having reinstalled the original ProCite CD on my new machine, I then applied the Office XP/WP 10 Patch. Even that, however, wasn't enough to get cite while you write (cwyw) working. However, it's an easy step from there: just copy pc5wd32.wll and from the cwyw sub-directory of the ProCite install dir (defaults to c:\program files\ProCite5) to %APPDATA%\Microsoft\Word\STARTUP. APPDATA is a user-adjustable location, but the default is c:\documents and settings\<yourlogin>\Application Data. Sorted. procite, ms-word

Friday, February 17, 2006

Jena tip: optimising database load times

Loading lots of data into a persistent Jena model can often take quite a bit of time. There are, however, some tips for speeding things up.

Let's get the baseline established first. Assume that our data source is encoded in RDF/XML, and the load routine is loadData. I generally use a couple of helper methods to make things a bit smoother in my database code. In particular, I use a short name or alias for each database I'm working with, and store the connection URI, model name, user name etc in a table (usually in code, but it could be loaded from a file). I'm not going to dwell on this pattern in this blog entry, since it's not the point of the article. Suffice to say that getDBUrl returns the connection URL for the database (i.e. the JDBC URL) and so on for the other methods.

Given that, the primary method here is loadData, which opens the named model from the database, then reads in the contents of a file or URI. source is the file name or URL pointing to the input document:

protected void loadData( String dbAlias, String source ) {
    ModelMaker maker =  getRDBModelMaker( dbAlias );
    ModelRDB model = (ModelRDB) maker.openModel( getDBModelName( dbAlias ) );
    FileManager.get().readModel( model, source );
}

private ModelMaker getRDBModelMaker( String dbAlias ) {
    return ModelFactory.createModelRDBMaker( getConnection( dbAlias ) );
}

private IDBConnection getConnection( String dbAlias ) {
    try {
        // make sure the JDBC driver class is loaded
        Class.forName( DBDRIVER_CLASS );
    }
    catch (ClassNotFoundException e) {
        throw new RuntimeException( "Failed to load DB driver " + DBDRIVER_CLASS, e );
    }
    return new DBConnection( getDBUrl( dbAlias ),
                             getDBUserName( dbAlias ),
                             getDBPassword( dbAlias ),
                             DB );
}

This works, but given any significant amount of data to read in it will usually be very slow. The first tweak is always to do the work inside a transaction. This won't hurt if the underlying DB engine doesn't handle transactions, but will greatly help if it does:

protected void loadData( String dbAlias, String source ) {
    ModelMaker maker =  getRDBModelMaker( dbAlias );
    ModelRDB model = (ModelRDB) maker.openModel( getDBModelName( dbAlias ) );
    // do the whole load inside a single transaction
    model.begin();
    FileManager.get().readModel( model, source );
    model.commit();
}

In practice, there should be a try/catch block there to roll back the transaction if an exception occurs, but I'm leaving out clutter for educational purposes!
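For anyone who does want the clutter, the shape of that try/catch is roughly as follows. This is a sketch under assumptions rather than code from the project: Transactional is a hypothetical stand-in interface declaring just the begin, commit and abort methods that a Jena Model provides, so that the example compiles on its own, and TransactionSketch and inTransaction are made-up names.

```java
// Hypothetical sketch of the begin / commit / abort-on-failure pattern.
class TransactionSketch {

    // Stand-in for the transaction methods used here. A Jena Model has
    // begin(), commit() and abort(); this interface is only for illustration.
    interface Transactional {
        void begin();
        void commit();
        void abort();
    }

    // Run the given work inside a transaction. If the work (or the commit)
    // throws, abort the transaction and rethrow the exception to the caller.
    static void inTransaction(Transactional model, Runnable work) {
        model.begin();
        try {
            work.run();
            model.commit();
        } catch (RuntimeException e) {
            model.abort();
            throw e;
        }
    }
}
```

In loadData, the work passed in would be the FileManager.get().readModel( model, source ) call, so a parse error half way through the file leaves the database as it was before the load started (given a DB engine that supports transactions).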

This probably still isn't fast enough though. One reason is that, to fulfill the Jena model contract, the database driver checks that there are no duplicate triples as the data is read in. This requires testing for the existence of the statement prior to inserting it in the triple table. Clearly this is going to be a lot of work for large sets of triples. It's possible to turn off duplicate checking:

protected void loadData( String dbAlias, String source ) {
    ModelMaker maker =  getRDBModelMaker( dbAlias );
    ModelRDB model = (ModelRDB) maker.openModel( getDBModelName( dbAlias ) );
    // the caller now takes responsibility for avoiding duplicate triples
    model.setDoDuplicateCheck( false );
    model.begin();
    FileManager.get().readModel( model, source );
    model.commit();
}

The problem with this is that it moves the responsibility for ensuring that there are no duplicates from the db driver to the calling code. Now, it may well be that this is known from the context: the data may be generated in a way that ensures that it's free of duplicates. In which case, no problem. But what if that's not certain? One solution is to scrub the data externally, using commonly available tools on Unix (or Cygwin on Windows).

First we migrate the data to the n-triple format. N-triple is a horrible format for the human reader to read, but ideal for machine processing: every triple is on one line, and there is no structure to the file. This means, for example, that cat can be used to join multiple documents together, something that can't be done with the RDF/XML or N3 formats. Jena provides a command line utility for converting between formats: rdfcat. Let's take a simple example. Here's a mini OWL file:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf=""
  <owl:Ontology rdf:about="">
    <owl:imports rdf:resource="" />
  <owl:Class rdf:ID="AClass" />

Which we then convert to n-triples:

[data]$ java jena.rdfcat -out ntriple a.owl
<> <> <> .
<> <> <> .
<> <> <> .

Assume that we have a collection of n-triple files (a.nt, b.nt, etc) and we want to remove all of the duplicate triples. Using common Unix utilities, this can be done as:

cat a.nt b.nt c.nt | sort -k 1 | uniq > nodups.nt

The sort utility sorts the input lines into lexical order, while -k 1 tells it to use the entire line not just the first field (sort splits lines into fields, using whitespace as a separator). uniq condenses adjacent duplicate lines into one, which is where the duplicate triple removal happens.
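If the Unix tools aren't to hand, the same sort-then-drop-duplicates step is easy enough to sketch in Java. This is only an illustration of the idea, with hypothetical names (DedupSketch, sortedUnique), and it assumes the combined files fit comfortably in memory. It relies on the same property of n-triples that makes the shell pipeline work: one complete triple per line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: merge n-triple files and drop duplicate lines.
class DedupSketch {

    // A TreeSet keeps its elements sorted and unique, which covers both
    // the sort step and the uniq step in one go.
    static List<String> sortedUnique(List<String> lines) {
        return new ArrayList<>(new TreeSet<>(lines));
    }

    // Usage: java DedupSketch a.nt b.nt c.nt nodups.nt
    // (the last argument is the output file)
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: DedupSketch input.nt... output.nt");
            return;
        }
        List<String> all = new ArrayList<>();
        for (int i = 0; i < args.length - 1; i++) {
            all.addAll(Files.readAllLines(Paths.get(args[i]), StandardCharsets.UTF_8));
        }
        Files.write(Paths.get(args[args.length - 1]), sortedUnique(all), StandardCharsets.UTF_8);
    }
}
```

The sort order may differ from what the sort utility produces under a given locale, but that doesn't matter for the purpose here: duplicate removal only needs identical lines to collapse into one.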

Finally, what do we need to change in the original program to load n-triples instead of RDF/XML or OWL files? Happily, nothing! The Jena FileManager uses the extension of a file to guess the content encoding. *.nt triggers the n-triple parser, so since we used that convention in naming the file we're done.

On a recent project, loading a million-triple model into a MySQL 4 database took me just about 10 minutes using these tips, while before optimisation it was taking hours.


Friday, January 20, 2006

From the ministry of confusing error messages

When JBoss says this:

Caused by: java.lang.IllegalArgumentException: URL cannot be null
at org.jboss.webservice.metadata.jaxrpcmapping.JavaWsdlMappingFactory.parse(

it may mean (well, it did in my case) that you've accidentally mis-typed the name of jaxrpc-mapping.xml. I guess the error arises whenever files referenced by webservices.xml don't exactly match the actual file names. A reasonable problem, but rather an opaque error message.

Tutorials good, understanding better

I'm working with web-services and with J2EE at present, two technologies that have a bewildering collection of components and configuration options (and configuration descriptors) by themselves. The Cartesian product of this complexity is, well, you can imagine. All is not lost though: various good souls have put together on-line tutorials (example) that show you, step-by-step, how to deploy a particular flavour of web service (say doc/lit) on a particular flavour of J2EE container (say, JBoss). Well and good, and a collective "thank-you" to all of those people. The problem I find, however, is that the tutorials get you a result (you can deploy the sample order-processing service and invoke it), but they don't help you understand the underlying technology. "Do this in webservices.xml" is all very well if you want to do the same thing in your actual code, but if you need something a bit different there's not enough depth in a walk-this-way tutorial. Actually, a reference to the specs would probably suffice in most cases. So, case in point (should I ever need it again!), webservices.xml is actually defined in JSR-000109.

Friday, January 06, 2006

JBoss Eclipse IDE - XDoclet not working

The time has come for me to get serious about learning J2EE. I've started on a new project, and the code is going to run in JBoss. So naturally, I've spent some time looking at options for tooling up Eclipse to cope with the demands of JBoss development. First port of call was the Eclipse Web Tools Platform (WTP), which has just gone to release 1.0. There's no update URL yet (being fixed), so you have to download a mega-archive with Eclipse 3.1.1 and all the plugins. Then you have to spend time getting that to look like your old version of Eclipse, though exporting and re-importing preferences helps. You still have to re-install your other plugins though (e.g. the very handy AnyEdit). Sigh. However, having installed it, there's a dearth of decent documentation. I tried working through the tutorial, but it was written for version 0.7 and most of the tools have changed. Still, I got to the point where I had a deployable JSP, published it to the running server instance and .... nothing. Deploying a WAR directly to JBoss with Ant works fine, but publishing from WTP fails silently. Since I don't have a clear conceptual model of what WTP is supposed to do - and the documentation doesn't help - I can't diagnose what went wrong. Spent some time fiddling, posted to their forum, then gave up around 01:30. If I ever find out what should have happened, I'll post a follow-up.

In the meantime, I've switched to JBoss's own JBoss Eclipse IDE. This is available as an Eclipse update URL. So, did that, first getting a clean new install of Eclipse 3.1.1. Fired that up, went to look at the HTML editor, and I get a bunch of spurious warnings (e.g. unrecognised HTML entities on properly escaped URL strings). This looks like cross-talk with previously installed HTML editors, so I removed my ~/workspace directory (where a lot of the local Eclipse settings are stored). Try again, this time all is working well. Spend more time getting Eclipse working how I like it (maybe I should educate myself to like the defaults, but currently I don't!). Fine. Now work through the tutorial from the JBoss web site. All seems well, except that when I run the XDoclet task, nothing visible happens. Checking the error log, there's a message:

Error logged from Ant UI: Accept timed out

Like Bug 102463, I fixed this by resetting ANT HOME in the preferences dialog, which was pointing (bizarrely) to /usr/share/eclipse. I reset it to point to the org.apache.ant_1.6.5 subdirectory in the plugins directory of my local copy of Eclipse. After restarting Eclipse, XDoclet ran correctly.

Why is this stuff so painful?