Sunday, December 11, 2005

Comparison of upper ontologies

Interesting paper from the MITRE Corporation, via the Ontolog forum, comparing approaches to upper ontologies. The authors' focus is nominally on US Government applications, but I think the conclusions and analysis would apply to any large ontology-based information system. Particularly useful is the high-level summary of different modelling choices in building an upper ontology, though some of it assumes familiarity with the terminology (e.g. 3D vs 4D ontologies are mentioned but not explained). Paper abstract and pdf version.

Tuesday, November 15, 2005

JPackage - solving missing jakarta-commons-transaction

I use the excellent JPackage to get RPM's of the Java libraries I use whenever possible. Together with yum, it's a very easy way of making sure I have all the appropriate dependencies in place, and tracking updates as they occur. I'm just about to start on a new project that requires us to use JBoss, so I thought that I'd get it from JPackage and make sure it stays up-to-date. Simple? Alas, no. Installing jboss4-system, I got:

--> Running transaction check
--> Processing Dependency: jakarta-commons-transaction for package: jakarta-slide-server
--> Processing Dependency: jakarta-commons-transaction for package: jakarta-slide-webdavclient
--> Finished Dependency Resolution
Error: Missing Dependency: jakarta-commons-transaction is needed by package jakarta-slide-server
Error: Missing Dependency: jakarta-commons-transaction is needed by package jakarta-slide-webdavclient

And indeed jakarta-commons-transaction is not available for download from JPackage. Reading the mailing list archives, it seems it's in development still (so I'm not sure why it appears as a dependency in released packages - oh well). Fortunately, development snapshots of the JPackage RPM's are available. Having manually downloaded and installed the errant transactions RPM, all is now well. Which just leaves me with JBoss itself to learn. Sigh!


Monday, November 14, 2005

Fixing non-UTF8 characters in file names

I have ripped some of the CD's I own to mp3 files, which I play on both Linux and Windows. It happens that many of the artists I like (such as Björk) have non-ASCII characters in their names and hence in the corresponding filenames. On Windows, these tend to be encoded in the platform default ISO-8859-1 encoding (at least on a UK-locale Windows install). When these files are copied across to my FC3 system, Nautilus complains of illegal UTF8 characters, and some programs — such as the otherwise excellent xmms — choke. I wouldn't be posting this unless I'd found a solution, and I have in the form of convmv, "convert filenames to utf8 or any other charset". And it does too. Recommended.
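For anyone wanting the incantation, something along these lines should do the trick (the path is hypothetical; by default convmv only previews, and the --notest flag makes it actually rename):

convmv -f iso-8859-1 -t utf8 -r --notest /path/to/music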


Monday, November 07, 2005

Virtual presence at ISWC

I couldn't attend ISWC this year (though there are quite a few people from our group going). Among the many interesting things happening at the conference is the Semantic Desktop Workshop. Happily, it's being logged in real-time on IRC (channel #swig at freenode), logs here: Semantic Web Interest Group IRC Chat Logs for 2005-11-06.

Update: there's also a real-time log of the Semantic Web Interaction Workshop on #swiw on Freenode. Unclear yet if or when the log will appear on the web. Also, I moved this posting to my personal blog where it should have been, from the Jena announce blog where it shouldn't. D'oh!

Sunday, October 23, 2005

On design

I always enjoy it when I have the chance to work with designers. Just now we're working on a small-but-fun collaboration to design a new logo and poster for the forthcoming Jena user-conference. What I like is the way that designers approach problems. Engineers, even ones with user-centred principles in mind, tend to approach product design from a starting-point of "what should the product do?" Designers, in my experience anyway, start somewhere around "what should it feel like to use this product?" There's nothing right or wrong about either starting point, but it's fantastically refreshing to view your world from someone else's perspective. Either way round, good design comes down, in my view, to elegance, and there's always more to learn about achieving that elusive quality. The more inspiration the better.

I've been meaning to write an entry along these lines for some time. I'm reminded to do so now by a little bit of synchronicity. I've recently added to my blog aggregator core77.com's design blog, and John Maeda's simplicity blog. John has a collection of rules of simplicity (e.g. the first law) which are quite thought-provoking - even if I don't entirely agree with all of them. And today, via core77, comes Intelligent Design, a delightful parody by Paul Rudnick. It brightened my Sunday morning anyway!

Thursday, October 13, 2005

New camera

So, on holiday this year (back in early August), I dropped my trusty old Nikon Coolpix 990 on a big rock ... that rock being Mt. Snowdon. Ever since then it makes a funny grinding noise whenever it should be focussing. I liked the Coolpix, but it suffered from (a) being large and heavy, and (b) eating batteries like bingo. So, rather than pay for a repair, I decided to invest in a new camera. Choice: an HP R817. I've only had it a few days, but first impressions are really nice. It's tiny compared to the Nikon, takes very nice pictures and in particular handles variable light much better than my old camera. The R817 has some interesting looking modes to play with, but I've not really tried them out much yet. With a USB cable and the sharing mode set to 'Disk Drive' (on the setup menu) it appeared instantly, with no messing around, on my Linux system. Ergonomically it's much easier than the Coolpix too, though I dare say Nikon have improved their product since the 990 came out.

Cool new gadget bliss.

Friday, September 30, 2005

Leigh on Using Jena in an Application Server

A very encouraging sign of a maturing technology is when the community grows to include experts outside of the dev team. There's a growing number of people outside our team who answer questions on the Jena list, and increasingly who write great articles on aspects of using Jena. In this case, Leigh Dodds describing using Jena in an application server such as a JSP container. Great stuff!

Sunday, September 25, 2005

JavaScript programming - resource notes

I've been working on a small project using JavaScript, and thought it would be worth recording some of the useful resources I've found:

  • The Venkman JavaScript debugger is a plugin for Firefox that allows you to single-step through scripts, set breakpoints, inspect variables, etc. Slightly quirky at times, but works pretty well. Much better than using lots of alert() calls as print statements.
  • The web tools platform (WTP) project is a set of plugins for Eclipse that extend Eclipse to handle web content - (X)HTML, JavaScript, CSS, etc. The JavaScript editor includes syntax colouring, structured browsing, and name completion - but not validation as far as I can discover. The WTP plugins can be installed using the Eclipse automatic installer, which worked pretty well. Incidentally, the WTP HTML editor wasn't very impressive, so I use the Amateras HTML editor instead. However, the WTP does include a handy CSS editor, which again handles syntax colouring and completion.
  • David Flanagan's JavaScript: the definitive guide has been an invaluable companion. Highly recommended.

One of the things I've been using JavaScript for is client-side RDF handling. For RDF parsing and basic query, I've been using Jim Ley's JavaScript RDF parser. It's pretty basic, but functional. Found one bug with namespace handling, which I've emailed to Jim. The triples are not indexed in any way, so any moderate querying of the RDF model will get pretty inefficient pretty quickly. However, a handy trick is to author compact RDF, then use the Jena tools, if necessary with custom rules, to generate the deductive closure of the base model, then save that as a file. This makes explicit many of the relations that otherwise would require query to traverse. Some caution needs to be used, since making the source model too big would also be counter-productive for efficiency. Incidentally, a useful side-effect of doing this is that saving the model in RDF/XML format, not RDF/XML-ABBREV, means that rdf:parseType="Collection" is re-written in first/rest/nil format ... which is handy since Jim's parser doesn't handle parseType Collection.
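As a sketch of that closure trick in Jena (file names are hypothetical, and I'm using the simple RDFS reasoner where a GenericRuleReasoner with custom rules would slot in the same way):

import java.io.FileOutputStream;

import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.reasoner.Reasoner;
import com.hp.hpl.jena.reasoner.ReasonerRegistry;

public class Closure {
    public static void main( String[] args ) throws Exception {
        // load the compact, hand-authored source model
        Model base = ModelFactory.createDefaultModel();
        base.read( "file:data.rdf" );

        // compute the deductive closure over the base model; a rule
        // reasoner with custom rules would work the same way
        Reasoner reasoner = ReasonerRegistry.getRDFSSimpleReasoner();
        InfModel closure = ModelFactory.createInfModel( reasoner, base );

        // write plain RDF/XML (not ABBREV), so that rdf:parseType="Collection"
        // gets expanded into first/rest/nil form
        closure.write( new FileOutputStream( "closure.rdf" ), "RDF/XML" );
    }
}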

Thursday, September 08, 2005

New Kate! Woohoo!

I was musing to myself a couple of days ago why some music artists just seem to stop producing material. What happens to their creative drive? In particular, I was wondering why Kate Bush has stopped recording. Well, by an amazing coincidence I just found out that she has a new album ("Aerial") due out on Nov 7th, her first for twelve years. Excellent! And it's a double CD. Doubly excellent! I heard the news via a link from 101cd.com. Can't wait.

Monday, September 05, 2005

Jena question: anyone using minor OntDocumentManager features?

We're gearing up for release 2.3 of Jena, and I'm doing some refactoring on the OntDocumentManager. Specifically, I'd like to deprecate, and in later releases remove, support in the ODM for declaring default prefixes and for declaring the ontology language of a document. So this blog entry (and emails to the Jena mail lists jena-dev and jena developers) is a call for people to howl if the deprecation of these features would impact their projects.


Wednesday, August 31, 2005

Jena tip: using resources from different models

This question has been asked on the Jena support list a few times recently, so I guess it would be handy to have an answer that Google can find easily.

I have a quick question regarding the use of constant Property and Resource objects. If I use schemagen to generate constants, is it then safe to use a Property (like petName) created from m_model in a model that is created somewhere else? I guess this isn't really a schemagen question as much as a general Jena question: Is it safe to create a Property/Resource in one model, and then use them to query a different model (obviously conforming to the same ontology)?

Answer: yes, it's perfectly safe provided you understand what's going on (so that you don't get unexpected results).

The trick is to realise that each resource (and property, since properties are resources) has a model to which it belongs. This may be null. If you call a method on a resource to add a value to that resource, the statement that is created will be created in the resource's model. The only reason that resources have model pointers is to allow this style of convenience programming. So for example, you can have:

Resource r = m1.createResource();
r.addProperty( foo, "bar" )
 .addProperty( fu, "baz" );

Which will add two statements with r as subject to model m1. That's the convenience of the convenience API. There's no reason why you can't do:

m2.add( r, foo, bar );

even if r, foo and bar are defined in different models than m2. In particular, if you do the following:

m2.add( r, foo, bar );

// Model.add() returns the model itself (for cascading adds), so we
// retrieve the statement back in order to inspect it:
Statement s = m2.listStatements( r, foo, bar ).nextStatement();
s.getSubject().getModel()

you'll get m2 as a result, independently of which model r was defined in.

You may also want to note that Jena's contract for resource equality is that two resources are .equals() if they have the same URI or the same bNode identifier. The models to which they are bound are not part of the equality contract.
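For instance, using the two models from above (the URI is purely illustrative):

Resource a = m1.createResource( "http://example.com/foo#r" );
Resource b = m2.createResource( "http://example.com/foo#r" );
boolean same = a.equals( b );   // true: same URI, different models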


Thursday, August 18, 2005

Jena tip: ont model specs

I was asked a question on jena-dev that I think bears repeating, since it's fairly frequently asked.

so, my question is, in the case where we need inference capability, if all we need to do is just
OntModel ontModel = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM_MICRO_RULE_INF );
then why does Jena still provide things like ReasonerRegistry.getOWLReasoner(), ModelFactory.createInfModel, OntModelSpec, etc.?

OK, here's the simple explanation: the one-line createOntologyModel form does exactly the same job as wiring up a reasoner and inference model by hand via ReasonerRegistry and ModelFactory.createInfModel - it's just a bit simpler to use.

Longer explanation: an OntModel has various parameters that allow it to be more flexible when handling ontologies. OntModel can handle OWL, OWL DL, OWL Lite, DAML+OIL and RDFS ontologies. So when you say OntModel.createClass(), it has to know which kind of class (OWL, DAML, etc) to create. This information is conveyed through the OntModelSpec, which contains a language profile. Similarly, an OntModel can have an attached reasoner, or none at all. Which reasoner is attached to the OntModel is also expressed in the OntModelSpec.

A user can, if desired, create an OntModelSpec entirely from scratch, setting each field (profile, reasoner, etc) to the desired value. However, there are some common combinations (e.g. an in-memory model, using OWL, with the MICRO rule reasoner) that we know are going to be re-used many times, so we've taken the trouble to pre-define some constants for commonly used OntModelSpecs. These have suggestive names, so the one I just described is OWL_MEM_MICRO_RULE_INF. But all it is is a pre-defined OntModelSpec with the fields filled-in.

Some fields cannot be filled in in advance. For example, if the model is in a database, not in memory, then the db type, user-name and password can't be known in advance. Similarly, external reasoners can be at different access URL's. However, in these cases it's still nice to be able to re-use some of the pre-defined common variants, and just adapt them. So you can create a new OntModelSpec using a pre-defined one as a template:

OntModelSpec newSpec = new OntModelSpec( OntModelSpec.OWL_MEM_RULE_INF );

and then just tweak the bits you want to change.
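For instance, a hypothetical tweak that keeps everything else from the template, but swaps in the OWL micro rule reasoner:

OntModelSpec newSpec = new OntModelSpec( OntModelSpec.OWL_MEM_RULE_INF );
newSpec.setReasoner( ReasonerRegistry.getOWLMicroReasoner() );
OntModel m = ModelFactory.createOntologyModel( newSpec );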

Likewise, the builtin reasoners have a number of configuration options, including which rulesets to use, which rule engine to use, etc. Again, we make it easy to use the common patterns, but provide access to the nuts-and-bolts for those that need them.

under what situation do we ever need to use these guys?

Useful heuristic: you probably don't need to use them. When this heuristic breaks (all heuristics break eventually), you'll know that it's time to read up on the details of the low-level interfaces. Until then, don't worry about it!


Sunday, July 24, 2005

Off to AAMAS

Tomorrow I'll be attending two days of workshops at AAMAS. Agent-Mediated Knowledge Management on Monday, Service Oriented Computing and Agent-Based Engineering on Tuesday. Sadly I can't stay for the main conference due to conflicting family engagements. AAMAS is usually a fun conference, and it's nice that it's located practically on the doorstep this year. There's a direct flight from Bristol to Schiphol on KLM, then a 40 minute train journey to Utrecht - assuming my lack of Dutch language skills doesn't get me lost!

Friday, July 22, 2005

Java Universal Network/Graph Framework

Christoph Kiefer mentioned on jena-dev a Java graph layout package I've not come across before: JUNG - Java Universal Network/Graph Framework. The example screenshots look pretty good and the Java Web Start examples all worked (good sign!). I've been looking for a graph layout package to use with RDF and OWL data sources. JUNG could be just the ticket.

Tuesday, July 19, 2005

FAQ: ontologies in business

A colleague (thanks Dave!) spotted a rather nicely done FAQ on ontologies in a business context. Not sure that I agree with everything it says, but at least it tries to approach what is often a rather abstract, technocratic subject with a down-to-earth sensibility.

Thursday, July 14, 2005

Strategic retreat

Disappointingly, I'm giving up (for now) on continuing writing my thesis using open-source tools and DocBook. I still think that there are good reasons for having the source code to my thesis text encoded in an open standard language, and I do like the idea of being able to generate both pdf and xhtml from one source. The basic structures of DocBook - books, chapters, sections, paragraphs, lists, etc, all work fine. Where I ran into problems was with:

  • citations,
  • formulae and equations, and (to a lesser extent)
  • diagrams
Sure, there are open-source tools that help with all of these requirements, but they just don't work well enough. Over the past many weeks, I've spent more time wrestling with getting software working, getting the right entries in my XML catalog, tweaking php files and MySQL tables, and grokking complex XSLT stylesheets than I have actually writing. So much so that it's getting quite desperate now, deadline-wise! I have to say, as well, that the default output from the DocBook stylesheets is pretty dull, so I was expecting some heavy-duty XSLT hacking to get a reasonable end result anyway.

So, my plan is to stage a strategic retreat and use OfficeXP for the time being, which has drawing tools and an equation editor built in, and integrates nicely with EndNote and ProCite. Sure, professional bibliographers complain about EndNote, and doubtless it has its flaws. But right now, it's an attractive proposition for not being as much in my way as the existing crop of open-source tools.

I characterise this as a retreat, not surrender, on the basis that I expect OpenOffice to catch up with Word for equations and citations eventually. I have tried OpenOffice for this project, but the 2.0 beta kept crashing on me and the 1.1.3 version is too primitive. In time though, I expect to be able to convert my document to an open XML standard, either DocBook or another comparable format.

I do thank those who've helped me get open-source programs working (or tried to) over the past couple of months. It's not that I don't appreciate the work and commitment that goes into open-source development, it's just that 80% working isn't enough that I can work with the tool rather than on the tool.

Monday, July 11, 2005

Fixing yum

I have two Fedora Core 3 systems, one in the office and one at home. Yum always works very smoothly on my home system, but at work I'm frequently bugged with:

[Errno -1] Metadata file does not match checksum
leading to
[Errno 256] No more mirrors to try
Googling around, this seems to be a caching issue. Various HTTP caches (on our network or between us and the backbone, I'm not sure) are caching either the metadata or the checksum file, leading to a mismatch. I've tried various suggested solutions for this, with no success so far. However, my colleague Steve Pearson suggested a fix which works in our office, since we have a SOCKS proxy available. I simply:
  • installed the dante socks client rpm
  • removed my http_proxy environment variable
  • run socksify yum update, or whatever yum command I need.
The reasoning is that traffic sent via the SOCKS proxy doesn't, I'm told, pass through any HTTP caches. Anyway, it works for me and hopefully may be of some use to other people faced with the same problem.

Wednesday, July 06, 2005

London olympics: should be a great event

Big congratulations to Seb Coe and the London Olympics bid team on winning the competition to stage the 2012 Olympics. My friends who went to watch the 2004 Olympics in Athens had a wonderful time, so I'm looking forward to catching some of the events live in seven years time. Wonder if it's too soon to book hotel rooms now?!

Updated: changed the title after the awful events on Thursday.

Monday, July 04, 2005

Space scientists have all the fun

NASA consistently does good work in AI. I've blogged before about the Earth Observing Satellite project. There was autonomous fault diagnosis on Deep Space One, and the project to organise ISS astronauts' daily schedules. I've heard from my friend Simon Thompson that there's a very impressive NASA paper at the AAMAS industrial track this year. Now they're installing speech interfaces on the ISS, as astronauts have trouble holding a printed manual and performing complex maintenance tasks while floating about in microgravity (no kidding!). Cool stuff.

Thursday, June 23, 2005

History from 2014

Very interesting short flash movie about future media spotted by Tim Finin on the UMBC ebiquity blog. The premise is that the movie is a short retrospective by the Museum of Media in 2014, charting the way that Internet-based personalised news and information rose to dominance over traditional news channels. Nice sting in the tail too!

What I particularly liked about this was the style of the presentation. It could have easily been presented as a whitepaper or dull-as-ditchwater PowerPoint presentation. Instead, a relatively simple Flash movie (I'm guessing - I've never authored Flash, but the animation techniques don't look that deep) makes the subject vastly more entertaining, engaging and thought-provoking. Rhetorical question: what would a similar treatment produce for the evolution of the semantic web to 2014?

Monday, June 20, 2005

Cliff's Notes for web services

I don't know what the brands are in other parts of the world, but in the UK Cliffs Notes publish synopses of literary works with a guide to the key characters and events. Really useful for getting the rough idea, but no substitute in the end for reading the text. Via the Amazon Web Services blog I've become aware of the Pocket This Decoder for WS-Alphabet Soup. Very handy.

Friday, May 27, 2005

Launch Jetty from Eclipse - solving "Unable to find a javac compiler"

I've not used Jetty for a little while, and in the interim I've upgraded my computer and switched to Linux. Needless to say, when I came to running Jetty to do some web service work this morning, it broke. Specifically, the symptom was that when I launched Jetty from Eclipse, using the Jetty launcher plugin, Axis would fail with a nasty error message:

HTTP ERROR: 500
Unable to compile class for JSP
RequestURI=/axis/index.jsp

The problem is that the front page for Axis has changed since I last used it: it's now internationalized, which is a good thing, but it uses JSP's to do the internationalizing. JSP's are compiled dynamically, hence the need for a compiler. The HTTP error is accompanied by a stacktrace:

2005-05-27 21:35:54,421 ERROR [SocketListener0-1] compiler.Compiler (Compiler.java:412) - Javac exception 
Unable to find a javac compiler;
com.sun.tools.javac.Main is not on the classpath.
Perhaps JAVA_HOME does not point to the JDK
 at org.apache.tools.ant.taskdefs.compilers.CompilerAdapterFactory.getCompiler(CompilerAdapterFactory.java:105)
 at org.apache.tools.ant.taskdefs.Javac.compile(Javac.java:929)
 at org.apache.tools.ant.taskdefs.Javac.execute(Javac.java:758)
 at org.apache.jasper.compiler.Compiler.generateClass(Compiler.java:407)
        .....

Do not listen to the tricksy error message. He is lying! Don't waste hours messing with your JAVA_HOME setting, wondering if the problem is related to using Java SDK 1.5.0 or 1.4.2, and switching back and forth between them like a thing possessed. No, the simple answer is to add tools.jar (found in the lib directory of your Java SDK) to the classpath given to the Jetty instance. There's a tab to set the Jetty instance's classpath as part of the Jetty Launcher. Easy when you know how.

Update Dec 2006: Thanks for the nice comments and useful suggestions folks. I've now closed this entry to further comments.

Tuesday, May 24, 2005

New service

Nice line in today's Borowitz Report:

Elsewhere, Google announced that it was taking its search technology to a new level by introducing a new service that would enable users to find their car keys.
Seemed to nicely capture the zeitgeist to me!

Monday, May 23, 2005

Back to the USA

Just back from a week-long visit to Philadelphia, at rather short notice. Lots of good food, reasonable beer and excellent semantic web conversation. Had dinner one night at Marrakesh on South Leithgow St (no web site, but there are contact details). Wonderful atmosphere in a tiny and really cosy place. The proprietor told us they were trying hard to recreate the feel of authentic Moroccan family dining. Worth visiting if you're in Philly for dinner. Good news: Philadelphia has free metropolitan wifi connectivity. Bad news: unless my wireless card is failing, the free network saturates pretty easily, and the bandwidth falls off to zero. I also had problems with the free wireless network in the Holiday Inn, which was otherwise a perfectly decent hotel. The barman was very generous when pouring measures!

Tuesday, May 10, 2005

Grokker Coolness

Danny Ayers spotted Grokker, a very cool search-results organisation tool with an online demo applet. Here's an example search. After a few minutes playing with it, I thought the results were good on precision (the returned hits were relevant, and the sorted categories were all nicely cohesive), but not quite so good on recall (the proportion of all the relevant documents out there that actually get returned). To be fair, it's self-described as only a demo at this point. I also wonder how, using the spatial layout metaphor, you browse multiple pages of results. Nonetheless, a very well executed demo. Kudos to the developers.

Saturday, May 07, 2005

When the open-source model breaks down

I have a fairly simple requirement: I want to write a long technical document, using DocBook as the markup language, and I want to conveniently cite references to other publications. And I want it to run on Linux. That's a simple requirement to state; I've discovered, to a reasonable standard of proof, that it's by no means easy to achieve. In fact, I've spent an immensely frustrating few days just trying - and failing - to cobble together a working solution.

There are tools out there. I've been looking mostly at two: RefDB, and JReferences. Both are open-source reference managers that import a variety of formats, and can produce DocBook marked-up bibliographies for a given input document. In theory. RefDB is the more polished and complete of the two tools. JReferences is a one-man effort that seems to have been moribund for the last couple of years.

I actually got a fairly long way with RefDB. I have installed the software, imported my references in RIS format, and can query for particular refs and generate output in DocBook format. The tool I really want to use, however, would allow me to submit a document with <citation> elements in it, and the tool would comb my database for matching references and selectively generate formatted output. This process depends on having styles imported into RefDB, and this is where the problems started. When I import a style, the RefDB client goes into a busy-wait state, soaking 100% of the CPU and not terminating. I tried it in verbose mode, to see what the problem might be, and got a meaningless, cryptic three-character error code. I surmised that there might be a problem with the libdbi database drivers, so I got the latest libdbi CVS head and tried to build it. After a bit of a false start, I can build the drivers but can't install them - the make install target breaks when installing the documentation, because it doesn't recognise the DocBook schema for the SGML files, for wholly non-obvious reasons.

It's at this point that the "many eyeballs make all bugs shallow" maxim falls apart. For my particular configuration, there is only my pair of eyes to look at the problem. At this point, I would gladly pay for either a licensed product, or a support contract, if I felt confident that I was going to get working functionality at the end of the day. The notion that I have the source so I can fix the problem myself doesn't work here: I have limited time to spare for the task of getting references into my document. There quickly comes a point when it's easier just to code the citations by hand than to spend time grokking someone else's code and fixing problems.

The other tool I looked at briefly was JReferences. It's a Java program, so I felt comfortable that I could ascend the learning curve a bit more quickly if I needed to. It has the right features on paper: a RIS importer (among others) and DocBook exporter (among others), and a simple editor for viewing and updating stored references. So, task one: convert my RIS-formatted collection to BibTexML, which is JReferences' preferred internal format. There's a command-line utility to do just that. Run it, and it produces ... nothing. No output and no errors. OK, so it's a program that hasn't been touched for at least two years according to the CVS log at SourceForge. Maybe running it inside a Java debugger will reveal a simple fix. So, I get the source, drop it into Eclipse and .... and it won't compile. Not even close. There seem to be two different package layouts competing with each other. Lots of inconsistencies that Eclipse barfs on. Worse, many of the problems are incompatibilities with bibtexml.jar, which, as far as I can find, is only distributed in binary form. I can't imagine how this program ever worked properly. I can't see any test code at all. There's not actually that much code all told, and I'm fairly sure I could, if I wanted, fix it up.

But why bother? It would be significantly easier for me to start over with an empty project in Eclipse than spend time understanding, fixing, and extending the existing codebase. Open source, code re-use, and so on only helps if the code is functional, comprehensible and working. Lose that, and you have worse than nothing.

Tuesday, May 03, 2005

Enterprise integration as a cultural collaboration problem

Sean McGrath posted an interesting article on ITWorld: ITworld.com - Mediators and mediatees - Enterprise integration as an industrial relations problem. The basic premise is that IT systems embody a certain view of the world, and that where these differ - e.g. the world-view of the accountancy department is fundamentally different from the goods-in department - you have conflict. Apps don't interoperate because they're seeing the world differently, and hence (Sean proposes) enterprise application integration, or EAI, is essentially an industrial relations problem. It's a nice thought, and I'm not going to get all analytical about what is essentially a thinking-aloud exercise, but I don't agree.

The basic problem is that different doesn't necessarily mean in conflict. I know that it sometimes can, and hence enlightened companies (including HP) are very keen on promoting a diversity agenda, seeing differences between people as something to celebrate and learn from. But Sean's industrial relations metaphor seems to me to assume that the putative accounting and goods-in apps are in conflict and in need of mediation, just because they see the world in a different light. I'm no student of industrial relations, but it seems to me that such problems generally arise when the goals of two groups, workers and management typically, are in conflict over some limited resource (such as productivity, or access to the corporate jacuzzi). I don't see how this applies to EAI.

I propose a different metaphor: EAI as a problem of building collaborations between two cultures. This weekend just past, my family and I attended my wife's cousin Adrian's wedding to his Chinese bride Chunli. It was a lovely ceremony, not least because of the way that Chinese and British influences were combined to make something that was different from a traditional wedding in either culture (assuming there is such a thing).

What does cultural collaboration require? Like Sean, I'm only thinking aloud here. It requires some common ground, and a shared purpose. It requires each participant to bring some unique values, and to make space for the uniqueness of the other. True collaboration demands a delicate balance of yielding and non-yielding of control. And, I suspect, to really bring such a thing off requires a touch of artistry. It is often paying attention to the little details that makes success inevitable.

Sunday, April 24, 2005

DocBook investigation: progress update

Quick update on the DocBook investigation. I've tried a number of schema-aware XML editors for generating DocBook sources. Didn't like any of them. The problem, I think, is one of familiarity: I don't know the DocBook schema very well, so vanilla schema-assisted editing doesn't give me enough support. I tried saving OpenOffice documents as DocBook. This works, but doesn't seem to offer much in the way of fine-grain control of the generated XML. Also, I couldn't get it to round-trip nicely.

The best solution I've found so far is XMLMind. There's a free standard edition and a payware professional edition. I've only tried the standard edition so far. It's a Java/Swing application, with a slightly odd feel to the UI, but I quickly found myself adjusting. So far, it has been easily the most effective solution I've found for editing DocBook in XML, but with a WYSIWYG(-ish) presentation. I've had to step outside the editor and directly hack the XML once or twice, for example to insert XInclude instructions to modularise my thesis into one-chapter-per-file chunks. But XMLMind was easily able to cope with the XIncludes once I had entered them. There may be a way of doing XInclude from the interface, but I couldn't see it. The standard edition of XMLMind doesn't generate PDF files: you need the professional edition for that. However, I yum install'ed fop from JPackage.org, and that works fine. It was nice to see that XMLMind is very up-to-date with the XSL stylesheets from SourceForge for transforming DocBook.
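For reference, the hand-inserted XIncludes look something like this (file names hypothetical):

<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="chapter1.xml"/>
  <xi:include href="chapter2.xml"/>
</book>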

Next goal is bibliography processing. DocBook can handle references already; I just need my refs in the appropriate format. I have a large collection of existing reference data in ProCite for MS Windows. For managing the references on Linux, RefDB looks like a good choice. Unfortunately, I've not been able to install it so far due to incompatibilities with libdbi on Fedora Core 3. I've asked on the refdb list to see if anyone has a solution. It may also be that Fedora Core 4 has the more up-to-date libraries when it ships (the problem isn't just libdbi, but conflicts with the FC3-installed MySQL and PostgreSQL). Fingers crossed. In the meantime, I haven't yet found a ProCite to RefDB translator. ProCite can export data as a comma-delimited file, but the meanings of the fields are context-dependent on the reference type. I have a sinking feeling I may end up writing my own ProCite to XML converter. Sigh.

Final note: I've been using Bob Stayton's DocBook XSL: The Complete Guide, second edition, as one source of assistance in learning my way around DocBook's world. It's an excellent resource, thoroughly recommended.

del.icio.us: docbook

Friday, April 22, 2005

Jena tip: namespaces and the j.0 problem

A frequently asked question we get on the Jena list is paraphrased as: "help - my output contains these weird j.0 namespaces, how do I get rid of them?". In the hope that Google will save some future askers of this question some time, here's an explanation of what is happening, and what to do about it.

First, consider the following code snippet:

    public static void main( String[] args ) {
        Model m = ModelFactory.createDefaultModel();
        Property p = m.createProperty( "p" );
        Resource r = m.createResource( "r" );
        r.addProperty( p, 42 );
        m.write( System.out, "RDF/XML" );
    }

This could be expected to write a representation of the simple RDF model r p "42". But in fact, it produces a Java exception. Exactly which exception depends on the version of Jena we are using, in my current test setup I get Exception in thread "main" com.hp.hpl.jena.rdf.arp.RelativeURIException: No scheme found in URI 'p'. The problem is that RDF (and RDFS, and OWL) expect the names of things to be URI's. The symbol p isn't a URI. So let's change the example slightly:

    public static void main( String[] args ) {
        Model m = ModelFactory.createDefaultModel();
        String NS = "http://example.com/foo#";
        Property p = m.createProperty( NS + "p" );
        Resource r = m.createResource( NS + "r" );
        r.addProperty( p, 42 );
        m.write( System.out, "RDF/XML" );
    }

OK, now this runs and produces the following output:

<rdf:RDF
    xmlns:j.0="http://example.com/foo#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" >
  <rdf:Description rdf:about="http://example.com/foo#r">
    <j.0:p>42</j.0:p>
  </rdf:Description>
</rdf:RDF>

So here's the mysterious j.0 appearing. What's going on? The j.0 is an XML namespace prefix. It's defined in the root element of the RDF file:

    xmlns:j.0="http://example.com/foo#"

To get the full URI, just replace j.0: with the URI defined in the namespace declaration. But why was it put there at all? Consider the alternative. With RDF's striping XML syntax, elements are alternately resource and property names. Suppose we hadn't used a namespace for p:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" >
  <rdf:Description rdf:about="http://example.com/foo#r">
    <http://example.com/foo#p>42</http://example.com/foo#p>
  </rdf:Description>
</rdf:RDF>

<http://example.com/foo#p> isn't a legal XML element name. It contains characters (such as colon) that are not syntactically permitted in an XML element name. So, using XML's namespace mechanism gets us out of a hole when we want RDF identifiers in XML to be URI's. It also has value in its own right though: semantically your p relation may denote something different to my p relation; if we put them in different namespaces there's much less chance of an accidental confusion of semantics.

So, now that we know why j.0 appears, what can we do? One solution is to not use XML output. The same example, written in N3 format instead of RDF/XML, becomes:

<http://example.com/foo#r>
      <http://example.com/foo#p>
              "42" .

No funny prefixes in sight. Alternatively, we can just ensure that we use a sensible name for the namespace prefix instead of Jena's autogenerated j.0, j.1, etc. The key to this is the PrefixMapping interface, which is a super-interface of Model. The method setNsPrefix lets us assign a more meaningful (to human readers!) prefix:

    public static void main( String[] args ) {
        Model m = ModelFactory.createDefaultModel();
        String NS = "http://example.com/foo#";
        m.setNsPrefix( "eg", NS );
        Property p = m.createProperty( NS + "p" );
        Resource r = m.createResource( NS + "r" );
        r.addProperty( p, 42 );
        m.write( System.out, "RDF/XML" );
    }

Producing:

<rdf:RDF
    xmlns:eg="http://example.com/foo#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" >
  <rdf:Description rdf:about="http://example.com/foo#r">
    <eg:p>42</eg:p>
  </rdf:Description>
</rdf:RDF>

del.icio.us: jena, semantic web.

Sunday, April 17, 2005

Delving into DocBook

For more years than I'm prepared to admit (even to myself ... no, especially to myself) I've been pursuing a part-time PhD at the University of Liverpool. It has taken much longer, and been much harder work than I anticipated, and I expected it to be quite hard work. Still, I met with my supervisor on Thursday last week, and we've agreed that I've done enough to start writing up now. The exact criteria for getting a PhD in the UK system are somewhat unclear (to me), but in essence the PhD is regarded as a research training, and the successful candidate needs to show a thorough understanding of the research area and to have made a contribution to knowledge. The primary assessment is made on the thesis, and that's what I have to produce next. Breakthrough discoveries are not required, which is good because I feel I've opened up more questions than I've answered. Actual breakthroughs: none; contributions to knowledge: well, let's see when I've finished the thesis. Emotionally I'm not sure that I've got to where I wanted to be when writing up, but I suspect that's misplaced perfectionism and hubris at work.

This brings me to the actual subject of this posting: which writing tool to use? At the office (i.e. my day job) we generally use the MSOffice suite, but I'm not going to write my thesis in Word. Reasons: (i) I've had Word crash on me and lose content or formatting information, and it's just tedious to recreate lost work, (ii) Word does this more often on long, multi-part documents, which is exactly what I'm going to be writing, (iii) I want to generate decent HTML from the finished thesis so that I can put it up on the web, and Word's HTML output is vile, and (iv) I don't want to be locked-in to a proprietary binary format forever. Many of my academic friends and colleagues use LaTeX for writing. I used to be a LaTeX user many, many years ago, but those skills have completely atrophied. Plus, I've seen the results of some LaTeX to HTML converters, and the results are simply horrible. So I've decided to try a brave experiment and use DocBook.

Reasons for choosing DocBook? Well, mostly the opposite of the strikes against Word. It's a well-tested, open, text-based format. DocBook is specifically designed to generate multiple output presentations (HTML, PDF, XML) from a single source. There are lots of DocBook tools, and assuming they tend towards some sort of normal distribution, some of them must be out on the right-hand tail! It is a problem, however, that there are just so many tools around. I'm finding it very daunting figuring out where to start. So, as a DocBook neophyte I'm going to try to capture some of my baby steps and discoveries as I go. Add some rocks to the way-markers as I pass.

Some of the features and issues I'm going to be looking out for:

  • Generating documents that have separate chapters, but allow me to cross reference between chapters;
  • Generating XHTML output as both page-per-chapter (or page-per-section) and one-page-per-document;
  • Inserting meta-data into the XHTML output, including id attributes so that I can cross-reference the XHTML content;
  • Using CSS stylesheets with the output XHTML;
  • Automated indexing;
  • Bibliography support (I will really miss ProCite as I move away from Word, and I need to find a way of migrating my large ProCite database to DocBook's world);
  • Equations and formulae;
  • Source editors, whether on the raw XML or a WYSIWYG view onto the document content.

Usually when I learn a new technology my strategy is to hit Amazon for a good book. Not the For Dummies kind, but something that gives a pretty good map of the territory then starts getting into detail without too much flapping about. O'Reilly books are generally pretty good exemplars of the style I like. The print version of the O'Reilly DocBook book, however, is seriously out-of-date. You can read the up-to-date version online, but that ranks a poor second in my experience. As I find better learning resources, I'll try to remember to blog them here.

del.icio.us: docbook

Monday, April 11, 2005

Passionate about research?

I don't normally bother with the whole blog-rolling thing. There are plenty of smart people out there discovering and linking interesting things far more assiduously than me. An exception today, though.

Shelley recently linked to Kathy Sierra's creating passionate users blog. I've been reading for a few days now. Great stuff! I haven't read any of Kathy's team's books yet, but I certainly plan to. If they're anything like the quality of writing in the blog, they should be very, very good.

A recent post on Kathy's blog was You and your users: casual dating or marriage?. I won't repeat the stories here (go read them for yourself, you'll enjoy the experience), but the take-home is that making your users passionate about your product turns them from just customers to active advocates for your business. Great! I can really see how that applies in a commercial context. But. But I work in research. Corporate research, to be sure, not academia, but nonetheless I've been wondering how Kathy's ideas might apply in a research context. Because the essence of an academic-style research training is to be dispassionate. To take ideas, pull them apart with the surgeon's tools of statistics, peer-review and analytical cynicism, and lay them out on the slab for inspection. Reports written in the third-person passive-voice, striving for the measured tones of the respected sage.

In one way the comparison is clear: when we try to get other parts of the company interested in the ideas we're working on in the lab (what my friends at BT Labs call down-streaming), I can see that our internal customers could get passionate about the research we're pushing. Even then it's slightly different because the stuff in the lab is usually not finished. There isn't a great pair of skis to try out, though we may have a completely new and half-finished shoe clamp (NB I know absolutely nothing about skiing, except that it involves gravity in some way). Worse, often what we're actually asked for is a set of slides to summarise our work. I don't believe that anyone ever got passionate about a PowerPoint slideset. Ever.

But even more of a puzzle is how to get passionate in peer-reviewed research. Or even if that would be a good thing. I have to say, though, that most of the conferences and workshops I attend, even the good ones, are pretty dreary things. Maybe it's the format. Maybe it's the type of people who attend. Maybe it's some kind of cultural meme we all get inoculated with. But I often wonder how much value the delegates really get from such events. Especially when, as is all too common, the audience sits mutely through a presentation, says little in the Q&A, and then carps in the corridor afterwards about the poor assumptions the presenter made.

It certainly is possible to get excited about research ideas. A number of times I've had the mind-expanding experience of reading a paper and getting a real sense of new avenues of exploration, or products, being opened up. It's a rush, but it doesn't happen very often, more's the pity. Something to work on.

Friday, April 08, 2005

Jena tip: navigating a union or intersection class expression

One of the things I spend a lot of my time doing is answering Jena questions. Historically, the search capability at YahooGroups has been atrocious. For some time, I've been thinking that blogging some Jena tips for Google to find would be a good idea. I'm told that YahooGroups' search capability has been improved recently, nonetheless I'm going to try blogging some of the more common issues and FAQ's as they come up. Maybe it will save someone some time, and me some email effort.

One frequently asked question is how to get classes out of a union or intersection class expression. Suppose you have some OWL like this:

  <owl:Class rdf:ID='StateMachine'> 
  <owl:equivalentClass> 
    <owl:Class> 
      <owl:intersectionOf rdf:parseType='Collection'> 
        <owl:Restriction> 
          <owl:onProperty> 
            <owl:ObjectProperty rdf:about='#state'/> 
          </owl:onProperty> 
          <owl:someValuesFrom> 
            <owl:Class rdf:about='#State'/> 
          </owl:someValuesFrom> 
        </owl:Restriction> 
        <owl:Class rdf:about='#Automaton'/> 
      </owl:intersectionOf> 
    </owl:Class> 
  </owl:equivalentClass> 
  </owl:Class> 

A state machine is the intersection of class Automaton with things that have a state property. It's just a synthetic example, don't sweat the details! First, here's some code to list the elements of the intersection:

  OntClass nfp = m.getOntClass( NS + "StateMachine" );
  IntersectionClass ec = nfp.getEquivalentClass()
                            .asIntersectionClass();

  for (Iterator i = ec.listOperands(); i.hasNext(); ) {
      OntClass op = (OntClass) i.next();

      if (op.isRestriction()) {
          System.out.println( "Restriction on property " + 
                              op.asRestriction().getOnProperty() );
      }
      else {
          System.out.println( "Named class " + op );
      }
  }

Two key points here: first, Jena uses .as() to convert between views, or facets, of RDF resources in the model. Since RDF resources can change type according to what's asserted in the model, ordinary Java casting doesn't work because it's too static. The .as() mechanism is really a form of dynamic polymorphism. The generic form of .as() takes the facet's Java class as a parameter, but the Ontology API classes provide various convenience methods with the pattern .asXYZ. Hence .asIntersectionClass().
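For completeness, the generic equivalent of the asIntersectionClass() call above is:

  IntersectionClass ec = (IntersectionClass) nfp.getEquivalentClass()
                                                .as( IntersectionClass.class );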

The second key point, and the one that many people don't seem to notice in the documentation, is that UnionClass and IntersectionClass are both instances of BooleanClassDescription, which presents a variety of means for accessing the members of the intersection or union. listOperands() returns an iterator whose values are the classes in the intersection or union.

del.icio.us: jena semanticweb

Sunday, April 03, 2005

owl:minCardinality is not minUtility

The open world assumption can cause initial confusion to people trying to get to grips with the semantic web. The OWA states, in essence, that just because you don't know something to be true, you can't assume it to be false. For example, let's assume that Mary says her father is Fred (call this S1). She also says that her father is George (S2). If Fred and George actually referred to two different people, Mary's statements would be inconsistent because people only have one father (well, under normal conditions). But, if we knew that Fred was known as George to his work-mates, for whatever reason, so that Fred owl:sameAs George is true, Mary is being consistent. Let's call that equality S3. The Open World Assumption states that knowing only S1 and S2, we can't assume the negation of S3 (written ¬S3). The Closed World Assumption (CWA) allows us to infer ¬S3 if we don't actually know whether S3 is true or false. The CWA is also referred to as negation as failure, and will be familiar to anyone who has ever programmed in Prolog.

Note that there's a separate-but-related idea, also well-discussed in ontology design, called the unique names assumption. The UNA means that things with different names are always different, even under the open world assumption. If the UNA applied, statement S3 would automatically be a contradiction. OWL explicitly makes the open world assumption and not the unique names assumption. This entry, however, is about the OWA.
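To make the example concrete, here are the three statements in N3 (the family namespace is made up for the example):

@prefix :    <http://example.com/family#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

:mary :hasFather :fred .      # S1
:mary :hasFather :george .    # S2
:fred owl:sameAs :george .    # S3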

So far, so good. Now, let's suppose the following:

  <owl:Class rdf:ID="Person">
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty>
          <owl:ObjectProperty rdf:ID="hasParent"/>
        </owl:onProperty>
        <owl:minCardinality rdf:datatype="&xsd;int">1</owl:minCardinality>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
  <Person rdf:ID="mary"/>

For readers not familiar with OWL, this says, roughly, "the class Person is a sub-class of the class of all things that have at least one parent". That is, all Persons have at least one parent, but there may be some things that have at least one parent that are not Persons. Moreover, we note that Mary is a person. Many people, particularly those used to XML-Schema validation, would expect an OWL validator to complain that Mary doesn't have a declared parent, in violation of the class description. Indeed, this is a frequently asked question on the jena-dev list. But the OWA means that just because we don't know, in this local fragment of the knowledge base, that Mary has a parent, we can't assume that she doesn't have one at all. Mary's parent might be declared in some other KB that isn't currently visible to whoever or whatever is doing the reasoning. In fact, OWL reasoners (including Jena's built-in rule reasoner) will deduce that Mary does have at least one parent, we just don't know the identity of that parent yet.
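As a minimal sketch of checking that with Jena (the file name is hypothetical, and is assumed to contain the Person example plus the mary instance), the validity report comes back clean where a schema validator would have complained:

import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.reasoner.ReasonerRegistry;
import com.hp.hpl.jena.reasoner.ValidityReport;

public class OwaCheck {
    public static void main( String[] args ) {
        // load the Person / mary example
        Model data = ModelFactory.createDefaultModel();
        data.read( "file:person.owl" );

        // attach the OWL rule reasoner and ask for a validity report
        InfModel inf = ModelFactory.createInfModel(
                ReasonerRegistry.getOWLReasoner(), data );
        ValidityReport report = inf.validate();

        // prints "valid: true"; no complaint about mary's missing parent
        System.out.println( "valid: " + report.isValid() );
    }
}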

Consequently, owl:minCardinality will rarely cause a validation error, and never on its own. So, does this mean that min cardinality has no value, or was put in by mistake, as some have suggested? No. The key point, I think, is that ontologies are not schema languages. Thinking of OWL as a complex data-description language leads to the wrong assumptions. One use for an OWL ontology is to let you make additional deductions about your instance data. In this case, min cardinality allows reasoners to infer class membership by classifying the instance data using the ontology. For example, in one of my ontologies I have:

  <owl:Class rdf:ID="AnyGoalStrategy">
    <rdfs:comment>A goal strategy in which any sub-goals can succeed</rdfs:comment>
    <owl:equivalentClass>
      <owl:Class>
        <owl:intersectionOf rdf:parseType="Collection">
          <owl:Class rdf:about="#GoalStrategy" />
          <owl:Restriction>
            <owl:onProperty rdf:resource="#any" />
            <owl:minCardinality rdf:datatype="&xsd;int">1</owl:minCardinality>
          </owl:Restriction>
        </owl:intersectionOf>
      </owl:Class>
    </owl:equivalentClass>
    <owl:disjointWith rdf:resource="#SequenceGoalStrategy" />
    <owl:disjointWith rdf:resource="#PerformGoalStrategy" />
  </owl:Class>

An AnyGoalStrategy instance is recognised as a GoalStrategy resource that has an any relation to a sub-goal. So I don't have to explicitly declare the types of my strategy objects, I just let the reasoner figure them out for me. It's just a small example, but I think it points the way to the utility of owl:minCardinality and other constructs, even in the presence of the open world assumption.
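In code, letting the reasoner do the classification looks something like this sketch (the instance name is made up, and m is assumed to be an OntModel created with a rule-reasoner spec such as OWL_MEM_RULE_INF):

  Individual s = m.getIndividual( NS + "strategy1" );

  for (Iterator i = s.listRDFTypes( false ); i.hasNext(); ) {
      // AnyGoalStrategy shows up among the types, even though it was
      // never asserted in the instance data
      System.out.println( "inferred type: " + i.next() );
  }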

[Updated to correct a syntax error in the second example].

Call for papers: AWeSOMe '05

Among other workshops, I'm on the programme committee for AWeSOMe 2005 this year: The First International Workshop on Agents, Web Services and Ontologies Merging. Can't say I'm overly taken with the name, but I had no part in choosing it! I agreed to be on the PC because I do strongly believe (a) that the agent-computing model has something to offer, over and above SOA, and (b) that attempting to build business-focused agent systems today on anything other than a web-services platform is futile. Not that all is well in WS-* land either; other writers have covered in detail the current confusion of WS standards and goals. However, web services do have a strong degree of momentum, tool support, and mind-share. A goodly part of many agent toolkits is distributed-systems infrastructure: precisely the kind of capability that WS-* covers. While agent platforms may have invented some of those wheels, the SOA people have re-invented them, and better in some cases, and have better glossy brochures for their wheels. Time to accept it and move on.

In fact, I'd go so far as to say that a good research topic for someone is to pick a coherent subset of the WS-* collection that addresses the needs of multiagent systems, and write it up with a view to promoting a firm architectural foundation and principles for interop. Similar to what the Web Services Interoperability Organization has done for vanilla web services. Maybe this would be one role for FIPA as it re-invents itself with its new management structure. We'll see.

Saturday, April 02, 2005

Windows backup to DVD+RW

My main workhorse is a Linux workstation, but I have an HP Pavilion running WinXP that the kids use for games and homework. I've been feeling guilty that I don't have a decent backup strategy for that computer. Fortunately, a simple solution has presented itself: via Joel on Software came a link to The Daily Grind at Larkware.com, via Daily Grind no. 591 comes news of FireStreamer DVD which makes DVD+RW devices appear as tape drives that Windows Backup can see. Haven't tried it (yet), but it looks like a perfect solution. I have tried other payware tools for backing up my laptop to DVD+RW, but couldn't actually get them to work. At all. Fingers crossed for FireStreamer. I have a good feeling however: they haven't tried to solve the whole problem of doing backup, they're just fixing the obvious breakage that the builtin backup tool can't write to DVD. Elegant.

Coming up for air

Boy, it has been a busy few months. I've been completely buried in work, so no time to blog. Not sure that I'm any less busy now, but I think I need to resurrect this blog anyway. Nuin presses on, the current CVS head has improved RDF and web-service integration, and I've re-written the interpreter to improve backtracking behaviour. Plus I've added a whole new section on storing goals and strategies, encoded in RDF. No documentation yet, sorry. Soon - promise.

What else? There's a new release of Jena coming very shortly, so that's keeping me busy too. Plus we have a new boss, though it's fair to say that all of the drama in the HP boardroom hasn't directly affected our group much - we just keep working on.

Other 'what else?' changes include migrating my development environment over to Fedora Core 3. I've been using RedHat/Fedora on and off for about three years now, but I've always used Windows as my main coding platform. No more. Windows has got less stable for me, and Fedora 3 is just the business compared to earlier incarnations of RH Linux. Of course, Firefox helps ... I never did get on with plain Mozilla, or Galeon.

Since I work from home a lot, I also wanted to upgrade my home office computer. As I don't have a huge budget for indulging in new technology, price/performance was important. I ended up ordering a custom build from the fine folks at Phoenix PC's in Jarrow. Graeme at Phoenix PC's was great at helping me select a configuration, and did a good job on the build. My new Pentium IV beast is humming along just fine, though it took a surprisingly long time to re-create my user environment. It's perhaps just a little bit too easy to drop in new RPM's and not quite remember where they came from!

Words of praise are due to the good people of jpackage.org for all their efforts in providing a consistent set of yum-able Java library RPM's, and to NVidia for their graphics drivers for Linux. My new machine has a GeForce FX5200; before I installed the NVidia drivers I was getting around 330 frames per second on glxgears, and Tux Racer was just unusable. After installing, which was a breeze, btw, I'm getting 760 fps in glxgears, and Tux is sliding his little penguin tummy into icy oblivion in impressive style. Much to the amusement of my kids.

Monday, January 03, 2005

Whale pictures

I've always been a sucker for good whale pictures, and Norio Matsumoto shoots very good whale pictures. Just beautiful! Note: needs Macromedia Flash to show the slideset.