Saturday, June 02, 2007

Jena tip: dealing with reflexive class and property iterators

Under the semantics of OWL, every class is a sub-class of itself. Let's assume we have three classes: A, B and C. C is a sub-class of B, and B is a sub-class of A. According to OWL, the sub-classes of A are therefore A, B and C.

In Jena, the reasoners, both built-in and external (like Pellet), will correctly infer the expected triples:

:A rdfs:subClassOf :A .
:B rdfs:subClassOf :A .
:C rdfs:subClassOf :A .
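
To make the reflexive case concrete, here is a minimal plain-Java sketch of how a reflexive, transitive closure over the asserted sub-class links yields those triples. It does not use Jena at all: the class name SubClassClosure, the method subClassesOf and the map of asserted links are all made up for illustration, and a real reasoner does far more than this.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Plain-Java illustration only; not part of Jena. */
class SubClassClosure {

    /** Returns cls itself plus every class reachable through the asserted direct sub-class links. */
    static Set<String> subClassesOf(String cls, Map<String, List<String>> directSubs) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(cls); // reflexive case: every class is a sub-class of itself
        while (!todo.isEmpty()) {
            String c = todo.pop();
            if (seen.add(c)) {
                // transitive case: follow the asserted links downwards
                for (String sub : directSubs.getOrDefault(c, List.of())) {
                    todo.push(sub);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Asserted links only: B is a sub-class of A, C is a sub-class of B
        Map<String, List<String>> asserted = Map.of("A", List.of("B"), "B", List.of("C"));
        System.out.println(subClassesOf("A", asserted)); // prints [A, B, C]
    }
}
```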

However, that correct conformance to the spec can often be a nuisance when programming. Suppose we are generating the TreeModel for a Swing JTree directly from our Jena triple store. We really don't want each node in the tree to have itself as a child. This was a sufficiently common user request that, in the Jena ontology API (a convenience API for handling ontology terms), the OntClass Java class doesn't report itself as a sub-class when listing the sub-classes through listSubClasses(). The triple is still there in the model (assuming the appropriate degree of inference is turned on), but it is filtered out of the return value of the listSubClasses() method.
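
As a rough illustration of why the reflexive entry gets in the way of tree building, the sketch below builds Swing tree nodes from a plain map of sub-class links that, like an inferred model, lists each class among its own sub-classes. The class ClassTreeBuilder and the map are hypothetical stand-ins rather than Jena API. Without the equality check, the recursion would keep adding A under A and never terminate.

```java
import java.util.List;
import java.util.Map;
import javax.swing.tree.DefaultMutableTreeNode;

/** Plain-Java illustration only; the map stands in for an inferred model. */
class ClassTreeBuilder {

    /** Builds a tree node for cls with one child per direct sub-class, skipping cls itself. */
    static DefaultMutableTreeNode build(String cls, Map<String, List<String>> subs) {
        DefaultMutableTreeNode node = new DefaultMutableTreeNode(cls);
        for (String sub : subs.getOrDefault(cls, List.of())) {
            if (sub.equals(cls)) {
                continue; // reflexive entry: a node must not become its own child
            }
            node.add(build(sub, subs));
        }
        return node;
    }

    public static void main(String[] args) {
        // Each class appears in its own list, as it would under OWL inference
        Map<String, List<String>> inferred = Map.of(
                "A", List.of("A", "B"),
                "B", List.of("B", "C"),
                "C", List.of("C"));
        DefaultMutableTreeNode root = build("A", inferred);
        System.out.println(root.getChildCount()); // prints 1
    }
}
```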

It has recently been pointed out to me that listSubProperties() in OntProperty does not behave the same way. The theory is the same - every property is a sub-property of itself - but the method does not automatically filter out the reflexive case. This is an accident of history: until now, very few users have requested that feature in OntProperty. But I can see the argument that the two list... methods are inconsistent in their behaviour.

Fortunately, there is an easy workaround, which applies to this case and indeed any other where filtering out the reflexive case would be handy (e.g. when listing equivalent classes). The iterator returned by listAnything in the Ont API is a Jena ExtendedIterator, which has a number of features including a filter hook. Calling filterKeep or filterDrop on an extended iterator returns a new iterator whose values are limited to those that match a given Filter object (or those that don't match, in the case of filterDrop). So to skip over the reflexive case, and not report that a property is its own sub-property, we do:

/** Filter that matches any single object by equality */
public class EqFilter implements Filter {
  private final Object m_x;
  public EqFilter( Object x ) { m_x = x; }
  public boolean accept( Object x ) { return m_x.equals(x); }
}

// in the application code:
OntModel m = ... the Jena model ... ;
OntProperty p = ... the property of interest ... ;
Filter reflex = new EqFilter( p );

ExtendedIterator subP = p.listSubProperties()
                         .filterDrop( reflex );
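
For readers who want to see the mechanics without a Jena installation, here is a small self-contained sketch of the same "drop what matches" idea over an ordinary java.util.Iterator. DropIterator and withoutSelf are names invented for this example, strings stand in for properties, and this is not how Jena's ExtendedIterator is actually implemented.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Plain-Java stand-in for the filterDrop idea; this is not Jena's ExtendedIterator. */
class DropIterator<T> implements Iterator<T> {
    private final Iterator<T> base;
    private final Predicate<T> drop;
    private T next;        // look-ahead element, valid only when ready is true
    private boolean ready;

    DropIterator(Iterator<T> base, Predicate<T> drop) {
        this.base = base;
        this.drop = drop;
    }

    @Override
    public boolean hasNext() {
        // Advance the underlying iterator until an element survives the drop test
        while (!ready && base.hasNext()) {
            T candidate = base.next();
            if (!drop.test(candidate)) {
                next = candidate;
                ready = true;
            }
        }
        return ready;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        ready = false;
        T result = next;
        next = null;
        return result;
    }

    /** Collects a (hypothetical) sub-property listing with the reflexive entry dropped. */
    static List<String> withoutSelf(String self, List<String> listed) {
        List<String> out = new ArrayList<>();
        new DropIterator<String>(listed.iterator(), self::equals).forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        // "p" plays the property of interest; it is listed among its own sub-properties
        System.out.println(withoutSelf("p", List.of("p", "q", "r"))); // prints [q, r]
    }
}
```

The filter is applied lazily, one element at a time, so nothing is copied until the caller actually walks the iterator.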

I don't know whether to change the default behaviour of listSubProperties. We generally like Jena to stick to the standards it is based on, in this case the OWL semantics. On the other hand, the point of the ontology API is to be a convenience layer on top of the raw RDF triples. Convenience is in the eye of the beholder. What I definitely don't want to do is add yet another Boolean flag to the method call. I'm open to suggestions!
