Tuesday, October 27, 2009

ISWC research track - raw notes 2

Graph based ontology construction from heterogenous sources - Boehm et al

Gene Ontology: 28K concepts, 42K relations; it takes human experts years to make a new release. Would like automatic ontology bootstrapping. Four steps: concept definition, concept discovery, relationship extraction, ontology extraction.

Contribution: combination of heterogenous information sources. Given a set of concepts and a large text corpus, create directed weighted concept graph, find a sub-graph that is consistent (cycle free), valid and balanced.

List of desirable topological properties, tree form, balance, etc.

Solution 1: greedy edge inclusion. Copy the nodes first, then copy edges one at a time, discarding any edge that would add a cycle.
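A minimal sketch of greedy edge inclusion as I understood it from the talk, not the authors' code. The function name `greedy_acyclic_subgraph`, the `(source, target, weight)` edge format, and the choice to consider edges in descending weight order are all my assumptions; the talk only said edges are copied one at a time and dropped if they close a cycle.

```python
def greedy_acyclic_subgraph(nodes, weighted_edges):
    """Return the edges kept by greedy inclusion (hypothetical helper).

    nodes: iterable of hashable concept names.
    weighted_edges: iterable of (source, target, weight) tuples whose
    endpoints all appear in `nodes`.
    """
    # Step 1: copy the nodes, with no edges yet.
    adjacency = {node: set() for node in nodes}
    kept = []

    def reaches(start, goal):
        # Iterative depth-first search over the edges kept so far.
        stack, seen = [start], set()
        while stack:
            current = stack.pop()
            if current == goal:
                return True
            if current in seen:
                continue
            seen.add(current)
            stack.extend(adjacency[current])
        return False

    # Step 2: try edges one at a time. Assumption: heaviest edge first.
    for source, target, weight in sorted(
        weighted_edges, key=lambda edge: edge[2], reverse=True
    ):
        # Adding source -> target closes a cycle exactly when target
        # can already reach source (self-loops are cycles too).
        if source == target or reaches(target, source):
            continue
        adjacency[source].add(target)
        kept.append((source, target, weight))
    return kept


if __name__ == "__main__":
    toy_edges = [("a", "b", 0.9), ("b", "c", 0.8), ("c", "a", 0.7), ("a", "c", 0.5)]
    print(greedy_acyclic_subgraph(["a", "b", "c"], toy_edges))
```

On the toy input, the `c -> a` edge is dropped because `a` already reaches `c` through `b`; the lighter `a -> c` edge is still kept because it closes no cycle. Note this only enforces the cycle-free property, not the balance or tree-shape properties from the list above.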

Solution 2: find a set of nodes that are strongly likely to be super-concepts of other concepts. Recursively add children, using a fan-out limit.
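A rough sketch of the second idea, under heavy assumptions of my own and not the authors' algorithm. Here an edge `(parent, child, weight)` means "parent is probably a super-concept of child"; root candidates are scored by total outgoing weight (a stand-in for however the paper selects its super-concept set); and `grow_hierarchy`, `num_roots`, and `fan_out` are hypothetical names.

```python
def grow_hierarchy(nodes, weighted_edges, num_roots=1, fan_out=2):
    """Pick likely super-concepts, then attach children recursively.

    Returns (roots, tree_edges), where tree_edges is a list of
    (parent, child) pairs. Each concept is placed at most once, so the
    result is a forest and therefore cycle free.
    """
    nodes = list(nodes)
    out_edges = {node: [] for node in nodes}
    for parent, child, weight in weighted_edges:
        out_edges[parent].append((weight, child))

    # Assumption: a node with high total outgoing weight is "strongly
    # likely" to be a super-concept of other concepts.
    def score(node):
        return sum(weight for weight, _ in out_edges[node])

    roots = sorted(nodes, key=score, reverse=True)[:num_roots]
    placed = set(roots)
    tree_edges = []

    def expand(parent):
        added = []
        # Strongest candidate children first, capped by the fan-out limit.
        for weight, child in sorted(out_edges[parent], reverse=True):
            if len(added) == fan_out:
                break
            if child in placed:
                continue
            placed.add(child)
            tree_edges.append((parent, child))
            added.append(child)
        for child in added:
            expand(child)

    for root in roots:
        expand(root)
    return roots, tree_edges


if __name__ == "__main__":
    concepts = ["animal", "dog", "cat", "bird", "puppy"]
    evidence = [
        ("animal", "dog", 0.9),
        ("animal", "cat", 0.8),
        ("animal", "bird", 0.7),
        ("dog", "puppy", 0.6),
        ("animal", "puppy", 0.2),
    ]
    print(grow_hierarchy(concepts, evidence))
```

With a fan-out of 2, "bird" is left out of the toy hierarchy even though it has a plausible parent. That is the price of the balance constraint, and it is why the choice of limit matters.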

Evaluation. Text corpus: PhenomicDB. Compared against the Mammalian Phenotype ontology. Weighted dominating set approach had highest precision.

[Author did not report on the human-acceptability of the auto-generated ontologies.]

Q: what's the basis of the desirability of the topological properties? A: introspection, based on the authors' own inspection. Comment from audience: tangled hierarchies can be shown to be better for browsing.

[other questions, rather hard to hear since there was no microphone]
