Graph based ontology construction from heterogenous sources - Boehm et al
Gene Ontology: 28K concepts, 42K relations; human experts take years to produce a new release. Goal: automatic ontology bootstrapping. Four steps: concept definition, concept discovery, relationship extraction, ontology extraction.
Contribution: combining heterogeneous information sources. Given a set of concepts and a large text corpus, create a directed, weighted concept graph, then find a subgraph that is consistent (cycle-free), valid and balanced.
List of desirable topological properties: tree form, balance, etc.
Solution 1: greedy edge inclusion. Copy the nodes first, then copy edges one at a time, discarding any that would add a cycle.
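[Sketch of how I understood greedy edge inclusion; not the authors' code. Assumptions of mine: edges are (parent, child, weight) triples and are tried in descending weight order (the talk notes do not record the ordering), and the function name is made up for illustration.]

```python
from collections import defaultdict


def greedy_edge_inclusion(nodes, weighted_edges):
    """Copy all nodes, then add edges one at a time, skipping cycle-forming ones.

    weighted_edges: iterable of (parent, child, weight) triples.
    Assumption: heavier edges are tried first.
    """
    children = defaultdict(set)  # adjacency of the graph built so far
    kept = []

    def reaches(src, dst):
        # Iterative depth-first search over the edges accepted so far.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(children[node])
        return False

    for parent, child, weight in sorted(weighted_edges, key=lambda e: -e[2]):
        # parent -> child closes a cycle exactly when parent is already
        # reachable from child (self-loops are trivially cycles).
        if parent == child or reaches(child, parent):
            continue
        children[parent].add(child)
        kept.append((parent, child, weight))

    return set(nodes), kept


edges = [("a", "b", 0.9), ("b", "c", 0.8), ("c", "a", 0.7), ("a", "c", 0.1)]
result_nodes, result_edges = greedy_edge_inclusion({"a", "b", "c"}, edges)
```

[In this toy input the edge c -> a is dropped because a already reaches c via b; the result is acyclic but not necessarily a tree, since c ends up with two parents.]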
Solution 2: find a set of nodes that are strongly likely to be super-concepts of other concepts. Recursively add children, subject to a fan-out limit.
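[Rough sketch of the root-then-expand idea, under my own assumptions: "strongly likely to be a super-concept" is approximated here by total outgoing edge weight, which may not be the authors' criterion; each concept gets at most one parent; the function name and the num_roots and fan_out parameters are hypothetical.]

```python
def expand_from_roots(weighted_edges, num_roots=1, fan_out=2):
    """Pick likely super-concepts as roots, then recursively attach children.

    weighted_edges: iterable of (parent, child, weight) triples.
    Assumption: a node's root score is the sum of its outgoing weights.
    """
    out = {}
    for parent, child, weight in weighted_edges:
        out.setdefault(parent, []).append((weight, child))
        out.setdefault(child, [])

    score = {node: sum(w for w, _ in kids) for node, kids in out.items()}
    roots = sorted(out, key=lambda n: (-score[n], n))[:num_roots]

    placed = set(roots)  # concepts already in the hierarchy
    tree = []            # accepted (parent, child) edges

    def add_children(node):
        picked = []
        # Strongest candidate children first; ties broken by name.
        for weight, child in sorted(out[node], key=lambda wc: (-wc[0], wc[1])):
            if child in placed:
                continue  # already has a parent, so no cycles or tangles
            if len(picked) == fan_out:
                break     # fan-out limit reached for this node
            placed.add(child)
            tree.append((node, child))
            picked.append(child)
        for child in picked:
            add_children(child)

    for root in roots:
        add_children(root)
    return roots, tree


edges = [
    ("animal", "dog", 0.9), ("animal", "cat", 0.8), ("animal", "poodle", 0.3),
    ("dog", "poodle", 0.7), ("cat", "dog", 0.2),
]
roots, tree = expand_from_roots(edges, num_roots=1, fan_out=2)
```

[With a fan-out of 2, "poodle" cannot attach directly under "animal" and ends up under "dog" instead, which is the kind of balancing effect the limit is meant to have.]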
Evaluation. Text corpus: PhenomicDB. Compared against the Mammalian Phenotype ontology. The weighted dominating set approach had the highest precision.
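[For reference, one plausible reading of "precision" here: the fraction of extracted parent-child edges that also appear in the reference ontology. This is my assumption; the talk notes do not record exactly how matches were counted. Function name and toy edges are made up.]

```python
def edge_precision(extracted_edges, reference_edges):
    """Fraction of extracted (parent, child) edges found in the reference."""
    extracted = set(extracted_edges)
    reference = set(reference_edges)
    if not extracted:
        return 0.0  # convention chosen here for an empty extraction
    return len(extracted & reference) / len(extracted)


extracted = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
reference = [("a", "b"), ("b", "d"), ("a", "e")]
precision = edge_precision(extracted, reference)
```

[Two of the four extracted edges occur in the reference, so precision is 0.5 for this toy input.]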
[Author did not report on the human-acceptability of the auto-generated ontologies.]
Q: What is the basis for the desirability of the topological properties? A: Introspection, based on the authors' own inspection. Comment from audience: tangled hierarchies can be shown to be better for browsing.
[other questions, rather hard to hear since there was no microphone]