Tuesday, May 22, 2007

SemTech conference session notes: a hole in the ground

Notes from A Hole in the Ground: 12,476 ways to describe an oil well, d'Armond Speers. IHS is an information aggregator for various industries, with lots of separate systems and applications. E.g. 68 info processing apps for the energy industry. Goal is to create a common data repository within the company (not a single container, but a consistent view), with a common data access API. There are very many proprietary and industry-standard data formats, and there has been some effort to come up with a common XML format. The relational model has many thousands of columns across hundreds of tables. The XML format has around 1300 elements. The data would equal roughly 200 billion triples.
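To make the triple estimate concrete: in a triple representation, each populated cell of a relational row becomes one (subject, predicate, object) statement. A minimal Python sketch follows; the table name, column names and identifier scheme are hypothetical and not taken from the talk.

```python
def row_to_triples(table, key_column, row):
    """Turn one relational row (a dict) into (subject, predicate, object) triples.

    The subject is built from the table name and the row's key value;
    each remaining non-null column yields one triple.
    """
    subject = f"{table}/{row[key_column]}"
    return [
        (subject, f"{table}#{column}", value)
        for column, value in row.items()
        if column != key_column and value is not None
    ]


# Hypothetical well record; the column names are illustrative only.
row = {
    "well_id": "W-001",
    "operator": "Acme Oil",
    "total_depth_m": 3200,
    "spud_date": None,  # empty cells produce no triple
}
triples = row_to_triples("well", "well_id", row)
for triple in triples:
    print(triple)
```

With thousands of columns and very large numbers of rows, the triple count grows roughly as rows times populated columns, which is how a figure in the hundreds of billions can arise.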

Archaeology: "sifting through the ruins" for insights into application formats. Anthropology: getting "tribal knowledge" from old hands. Total of 12,476 attributes in the collective models for describing features of an oil well. One problem is converting impoverished data from one region to the very rich model used by another region (e.g. company data, stratigraphy).

Aim to build a domain ontology to describe the common model. Tag the source data using the terms from the ontology. Some terms are common, many are very different. E.g: well codes in IRIS = 22, well codes in PIDM = 331. Many thousands of such lookup tables. Information can come from well operators, or indirectly through government agencies, but is not consistently identified, so it needs duplicate detection. Migrating the oil well model is 25% done.
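The tagging and duplicate-detection steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the code values, the `onto:` term names, the record fields and the matching rule are all invented here, and the talk did not describe how IHS implements either step.

```python
# Hypothetical lookup table: (source system, local code) -> shared ontology term.
# All codes and term names below are invented for illustration.
CODE_TO_TERM = {
    ("IRIS", "OIL"): "onto:OilWell",
    ("IRIS", "GAS"): "onto:GasWell",
    ("PIDM", "O-PRD"): "onto:OilWell",
    ("PIDM", "G-PRD"): "onto:GasWell",
}


def tag_record(system, record):
    """Return a copy of the record tagged with its source and ontology term.

    The term is None when the local code has no mapping yet.
    """
    tagged = dict(record)
    tagged["source"] = system
    tagged["term"] = CODE_TO_TERM.get((system, record["well_code"]))
    return tagged


def duplicate_key(record):
    """Crude matching key: alphanumeric lowercase name plus rounded coordinates.

    Rounding to 3 decimal places is an assumed tolerance (roughly 100 m);
    wells that straddle a rounding boundary will be missed.
    """
    name = "".join(ch for ch in record["name"].lower() if ch.isalnum())
    return (name, round(record["lat"], 3), round(record["lon"], 3))


def find_duplicates(records):
    """Group records sharing a matching key; return groups with more than one member."""
    groups = {}
    for rec in records:
        groups.setdefault(duplicate_key(rec), []).append(rec)
    return [group for group in groups.values() if len(group) > 1]


# The same well reported by an operator feed and by an agency feed (made-up data).
a = tag_record("IRIS", {"name": "Smith #1", "well_code": "OIL", "lat": 29.76041, "lon": -95.36981})
b = tag_record("PIDM", {"name": "SMITH 1", "well_code": "O-PRD", "lat": 29.7604, "lon": -95.3698})
c = tag_record("PIDM", {"name": "Jones 2", "well_code": "G-PRD", "lat": 31.0, "lon": -100.0})

dups = find_duplicates([a, b, c])
print(len(dups), [rec["source"] for rec in dups[0]])
```

Once both local codes resolve to the same ontology term, records from different systems become comparable, which is what makes a matching key like this usable across sources.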

Looking to apply SOA to customer applications. Anticipate a need for semantics exposed in the SOA. Some customers already mine data from delivered output to do their own integration and processing.

Started with Protege, but have now moved to using internally developed ontology tools. Open question: how to deal with terms that are "doctrinal" rather than objectively factual. E.g: what does "deep" mean in different localities?
