Sunday, May 27, 2007

Semantic Technologies 2007 wrap-up

Back home now after Semantic Technologies 2007. It was a good few days, instructive on different levels. The two Jena sessions I facilitated went reasonably well from my point of view, despite a rather shaky start to the tutorial (I was nervous, and the preparation I thought I'd done enough of turned out to be insufficient ... I'll know better next time). I've yet to receive formal feedback from the conference organizers, which I'd expect to come in the next few weeks. The delegates are asked to fill in evaluations for each session.

This was my second year at ST, and both times it was a very well organized event. The days were perhaps a little on the long side. Sessions started at 8.00 or earlier, and continued to around 6 pm. On the plus side there were lots of breaks, which was a good chance to meet and mingle. The venue (Fairmont Hotel) is very nice and the food was actually quite good.

The conference sessions I attended were a little mixed, but included some very good talks. It's interesting contrasting this conference, which essentially has a business orientation, with the more academic conferences like ISWC. The style of presentation is different, lacking the organising pattern of 'problem-hypothesis-solution-validation' but not necessarily worse for that. I also think that I missed some good talks (there were eight parallel tracks): somehow I need to get better at mining good candidates out of the conference programme. The talks were generally longer than the common academic pattern of twenty-minutes-plus-five-for-questions which had an interesting effect: most presenters left lots of time for questions. These question sessions were often the best bits of the talks.

There was a trade show too, with (at a guess) around 25 exhibitors, from large companies (Oracle) to small one-or-two person outfits. Lots of tools, including ontology development (e.g. TopBraid Composer, SandPiper) and semantic application development (Metatomix, TopBraid). Quite a few approaches to extracting structured semantics from unstructured text sources. Being a geek (or rather, a technically-minded person), I gravitated towards the more techy-looking stands. The good ones were very good at explaining their pitch, while at others (who shall be nameless) I met some marketing person with that rabbit-in-the-headlights smile who answered all my questions with "you need to talk to the technical guy". Sigh. Overall I think the trade show was larger and more active than last year. There was some very good technology and tools, but no applications that made my jaw drop. Or even hinge downwards a bit.

There were several panel sessions, one of which I missed most of due to illness. On the whole, they were well organized but not very informative. I think they made a mistake in making the panels too large, which meant that each panelist had too little time to develop much of a theme. The investors and analysts panel was especially disappointing: I didn't get much insight there at all. I had been looking forward to hearing Nova Spivack speak, but he didn't get much air-time from the moderator and didn't say anything about Radar Networks except "we're in stealth mode".

Would I go again? Yes. Let's hope Wilshire invite me back next year!

Now with added RDF

Finally got around to configuring FeedBurner to deliver RDF (i.e. RSS 1.0) from this blog. Sorry it took so long!

Wednesday, May 23, 2007

SemTech conference session notes: keynote panel building the semantic industry

Notes from Building the Semantic Technology Industry: A Conversation with Entrepreneurs and Investors.

Safa Rashtchy

Analyst with Piper Jaffray (10 years). Consumer behaviour is changing. Google makes it easy to find things with minimal effort (avg. 1.2 keywords per search). People becoming "lazy"? Do consumers really give much knowledge in searches? Barriers/priorities: wave will be consumer web, money will come from advertising. People are too used to getting things for free. New effect is smart matching of advertisers to users. But advertising mustn't be too much in your face.

Russell Glass

Zoominfo semantic search engine. Drivers? Consumers don't care about semantic web. But are voracious consumers of content. Would require thousands of people to create the content that we can automatically aggregate today. Barriers/priorities: disagree about advertising only. Trends suggest ad/subscription hybrids are growing. Rich semantic models allow businesses to determine how to partition business model between subscription and advertising.

Mark Greaves

Vulcan Inc - venture capital fund founded by Paul Allen. Drivers? Data integration is a long-term need. Maybe we'll be successful this time! Web 2.0 is the first computer architecture that came from the people, not from the enterprise. People are social animals, want to be heard and have their disparate needs met. Consumer drivers will accelerate past enterprise.

Jamie Taylor

Metaweb - producer of Freebase. Responsible for community building. Drivers? Have to reduce cost or increase value. Mash-ups increase cost because they are disorganised. Data will become more organized to decrease cost. Open data is the key. Data as a community good. Semantic technology will sneak in on the back. Barriers/priorities: long tail - can micro-apps have value to other people? Change in mindset: open my data, brings value both to other people and therefore back to me.

Mills Davis

Project10X. Facilitating. Has a report on the state of the market available from the semtech web site. Questions: 1. What are the drivers? 2. What are the barriers and priorities?

Bradley Allen

Founder of Siderean (formerly of Inference Corp). Drivers: lots of information, vast growth. Need metadata to manage data overload. Barriers/priorities: are consumer and enterprise models merging?

Nova Spivack

Radar Networks. User facing semweb app, still in stealth mode. Derives from semantic desktop research. How can we integrate our various scattered filesystems (email, desktop, laptop, online, etc)? Drivers? Web 3.0 - coming decade of web innovation. Cyclical pattern. Pre web-1.0, making the pc usable. Web 1.0 - backend innovation. Web 2.0 - front end, wisdom of crowds etc. Web 3.0 = "dataweb". Mainstream adoption of semweb is still several years away. Advanced reasoning, OWL etc, is web 4.0? Agents etc - a decade away.

Eghosa Omoigui

Intel Capital. Largest VC company in the world. 27 countries. Has three focus areas: consumer internet, search, semantic technology (note: not nec. semantic web). Was "underwhelmed" by web 2.0 expo. Drivers? Web is becoming a social medium - making friends on the internet not in real space. Does this raise expectations? Want everything at high speed. What we do has to fit in with what people are doing anyway. Barriers/priorities: unintended consequences. Security/privacy? Who will see my open data and what will/can they do with it?

comments/questions from audience

Shouldn't the first wave of development be moving existing apps to the new platforms? Where are the semantic versions of existing apps? Nova: it's still very expensive to build apps - the tools just aren't there yet. Scalability. James: what's the LAMP for semantic apps? People do component replacement, e.g. semantic stores.

Is there a land grab in the semantic web? What is it? Also, what education needs to happen? Mark: land to be grabbed is content and ontologies. Content is king. Zoominfo and Freebase are building presence in knowledge about people and semi-structured content. Nova: land-grab for user's attention. Bradley: land-grab for vocabulary. FOAF etc. Education - we need to understand what's essential and what is not. Put aside reasoning, focus on block-and-tackle issues like basic vocabulary. Mills: land-grab for executable knowledge.

Comment: actually the government has all the data you need (from a government representative!)

SemTech conference session notes: semantic query

Notes from conference session Semantic Query: Solving the Needs of a Net-Centric Data Sharing Environment, Matt Fisher and Mike Dean. Want to pull in information from multiple sources, federated queries. Want to deliver information as a single response, timely, trustworthy, from all relevant sources. No human assistance, don't want to have to have intimate knowledge of the data source. Data spread over more than a single repository (db, excel, files, local access db), or in multiple formats - maybe proprietary. Traditional solutions: data warehousing, multi-dimensional databases, business intelligence approaches. Risks replicating the problems but on a larger scale.

Asio distributed query solution. Bridges to existing sources, initially relational db's and soap web service endpoints. Use swrl rules to map from domain ontology to data-source ontology. For RDB's, use D2RQ to map the db schema to an ontology. Semantic query decomposition - determine which db to send which parts of the query to. Not addressing "data deconfliction" in this project - mapping rules determine the golden data source for a given query element.
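To make the decomposition step concrete for myself, here is a minimal sketch of routing query elements to their golden data source. This is not Asio's implementation: the predicate names, the source names and the `decompose` function are all hypothetical, and a plain dictionary stands in for the SWRL mapping rules.

```python
from collections import defaultdict

# Hypothetical mapping from a query predicate to its "golden" data source.
# In the talk this role is played by mapping rules; a dict stands in here.
GOLDEN_SOURCE = {
    "ex:name": "personnel_db",
    "ex:rank": "personnel_db",
    "ex:location": "tracking_service",
}


def decompose(patterns, golden_source=GOLDEN_SOURCE):
    """Group (subject, predicate, object) patterns by the source that answers them."""
    plan = defaultdict(list)
    for s, p, o in patterns:
        if p not in golden_source:
            raise KeyError(f"no data source mapped for predicate {p}")
        plan[golden_source[p]].append((s, p, o))
    return dict(plan)


# One federated query, split into one sub-query per source.
query = [
    ("?person", "ex:name", "?name"),
    ("?person", "ex:location", "?where"),
]
plan = decompose(query)
```

Each sub-query would then be sent to its source and the results joined on the shared variable (`?person` here); the join is left out of the sketch.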

Use SWRL to map individuals between datasources. Translate SWRL rules to Jena rules via SweetRules. Automapper uses JDBC to introspect the db schema. Each table becomes an OWL class. Columns become properties. Based on D2RQ, not using :join or :AdditionProperty, but added :constraint.
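The table-to-class, column-to-property idea can be sketched in a few lines. This is only an illustration under assumptions: it introspects an in-memory SQLite database rather than going through JDBC, the `automap` function and the example.org namespace are hypothetical, every column is treated as a datatype property, and the `owl:` and `rdfs:` prefixes are assumed to be declared elsewhere.

```python
import sqlite3


def automap(conn, base="http://example.org/schema#"):
    """Return Turtle-style lines: one OWL class per table, one property per column."""
    lines = []
    # Collect table names first so the schema query is finished before the PRAGMAs run.
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        lines.append(f"<{base}{table}> a owl:Class .")
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        for _cid, column, _type, *_rest in conn.execute(f"PRAGMA table_info({table})"):
            lines.append(
                f"<{base}{table}.{column}> a owl:DatatypeProperty ; "
                f"rdfs:domain <{base}{table}> ."
            )
    return lines


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
triples = automap(conn)
```

A fuller mapper would presumably turn foreign keys into object properties between classes; that is omitted here.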

SBWS - Semantic Bridge for Web Services. General tendency for web services to mediate access to data sources. Gives data owners more control, comfort, but makes it hard to understand the schema and hence generate the mappings. SBWS is being (will be?) adapted to include REST, using WADL to describe REST-ful services.

SemTech conference session notes: semantic user experience

Notes from Semantic User Experience, Ross Centers, James Huckenpahler, Rob Bauman. General discussion of user experience design. Claim: semantic technologies will make it easier to maintain a brand more consistently. Web 2.0 - perpetual beta is a value. Constant updating and development, faster release cycles. So what's the role for interaction design? Designers will need to think about how components/widgets will fit in other contexts. Claim (not expanded): semantics will make it easier to build mashups.

Tags vs ontologies (built by anyone/crowds vs. built by "teams of dwarves locked in mines"). Could wiki bridge the gap? Semantic wiki as a way for crowds to refine a shared ontology. E.g: Semantic Assistants - Research Assistant (screenshot of an apparent product, but not visible via google).

Wish list for designer:

  • ontology of visual representations
  • ontology of user interaction
  • semantically-enabled design tools - label the affordances of designed elements ("this button does this ... ")
  • a detailed model of the user to relate semantic products to

Rob Bauman - "world's first semantic game". Treasure Hunt. Built on ontologies, 2FTE for 13 months (1 engineer and 1 modeler). Budget CN$300K (I originally wrote CN$300M; thanks for the correction Rob). Built a reusable game/simulation platform. Ontologies for game play, economics, 3d modelling, game resource, ... others. Technologies: visual knowledge, open croquet. Built a model of croquet using an ontology representation, so that the game engine can drive the 3d engine. The user interface shows connections between concepts and "agents" representing active behaviours. Includes decisions and inference steps. Claim that it will scale to tens of millions of agents.

SemTech conference session notes: migrating from relational world to semantics

Notes from Migrations: Moving From a Relational World to a Semantic One, Barbara McGlamery. Case study: redesign of web site for Entertainment Weekly. Wanted to make more use of data collected, and make more scalable. Category Tool, year 2000, successful but not flexible (very hierarchical categories) and not scalable. Topics - semantic web tool 2005, built on realsimple.com.

Category Tool: 78K categories, plus relations, built on Vignette story server. Everything is a category (both classes and instances). Topics: built on Sybase DB. RDF and OWL. More flexible structure, represent individuals and relationships directly in OWL. Goals include to make the data more portable, poly-hierarchy (Will Smith is an actor and a musician), and support multipart relationships.

Ontology design driven by business needs and application, but not re-using existing ontologies. Ontology design issues

  • dates would sometimes be imprecise (May 12 2007 vs. Spring 2007)
  • multipart relationships, e.g. Critic Owen gave grade B to movie M. Used specific properties e.g. :gaveGradeA, :gaveGradeAMinus
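The multipart-relationship choice in the list above can be sketched as plain triples. The first function follows the specific-property approach described in the talk; the second shows the usual alternative of an intermediate review node, which was not mentioned in the talk and is included only for contrast. All function names, and the `:critic`, `:grade` and `:movie` properties, are hypothetical.

```python
def grade_property(grade):
    """Encode the grade in the property name, e.g. 'A-' -> ':gaveGradeAMinus'."""
    suffix = grade.replace("+", "Plus").replace("-", "Minus")
    return f":gaveGrade{suffix}"


def as_specific_property(critic, grade, movie):
    """One triple per review; the grade lives in the property name."""
    return [(critic, grade_property(grade), movie)]


def as_review_node(critic, grade, movie, node="_:review1"):
    """Alternative (not from the talk): an intermediate node holds all three parts."""
    return [
        (node, ":critic", critic),
        (node, ":grade", grade),
        (node, ":movie", movie),
    ]


specific = as_specific_property(":Owen", "B", ":M")
reified = as_review_node(":Owen", "B", ":M")
```

The specific-property form keeps queries short but needs one property per grade; the node form needs three triples per review but leaves room for extra parts such as a review date.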

Also data clean-up problems. Homonyms, relationships in the wrong direction (film is lead in actor). Done manually by interns using spreadsheets!

Did the migration in three phases: development, QA and complete. Each phase had to correct mistakes or add cleanup, but still very manual. Other sites in Time web presence now also use the same named entities.

SemTech conference session notes: semantic SOA

Notes from Semantic SOA: Aligning IT with Business Operations, Larry Lafferty. SSOA - consortium of companies to provide a semantic services framework. Autogeneration of forms from soa interfaces. Integrates various vendor components: Siderean, AgentLogic (event distribution), Kapow ("mash-up service"), ISL, SoftPro CommandLink ... (I missed some).

Claimed an insight that workflows should include human interaction, rather than be just machine based. Built two demos: pilot recovery planning for downed plane, and information fusion for identifying suspicious activity in shipping. Some of the info for the info fusion demo comes from Google and Wikipedia!

Only partial semantic descriptions of services to date. Need a process editor for end-users to create their own workflows. How well can users really cope with the complexity of real workflows?

Question from the audience: you described this talk as being about Semantic SOA, but you didn't talk about OWL-S, SAWSDL, ... , etc. Can you talk about those now? Answer: no I can't, sorry.

Tuesday, May 22, 2007

SemTech conference session notes: building the practical semantic web

Notes from Building the Practical Semantic Web With Focus on Reasoning, Lars Hard. semweb presents barriers to web workers - heavyweight documents and standards, albeit that they provide a good foundation. We need better tools to hide the complexity from ordinary developers. Other barriers include:

  • creating rdf ontologies is too hard
  • automated knowledge extraction is not usually possible
  • "usable" reasoning, scale and complexity of computation
  • can be hard to show value over non-semantic approaches

Most semweb examples are corporate and dull. Not enough cool apps for general audience (cue a list of the usual suspects).

Need new tools - simple and fun. One click publishing, includes a SOAP interface. Target demographic: 15-17 year olds! Example based programming, feeding a machine learning algorithm. Strong growth will come from many networked small-scale applications.

Example applications: tyre recommendation based on slider inputs for price, performance, etc. Wii game selection based also on slider inputs, or on games that are similar/related. Increase degree of discovery to raise the number of games in the recommendation set. FelixGames.com - flash games vertical search.
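A slider-driven recommender of this kind could be as simple as a weighted sum. To be clear, this is my own minimal sketch and not the speaker's method (the talk mentioned machine learning): the `recommend` function, the attribute scores and the tyre names are all invented for illustration, and `top_n` plays the part of the "degree of discovery" control.

```python
def recommend(items, sliders, top_n=3):
    """Rank items by the slider-weighted sum of their attribute scores (0..1)."""
    def score(item):
        return sum(weight * item["scores"].get(attr, 0.0)
                   for attr, weight in sliders.items())
    # A larger top_n returns a bigger recommendation set.
    return sorted(items, key=score, reverse=True)[:top_n]


# Invented example data: higher score means better on that attribute.
tyres = [
    {"name": "budget", "scores": {"price": 0.9, "performance": 0.3}},
    {"name": "sport", "scores": {"price": 0.2, "performance": 0.95}},
    {"name": "all-round", "scores": {"price": 0.6, "performance": 0.6}},
]

price_first = recommend(tyres, {"price": 1.0, "performance": 0.2}, top_n=1)
speed_first = recommend(tyres, {"price": 0.2, "performance": 1.0}, top_n=2)
```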

SemTech conference session notes: relational navigation

Notes from Relational Navigation: Delivering on the Promise of a Semantic Web, Bradley Allen. Missed the intro section. The user experience: context, relationships, participation. Focus on emerging de facto standard ontologies like foaf, dc, etc. "Unanticipated queries" is the key differentiator with standard navigation patterns. Scalability and integration are key issues: 10^9 triples. The question for me, though, is not can we scale the underlying store but can we scale the UI? No interaction design can cope with trying to show more data than the user can comprehend, so how do we get users to see the forest instead of the trees?

Siderean has twenty-six customers to date: media, federal, aerospace/defence, ... Helping people know what's available to them.

SemTech conference session notes: machine-to-machine intelligence m2mi

Notes from The Semantics of Simplicity - M2M Intelligence and Complex Adaptive Systems, Geoff Brown. Can we withstand events like Katrina - what happens when civilian and command-and-control infrastructure collapses? Can we build a really resilient infrastructure that can withstand such events? Thesis: the OSI seven-layer stack is brittle and static; we want more flexibility about the locus of control, and new layers. Specifically, an eighth layer, valued information at the right time (VIRT), and a zeroth layer, m2mi. Supposedly, this will allow a machine to reconfigure the stack and protocols dynamically.

I found it all very high level, abstract, and not very convincing [note: edited to improve the tone].

SemTech conference session notes: a hole in the ground

Notes from A Hole in the Ground: 12,476 ways to describe an oil well, d'Armond Speers. IHS is an information aggregator for various industries, with lots of separate systems and applications. E.g. 68 info processing apps for energy industry. Goal is to create a common data repository within the company (not a single container, but a consistent view), with a common data access API. There are very many proprietary and industry standard data formats; there has been some effort to come up with a common XML format. Relational model with many thousands of columns, hundreds of tables. XML format has around 1300 elements. Would equal roughly 200 billion triples.

Archaeology: "sifting through the ruins" for insights into application formats. Anthropology: getting "tribal knowledge" from old hands. Total of 12,476 attributes in collective models for describing features of an oil well. Problem to convert impoverished data from one region to very rich model used by other region (e.g. company data, stratigraphy).

Aim to build a domain ontology to describe common model. Tag the source data using the terms from the ontology. Some terms are common, many are very different. E.g: well codes in IRIS = 22, well codes in PIDM = 331. Many thousands of such lookup tables. Information can come from well operators, or indirectly through government agencies, but is not consistently identified. Needs duplicate detection. Migrating the oil well model is 25% done.
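Tagging source data with ontology terms amounts to a lookup keyed on both the source system and its local code. A minimal sketch, assuming nothing about how IHS actually does it: the codes, the `ont:` terms and the `tag` function are all invented for illustration, and only the system names IRIS and PIDM come from the talk.

```python
# Invented lookup: (source system, local well code) -> shared ontology term.
CODE_TO_TERM = {
    ("IRIS", "OIL"): "ont:OilWell",
    ("PIDM", "O-PROD"): "ont:OilWell",
    ("PIDM", "O-ABD"): "ont:AbandonedOilWell",
}


def tag(source, code, table=CODE_TO_TERM):
    """Return the ontology term for a source-specific code, or None if unmapped."""
    return table.get((source, code))


def same_kind(a, b):
    """True when two (source, code) pairs map to the same ontology term."""
    term_a, term_b = tag(*a), tag(*b)
    return term_a is not None and term_a == term_b
```

With many thousands of lookup tables, the unmapped case (`None`) is the interesting one: it marks where a human still has to decide what a code means.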

Looking to apply SOA to customer applications. Anticipate a need for semantics exposed in the SOA. Some customers already mine data from delivered output to do their own integration and processing.

Started with Protege, but have now moved to using internally-developed ontology tools. How to deal with terms that are "doctrinal" rather than objectively factual. E.g: what does "deep" mean in different localities?

SemTech conference session notes: related search using semantics

Semantic Technology Conference session: Related Search using Semantics: A Case Study from CNET, Tim Musgrove. CNet is the 10th largest global web site, with many web brands (news.com, shopping.com, etc). Collaborative filtering (CF) recommendations can throw up anomalous recommendations (e.g. see also 'ipod' when searching for 'hp laptop'). Click-through rate for alternative searches from CF is about 3%, but coverage is only about 4% of incoming searches. The problem is lack of data for statistical approaches. Have to use the whole query, since word-by-word query decomposition introduces too much ambiguity.

Word sense disambiguation, e.g. "ultralight" to "ultraportable", is an alternative to CF. Case study with CNet: integration took one day with one engineer. Has been up on the CNet site for 10 days, so only early untuned performance. Results: 3.9% CF coverage, semantic equivalence (SE) 19.1%. Click-through for SE was only slightly lower, but the best results for click-through and coverage came when the methods were combined.
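The talk did not say how the two methods were combined, so the following is only one plausible arrangement, sketched under assumptions: use the CF suggestion when the whole query has one, otherwise fall back to a semantically equivalent rewrite. The `related_searches` function and both lookup tables are hypothetical, and a naive synonym dictionary stands in for real word sense disambiguation.

```python
def related_searches(query, cf_table, synonyms):
    """Return (method, suggestions): CF on the whole query first, SE as a fallback."""
    # CF is keyed on the whole query, never on individual words.
    if query in cf_table:
        return "cf", cf_table[query]
    # Naive stand-in for word sense disambiguation: swap known equivalent terms.
    terms = query.split()
    rewritten = [synonyms.get(term, term) for term in terms]
    if rewritten != terms:
        return "se", [" ".join(rewritten)]
    return "none", []


# Invented example data.
cf_table = {"hp laptop": ["dell laptop"]}
synonyms = {"ultralight": "ultraportable"}

from_cf = related_searches("hp laptop", cf_table, synonyms)
from_se = related_searches("ultralight notebook", cf_table, synonyms)
uncovered = related_searches("toaster", cf_table, synonyms)
```

Coverage is then the share of queries for which the method is not "none", which is why adding the SE fallback raises it.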

Adding named entities to the lexical background knowledge could increase coverage, allow lowering of the quality threshold. Next step: expand search using hypernym, hyponym, etc.

Saturday, May 19, 2007

Vegetarian dining in the SF Bay Area

I'm in the Bay Area this week, preparing to give a Jena tutorial at the Semantic Technologies 2007 Conference in San Jose next week. While I was visiting with colleagues at HP Labs, the most excellent Brett Bausk put together a Google map of vegetarian and vegan restaurants in the locality. What a cool way for people with the kind of knowledgeable insight that Brett has to share it with other folks. Ace!

Thank you Brett.

del.icio.us: vegetarian, dining, restaurant, san-francisco, bay-area.