Saturday, November 28, 2009

Great article: learning from hostage negotiators

Really interesting article on Boxes and Arrows: what design researchers can learn from hostage negotiators. I've used something akin to the coach role in user studies that I've conducted in the past, though in our case the coach also acted as note-taker, freeing the lead interviewer to listen carefully to the interviewee without being distracted by getting the key points down on paper. Nonetheless, an essential part of the note-taker's role was to notice if the interviewer had missed some interesting avenue to follow, and to prompt (without taking over the dialogue).

del.icio.us: user-studies, best-practice.

Wednesday, November 04, 2009

Moving along

Friday October 30th was my last working day at HPLabs after 20+ years with HP, most of that in the research labs. I'm not going to introspect too much on the event itself – the reasons for the large-scale changes in staffing are for HPL management, not me, to discuss. Suffice to say that Jena will continue, and indeed become more open, and the current Jena team members will continue to contribute to the platform, albeit from different host organizations. For me personally, alongside a number of ex-HP colleagues, I'll be moving to a new Linked Open Data startup named Epimorphics. More details on what that involves in due course! However, after a long time in corporate R&D for a very large organization, I'm very much looking forward to working for a company that is smaller (but with big ambitions) and more agile.

In the meantime, it does mean that I can no longer be reached on my old email address: ian.dickinson@hp.com. For people who used to use that address, please update your contacts list to point to i.j.dickinson@gmail.com.

Wednesday, October 28, 2009

ISWC research track - raw notes 4

Lifting events in RDF from interactions with annotated web pages - Stuhmer et al

Want to model complex events - multiple mouse clicks. Use case: online advertising - contextual advertising vs. behavioural advertising. Contextual, e.g. AdSense, is based on IP address etc. Behavioural - based on the history of the user's web pages, using cookies or web bugs.

Drawbacks: context - similarity matching not robust. Behavioural - old history may not be relevant. To remedy: build complex events as short-term profiles, model these in an OWL ontology. Schema seems to encode a basic ontology of event expressions: conjunction, sequence, etc. Simple events: DOM (incl. mouse clicks) and clock.

Add some context to simple events - more than just the DOM location (that would be just syntax). Annotate pages with RDFa, and use those annotations to enrich simple events. If the event happens on a node which is an RDF subject, use that to constrain the choice of subjects. Otherwise, go up the DOM tree to the dominator node, which may be the document root.
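A sketch of how that lookup might go (my own illustration in Java, not the authors' code; I'm assuming RDFa's about/typeof attributes are what mark the subject nodes):

import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Sketch only: walk from the event's target node up the DOM tree until we
// find an element carrying an RDFa subject (an 'about' or 'typeof' attribute
// here); if nothing is found, the document root acts as the dominator node.
public class RdfaSubjectFinder {
    public static Element nearestSubject( Node eventTarget ) {
        for (Node n = eventTarget; n != null; n = n.getParentNode()) {
            if (n instanceof Element) {
                Element e = (Element) n;
                if (e.hasAttribute( "about" ) || e.hasAttribute( "typeof" )) {
                    return e;
                }
            }
        }
        return null;  // no annotation anywhere on the path to the root
    }
}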

Contributions: the technical implementation, and the event model itself.

Server side event processing - not done yet. [which makes it hard to see what the value is, since they don't illustrate the interpretation of the events]

ISWC - SPARQL WG panel

The WG has just published a first set of six working drafts that indicate what's coming in SPARQL 1.1. Caveat: no decisions yet, just indications. Naming: SPARQL 1.1 query, update and service description. Picked about 10 out of about 50 proposed extensions. Stable by Spring '10, completed in Aug '10.

Highlights of the drafts:

  • Project expressions - select something in a query that is not just a simple variable.
  • Aggregates - min, max, count, etc.
  • Subqueries - embed one query in another.
  • Negation - SPARQL 1.0 makes it difficult to ask what is not known; fix this in 1.1.
  • Service description - a language for describing the common extensions in a given SPARQL endpoint.
  • Update language - 1.0 is read-only; there is a member submission of a rough draft of an update language, which will be in 1.1.
  • Update protocol - use of HTTP POST for update; map basic RDF operations to core HTTP operations (RESTful RDF via SPARQL).

The following are 'time permitting', and will be done if there is time:

  • Property paths - arbitrary-length paths through the graph; regex-like expressions.
  • Basic federated query - based on ARQ's SERVICE keyword.
  • Entailment regimes - what it means to query SPARQL in the face of RDFS or OWL, or RIF rulesets.
  • Common functions - commonly used built-ins.
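To make a few of those concrete, here is the flavour of query the drafts describe, run through ARQ's extended syntax. This is only a sketch: the 1.1 grammar is not final, and the prefix and data here are invented.

import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.Syntax;

public class Sparql11Taster {
    public static void main( String[] args ) {
        // Exercises three of the proposed features: a project expression
        // (COUNT ... AS), aggregates with GROUP BY, and negation.
        // ARQ's extended syntax has been tracking the WG drafts, but treat
        // the exact spelling of NOT EXISTS as provisional.
        String q =
            "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n" +
            "SELECT ?person (COUNT(?friend) AS ?nFriends)\n" +
            "WHERE {\n" +
            "    ?person foaf:knows ?friend .\n" +
            "    NOT EXISTS { ?person foaf:status \"inactive\" }\n" +
            "}\n" +
            "GROUP BY ?person";
        Query query = QueryFactory.create( q, Syntax.syntaxARQ );
        System.out.println( query );
    }
}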

Slides here.

Brief intro from WG members. Moved on to Q&A, but I had to leave to check out of the hotel.

ISWC Tom Mitchell keynote - raw notes

How will we populate the semantic web on a vast scale? - Tom Mitchell keynote

Three answers: humans will enter structured info; database owners will publish; computers will read unstructured web data.

Read the Web project. Inputs: initial ontology, handful of training examples, the web (!), occasional access to a human trainer. Goals: (1) system running 24x7, each day extract more facts from the web to populate the initial ontology, (2) each day learn to perform #1 better than the day before.

Natural language understanding is hard. How to make it more plausible for machines to read? Some ways:

  • Leverage redundancy on the web (many facts are repeated often, in different forms)
  • Target reading to populate a given ontology, restricting the focus of attention
  • Use new semi-supervised learning algorithms
  • Seed learning from Freebase, DBpedia, etc.

State of the project today: ontology of 10^2 classes, 10-20 seed examples of each, 100 million web pages. Running on the Yahoo M45 cluster. Examples include both relations and categories.

All code is open-source, available on web site. Currently XML, working on RDF.

Impressive demo of determining academic fields: 20 input examples, looked like hundreds of learned examples, good quality results. Output includes the learned patterns and the alternate interpretations considered. Approx 20K entities, approx 40K extracted beliefs.

Semi-supervised learning starts to diverge after a few iterations: under-constrained. Remedy: make the task apparently more complex by learning many classes and relations simultaneously. This adds constraints. Unlabeled examples become constraints. Nested, coupled constraints. "Krzyzewski coaches for the Devils": have to simultaneously classify the coach name and the team name.

"Luke is mayor of Pittsburgh" - learn functions for classifying Pittsburgh as a city based on (a) "Pittsburgh" and separately (but coupled) (b) "Luke is mayor of"

Information from the ontology provides constraints to couple classifiers together, e.g. disjointness between concepts. It also provides for consistency of arguments in noun phrases (domain and range constraints).

Coupled bootstrap learner. Given ontology O and corpus C. Assign positive and negative examples to classifiers (e.g. cities are negative examples of teams). Extract candidates (conservatively), filter, train instance and pattern classifiers, assess, promote high-confidence candidates, and share examples back to coupled classifiers using the ontology (including using the subsumption hierarchy).
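My reading of that loop, as a skeletal Java sketch - the names, structure and placeholder scoring are mine, not the authors':

import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of one iteration of coupled bootstrapping for a single category.
// The coupling: anything already promoted for a disjoint category counts
// as a negative example, so the classifiers constrain each other.
public class CoupledBootstrap {
    Map<String, Set<String>> promoted = new HashMap<String, Set<String>>();
    Map<String, Set<String>> disjointWith = new HashMap<String, Set<String>>();

    void iterate( String category, Collection<String> candidates, double threshold ) {
        Set<String> negatives = new HashSet<String>();
        if (disjointWith.containsKey( category )) {
            for (String other: disjointWith.get( category )) {
                if (promoted.containsKey( other )) {
                    negatives.addAll( promoted.get( other ) );
                }
            }
        }
        if (!promoted.containsKey( category )) {
            promoted.put( category, new HashSet<String>() );
        }
        for (String candidate: candidates) {
            if (negatives.contains( candidate )) continue;      // ruled out by coupling
            if (confidence( candidate, category ) >= threshold) {
                promoted.get( category ).add( candidate );      // promote conservatively
            }
        }
    }

    double confidence( String candidate, String category ) {
        return 0.0;  // placeholder: instance/pattern classifier scores go here
    }
}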

Rather than focussing on single sentences in single docs, the system looks across many sentences for co-occurrence statistics. Macro-read many documents, rather than micro-reading a single document.

Example of IBM learned facts. Rejected candidates might be good input to a human doing manual ontology design.

If some coupling is good, how to get even more? One answer: look at html structure, not just plain text. If some cars are li elements in a list, then likely the other li's are cars as well. PhD student Richard Wang at CMU - SEAL system. Combine SEAL and CBL. Combined system generally gets good results, though performance is poor in some categories (e.g. sports equipment). To address performance issues, extend ontologies to include nearby but distinct categories.

System runs for about a week, before needing restart. Some categories saturate fairly quickly.

Want a system that learns other categories of knowledge. Tries to learn rules by mining the extracted KB. Need positive examples - get from KB. Where to get negative examples? Not stored in KB. Get help from ontology. For restricted cardinality properties (e.g. functional), can infer negative examples.

Examples of learned rules - conditional Horn clauses with weights. Showed some of the failed rules as well, e.g. skewed results due to partial availability of data. Good rules can be very useful, but bad rules are very bad - need human inspection to filter out the bad rules.

Future work: add modules to the inner loop of the extractor, e.g. use the morphology of noun phrases. Making independent errors is good! Also: tap into Freebase and DBpedia to provide many more examples during bootstrap.

Q: can system correct previous mistakes? A: current system marches forward without making any retractions. Internally, evidence of error builds up. Should be possible to correct previously made assertions.

Q: how to deal with ambiguity? e.g. Cleveland is a city and a baseball team. A: the current system is a KB of noun phrases, not entities in the world. It knows that things can't be in multiple categories, which leads to inconsistency. Would need to change the program to have distinct terms for the word and its separate senses.

Q: what about probabilistic KB? A: currently store probs, but hard part is how to integrate 10^5 probabilistic assertions. How to do prob reasoning at scale? Not known.

Q: can you learn rules with rare exceptions? A: can have exceptions, but not different types. Could understand the counter-examples to an otherwise good rule. Could generate new knowledge (example of 'continuing education students').

Q: how to deal with dynamic changes to the world? A: yes, it's a problem. Second most common cause of errors. Would need to add some temporal reasoning to the KB.

Q: what can we do from semweb to encourage ML researchers to contribute? A: it will happen, if you [the SW community] can build big databases. Very excited about DBpedia. Suggest pushing on natural language processing conferences - they are not aware of these [semweb] kinds of resources. Btw, there are other linguistic resources you can draw on as well as WordNet, e.g. verbnet, propnet (?).

Tuesday, October 27, 2009

Open architectures for open government - raw notes

Cory Casanave

Many different architectures and solutions from many vendors. Want to link them together in open architectures. Integrate data as part of the LOD cloud. Flexible, standards-based, info managed at source, .... Not perfect, but what other technology choice is there?

Architectures as data. Pictorial form: PowerPoint, Visio, sometimes UML. Architectures should be seen as data we can manipulate and share: where the data is and how to use it. Example: the process for decommissioning a nuclear reactor [is that an architecture?]. Current architecture assets are trapped in stovepipes.

Goal: linked open architectures. Want business and technology view of data. Need different viewpoints for different stakeholders, but over the same data.

Motivations include: federated data, collaboration, cost reduction, planning & governance, agile, drive IT solutions.

How? Publish architecture models as LoD, in their current vocabularies. [how to publish a visio diagram?] Roadmap for enhancing value: adapt and develop well-defined "semantic hub" models, map new architectures to these hubs, define stakeholder viewpoints, tools and techniques, external comment & input. Working groups at OMG and W3C.

Want standards-based, e.g. XMI, UML, BPMN, SoaML. Can publish these today. Example: a data model with addresses, contact details, etc. Can convert the XMI form into RDF [though the example given didn't look like valid RDF to me], and publish automatically by checking into a special SVN repo.

Demo at portal.modeldriven.org/project/EKB. Resources: GAIN initiative (open government, open linked data and architecture): portal.modeldriven.org/project/GAIN.

ISWC research track - raw notes 3

OntoCase: Automatic ontology enrichment based on ontology design patterns - Blomqvist

Ontology design patterns: www.ontologydesignpatterns.org. This talk focuses on content patterns: small ontologies with a specific design rationale.

Semantic web: want more lightweight ontologies from non-logicians, e.g. web developers. Start with a domain specification (e.g. texts) and task specs (e.g. competency questions). Ontology learning is possible, but accuracy is low, and relational information is especially hard. Problems with missing background information.

OntoCase aims to add some explicit background knowledge and enrichment, built on top of existing ontology learning tools. Adds ontology design patterns. Input: a learned OWL ontology and a set of patterns. Output: the ontology enriched by patterns. By: matching, cloning and integrating patterns. Two modes: true enrichment and pruning mode. In pruning mode, only include the parts of the input that match a pattern.

Example: input concepts person, hero (subclass of person), stage. Match to the agent pattern.

Evaluation ... missed some details ... can add background knowledge, and increase in accuracy of added relationships.

Future work: general improvements, does not use task inputs at the moment.

ISWC research track - raw notes 2

Graph based ontology construction from heterogenous sources - Boehm et al

Gene Ontology: 28K concepts, 42K relations; takes human experts years to make a new release. Would like automatic ontology bootstrapping. Four steps: concept definition, concept discovery, relationship extraction, ontology extraction.

Contribution: combination of heterogeneous information sources. Given a set of concepts and a large text corpus, create a directed weighted concept graph, then find a sub-graph that is consistent (cycle-free), valid and balanced.

List of desirable topological properties: tree form, balance, etc.

Solution 1: greedy edge inclusion. Copy nodes first, then copy edges one at a time, discarding any that add a cycle.
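A sketch of that greedy step (my reconstruction; the paper's weighting and tie-breaking details will differ):

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Greedy edge inclusion: consider edges in descending weight order, and
// keep an edge only if it does not close a directed cycle.
public class GreedyEdgeInclusion {
    static class Edge {
        int from, to; double weight;
        Edge( int from, int to, double weight ) {
            this.from = from; this.to = to; this.weight = weight;
        }
    }

    static List<Edge> acyclicSubgraph( int nNodes, List<Edge> candidates ) {
        Collections.sort( candidates, new Comparator<Edge>() {
            public int compare( Edge a, Edge b ) {
                return Double.compare( b.weight, a.weight );
            }
        } );
        List<List<Integer>> succ = new ArrayList<List<Integer>>();
        for (int i = 0; i < nNodes; i++) succ.add( new ArrayList<Integer>() );
        List<Edge> kept = new ArrayList<Edge>();
        for (Edge e: candidates) {
            // adding from -> to closes a cycle iff 'from' is already reachable from 'to'
            if (!reaches( succ, e.to, e.from )) {
                succ.get( e.from ).add( e.to );
                kept.add( e );
            }
        }
        return kept;
    }

    static boolean reaches( List<List<Integer>> succ, int start, int target ) {
        Deque<Integer> stack = new ArrayDeque<Integer>();
        Set<Integer> seen = new HashSet<Integer>();
        stack.push( start );
        while (!stack.isEmpty()) {
            int node = stack.pop();
            if (node == target) return true;
            if (seen.add( node )) {
                for (int next: succ.get( node )) stack.push( next );
            }
        }
        return false;
    }
}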

Solution 2: find the set of nodes that are strongly likely to be super-concepts of other concepts, then recursively add children, using a fan-out limit.

Evaluation: text corpus PhenomicDB; compare to Mammalian Phenotype. The weighted dominating set approach had the highest precision.

[Author did not report on the human-acceptability of the auto-generated ontologies.]

Q: what's the basis of the desirability of the topological properties? A: introspection from the authors' own inspection. Comment from the audience: tangled hierarchies can be shown to be better for browsing.

[other questions, rather hard to hear since there was no microphone]

ISWC research track - raw notes 1

Detecting high-level changes in RDF/S

Want to detect significant changes. Low-level language: report adds and removes of triples. High-level languages define classes of event: change_superclass, pull_up_class, etc. High-level changes are closer to the intent of the change maker, and are more concise.

Challenges: granularity (not too high or low), must be able to support a deterministic algorithm to assign triples to high-level changes.

The language defines triples added/removed, and semantic conditions on the triple set either before or after. For determinism, the language must be complete and unambiguous in its consumption of changes.
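Roughly, each high-level change is then a pattern over the low-level delta. A hedged sketch in Jena of what one such definition might look like - the structure is my illustration, not their formal language:

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.RDFNode;
import com.hp.hpl.jena.rdf.model.Resource;
import com.hp.hpl.jena.rdf.model.Statement;
import com.hp.hpl.jena.rdf.model.StmtIterator;
import com.hp.hpl.jena.vocabulary.RDFS;

// Sketch: change_superclass( c, old, new ) consumes one removed and one
// added rdfs:subClassOf triple with the same subject. A real detector
// must consume each triple exactly once to stay deterministic.
public class HighLevelChanges {
    public static void detectSuperclassChanges( Model added, Model removed ) {
        StmtIterator i = removed.listStatements( null, RDFS.subClassOf, (RDFNode) null );
        while (i.hasNext()) {
            Statement before = i.nextStatement();
            Resource c = before.getSubject();
            StmtIterator j = added.listStatements( c, RDFS.subClassOf, (RDFNode) null );
            if (j.hasNext()) {
                Statement after = j.nextStatement();
                System.out.println( "change_superclass( " + c + ", "
                    + before.getObject() + ", " + after.getObject() + " )" );
            }
        }
    }
}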

Heuristic changes are those that require matchers, e.g. rename class.

[I have a nagging doubt that their algorithm won't work on a model that includes inference, not just raw triples. Don't have a counterexample yet, though.]

Algorithm is in theory quadratic, in practice the results are better than that.

Q: applies to OWL as well? A: no, only RDFS

Q: how do you know whether this is the right set of high-level changes? A: tested by introspection with human experts.

Q: related to refactoring languages in SW eng? A: probably, haven't looked at that.

ISWC In Use Track - raw notes 4

Rapid: enabling scalable ad-hoc analytics on the semantic web - Sridhar et al

Motivation: rapid growth in RDF data. Progress on storage, but not analytics.

Analytical queries include multiple groupings and aggregations, e.g. for each month of the year, the average sales vs. the sales in the preceding month. Hard to do in databases, and even harder in RDF because of the absence of a schema and the need to combine data and metadata.

Goal: use MapReduce to do RDF analytics. There are high-level dataflow languages, e.g. Pig Latin, but these languages expect structured, not semi-structured, data.

RAPID uses Pig as a basis, extending Pig Latin with RDF primitives. Showed a raw Pig Latin program - about 10 steps. Q: how to automate/abstract this, to avoid the chance of user errors? [missed a bit here]

Expression types: class expression, path expression. Three key functions: generate fact dataset (GFD), generate base dataset (GBD), multi-dimensional join (MDJ). GFD re-assembles n-ary relationships from triples. GBD creates container tuples for each group for which aggregation is required. MDJ finds matches between base and fact tuples, and updates the base dataset.

Reasonable results compared to non-optimised MapReduce applications. Comment from the audience: very slow (five orders of magnitude) compared to traditional data-warehousing.

[Saw comments on IRC via Twitter that this is just like early 90's BI applications. The example wasn't well chosen from that pov, but I think this is quite interesting. Doing analytics on large scale datasets is going to be a huge problem in my opinion]

ISWC In Use Track - raw notes 3

Tudor Groza et al - Bridging the gap between linked open data and the semantic desktop.

Web: problem of finding and linking relevant work. Desktop: publication silo; problem of finding and linking relevant files.

Linked open data on the web - linking. Semantic desktop - linking. Can we connect them?

Incremental enrichment process. Extract shallow metadata, expand using linked data, integrate into semantic desktop.

Extraction of metadata: shallow or deep. [Author going extremely quickly through his material, very hard to take notes ... have to read the paper]

Good results from small-scale user study.

ISWC In Use Track - raw notes 2

Kalyanpur et al - extracting enterprise vocabularies

IBM and Gartner. Enterprises need semantic vocabularies. Can they be generated bottom-up from source documents? Tried using NLP tools and off-the-shelf named-entity recognizers, but recall was poor (50% of the possible terms identified by a domain expert).

Summary of solution: algorithm to discover domain-specific terms and types; techniques to improve quality and coverage of LOD; statistical domain-specific NER's using LOD.

Discovering domain-specific terms: use a part-of-speech tagger to identify all nouns as possible terms, then filter using tf-idf, then infer types using LOD, then use the types to further filter the terms. Resulted in 896 terms; estimated around 3000 probable terms in the full dataset.
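The tf-idf filtering step might look something like this (a sketch; the counts and threshold here are invented for illustration):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: score each candidate term by its frequency in the corpus,
// discounted by the number of documents it appears in (tf-idf), then
// keep the terms that score above a threshold.
public class TermFilter {
    public static List<String> filter( List<Set<String>> docs,
                                       Map<String, Integer> termFrequency,
                                       double threshold ) {
        int nDocs = docs.size();
        List<String> kept = new ArrayList<String>();
        for (Map.Entry<String, Integer> e: termFrequency.entrySet()) {
            int docFreq = 0;
            for (Set<String> doc: docs) {
                if (doc.contains( e.getKey() )) docFreq++;
            }
            if (docFreq == 0) continue;
            double tfidf = e.getValue() * Math.log( (double) nDocs / docFreq );
            if (tfidf >= threshold) kept.add( e.getKey() );
        }
        return kept;
    }
}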

Improving recall: improved the type mappings between DBpedia and Freebase using conditional probabilities. The new mappings have been included in the DBpedia download since Aug '09. Improved LOD: add instance types. Get entity disambiguation for free using term URI's. Generate candidate patterns using super-types from the ontology, and let a machine learning system score each candidate.

Final result: started with precision/recall of 80/23, raised to 78/46 with all the improvements. Conclusion: lots of benefit in using LOD as input for vocabulary extraction.

ISWC In Use Track - raw notes 1

Auer & Lehmann - Spatially Linked Geodata

Many real-world tasks use spatial data. Current LOD datasets only have large-scale geographic structures, not bakeries, recycling bins, etc. How to get geo data for small-scale objects? OpenStreetMap - provides a crystallization point for spatial web data integration. Stats on the current size of the database: growth rates of 7-11% monthly in various categories. Collaborative process; data stored in an RDB but available as periodic dumps or incremental update feeds. Can add arbitrary key-value pairs to any element, which can be used to add semweb annotations.

The authors' project converts OSM models and properties to RDF/OWL. Result: 500 classes, 50 object properties, 15K data properties (which seems like a lot).

Use Triplify to generate RDF from the relational data. Dump at linkedgeodata.org/Datasets; SPARQL endpoint hosted by OpenLink. Other REST interfaces: points within a circular radius of a given point (cool!), points within a radius belonging to a class, points in a radius with a given property value.

Want to link to other LOD datasets, e.g. DBpedia. Some owl:sameAs links in the schema are obvious. Also use DL-Learner to match categories. For instance data, three matching criteria: name, location, type. Some problems matching locations, since there is no consensus on where to place location markers for large entities like cities - for large countries, e.g. Russia, centroids can be 1000km apart between OSM and Wikipedia. Needed some string-matching metrics to get name matches, but set the threshold fairly high. Generated 50K matches to DBpedia objects, mostly cities.
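The three-way instance match, roughly (my sketch; the paper's actual similarity metric and thresholds will differ):

// Sketch of the three-way match test: accept a link only when the types
// are compatible, the points are close enough, and the names are similar
// enough. Both thresholds below are invented.
public class InstanceMatcher {
    static final double MAX_DISTANCE_KM = 50.0;
    static final double MIN_NAME_SIM = 0.9;   // deliberately high, as in the talk

    public static boolean matches( String nameA, String nameB,
                                   double latA, double lonA,
                                   double latB, double lonB,
                                   boolean typesCompatible ) {
        return typesCompatible
            && distanceKm( latA, lonA, latB, lonB ) <= MAX_DISTANCE_KM
            && similarity( nameA, nameB ) >= MIN_NAME_SIM;
    }

    // great-circle distance (haversine)
    static double distanceKm( double latA, double lonA, double latB, double lonB ) {
        double dLat = Math.toRadians( latB - latA );
        double dLon = Math.toRadians( lonB - lonA );
        double a = Math.pow( Math.sin( dLat / 2 ), 2 )
                 + Math.cos( Math.toRadians( latA ) ) * Math.cos( Math.toRadians( latB ) )
                 * Math.pow( Math.sin( dLon / 2 ), 2 );
        return 6371.0 * 2 * Math.asin( Math.sqrt( a ) );
    }

    // placeholder for a proper string-similarity metric (e.g. normalised edit distance)
    static double similarity( String a, String b ) {
        return a.equalsIgnoreCase( b ) ? 1.0 : 0.0;
    }
}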

Demo - very nice. Facet browsing can be used to narrow selections. Much effort to index the data for efficient facet lookup. Quadtile indexing - 2 bits per quad, recurse; 18 zoom levels, producing a discrete hypercube.
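Quadtile keys are cheap to compute. A sketch of the idea - the orientation and projection details here are mine, and will differ from LinkedGeoData's actual implementation:

// Sketch: at each zoom level, split the bounding box into four quadrants
// and record which one contains the point -- 2 bits per level, recursing
// to the requested depth (18 levels = 36 bits, so a long suffices).
public class QuadTile {
    public static long quadKey( double lon, double lat, int levels ) {
        double minLon = -180, maxLon = 180, minLat = -90, maxLat = 90;
        long key = 0;
        for (int i = 0; i < levels; i++) {
            double midLon = (minLon + maxLon) / 2;
            double midLat = (minLat + maxLat) / 2;
            int quad = 0;
            if (lon >= midLon) { quad |= 1; minLon = midLon; } else { maxLon = midLon; }
            if (lat >= midLat) { quad |= 2; minLat = midLat; } else { maxLat = midLat; }
            key = (key << 2) | quad;
        }
        return key;
    }
}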

Future work: link to other datasets. Refine LGD schema. Refine browser. Apply best practices from other Geo projects.

ISWC keynote: Pat Hayes - raw notes

Two talks in one. Blogic (web log = blog, so web logic = blogic). RDF Redux - how we could easily revise RDF to make it more expressive, without changing the meaning of existing RDF.

Principles of blogic. Web portability: logic and entailments can be accessed elsewhere, and should commute. RDF is portable, ISO Common Logic is portable; OWL-DL and classical FOL are not. OWL 2 is better, but not quite there.

Names. IRI's have structure and meaning, can be owned and controlled, etc. However, in logic, names are opaque tokens. Big disconnect, but not sure how to address it. RDF semantic interpretations are mappings from a given vocabulary, but it would be better to state 'from all possible names'.

Horatio principle: truly universal quantification not a good idea. OWL is mostly OK, but complement is problematic.

SameAs not the same as. We need a way to describe co-reference without equating the conceptualisations. E.g. DBpedia and Cyc have different conceptualisations of sodium, but they are linked by owl:sameAs.

Death by layering. The layer cake diagram is good computer architecture, but a really bad approach to semantics. E.g. term URI's from OWL have different meanings depending on whether the triples are seen as basic RDF or as OWL.

Part 2: RDF redux

There are many things wrong with RDF that should be done better. [List]. However, there is a more basic problem: blank nodes in RDF are broken. Basic issue is that it is not obvious how to describe a bNode mathematically. Approach was to use set theory, but this was wrong. Using a Platonic idea to describe syntax. Fix would be to view graphs as drawn on some surface, then bNodes are marks on that surface. RDF redefined to be a graph + a surface, doesn't operationally change any existing RDF. No graph can be on more than one surface. Fixes lots of problems: copy vs. merge, named graphs, etc. Provides a syntactic scope for RDF nodes.

Surfaces themselves can have meaning. E.g: positive surfaces assert contents are true, negative surfaces assert contents are false, neutral surface, deprecated surface.

Would have to allow surfaces to nest, which would require changes to RDF syntax. Allowing this, RDF would get full first-order semantics à la C. S. Peirce. Thus RDFS would not be a layer on RDF, but an abbreviation for assertions that are already expressible in (revised) RDF.

Question on tractability: aren't the layers there for tractability? Ans: no, one can still use languages with defined characteristics. Anyway, layers don't do that either. This proposal is about metatheory, not practice.

Question: does it support other hard extensions, like fuzzy logics or temporality? Ans: doesn't solve them, but gives a clear point to start.

Question: (TimBL) isn't this what N3 has with curly-bracket contexts? Ans: maybe, but Peirce was first.

Q: so why not just fix RDF? A: would love to, what's the process?

Q: this borrows from conceptual graphs, but they aren't widely used - why would this succeed? A: no, this is just suggesting a refinement of the foundations of RDF. Don't overemphasise Peirce.

Q: we want a family of nearly-same-as relations. What does logic offer? A: good question, wish I knew the answer! Context is important - the success of communication depends on choosing the right interpretation of names. Lynne Stein argues this is a much more fundamental problem.

Monday, October 26, 2009

Semantic Sensor Networks - raw notes 4

Semantic management of streaming data - Rodriguez et al

An extension to RDF, with a triple store and query engine, to bridge triple stores and streaming stores. RDF does not have a built-in concept of time or dynamic data. Virtual sensors project views out of streamed radar data.

Resources, not triples, are timestamped: add an annotation containing the timestamp to the resource URI. Finding the latest value of a stream requires optional and filter-not-bound in regular SPARQL; in TA-RDF, this reduces to the annotation "[LAST]". Implementation based on Tupelo over Sesame.
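For comparison, the plain-SPARQL idiom that "[LAST]" abbreviates looks something like this (the stream vocabulary is invented for illustration):

import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryFactory;

// Sketch of the optional/filter-not-bound pattern: select the reading
// for which no reading with a later timestamp exists.
public class LatestValue {
    public static void main( String[] args ) {
        String q =
            "PREFIX ex: <http://example.org/stream#>\n" +
            "SELECT ?reading ?value WHERE {\n" +
            "    ?reading ex:value ?value ; ex:timestamp ?t .\n" +
            "    OPTIONAL { ?later ex:timestamp ?t2 . FILTER (?t2 > ?t) }\n" +
            "    FILTER (!bound(?later))\n" +
            "}";
        Query query = QueryFactory.create( q );
        System.out.println( query );
    }
}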

Semantic Sensor Networks - raw notes 3

Generating Data Wrapping ontologies from sensor networks - Sequeda et al

Goal: learn wrapper ontologies from sensor networks, analogous to data source wrappers. Straight road problem: cars on a toll road have sensors, and tolls are set to control flow. Seems to depend on a network of derived queries that are given for this domain. Was able to observe relationships between entities, but reducing the relationships to a recognisable simple form remains future work.

Semantic Sensor Networks - raw notes 2

A survey of the semantic specification of sensors. Compton et al, CSIRO.

W3C has an incubator group on sem sensors - SSN-XG. Develop a reference OWL model for describing sensors.

Table of 12 existing semantic sensor ontologies, some active, some not. At least one agent-centric. Two perspectives: data and sensor. All input ontologies agree that a central concept is Sensor, but with different descriptions. Many aspects to model, including structure, network, physical attributes, accuracy, energy use etc. Different ontologies differ in depth and expressive power.

Supporting technologies: DL reasoners, SPARQL, rules, task assignment. [I wonder how much of this will be relevant to the W3C reference ontology?]

Rich range of input ontology concepts, but there remain some concepts that do not yet appear in any of the precursor ontologies.

Semantic Sensor Networks - raw notes 1

Sensory Semantic User Interfaces (SensUI) – Bell et al

A context-aware system should be able to adapt to circumstances. Sensors are the means to inform the context. Categorised ontologies into four classes: device, context, data and [one other]. Seems to be very early-stage work.

Sunday, October 25, 2009

SWUI'09 workshop - raw notes 4

Final discussion: what do we want to be able to show in +1 year? How to demonstrate the utility of SWUI designs? Examples. Building a catalogue of patterns and interesting examples - Duane has a private collection of interesting examples from recent ISWC presentations and other publications; should we try to build a collaborative version? Possibly create a standardised problem set as an evaluation/benchmark? How to show more leadership to the rest of the community? Perhaps through the semantic challenge? What are the successful examples we can publicise?

SWUI'09 workshop - raw notes 3

Max Wilson: Sii (search interface inspector) – a usability evaluation method, especially designed for search interfaces. Part of Max's recently defended PhD thesis. Inspections vs. user studies: inspections are fast, lower cost and give valid results. IBM improved search in 1999: sales up 400 percent, support requests down 84 percent, and it only took 10 weeks and cost $1 million. Sii aims to evaluate ideas early in the design process. 3 steps: choose the designs to evaluate, identify the tools/features, count the moves needed to perform the tactics. Build a list of features across all evaluated systems. For each feature, count moves for 32 search tactics. Look at features across evaluated systems, or view feature provision per tool, or compare to standard user types ({scan,search} x {specify,recognize} x {data,metadata} x {learn,find}). The Sii tool allows 'what-if' explorations.

Application of the method to Sii. Tasks: retrieving a known document; understanding a specific event. The table view comes out lower than the map and calendar views, since those views have a natural affordance for zooming in to particular regions. Overall, the learning task was better supported than the document retrieval task.

See mspace.fm/sii

Question: doesn't this encourage just adding more functionality? Ans: there is a complementary metric, based on cognitive load theory, to measure the cost/benefit of new features.

SWUI'09 workshop - raw notes 2

Andrea Splendiani. Towards a new paradigm for user interaction on the semantic web to support life sciences investigation. Revolution in biology: more and more data - DNA sequencing, proteomics, etc. But the number of new drugs is decreasing, the number of active antibiotics is decreasing, food crises. Life science has become an information-intensive discipline, and the info is naturally distributed and heterogeneous. Many relevant resources are in RDF - need to share and integrate. Need a new language: ontologies. Key problem: studying the connections between parts. Developed some interactive visualisation tools for biological data – users find interesting patterns in the data, even without understanding all of the nuances of the visualisation elements. RDFScape: browsing and visual queries over a triple store. Visualisation gives context; can then browse at finer detail using RDF. Can start with queries from a query pattern, then modify to a specific user's need using the underlying ontology (and instance data?). Want to present the same content using different presentations, which give the user different intuitions. E.g. don't always show a single URI as a single node in a graph, since it can give an inaccurate intuition of the importance of a node in a structure.

Questions we want to look at:

  • Who are UI's for? Level of understanding of the "model" needed.
  • How to specify labels?
  • Does the "traditional" model of UCD still work? How much prior knowledge of RDF should be required?
  • Visualisations: control, manipulation, management, navigation.
  • Role of the middle layer in integrating different viewpoints, multiple ontologies.
  • Who has the rights and responsibilities to craft the representation?
  • Role of the visualisation as part of the interface, or becoming the whole UI.

Andrea: next steps. Tag clouds are visually more interesting than tables of occurrence. Can we use colour coding and location to show statistical significance? Problem: how can I know which selection will lead where I want? Perhaps show the neighbourhood using colour/shade to indicate costs/distances. Daniel: allow 'pioneer' users to save successful explorations for other users. Duane: how to allow users to articulate goals, and others to profit from that? How to map natural language structures to the structures of the data?

Eric: visual designers will make use of shape, colour, line weight, location, distance etc to convey meaning to the user and to direct the user's attention. Duane: where do the decisions get made that influence interpretation?

We don't have to visualise everything in the dataset, maybe just focus on standout elements and summaries/statistics. Some users may resist loss of information, even if displaying all data is not useful or even feasible.

SWUI'09 workshop - raw notes 1

Small workshop: about 10 people. Should be lots of interesting discussion. Question (every year): what is different about semantic web interfaces? This workshop has been running since 2004. Last year's workshop was at CHI, which drew in more expertise from traditional UI researchers. This year: do we now have a critical mass of available data? So: what to do with it? How do we deal with scale? How to link with other communities that are dealing with similar or related questions?

Participants' intros:

  • Duane – runs a small consultancy; interested in semantics as a means to enable user effectiveness.
  • Li Ding – semantic web in GIS; Swoogle - search tools to help users identify the presence (or absence) of ontologies of interest; now working on government data publication.
  • Max Wilson – Univ Swansea, UK; mSpace; use of the semantic web in education; how to enrich the user interface without exposing semantic data directly to the user?
  • Andrea Splendiani, Rothamsted Research – ontologies in the life science domain; an indirect route through life sciences to OWL; data integration problems – the know-how is in users' heads.
  • Eric Miller – Squishymedia, a user interaction design consultancy; focus on usability and user testing, IA; launched a semweb-enabled application last week.
  • Jen Golbeck – director of the Human Interaction Lab; social networks and trust; how to take advantage of the data on the web; recommender systems.
  • Daniel Schwabe – University of Rio; involved in hypermedia since before the web; model-based approaches, moved to the semantic web when that appeared; use of data to enrich the UI; how to separate interface from application; how can users use linked open data in an interesting way? - current approaches are analogues of HTML browsers, but is that the right approach?; how to design rich behaviours that actually help the communicability of the application owner's intent to the user; use of non-speech sound.
  • Vadim Soskin – software for museums; museums are moving from proprietary to openly-shared information, but there are no standards for information sharing; use semantic technologies for data interchange; how to explain the benefits of moving from the relational model to RDF/OWL?; how to cope with large-scale data in the UI.
  • Zhenning ?, RPI – research in user interaction with linked data; how to help users formulate appropriate queries, and how to enable users to communicate back to the semantic web.
  • Rostistlav ? – OntoText; aims to enhance the company's product UI's; linked open data application; extensible UI toolkits.
  • Andreas Harth – semantic search as part of his PhD, broadened out into general UI interaction; data from the web is very noisy - how to deal with that?; difficulty in comparing applications because there are no standard datasets or tasks.

Suggestion to follow the conversation at ixda.org.

Friday, September 18, 2009

Fedora 11: high CPU utilization, logrotate

I had a problem with my Fedora 11 system suddenly shooting up to 100% CPU utilization. Looking at top, the culprit was a perl script, and tracing back up the process tree, its parent was logrotate. After some searching, it turned out that the size of my /var/log/messages file had got way out of control – 5.5GB. I thought the whole point of logrotate was to stop that sort of thing happening. Anyhoo, removing /var/log/messages and /var/cache/logrotate/*, then kill -hup'ing the wayward perl process, has restored CPU sanity to my system. I've added a size limit to /etc/logrotate.conf to try to ensure that it doesn't recur, though I'm still not quite sure how it got so bad in the first place.
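For the record, the kind of stanza I mean is just a size directive near the top of /etc/logrotate.conf - 100M is my arbitrary choice:

# in /etc/logrotate.conf: rotate any log as soon as it exceeds 100M,
# regardless of what the daily/weekly schedule says
size 100M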

del.icio.us: fedora, linux, cpu, system-maintenance

In Pursuit of Elegance

I've always liked and been fascinated by the concept of elegance. One of my favourite programming aphorisms comes from Richard A. O'Keefe: elegance is not optional. So I enjoyed reading this interview with Matthew E. May: In Pursuit of Elegance: 12 Indispensable Tips.

del.icio.us: design, inspiration

Update: fixed broken link (thanks Dave)

Wednesday, July 22, 2009

m2Eclipse and JUnit4: 'no tests found' - solved

Just a quick note: I have a project in Eclipse which is Maven-based, and was generated initially using the "new maven project" wizard in Eclipse. I'm using JUnit 4.5 for the unit tests, and could quite happily run the tests from the command line using Maven, and individual tests from Eclipse using run as JUnit test.... However, when I tried to run all of the tests in the project by invoking run as JUnit test... on the project root node, Eclipse complained "no tests found with test runner junit 4". Solved by upgrading m2eclipse to the latest stable development build from the m2eclipse update site (specifically, I upgraded from version 0.9.8.200905041414 to version 0.9.9.200907201116 in Eclipse Galileo).

Wednesday, July 15, 2009

Eclipse/maven: solution to cannot nest folders problem

I had the following problem with Eclipse 3.5 (Galileo) and m2Eclipse 0.9.8, but from searching I believe that it occurs with other combinations as well.

Situation: I have a maven-ized project in subversion, and I want to check it out into my Eclipse workspace.

Method: bring up the subversion repositories view, navigate to the repository folder with the project in it, right-click, and select the "Check out as Maven Project ..." context-menu option. All seems to be OK – the project gets created, maven dependencies are resolved, etc – except ...

Problem symptom: Eclipse complains that Cannot nest 'rdfutil/src/test/java' inside 'rdfutil/src'. To enable the nesting exclude 'test/' from 'rdfutil/src' (your project name will obviously differ), and there's a red 'compilation failed' triangle in the problems list. Check the Java build path: all is fine. Check the pom.xml: all is fine. What to do?

Solution: right-click the project node in the project explorer window to first close it, and then again to delete the project, but without removing the project contents on disk. Then, right-click in the project explorer window and select "Import ... > Existing project into workspace ...". Navigate to the location of the project, import it, and it comes back without the nested folder warning.

Wednesday, June 24, 2009

Amazon mp3 store: Linux kudos

Heh, just bought an mp3 album from Amazon (having discovered Synaesthesia on last.fm), and expected the usual "Linux? Sorry, is that Windows or Mac?" trouble. Was very pleasantly surprised to be presented with a page to grab the download manager for mp3 albums for any one of four Linux distros, including Fedora. Cool. Thank you Amazon!

Edit to say that nice though it is to have some Linux support, I've just upgraded to Fedora 11 and the Amazon player doesn't work on F11. Not only that, Amazon customer services have yet to respond to the email I sent them three days ago. Sigh. You win some, ....

Updated again to say that I have now had a very polite reply from Peter at Amazon customer services, but, no, they're not going to say when or if a new version will be released. Since it's just a matter of recompiling with the new version of the libraries, this is a little disappointing. Sigh.

Tuesday, June 09, 2009

Blind search comparison

I generally default to just using one search engine (yes, that one). Blind Search is a service for trying out the three main search engines currently in use (Google, Yahoo and the new one with the silly name). Given some search terms, Blind Search shows you the top 10 results from each of the three engines, anonymously, and asks which set got closest to your expectations. Interestingly, I found that honours were spread fairly evenly, and, even more interestingly (given my normal search behaviour), Yahoo edged ahead.

It's also a nice bit of UX engineering: I found myself very curious to know which engine was which, but you can only get to find out by selecting the engine that best matches your expectations, thus feeding useful stats back to the service. Neat.

Tip o' the hat to Tim at UMBC for the pointer.

Sunday, June 07, 2009

Interesting: KI 2009 Mashup Challenge

This looks like an interesting competition: use some sort of AI technology of your choice (including semantic web reasoning) to do something cool in the KI 2009 AI Mashup Challenge. Could be fun!

Saturday, June 06, 2009

Cupboard ontology collaboration space

Collaboration around ontology development – sharing, discovering, re-using, augmenting ontologies – is one of those tasks that keeps coming up as a missing part of the puzzle that "someone should do something about". A recent blog posting from Mathieu d'Aquin in which he pre-announces the Cupboard system (or announces the invitation-only beta if you prefer) sounds pretty intriguing. It apparently garnered some good feedback at ESWC, and I'm looking forward to having a go when it's publicly available.

del.icio.us: semantic-web

Friday, June 05, 2009

Fedora 10 upgrade

In the past, I've been rather wary of upgrading my Fedora-running PC's to the latest spin of the OS. I've had problems with the upgrade path, and found it easier to blow away everything except the /home partition and start over with a clean install. This does work, but I then have to spend a long time re-installing all of the customisations I make to the way things work, add back in my preferred packages, etc etc. Very dull, takes a long time.

Feeling brave, I thought I'd give upgrading another throw of the dice. I burned a DVD from the torrent of Fedora 10, rebooted my Fedora 9 PC, and selected the upgrade existing OS option. The first thing I noticed is that it only asked me one question: what kind of keyboard did I have? Easy enough. Everything after that ran unattended. At the end, I reboot, and nervously watch the boot log for problems. Nothing obvious, apart from CUPS not starting. Log back in, find a few things that seem a bit borked, but in general it's looking good.

Run yumex to get the latest updates: over a thousand of them, 1.5G of downloads. Ouch. I had to uninstall cinelerra-cv to solve a dependency issue, but I'm not actually sure what that package is. After that, and a long, long wait, I reboot the system and everything seems to be working smoothly. The only glitch so far is that my grub.conf got overwritten, so I had to recover the alternate OS boot options. Easy.

Summary: kudos to the Fedora team, this has so far been a painless upgrade. Thank you!

And, yes, I know that Fedora 11 is about to drop. While I'm frequently a bleeding-edge junkie, maintaining Linux is one place I feel more comfortable lagging a bit behind the curve!

Tuesday, May 19, 2009

BlazeDS error: Unsupported AMF version 17,491

Doing some Flex development today using BlazeDS for the Java side, I came across a problem where the server raised an exception Unsupported AMF version 17,491. Googling did not provide a solution, but other people have had the same problem. While I can't say for certain how this might affect other users, in my case it was entirely self-inflicted. I was trying out different combinations of channels for connecting the client to the server, and at one point I changed the end-point of the AMF channel without changing the handler class. In my Services.mxml I had:

<cairngorm:ServiceLocator 
  xmlns:cairngorm="http://www.adobe.com/2006/cairngorm" 
  xmlns:mx="http://www.adobe.com/2006/mxml">
    <mx:RemoteObject
        id="commandDispatcher"
        destination="CommandDispatcher"
        showBusyCursor="false">
      <mx:channelSet>
          <mx:ChannelSet>
              <mx:channels>
                  <mx:AMFChannel 
                    url="http://localhost:8080/iserver/messagebroker/amf"/>
              </mx:channels>
          </mx:ChannelSet>
      </mx:channelSet>
    </mx:RemoteObject>
</cairngorm:ServiceLocator>

At one point, I was using the streaming channel instead:

<mx:StreamingAMFChannel 
        url="http://localhost:8080/iserver/messagebroker/streamingamf"/>

The problem above arose when I accidentally put my code into an inconsistent state:

<mx:StreamingAMFChannel 
        url="http://localhost:8080/iserver/messagebroker/amf"/>

The URL points to the non-streaming channel, but the wrapper class is StreamingAMFChannel. Moral: don't do that.

del.icio.us: java, flex, blazeds, error

Friday, March 13, 2009

Tim Berners-Lee talk on Linked Open Data

Quite a nice presentation from Tim on the basic ideas of Linked Open Data. Here's the video of his talk at TED. Doesn't say much new to people who already get LoD, but it's a handy thing to be able to refer to when explaining the idea. Shame about the rather cheesy audience participation segment around 11:15, but it's mercifully short and immediately followed by a nice example of protein identification.

del.icio.us: semantic-web, linked-open-data

Tuesday, January 06, 2009

No, actually, I did mean POST

I'm working on a Flex UI which is driving a RESTful service backend. Well, OK, the current implementation uses standard HTTP verbs but isn't fully bought in to hypermedia-as-the-engine-of-application-state, so I guess it's RESTful-ish. Sosumi.

Naturally, when I'm creating a new resource on the server side from the client, I want to use POST. That's what POST is for, after all. Suppose I want to create a sub-resource of some given URL. It might be a new state resource, such as a first-class record of a user action. Crucially, I may want to fill in the details of the resource later, with PUT; initially, it's enough that the resource exists at all. Here's my flex code:

import mx.rpc.events.FaultEvent;
import mx.rpc.events.ResultEvent;
import mx.rpc.http.HTTPService;

protected function createOnServer( url:String, value:Object = null ):HTTPService
{
    // build a service object aimed at the new resource's URL
    var _req:HTTPService = new HTTPService();
    _req.url = url;
    _req.resultFormat = "text";
    _req.addEventListener( ResultEvent.RESULT, resourceCreationDecodeCallback );
    _req.addEventListener( FaultEvent.FAULT, resourceErrorCallback );

    // we are creating a resource, so this should be an HTTP POST
    _req.method = "POST";

    trace( "created post request object, about to send()" );
    _req.send(value);

    return _req;
}

Pretty straightforward stuff. Create an HTTP service, set the method to POST, send() and wait for the callback. Invokes HTTP POST on the service endpoint, right? Wrong. Inexplicably, if the value of the call is null – the default value – Flex will silently transform the method from POST to GET. Huh? How does that work? Even if, in some way that's entirely not obvious to me, it's Bad to send an empty HTTP POST, the client should signal an error. Not silently change the semantics of the HTTP request to something completely different.

The solution, or rather, workaround, for this is to always invoke the above method with a value, even if that's not called for in the design of the resource interaction:

createOnServer( "http://localhost:8080/demo/api/some/resource", 
                Object({dummy:"dummy"}) );

del.icio.us: flex3, web-service.