An OWL: Experiences and Directions W3C Community Group

At the OWL: Experiences and Directions workshop last weekend, Bijan Parsia proposed a W3C Community Group for the upkeep and advancement of OWL, the Web Ontology Language of our choice. The aim of the group is to

“support the activist part of the OWLED mission”

A more elaborate explanation and the proposal slides can be found on our owl.cs pages.

Update: The group has found sufficient support and now exists with 21 members.

Videolectures from ISWC 2011 online

The video lectures from ISWC 2011 have been online for a while, so if you’re interested in checking out our talks, you can find them here: http://videolectures.net/iswc2011_research/

The Manchester talks are:

Try to sit through the first five minutes of my talk; I *will* slow down eventually.

List of helpful OWL API Tools


The OWL API is a Java interface for creating and modifying OWL ontologies and an essential (or rather, the essential) component of any OWL tool. Mike Bergman has compiled a list of tools that make use of the OWL API, including the popular ontology editor Protégé and its plugins, various OWL reasoners, and software of the more exotic variety: “Thirty OWL API Tools” at www.mkbergman.com.

[Photo by Hans Bernhard (Schnobby) (Own work) [CC-BY-SA-3.0 or GFDL], via Wikimedia Commons]

Reactions to schema.org

Last week saw the launch of schema.org, which sparked a fairly active discussion on the W3C semantic-web mailing list. The three major search engine providers, Google, Bing and Yahoo, teamed up to produce the website, which they describe as “a joint effort […] to improve the web by creating a structured data markup schema supported by major search engines.” [1]

What is it?

The main contribution of the platform is a set of schemas in the form of a simple vocabulary of entities (or “Types”) such as Event, Person, Place and Product, along with their sub-entities (“more specific types”). The technology chosen for this approach is the HTML Microdata format (currently a W3C working draft). It introduces a handful of HTML attributes that can be added to standard HTML tags such as span and div. The most prevalent ones are itemscope and itemtype, which indicate that an HTML element starts a new “item” of a particular type (e.g. itemtype="http://schema.org/Book"), and itemprop, whose value names the property of the item that the element’s content provides (e.g. itemprop="name"). This is where schema.org comes into play: the types and properties ‘defined’ in its vocabulary can conveniently be used as values for itemtype and itemprop. (The terms aren’t defined in the semantic sense, because the vocabulary follows a simple hierarchical ‘is-a’ tree structure.)
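The following minimal Python sketch shows how the attributes fit together. It pulls items and their properties out of Microdata markup using only the standard library’s html.parser. It is not a spec-compliant Microdata parser. It assumes flat items (no nested itemscope) and itemprop elements that contain plain text only. The class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class FlatMicrodataParser(HTMLParser):
    """Hypothetical helper: collects flat Microdata items from HTML.

    Simplifying assumptions: items are not nested, and every element
    carrying itemprop contains plain text only (no child elements).
    """

    def __init__(self):
        super().__init__()
        self.items = []           # one {"type": ..., "props": {...}} per itemscope
        self.current_prop = None  # itemprop whose text is being collected

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # boolean attributes like itemscope map to None
        if "itemscope" in attributes:
            self.items.append({"type": attributes.get("itemtype"), "props": {}})
        if "itemprop" in attributes and self.items:
            self.current_prop = attributes["itemprop"]

    def handle_data(self, data):
        # Text only counts while we are inside an itemprop element.
        if self.current_prop is not None:
            props = self.items[-1]["props"]
            props[self.current_prop] = props.get(self.current_prop, "") + data

    def handle_endtag(self, tag):
        # Closing the itemprop element ends its value; trim surrounding whitespace.
        if self.current_prop is not None:
            props = self.items[-1]["props"]
            props[self.current_prop] = props.get(self.current_prop, "").strip()
        self.current_prop = None


def extract_items(html):
    parser = FlatMicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.items


page = """
<div itemscope itemtype="http://schema.org/Book">
  <span itemprop="name">The Catcher in the Rye</span>
  by <span itemprop="author">J. D. Salinger</span>
</div>
"""

print(extract_items(page))
# [{'type': 'http://schema.org/Book', 'props': {'name': 'The Catcher in the Rye', 'author': 'J. D. Salinger'}}]
```

Real Microdata is richer than this. An itemprop element can itself carry itemscope to start a nested item, and some elements take their value from an attribute (for example href on a, content on meta). A full parser would have to handle both.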

Reactions

Since schema.org is both about the web and on the web, the SemWeb community responded quickly. Here are some of the reactions that sparked active discussions in their comment sections:

  • Michael Bergman has posted a rather elaborate and enthusiastic article on his blog. It focuses mainly on the benefits of a shared vocabulary and advocates the straightforward Microdata format (as opposed to RDFa) as a way to further the ‘Structured Web’: “Google and the search engine triumvirate understand well — much better than many of the researchers and academics that dominate mailing list discussions — that use and adoption trump elegance and sophistication.”
  • Adrian Gschwend wrote a slightly more critical post. He explains why he sees neglecting RDFa as a mistake and criticizes the top-down approach of schema.org: “If you are strong in a specific domain, you should create the vocabulary for it, not some experts at Google/Yahoo/Bing which try to figure out how they can squeeze the whole universe in 300 or so tags.”
  • Manu Sporny, chair of the RDFa working group, has titled his review ‘The False Choice of schema.org’, claiming that “the freedom of choice on the web is being threatened”. The post has also attracted a large number of commenters.
  • Benjamin Nowack summarises some of the reactions on his blog and considers schema.org to be a “nice starting point” for the Semantic Web.
  • schema.rdfs.org was started (looking suspiciously like schema.org) as a “community project” to provide a mapping from the schema.org vocabulary types into different RDF formats.

So…?

I’m neither an enthusiastic SEO person nor a pessimistic RDF advocate, so I’ll limit my comments to a few key issues:

  1) From a very basic ‘all structure is good’ perspective, the schema.org approach is a good step away from the ‘web of documents’, regardless of the data format used. It offers a shared vocabulary for the most common things on the internet.
  2) I am, however, not particularly fond of how vague the site is about extensions and about the use of other formats such as RDFa and Microformats. The documentation uses a lot of ‘ifs’, ‘mays’ and ‘coulds’.
  3) I’m also uncomfortable with the ‘my way or the highway’ stance on other formats that has been discussed on other blogs. Of course, you don’t have to use schema.org and Microdata, but the search engines might not find you if you use something else (wink wink nudge nudge).
  4) The vocabulary is extremely restricted and high-level. It will probably do the job for many things, but the mechanism for extending it is dubious. You use your own types as sub-types of the existing ones, and schema.org ‘may’ adopt them if enough people on the web use the extension. That makes it clear that this isn’t the place for community efforts.

For now, it seems that Google, Bing and Yahoo promise people they can paint the rainbow if they use schema.org, but give them only a few grey and black pens (and if you go and buy some nice coloured ones, they’ll steal them for their own pencil case…).

The bottom line: schema.org is just YAV – yet another vocabulary in a rather restricted and restrictive format, thrown in front of SEO-hungry ‘webmasters’ by, well, search engine companies. A handful of terms in a tree, no relations, no definitions for the types – in a word (or three), it’s boring, boring, boring. There’s no need to go crazy about it or call it the end of the Semantic Web as we know it. I’d much rather focus on the more exciting, highly expressive formats that provide a platform for complex modelling and let me say things like “a Riesling is a wine with a white colour which is made from only one grape which is a Riesling grape”, with all its implications.

[1] http://schema.org/docs/faq.html#0

Axiomatic Richness – is your ontology full fat or skimmed?

The term ‘axiomatic richness’ is used in various places to describe a certain property of an OWL ontology, roughly ‘how much do we say about a particular concept’. Axiomatically rich ontologies are in some way considered better and more interesting than axiomatically lean ones. There is, however, no clear definition of the term. A quick Google search for ‘axiomatic richness’ turns up only a few distinct sources that attempt to answer the question ‘what makes an (OWL) ontology axiomatically rich?’. In what follows, I discuss some of the main points of the papers and blog posts I have found.

‘Possibility of Deriving Inferences’

Robert Stevens and Sean Bechhofer discuss the term in their post on the OntoGenesis blog:

The axiomatic richness of an [ontology] refers to the level of axiomatisation that is present. […] A lack of axiomatic richness limits the possibility of deriving inferences from an [ontology]. […] Axiomatic richness could be measured in a number of ways. Hayes for example, in the Naive Physics Manifesto, discusses density. […]

(from http://ontogenesis.knowledgeblog.org/257, 2010)

The post also states that to be axiomatically rich, the information in the ontology has to be “in a form amenable to machine processing”. Plain text descriptions, such as those in a SKOS vocabulary, are not sufficient.

This suggests that axiomatic richness is related to the inferential potential of the ontology. However, it gives no further hints as to how we can measure axiomatic richness, or how we can tell whether ontology A is ‘richer’ than ontology B.

‘Large Number of Justifications’

Further down the list of search results, I happened to stumble across my own paper about the Justificatory Structure of OWL ontologies (OWLED 2010), where I state that

[…] taxonomic ontologies containing only trivial axioms of the form (A SubClassOf: B) are commonly regarded as axiomatically weak. A simple indicator for axiomatic richness could be a large average number of justifications for entailments.

(from http://owl.cs.manchester.ac.uk/explanation/owled2010/JustStructure_OWLED2010.pdf, 2010)

“Could be”: nothing definitive here either. A large average number of justifications for the entailments of an ontology simply means that there are many reasons why those entailments hold. By ‘entailment’ I mean any axiom, asserted or inferred, that is entailed by the ontology; a blog post on this issue will follow soon, possibly including a discussion of the reviews of my DL workshop paper. Many justifications might indicate redundancy in the ontology (for which we don’t have a definition either). But the number of justifications alone doesn’t tell us much about how much we say about a particular concept, which is usually the focus when talking about axiomatic richness.

We could probably extend this guess to say “a concept A is axiomatically rich if there are 1) many justifications for 2) entailments of the form A SubClassOf B or EquivalentClasses(A,B)”, i.e. entailments that somehow define the concept. (Counter) examples might follow.
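The following Python sketch makes both guesses a bit more concrete for the simplest possible case: a purely taxonomic ontology of named classes. There, a justification for A SubClassOf: B is exactly the set of axioms along one simple path from A to B in the asserted hierarchy, so counting justifications reduces to counting paths. The sketch computes the ontology-wide average from the OWLED 2010 quote and a per-concept total, a rough reading of the extended guess that only covers entailments of the form A SubClassOf: X. The (sub, sup) pair encoding and all function names are assumptions for illustration. Ontologies with GCIs would need a real justification finder on top of a reasoner.

```python
from collections import defaultdict


def count_simple_paths(graph, start, goal, visited=frozenset()):
    """Count simple paths start -> goal in the asserted hierarchy.

    In a purely taxonomic ontology each simple path corresponds to exactly
    one justification (a minimal entailing set of axioms) for
    start SubClassOf: goal.
    """
    if start == goal:
        return 1
    visited = visited | {start}
    return sum(count_simple_paths(graph, nxt, goal, visited)
               for nxt in graph.get(start, ()) if nxt not in visited)


def justification_counts(axioms):
    """Map every entailed (A, B) with A != B to its number of justifications.

    axioms: list of (sub, sup) pairs, i.e. sub SubClassOf: sup with named classes.
    """
    axioms = list(axioms)
    graph = defaultdict(set)
    for sub, sup in axioms:
        graph[sub].add(sup)
    names = {name for axiom in axioms for name in axiom}
    counts = {}
    for a in names:
        for b in names:
            if a != b:
                n = count_simple_paths(graph, a, b)
                if n:  # zero paths means the subsumption is not entailed
                    counts[(a, b)] = n
    return counts


def average_justifications(counts):
    """Average number of justifications per entailment (ontology-level indicator)."""
    return sum(counts.values()) / len(counts) if counts else 0.0


def justifications_per_concept(counts):
    """Total justifications for entailments A SubClassOf: X, grouped by A."""
    totals = defaultdict(int)
    for (a, _), n in counts.items():
        totals[a] += n
    return dict(totals)


# Toy taxonomy: A has two independent routes up to D (via B and via C).
axioms = [("A", "B"), ("B", "D"), ("A", "C"), ("C", "D"), ("D", "E")]
counts = justification_counts(axioms)
print(counts[("A", "D")])               # 2
print(average_justifications(counts))   # 11 justifications / 9 entailments
print(justifications_per_concept(counts))
```

Even in this toy case, A comes out ‘richest’ only because two routes lead up to D. Whether that counts as saying more about A, or just as redundancy, is exactly the open question.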

Using ‘Expressive’ Constructors

Mikel Egana Aranguren’s thesis is a rich (haha) source of information about axiomatic richness. I found this quote quite interesting:

The OWL version of the Gene Ontology […] is implemented exploiting a rigorous formalism (OWL), but a limited fragment of the expressivity of OWL is used in axioms. On the other hand, the OBO version of the Sequence Ontology […] is axiomatically rich (e.g. symmetric properties and intersections of classes can be found in the ontology).

(from http://www.sindominio.net/~pik/thesis.pdf, 2009)

He also claims that “bio-ontologies represent biological knowledge in a limited, lean and not rigorous manner”.

A similar assumption is made in Martin Hepp’s description of “A Methodology for Deriving OWL Ontologies from Products and Services Categorization Standards”:

[…] the semantic richness needed for most business scenarios will come from the usage of the huge collection of properties.

(from http://is2.lse.ac.uk/asp/aspecis/20050152.pdf, 2005)

Well. I see the point of this argument (it is similar to the one I made above: we can’t really say much if we only use atomic subsumptions in our ontology). But I disagree with the claim that expressivity equals axiomatic richness. In many of our experiments, we have found that expressivity doesn’t really tell us much about how ‘complex’ an ontology is. Reasoner performance and the number and size of justifications, for example, largely do not correlate with the types of constructors found in the ontology (up to a point, obviously). Just using the constructors in some way to define a concept doesn’t necessarily make the ontology ‘richer’. Trust me, Son, I’ve seen some of those allegedly weak EL++ ontologies that could have made “the strongest man on earth whimper like a frightened kitten”.
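The following rough Python sketch illustrates why a list of constructors says little on its own. It computes an ‘expressivity profile’, meaning the set of axiom types and class constructors an ontology uses. Axioms are encoded as hypothetical nested tuples, e.g. ("some", "partOf", "C") for partOf some C; this is not the OWL API’s data model. The two toy ontologies get exactly the same profile. Yet only the second one entails A SubClassOf: E, and it does so only through GCIs.

```python
def expressivity_profile(axioms):
    """Set of axiom types and class constructors used anywhere in the ontology.

    Axioms are hypothetical tuples (axiom_type, *operands); operands are
    names (str) or nested constructor tuples such as ("and", "B", "C").
    """
    used = set()

    def walk(expr):
        if isinstance(expr, tuple):
            used.add(expr[0])          # axiom type or constructor name
            for operand in expr[1:]:
                walk(operand)

    for axiom in axioms:
        walk(axiom)
    return used


# One mildly 'expressive' axiom that nothing else builds on.
lean = [
    ("SubClassOf", "A", "B"),
    ("SubClassOf", "C", ("and", "B", ("some", "partOf", "D"))),
]
# Same constructors, but the axioms interact: A SubClassOf: E follows via GCIs.
denser = [
    ("SubClassOf", "A", ("and", "B", ("some", "partOf", "C"))),
    ("SubClassOf", ("some", "partOf", "C"), "D"),
    ("SubClassOf", ("and", "B", "D"), "E"),
]

print(expressivity_profile(lean) == expressivity_profile(denser))   # True
```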

Ontology Design Patterns

Robert Stevens and Mikel Egana Aranguren mention the term again in their paper “Applying Ontology Design Patterns in Bio-Ontologies”. They claim that Ontology Design Patterns (ODP)

[…] have already brought benefits in terms of axiomatic richness and maintainability […]

(from http://www.springerlink.com/content/d2lp476v0p281q73, 2008)

They refer to two more papers dealing with ODP in bio-ontologies, which I won’t cover here.

Locality Based Modules (LBM)

My (current and past) office neighbour Chiara Del Vescovo and Thomas Schneider hint at a definition of axiomatic richness in a WoMo workshop talk:

[…] extract all (relevant) LBMs in order to […] draw conclusions on characteristics of an ontology:[…] What is the axiomatic richness of O?

(from http://www.informatik.uni-bremen.de/~ts/talks/1005_dl+womo.pdf, 2010)

Unfortunately, the slides don’t go into detail, and I don’t remember any discussions from the talk, so I can’t say much about this.
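Since the slides don’t say how richness would be read off the modules, the following Python sketch only covers the extraction step. It computes a syntactic ⊥-locality-based module for a seed signature. The sketch is restricted to SubClassOf axioms built from named classes, intersections ("and") and existential restrictions ("some"), in a hypothetical nested-tuple encoding. Any other constructor on the left-hand side is treated as non-local, which can only make the module larger, never unsound. All names here are made up for illustration. Proper module extractors exist in real tools; the OWL API, for instance, includes a syntactic locality-based one.

```python
def signature(expr):
    """Class and property names occurring in a nested-tuple expression or axiom."""
    if isinstance(expr, str):
        return {expr}
    names = set()
    for operand in expr[1:]:  # expr[0] is the constructor or axiom type
        names |= signature(operand)
    return names


def is_bottom(expr, sig):
    """Does expr collapse to owl:Nothing when every name outside sig is replaced by bottom?"""
    if isinstance(expr, str):
        return expr not in sig
    op = expr[0]
    if op == "and":
        return any(is_bottom(operand, sig) for operand in expr[1:])
    if op == "some":
        _, prop, filler = expr
        return prop not in sig or is_bottom(filler, sig)
    return False  # unknown constructor: conservatively assume it is not bottom


def bottom_module(axioms, seed):
    """Syntactic bottom-locality module of SubClassOf axioms for a seed signature."""
    sig, module = set(seed), []
    changed = True
    while changed:  # repeat until no further axiom becomes non-local
        changed = False
        for axiom in axioms:
            if axiom in module:
                continue
            _, sub, sup = axiom
            # In this fragment (no owl:Thing, no negation), SubClassOf(sub, sup)
            # is bottom-local exactly when sub collapses to bottom.
            if not is_bottom(sub, sig):
                module.append(axiom)
                sig |= signature(axiom)
                changed = True
    return module


axioms = [
    ("SubClassOf", "A", ("and", "B", ("some", "partOf", "C"))),
    ("SubClassOf", ("some", "partOf", "C"), "D"),
    ("SubClassOf", ("and", "B", "D"), "E"),
    ("SubClassOf", "F", "G"),
]
print(bottom_module(axioms, {"A"}))   # the first three axioms
print(bottom_module(axioms, {"F"}))   # [('SubClassOf', 'F', 'G')]
```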

Non-Trivial Entailments

Yet another explanation from Manchester can be found in Pavel Klinov and Bijan Parsia’s paper on “Implementing an Efficient SAT Solver for Probabilistic DL“:

For axiomatically weak TBoxes, where almost all subsumptions can be discovered by traversing the concept hierarchy […]. More complex TBoxes may have non-trivial entailments involving concept expressions on both left-hand and right-hand sides […]

(from http://www4.in.tum.de/~schulz/PAPERS/STS-IWIL-2010.pdf, 2010)

To clarify, I assume that ‘non-trivial entailments’ means subsumptions that are inferred rather than asserted and whose justifications involve GCIs. This sounds similar to my statement above about ‘many complex’ justifications for entailments.
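The following Python sketch shows one rough way to operationalise this reading. An entailed subsumption between named classes is treated as trivial if it can be found by walking the asserted hierarchy, and as non-trivial otherwise. Here the asserted hierarchy means the named superclasses stated directly, or as top-level conjuncts, for a named subclass. The entailments are listed by hand; in practice they would come from a reasoner’s classification. The encoding and function names are assumptions for illustration. ‘Not reachable in the asserted hierarchy’ is only a proxy for ‘every justification involves a GCI’.

```python
from collections import defaultdict, deque


def told_hierarchy(axioms):
    """Edges A -> B for named superclasses asserted for named classes.

    Axioms use the hypothetical encoding ("SubClassOf", sub, sup); axioms whose
    left-hand side is a complex expression (GCIs) are deliberately skipped.
    """
    graph = defaultdict(set)
    for _, sub, sup in axioms:
        if not isinstance(sub, str):
            continue
        conjuncts = sup[1:] if isinstance(sup, tuple) and sup[0] == "and" else (sup,)
        for conjunct in conjuncts:
            if isinstance(conjunct, str):
                graph[sub].add(conjunct)
    return graph


def reachable(graph, start, goal):
    """Breadth-first search over the asserted hierarchy."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def non_trivial(axioms, entailments):
    """Entailed (A, B) pairs that cannot be found by traversing the asserted hierarchy."""
    graph = told_hierarchy(axioms)
    return [(a, b) for a, b in entailments if not reachable(graph, a, b)]


axioms = [
    ("SubClassOf", "A", ("and", "B", ("some", "partOf", "C"))),
    ("SubClassOf", ("some", "partOf", "C"), "D"),   # GCI
    ("SubClassOf", ("and", "B", "D"), "E"),          # GCI
]
# Hand-computed subsumptions between named classes (normally a reasoner's job).
entailed = [("A", "B"), ("A", "D"), ("A", "E")]
print(non_trivial(axioms, entailed))   # [('A', 'D'), ('A', 'E')]
```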

Conclusion

Scio me nihil scire (I know that I know nothing). I do, however, quite like the idea of relating axiomatic richness to the number and type of reasons (i.e. justifications) for one (some, all?) of the ontology’s entailments. We could certainly use a formal definition, or several, depending on which aspect is most relevant to the developer, the domain, the application… That would let us mean the same thing when we talk about ‘axiomatic richness’ and compare ontologies. To be continued…

The Semantic Web Layer Cake

I just came across this three-dimensional representation of the Semantic Web Layer Stack (or cake), handcrafted by Benjamin Nowack. The author also links to Jim Hendler’s talk on the Semantic Web Layercake (from 2009), possibly the world’s first Semantic Web talk entirely in rhyme. Jim’s talk gives a good overview of the evolution of the Semantic Web. It also shows how an incredible amount of icing, sprinkles and candles has been added to the layer cake over the years.


The complexity of the stack – both the ‘simplified’ version and the more elaborate one in Jim’s talk – makes me wonder how usable the Semantic Web approach really is. Will there be a point where the technologies converge, some die, others emerge as winners? Or will we live happily with a big messy cake that’s got a little something for everyone?

(via http://mikeleganaaranguren.wordpress.com/2011/03/11/semantic-web-layer-cake-3d/)