Last week schema.org was launched, which started a fairly active discussion on the semantic-web W3C mailing list. The three major search engine providers, Google, Bing and Yahoo, teamed up to produce the website, which they describe as “a joint effort […] to improve the web by creating a structured data markup schema supported by major search engines.” 
What is it?
The main contribution of the platform is a set of schemas in the form of a simple vocabulary, which contains entities (or “Types”) such as Event, Person, Place and Product, along with their sub-entities (“more specific types”). The technology chosen for this approach is the HTML Microdata format (currently a W3C working draft), which introduces a handful of HTML attributes that can be added to standard HTML tags such as span and div. The most important ones are itemscope and itemtype, which indicate that an HTML element starts a new “item” of a particular type (e.g. itemtype="http://schema.org/Book"), and itemprop, whose value names the property of the item that the element encloses (e.g. itemprop="name"). And this is where schema.org comes into play: the types ‘defined’ in its vocabulary, together with their properties, can be conveniently used to fill in the values for itemtype and itemprop. (I say ‘defined’ loosely: the terms aren’t defined in the semantic sense, as the vocabulary follows a simple hierarchical ‘is-a’ tree structure.)
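To make the three attributes a bit more concrete, here is a rough sketch of what a consumer of such markup might do. It uses only Python's standard html.parser module. The page fragment, the book title, the author name and the class and function names are all made up for illustration, and the parser is deliberately simplified (one flat item, text values only), so it should not be read as a conforming Microdata implementation or as what any search engine actually runs.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the title and author are invented for this example.
SNIPPET = """
<div itemscope itemtype="http://schema.org/Book">
  <span itemprop="name">An Example Book</span> by
  <span itemprop="author">Jane Doe</span>
</div>
"""


class MicrodataSketch(HTMLParser):
    """Collects the itemtype and the text of each itemprop of one flat item.

    Simplifications: no nested items, no itemref, no URL-valued properties
    (href/src/content), and any closing tag ends the current property.
    """

    def __init__(self):
        super().__init__()
        self.itemtype = None
        self.properties = {}
        self._current_prop = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # itemscope is a boolean attribute, so only its presence matters.
        if "itemscope" in attrs:
            self.itemtype = attrs.get("itemtype")
        if "itemprop" in attrs:
            self._current_prop = attrs["itemprop"]
            self.properties[self._current_prop] = ""

    def handle_data(self, data):
        # Text is recorded only while we are inside an element with itemprop.
        if self._current_prop is not None:
            self.properties[self._current_prop] += data

    def handle_endtag(self, tag):
        self._current_prop = None


def extract_item(html):
    """Return (itemtype, {property: text}) for the single item in html."""
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return parser.itemtype, parser.properties


item_type, props = extract_item(SNIPPET)
print(item_type)
print(props)
```

The point of the sketch is only that itemscope/itemtype say what kind of thing the element describes, and each itemprop labels one piece of text inside it; everything else on the page is ignored.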
And, schema.org being about the web and on the web, the SemWeb community responded quickly. Here are some of the reactions that started active discussions in their comment sections:
- Michael Bergman has posted a rather elaborate and enthusiastic article on his blog, focusing mainly on the benefits of a shared vocabulary and on his advocacy of the straightforward Microdata format (as opposed to RDFa) to further the ‘Structured Web’: “Google and the search engine triumvirate understand well — much better than many of the researchers and academics that dominate mailing list discussions — that use and adoption trump elegance and sophistication.”
- Adrian Gschwend wrote a slightly more critical post, explaining why he sees neglecting RDFa as a mistake, and criticizing the top-down approach of schema.org: “If you are strong in a specific domain, you should create the vocabulary for it, not some experts at Google/Yahoo/Bing which try to figure out how they can squeeze the whole universe in 300 or so tags.”
- Manu Sporny, chair of the RDFa working group, has titled his review ‘The False Choice of schema.org’, claiming that “the freedom of choice on the web is being threatened”. The post has also attracted a large number of commenters.
- Benjamin Nowack summarises some of the reactions on his blog and considers schema.org to be a “nice starting point” for the Semantic Web.
- schema.rdfs.org was started (looking suspiciously like schema.org) as a “community project” to provide a mapping from the schema.org vocabulary types into different RDF formats.
I’m neither an enthusiastic SEO person nor a pessimistic RDF advocate, so I’m keeping my comments to a few key issues:
1) From a very basic ‘all structure is good’ perspective, the schema.org approach to a shared vocabulary for the most common things on the internet is a good step away from the ‘web of documents’, regardless of the data format used.
2) I am, however, not particularly fond of the vagueness of the statements the site makes about extensions and the use of other formats such as RDFa and Microformats – the documentation uses a lot of ‘ifs’, ‘mays’ and ‘coulds’.
3) I’m also uncomfortable with the ‘my way or the highway’ stance on other formats that has been discussed on other blogs. Of course, you don’t have to use schema.org and Microdata, but the search engines might not find you if you use something else (wink wink nudge nudge).
4) The vocabulary is extremely restricted and high-level – it will probably do the job for many things, but the dubious way of extending it (use your own types as sub-types of the existing ones, and schema.org ‘may’ adopt them if the extension is used by enough people on the web) just makes it clear that this isn’t the place for community efforts.
For now it seems that Google, Bing and Yahoo promise people that they can paint the rainbow if they use schema.org, but give them only a few grey and black pens (and if you go and buy some nice coloured ones, they’ll steal them back for their own pencil case…).
The bottom line: schema.org is just YAV – yet another vocabulary in a rather restricted and restrictive format, thrown in front of SEO-hungry ‘webmasters’ by, well, search engine companies. A handful of terms in a tree, no relations, no definitions for the types – in a word (or three), it’s boring, boring, boring. There’s no need to go crazy about it or call it the end of the Semantic Web as we know it. I’d much rather focus on the more exciting, highly expressive formats that provide a platform for complex modelling and let me say things like “a Riesling is a wine with a white colour which is made from only one grape which is a Riesling grape”, with all its implications.
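As a toy illustration of what “with all its implications” can mean, the Riesling sentence can be read as a necessary-and-sufficient condition: anything that satisfies it can be classified as a Riesling without anyone having labelled it as such. The sketch below is plain Python, not OWL or any real reasoner, and the Wine class, the is_riesling function and the sample bottles are all invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wine:
    name: str
    colour: str
    grapes: frozenset  # names of the grape varieties the wine is made from


def is_riesling(wine: Wine) -> bool:
    """White colour, exactly one grape, and that grape is the Riesling grape."""
    return wine.colour == "white" and wine.grapes == frozenset({"Riesling"})


# Hypothetical cellar; none of the bottles is explicitly tagged as a Riesling.
cellar = [
    Wine("Bottle A", "white", frozenset({"Riesling"})),
    Wine("Bottle B", "white", frozenset({"Riesling", "Chardonnay"})),
    Wine("Bottle C", "red", frozenset({"Merlot"})),
]

rieslings = [w.name for w in cellar if is_riesling(w)]
print(rieslings)
```

Only Bottle A qualifies: Bottle B is a blend, and Bottle C is neither white nor made from the right grape. A flat ‘is-a’ tree of type names cannot express that kind of condition at all.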