Tuesday 30 October 2012

Future Ontology - Five year predictions past and future

I made a number of ontology predictions five years ago to this very day. Here's a review of those and a few more for the next five years.

On 30th October 2007 I gave my first public presentation on the ontology work I'd been doing since starting work at EBI in May that same year. Today, 30th October 2012, I gave a talk during which I reflected on what has happened over those five years and, in order to do so, ended up looking at that old set of slides from 2007. It made for interesting reading. At the end of the 2007 talk I made some predictions about what we might do and where we might end up as a community. I thought it might be nice to share those now and to make a few more public five-year predictions. If I'm still working and the world has not ended in 2017 maybe I'll do it all again.
I also predicted that leather pants would make a comeback. I was wrong on that one, thank God.

October 2007 - My Future Ontology Predictions


We will rely on and reuse external URIs in our work, rather than minting our own, as ontologies become more populous and stabilise.

This is certainly true of the work we do at EBI. Some ontologies, such as Gene Ontology, have been stable for quite a while and a couple of others have also followed a fairly stable road to persisting ontology URIs over time (including our own EFO which follows the GO's practice). What I really wanted to see was that once a URI is minted in an ontology it persists unless there is a very good reason for it to go away. Many more bio-ontologies do this than used to and in some ways, this is a measure of maturity of the community.

We will add dereferenceable URIs for our data and put metadata behind them.

Partially. This happens for most ontologies now which is a definite step in the right direction. For example, OntoBee does a lot of content negotiation for OBO ontologies which is nice. The data side still lags behind but we are looking into that internally now. I suppose overall the data part is less true than I would have wished but this is a classic chicken and egg. The promise comes later, when everyone does it, but until everyone does it, it is not immediately obvious why you should. We need to be bold.
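As a rough sketch of what content negotiation looks like from the client side: you ask for the same URI but state, via the HTTP Accept header, which representation you would like back. The helper name `build_rdf_request` is made up for illustration, the GO term URI is just an example in the OBO PURL style, and whether any particular server honours the header is an assumption rather than something shown here. The request is only built, not sent.

```python
from urllib.request import Request


def build_rdf_request(term_uri: str, mime_type: str = "application/rdf+xml") -> Request:
    """Hypothetical helper: build (but do not send) a content-negotiated request.

    The Accept header tells the server which representation the client prefers;
    a server that supports content negotiation may return RDF instead of HTML.
    """
    return Request(term_uri, headers={"Accept": mime_type})


# Illustrative term URI in the OBO PURL style (an assumption for this sketch).
req = build_rdf_request("http://purl.obolibrary.org/obo/GO_0008150")

# Inspect what would be sent; nothing goes over the network here.
print(req.full_url)
print(req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would actually send it. A browser asking for the same URI with `text/html` would, on a server that negotiates, get a human-readable page instead.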

As ontology numbers increase and overlap, mapping between them, and between the data described with them, will become our biggest challenge.

I think this is true and I am still concerned by it. Where this is not true is that there is not a huge amount of data published using all of these ontologies as I perhaps envisaged in 2007. But I still maintain that when more is published, the mapping problem will be difficult. Having said all of that, I am also unconvinced that building one ontology for each domain by attempting to get all communities to agree to every definition of every biological concept is the answer either. I've seen ontologies grind to a halt through analysis paralysis over the last five years and this is also not the way to go. Trading one critical problem for another is not a good solution.

Agent technology will help in our mappings and in the way we discover data.
Pretty much didn't happen - in bioinformatics anyway. But I think this was because I was overly optimistic about how much of the infrastructure would exist in this semantic web world. It is worth noting, though, that Google's agents (web crawlers) do use rich snippets tags, which include an RDFa version describing products on web pages, to help populate the 'shopping' search you see. So I was a bit off the mark but not completely.

That biological triples would be championed by all.
I think this has definitely been wrong up until very recently and in some ways I am guilty of being sucked in by the hype - by the promise that integrating all of our RDF data would bring. This year though, the EBI has started an RDF Frontier group to trial this work. You can see the work Simon and I have been doing at our FGPT Atlas RDF page to see how this is progressing - well so far, I'd say. I was a bit premature on this prediction. Which leads me on nicely to...

October 2012 - My future ontology predictions for the next five years


The number of reference ontologies will level off (and some will disappear) and natural 'winners' will emerge.
I make a deliberate distinction here by saying 'reference' ontologies as I think the number of ontologies put together for applications will likely increase but they are unlikely to be considered as references for a domain. I think funding for building these reference ontologies will fall and some may even become moribund, sadly. But a lot of the important ones will live on and continue to develop. The natural winners - ontologies that become the de facto choice for a domain - will emerge. We will need to find ways of using these ontologies that do not necessitate building a whole new reference ontology.

Upper ontologies will play a less important role in the community.
Some might say 'about time' but I do think they've had a role to play and have helped with some things even if the approach of those involved has been, shall we say, less than endearing. But I think their domination in every discussion about whether an ontology is 'good' and how an ontology fits into an upper ontology will decline in favour of focusing on how we can use the ontologies to describe our data and do biology.

Use of ontologies and semantic web technologies in Bioinformatics will become ubiquitous.
This is ambitious but I think it should and will happen. I'm convinced, even from some of the early prototyping work we've been doing, that there is enough data out there now to warrant applications for biologists to use.

Publishers will curate literature using ontologies and make the API to these annotations public.
I think the great work that happens already in GoPubMed should happen for more ontologies and for more publishers. It's just great. For more on this I refer you to Phil Lord who wages a one-man war for more semantics in publications (amongst other improvements in the industry).

Google will endorse the semantic web.
Or they will at least admit that it's useful. They already use semantics with rich snippets. I'd like to see them support this area of web science and I think they, eventually, will. If they outwardly endorsed this then, who knows, I certainly think more people would use semantics when publishing their data on the web.

I'd love to hear more from those in the community who are willing to stick a stake in the ground.