Thursday 14 June 2012

URIGen: the URI generation service

A quick post on a tool Simon Jupp has been developing in our group: URIGen. It's a small thing but very useful for anyone who edits ontologies concurrently with others, or who wants to automate the minting of new URIs without conflicts. We've been using it for EFO and SWO, and we think plenty of others could benefit. OBI in particular comes to mind, because URIs created there are not confirmed until an official release; URIGen could make a new URI available for use immediately. Here are the basics.

Problem
URIs are used to uniquely identify resources within an ontology. If two resources share the same URI they are considered the same thing, so if they are different things they should not share a URI. Many bio-ontologies use 'semantic-free' identifiers when creating URIs, to ensure meaningful content is not embedded within the URI (for reasons I won't go into here; that's another post). This often takes the form of a simple accession number, i.e. a number that is incremented each time a new class is created: GO:0000001, GO:0000002 and so on. To ensure different resources are not accidentally allocated the same URI, we need a way of managing how new URIs are created (often called minting) when we hit the 'new' button in an ontology editor such as Protege, or in other tools.
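A minimal sketch of the collision problem, assuming a GO-style prefix and a seven-digit zero-padded number as in the examples above. The names format_accession and LocalCounter are hypothetical and for illustration only: two editors who each increment their own copy of the counter end up minting the same accession for two different classes.

```python
# Hypothetical sketch: 'semantic-free' accessions built from a prefix plus a
# zero-padded counter, e.g. GO:0000001, GO:0000002, ...

def format_accession(prefix: str, number: int, digits: int = 7) -> str:
    """Return an accession such as 'GO:0000001' (prefix, colon, padded number)."""
    return f"{prefix}:{number:0{digits}d}"


class LocalCounter:
    """Naive per-editor counter: each copy only knows its own last-used number."""

    def __init__(self, prefix: str, last_used: int) -> None:
        self.prefix = prefix
        self.last_used = last_used

    def mint(self) -> str:
        # Increment locally; nothing stops another editor doing the same.
        self.last_used += 1
        return format_accession(self.prefix, self.last_used)


# Two editors start from the same copy of the ontology (last accession
# GO:0000002) and each creates a new class independently.
alice = LocalCounter("GO", last_used=2)
bob = LocalCounter("GO", last_used=2)
print(alice.mint())  # GO:0000003
print(bob.mint())    # GO:0000003  <- same ID for two different classes
```

The clash happens because neither copy of the counter can see the other; handing allocation to a single shared service removes it.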
The URIGen console controlling and monitoring URI creation by multiple people. Duplicate URIs can never be created. And you can also watch what people are doing. Your boss is gonna love it.

Solution
URIGen is a client-server tool that controls how URIs are generated in tools such as Protege. The server is installed centrally, and a client (user) connects to it from Protege or through an API call; it then takes over the generation of new URIs whenever a new class or property is created. Each user is given a unique API key, which is required to connect to the server and provides a level of security. The form of the URI can be configured in URIGen: for example, the number to start counting from, the prefix to use (e.g. 'GO' in our earlier example) and so on. The base URI of the ontology is what ties it to a particular set of these preferences. So in the figure above, the ontology in the third column is SWO core, which uses the preferences for that ontology, while further down the SWO version ontology uses a slightly different form of URI; you can see these differences in the leftmost column. The server is synchronised to avoid deadlock and to ensure that a URI is only ever allocated once.
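To make the allocation step concrete, here is a minimal in-memory sketch, assuming a single server process. MintingServer, Preferences, mint, the API keys and the example.org base URI are all hypothetical names for illustration; this is not URIGen's actual API or code. It mirrors the ideas described above: an API key check, preferences looked up by the ontology's base URI, a log of who minted what, and a lock so that reading and incrementing the counter happen atomically.

```python
import threading
from dataclasses import dataclass


@dataclass
class Preferences:
    """Hypothetical per-ontology settings: URI base, ID prefix, next number, padding."""
    uri_base: str
    id_prefix: str
    next_number: int
    digits: int = 7


class MintingServer:
    """Toy central minting service (hypothetical, not URIGen's real API).

    Preferences are keyed by the ontology's base URI; one lock serialises
    allocation so no two callers ever receive the same URI.
    """

    def __init__(self, api_keys: set[str]) -> None:
        self._api_keys = api_keys
        self._prefs: dict[str, Preferences] = {}
        self._issued: list[tuple[str, str]] = []  # (api_key, uri) audit log
        self._lock = threading.Lock()

    def register(self, ontology_base: str, prefs: Preferences) -> None:
        with self._lock:
            self._prefs[ontology_base] = prefs

    def mint(self, api_key: str, ontology_base: str) -> str:
        if api_key not in self._api_keys:
            raise PermissionError("unknown API key")
        with self._lock:  # read-increment-record as one atomic step
            p = self._prefs[ontology_base]
            uri = f"{p.uri_base}{p.id_prefix}{p.next_number:0{p.digits}d}"
            p.next_number += 1
            self._issued.append((api_key, uri))
            return uri


BASE = "http://example.org/onto/"  # hypothetical ontology base URI
server = MintingServer(api_keys={"key-alice", "key-bob"})
server.register(BASE, Preferences(uri_base=BASE, id_prefix="EX_", next_number=1))

results: list[str] = []
results_lock = threading.Lock()


def editor(key: str) -> None:
    """Simulate one user creating 100 new classes concurrently with others."""
    for _ in range(100):
        uri = server.mint(key, BASE)
        with results_lock:
            results.append(uri)


threads = [threading.Thread(target=editor, args=(k,)) for k in ("key-alice", "key-bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results), len(set(results)))  # 200 200: every URI is unique
```

A single lock around the whole read-increment step is the simplest way to get the "allocated only once" guarantee, and because no code path acquires it twice it cannot deadlock with itself.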

Availability
Find the tool and documentation at http://code.google.com/p/urigen/

