Biomedical ontologies are being widely used to annotate biological data in a computer-accessible, consistent and well-defined manner. However, due to their size and complexity, annotating data with appropriate terms from an ontology is often challenging for experts and non-experts alike, because there exist few tools that allow one to quickly find relevant ontology terms to easily populate a web form.
Although there are many potential applications, we developed OntologyWidget to help microarray researchers use ontologies and controlled vocabularies to describe and annotate their experiments. Using biomedical ontologies to annotate experimental designs, experimental variables and biological samples helps to achieve consistency, to provide the opportunity for complex searches and to allow computational access to the annotations. However, in the process of annotating a single investigation, a researcher might require several large and complex ontologies. For example, the NCBI Taxonomy might be used to describe the organism used as the source of the biological sample being studied, while an anatomy ontology might be used to describe the tissue that was sampled. Because most experimental annotations are best provided by the researcher(s) who conducted the experiments, and it is unlikely that bench biologists would have had the time to become power users of several ontologies, we wanted to provide an easier method to use ontologies that would be rapid and require no expert knowledge of the ontology. Thus, in an attempt to ease the experimental annotation process and decrease the burden of entering high-quality annotations, we developed the OntologyWidget. OntologyWidget fulfilled our requirements to (1) provide a graphical, web-based tool, (2) have a fast auto-complete term search, (3) enable users to view and browse ontology structures, (4) provide a simple means to search more than one ontology, (5) provide the means of collecting ontology terms to post to another web page or to another program and (6) be deployed by our collaborators and others with a minimum of effort.
As of August 2007, there were 66 ontologies listed on the Open Biomedical Ontologies web site , describing biological concepts that range from development to anatomy to taxonomy to experimental approaches. Several tools already exist to search and browse ontologies, such as the Amigo Browser and the Ontology Lookup Service (OLS) from the EBI. The Amigo browser  is a web-based tool that allows a user to browse, query and visualize terms from any ontology formatted in the OBO format. It can be installed as a Perl Application, provides a normal text search as well as a convenient means to browse and explore trees. However, Amigo does not allow one to select terms that can then be posted to another web form, its installation is fairly complex, and it does not allow users to dynamically find terms within the ontology while typing. The Ontology Lookup Service  from the EBI, which was published while this work was in progress, supports many of the same ontologies that we do and has similar browse and auto-complete features. The OLS also identifies the relationship type between terms, gives associated information and descriptions for the terms where available. It fulfills many of our needs, except that is part of a much larger software package. Its installation has a large footprint requiring around 40 Java jar files. Therefore, the OLS did not fulfill one of our requirements, that it is easy for our collaborators to install and use on their websites. Thus, we developed the OntologyWidget, which can be embedded in a web page in a relatively straightforward fashion, and used with other web-based annotation applications.
Results and discussion
Features of OntologyWidget
An instance of the OntologyWidget in a web page can be used to search and browse a single ontology, but multiple instances of the OntologyWidget can also be embedded in a single web page, allowing a user to rapidly annotate a biological entity with terms from several ontologies using the same web page. In addition, the OntologyWidget can be configured to use only a specific subsection of an ontology, by specifying a particular node to be used as a root. In the example provided in Figure 1, three different ontologies are presented to the user: part of the MGED Ontology , the Biological Process portion of the Gene Ontology  and the NCBI taxonomy . In Figure 1, both the MGED Ontology (MO) and the Gene Ontology are shown with different terms presented as the root terms. In the case of the MGED Ontology, we have limited the widget to use "BiologicalProperty" as the root term. Thus, the top widget could be used to annotate one aspect of the experimental approach that was used, as this subgraph of MO lists various experimental designs. Limiting terms provides accuracy and ease of use – users will not accidentally enter terms that are incorrect for a given context. We can also specify a default selection for the ontology terms in the cases where a single term is commonly used in many situations (for example, using "Homo sapiens" as the default term when most of the research being conducted is based on human samples. As of this writing, SMD Ontology Web Services provide access to over 40 ontologies from the OBO site  for use with the OntologyWidget, though we have plans to incorporate all the ontologies on the OBO website.
Figure 1. Ontology widgets on an html page. Three instances of the OntologyWidget are shown on a single html page. Each widget contains both an autocomplete box, into which text can be typed, and an ontology browser, which can either be browsed from the top level, or can be opened at a selected node, based on the entered term. Each instance of the widget is connected to a different ontology.
Sometimes, a user of an ontology merely needs to find the correct term and does not need to know the context of an ontology term. In this case, OntologyWidget's text search and auto complete functions will be sufficient. The screenshot provided in Figure 2 illustrates OntologyWidget's auto-complete function in an example using the NCBI taxonomy. Note that the letters "esch" were typed into the text box and a long list of taxonomic terms that start with those letters are available for selection. The OntologyWidget text search also accepts the wildcard character, *, and is case-insensitive. When a user identifies the correct term in the list of results, it can be selected to populate the text box. We limit the number of matching terms to 50 for the sake of performance. The last item on the list is "More terms available ..." in this case to indicate that more than 50 terms match the current typed-in sequence of characters. The user is required to refine their search by typing in more characters to see a more specific subset of the terms.
Figure 2. Auto-complete functionality. A user can begin typing part of a term (including wild-card characters), and potential completions of the typed in phrase are displayed in the drop down menu. In this case, potential completions of 'esch' from the NCBI taxonomy are shown.
At other times, it is important for a user to explore the context of an ontology term to determine if a different term (for example, one that is more specific or one that is less specific) would be more accurate for the given annotation. Figure 3 shows a screenshot of the ontology tree browser being displayed after a user entered a term into the text box and then clicked on the "Search" button. In this case, the term "cellular metabolic process" is highlighted among a nested set of parent terms all the way up to the Gene Ontology's Biological Process term. In this ontology browser display, children of a given node can either be revealed using the "+" icon or hidden using the "-" icon. Note that while the ontology tree browser display shows the direct path (or paths) from the root term to the input term, other child terms of other nodes within those paths are not displayed. Instead, a "..." icon is used to indicate that additional children exist for that parent node. When the user clicks the "..." icon, all the immediate child terms are displayed and the "..." icon disappears. The use of the "..." icon allows the initial ontology tree browser display to remain relatively uncluttered, and easier to understand and navigate. At this time, the widget does not display the details or description of an ontology term. We maintain this information in our database, and it is provided via our web services API, so future updates to our software will make it possible to dynamically provide this information.
Figure 3. The ontology browser display. When an autocompleted term is selected and the 'Search' button is clicked, all the paths from the selected term ("cellular metabolic process") to the root of the ontology ("Biological Process") are shown. Instances of the search term are highlighted as a visual aid. Selection of a more or less specific term from the ontology browser will populate the text box with that term.
OntologyWidget has been tested on both MacOSX and Windows XP, and is known to run successfully in Safari 2.0.4, Firefox 1.5.12, Firefox 188.8.131.52, Opera 9.02, and Mozilla 1.7.13 on MacOSX, and Internet Explorer 6 and 7, Firefox 1.5.8, and Opera 9.1 on Windows XP.
Client user interface via html
To incorporate the OntologyWidget on one's own web page, an html file must be created that correctly implements the OntologyWidget. Detailed instructions for incorporating the OntologyWidget into web pages are provided in the help documentation at our website, but here we briefly describe some of the necessary steps, and their purposes. First, the html file must initialize the SMDService module (described below in more detail). Second, the html file must define OntologyWidget parameters, and third, the html file must provide page locations for each OntologyWidget used on the page, and provide the hidden fields which report the data selected upon submission. Lastly, the URL for the location of the servlet proxy must be defined.
An instance of an OntologyWidget simply requires six pieces of data to be initialized:
• title that will be displayed on the OntologyWidget (string)
• ontology ID (database identifier, supplied by function call)
• ontology name (string)
• term name (string)
• label (string)
• unique identifier for the instance of the OntologyWidget in the html file (string or number)
The title and label provide context both to the user and to the author of the html file. Because one can put multiple OntologyWidgets on a single web page, the unique identifier (uid) is needed to keep queries and data directed to the correct instance of the OntologyWidget.
In the body of the html file, the form and the URL of the widget's action function are defined. Within the html form element, the location of each OntologyWidget instance is defined. When the page is posted, that is when a post or submit button is clicked, these data associated with each widget instance (and any other fields on the html page that the page implementer has chosen to be posted) will be submitted via an implementer-written submit function. As an illustration, the example html file included in the OntologyWidget download contains an example submit function that simply reports these fields in a table.
SMD Ontology Server
The SMD Ontology Server comprises a package of Java classes that receive http requests, query a database that contains the supported ontologies, format the data into XML and return the XML to the interface. The simplest way to use the service is through the proxy servlet that we supply as a .war file. Using this simple method, the proxy servlet is installed within a Tomcat server running on the implementer's server and the location of the servlet is referenced in all html pages that contain an instance of the OntologyWidget. Our Web Serviceapplication programming interfaces (APIs) are available, should the implementer wish to create their own software to use our web services. While we do not allow unrestricted programmatic access to our implementation because our database houses a lot of other data, the code can be obtained in the SMD code release.
The SMD Ontology Service is implemented as a servlet and responds to eight requests: getOntologyList, getChildren, getParents, getTermsByName, getRootTerms, getTerm, getAncestors and getDescendents. A complete API description is available on our website. Each request is processed by parsing the arguments, performing the database query necessary to fulfill the function, converting the query result into an XML string, and returning this string as the reply to the request.
Since performance is a paramount issue for this application, the Ontology Service diverges from typical web services in that SOAP marshalling and unmarshalling software are not used. These systems, while generally applicable, typically incur a large amount of overhead and a performance cost. We discovered that our ontology schemas that flattened the data caused unreasonable performance response for the type-in suggestion box. The performance hit was particularly evident when we were displaying a partial ontology. All terms that were suggested matches had to fall within the subtree of the selected root term. To address this problem, we have implemented a data caching scheme for the SMD Ontology Service on the server side. We created a data structure that holds all the paths from an ontology's root term to all its child terms. We serialized the path data structure into a Binary Large Object (BLOB). When Tomcat starts, a context listener class is notified that reads the BLOBs from the database and holds the data in memory for rapid access. Thus, some queries to the SMD Ontology Service do not need to query the database directly, resulting in a significant performance improvement.
OntologyWidget has already been implemented by collaborators maintaining the NIH Neuroscience Microarray Consortium website . As implemented there, OntologyWidget is used to help researchers describe the anatomical regions used to generate samples from mammals.
Availability and requirements
Project name: The Ontology Widget;
Operating system: Platform independent;
License: MIT Open Source License;
CCB implemented the OntologyWidget, wrote the documentation, provided the example files and contributed to the writing of the manuscript. FW implemented the SMD Ontology Service, created the .war file and helped troubleshoot throughout the project. MN and ZKZ worked on the server side caching and optimization of the OntologyWidget performance. HJ created software that enters ontologies from the OBO site into SMD. JHPS provided use cases and evaluated several versions of the developing software. CAB and GS created the software specifications, created the testing document, tested software throughout the project and contributed to the writing of the manuscript and help document. All authors read and approved the final manuscript.
We would like to acknowledge our collaborators at the Neuroscience Microarray Consortium who provided advice and testing, especially Allen Day (UCLA), Steve Day (5 AM Solutions) and Brent Gendleman (5 AM Solutions). We also thank Isaac Kohane (Harvard Medical School an Children's Hospital), who initially suggested we embark on this software project. Lastly, we thank our co-workers Donald Maier for his critical review of the OntologyWidget software and documentation and Janos Demeter for his thoughtful editing of the manuscript. This work was supported by a subcontract to P. Skene's grant (5U24NS051870-02) from NIH's National Institute of Neurological Disorders and Stroke.
Whetzel PL, Parkinson H, Causton HC, Fan L, Fostel J, Fragoso G, Game L, Heiskanen M, Morrison N, Rocca-Serra P, Sansone SA, Taylor C, White J, Stoeckert CJ Jr.: The MGED Ontology: a resource for semantics-based description of microarray experiments.