AGUIA is a front-end web application originally developed to manage clinical, demographic and biomolecular patient data collected during clinical trials at MD Anderson Cancer Center. The diversity of methods involved in patient screening and sample processing generates a variety of data types that require a resource-oriented architecture to capture the associations between the heterogeneous data elements. AGUIA uses a semantic web formalism, resource description framework (RDF), and a bottom-up design of knowledge bases that employ the S3DB tool as the starting point for the client's interface assembly.
The data web service, S3DB, meets the necessary requirements of generating the RDF and of explicitly distinguishing the description of the domain from its instantiation, while allowing for continuous editing of both. Furthermore, it uses an HTTP-REST protocol, has a SPARQL endpoint, and has open source availability in the public domain, which facilitates the development and dissemination of this application. However, S3DB alone does not address the issue of representing content in a form that makes sense for domain experts.
The heterogeneity of data produced by biomedical research creates a serious challenge to the interoperability and consistent aggregation of data , which renders the development and maintainance of web applications correspondingly more time consuming and resource intensive . This reinforces a preference for front-end applications that are automated and web-based as much as possible. The semantic web resource description framework (RDF) offers particular advantages in this regard, as its data structure can contain a combination of domain ontology components as well as the graphic rules ontology. The domain ontology predates the work described here in the sense that it was previously identified and is regularly and independently changed by the domain experts. The regular changes in the domain ontology typically reflect new data sources, but may also correspond to a novel understanding of an old relationship between data elements. We proposed such an ontology incubation process in 2006  and a formal model in 2010 ; an example of its application to lung cancer research illustrates the use of the simple sloppy semantic database (S3DB) tool to identify and maintain the correspondingly fluid RDF stores . By contrast, the AGUIA ontology described in this report is fixed: it is designed to mediate the automated presentation of the fluid domain ontology and is therefore independent from the domain ontology. The instantiation of the AGUIA ontology involves the specification of the links between the graphical user interface (GUI) ontology and the domain ontology. Where this association is not specified, AGUIA will use default values to produce a default and a rather dull interface that will respond to the topology of the domain ontology. The dependency between S3DB collections and items (tabs), and rules and statements (rows) for the domain ontology is graphically captured as tabbed navigation. Therefore, the context for the automated interface assembly work described in this report is that of accounting for user bias/context in the representation of a knowledge base. In other words, the domain ontology serves as an underlying structure to describe the data content of the web application that describes the concepts and their relationships in the specified domain. This ontology was created to store the data from clinical trials for gastrointestinal cancer. The graphic model is based on graphic rules that describe the graphic structure of the web application. This model contains the graphic components of the web application and their association with the domain ontology. The web application is able to assemble a GUI through this RDF data structure. The supplementary material contains more details about the structure of data entry and the construction/instantiation of each component of the model. It includes a video demonstrating how the same AGUIA application is alternatively pointed to different S3DB deployments (see Materials).
The web application AGUIA was developed to manage clinical data through clinical trials semantic data services. Semantic data services are data objects with domain-specific semantics and technology standards that are used to provide secure real-time access to existing data sources. In addition to providing the context for sharing information based on program needs, semantic data services also support the dispersed data ownership requirement that generally exists for these programs. In a nutshell, semantic data services allow for the discovery and management of semantic relationships across information systems in a timely manner and on a large scale . In this context, the clinical trials semantic data services (CTSDS) developed within S3DB deployments are semantic data services (and include a SPARQL endpoint) directed toward clinical trials. Through the CTSDS, it is possible to capture common characteristics of a particular group of patients and generate results that can best be examined by the physician. For example, it is possible to capture the data for all patients who smoke and have the same tumor type, such as a grade 2, moderately differentiated tumor.
The goal of developing a specification and standard that autonomously assembles a graphical user interface for a data source has a tradition in web technologies that can be traced back as far as hypertext in the use of HTML elements to manipulate the web browser's document object model.
The emergence of semantic web technologies has not entirely overlooked the assembly of user interfaces. This is particularly clear in RDF elements such as rdfs:label or rdfs:comment . These elements anticipate the need for interface descriptors by other semantic models. Upon close scrutiny of the specifications, one finds that such elements are indeed widely used. For example, in W3C's simple knowledge organization systems (SKOS), the definition of the relationship skos:broader  comes with the indication that its rdfs:label is "has broader." SKOS  is a simple RDF schema for knowledge organization systems (KOS) such as thesauri, classification schemes, subject heading systems and taxonomies within the framework of the semantic web. Like many other RDFS-Plus  modeling efforts, encoding this information in RDF is ultimately geared toward the facilitation of interoperability between computer applications by creating a data infrastructure ecosystem in which the semantics of data generation and usage are both explicit in the representation .
The initial motivation for the work described here was the development of a web application to manage clinical data within the Gastrointestinal Medical Oncology Department of MD Anderson Cancer Center. It was clear from the beginning of the project that the diversity of users/usages and the fluidity of the underlying data model were not compatible with a conventional web application with a static layout. This is because the underlying schema are not only continuously undergoing significant changes, but those changes are not always the same for all involved. Therefore, it was decided that the ideal interface would be an autonomous browser-based web application that would respond to one or more independent descriptions of the data schema and graphic model in order for it to be used in multiple projects/representations. To enable the automatic assembly of the web application, a set of RDF descriptors was created by re-sorting the knowledge model schema  of an S3DB database . The web application would then be able to use these descriptors together with the ontology that is the clinical data project, which is also stored on S3DB, to assembly itself. This rationale follows a proposal by other researchers that a knowledge base requires more than terminology and assertion components (TBox and ABox, respectively). Translated to correspond with a modern emphasis on semantic web formalisms, the RDF descriptors could also include a GBox component. Whereas ABox and TBox are terms coming from description logic , GBox was proposed by Motik  to describe how that information should be displayed.
In the work described here, instead of seeking to identify high-level graphic relationships to maximize reusability, we have taken the opposite route. First, a suite of graphical user interfaces was developed in response to specific requests from clinical researchers, and only then was an effort made to identify design patterns that could be captured as a specialized RDF protocol. The justification for this reverse approach is that starting from a graphic configuration that was already meaningful in relation to a specific domain will lead to a set of descriptors that are more easily recognized - and used - by those domain experts when they engage information management systems.
In spite of the domain-rooted approach followed at the beginning of this work, the clinical-trial-driven interfaces also represent a starting point for a more generic formalism-oriented approach that was originally proposed for intelligent information presentation systems (IIPS) . That early work found a natural extension in the emergence of extended markup languages and subsequently in semantic web formalisms, leading to efforts to model web site interfaces such asWebML  and OntoWebber . Furthermore, as noted by Lei et al. , those models could be enhanced by also considering key user interface issues such as page layouts and graphical user interfaces. AGUIA considers these interface issues, such that an interface generated by AGUIA should allow for multiple user-specific layouts.
Methods: browser-based application development
Testing and evaluation was performed by tracking usage and response times. Screencasts of typical usage were recorded and are provided with this report (see Results). Note that in the S3DB system (see Materials), data modeling is performed by the users themselves through the definition of S3DB rules.
Materials: semantic database web service
The S3DB, which we used as our database web service, is an infrastructure for distributed data servicing that relies on semantic web concepts for the bottom-up management of heterogeneous domains . It provides a bridge between a mass of structured data annotated by using personal ontologies and a globally referenceable semantic representation indexed to controlled vocabularies . The S3DB web service exposes its API through a read/write REST protocol, S3QL. representational state transfer (REST) is a coordinated set of architectural constraints that attempts to minimize latency and network communication, while at the same time maximizing the independence and scalability of component implementations. This is achieved by placing constraints on connector semantics, in contrast to other styles that focus on component semantics. REST enables the caching and reuse of interactions, dynamic substitutability of components, and processing of actions by intermediaries in order to meet the needs of an Internet-scale distributed hypermedia system . The S3DB database is also capable of producing its output in a variety of formats such as tabular text, XML and JSON, in addition to the generation of RDF in n3 or XML . The open source application is made publicly available for a variety of operating systems. The RDF language provides a simple and flexible way of representing knowledge by breaking data structures down into dyadic predicates (triples) . In other words, RDF is a general-purpose language for representing information in the web  and providing interoperability between otherwise incompatible domain models and formats. The RDF has an official query language created by W3C, the SPARQL. The S3DB is also able to receive SPARQL queries, which are then internally converted into S3QL , as recently illustrated for the cancer genome data .
Ultimately, the S3DB web service was developed to test the hypothesis that a user-editable schema and streamlined interoperability would facilitate the acquisition of biomedical data within the biomedical context and by the biomedical domain experts themselves .
The S3DB deployment, and by association its query language, S3QL, uses SQL to operate a regular relational database backbone. Most existing S3DB deployments use postgreSQL or mySQL. Other SQL databases that have been tested do not appear to pose a major limitation at that level. The performance is that of the supporting database, to which we add an overhead of migrating user permissions between entities of the S3DB data model. An online tool, available at http://s3db-operator.googlecode.com webcite, illustrates the inner workings of this last component.
The research prototype of this database is mature enough that a few deployments have been adopted by MD Anderson Cancer Center and are subject to the same strict security audits of any other research tool dealing with sensitive data: see "Internal Services" at http://bioinformatics.mdanderson.org/ webcite. Also a number of external, public tools use it to service large datasets such as those produced by the cancer genome atlas (TCGA). For an example, see the DNA copy number browser at http://cnviewer.googlecode.com/ webcite. Currently, the scalability of this web service is that of the relational backbone - which is indeed more stringent than, for example, a map-reduction store.
RDF protocol and list of graphic rules
The automatic assembly of the interface relies on two data sources, one being the target observational data and the other containing the graphic rules for the assembly of the interface. The graphic rules contain two parts: GUI actions and GUI rules (Figure 1), both of which are collections of an S3DB project. S3DB collections are described according to the RDF triple - subject-predicate-object - and S3DB rules are the subject or object of the RDF triple . The GUI actions collection contains the actions that can be created. These are described in Figure 2. The GUI rules collection contains the instantiation of each graphic component. This collection contains the rules domain, range, action, trigger and value.
Figure 1. Relationship between GUI rules and GUI actions. Illustration of the relationship between GUI rules and GUI actions to generate RDF triples: GUI rules are S3DB rules predicated by S3DB items of the S3DB collection of AGUIA actions.
• Domain will contain the ID of the S3DB rule that will command the action.
• Range will contain the ID of the rule that will receive the action.
• Action will contain the action inserted in the GUI actions collection.
• Trigger will contain the value that will be tested.
• Value will contain the value that will be put in the range.
These collections compose the RDF triple. The GUI actions collection provides the predicate of the RDF triple for the GUI rules collection (see Figure 1). The GUI rules collection then instantiates each graphic component. The subject of the RDF triple can be a domain or a domain plus a trigger (the value that will be tested in the object if the predicate is some test action). The predicate corresponds to the action generated by the GUI actions collection and the object is composed of the range or the range plus a value (a value that will be placed on the range). The supplementary material includes examples that explain each of these rules and also provides the file for the project of the graphic rules (GUI rules project) with the GUI rules collection and GUI actions collection that contain data to download. They can be imported to S3DB just as in any other S3DB project.
Specifically, the web application requires the definition of up to 7 input parameters, three of which are mandatory - two are the locations for each of the two web services and the third is for an authentication token. The other four parameters will narrow the definition of the data elements on which the interface focuses and extend the range of data sources by allowing multiple deployments and users to be used. The access parameterization is extensively detailed and discussed in the documentation provided at the open source project hosting of AGUIA (see Methods).
Both the data source and the graphic rules source are presently expected to be S3DB databases, accessed using a REST protocol, S3QL (see Materials). As discussed later, this reflects the current lack of standards for writing documents in the RDF more than it reflects a narrow focus of the S3DB prototype. The S3DB data acquisition effort described in this report was pursued in the Department of Gastrointestinal Medical Oncology at the University of Texas MD Anderson Cancer Center; hence, these data are of biomedical origin. Specifically, over a period of one year, clinical and biomedical researchers in this department submitted to an S3DB deployment a variety of clinical and biomolecular data totaling over 1 million independent S3DB "statements," which presently describe 1369 patients. These data were imported to S3DB through a script that receives the data in a tabular format. This script generates queries in S3QL, which transforms the data into S3DB RDF statements. Any S3DB project can be used to feed into the generic front-end AGUIA, although ideally AGUIA will also access a second S3DB project in which the AGUIA rules and actions are defined. The main operational advantage of relying on the S3DB web service is that the deletion, insertion and updating of data can be done through S3QL queries, whereas the reference SPARQL queries are limited to data retrieval. It is important to highlight that the data submission is completely decoupled from the configuration of the display. Consequently, it is easy to configure one or more alternative interfaces for the same data by targeting the RDF-based description of the domain (see Figure 3 for an example). More details about the data submission to S3DB can be found in the supplementary material. Figure 3 shows the ontology of the illustrative clinical trial project as a graph of user-submitted S3DB "rules" . The report we reference includes graphic representations of the same RDF set using different RDF browsers that are both academic and commercial. Note in this figure that the patient ID collection plays a key role in aggregating the other collections. The supplementary material includes additional documentation and an extended discussion about how AGUIA handles the "main collection" and how the user works with this collection when performing a search, insertion or update of data.
Figure 3. Ontology of clinical trials data project. User-defined ontology of the clinical trials data project in the Department of Gastrointestinal Medical Oncology, MD Anderson Cancer Center. The user describes the ontology by defining "rules," which in turn are instantiated by "statements." The oval node represents the root of the ontology; the red nodes represent rule elements that are collections of items; the green nodes represent rule objects that take literal values.
The AGUIA can be pointed to the URI of any of the red nodes in Figure 3 (S3DB collections) to start the process of assembling the automated graphical user interface. The graphic rules come from a second data source, which, unlike the first data source, comes from a project with a predefined ontology. More specifically, this second data source for the graphic clues is a distinct S3DB web service that instantiates a fixed set of S3DB rules (Figure 4). The RDF model proposed here was derived from those rules.
Figure 4. Graphic model of S3DB items11 × 5 S3DB rules. Graphic model as a set of 11 S3DB items (1-11) and a set of 5 S3DB rules (A-E). See Figure 2 for a translation into RDF schema by cross-tabulating the 5 GUI rules by the 11 GUI actions. The type of relationship for the GUI rules (S3DB items for the GUI actions -- GBox; S3DB rules and S3DB collections -- TBox; and S3DB values -- ABox) identifies the type of data that instantiate each of the 16 parameters of the model.
The graphic rules project (Figure 4 and Figure 2) is populated with statements that specify how to assemble the web application. In other words, by submitting statements that instantiate those rules, the user is configuring the graphical user interface as modular components. The graphic model described in Figure 4 corresponds exactly to the set of S3DB rules linking two object collections - GUI rules and GUI actions - with, respectively, 5 and 11 literal values. These rules (the 14 types of values) can be instantiated as many times as is required, and in any way needed to produce the desirable graphical appearance. This figure shows 3 parts of the knowledge base (KB): the GBox, TBox and ABox. The AGUIA GBox contains only 11 graphic components that can be used alone to compose the layout of a web application. Conventionally, the TBox indicates the terminological component, which in the AGUIA context has a domain and range that are S3DB rules, which themselves point to S3DB collections and/or S3DB rules. Finally, the ABox indicates the assertion component, which links to the context of AGUIA's GBox as values/literals that trigger AGUIA actions (Figure 4 and Figure 2).
As highlighted in the Introduction, the identification of this model was pursued by decomposing a diversity of layouts requested by clinical researchers such that any of the layouts requested can be automatically produced by AGUIA from instantiations of this model. The translation into the RDF schema of the relationships described by the 15 S3DB rules represented in Figure 4 is the essence of the modeling work detailed in Figure 2. In this figure each RDF row represents in RDF language the instantiation of each graphic component. For example, the date of birth is always a date, therefore the specification of the nature of the data can trigger its visualization. The supplementary material also provides examples of the instantiation of each graphic component in turtle language.
The goal of the work described here was to develop a front-end web application that may be reused for different projects with different needs. The use of the AGUIA web application was illustrated in two projects with widely varying needs: a project that contains clinical/molecular data collected from patients with gastrointestinal cancer, and a workshop project that contains data collected from participants in a workshop, including their home institution, research interests and personal data. The former test case was used to produce the illustrations in this report; the latter was used for illustration in the screencast video (see the supplementary material).
Figure 5. Automated GUI assembly process. Automated GUI assembly process: A) individual queries converted into independent URL calls in S3QL or SPARQL format; B) decomposition of the elements of the query that pertain to the domain description and to the graphic layout; C) translation of those query results into an AGUIA graphic document model (using notation 3: Project Collection Item; Rule Statement); D) merging of the two data structures back into the GUI object model, which is reflected back to the browser's own object model (DOM), as described in Figure 2.
Note that the absence of a graphical annotation (GBox) will not prevent the application from assembling the graphical interface automatically. The sole difference is that its structure will be based on only the structure of the relationships between the data elements. Therefore, the graphic annotations provided through the GUI rules and actions can also be thought of as a way to further direct the GUI assembly process. After it is assembled, the AGUIA is able to search, insert and update registres through S3QL and SPARQL queries that are made behind the scenes by GET HTTP calls. The supplementary material includes explanations and screencast illustrations of how to search, insert and update data using AGUIA.
The AGUIA web application is currently used by distinct groups of users in the Departments of Biostatistics, Bioinformatics and Gastrointestinal Medical Oncology at MD Anderson Cancer Center. Figure 6 depicts a snapshot of AGUIA when it is pointed to the gastric oncology clinical trial research database of the Gastrointestinal Medical Oncology Department. Both goals highlighted in the Background section for this application - the autonomous assembly of the user interface in the browser in response to the invocation of an independent GBox descriptor - were fully achieved. Consequently, the application was shown to accommodate the characteristically volatile schemas and to automatically assemble new interfaces in response to changes in the data model, without the need for additional coding. As noted in the Background, data schemas were observed to change to reflect both new research data and new researchers.
Figure 6. Snapshot AGUIA. Snapshot of the logon screen when AGUIA is pointed to GI clinical trials S3DB web service, followed by the start page assembled by AGUIA when configured to target resource "Patient ID." Selecting another resource, say "Tissue," will produce the exact same graphic interface as if the initial target had been that resource: the operation of AGUIA is fully based on representation state transfer (REST) calls. Part A shows sub-tabs; the action used to produce this field is "hasSubCollections." Part B shows a text field; the action used to produce this field is "hasText." Part C shows a date field; the action used to produce this field is "hasDate." Part D shows a formula field; the action used to produce this field is "hasFormula." Part E shows inactive fields; the action used to produce these fields is "hasFieldInactive." Part F shows options fields; the action used to produce these fields is "hasOptions."
The user can test AGUIA to this database through a public login (url:http://ibl.mdanderson.org/edu webcite username and password: public). This demonstration project describes fictitious patients but the data structure is exactly the same as that in use for gastric cancer clinical trials. The video in the supplementary material describes the login in this project and allows the viewer to see the assembly of the web application. By comparing Figure 6 with the ontology of the data store (Figure 3), it is apparent that a substantial number of GUI rules and GUI actions must be in use for this clinical data service. The list of S3DB rules (RDF describing the clinical domain) and the list of GUI actions and rules that modulates the assembly of the graphical user interface are provided with the supplementary material, as part of the documentation of AGUIA at the open source repository. This material includes a file in turtle/n3 format that provides the graphic rules and an exhaustive discussion of how each GUI action is interpreted by AGUIA. Examples of its use in the gastric oncology case study are also provided as supplementary material at the same URL.
The relevance of the assembly of the interfaces illustrated in Figure 6 in response to REST calls is that the navigation of the interface itself is a succession of REST calls. For example, the opening page was triggered by the following URL being sent to the web browser:
The test/evaluation of the illustrative case study involved 1369 patients, which corresponded to approximately 1 million independent S3DB "statements" , as highlighted in the RDF protocol and list of graphic rules sub-section. The test/evaluation was divided into the following steps: application assembly, search, and view, insertion and update of data.
- Application assembly
The illustrative application described in this report is assembled in about 35 seconds if all graphic components are used, and in about 29 seconds if only one of them is invoked. In either case the data store has to be reached and the GBox retrieved and processed, which accounts for most of the assembly time. After the application has been completely assembled, all subsequent actions involving rearrangements of components of the interface become nearly instantaneous because all graphic components are contained in the memory and no further consultation of the store's GBox descriptors is needed. For all practical effects, the browser-based application is equivalent to a stand-alone application except for the important fact that no "download" and "installation" steps are needed.
The search operation of the application can be simple when it involves only one node (RDF resource) and complex when it involves more than one node. In either case the search operation consists of translating the use of graphic elements assembled as unstructured by the GBox into SPARQL queries, which are then issued back to the data store. For an example query and the screencast showing the graphic operations involved in generating them, see the supplementary material at [http://sites.google.com/site/aguiadocumentation/documentation/how-to-search webcite]. Both the single node and multiple node scenarios are analyzed hereafter by considering three scenarios of increasing complexity.
First case: Search involving only one node (one level)
In this case the search is realized directly (one level), for example, the search by participant number of all patients of the Patient ID collection (Figure 7).
Figure 7. One node search. Search involving only one node.
Second case: Search involving one parent node and one child node (two levels)
In this case the search is performed at two stages (see Figure 8). In the first level the query contains the parent node (PatientID) and the second level contains only one child node (Demographics). An example of this type of search is a search by participant number and name of all the patients in the Patient ID and Demographics collections, respectively.
Figure 8. Two nodes search. Search involving one parent node and one child node.
Third case: Search involving one parent node and two child nodes (two levels)
In this case the search is again performed at two stages (see Figure 9). For example, a search by participant number, gender and tumor grade of all patients, seeking to identify those of female gender with a G2 moderately differentiated tumor, would involve the Patient ID, Demographics and Histology collections, respectively.
Figure 9. Three nodes search. Search involving one parent node and two child nodes.
- Execution time
The three preceding scenarios were timed to assess the performance of the application. Figure 10 illustrates the length of time required to realize a search in each one of the cases.
Figure 10. Search time. Search times for the three cases.
The results depicted in Figure 10 show that each child node added 10 seconds to the search time. The analysis of the code execution shows that this corresponds to the addition of each node to the SPARQL query, suggesting linear scaling for the query performance.
- View, insertion and update
The illustrative web application was configured to view, insert and update data one patient at a time. This takes place in one of two possible ways: View, insert and update patient data to only one node (only one collection on S3DB); or view, insert and update all patient data to all nodes (all collections on S3DB). Figure 11 describes these two scenarios, including the time associated with each operation in the context of the illustrative clinical trial application and a data store of 1 million statements. As reflected by those values, the process of data store indexing is the point at which semantic databases are comparatively slow.
Figure 11. Execution time to realize view, insert and update. Time required to realize each one of the operations (view, insert and update), with one or all nodes.
The performance and operation of the prototype can be seen in a screencast video in the supplementary material (http://sites.google.com/site/aguiadocumentation/video webcite). This video shows a typical usage of the AGUIA interface.
The knowledge base  defined by RDF does not, per se, make distinctions between assertions (ABox) and terminology (TBox) components. In other words, RDF does not by itself differentiate between what is a domain and what is an instantiation of knowledge. This is one of the three reasons why we resorted to S3DB as a mediator  in the creation and management of the RDF. The distinction between TBox and ABox triples corresponds, respectively, to S3DB rules and S3DB statements [5,25]. The second reason for resorting to this data service application for the study described here is the convenience of its REST API, which bodes well with the intent to produce an application that is native to the web-browser environment and which therefore can be easily used by a wide variety of domain experts in a number of computational environments. The last reason is that this work is precisely configured to test the hypothesis than an editable schema and streamlined interoperability do make a difference .
The main limitation of the study is the restriction of only accepting an RDF that already distinguishes between the ABox and TBox. Furthermore, and with regard to the AGUIA ontology itself, the user cannot assert types of actions beyond the 11 that are listed in Figure 2, even if it is clear that a higher level abstraction would make a more generic use of the browser's document object model (DOM) native methods. The strengths of AGUIA when compared to other applications, such as OntoWebber  and WebML , include AGUIA's ability to assemble multiple layouts to reflect the multiple contexts of various users. In AGUIA, the user interface is more than a presentation of the page resources, it specifically responds to the semantics behind the information that is represented. As a consequence, when compared to an IIPS study , for example, an AGUIA browser-based application is not just more nimble but it is also simpler to use. When using IIPS, the user has a set of tools and needs to know how to use each one to get a satisfactory result in the assembly site. We argue that the literal and semantic attributes of the data can automatically drive that selection.
The AGUIA web application described in this report works with the RDF produced by the S3DB web service. Furthermore, the AGUIA graphic model, composed of combinations of GUI actions and GUI rules (Figure 4), is itself described by the RDF schema (Figure 2). Therefore, the only fundamental barrier to its application in the much wider RDF world in general is the identification of procedures that automatically annotate knowledge bases represented in that format so that they recognize its terminology and assertion components. It could be argued that RDF representations are inherently assertive , which poses a fundamental obstacle to its use in the elaboration of terminology. However, it is also clear that the view of the semantic web as a "web of linked data"  is particularly conducive to the universal aggregation required by the systems nature of both biological processes and biomedical infrastructures. Our experience with the GI clinical trials initiative described herein and also with earlier work with lung cancer research data  is that the accumulation of diverse data sources renders the emergence of comprehensive terminology irresistible.
It is therefore the expectation of the authors that in a domain as fluid as the life sciences in general and molecular biology in particular , data processing applications will have to include an iterative tool that will allow domain experts to experiment with the annotation while inspecting a self-assembled graphical user interface. If that direction is to be taken, then the graphic model we have described could be conceived as an integral component of the knowledge base - a GBox . Motik proposed "GBox" to indicate the graphic specification aspect of a knowledge base, as added to the conventional ABox and TBox distinctions. Accordingly, the layout of a GUI web application is conceivably totally describable as a formal GBox component of a knowledge base. The GBox, as shown in Figure 4, was reduced to 11 or fewer components. AGUIA demonstrates how these components alone can determine how the web application was assembled (Figure 6). Such a three-component paradigm would argue that the distinction between terminology and assertion will both have an effect and be a function of the graphic presentation. It would also follow that because different domain experts request different graphic representations (GBoxes) of the knowledge base, they may also be indirectly stating that they place the boundaries between the ABox and TBox differently.
The AGUIA application allows the clinical/domain user's requests and their replies to be automatically translated into graphic interfaces. The main challenge now is to automate as much as possible the processing of RDF "spaghetti" in order to distinguish domain from instantiation such that the automated interface assembly may respond sensibly. We are aware that this is a major challenge since the Web and the content it hosts is assertive (ABox). Accordingly, the main contribution of AGUIA will be to assist in isolating the automatic generation of a TBox that can be interactively checked by domain experts precisely because its automated graphic rendering is in place. The rendering process itself is greatly facilitated by recent improvements in the browser's graphic capabilities, particularly after the introduction of HTML5 and XForms (W3C), and by the open source libraries for user interface assembly that make use of them such as Orbeon . In summary, AGUIA provides a formal, automatable bridge between RDF documents and the browser's DOM-centric extensible syntax.
By developing the autonomously assembled interface applications in response to specific requests from various users who were interacting with a multiplicity of domains and platforms while working with gastrointestinal clinical trials, a number of conclusions became apparent. In regard to the identification of user-friendly, domain-aware interfaces, it appears that it is more effective to develop graphic annotations before settling for a rigid distinction between assertion and terminology, which is in contrast to the more conventional approach to ontology modeling. In regard to the challenge of deploying the applications themselves, it became apparent that modularizing the interface assembly using REST protocols is particularly effective because it does not require a distinction between the universal resource identifiers (URI) that target data elements and those that configure the assembly of the graphical user interface. In conclusion, the long standing artificial intelligence (AI) challenge of contextually aware interfacing appears to benefit from the same RDF-based collaborative annotation that is behind The Web of Linked Data. The data-driven user annotation of graphic rules (GBox) was observed to benefit the automation of the graphic interfaces used to interact with those same data elements.
AGUIA: Autonomous Graphical User Interface Assembly; RDF: Resource Description Framework; S3DB: Simple Sloppy Semantic Database; HTML: HyperText Markup Language; DHTML: Dynamic HyperText Markup Language; DOM: Document Object Model; CSS: Cascading Style Sheets; PHP: Hypertext PreProcessor; REST: Representational State Transfer; S3QL: Simple Sloppy Semantic Query Language; URI: Universal Resource Identifiers; GUI: Graphical User Interface; TBox: Terminological Component; ABox: Assertion Component; API: Application Programming Interface; GBox: Graphic Component. CTSDS: Clinical Trials Semantic Data Services; W3C: World Wide Web Consortium.
The authors declare that they have no competing interests.
MCC participated in the design and developed the application, identified the data model, and drafted the manuscript; HFD participated in modeling the database; HFD assisted with the S3DB web service; YA, SVP and JAA provided clinical data and domain expertise; ATV and JSA supervised the project and improved the manuscript. All authors read and approved the final manuscript.
MCC acknowledges support by the CAPES Foundation (Brazil) award CAPES/LNCC 31036015, CNPQ and FAPERJ. This work was also supported in part by the Center for Clinical and Translational Sciences under contract 1UL1RR024148 from the NIH (CTSA).
Deus HF, Stanislaus R, Veiga DF, Behrens C, Wistuba II, Minna JD, Garner HR, Swisher SG, Roth JA, Correa AM, Broom B, Coombes K, Chang A, Vogel LH, Almeida JS: A Semantic Web Management Model for Integrative Biomedical Informatics.
Semantic Data Services for: Enterprise Data Interoperability [http://xml.gov/documents/completed/metamatrix/semanticdataservices.htm] webcite
SKOS Simple Knowledge Organization System Reference [http://www.w3.org/TR/2009/REC-skos-reference-20090818/] webcite
24th IEEE International Conference on Advanced Information Networking and Applications 2010, 2010:2-3. Publisher Full Text
Principled Design of the Modern Web Architecture [http://www.ics.uci.edu/~taylor/documents/2002-REST-TOIT.pdf] webcite
The pre-publication history for this paper can be accessed here: