Towards the integration of mouse databases - definition and implementation of solutions to two use-cases in mouse functional genomics

Gruenberger, Michael; Alberts, Rudi; Smedley, Damian; Swertz, Morris; Schofield, Paul; Schughart, Klaus

doi:10.1186/1756-0500-3-16

Research article
Open access
Published: 22 January 2010

Michael Gruenberger¹,
Rudi Alberts²,
Damian Smedley³,
Morris Swertz⁴,
Paul Schofield¹,
The CASIMIR consortium &
…
Klaus Schughart²

BMC Research Notes volume 3, Article number: 16 (2010)

Abstract

Background

The integration of information present in many disparate biological databases represents a major challenge in biomedical research. To define the problems and needs, and to explore strategies for database integration in mouse functional genomics, we consulted the biologist user community and implemented solutions to two user-defined use-cases.

Results

We organised workshops and meetings and used a questionnaire to identify the needs of biologist database users in mouse functional genomics. As a result, two use-cases were developed that can be used to drive future designs or extensions of mouse databases. Here, we present the use-cases and describe some initial computational solutions for them. The application for the gene-centric use-case, "MUSIG-Gen", starts from a list of gene names and collects a wide range of data types from several distributed databases in a "shopping cart"-like manner. The iterative user-driven approach is a response to strongly articulated requests from users, especially those without computational biology backgrounds. The application for the phenotype-centric use-case, "MUSIG-Phen", is based on a similar concept: starting from phenotype descriptions, it retrieves information for associated genes.

Conclusion

The use-cases and their prototype software implementations should help to better define biologists' needs for database integration and may serve as a starting point for future bioinformatics solutions aimed at end-user biologists.

Background

At present, we are just beginning to appreciate the complexity of genotype-phenotype association in humans, but more detailed and comprehensive analyses in basic research are urgently needed. Although studies in humans are important, they are limited because of the size of cohorts, strong but often unknown environmental influences, poor and inconsistently coded diagnosis, and lack of repeatability. Therefore, animal models are absolutely essential to complement human studies; they allow the investigation of underlying biological mechanisms in well-controlled experimental systems.

In particular, the mouse is an ideal model system for studying genetic factors that contribute to diseases because genetic reference populations (GRPs) with a large number of allelic variants in many genes, combinations thereof, and many knock-out mouse lines with deletions in single genes are available [1]. Research on mouse model systems has generated valuable discoveries for our understanding of the biological mechanisms of the normal function of the immune system as well as immune abnormalities, cardiovascular diseases, cancer, and infectious diseases [2].

Consequently, funding agencies around the world have supported an increasing number of functional genomics projects focused on the use of the laboratory mouse as a model for human disease. The results obtained have been collected in various databases. However, in most cases, these databases represent single project outputs and are maintained at different sites. Exceptions are, for example, the Mouse Genome Database (MGD) of MGI [3], the Mouse Phenome Database (MPD) [4], EuroPhenome [5] and the GeneNetwork database [6], which have collected information from many different sources. MGD has been optimized for researchers in the field of mouse functional genetics and genomics. It is constantly updated and manually curated and thus contains information of extremely high quality. Similarly, the GeneNetwork database contains phenotype and genotype information on mouse GRPs from the literature and directly entered source data, as well as tools to map quantitative trait loci. Both databases are extensively linked to other informatics resources.

However, a large volume of data that is important for functional genomics studies resides in distributed databases and is not contained in MGI (Mouse Genome Informatics) or GeneNetwork (see the Mouse Resource Browser, MRB [7]). Ad-hoc integration of these databases is very difficult. Many databases require a separate login procedure and need to be accessed using different methods (e.g. via a website, downloadable files or web services). Several resources do not adopt common standards, such as using the same identifier for a given gene or protein [8]. In this case, users may need to convert their gene identifiers to whatever the particular resource understands, e.g. MGI or Ensembl/mouse IDs, before starting a search.

As a first step towards new concepts for database integration, we have established a network of scientists from Europe, North America, Japan and Australia. The network is funded as a Coordination Action by the European Commission and called CASIMIR (Coordination and Sustainability of International Mouse Informatics Resources) [9]. The Coordination Action is aimed at recommending standards to allow data sharing and integration between different projects.

Much can already be achieved using query tools that ease selection and joining of distributed data, such as BioMart [10], and/or workflow tools that support stepwise data retrieval, conversion and integration, such as Taverna [11] and Galaxy [12]. A prerequisite is that sources provide programmatic interfaces for queries or workflow tools that can be used to access or import the original data. However, such interfaces are often not available. This challenge was addressed by Smedley et al. who federated BioMart and MOLGENIS [13, 14] in a Taverna workflow [15]. But these solutions are still too involved for many bench biologists to use directly for their research. Task-oriented user interfaces are needed on top of all these tools to more closely support biologists in their integrative analyses.

In order to gather the perspective of the end-users, the biologists who will perform the actual data mining, we designed use-cases together with biologists. Subsequently, two software implementations were developed on the basis of these use-cases to provide tools which could carry out the tasks requested by the users in the most practical format. Here, we describe two use-cases that arose as a result of our discussions with biologist-users during workshops, meetings and via a questionnaire. Furthermore, we demonstrate the first steps towards their implementation.

Methods and Results

Definition of the use-cases

During the first sessions with different user groups, some principal needs for data mining became apparent. These needs were further confirmed in subsequent meetings and demonstrations of development steps to biologist users. A user-friendly interface should not only query multiple databases but also allow for multiple search terms, allow iterative interactions, and contain a tool that allows storage of the results. Furthermore, most of the currently performed data mining in functional mouse genomics concerns genes, their functions and variants on one side, and phenotype descriptions on the other side. Based on these discussions, we designed two generic use-cases that should be suitable to a larger scientific community: a gene-centric and a phenotype-centric use-case.

Gene-centric use-case

The advent of high-throughput technologies in biology, such as gene expression microarrays, makes it now possible to identify, with the help of statistical and bioinformatics tools, large groups of candidate genes changing their expression levels in different experimental conditions. However, of the genes identified in this way, usually a few hundred, only a limited number of genes (in the order of 20-50) can feasibly be studied experimentally in the laboratory. Therefore, researchers prioritize the gene lists based on their own knowledge, literature, and additional information from many different web accessible databases, such as gene and protein descriptions, genetic diversity information, expression patterns in different tissues, etc. Since the searching of all these web databases by hand is very laborious and time-consuming, our user groups decided to describe a gene-centric use-case starting with an input of a limited number of gene names and aiming to facilitate easy and automatic collection of information about these genes from different sources. This process should be performed in an interactive fashion and allow storage and export of the results obtained.

An iterative user-driven strategy was developed based on the principles of an "online shop" (Fig. 1). Here, a customer can perform searches on the available data and collect them in a shopping cart. By performing additional searches for other data and by evaluating additional information on them, the customer can then decide to add or remove articles from his cart. Finally the collected articles are "exported" by executing an order.

Following the above principle, the integration of mouse databases via a gene-centric use-case should allow candidate gene symbols to be entered into a query form which then automatically collects basic information like synonyms, gene IDs, descriptions and genome locations for the entries (Fig. 1). Based on this information the user will then be able to refine the gene hit list by selecting the interesting genes and removing false hits. The final list will then be saved as a 'shopping cart' which can be revisited, modified, refined or extended. Finally, it should be possible to export the gene list in Excel-readable CSV format (Fig. 1).
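The cart behaviour described above (add, remove, revisit, export) can be sketched in a few lines. This is a minimal illustration in Python, not the PHP code of the prototype; the class name `GeneCart` and its methods are hypothetical, and only the gene symbols and synonyms are taken from the text.

```python
import csv
import io


class GeneCart:
    """A named collection of gene records keyed by gene symbol (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.genes = {}  # symbol -> dict of annotation fields

    def add(self, symbol, **fields):
        # Adding the same symbol again merges new fields into the record.
        self.genes.setdefault(symbol, {}).update(fields)

    def remove(self, symbol):
        # Removing a gene that is not in the cart is a no-op.
        self.genes.pop(symbol, None)

    def to_csv(self):
        # Use the union of all field names so that sparse records still export.
        columns = sorted({key for rec in self.genes.values() for key in rec})
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(["symbol"] + columns)
        for symbol in sorted(self.genes):
            record = self.genes[symbol]
            writer.writerow([symbol] + [record.get(col, "") for col in columns])
        return buffer.getvalue()


cart = GeneCart("chemokines")
cart.add("Ccl5", synonym="RANTES")
cart.add("Cxcl10", synonym="IP-10")
cart.add("Ccl3", synonym="MIP1a")
cart.remove("Ccl3")  # the user decides this hit is not of interest
csv_text = cart.to_csv()
```

Keeping the cart as a plain mapping from symbol to fields means that data loaded later from other sources can be merged into existing rows before export.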

A difficulty often encountered when performing analyses on genes is that they have several synonyms and that in many scientific publications the systematic gene nomenclature is not followed (see [16]). Examples are RANTES (correct gene symbol Ccl5), MIP1a (Ccl3) and IP-10 (Cxcl10). For other genes, the researcher may not know that they represent members of large gene families, and one has to choose one or all to proceed with the analysis. Examples are Hox, Fgf, Inhibin, and interferon genes. Here, we consider as the "correct gene name" the name which is given by the international nomenclature committees: Mouse (International Committee on Standardized Genetic Nomenclature for Mice [17]), human (HUGO Gene Nomenclature Committee [18]), and rat (Rat Gene Nomenclature Committee [19]).

It is thus important that the use-case allows entering any gene name, synonyms, incomplete names, etc., but still makes sure that the correct genes will be found. For this, entries will be searched in a first step against the MGI database for disambiguation [20]. For each gene name multiple hits may appear and the user is then able to select the correct ones and add them to the cart.
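The disambiguation step can be pictured as a lookup that accepts synonyms and incomplete names and returns candidate official symbols for the user to confirm. The sketch below assumes a small in-memory table in place of the MGI query; `NOMENCLATURE` and `find_candidates` are hypothetical names, and the three symbol/synonym pairs are the examples given in the text.

```python
# Hypothetical nomenclature table: official symbol -> known synonyms.
# In the described system this information comes from MGI, not a local dict.
NOMENCLATURE = {
    "Ccl5": ["RANTES"],
    "Ccl3": ["MIP1a"],
    "Cxcl10": ["IP-10"],
}


def find_candidates(query, nomenclature=NOMENCLATURE):
    """Return official symbols whose symbol or synonyms contain the query."""
    needle = query.strip().lower()
    if not needle:
        return []  # an empty query would otherwise match every gene
    hits = []
    for symbol, synonyms in nomenclature.items():
        names = [symbol] + synonyms
        # Case-insensitive substring match supports incomplete names.
        if any(needle in name.lower() for name in names):
            hits.append(symbol)
    return sorted(hits)


rantes_hits = find_candidates("rantes")
family_hits = find_candidates("Ccl")
```

A partial name such as "Ccl" returns several candidates, which is the case where the user selects the intended genes before adding them to the cart.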

In a second step, it is possible to collect additional information from different databases for the genes in the cart list. Examples of databases are MGI and ENSEMBL/mouse for information on gene structure and links to other resources; Eurexpress [21], SymAtlas [22] and ArrayExpress [23] for gene expression information; and INTACT [24] for gene interaction data. After retrieval of this information the user may refine his gene list in a given cart by searching for other genes or deleting genes in the current list.

The list of collected genes in a shopping cart can then be used to perform meta-analyses. For example, an analysis of GO-terms will allow finding out if certain GO-categories are over-represented in the particular gene list, indicating that the genes may belong to a specific pathway or biological process. Similarly, an analysis of expression patterns may reveal if there is a certain tissue in which the genes from the list are preferentially expressed.

At present, only a few of the existing databases offer some of the above-described functionalities, the most comprehensive one being MGI. Thus far, only BioMart represents an initiative which aims to allow the user to design queries on information from otherwise disparate databases. BioMart also allows refining searches and filtering out relevant information. However, BioMart is currently aimed at the advanced and trained user and is not yet designed for simple querying and collection of results in a shopping cart to which new genes and information can be added.

Phenotype-centric use-case

A second use-case was defined through the interaction with the user groups. It should allow researchers to begin their search with a phenotype description (Fig. 2). In this use-case, the scientist will search a phenotype ontology, obtain the closest hits and then decide which terms should be used in the following query. The use-case should also allow browsing of the phenotype ontology and the selection of terms of interest. The result of the searches for phenotype descriptions should then link to the associated genes.

At present, the most extensive and well structured phenotype ontology for the mouse is the Mammalian Phenotype (MP) ontology [25], accessible at MGI. MP is therefore used as a first standard which will allow querying MGI but also other databases that are using MP terms for phenotype descriptions, like EuroPhenome [26].

In the future, cross-referencing mouse MP terms with ontologies that describe diseases (such as the Disease Ontology - DO [27]) and phenotypes in humans (such as the Human Phenotype Ontology HPO [28] and Mouse Pathology Ontology MPATH [29]) should allow users to make cross-species searches by starting from phenotype descriptions. This will be particularly useful for human clinician researchers who are not familiar with mouse databases but who would like to know if there is a mouse model available for a given human disease.

The results from the phenotype-driven searches should then be linked to gene names associated with a given phenotype. These genes are presented as a list from which the user can choose the genes of interest and save them in a shopping cart. It is then possible to feed the genes into the gene-centric use-case and perform a more detailed data mining or meta-analysis.

The description and further development of the phenotype-driven use-case may represent a very useful concept for scientists and clinicians outside the mouse community. For example, the Human Phenotype Ontology HPO is based on OMIM [30], and a search may be generated using HPO as a starting point to retrieve disease IDs from OMIM which can then be linked to gene symbols. The Drosophila phenotype ontology [31] developed by the Flybase group could be used to retrieve gene symbols and thereby gene function information from Flybase [32]. Or the C. elegans phenotype ontology [33] could be used to retrieve gene symbols from Wormbase [34]. Gene symbols retrieved from these databases could then be stored in a shopping cart.

Implementation of the use-cases: MUSIG-Gen and MUSIG-Phen

Web services for database integration

A prerequisite for computer-supported data integration is programmatic access to select and retrieve data from distributed resources. As described by [15] there are several possible technical solutions to integrate data from different mouse informatics databases. The "CASIMIR strategy" is based on semantic standardization or wrapping of information transferred by web services. Currently the most popular implementations of web services use the SOAP/WSDL or the XML-REST protocols. The advantages of opening APIs and transferring information using XML schemas are discussed in [15].

For EuroPhenome and MUGEN [35], SOAP/WSDL web services were available which could be used for MUSIG-Phen, and we set up a BioMart web service for part of the MGI data. Other resources, such as the Ontology Lookup Service (OLS [36]) for ontology data and INTACT, already had web services.

Users may want to integrate their local database or other databases. To demonstrate how this can be achieved, we generated web services for accessing GNF SymAtlas expression data. For this, we first saved the SymAtlas data locally. We then defined the Entrez Gene IDs as a common field which could be retrieved from the MGI BioMart and matched to the records in the local SymAtlas database. We then used MOLGENIS to create the relevant SOAP web services to retrieve the data from the local database, to subsequently load and display them in the shopping cart interface.
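Matching records through a shared identifier amounts to a join on that field. The following sketch shows the idea with in-memory rows; the function name, the field names (`entrez_id`, `tissue`, `level`) and all values are placeholders chosen for illustration and do not reflect the actual MGI BioMart or SymAtlas schemas.

```python
def join_on_entrez_id(mgi_rows, expression_rows):
    """Attach expression records to gene records that share an Entrez Gene ID."""
    # Index the local expression data by the common field for fast lookup.
    by_entrez = {}
    for row in expression_rows:
        by_entrez.setdefault(row["entrez_id"], []).append(row)
    joined = []
    for gene in mgi_rows:
        # Genes without a matching expression record are simply skipped.
        for expr in by_entrez.get(gene["entrez_id"], []):
            joined.append({**gene, **expr})
    return joined


# Placeholder rows; "E1", "E2", "E3" stand in for real Entrez Gene IDs.
mgi_rows = [
    {"symbol": "GeneA", "entrez_id": "E1"},
    {"symbol": "GeneB", "entrez_id": "E2"},
]
expression_rows = [
    {"entrez_id": "E1", "tissue": "liver", "level": 5.0},
    {"entrez_id": "E3", "tissue": "spleen", "level": 1.0},
]
joined = join_on_entrez_id(mgi_rows, expression_rows)
```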

Implementation of MUSIG-Gen

After having defined the use-cases we wanted to provide users and developers with a first implementation which may then be tested and further revised in the future. Thus, certain parts of the use-case scheme outlined in Fig. 1 were implemented in the application MUSIG-Gen http://www.casimir.org.uk/usecase1/. In the following, we describe this tool from the perspective of the scientific user.

Fig. 3 displays the entry form of MUSIG-Gen where the user can type in gene names or synonyms (example: synonyms for chemokines). The result of the subsequent search query shows a list of hits from the MGI database which contain the query name (Fig. 4) and, in the default setting, additional information for each gene, like gene symbol, full gene name, all synonyms, and chromosomal location. This information allows the user to decide which one of the hits in the list corresponds to the gene of interest. As shown for the inputs "RANTES" and "IP-10", the correct gene names are displayed together with the search term and all other synonyms. If, for example, "Fgf" is used as query, all Fgf gene family members are displayed. The user may now decide which members to follow further. The genes selected in this process via the check box may then be saved in a shopping cart.

The gene list can subsequently be retrieved from the cart (Fig. 5) and additional information added, for example MGI IDs. These are hyper-linked to the corresponding entry at MGI so that the user has access to all MGI information on this particular gene with a single mouse click. Similarly, information on gene expression can be retrieved from the SymAtlas database. This query creates a new column for all genes on the list, displaying the SymAtlas IDs. The ID is again hyper-linked to SymAtlas and the corresponding data can be visualized with one mouse click (Fig. 6). Also, a search for information on Single Nucleotide Polymorphisms (SNPs) has been implemented. This function queries the Ensembl database and is currently set to display SNPs which result in non-synonymous coding changes in the open reading frame of the genes, together with the SNP Variation ID and a link to the Ensembl page with more details (Fig. 6).

New genes can be easily added to an existing cart by calling up the entry form from within a cart and following the same procedure as described above.

Because the genes listed in a cart contain a correct and unique identifier (MGI and/or Ensembl IDs) they can be directly used to query other databases. Such features and searches could be easily added to the existing MUSIG-Gen application. Even more importantly, it is now possible to perform an analysis on the entire group of genes in the cart. In the current version of the use-case, we implemented a GO term count as a proof-of-concept for the user interface. GO terms can be associated with all genes of the list using the 'load more data' feature, and the representation of different GO-terms across the whole gene list can be displayed (Fig. 7). These analyses may be extended in the future to more sophisticated meta-analyses that also include statistical evaluations. Similarly, we added a tool to associate phenotype terms from the MP ontology and show their representation in the cart gene list.
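A GO term count of this kind reduces to tallying, for each term, the number of genes in the cart that carry it. The sketch below is an assumption about how such a count could be computed, not the prototype's code; the gene names and the term identifiers ("GO:A" and so on) are placeholders, not real GO IDs.

```python
from collections import Counter


def count_go_terms(gene_to_terms):
    """Count how many genes in the cart are annotated with each GO term."""
    counts = Counter()
    for terms in gene_to_terms.values():
        # A term listed twice for the same gene is counted once for that gene.
        counts.update(set(terms))
    return counts


# Placeholder annotations for three genes in a cart.
annotations = {
    "GeneA": ["GO:A", "GO:B"],
    "GeneB": ["GO:A"],
    "GeneC": ["GO:A", "GO:A", "GO:C"],
}
counts = count_go_terms(annotations)
top_term, top_count = counts.most_common(1)[0]
```

A raw count like this does not test for over-representation; that would require comparing against a background gene set, which is the statistical extension mentioned in the text.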

As a final step, we added an export function to the shopping cart which allows the user to export his data in CSV format and then perform highly customized analysis locally.

Technical aspects of the implementation of MUSIG-Gen

The application layer of the shopping cart was developed in PHP. PHP proved to be a good choice for the development of the user interfaces, but did create some problems for the development of the web service client scripts because of a lack of multi-threading. The latter makes it impossible to retrieve data from different web services at the same time. The major problem is that some web pages access multiple services and depending on the network speed and the kind of query some web services are slow to respond. This operation would thus stop the page from loading in the browser. We managed to mitigate this problem by creating an AJAX (Asynchronous JavaScript and XML) based loading system using the PHP PEAR AJAX [37] libraries. This system loads the main page first and then accesses each web service individually, thereby creating a more responsive system which lets the user interact with some data while the remainder of the data is still being retrieved.
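The prototype addressed slow services in the browser with AJAX; the sketch below is not that code. It illustrates the same principle, not blocking on the slowest service, using a Python thread pool as a swapped-in mechanism. The services are simulated with `time.sleep`, and `make_fake_service` and `load_all` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def make_fake_service(name, delay):
    """Build a stand-in for a remote web service call (no network access)."""
    def call(gene):
        time.sleep(delay)  # simulate network and query latency
        return {"service": name, "gene": gene}
    return call


def load_all(services, gene):
    """Call every service in parallel and collect results as they complete."""
    results = []
    with ThreadPoolExecutor(max_workers=len(services)) as pool:
        futures = [pool.submit(service, gene) for service in services]
        for future in as_completed(futures):
            # Each result becomes available without waiting for slower calls.
            results.append(future.result())
    return results


services = [
    make_fake_service("slow", 0.2),
    make_fake_service("fast", 0.01),
]
results = load_all(services, "Ccl5")
```

In a user interface, each completed result would be rendered as soon as it arrives, so the user can work with fast sources while slow ones are still loading.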

The shopping cart system uses a PostgreSQL database to store user data. The data stored comprises the user's personal data (which is integrated into our web site management system to allow for a single login system) as well as the data retrieved from the different web services. The system imposes no limits as to how many data fields or data values a user can download and store in his shopping carts.

The application initially retrieves gene nomenclature and genome location data based on gene symbol. By default, these data are loaded from our MGI BioMart http://www.casimir.org.uk/biomart/martview/. Other data from the MGI BioMart can also be loaded, such as MGI, Ensembl and EntrezGene IDs as well as GO and MP ontologies. The Ensembl BioMart can also be queried at this stage for Uniprot IDs. Both BioMarts are accessed using the default BioMart XML-REST services. For this, we developed and used a generic BioMart XML-REST PHP client class which can be used to query any BioMart.
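A BioMart XML-REST request is an XML document naming a dataset, the filters to apply and the attributes to return. The sketch below only builds such a document and performs no network call. It is written in Python rather than PHP and is not the client class described above. The dataset, filter and attribute names are placeholders, since the actual names in the MGI BioMart are not given here, and the exact set of `Query` attributes a given server expects is an assumption.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Build a BioMart-style XML query document as a string."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",   # tab-separated results
        "header": "0",
        "uniqueRows": "1",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


# Placeholder names: "example_gene_dataset", "gene_symbol", "chromosome".
xml_query = build_biomart_query(
    dataset="example_gene_dataset",
    filters={"gene_symbol": "Ccl5,Cxcl10"},
    attributes=["gene_symbol", "chromosome"],
)
```

The resulting string would typically be sent as the `query` parameter of an HTTP request to the mart's service endpoint; the real dataset and field names should be taken from the configuration of the mart being queried.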

Data may also be loaded from the Eurexpress BioMart or from the GNF and INTACT SOAP web services (using generic PHP SOAP libraries). There are also some fields which have the option of loading additional information, e.g. the GO and MP ID fields. The user can choose to load the ontology term names which are loaded from the OLS SOAP web service.

The source code and documentation for the MUSIG-Gen prototype may be downloaded from the following web server: http://www.casimir.org.uk/sourcecode/

Implementation of MUSIG-Phen

Based on the scheme outlined in Fig. 2, certain parts of the phenotype-centric use-case were implemented in the application MUSIG-Phen http://www.casimir.org.uk/usecase2/. The MUSIG-Phen prototype starts from a phenotype description, collects the genes associated with this phenotype in a cart and then performs all the analyses described above for MUSIG-Gen.

The starting point of MUSIG-Phen is a search page in which a free text entry will display a list of MP terms that most closely resemble the search term. The user may now choose the appropriate term, send a query to MGI and retrieve a list of genes that are associated with it. The list of genes can then be saved in a cart and further analyzed as described for MUSIG-Gen, e.g. add more information, perform meta-analysis, export lists. Alternatively, the user may start his query by browsing the hierarchical list of MP terms, select one and then retrieve the genes associated to the MP term (Fig. 8).

At this stage, the implementation is very similar to the services already provided by MGI. Thus, in addition to the current MGI search options, we implemented the possibility to query other external databases which contain phenotype descriptions based on MP terms. We demonstrated the feasibility of this feature for searches of the MUGEN and EuroPhenome databases.

At the present state, the MUSIG-Phen software was not designed for more sophisticated queries, because discussions with users revealed that further detailed queries very soon become highly specialized and complex for certain user subgroups. However, the present use-case implementation may already serve to query nascent databases (e.g. phenotype data from EUMODIC) and represents a very useful platform to test new developments which aim to connect mouse and human phenotype databases.

Technical aspects of the implementation of MUSIG-Phen

The implementation of the phenotype-centric use-case uses three SOAP/WSDL web services and our MGI BioMart web service: Initially the Mammalian Phenotype (MP) ontology is loaded from the OLS web service. The user-selected MP term is sent as query input to the MGI, EuroPhenome and MUGEN web services and matching gene symbols are returned. Gene symbols can then be selected and sent to the gene-centric use-case shopping cart.
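The two steps of this flow, matching free text to ontology terms and then merging the gene symbols returned by several sources, can be sketched as follows. This is an illustration under stated assumptions: the term IDs and names are placeholders rather than real MP entries, the sources are plain dictionaries standing in for the web services, and fuzzy matching with Python's `difflib` is a stand-in for whatever matching the ontology service performs.

```python
import difflib

# Placeholder ontology excerpt: term ID -> term name (not real MP IDs).
TERMS = {
    "MP:X001": "abnormal heart morphology",
    "MP:X002": "abnormal heart rate",
    "MP:X003": "decreased body weight",
}


def closest_terms(text, terms=TERMS, limit=3):
    """Return (id, name) pairs whose names most closely resemble the text."""
    by_name = {name: term_id for term_id, name in terms.items()}
    matches = difflib.get_close_matches(
        text.lower(), list(by_name), n=limit, cutoff=0.5
    )
    return [(by_name[name], name) for name in matches]  # best match first


def genes_for_term(term_id, sources):
    """Merge the gene symbols that each source associates with a term."""
    symbols = set()
    for lookup in sources:
        symbols.update(lookup.get(term_id, []))
    return sorted(symbols)


# Each dict stands in for one remote source queried with the selected term.
sources = [
    {"MP:X001": ["GeneA", "GeneB"]},
    {"MP:X001": ["GeneB", "GeneC"]},
]
best_id, best_name = closest_terms("heart morphology")[0]
genes = genes_for_term(best_id, sources)
```

Merging into a set removes duplicates when two sources report the same gene, so the list offered to the user contains each symbol once.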

Basic information about web services, such as type (for example BioMart or SOAP) and location URL, is currently stored in a separate table. However, a larger web service catalogue such as BioMoby [38], Biocatalogue [39] or the mouse-centric MRB could easily be integrated and used to create a wider array of services. These services could also be linked to create a Taverna-like workflow tool which automatically matches IDs and fields from different services. The current limitation to this approach is the lack of standardization across databases and web services with respect to the use of ontologies and the naming of web service fields. For example, a field for MGI gene IDs could be called mgi_id, gene_id, MGIGeneId etc., which would make automatic matching impossible. We therefore favor the idea of developing a web service field ontology which should be integrated into MRB or Biocatalogue to provide a look-up service for field names. Work is currently ongoing within the Biocatalogue project to create a web service ontology against which web service developers can annotate their fields, and this may provide a suitable solution to this problem.
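A look-up service for field names could, in its simplest form, map every known variant of a field name to one canonical name. The sketch below uses the three variants named in the text; the canonical name `mgi_gene_id`, the alias table and both function names are hypothetical.

```python
# Hypothetical alias table: canonical field name -> variants seen in services.
FIELD_ALIASES = {
    "mgi_gene_id": {"mgi_id", "gene_id", "MGIGeneId"},
}


def build_lookup(aliases):
    """Invert the alias table into a case-insensitive name -> canonical map."""
    lookup = {}
    for canonical, names in aliases.items():
        for name in names | {canonical}:
            lookup[name.lower()] = canonical
    return lookup


def normalize_record(record, lookup):
    """Rename known fields to their canonical names; keep unknown fields."""
    return {lookup.get(key.lower(), key): value for key, value in record.items()}


lookup = build_lookup(FIELD_ALIASES)
# "MGI:X" is a placeholder value, not a real MGI accession.
record = normalize_record({"MGIGeneId": "MGI:X", "symbol": "Ccl5"}, lookup)
```

A generic name such as gene_id is ambiguous across services, which is why a curated and shared ontology of field names, rather than a local table like this one, is proposed in the text.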

The source code and documentation for the MUSIG-Phen prototype may be downloaded from the following web server: http://www.casimir.org.uk/sourcecode/

Discussion and Conclusion

The aim of generating the MUSIG-Gen and MUSIG-Phen applications was to provide a first set of solutions to user-defined use-cases and thereby generate a test environment for a fully distributed integration strategy. We also presented the applications to various user groups and collected their feedback. All users appreciated that the tools were able to integrate data from several databases, and they especially liked the principle of the shopping cart. An additional, often mentioned suggestion was to link the genes in MUSIG-Gen to mouse mutants and phenotypes as well as gene expression information. We are planning to add these functionalities to future prototypes.

Our plan for a third use-case is to define the needs for an integration of mouse and human functional genomics databases. Here, we believe that the phenotype-centric use case may serve as a valuable basis to provide an entry point for clinical researchers. The concept would be to enter descriptions of human disease phenotypes as queries and to obtain mouse phenotype descriptions which relate to these terms. However, for such a query, it will first be necessary to relate the human phenotype descriptions with MP terms or with more detailed EQ-based phenotype descriptions.

References

Peters LL, Robledo RF, Bult CJ, Churchill GA, Paigen BJ, Svenson KL: The mouse as a model for human biology: a resource guide for complex trait analysis. Nat Rev Genet. 2007, 8: 58-69.
Rosenthal N, Brown S: The mouse ascending: perspectives for human-disease models. Nat Cell Biol. 2007, 9: 993-9.
Bult CJ, Eppig JT, Kadin JA, Richardson JE, Blake JA: The Mouse Genome Database (MGD): mouse biology and model systems. Nucl Acids Res. 2008, 36: D724-8.
Bogue MA, Grubb SC, Maddatu TP, Bult CJ: Mouse Phenome Database (MPD). Nucl Acids Res. 2007, 35: D643-9.
Mallon AM, Blake A, Hancock JM: EuroPhenome and EMPReSS: online mouse phenotyping resource. Nucl Acids Res. 2008, 36: D715-8.
Wang J, Williams RW, Manly KF: WebQTL: Web-based complex trait analysis. Neuroinformatics. 2003, 1: 299-308.
Zouberakis M, Chandras C, Hancock JM, Schofield PN, Aidinis V: The Mouse Resource Browser (MRB) - A near-complete registry of mouse resources. BioInformatics and BioEngineering. BIBE 2008. 8th IEEE International Conference on (2008). 2008, 1-5.
Hancock J, Chandras C, Zouberakis M, Aidinis V, Schofield PN: Integrating information from EU-funded mouse functional genomics projects: a questionnaire-based analysis. BioInformatics and BioEngineering. BIBE 2008. 8th IEEE International Conference on (2008). 2008, 1-5.
Hancock J, Schofield PN, Chandras C, Zouberakis M, Aidinis V, Smedley D, Rosenthal N, Schughart K: CASIMIR: Coordination and Sustainability of International Mouse Informatics Resources. BioInformatics and BioEngineering. BIBE 2008. 8th IEEE International Conference on (2008). 2008, 1-5.
Smedley D, Haider S, Ballester B, Holland R, London D, Thorisson G, Kasprzyk A: BioMart - biological queries made easy. BMC Genomics. 2009, 10: 22-
Hull D, Wolstencroft K, Stevens R, Goble C, Pocock MR, Li P, Oinn T: Taverna: a tool for building and running workflows of services. Nucl Acids Res. 2006, 34 (Web Server issue): W729-32.
Galaxy. [http://galaxy.psu.edu/]
Swertz MA, De Brock EO, Van Hijum SA, De Jong A, Buist G, Baerends RJ, Kok J, Kuipers OP, Jansen RC: Molecular Genetics Information System (MOLGENIS): alternatives in developing local experimental genomics databases. Bioinformatics. 2004, 20: 2075-83.
Swertz MA, Jansen RC: Beyond standardization: dynamic software infrastructures for systems biology. Nat Rev Genet. 2007, 8: 235-43.
Smedley D, Swertz MA, Wolstencroft K, Proctor G, Zouberakis M, Bard J, Hancock JM, Schofield P: Solutions for data integration in functional genomics: a critical assessment and case study. Brief Bioinform. 2008, 9: 532-44.
Sundberg J, Schofield P: A mouse by any other name. Journal of Investigative Dermatology. 2009, 129: 1599-1601.
Guidelines for Nomenclature of Mouse and Rat Strains. [http://www.informatics.jax.org/mgihome/nomen/strains.shtml]
HUGO Gene Nomenclature Committee. [http://www.genenames.org/]
Rat Genome and Nomenclature Committee. [http://ratmap.gen.gu.se/RGNC/]
Eppig JT, Blake JA, Bult CJ, Richardson JE, Kadin JA, Ringwald M: Mouse genome informatics (MGI) resources for pathology and toxicology. Toxicol Pathol. 2007, 35: 456-7.
Eurexpress. [http://www.eurexpress.org/ee/]
BioGPS. [http://biogps.gnf.org/?referer=symatlas#goto=welcome]
Parkinson: ArrayExpress update - from an archive of functional genomics experiments to the atlas of gene expression. Nucl Acids Res. 2009, 37 (Database issue): D868-72.
Hermjakob H: IntAct - an open source molecular interaction database. Nucl Acids Res. 2004, 32: D452-D455.
Smith CL, Goldsmith CA, Eppig JT: The Mammalian Phenotype Ontology as a tool for annotating, analyzing and comparing phenotypic information. Genome Biol. 2005, 6: R7-
Eumodic. [http://www.eumodic.org/aboutus.html]
Disease Ontology. [http://diseaseontology.sourceforge.net/]
Robinson PN, Kohler S, Bauer S, Seelow D, Horn D, Mundlos S: The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease. Am J Hum Genet. 2008, 83: 610-5.
Schofield PN: Pathbase: a database of mutant mouse pathology. Nucl Acids Res. 2004, 32: D512-5.
Online Mendelian Inheritance in Man, OMIM (TM). 2009, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, MD) and National Center for Biotechnology Information, National Library of Medicine (Bethesda, MD), [http://www.ncbi.nlm.nih.gov/omim/]
Fly phenotype ontology. [http://subversion.flymine.org/tags/flymine_release_2_1/flymine/model/phenotype/phenotype.ontology]
Tweedie S: FlyBase: enhancing Drosophila Gene Ontology annotations. Nucl Acids Res. 2009, 37: D555-D559.
C. elegans phenotype ontology. [http://www.obofoundry.org/cgi-bin/detail.cgi?id=worm_phenotype]
Bieri T: WormBase: new content and better access. Nucl Acids Res. 2007, 35: D506-10.
Aidinis V: MUGEN mouse database; animal models of human immunological diseases. Nucl Acids Res. 2008, D1048-54.
Cote RG, Jones P, Apweiler R, Hermjakob H: The ontology lookup service, a lightweight cross-platform tool for controlled vocabulary queries. BMC Bioinformatics. 2006, 7: 97.
HTML-AJAX. [http://pear.php.net/package/HTML_AJAX]
Wilkinson MD, Links M: BioMOBY: an open source biological web services proposal. Brief Bioinform. 2002, 331-41.
Goble CA, Stevens RD, Hull D, Wolstencroft K, Lopez R: Data Curation + Process Curation = Data Integration + Science. Brief Bioinform. 2008, 9: 506-517.

Acknowledgements

We thank George Gkoutos for helpful discussions on the technical development of the use-cases, and all the members of the international advisory board of the CASIMIR consortium (http://www.casimir.org.uk/) for their helpful comments and support.

This work was supported by the Commission of the European Community, Framework Programme 6 contract no. LSHG-CT-2006-037811, NWO Rubicon, grant 825.09.008, CASIMIR and the HZI program Infection & Immunity.

Author information

Authors and Affiliations

Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG, UK
Michael Gruenberger & Paul Schofield
Department of Infection Genetics, Helmholtz Centre for Infection Research & University of Veterinary Medicine Hannover, Inhoffenstr. 7, D-38124, Braunschweig, Germany
Rudi Alberts & Klaus Schughart
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Damian Smedley
Department of Genetics, University Medical Center Groningen & Groningen Bioinformatics Centre, University of Groningen, P.O. Box 30001, 9700 RB Groningen, The Netherlands
Morris Swertz

Authors

Michael Gruenberger
Rudi Alberts
Damian Smedley
Morris Swertz
Paul Schofield
Klaus Schughart

Consortia

The CASIMIR consortium

Corresponding author

Correspondence to Klaus Schughart.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

KLS conceived the study, organised the user workshops, developed the use-cases and wrote the manuscript. DS deployed Biomart for the various resources used for the use-case implementations. MS was involved in developing the use-cases and drafting the manuscript. RA developed the use-cases, set up the Symatlas web service and drafted the manuscript. MG developed the prototypes, conducted the user demonstrations and wrote the manuscript. PNS coordinates the CASIMIR project and was involved in developing the use-cases and drafting the manuscript. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Gruenberger, M., Alberts, R., Smedley, D. et al. Towards the integration of mouse databases - definition and implementation of solutions to two use-cases in mouse functional genomics. BMC Res Notes 3, 16 (2010). https://doi.org/10.1186/1756-0500-3-16

Received: 17 November 2009
Accepted: 22 January 2010
Published: 22 January 2010
DOI: https://doi.org/10.1186/1756-0500-3-16