Open Access Database

NovelFam3000 – Uncharacterized human protein domains conserved across model organisms

Danielle Kemmer1, Raf M Podowski1, David Arenillas2, Jonathan Lim2, Emily Hodges1, Peggy Roth3, Erik LL Sonnhammer1, Christer Höög1 and Wyeth W Wasserman2,4*

Author Affiliations

1 Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden

2 Centre for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver, Canada

3 Department of Developmental Biology, Stockholm University, Stockholm, Sweden

4 Department of Medical Genetics, University of British Columbia, Vancouver, Canada


BMC Genomics 2006, 7:48  doi:10.1186/1471-2164-7-48

Published: 13 March 2006

Abstract

Background

Despite significant efforts from the research community, a large portion of the proteins encoded by human genes lacks an assigned cellular function. Most metazoan proteins are composed of structural and/or functional domains, many of which appear in multiple proteins. Once a domain is characterized in one protein, the presence of a similar sequence in an uncharacterized protein serves as a basis for inference of function. Thus knowledge of a domain's function, or of the protein within which it arises, can facilitate the analysis of an entire set of proteins.

Description

From the Pfam domain database, we extracted uncharacterized protein domains represented in proteins from humans, worms, and flies. A data centre was created to facilitate the analysis of the uncharacterized domain-containing proteins. The centre both provides researchers with links to dispersed internet resources containing gene-specific experimental data and enables them to post relevant experimental results or comments. For each human gene in the system, a characterization score is posted, allowing users to track the progress of characterization over time or to select uncharacterized domains in well-characterized genes for study. As a test of the system, a subset of 39 domains was selected for analysis and the experimental results were posted to the NovelFam3000 system. For 25 human protein members of these 39 domain families, detailed sub-cellular localizations were determined. Specific observations are presented based on the analysis of the integrated information provided through the online NovelFam3000 system.
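
The domain selection step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the record layout, the `characterized` flag, the function name, and the toy family identifiers are all hypothetical, and the abstract does not state how a domain was judged uncharacterized. The sketch only shows the filtering idea, keeping families that are uncharacterized and have members in human, worm, and fly.

```python
from dataclasses import dataclass

# Species that must all be represented, per the abstract.
REQUIRED_SPECIES = frozenset({"human", "worm", "fly"})


@dataclass(frozen=True)
class DomainFamily:
    accession: str       # hypothetical identifier, not a real Pfam accession
    characterized: bool  # assumed flag; the source does not define the criterion
    species: frozenset   # species with at least one member protein


def select_conserved_uncharacterized(families, required=REQUIRED_SPECIES):
    """Return families that are uncharacterized and present in every required species."""
    return [
        family
        for family in families
        if not family.characterized and required <= family.species
    ]


# Toy records for illustration only; none of these come from Pfam.
toy_families = [
    DomainFamily("FAM_A", False, frozenset({"human", "worm", "fly"})),
    DomainFamily("FAM_B", False, frozenset({"human", "fly"})),  # no worm member
    DomainFamily("FAM_C", True, frozenset({"human", "worm", "fly", "yeast"})),  # already characterized
    DomainFamily("FAM_D", False, frozenset({"human", "worm", "fly", "mouse"})),
]

selected = select_conserved_uncharacterized(toy_families)
selected_accessions = [family.accession for family in selected]
print(selected_accessions)
```

The subset test (`required <= family.species`) allows a family to occur in additional species, which is an assumption about the intended meaning of "represented in proteins from humans, worms, and flies".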

Conclusion

Consistent experimental results between multiple members of a domain family allow for inferences of the domain's functional role. We unite bioinformatics resources and experimental data in order to accelerate the functional characterization of scarcely annotated domain families.