
This article is part of the supplement: Proceedings of the 2009 AMIA Summit on Translational Bioinformatics

Open Access Proceedings

Towards large-scale sample annotation in gene expression repositories

Erik Pitzer1,2*, Ronilda Lacson1*, Christian Hinske1, Jihoon Kim1, Pedro AF Galante1,3 and Lucila Ohno-Machado1

Author Affiliations

1 Decision Systems Group, Brigham and Women's Hospital, Boston, MA, USA

2 Upper Austria University of Applied Sciences, Hagenberg, Austria

3 Ludwig Institute for Cancer Research, São Paulo Branch, São Paulo, Brazil


BMC Bioinformatics 2009, 10(Suppl 9):S9  doi:10.1186/1471-2105-10-S9-S9

Published: 17 September 2009

Abstract

Background

Large repositories of biomedical research data are most useful to translational researchers if their data can be aggregated for efficient queries and analyses. However, annotations describing important sample details, such as the name of the tissue or cell line, the histopathological type, and subject characteristics like demographics, treatment, and survival, are often inconsistent or missing altogether in data repositories, which makes it difficult to aggregate data.

Results

We created a flexible software tool that allows efficient annotation of samples using a controlled vocabulary, and we report on its use to annotate over 12,500 samples.
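To illustrate why annotation with a controlled vocabulary makes aggregation possible, the following is a minimal sketch, not the authors' tool. It assumes a small hypothetical synonym table (SYNONYMS) mapping free-text tissue labels to preferred terms; the function names annotate and count_by_term are likewise hypothetical. Once differently worded labels map to the same term, samples can be counted or queried together, while labels with no mapping are surfaced for manual curation.

```python
from collections import Counter
from typing import List, Optional

# Hypothetical controlled vocabulary (illustrative only, not from the paper):
# lower-cased free-text synonyms mapped to a single preferred term.
SYNONYMS = {
    "breast": "Breast",
    "breast tissue": "Breast",
    "mammary gland": "Breast",
    "liver": "Liver",
    "hepatic tissue": "Liver",
}


def annotate(free_text: str) -> Optional[str]:
    """Return the controlled term for a free-text label, or None if unmapped."""
    return SYNONYMS.get(free_text.strip().lower())


def count_by_term(labels: List[str]) -> Counter:
    """Aggregate samples by controlled term; unmapped labels are counted under None."""
    return Counter(annotate(label) for label in labels)


samples = ["Breast", "mammary gland ", "Hepatic tissue", "MCF-7"]
print(count_by_term(samples))
```

In this sketch, "Breast" and "mammary gland " are counted together under one term, and the cell line label "MCF-7" falls under None because the toy table has no entry for it, flagging it for a curator.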

Conclusion

Although the amount of data is very large and much of it appears poorly annotated, a great deal of useful information is still within reach. Consistent, tool-based re-annotation enables many new large-scale interpretations and analyses that would otherwise be impossible.