Open Access Software

EST2Prot: Mapping EST sequences to proteins

Paul Shafer (1), David M Lin (2) and Golan Yona (1)*

Author Affiliations

1 Department of Computer Science, Cornell University, Ithaca, NY, USA

2 Department of Biomedical Sciences, Cornell University, Ithaca, NY, USA

BMC Genomics 2006, 7:41  doi:10.1186/1471-2164-7-41

Published: 4 March 2006



EST libraries are used in various biological studies, from microarray experiments to proteomic and genetic screens. These libraries usually contain many uncharacterized ESTs, which are typically ignored because they cannot be mapped to known genes. Consequently, new discoveries may be overlooked.


We describe a system (EST2Prot) that combines several sources of evidence to map EST sequences to their corresponding protein products. EST2Prot uses UniGene clusters, substring analysis, information about protein coding regions in existing DNA sequences, and protein database searches to detect protein products related to a query EST sequence. Gene Ontology terms, Swiss-Prot keywords, and protein similarity data are then used to map the ESTs to functional descriptors.
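To illustrate how several relations can be chained into an EST-to-protein mapping, here is a minimal Python sketch. It is not the EST2Prot implementation: the identifiers, the toy sequences, and the function `map_est_to_proteins` are all hypothetical, and "substring analysis" is assumed here to mean an exact substring test against DNA sequences with known protein products. Coding-region information and protein database searches are left out for brevity.

```python
from collections import defaultdict

# Toy, made-up records; all identifiers are hypothetical and not taken from Biozon.
EST_TO_CLUSTER = {"EST_A": "Hs.1", "EST_B": "Hs.2"}
CLUSTER_TO_DNA = {"Hs.1": ["mRNA_1"], "Hs.2": []}
DNA_SEQUENCES = {"mRNA_1": "ATGGCCAAGTTTGA", "mRNA_2": "CCCATGAAACGTTAGGG"}
DNA_TO_PROTEIN = {"mRNA_1": ["PROT_X"], "mRNA_2": ["PROT_Y"]}
EST_SEQUENCES = {"EST_A": "GCCAAG", "EST_B": "ATGAAACG", "EST_C": "TTTTTT"}


def map_est_to_proteins(est_id):
    """Return {protein_id: sorted list of evidence labels} for one EST."""
    evidence = defaultdict(set)

    # Route 1: EST -> UniGene cluster -> DNA members of the cluster -> proteins.
    cluster = EST_TO_CLUSTER.get(est_id)
    for dna_id in CLUSTER_TO_DNA.get(cluster, []):
        for protein in DNA_TO_PROTEIN.get(dna_id, []):
            evidence[protein].add("unigene")

    # Route 2 (assumed form of substring analysis): the EST occurs verbatim
    # inside a DNA sequence that has a known protein product.
    est_seq = EST_SEQUENCES[est_id]
    for dna_id, dna_seq in DNA_SEQUENCES.items():
        if est_seq in dna_seq:
            for protein in DNA_TO_PROTEIN.get(dna_id, []):
                evidence[protein].add("substring")

    return {protein: sorted(labels) for protein, labels in evidence.items()}
```

In this toy data, `EST_B` belongs to a cluster with no DNA members, so the cluster route alone would leave it unmapped; the substring route still links it to a protein. Recording which routes support each protein makes it possible to rank or filter candidates by the amount of evidence.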


EST2Prot extends and significantly enriches the popular UniGene mapping by utilizing multiple relations between known biological entities. It produces a mapping between ESTs and proteins in real time through a simple web interface. The system is part of the Biozon database and is accessible online.