Open Access Methodology article

A compatible exon-exon junction database for the identification of exon skipping events using tandem mass spectrum data

Fan Mo1, Xu Hong1, Feng Gao2, Lin Du3, Jun Wang1, Gilbert S Omenn4,5 and Biaoyang Lin1*

Author Affiliations

1 Systems Biology Division, Zhejiang-California Nanosystems Institute (ZCNI) of Zhejiang University, Zhejiang University Huajiachi Campus, 268 Kaixuan Road, Hangzhou 310029, PR China

2 Department of General Surgery, The Second Affiliated Hospital, ShanXi Medical University, 382 Wuyi Road, Taiyuan 030000, PR China

3 College of Life Science, Zhejiang University Zijingang Campus, Zijinhua Road, Hangzhou 310058, PR China

4 Center for Computational Medicine and Biology, National Center for Integrative Biomedical Informatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109-2218, USA

5 Departments of Internal Medicine and Human Genetics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109-2218, USA

BMC Bioinformatics 2008, 9:537  doi:10.1186/1471-2105-9-537

Published: 16 December 2008

Abstract

Background

Alternative splicing is an important mechanism of gene regulation. It is estimated that about 74% of multi-exon human genes undergo alternative splicing. High-throughput tandem mass spectrometry (MS/MS) provides valuable information for rapidly identifying potentially novel alternatively spliced protein products from experimental datasets. However, the ability to identify alternative splicing events through tandem mass spectrometry depends on the database against which the spectra are searched.

Results

Using Perl, BioPerl, MySQL and the Ensembl API, we wrote scripts that build a theoretical exon-exon junction protein database from the Ensembl Core Database. The database accounts for all possible combinations of exons within a gene that preserve the translation frame (i.e., only in-phase exon-exon combinations are kept). Using our liver cancer MS/MS dataset, we identified a total of 488 non-redundant peptides that represent putative exon skipping events.
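
The in-phase filtering described above can be illustrated with a minimal Python sketch. It is not the authors' Perl implementation. The `Exon` class, the `in_phase_junctions` function, and the example phase values are hypothetical, and the sketch assumes an Ensembl-style convention in which each exon carries a start phase and an end phase (0, 1 or 2, with -1 marking a non-coding boundary), so that two exons can be joined in frame when the end phase of the upstream exon equals the start phase of the downstream exon.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Exon:
    """Hypothetical exon record; field names are assumptions for illustration."""
    rank: int       # position of the exon within the gene, 5' to 3'
    phase: int      # assumed start phase: 0, 1, 2, or -1 if non-coding
    end_phase: int  # assumed end phase: 0, 1, 2, or -1 if non-coding


def in_phase_junctions(exons, skipping_only=True):
    """Return (upstream_rank, downstream_rank) pairs that keep the reading frame.

    With skipping_only=True, junctions between directly adjacent exons are
    left out, so every returned pair skips at least one exon.
    """
    ordered = sorted(exons, key=lambda e: e.rank)
    pairs = []
    for up, down in combinations(ordered, 2):
        if skipping_only and down.rank == up.rank + 1:
            continue  # adjacent exons: not an exon skipping event
        # Keep the pair only if the upstream exon ends on a coding phase
        # that matches the phase in which the downstream exon starts.
        if up.end_phase in (0, 1, 2) and up.end_phase == down.phase:
            pairs.append((up.rank, down.rank))
    return pairs


# Made-up phase values for a five-exon gene, for illustration only.
exons = [
    Exon(rank=1, phase=0, end_phase=1),
    Exon(rank=2, phase=1, end_phase=0),
    Exon(rank=3, phase=0, end_phase=1),
    Exon(rank=4, phase=1, end_phase=2),
    Exon(rank=5, phase=2, end_phase=0),
]

skipping = in_phase_junctions(exons)
all_in_phase = in_phase_junctions(exons, skipping_only=False)
print(skipping)
print(all_in_phase)
```

In this toy gene, only the junction joining exon 1 to exon 4 both skips exons and preserves the frame. A real pipeline would additionally translate the sequence spanning each retained junction into peptides for the MS/MS search database, a step this sketch omits.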

Conclusion

Our exon-exon junction database provides the scientific community with an efficient means to identify novel alternatively spliced (exon skipping) protein isoforms using mass spectrometry data. This database will be useful in annotating genome structures using rapidly accumulating proteomics data.