A compatible exon-exon junction database for the identification of exon skipping events using tandem mass spectrum data
- Equal contributors
1 Systems Biology Division, Zhejiang-California Nanosystems Institute (ZCNI) of Zhejiang University, Zhejiang University Huajiachi Campus, 268 Kaixuan Road, Hangzhou 310029, PR China
2 Department of General Surgery, The Second Affiliated Hospital, ShanXi Medical University, 382 Wuyi Road, Taiyuan 030000, PR China
3 College of Life Science, Zhejiang University Zijingang Campus, Zijinhua Road, Hangzhou 310058, PR China
4 Center for Computational Medicine and Biology, National Center for Integrative Biomedical Informatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109-2218, USA
5 Departments of Internal Medicine and Human Genetics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109-2218, USA
BMC Bioinformatics 2008, 9:537 doi:10.1186/1471-2105-9-537
Published: 16 December 2008
Alternative splicing is an important gene regulation mechanism. It is estimated that about 74% of multi-exon human genes undergo alternative splicing. High-throughput tandem mass spectrometry (MS/MS) provides valuable information for rapidly identifying potentially novel alternatively spliced protein products from experimental datasets. However, the ability to identify alternative splicing events through tandem mass spectrometry depends on the database against which the spectra are searched.
Using Perl, BioPerl, MySQL and the Ensembl API, we wrote scripts that build a theoretical exon-exon junction protein database from the Ensembl Core Database. The database accounts for all possible combinations of exons within a gene while preserving the translation frame (i.e., keeping only in-phase exon-exon combinations). Using our liver cancer MS/MS dataset, we identified a total of 488 non-redundant peptides that represent putative exon skipping events.
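The in-phase filtering step can be illustrated with a minimal sketch. It is written in Python for brevity and is not the authors' Perl code. It assumes each exon carries Ensembl-style `phase` and `end_phase` values and treats a junction as frame-preserving when the upstream exon's `end_phase` equals the downstream exon's `phase`. The `Exon` class, the `in_phase_junctions` function and the example exon identifiers are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Exon:
    exon_id: str    # hypothetical identifier, not a real Ensembl ID
    phase: int      # reading-frame phase at the exon start (-1 if non-coding)
    end_phase: int  # reading-frame phase at the exon end (-1 if non-coding)


def in_phase_junctions(exons):
    """List frame-preserving exon pairs for one gene.

    `exons` must be in transcript order. Returns tuples of
    (upstream_id, downstream_id, number_of_skipped_exons).
    """
    junctions = []
    # combinations() keeps input order, so the first exon is always upstream
    for (i, up), (j, down) in combinations(enumerate(exons), 2):
        if up.end_phase < 0 or down.phase < 0:
            continue  # assumption: ignore junctions touching non-coding ends
        if up.end_phase == down.phase:
            junctions.append((up.exon_id, down.exon_id, j - i - 1))
    return junctions


# Toy gene with four coding exons (made-up phases)
exons = [
    Exon("E1", 0, 1),
    Exon("E2", 1, 0),
    Exon("E3", 0, 1),
    Exon("E4", 1, 2),
]

for up, down, skipped in in_phase_junctions(exons):
    kind = "exon skipping" if skipped else "adjacent"
    print(f"{up}-{down}: {kind} (skipped exons: {skipped})")
```

In this toy gene, only the E1-E4 pair is a candidate exon skipping junction; E1-E3 and E2-E4 are discarded because joining them would shift the reading frame. A real database would additionally translate the sequence spanning each retained junction into a peptide, which this sketch leaves out.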
Our exon-exon junction database provides the scientific community with an efficient means to identify novel alternatively spliced (exon skipping) protein isoforms using mass spectrometry data. This database will be useful in annotating genome structures using rapidly accumulating proteomics data.