In eukaryotes, pre-mRNA molecules undergo splicing, the removal of sequences called introns, to produce mature mRNA transcripts whose open reading frames (ORFs) may then be translated into proteins. This splicing step can often be performed in many ways, a situation known as alternative splicing [1,2], which can be described by structures such as splice graphs. Alternative splicings, even those that differ only slightly, may result in proteins with substantially different biological properties [4,5].
Materials and methods
We have developed an algorithm for finding the longest ORFs of alternatively spliced transcripts described by splice graphs. Our algorithm runs in time linear in the size of the splice graph (and is therefore asymptotically optimal) and determines the splicings that yield an open reading frame encoding a maximal-length protein for that gene. We show how our algorithm may be used to help identify biologically interesting protein products from RNA-seq data.
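To make the problem statement concrete, the following is a minimal illustrative sketch, not the algorithm described above and with no claim to its linear-time bound. It assumes a splice graph given as a directed acyclic graph whose nodes carry non-empty exon sequences and whose edges are permitted splice junctions, and it computes by dynamic programming the length of the longest ORF (from ATG through the first in-frame stop codon) over all paths. The function name `longest_orf_length`, the dictionary-based graph encoding, and the toy sequences are assumptions made for this example.

```python
from graphlib import TopologicalSorter  # Python 3.9+

STOPS = {"TAA", "TAG", "TGA"}
NEG = float("-inf")

def longest_orf_length(seqs, edges):
    """Length in nucleotides (ATG through stop codon) of the longest ORF
    on any path of the splice graph, or 0 if no complete ORF exists.

    seqs:  dict node -> non-empty exon sequence (hypothetical encoding)
    edges: dict node -> list of successor nodes (graph must be acyclic)
    """
    def succ(pos):
        # Next nucleotide positions: stay in the exon, or cross a junction.
        v, i = pos
        if i + 1 < len(seqs[v]):
            return [(v, i + 1)]
        return [(w, 0) for w in edges.get(v, [])]

    def codons(pos):
        # Every codon readable from pos, with the position of its last base.
        for p1 in succ(pos):
            for p2 in succ(p1):
                bases = (seqs[p[0]][p[1]] for p in (pos, p1, p2))
                yield "".join(bases), p2

    # TopologicalSorter expects node -> predecessors; passing successors
    # instead makes static_order() emit sinks first (reverse topological).
    order = TopologicalSorter({v: edges.get(v, []) for v in seqs}).static_order()

    to_stop = {}  # to_stop[pos]: longest in-frame read from pos through a stop
    best = 0
    for v in order:
        for i in range(len(seqs[v]) - 1, -1, -1):
            pos = (v, i)
            longest = NEG
            for codon, last in codons(pos):
                if codon in STOPS:
                    tail = 0  # the ORF ends at the first in-frame stop
                else:
                    tail = max((to_stop[n] for n in succ(last)), default=NEG)
                longest = max(longest, 3 + tail)
                if codon == "ATG":
                    best = max(best, 3 + tail)
            to_stop[pos] = longest
    return best

# Toy splice graph: exon A is followed by either B or C, both joining D.
seqs = {"A": "CATGG", "B": "CTAAG", "C": "CC", "D": "TGAT"}
edges = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
print(longest_orf_length(seqs, edges))  # 12
```

In this toy graph the path A-B-D reads ATG GCT AAG TGA (12 nucleotides), whereas A-C-D reads ATG GCC TGA (9 nucleotides), so the longer splicing is reported. Recovering the splicing itself, rather than only its length, would require storing a back-pointer alongside each `to_stop` entry.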
This work was partially supported by NIGMS Grant 1R01GM086888-01, Kentucky NSF-EPSCoR Grant 0814194, NSF Grant EF-0523661, and USDA-NRA Grant 2005-35319-16141.
An earlier version of this work was presented at the Workshop on Integrative Data Analysis in Systems Biology in 2011.