Figure 1.
The figure illustrates two gene-factorizations of the genomic sequence G, one into 7 pseudo-exons and one into 4. Let S1, S2 and S3 be EST sequences in S agreeing with the genomic sequence G, where S1 = ABDEF, S2 = ABCDE and S3 = BDEFG, and each letter in {A, B, C, D, E, F, G} denotes a sequence (A). Panels (B) and (C) show two alternative EST-genome alignments of S1, S2 and S3: the EST factorization of each Si associated with the EST-genome alignment is shaded. Pseudo-exons in the gene-factorization are colored white, while introns are in grey. Segments labelled with letters represent regions of the genomic sequence that align to the substring of the input sequence denoted by the corresponding letter. Note that an approach that aligns each of S1, S2 and S3 to G independently, one after the other, may produce the gene-factorization <A, B, C, D, F, E, G> consisting of 7 pseudo-exons (B), while the approach that minimizes the number of pseudo-exons yields only 4 pseudo-exons (C). Indeed, each Si has an EST factorization that is compatible or variant compatible with the gene-factorization GE = <AB, C, DE, FG>. More precisely, <AB, DE, F> is an EST-factorization of S1 that is compatible with GE, <AB, C, DE> is an EST-factorization of S2 that is compatible with GE, and <B, DE, FG> is an EST-factorization of S3 that is compatible with GE (C).
Bonizzoni et al. BMC Bioinformatics 2005 6:244 doi:10.1186/1471-2105-6-244
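
The compatibility claims in the caption can be checked mechanically. The following is a minimal sketch, not the authors' implementation. The caption does not define compatibility, so the sketch assumes this reading: the factors of an EST-factorization map, in order, onto a subsequence of the pseudo-exons of GE, where inner factors equal their pseudo-exon, the first factor is a suffix of its pseudo-exon, and the last factor is a prefix of its pseudo-exon. Each letter is treated as a single symbol rather than a real nucleotide sequence, and the function names (`factor_matches`, `is_compatible`) are hypothetical.

```python
# Assumed definition of compatibility (not stated in the caption):
#   - inner EST factors must equal a pseudo-exon,
#   - the first factor may be a suffix of a pseudo-exon,
#   - the last factor may be a prefix of a pseudo-exon,
#   - matched pseudo-exons must appear in increasing order in GE.
# Letters A..G stand in for whole sequences (a simplification).

def factor_matches(factor, exon, is_first, is_last):
    """Check one EST factor against one pseudo-exon."""
    if is_first and is_last:
        return factor in exon            # single factor: any substring
    if is_first:
        return exon.endswith(factor)     # first factor: suffix
    if is_last:
        return exon.startswith(factor)   # last factor: prefix
    return factor == exon                # inner factor: exact match


def is_compatible(est, est_factors, gene_factors):
    """True if est_factors is a factorization of est compatible with gene_factors."""
    if not est_factors or "".join(est_factors) != est:
        return False                     # not a factorization of the EST
    k = len(est_factors)

    def search(t, start):
        # Try to place factor t on some pseudo-exon at index >= start.
        if t == k:
            return True
        for i in range(start, len(gene_factors)):
            if factor_matches(est_factors[t], gene_factors[i], t == 0, t == k - 1):
                if search(t + 1, i + 1):
                    return True
        return False

    return search(0, 0)


# Gene-factorization and EST-factorizations taken from the caption.
GE = ["AB", "C", "DE", "FG"]
EXAMPLES = {
    "S1": ("ABDEF", ["AB", "DE", "F"]),
    "S2": ("ABCDE", ["AB", "C", "DE"]),
    "S3": ("BDEFG", ["B", "DE", "FG"]),
}

if __name__ == "__main__":
    for name, (est, factors) in EXAMPLES.items():
        print(name, factors, is_compatible(est, factors, GE))
```

Under this assumed definition, all three factorizations from the caption pass the check, whereas a factorization such as <A, BD, EF> of S1 does not, because no pseudo-exon of GE ends with A alone. Variant compatibility, which the caption mentions but does not define, is not modelled here.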