<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2164-8-133</ui>
   <ji>1471-2164</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>A population study of the minicircles in <it>Trypanosoma cruzi</it>: predicting guide RNAs in the absence of empirical RNA editing</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Thomas</snm>
               <fnm>Sean</fnm>
               <insr iid="I1"/>
               <email>sean.thomas@sbri.org</email>
            </au>
            <au id="A2">
               <snm>Martinez</snm>
               <mnm>Trejo</mnm>
               <fnm>LL Isadora</fnm>
               <insr iid="I2"/>
               <email>ltrejom@ucla.edu</email>
            </au>
            <au id="A3">
               <snm>Westenberger</snm>
               <mi>J</mi>
               <fnm>Scott</fnm>
               <insr iid="I2"/>
               <email>scottw@scripps.edu</email>
            </au>
            <au id="A4" ca="yes">
               <snm>Sturm</snm>
               <mi>R</mi>
               <fnm>Nancy</fnm>
               <insr iid="I2"/>
               <email>nsturm@ucla.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Molecular Biology Institute, UCLA, Los Angeles, CA, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Microbiology, Immunology and Molecular Genetics, UCLA, Los Angeles, CA, USA</p>
            </ins>
         </insg>
         <source>BMC Genomics</source>
         <issn>1471-2164</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>133</fpage>
         <url>http://www.biomedcentral.com/1471-2164/8/133</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17524149</pubid>
               <pubid idtype="doi">10.1186/1471-2164-8-133</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>10</day>
               <month>11</month>
               <year>2006</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>24</day>
               <month>5</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>24</day>
               <month>5</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Thomas et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>The structurally complex network of minicircles and maxicircles comprising the mitochondrial DNA of kinetoplastids mirrors the complexity of the RNA editing process that is required for faithful expression of encrypted maxicircle genes. Although a few of the guide RNAs that direct this editing process have been discovered on maxicircles, guide RNAs are mostly found on the minicircles. The nuclear and maxicircle genomes have been sequenced and assembled for <it>Trypanosoma cruzi</it>, the causative agent of Chagas disease; however, the complement of 1.4-kb minicircles, carrying four guide RNA genes per molecule in this parasite, has been less thoroughly characterised.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Fifty-four CL Brener and 53 Esmeraldo strain minicircle sequence reads were extracted from <it>T. cruzi </it>whole genome shotgun sequencing data. With these sequences and all published <it>T. cruzi </it>minicircle sequences, 108 unique guide RNAs from all known <it>T. cruzi </it>minicircle sequences and two guide RNAs from the CL Brener maxicircle were predicted using a local alignment algorithm and mapped onto predicted or experimentally determined sequences of edited maxicircle open reading frames. For half of the sequences no statistically significant guide RNA could be assigned. Likely positions of these unidentified gRNAs in <it>T. cruzi </it>minicircle sequences were estimated using a simple Hidden Markov Model. With the local alignment predictions as a standard, the HMM had an ~85% chance of correctly identifying at least 20 nucleotides of guide RNA from a given minicircle sequence. Inter-minicircle recombination was documented. Variable regions contain species-specific areas of distinct nucleotide preference. Two maxicircle guide RNA genes were found.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The identification of new minicircle sequences and the further characterization of all published minicircles are presented, including the first observation of recombination between minicircles. Extrapolation suggests a level of 4% recombinants in the population, supporting a relatively high recombination rate that may serve to minimize the persistence of gRNA pseudogenes. Characteristic nucleotide preferences observed within variable regions provide potential clues regarding the transcription and maturation of <it>T. cruzi </it>guide RNAs. Based on these preferences, a method of predicting <it>T. cruzi </it>guide RNAs using only primary minicircle sequence data was created.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Kinetoplastids are single-celled protists, some of which are free-living, while <it>Trypanosoma cruzi </it>and others cause significant plant, animal, and human diseases. In approximately 30% of cases, a human infected with <it>T. cruzi </it>develops fatal damage to cardiac and smooth muscle tissue decades after surviving the potentially deadly acute phase of Chagas disease. This pathology may result from tissue destruction by the parasite, errant autoimmune responses or some combination of the two <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. Chagas disease affects millions in South and Central America, and there are no predictive tests for disease outcome.</p>
         <p>Among eukaryotes that possess mitochondrial genomes, there is a remarkable diversity of genome structure ranging from single circular chromosomes to extremely complex arrangements with multiple chromosomes found in organisms like <it>Amoebidium parasiticum </it><abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. The mitochondrial DNA (kDNA) of the kinetoplastids is a unique structure composed of dozens of ~25-kb maxicircles and thousands of 1.4-kb minicircles linked together in a dense network called the kinetoplast. Each maxicircle copy is thought to be nearly identical, although this assumption may be incorrect <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>, and the number of minicircle sequence classes ranges from one to over a hundred depending on the kinetoplastid. This structural behemoth belies a functional complexity whereby messages transcribed from maxicircles must be decrypted by means of a uridine insertion/deletion RNA editing process <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Following the hybridization of the 5'-anchor region of a guide RNA (gRNA) to the 3' end of its target message, sequential base pairing directs U insertion and deletion in a processive enzyme cascade <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Non-canonical G-U base pairs are permissible in these RNA-RNA interactions, conferring transition tolerance to the gRNA sequence, so that a staggering number of potential guides can direct identical editing events. Editing events cannot be predicted based solely on a gRNA sequence, nor <it>vice versa</it>. The primary repository of gRNAs is the minicircles <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, with a handful of gRNAs found on maxicircles <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>.</p>
         <p>In <it>T. cruzi </it>the kinetoplast DNA has two tantalizing links to disease. Minicircles can integrate into the host genome <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>, potentially persisting long after an active infection has been cleared. This example of horizontal DNA transfer has implications for the autoimmune characteristics seen in the clinic. Parasite integration events have been localized to host LINE-1 retrotransposable elements, thus conferring mobility upon the parasite sequences, as hundreds of thousands of these elements exist per genome <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. Maxicircles may also play a role in pathology, as a lesion discovered in a parasite mitochondrial gene was correlated with disease presentation <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. The extensively edited NADH dehydrogenase 7 (<it>ND7</it>) gene harbours a deletion that would compromise the electron transport chain, a defect found exclusively in <it>T. cruzi </it>strains isolated from asymptomatic patients. This and other related loci may provide the first functional linkage between parasite genotype and disease manifestation. The minicircle and maxicircle phenomena are not mutually exclusive. The potential association of these events with Chagas pathology makes understanding the structure and function of mitochondrial DNA particularly relevant in <it>T. cruzi</it>.</p>
         <p><it>T. cruzi </it>is divided into several strains, each with distinct geographic distributions, host preferences and disease severity <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>. These 'discrete typing units' (DTUs) delineate six subtypes (I, IIa, IIb, IIc, IId and IIe). DTUs IIb and I represent the ancestral <it>T. cruzi </it>lineages. Genetic recombination in <it>T. cruzi </it>occurs through a loosely defined, whole-cell fusion mechanism <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. The extant population structure can be derived from two discrete fusion events <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. A fusion of DTUs I and IIb generated a hybrid ancestor to DTUs IIa and IIc, which share elements of recombination and homozygosity and have since diverged from one another. A more recent DTU IIc/IIb fusion gave rise to the largely heterozygous DTUs IId and IIe. The CL Brener strain chosen for genome sequencing <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> belongs to DTU IIe, and the Esmeraldo strain was sequenced to a lesser degree as a representative of DTU IIb. The maxicircle genomes fall into three distinct clades that are partitioned among DTU I, DTU IIb, and DTUs IIa/IIc/IId/IIe <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>. The CL Brener and Esmeraldo maxicircles have been assembled in their entirety <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>.</p>
         <p>This report details the isolation of minicircle sequences from the whole genome shotgun (WGS) reads from the CL Brener and Esmeraldo strains of <it>T. cruzi </it>generated as a by-product of the Genome Sequence Project <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. In combination with the predicted or actual sequences of edited messages, a thorough characterization of all available <it>T. cruzi </it>minicircle data is presented. The features of the assigned gRNAs were then used to generate a selection scheme for gRNA genes in the absence of known editing events. Minicircles were assembled from the genome project sequence reads, revealing two instances of apparent minicircle recombination.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Intragenomic conservation of conserved sequence blocks</p>
            </st>
            <p>A typical <it>T. cruzi </it>minicircle is approximately 1.4 kb in length and contains four conserved sequence regions, each followed by a variable region containing a gRNA. Each conserved region is composed of three individual conserved sequence blocks (CSB-1, CSB-2, and CSB-3), each of which is broadly conserved <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. Numerous minicircle sequences with CSBs were found in the CL Brener and Esmeraldo strain WGS reads. Multicopy sequences can have extensive variability in kinetoplastids <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, so to be certain of isolating the greatest number of minicircle sequences, the intragenomic diversity of individual CSB sequences was first assessed by examining the conserved regions of all extracted <it>T. cruzi </it>minicircle sequences using a two-tag method to capture native variability of the intermediate sequence (see Methods).</p>
            <p>Remarkable conservation was observed among the dozens of conserved region sequences identified (Fig <figr fid="F1">1</figr>). Fewer than 1% of sequences varied at any given position within the CSBs, and the conservation for each strain extended well beyond the basic consensus. The base CSB-1 sequence was expanded from 10 nt to 18 nt common to CL Brener and Esmeraldo minicircles, with a strong 24-nt stretch in CL Brener. The G-rich, 8-bp CSB-2 element was expanded into a 26-nt purine-rich region in both <it>T. cruzi </it>strains, with a 29-nt length in CL Brener. The 12-nt CSB-3 was extended to 27 nt in both strains. These extended CSBs likely reflect species-specificity and could prove useful for taxonomic distinctions.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Intragenomic conserved region conservation in minicircles</p>
               </caption>
               <text>
                  <p><b>Intragenomic conserved region conservation in minicircles</b>. A) Variation of CSB-1, CSB-2, and CSB-3 was assessed by using two static tags to pull out the natural variability of the third. B) Weblogo diagrams <url>http://weblogo.berkeley.edu</url> show the high degree of intragenomic CSB conservation. At each position the height of the letter represents the proportion of sequences with the base represented by that letter.</p>
               </text>
               <graphic file="1471-2164-8-133-1"/>
            </fig>
            <p>The conserved regions of the minicircles represent tags that can be used to pull minicircle sequences from the WGS reads while being confident that few, if any, sequences are missed due to intragenomic variation.</p>
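            <p>As an illustration only, the two-tag idea can be sketched in a few lines of Python. This is not the script used in the study: the function names, the <it>max_gap</it> parameter, exact tag matching and the short tag strings in the usage note are all hypothetical placeholders rather than the actual CSB sequences.</p>
            <p><![CDATA[
```python
import re
from collections import Counter


def capture_between_tags(reads, left_tag, right_tag, max_gap=60):
    """Return the sequence found between two fixed tags in each read.

    Reads lacking either tag, or with more than max_gap bases between
    them, are skipped. Exact tag matching is a simplification.
    """
    pattern = re.compile(
        re.escape(left_tag)
        + "([ACGT]{1," + str(max_gap) + "}?)"
        + re.escape(right_tag)
    )
    captured = []
    for read in reads:
        match = pattern.search(read)
        if match:
            captured.append(match.group(1))
    return captured


def variant_counts(reads, left_tag, right_tag):
    """Count each distinct intermediate sequence across the reads."""
    return Counter(capture_between_tags(reads, left_tag, right_tag))
```
]]></p>
            <p>Called with two placeholder tags such as 'GGCCAA' and 'TTCCGG', <it>variant_counts</it> tallies every distinct sequence lying between them, which is the kind of input a sequence logo summarises.</p>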
         </sec>
         <sec>
            <st>
               <p>Minicircle sequences extracted from <it>T. cruzi </it>WGS reads</p>
            </st>
            <p>The areas of complete CSB conservation were used as tags to extract as many minicircle reads as possible. A sorting script extracted 54 CL Brener and 53 Esmeraldo sequence reads. If a sequence was found to contain minicircle sequence, then the read from that same clone in the opposite direction was also extracted from the database, and if the 'mate pair' sequences overlapped they were joined. In this manner 32 contiguous sequences were assembled. Dozens of the reads contained several complete variable regions, with each variable region potentially carrying a gRNA. The structural linkages between variable regions were recorded to determine if recombination occurred between minicircle sequence classes and if any correlation existed between editing target and minicircle neighbours.</p>
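            <p>The joining of overlapping mate pairs can be sketched as follows. This is a minimal illustration, not the sorting script used here: the function names and the <it>min_overlap</it> threshold are hypothetical, and exact matching of the overlap is assumed, whereas real reads would require tolerance for sequencing errors.</p>
            <p><![CDATA[
```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def join_mate_pair(forward_read, reverse_read, min_overlap=20):
    """Join two mate-pair reads from the same clone.

    The reverse read is reverse-complemented, then the longest exact
    overlap between the end of the forward read and the start of the
    mate is used to merge them. Returns None if no overlap of at least
    min_overlap bases exists.
    """
    mate = reverse_complement(reverse_read)
    longest = min(len(forward_read), len(mate))
    # Try the longest candidate overlap first so the merge is maximal.
    for size in range(longest, min_overlap - 1, -1):
        if forward_read[-size:] == mate[:size]:
            return forward_read + mate[size:]
    return None
```
]]></p>
            <p>A pair that returns None would simply be kept as two separate reads rather than a contiguous sequence.</p>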
            <p>Each variable region was expected to encode a single gRNA, thus for the task of gRNA prediction the sequences were broken into individual units of the upstream CSB-3 plus one variable region plus its downstream CSB. A spectrogram alignment of unique units revealed striking regions of nucleotide preference in several places within the variable regions (Fig <figr fid="F2">2</figr>). An enrichment of G residues was evident adjacent to the CSBs, while the central portion of the variable region indicated a bias toward As and Cs. This pattern held true for both CL Brener and Esmeraldo strains.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Alignments of CL Brener and Esmeraldo minicircle variable regions</p>
               </caption>
               <text>
                  <p><b>Alignments of CL Brener and Esmeraldo minicircle variable regions</b>. Spectrogram representation of the CLUSTALX alignment of minicircle sequences identified from the CL Brener (top) and Esmeraldo (bottom) WGS reads. Each base within each read is represented as a coloured square ('T'-red; 'A'-purple; 'C'-blue; 'G'-green).</p>
               </text>
               <graphic file="1471-2164-8-133-2"/>
            </fig>
            <p>The minicircle variable regions displayed specific nucleotide preferences over their length, potentially reflecting the sequence bias of the gRNA genes. We next sought to correlate the putative gRNA genes with RNA editing events; however, as many of the edited mRNA sequences are undetermined, some creative sequence manipulation was required in order to proceed.</p>
         </sec>
         <sec>
            <st>
               <p>Editing in <it>T. cruzi</it></p>
            </st>
            <p>One of the major goals of this analysis is to predict gRNAs. The current local alignment method of predicting gRNAs <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> requires knowledge of the sequences of the fully edited target mRNAs, which for the most part are not yet known for <it>T. cruzi</it>, ATPase 6 <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> and COII <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> being the only exceptions. We have initiated the experimental determination of <it>T. cruzi </it>editing events, but in the meantime used predictions to facilitate our minicircle study.</p>
            <p>Predicted edited sequences (Additional File <supplr sid="S1">1</supplr>) were generated by manually inserting or deleting Us in the unedited message sequence following the known sequences of the corresponding <it>T. brucei </it>edited mRNAs, while preserving conservation of the resulting amino acid sequence. Predicted sequences were generated for: CyB, CR3, CR4, MURF2, ND3, ND4, ND7, ND8, ND9, and RPS12. Concurrently, the edited sequence of COIII was obtained by RT-PCR for CL Brener. When compared with our <it>in silico </it>prediction, the true COIII sequence differed from the predicted version such that only two of the estimated 30 gRNAs would have been missed (data not shown). For this reason, sizeable portions of the other predicted sequences were expected to yield many useful gRNA predictions. Additional confidence was gained for segments of predicted sequence where high-scoring matches with putative gRNAs were found.</p>
            <suppl id="S1">
               <title>
                  <p>Additional File 1</p>
               </title>
               <text>
                  <p>Predicted fully edited mRNAs for CL Brener strain of <it>T. cruzi </it>(FASTA format). Sequences were generated by manually inserting or deleting Us in the unedited message sequence following the known sequences of the corresponding <it>T. brucei </it>edited mRNAs, while preserving conservation of the resulting amino acid sequence.</p>
               </text>
               <file name="1471-2164-8-133-S1.fas">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>A combination of actual and predicted mRNA sequences was used for exploring the gRNA assignments to specific genes. This method is not perfect due to the potential for error in our predictions, but these areas will be clarified as the edited sequences are obtained at the bench.</p>
         </sec>
         <sec>
            <st>
               <p><it>T. cruzi </it>gRNAs identified in minicircle and maxicircle sequences</p>
            </st>
            <p>Potential gRNAs were identified among minicircle sequences from the genome project and in GenBank using a three-part process (Fig. <figr fid="F3">3abc</figr>). First, the single greatest Smith-Waterman local alignment score for each variable region was taken as most likely to represent the overlap of the gRNA and its target mRNA region. Second, a permutation test was performed to determine the probability of a given 'best overlap'. Third, a false discovery rate (FDR; see Methods) control was used to determine whether a hybridization with a given probability was significant (Fig. <figr fid="F3">3c</figr>). This statistic provided the criterion used to identify 108 potential gRNAs from 248 minicircle variable regions obtained from GenBank <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B27">27</abbr></abbrgrp>, CL Brener and Esmeraldo WGS reads (Additional File <supplr sid="S2">2</supplr>). The predicted minicircle gRNAs are positioned consistently within the variable region, providing more evidence that these highly scoring alignments represent regions of gRNA hybridization to target mRNAs.</p>
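            <p>The logic of this process can be illustrated with a short Python sketch. It is not the implementation used in this study and rests on several assumptions that are ours alone: the pair scores and gap penalty are arbitrary, the permutation step shuffles the candidate gRNA sequence, and the Benjamini-Hochberg step-up rule stands in for the FDR control described in Methods.</p>
            <p><![CDATA[
```python
import random

# Bases are RNA letters written 5' to 3'. Score values are illustrative
# assumptions: Watson-Crick pairs outscore G-U wobble pairs.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_score(m, g):
    """Score one mRNA base against one gRNA base."""
    if (m, g) in WATSON_CRICK:
        return 2
    if (m, g) in WOBBLE:
        return 1
    return -3


def best_local_score(mrna, grna, gap=-4):
    """Best Smith-Waterman score for antiparallel mRNA/gRNA pairing."""
    target = grna[::-1]  # reverse so antiparallel pairs read left to right
    prev = [0] * (len(target) + 1)
    best = 0
    for m in mrna:
        curr = [0]
        for j, g in enumerate(target, start=1):
            score = max(0,
                        prev[j - 1] + pair_score(m, g),  # extend duplex
                        prev[j] + gap,                   # gap in gRNA
                        curr[j - 1] + gap)               # gap in mRNA
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def permutation_p_value(mrna, grna, n_perm=1000, seed=0):
    """Estimate how often a shuffled sequence scores as well as observed."""
    rng = random.Random(seed)
    observed = best_local_score(mrna, grna)
    letters = list(grna)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(letters)
        if best_local_score(mrna, "".join(letters)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one avoids a p-value of zero


def benjamini_hochberg(p_values, fdr=0.05):
    """Mark p-values passing the Benjamini-Hochberg step-up threshold."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= fdr * rank / n:
            cutoff_rank = rank
    significant = [False] * n
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[i] = True
    return significant
```
]]></p>
            <p>In this sketch each variable region would contribute one best score and one permutation probability, and only regions passing the FDR step would be reported as carrying a predicted gRNA.</p>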
            <suppl id="S2">
               <title>
                  <p>Additional File 2</p>
               </title>
               <text>
                  <p>Text file of predicted gRNAs. File contains predicted mRNA/gRNA hybridizations and scores for those predicted interactions. gRNAs were predicted from minicircle sequences using Smith-Waterman local alignments with predicted mRNA sequences listed in Additional File <supplr sid="S1">1</supplr>.</p>
               </text>
               <file name="1471-2164-8-133-S2.txt">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>gRNA identification method and validation</p>
               </caption>
               <text>
                  <p><b>gRNA identification method and validation</b>. A) A local alignment was performed for each minicircle sequence against a library of predicted edited mRNA sequences yielding a single best score. B) 1000 permutations and local alignment batteries were performed as a method of calculating the approximate probability of a given best score. C) The best hybridization scores for each minicircle sequence were ranked by cumulative probability from 0 to 1 (circles). All points to the left of the intersection with the false discovery rate threshold (heavy solid line) were deemed to represent scores from predicted gRNAs. D) Using false discovery rate to control for multiple testing, alignments with scores deemed significant were said to be predicted gRNAs.</p>
               </text>
               <graphic file="1471-2164-8-133-3"/>
            </fig>
            <p>The predicted maps of RNA editing for <it>T. cruzi </it>(Additional File <supplr sid="S3">3</supplr>) showed that while the final edited mRNA was conserved due to the required amino acid composition of the resulting proteins, the gRNAs that perform the editing were variable from one strain to the next; this variability was not restricted to transition mutations. The inexact line-up of gRNAs from different strains on certain regions of the COIII map (Fig <figr fid="F4">4</figr>, <figr fid="F5">5</figr>, <figr fid="F6">6</figr>), for example, demonstrated that the start and stop positions of hybridization can drift from strain to strain, and that gRNA heterogeneity may occur within isolates.</p>
            <suppl id="S3">
               <title>
                  <p>Additional File 3</p>
               </title>
               <text>
                  <p>PDF file containing maps of edited mRNA sequences showing sites of predicted gRNA associations. Also indicated are the DTUs of the strains from which the sequences are derived. The format of presentation is similar to that of Figure <figr fid="F4">4</figr>, <figr fid="F5">5</figr>, <figr fid="F6">6</figr>, with the gRNA sequences shown below the predicted mRNA sequences.</p>
               </text>
               <file name="1471-2164-8-133-S3.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>COIII gRNA map for <it>T. cruzi</it></p>
               </caption>
               <text>
                  <p><b>COIII gRNA map for <it>T. cruzi</it></b>. Beneath the unedited COIII sequence, predicted gRNAs are mapped onto the fully edited CL Brener COIII mRNA sequence. Beside each putative gRNA is the WGS clone name or GenBank sequence identifier followed by the DTU strain designation for that sequence. Watson-Crick base pairs, '|', and non-canonical G-U base pairs, ':' are indicated.</p>
               </text>
               <graphic file="1471-2164-8-133-4"/>
            </fig>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>COIII gRNA map for <it>T. cruzi</it></p>
               </caption>
               <text>
                  <p><b>COIII gRNA map for <it>T. cruzi</it></b>. Beneath the unedited COIII sequence, predicted gRNAs are mapped onto the fully edited CL Brener COIII mRNA sequence. Beside each putative gRNA is the WGS clone name or GenBank sequence identifier followed by the DTU strain designation for that sequence. Watson-Crick base pairs, '|', and non-canonical G-U base pairs, ':' are indicated.</p>
               </text>
               <graphic file="1471-2164-8-133-5"/>
            </fig>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>COIII gRNA map for <it>T. cruzi</it></p>
               </caption>
               <text>
                  <p><b>COIII gRNA map for <it>T. cruzi</it></b>. Beneath the unedited COIII sequence, predicted gRNAs are mapped onto the fully edited CL Brener COIII mRNA sequence. Beside each putative gRNA is the WGS clone name or GenBank sequence identifier followed by the DTU strain designation for that sequence. Watson-Crick base pairs, '|', and non-canonical G-U base pairs, ':' are indicated.</p>
               </text>
               <graphic file="1471-2164-8-133-6"/>
            </fig>
            <p>In addition to gRNAs identified from minicircle sequences, two gRNAs were predicted by local alignments with high confidence to reside within CL Brener maxicircles. Both maxicircle gRNAs are on the opposite strand of nearby protein-coding genes, with an ND7 gRNA placed 95 bp downstream of ND5 and approximately 500 bp upstream of the repetitive region, and a MURF2 gRNA located 66 bp downstream of the CR4 gene, overlapping the start codon of ND4. The previously described COII gRNA lies immediately downstream of the COII gene on the same coding strand <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B27">27</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Areas of nucleotide preference within minicircle variable regions</p>
            </st>
            <p>The spectrograms of full minicircle units (CSB to CSB) revealed areas of conserved nucleotide preference within the variable region (Fig <figr fid="F2">2</figr>). To explore this further, the sequences with identified gRNAs were aligned as follows: regions upstream of the gRNAs were aligned to their 3' ends, gRNAs were all aligned at +1, 5' to 3' of predicted hybridization, and regions downstream of the gRNAs were aligned by their 5' ends.</p>
            <p>The resulting spectrogram revealed nucleotide preferences with respect to these orientations (Fig <figr fid="F7">7</figr>). As might be expected to promote annealing to the target message, the 5'-anchor region of the gRNAs had a higher than background preference for Cs (blue), and the entire gRNA had a slightly higher T to A skew (red to purple) with a spike centred approximately 12 nt into the predicted gRNA hybridization. While the areas immediately around the gRNAs were more T-rich (red) than the gRNAs themselves, the clearest transition occurred approximately 50 nt downstream of the gRNA, where the preference for As and Ts dropped sharply and the region became more G-rich (green).</p>
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p><it>T. cruzi </it>minicircle gRNA alignments reveal sequence bias</p>
               </caption>
               <text>
                  <p><b><it>T. cruzi </it>minicircle gRNA alignments reveal sequence bias</b>. Sequences containing predicted gRNAs were aligned by the boundaries of the hybridization prediction. A) Spectrogram ('T'-red; 'A'-purple; 'C'-blue; 'G'-green) of aligned minicircle sequences. B) Ratios of nucleotide composition ('T'-red; 'A'-purple; 'C'-blue; 'G'-green) of aligned sequences by position.</p>
               </text>
               <graphic file="1471-2164-8-133-7"/>
            </fig>
            <p>Given that the precise switch was dictated more by distance from the end of predicted hybridization than by position relative to the CSBs, these observations may have implications for the transcription and maturation of gRNAs. Note that the 5' ends of these gRNAs have not been physically mapped, and that corrections in the predicted editing events may either extend or shorten the 5' and 3' ends of the actual gRNA.</p>
         </sec>
         <sec>
            <st>
               <p>Guide RNAs predicted using a simple Hidden Markov Model</p>
            </st>
            <p>Although dozens of gRNAs were predicted using local alignments, approximately half of the minicircle sequences still lacked a statistically significant gRNA-mRNA match. Because the nucleotide preferences observed for assigned gRNAs had a direct correlation to the position of the gene within the minicircle sequence, landmarks for independent gRNA predictions in the absence of known editing events were derived. To do so required a statistical model and a method of finding the optimum path of a sequence through that model. In this case the Viterbi algorithm was used to find the best path through a Hidden Markov Model (HMM) for each known <it>T. cruzi </it>minicircle sequence (Fig <figr fid="F8">8</figr>, <figr fid="F9">9</figr>).</p>
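            <p>For readers unfamiliar with the approach, the Viterbi step can be illustrated with a deliberately simplified Python sketch. It does not reproduce the model of Fig 8: it uses three states without sub-states, and every start, transition and emission probability below is a hypothetical value chosen only to mimic T-rich flanks, an A/C-biased gRNA and a G-rich downstream region.</p>
            <p><![CDATA[
```python
import math

# Simplified left-to-right model; all probabilities are hypothetical.
STATES = ["upstream", "gRNA", "downstream"]
START = {"upstream": 1.0}
TRANS = {
    "upstream": {"upstream": 0.95, "gRNA": 0.05},
    "gRNA": {"gRNA": 0.97, "downstream": 0.03},
    "downstream": {"downstream": 1.0},
}
EMIT = {
    "upstream": {"A": 0.25, "C": 0.10, "G": 0.15, "T": 0.50},
    "gRNA": {"A": 0.40, "C": 0.25, "G": 0.10, "T": 0.25},
    "downstream": {"A": 0.15, "C": 0.15, "G": 0.50, "T": 0.20},
}


def safe_log(p):
    """Natural log that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq):
    """Return the most probable state path for a DNA string (log space)."""
    scores = {s: safe_log(START.get(s, 0.0)) + safe_log(EMIT[s][seq[0]])
              for s in STATES}
    pointers = []
    for base in seq[1:]:
        new_scores, ptr = {}, {}
        for s in STATES:
            # Best predecessor of state s at this position.
            best_prev = max(
                STATES,
                key=lambda r: scores[r] + safe_log(TRANS[r].get(s, 0.0)),
            )
            ptr[s] = best_prev
            new_scores[s] = (scores[best_prev]
                             + safe_log(TRANS[best_prev].get(s, 0.0))
                             + safe_log(EMIT[s][base]))
        scores = new_scores
        pointers.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for ptr in reversed(pointers):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def grna_span(path):
    """Half-open (start, end) indices of the gRNA state, or None."""
    if "gRNA" not in path:
        return None
    start = path.index("gRNA")
    end = len(path) - path[::-1].index("gRNA")
    return start, end
```
]]></p>
            <p>The span returned by <it>grna_span</it> plays the role of the predicted gRNA position; in the model of Fig 8 the sub-states and fitted emission probabilities would replace the toy values used here.</p>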
            <fig id="F8">
               <title>
                  <p>Figure 8</p>
               </title>
               <caption>
                  <p>A Hidden Markov Model for the independent prediction of gRNA genes</p>
               </caption>
               <text>
                  <p><b>A Hidden Markov Model for the independent prediction of gRNA genes</b>. Shown here is the overall structure of the HMM used to predict gRNAs from nucleotide probabilities. The first line depicts the four states: 'upstream', 'gRNA', 'downstream 1' and 'downstream 2', as well as the presence of sub-states for each. The implicit 'START' and 'END' states are not shown. The transition probabilities from the gRNA state are shown, otherwise the transitions are assumed to occur invariably. Depicted below the model is a visual representation of the nucleotide preferences for each state (T-red, A-purple, G-green, C-blue) and the emission probabilities used for each state.</p>
               </text>
               <graphic file="1471-2164-8-133-8"/>
            </fig>
            <fig id="F9">
               <title>
                  <p>Figure 9</p>
               </title>
               <caption>
                  <p>HMM predicted gRNA alignment</p>
               </caption>
               <text>
                  <p><b>HMM predicted gRNA alignment</b>. Shown in the standard spectrogram format used for Fig 7, the gRNAs predicted by HMM for all known <it>T. cruzi </it>minicircle sequences are presented, including over 100 sequences for which no gRNA was predicted using local alignments.</p>
               </text>
               <graphic file="1471-2164-8-133-9"/>
            </fig>
            <p>The 108 minicircle reads carrying gRNAs predicted by local alignments were first used to test the ability of this method to correctly predict those gRNAs. Although the 5' and 3' ends show differences with those predicted by local alignments, 85% of HMM-predicted gRNAs overlapped the local alignment predictions by at least 20 nt (Additional File <supplr sid="S4">4</supplr>). The remaining 15% contained very little or no overlap with the local alignment prediction. This method was then applied to all known <it>T. cruzi </it>minicircle sequences to predict the gRNAs (Additional File <supplr sid="S5">5</supplr>) and their locations within each minicircle sequence (Fig <figr fid="F9">9</figr>).</p>
            <suppl id="S4">
               <title>
                  <p>Additional File 4</p>
               </title>
               <text>
                  <p>Image (png format) of HMM test to predict gRNAs obtained by local alignment (LA). Presented in pairs of lines, this spectrogram alignment represents gRNA predictions, with the first line of each pair representing the HMM prediction and the second line representing the LA prediction. Lines 1 and 2 therefore represent the HMM and LA gRNA predictions (respectively) for the same sequence, and lines 3 and 4 the HMM and LA predictions for a separate minicircle sequence. A pair of lines is presented for each gRNA sequence with overlapping predictions.</p>
               </text>
               <file name="1471-2164-8-133-S4.png">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S5">
               <title>
                  <p>Additional File 5</p>
               </title>
               <text>
                  <p>This file lists the most likely gRNA sequence from each <it>T. cruzi </it>minicircle sequence as predicted using the HMM. FASTA format. An 85% accuracy is expected for those variable regions that do contain gRNAs. The fraction of variable regions that do not contain a functioning gRNA is not known, and for this reason some of these sequences may represent false predictions.</p>
               </text>
               <file name="1471-2164-8-133-S5.fas">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>Compared to the local alignment placements (Fig <figr fid="F7">7</figr>), the nucleotide preferences surrounding the HMM predictions were even more visually pronounced (Fig <figr fid="F9">9</figr>). While this was to be expected given the neutrality of the local alignment method to nucleotide usage and the reliance of the HMM on context-dependent usage, there was not enough empirical evidence regarding <it>T. cruzi </it>gRNAs to characterize the accuracy of each prediction method. The validity of these predictions will be determined with further experimental editing information. Some of these sequences may also represent pseudogenes.</p>
         </sec>
         <sec>
            <st>
               <p>Recombination of minicircle sequence classes</p>
            </st>
            <p>Each variable region class was assigned a unique identifier. A single <it>T. cruzi </it>minicircle read could contain a string of up to four such class numbers. In general the functional target mRNA of a gRNA had no bearing on its structural location relative to other gRNAs on a minicircle sequence, similar to the situation in <it>T. brucei</it>. Recombination among minicircles had not been documented, so this linkage information inherent within the sequence reads was examined for apparently contradictory links between sequence classes.</p>
            <p>Two examples of discontinuity were found. Reads CLARO12TF and TCGA393TF contain identical downstream variable regions, but the upstream variable regions, separated by the intermediate CSBs, are completely different (Fig <figr fid="F10">10</figr>). The same was true for clones CLAOS79 and CMCBV30. These alternate linkages were evidence of recombination among minicircles.</p>
            <fig id="F10">
               <title>
                  <p>Figure 10</p>
               </title>
               <caption>
                  <p>Recombination of minicircle sequence classes</p>
               </caption>
               <text>
                  <p><b>Recombination of minicircle sequence classes</b>. A CT 'barcode' spectrogram ('T'-red; 'C'-blue) of CLUSTALX alignment of WGS reads containing identical minicircle sequences. The CSB can be seen here as C-rich regions. Clone CLARO12 displays a combination of variable regions unlike those found on clones TCGA393, CLASB69 or CLAVZ20 and clone CLAOS79 bears a unique combination distinct from that found in clone CMCBV30.</p>
               </text>
               <graphic file="1471-2164-8-133-10"/>
            </fig>
            <p>The minicircle sequences presented represent only a fraction of the minicircle population. As two of 54 CL Brener reads bore evidence of recombination, this extrapolates to a level of approximately 4% recombinants in the overall minicircle population. However, the minicircle reads presented do not represent complete coverage, and few reads contain three, let alone all four, linked variable regions. This incomplete linkage information is likely to compound the already limited scope of the data, and for these reasons 4% represents a conservative and imprecise estimate for the percentage of minicircle recombinants.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>We present here a population study of the minicircles of <it>T. cruzi </it>using the identification of 108 minicircle sequences from WGS reads for the CL Brener and Esmeraldo strains of <it>T. cruzi</it>, some of which carried evidence of recombination among minicircles. These 108 minicircle sequences, in combination with previously published minicircle and maxicircle sequences, were used to predict 110 total gRNAs using Smith-Waterman local alignments. The positions of these putative gRNAs within the variable region revealed clear nucleotide preferences within and around the gRNAs that were used to create a simple HMM capable of predicting gRNAs from primary minicircle sequence alone. The predicted gRNAs were mapped onto the predicted or experimentally determined sequences of fully edited mitochondrial mRNAs.</p>
         <p>The remarkable CSB conservation observed across kinetoplastids <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> is consistent with the observation that there is little intragenomic variability among these sequences. The observation of specific regions of nucleotide preference within the variable region, in particular the ~50-bp relatively 'AT'-rich (and 'C'-poor) region immediately downstream of all predicted gRNAs from minicircle sequences, was unexpected. Given the less linear appearance of this preference when aligned by CSBs, this feature appears to be dependent on gRNA position within the variable region and may be important for some aspect of gRNA transcription or maturation. Although some interesting features of minicircles in other kinetoplastids have been observed, almost nothing is known about the transcription or maturation of gRNAs in any kinetoplastid <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B27">27</abbr></abbrgrp>. If the nucleotide preferences were involved in maturation one would also expect to see them surrounding gRNAs discovered on the maxicircle, but this is not the case. This suggests that these sequences are instead involved in a minicircle-specific process, although it cannot be excluded that they have no biological importance and merely reflect the common ancestry of all minicircle sequences. If these sequences are involved in gRNA transcription, it would suggest that maxicircle and minicircle transcription occur in different manners, using different <it>cis </it>signals. The maxicircle gRNAs discovered, specifically the overlap of the <it>MURF2 </it>gRNA with the <it>ND4 </it>start codon, provide additional evidence that both strands of the maxicircle are transcribed and that the mechanisms of mitochondrial gene expression are likely to be more complex than our current understanding <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>.</p>
         <p>Because G-U base pairs in complementary RNA strands impart transition tolerance into the editing process, current methods of identifying gRNAs are limited to situations where the fully edited mRNA is known <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> or predicted. The use of permutation tests and false discovery rate control extends the use of local alignments, enabling the robust and systematic identification of gRNAs. The local alignment method used here can tolerate mismatches and gaps that might arise as a result of errant mRNA sequence predictions. It also allows the discovery of pseudo-gRNAs that contain mismatches and have, therefore, lost their functionality. Given the characteristic nucleotide preferences found around the gRNAs, it was possible to develop a method of gRNA prediction that requires no prior knowledge of mRNA sequence.</p>
         <p>In addition to many other uses, HMMs have been used to predict genes and alternate splice sites <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>. Using the characteristic nucleotide probabilities found at specific distances from gRNAs, an HMM was constructed, representing the first attempt to predict gRNAs independent of known editing events. Although offering useful overlap, these predictions do not always match the local alignment predictions exactly, and how well either set of predictions matches the actual RNA sequences of the molecules they represent is unclear. In the case of local alignments, only areas of hybridization are detected, so that other potentially transcribed regions are ignored and areas that are not transcribed may be included. While it appears that the HMM is 'missing' or 'adding' sequence compared to the local alignment used to train it, if the transcription or maturation of gRNAs depends on local nucleotide preferences in some way, then the HMM predictions may turn out to be closer to the actual gRNAs than the local alignment predictions. The HMM employed here is extremely simple and with more experimental evidence about gRNAs in <it>T. cruzi </it>this model could be further articulated, yielding more accurate predictions. Our HMM is specific to <it>T. cruzi</it>, but examinations of <it>Leishmania </it>minicircles reveal distinct species-specific regions of nucleotide preference that also align with gRNA position (data not shown). Useful models for these and other kinetoplastids could also be constructed.</p>
         <p>Minicircle-based assays have been used as markers for <it>T. cruzi </it>diversity <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>. Based on the gRNA maps, although the resulting protein encoded by the message is highly conserved, the position and sequence of certain gRNAs can be strain-specific. Restriction fragment length polymorphisms could be developed to distinguish these differences, providing another rapid method of determining the subtype(s) from which a given DNA sample is derived. Further cataloguing and comparison of matching gRNA genes from different strains and DTUs will provide useful information regarding the kinetoplast molecular clock and functional transition tolerance within the gRNAs.</p>
         <p>From this minimal dataset two examples of minicircle recombination were identified. Recombination may occur relatively frequently among minicircles, as it does within the DNA of animal mitochondria <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. The presence of thousands of molecules sharing four conserved sequence regions may enhance even low-level homologous recombination. Computer simulations of minicircle evolution in <it>Leishmania </it>show that random segregation can account for the plasticity and frequency of minicircle sequence classes <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>; however, <it>Leishmania </it>minicircles contain only one gRNA each, thereby minimizing the effects of any possible recombination. With evidence of recombination among minicircles, this new dynamic must be incorporated into future models of minicircle evolution in <it>T. cruzi </it><abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. Active recombination may serve as a mechanism to weed out pseudogenes.</p>
         <p>Roughly half of all extracted minicircle reads contained a predicted gRNA, and several likely explanations exist for this observation, the first of which is that the predicted editing of <it>T. cruzi </it>genes is likely to be incorrect in places, preventing detection of gRNAs that edit those regions. Certainly, the maps for known sequences like ATPase 6 and COIII are more complete than those of other genes. Guide RNAs with small hybridization footprints may also be lost in the statistical noise. The most biologically relevant possibility is that there are probably many variable regions that contain pseudo-guides that once performed a role in editing but have subsequently lost that function. The future sequencing of the edited RNAs for all <it>T. cruzi </it>maxicircle genes will aid gRNA identification and validation, add to the incomplete maps presented here, and give insight into the prevalence of defunct variable regions in <it>T. cruzi </it>minicircles.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Incidentally sequenced minicircles were identified among the WGS reads, serving as a reminder that high-throughput data can often have uses beyond that for which the collection methods are designed, and that public availability of this data allows these uses to be discovered. We used these 'contaminants' to glean information about the minicircle population in <it>T. cruzi </it>and found evidence of recombination among minicircle sequence classes that suggests a degree of plasticity not explicitly accounted for in many models of minicircle evolution. Together with previously published minicircle sequences, gRNAs were systematically identified, generating maps of RNA editing for many <it>T. cruzi </it>maxicircle genes. Although by their nature incomplete, these maps represent a starting point from which to completely characterize the extent of RNA editing in <it>T. cruzi</it>. The large number of identified genes allowed the design of a minicircle gRNA prediction model that can locate gRNA genes in the absence of specific mRNA editing information.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Identifying minicircle sequences among WGS reads</p>
            </st>
            <p>In order to identify the intragenomic variability of CSB sequences within the conserved regions, a two-tag discovery strategy was employed using the WGS data from CL Brener and Esmeraldo. The first step was to search restrictively for two tags (CSB-1 and CSB-3 for example) while capturing all of the variability of the third tag between the two landmarks (CSB-2 in the previous example). Employed combinatorially, the intragenomic variability of CSB-1, CSB-2, and CSB-3 was determined. From this information, search parameters were designed to allow positive identification of any number of full CSB-CSB units within any query sequence, without excluding sequences on the basis of slight intragenomic CSB variation. This method was 'greedy' in that if a given sequence contained multiple units it captured all of them at once, thereby preserving any physical linkages between variable regions. To assemble mate pairs with overlapping sequence, a standard overlap alignment algorithm was implemented in the form of a perl script to assemble sequences. Once assembled, the contiguous sequences were each checked by eye using BioEdit. CSB conservation was depicted using Weblogo <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>, and spectrograms were constructed using CLUSTALX (alignments), BioEdit (alignment editing and visualization), and Adobe Photoshop (building composites of BioEdit screenshots for spectrograms).</p>
         </sec>
         <sec>
            <st>
               <p>COIII RT-PCR</p>
            </st>
            <p>A primer complementary to the predicted 3' end of COIII (2 pmol) was added to dNTPs (1 &#956;L of a 10 mM mix) and RNA (1 &#956;g) from CL Brener strain cells in a 12 &#956;L reaction and heated to 65&#176;C for 5 minutes, then placed on ice. For a 20 &#956;L reaction, 4 &#956;L of 5&#215; First Strand Buffer (Invitrogen) and 2 &#956;L of 0.1 M DTT were added. The reaction was warmed to 42&#176;C in a thermocycler for 2 min. and 1 &#956;L of SuperScript&#8482; II (Invitrogen) was added and the tube gently mixed. The reaction was incubated at 42&#176;C for 50 min. PCR was then performed on the RT reaction and a no-RT control using a 55&#176;C annealing temperature. A ~900-bp band seen with ethidium staining represented the full length edited COIII cDNA that was then cloned using a TOPO TA cloning kit (Invitrogen) and sequenced. No signal appeared in the control lane after 30 cycles. Primers used: COIII Forward, TATATTTGTTGGTGTTAGTGG; COIII Reverse, TTATACACACAAATACATAACG.</p>
         </sec>
         <sec>
            <st>
               <p>Local alignments</p>
            </st>
            <p>The alignment algorithm used was a perl implementation of the Smith-Waterman dynamic programming method of determining the optimal hybridization of a query sequence with a target sequence <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. Local alignments allow overhangs on the part of the target and query without penalty, ensuring that <it>in silico </it>hybridizations of the gRNA with the mRNA could be detected even though the gRNA makes up only a portion of the minicircle query sequence and would hybridize with only a small region of the target mRNA sequence. The only peculiarity of the algorithm used here was the scoring matrix: it was designed to detect complementarity (not identity), it allowed for the fact that non-canonical G-U base pairs are common, and it used a large gap penalty, reflecting the expectation that the gRNA should be exactly complementary to the target sequence it edits. For each minicircle sequence, optimal local alignments were performed against the fully edited sequence (predicted or actual) of every edited maxicircle gene and the information from the best alignment was saved to determine its significance (Fig <figr fid="F3">3A</figr>). Although many matrices were tested, the weight matrix used here evaluated a match (A-U, G-U, C-G) with a score of 4 and a mismatch (A-A, C-C, G-G, U-U, A-C, A-G, U-C) with a score of -20, and assessed a gap penalty of 100.</p>
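            <p>As a minimal sketch of the scoring scheme described above (written in Python rather than the perl of the original analysis), the following function computes the best local hybridization score between a query and a target. The function names, the linear gap model and the reversal of the query to represent antiparallel pairing are assumptions made for illustration; only the match, mismatch and gap values are taken from the text.</p>
            <p><![CDATA[
```python
# Illustrative sketch only: names and the linear gap model are assumptions.
# Watson-Crick pairs plus the G-U wobble pair are all scored as matches.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MATCH, MISMATCH, GAP = 4, -20, -100  # values quoted in the text


def pair_score(a, b):
    """Score one query/target base pair by complementarity, not identity."""
    return MATCH if (a, b) in PAIRS else MISMATCH


def best_hybridization_score(query, target):
    """Return the best Smith-Waterman score for query pairing with target.

    The query is reversed so that antiparallel base pairing can be scored
    with the usual left-to-right recurrence. DNA input is read as RNA.
    """
    q = query.upper().replace("T", "U")[::-1]
    t = target.upper().replace("T", "U")
    prev = [0] * (len(t) + 1)
    best = 0
    for i in range(1, len(q) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            cur[j] = max(
                0,                                              # start a new local alignment
                prev[j - 1] + pair_score(q[i - 1], t[j - 1]),   # pair the two bases
                prev[j] + GAP,                                  # gap in the target
                cur[j - 1] + GAP,                               # gap in the query
            )
            best = max(best, cur[j])
        prev = cur
    return best
```
]]></p>
            <p>With these values a single mismatch cancels five matched pairs and a gap cancels twenty-five, so in practice only near-perfect duplexes score well.</p>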
         </sec>
         <sec>
            <st>
               <p>Probabilities of local alignment scores &#8211; permutation tests</p>
            </st>
            <p>In order to determine the probability of a given 'best' local alignment score, the distribution of such scores is required for sequences of a given length and composition against a standard mRNA library. Given these factors this distribution is neither standard nor easy to derive. The method used here to address this problem is the permutation test <abbrgrp><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr></abbrgrp> implemented in perl. If a minicircle sequence is permuted, losing all information while retaining the overall nucleotide frequencies, the resulting 'best' score against the mRNA library will be drawn at a certain probability from the unknown but inherent distribution of scores. As the number of permutations sampled approaches infinity, the approximation of that unknown distribution approaches the actual distribution. In this manner the probability of observing a given best local alignment score was empirically estimated using 1000 gRNA permutations (Fig. <figr fid="F3">3B</figr>). Although extremely computationally intensive, this remains the best method of approximating probability distributions that are difficult to derive mathematically.</p>
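            <p>The permutation test can be sketched as follows in Python (the original was implemented in perl). The function name, its arguments and the toy scoring function are hypothetical; in the analysis described here the score would be the best local alignment score of the sequence against the mRNA library, and 1000 permutations were used.</p>
            <p><![CDATA[
```python
import random


def permutation_p_value(sequence, score_fn, n_perm=1000, seed=None):
    """Estimate P(score >= observed) by shuffling the sequence.

    Shuffling destroys the order of the nucleotides but preserves their
    overall frequencies, as described in the text. score_fn is any
    function mapping a sequence string to a number (an assumption here).
    """
    rng = random.Random(seed)
    observed = score_fn(sequence)
    letters = list(sequence)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(letters)
        if score_fn("".join(letters)) >= observed:
            hits += 1
    return hits / n_perm


def longest_a_run(seq):
    """Toy stand-in for an alignment score: length of the longest run of A."""
    best = run = 0
    for ch in seq:
        run = run + 1 if ch == "A" else 0
        best = max(best, run)
    return best
```
]]></p>
            <p>A call such as permutation_p_value(read, score_fn, n_perm=1000) returns the fraction of shuffled sequences scoring at least as well as the original. The resolution of the estimate is limited to 1/n_perm, which is one reason the approach is computationally demanding when small probabilities are needed.</p>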
         </sec>
         <sec>
            <st>
               <p>Significance of local alignment scores &#8211; false discovery rate control</p>
            </st>
            <p>When multiple tests are performed with each allowing a 5% chance of a false positive, the overall number of false positives obtained can be quite high. Many methods have been suggested to control for multiple testing and several have been broadly applied in biological research <abbrgrp><abbr bid="B43">43</abbr><abbr bid="B44">44</abbr></abbrgrp>. The false discovery rate (FDR) method of correcting for multiple sampling <abbrgrp><abbr bid="B45">45</abbr></abbrgrp> was most appropriate for this study given the expectation of discovering exactly one appropriate gRNA match for each minicircle sequence. To perform FDR control (Fig. <figr fid="F3">3C</figr>), where 'X' is a data set including 'n' points, X is ranked by probability such that P(X<sub>n</sub>) > P(X<sub>n-1</sub>) > ... > P(X<sub>1</sub>), so that X<sub>1</sub> carries the smallest probability. Then for i = n &#8594; 1, where &#945; = 0.05, the following stepwise algorithm is applied: <it>if at any step <inline-formula><m:math name="1471-2164-8-133-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mfrac><m:mrow><m:mi>i</m:mi><m:mo>&#8901;</m:mo><m:mi>&#945;</m:mi></m:mrow><m:mi>n</m:mi></m:mfrac><m:mo>></m:mo><m:mi>p</m:mi><m:mi>r</m:mi><m:mi>o</m:mi><m:mi>b</m:mi><m:mo stretchy="false">(</m:mo><m:msub><m:mi>X</m:mi><m:mi>i</m:mi></m:msub><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaWcaaqaaiabdMgaPjabgwSixJGaciab=f7aHbqaaiabd6gaUbaacqGH+aGpcqWGWbaCcqWGYbGCcqWGVbWBcqWGIbGycqGGOaakcqWGybawdaWgaaWcbaGaemyAaKgabeaakiabcMcaPaaa@3E7A@</m:annotation></m:semantics></m:math></inline-formula> then k = i. Reject the null hypotheses for all points from X</it><sub>1 </sub>to <it>X</it><sub><it>k</it></sub>. This analysis was performed in perl, with the results visualized by Microsoft Excel (Fig <figr fid="F3">3D</figr>).</p>
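            <p>The stepwise procedure can be sketched in Python as below (the original was written in perl). The function name and return convention are assumptions. The sketch reads the rule as taking k to be the largest rank i at which the inequality holds, and keeps the strict inequality given in the text.</p>
            <p><![CDATA[
```python
def fdr_rejections(p_values, alpha=0.05):
    """Return the indices of the tests whose null hypothesis is rejected.

    Points are ranked from smallest to largest probability. Scanning from
    rank n down to rank 1, the first rank i with i * alpha / n greater
    than the i-th smallest probability becomes k, and ranks 1..k are
    rejected.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda idx: p_values[idx])
    k = 0
    for i in range(n, 0, -1):
        p_i = p_values[order[i - 1]]
        if i * alpha / n > p_i:
            k = i
            break
    return sorted(order[:k])
```
]]></p>
            <p>Because the scan runs from the largest rank downwards, a point can be rejected even if it misses its own threshold, provided that a higher-ranked point meets the threshold for its rank.</p>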
         </sec>
         <sec>
            <st>
               <p>Predicting gRNAs using a simple Hidden Markov Model</p>
            </st>
            <p>HMMs use the known 'emission' probabilities of an observed variable to predict a 'hidden' variable with defined 'transition' probabilities to and from all possible states of that hidden variable. In this case, the differing nucleotide preferences along the minicircle sequence are used to determine the probability that a given nucleotide is positioned within the gRNA state, the hidden state we wish to know. The HMM constructed here consists of four distinct states each with characteristic nucleotide preferences (Fig <figr fid="F8">8</figr>). The upstream (U) state is itself a collection of 40 individual states each with identical nucleotide probabilities and a definite transition onwards to the next sub-state with a probability of 1. After beginning in the 'start' state, a gRNA sequence will enter the upstream state and invariably transition to the gRNA state, the final sequence of which constitutes a gRNA prediction. A nucleotide in the gRNA state can transition back into a gRNA state with a high probability or can transition on to the downstream region which itself is made up of two states, D1 and D2. Like the upstream region the two downstream regions are themselves composed of multiple sub-states with identical nucleotide probabilities, with D1 containing 48 nucleotides and D2 containing 25. The sequences finally leave the model through an 'end' state. The nucleotide probabilities for each state were calculated using the predicted gRNA alignments generated by local alignments (Fig <figr fid="F7">7</figr>). The optimal chain of these states through each minicircle sequence can be recognized as the chain with the highest cumulative probability under the assumptions that the probability of transition to a subsequent state depends only upon the nature of the current state and that the observed points in the chain are independent of one another. 
While neither of these assumptions is precisely true for nucleotide sequences, HMMs have been used successfully to predict genes and alternative splicing <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>. A perl implementation of the commonly used Viterbi algorithm was applied here to determine this optimal path through each minicircle sequence <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>.</p>
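            <p>A minimal Python sketch of Viterbi decoding over a model of this shape is given below (the original used perl). Only the sub-state counts (40, 48 and 25) come from the text. The emission values, the self-transition probabilities and the two flanking background states that absorb sequence outside the modelled window are hypothetical, since the text does not state how flanking sequence was handled and the actual probabilities appear only in Figure 8.</p>
            <p><![CDATA[
```python
import math

NEG_INF = float("-inf")


def _log(p):
    return math.log(p) if p > 0 else NEG_INF


def peaked(base, high=0.7):
    """Toy emission table favouring one base (illustrative values only)."""
    low = (1.0 - high) / 3
    return {b: (high if b == base else low) for b in "ACGT"}


# Hypothetical emissions; the real values would come from the alignments.
TOY_EMISSIONS = {
    "B": {b: 0.25 for b in "ACGT"},  # assumed background flank
    "U": peaked("C"), "G": peaked("G"), "D1": peaked("A"), "D2": peaked("T"),
}


def viterbi_grna_span(seq, emissions, n_up=40, n_d1=48, n_d2=25,
                      p_stay_grna=0.95, p_stay_bg=0.9):
    """Return (start, end) of the gRNA state on the best path, or None."""
    labels = (["B"] + ["U"] * n_up + ["G"] + ["D1"] * n_d1
              + ["D2"] * n_d2 + ["B"])
    n_states = len(labels)
    g = 1 + n_up
    # Self-loop probability per state; 0.0 forces a move to the next state.
    stay = [0.0] * n_states
    stay[0] = stay[-1] = p_stay_bg
    stay[g] = p_stay_grna

    def emit(k, base):
        return _log(emissions[labels[k]].get(base, 0.0))

    seq = seq.upper()
    if not seq:
        return None
    # A path may begin in the leading background or the first upstream sub-state.
    score = [NEG_INF] * n_states
    score[0], score[1] = emit(0, seq[0]), emit(1, seq[0])
    back = []
    for base in seq[1:]:
        new, ptr = [NEG_INF] * n_states, [-1] * n_states
        for k in range(n_states):
            best, arg = score[k] + _log(stay[k]), k          # stay in state k
            if k > 0:
                cand = score[k - 1] + _log(1.0 - stay[k - 1])  # advance from k-1
                if cand > best:
                    best, arg = cand, k - 1
            if best > NEG_INF:
                new[k], ptr[k] = best + emit(k, base), arg
        score = new
        back.append(ptr)
    # A path may finish in the last D2 sub-state or the trailing background.
    end = n_states - 1 if score[-1] >= score[-2] else n_states - 2
    if score[end] == NEG_INF:
        return None
    path = [end]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    inside = [t for t, k in enumerate(path) if k == g]
    return inside[0], inside[-1] + 1
```
]]></p>
            <p>Because every state other than the gRNA and background states advances with probability 1, the model is a left-to-right chain and each step needs only two candidate predecessors. The computation is carried out in log space to avoid numerical underflow on sequences of several hundred nucleotides.</p>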
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>ST extracted minicircle sequences, predicted edited sequences of maxicircle genes, performed all analyses of minicircle sequences and drafted the manuscript. IT performed RT-PCR and sequencing of COIII. SW predicted edited sequences of maxicircle genes. NS conceived of the study, participated in data interpretation and preparation of the manuscript. All authors have read and approved this manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank all of those whose work on the <it>T. cruzi </it>genome project made this study possible. We would also like to thank David Campbell, Robert Hitchcock, Bidyottam Mitra and Jesse Zamudio for helpful discussions and/or critical reading of the manuscript. Sequence data for this study was obtained from GenBank and The Institute for Genomic Research <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>. Funding for the <it>T. cruzi </it>genome project was provided by the National Institute of Allergy and Infectious Disease (NIAID). SJW was funded through the UCLA IGERT training grant, and this work was supported through grant NIH AI056034 to D. Campbell and NRS.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Trypanosoma cruzi-induced molecular mimicry and Chagas' disease</p>
            </title>
            <aug>
               <au>
                  <snm>Girones</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Cuervo</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Fresno</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Curr Top Microbiol Immunol</source>
            <pubdate>2005</pubdate>
            <volume>296</volume>
            <fpage>89</fpage>
            <lpage>123</lpage>
            <xrefbib>
               <pubid idtype="pmpid">16323421</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Chagas disease: a role for autoimmunity?</p>
            </title>
            <aug>
               <au>
                  <snm>Tarleton</snm>
                  <fnm>RL</fnm>
               </au>
            </aug>
            <source>Trends Parasitol</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>10</issue>
            <fpage>447</fpage>
            <lpage>451</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.pt.2003.08.008</pubid>
                  <pubid idtype="pmpid" link="fulltext">14519582</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Mitochondrial genomes: anything goes</p>
            </title>
            <aug>
               <au>
                  <snm>Burger</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Gray</snm>
                  <fnm>MW</fnm>
               </au>
               <au>
                  <snm>Lang</snm>
                  <fnm>BF</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>12</issue>
            <fpage>709</fpage>
            <lpage>716</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2003.10.012</pubid>
                  <pubid idtype="pmpid" link="fulltext">14642752</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Radically different maxicircle classes within the same kinetoplast: an artefact or a novel feature of the kinetoplast genome?</p>
            </title>
            <aug>
               <au>
                  <snm>Flegontov</snm>
                  <fnm>PN</fnm>
               </au>
               <au>
                  <snm>Kolesnikov</snm>
                  <fnm>AA</fnm>
               </au>
            </aug>
            <source>Kinetoplastid Biol Dis</source>
            <pubdate>2006</pubdate>
            <volume>5</volume>
            <fpage>5</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1578572</pubid>
                  <pubid idtype="pmpid" link="fulltext">16978422</pubid>
                  <pubid idtype="doi">10.1186/1475-9292-5-5</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Editing of kinetoplastid mitochondrial mRNAs by uridine addition and deletion generates conserved amino acid sequences and AUG initiation codons</p>
            </title>
            <aug>
               <au>
                  <snm>Shaw</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Feagin</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Stuart</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Simpson</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1988</pubdate>
            <volume>53</volume>
            <issue>3</issue>
            <fpage>401</fpage>
            <lpage>411</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(88)90160-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">2452696</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>A model for RNA editing in kinetoplastid mitochondria: "guide" RNA molecules transcribed from maxicircle DNA provide the edited information</p>
            </title>
            <aug>
               <au>
                  <snm>Blum</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Bakalara</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Simpson</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1990</pubdate>
            <volume>60</volume>
            <issue>2</issue>
            <fpage>189</fpage>
            <lpage>198</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(90)90735-W</pubid>
                  <pubid idtype="pmpid">1688737</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Leishmania tarentolae minicircles of different sequence classes encode single guide RNAs located in the variable region approximately 150 bp from the conserved region</p>
            </title>
            <aug>
               <au>
                  <snm>Sturm</snm>
                  <fnm>NR</fnm>
               </au>
               <au>
                  <snm>Simpson</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1991</pubdate>
            <volume>19</volume>
            <issue>22</issue>
            <fpage>6277</fpage>
            <lpage>6281</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">329139</pubid>
                  <pubid idtype="pmpid" link="fulltext">1720240</pubid>
                  <pubid idtype="doi">10.1093/nar/19.22.6277</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>http://dna.kdna.ucla.edu/trypanosome/database.html</p>
            </title>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Integration of Trypanosoma cruzi kDNA minicircle sequence in the host genome may be associated with autoimmune serum factors in Chagas disease patients</p>
            </title>
            <aug>
               <au>
                  <snm>Sim&#245;es-Barbosa</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Barros</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Nitz</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Arga&#241;araz</snm>
                  <fnm>ER</fnm>
               </au>
               <au>
                  <snm>Teixeira</snm>
                  <fnm>AR</fnm>
               </au>
            </aug>
            <source>Mem Inst Oswaldo Cruz</source>
            <pubdate>1999</pubdate>
            <volume>94 Suppl 1</volume>
            <fpage>249</fpage>
            <lpage>252</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10677727</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Possible integration of Trypanosoma cruzi kDNA minicircles into the host cell genome by infection</p>
            </title>
            <aug>
               <au>
                  <snm>Teixeira</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Arga&#241;araz</snm>
                  <fnm>ER</fnm>
               </au>
               <au>
                  <snm>Freitas</snm>
                  <fnm>LH</fnm>
                  <suf>Jr.</suf>
               </au>
               <au>
                  <snm>Lacava</snm>
                  <fnm>ZG</fnm>
               </au>
               <au>
                  <snm>Santana</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Luna</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Mutat Res</source>
            <pubdate>1994</pubdate>
            <volume>305</volume>
            <issue>2</issue>
            <fpage>197</fpage>
            <lpage>209</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7510031</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Hitchhiking Trypanosoma cruzi minicircle DNA affects gene expression in human host cells via LINE-1 retrotransposon</p>
            </title>
            <aug>
               <au>
                  <snm>Sim&#245;es-Barbosa</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Arga&#241;araz</snm>
                  <fnm>ER</fnm>
               </au>
               <au>
                  <snm>Barros</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>de C&#225;ssia Rosa</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Alves</snm>
                  <fnm>NP</fnm>
               </au>
               <au>
                  <snm>Louvandini</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>D'Souza-Ault</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Nitz</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Sturm</snm>
                  <fnm>NR</fnm>
               </au>
               <au>
                  <snm>Nascimento</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Teixeira</snm>
                  <fnm>ARL</fnm>
               </au>
            </aug>
            <source>Mem Inst Oswaldo Cruz</source>
            <pubdate>2006</pubdate>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Differential transcription profiles in Trypanosoma cruzi associated with clinical forms of Chagas disease: Maxicircle NADH dehydrogenase subunit 7 gene truncation in asymptomatic patient isolates</p>
            </title>
            <aug>
               <au>
                  <snm>Baptista</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Vencio</snm>
                  <fnm>RZ</fnm>
               </au>
               <au>
                  <snm>Abdala</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Carranza</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Westenberger</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Silva</snm>
                  <fnm>MN</fnm>
               </au>
               <au>
                  <snm>Pereira</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Galvao</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Gontijo</snm>
                  <fnm>ED</fnm>
               </au>
               <au>
                  <snm>Chiari</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Sturm</snm>
                  <fnm>NR</fnm>
               </au>
               <au>
                  <snm>Zingales</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Mol Biochem Parasitol</source>
            <pubdate>2006</pubdate>
            <volume>(in press)</volume>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Genetic subdivisions within Trypanosoma cruzi (Discrete Typing Units) and their relevance for molecular epidemiology and experimental evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Tibayrenc</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Kinetoplastid Biol Dis</source>
            <pubdate>2003</pubdate>
            <volume>2</volume>
            <issue>1</issue>
            <fpage>12</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">270070</pubid>
                  <pubid idtype="pmpid" link="fulltext">14613498</pubid>
                  <pubid idtype="doi">10.1186/1475-9292-2-12</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Population structure and genetic typing of Trypanosoma cruzi, the agent of Chagas disease: a multilocus enzyme electrophoresis approach</p>
            </title>
            <aug>
               <au>
                  <snm>Barnabe</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Brisse</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tibayrenc</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Parasitology</source>
            <pubdate>2000</pubdate>
            <volume>120 (Pt 5)</volume>
            <fpage>513</fpage>
            <lpage>526</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1017/S0031182099005661</pubid>
                  <pubid idtype="pmpid">10840981</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Identification of six Trypanosoma cruzi lineages by sequence-characterised amplified region markers</p>
            </title>
            <aug>
               <au>
                  <snm>Brisse</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Dujardin</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Tibayrenc</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Mol Biochem Parasitol</source>
            <pubdate>2000</pubdate>
            <volume>111</volume>
            <issue>1</issue>
            <fpage>95</fpage>
            <lpage>105</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0166-6851(00)00302-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">11087920</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Characterisation of large and small subunit rRNA and mini-exon genes further supports the distinction of six Trypanosoma cruzi lineages</p>
            </title>
            <aug>
               <au>
                  <snm>Brisse</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Verhoef</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Tibayrenc</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Int J Parasitol</source>
            <pubdate>2001</pubdate>
            <volume>31</volume>
            <issue>11</issue>
            <fpage>1218</fpage>
            <lpage>1226</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0020-7519(01)00238-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">11513891</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Mechanism of genetic exchange in American trypanosomes</p>
            </title>
            <aug>
               <au>
                  <snm>Gaunt</snm>
                  <fnm>MW</fnm>
               </au>
               <au>
                  <snm>Yeo</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Frame</snm>
                  <fnm>IA</fnm>
               </au>
               <au>
                  <snm>Stothard</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Carrasco</snm>
                  <fnm>HJ</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Mena</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Veazey</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Miles</snm>
                  <fnm>GA</fnm>
               </au>
               <au>
                  <snm>Acosta</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>de Arias</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Miles</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2003</pubdate>
            <volume>421</volume>
            <issue>6926</issue>
            <fpage>936</fpage>
            <lpage>939</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature01438</pubid>
                  <pubid idtype="pmpid" link="fulltext">12606999</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Two hybridization events define the population structure of Trypanosoma cruzi</p>
            </title>
            <aug>
               <au>
                  <snm>Westenberger</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Barnabe</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Campbell</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Sturm</snm>
                  <fnm>NR</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2005</pubdate>
            <volume>171</volume>
            <issue>2</issue>
            <fpage>527</fpage>
            <lpage>543</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1456769</pubid>
                  <pubid idtype="pmpid" link="fulltext">15998728</pubid>
                  <pubid idtype="doi">10.1534/genetics.104.038745</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>The genome sequence of Trypanosoma cruzi, etiologic agent of Chagas disease</p>
            </title>
            <aug>
               <au>
                  <snm>El-Sayed</snm>
                  <fnm>NM</fnm>
               </au>
               <au>
                  <snm>Myler</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Bartholomeu</snm>
                  <fnm>DC</fnm>
               </au>
               <au>
                  <snm>Nilsson</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Aggarwal</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Tran</snm>
                  <fnm>AN</fnm>
               </au>
               <au>
                  <snm>Ghedin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Worthey</snm>
                  <fnm>EA</fnm>
               </au>
               <au>
                  <snm>Delcher</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Blandin</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Westenberger</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Caler</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Cerqueira</snm>
                  <fnm>GC</fnm>
               </au>
               <au>
                  <snm>Branche</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Haas</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Anupama</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Arner</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Aslund</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Attipoe</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bontempi</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Bringaud</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Burton</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Cadag</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Campbell</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Carrington</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Crabtree</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Darban</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>da Silveira</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>de Jong</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Edwards</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Englund</snm>
                  <fnm>PT</fnm>
               </au>
               <au>
                  <snm>Fazelina</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Feldblyum</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ferella</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Frasch</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Gull</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Horn</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Hou</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kindlund</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Klingbeil</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kluge</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Koo</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lacerda</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Levin</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Lorenzi</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Louie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Machado</snm>
                  <fnm>CR</fnm>
               </au>
               <au>
                  <snm>McCulloch</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>McKenna</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Mizuno</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Mottram</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ochaya</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Osoegawa</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Pai</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Parsons</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pentony</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pettersson</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Pop</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ramirez</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Rinta</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Robertson</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Sanchez</snm>
                  <fnm>DO</fnm>
               </au>
               <au>
                  <snm>Seyler</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sharma</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Shetty</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Simpson</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Sisk</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Tammi</snm>
                  <fnm>MT</fnm>
               </au>
               <au>
                  <snm>Tarleton</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Teixeira</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Van Aken</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Vogt</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ward</snm>
                  <fnm>PN</fnm>
               </au>
               <au>
                  <snm>Wickstead</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Wortman</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Stuart</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Andersson</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>309</volume>
            <issue>5733</issue>
            <fpage>409</fpage>
            <lpage>415</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1112631</pubid>
                  <pubid idtype="pmpid" link="fulltext">16020725</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Nucleotide sequences provide evidence of genetic exchange among distantly related lineages of Trypanosoma cruzi</p>
            </title>
            <aug>
               <au>
                  <snm>Machado</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Ayala</snm>
                  <fnm>FJ</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <issue>13</issue>
            <fpage>7396</fpage>
            <lpage>7401</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">34680</pubid>
                  <pubid idtype="pmpid" link="fulltext">11416213</pubid>
                  <pubid idtype="doi">10.1073/pnas.121187198</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Evidence for genetic exchange and hybridization in Trypanosoma cruzi based on nucleotide sequences and molecular karyotype</p>
            </title>
            <aug>
               <au>
                  <snm>Brisse</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Henriksson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Barnabe</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Douzery</snm>
                  <fnm>EJ</fnm>
               </au>
               <au>
                  <snm>Berkvens</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Serrano</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>De Carvalho</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Buck</snm>
                  <fnm>GA</fnm>
               </au>
               <au>
                  <snm>Dujardin</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Tibayrenc</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Infect Genet Evol</source>
            <pubdate>2003</pubdate>
            <volume>2</volume>
            <issue>3</issue>
            <fpage>173</fpage>
            <lpage>183</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1567-1348(02)00097-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">12797979</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Trypanosoma cruzi mitochondrial maxicircles display species- and strain-specific variation and possess a conserved element in the non-coding region</p>
            </title>
            <aug>
               <au>
                  <snm>Westenberger</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Cerqueira</snm>
                  <fnm>GC</fnm>
               </au>
               <au>
                  <snm>El-Sayed</snm>
                  <fnm>NM</fnm>
               </au>
               <au>
                  <snm>Zingales</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Campbell</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Sturm</snm>
                  <fnm>NR</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <issue>1</issue>
            <fpage>60</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1559615</pubid>
                  <pubid idtype="pmpid" link="fulltext">16553959</pubid>
                  <pubid idtype="doi">10.1186/1471-2164-7-60</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Conserved sequence blocks in kinetoplast minicircles from diverse species of trypanosomes</p>
            </title>
            <aug>
               <au>
                  <snm>Ray</snm>
                  <fnm>DS</fnm>
               </au>
            </aug>
            <source>Mol Cell Biol</source>
            <pubdate>1989</pubdate>
            <volume>9</volume>
            <issue>3</issue>
            <fpage>1365</fpage>
            <lpage>1367</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">362734</pubid>
                  <pubid idtype="pmpid" link="fulltext">2542768</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Intragenomic spliced leader RNA array analysis of kinetoplastids reveals unexpected transcribed region diversity in Trypanosoma cruzi</p>
            </title>
            <aug>
               <au>
                  <snm>Thomas</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Westenberger</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Campbell</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Sturm</snm>
                  <fnm>NR</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2005</pubdate>
            <volume>352</volume>
            <fpage>100</fpage>
            <lpage>108</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.gene.2005.04.002</pubid>
                  <pubid idtype="pmpid" link="fulltext">15925459</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Organization and complexity of minicircle-encoded guide RNAs in Trypanosoma cruzi</p>
            </title>
            <aug>
               <au>
                  <snm>Avila</snm>
                  <fnm>HA</fnm>
               </au>
               <au>
                  <snm>Simpson</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>1995</pubdate>
            <volume>1</volume>
            <issue>9</issue>
            <fpage>939</fpage>
            <lpage>947</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1369342</pubid>
                  <pubid idtype="pmpid" link="fulltext">8548658</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Transcription and editing of cytochrome oxidase II RNAs in Trypanosoma cruzi</p>
            </title>
            <aug>
               <au>
                  <snm>Kim</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Teixeira</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Kirchhoff</snm>
                  <fnm>LV</fnm>
               </au>
               <au>
                  <snm>Donelson</snm>
                  <fnm>JE</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1994</pubdate>
            <volume>269</volume>
            <issue>2</issue>
            <fpage>1206</fpage>
            <lpage>1211</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8288582</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Trypanosoma cruzi: Sequence analysis of the variable region of kinetoplast minicircles</p>
            </title>
            <aug>
               <au>
                  <snm>Telleria</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lafay</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Virreira</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Barnabe</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Tibayrenc</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Svoboda</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Exp Parasitol</source>
            <pubdate>2006</pubdate>
            <volume>(in press)</volume>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Characterisation of kinetoplast DNA minicircles from Herpetomonas samuelpessoai</p>
            </title>
            <aug>
               <au>
                  <snm>Fu</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Lambson</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Barker</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>FEMS Microbiol Lett</source>
            <pubdate>1999</pubdate>
            <volume>172</volume>
            <issue>1</issue>
            <fpage>65</fpage>
            <lpage>71</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1574-6968.1999.tb13451.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">10079529</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>An intragenic guide RNA location suggests a complex mechanism for mitochondrial gene expression in Trypanosoma brucei</p>
            </title>
            <aug>
               <au>
                  <snm>Clement</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Mingler</snm>
                  <fnm>MK</fnm>
               </au>
               <au>
                  <snm>Koslowsky</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Eukaryot Cell</source>
            <pubdate>2004</pubdate>
            <volume>3</volume>
            <issue>4</issue>
            <fpage>862</fpage>
            <lpage>869</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">500885</pubid>
                  <pubid idtype="pmpid" link="fulltext">15302819</pubid>
                  <pubid idtype="doi">10.1128/EC.3.4.862-869.2004</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>http://dna.kdna.ucla.edu/trypanosome/grnasearch.htm</p>
            </title>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Computational approaches to gene prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Do</snm>
                  <fnm>JH</fnm>
               </au>
               <au>
                  <snm>Choi</snm>
                  <fnm>DK</fnm>
               </au>
            </aug>
            <source>J Microbiol</source>
            <pubdate>2006</pubdate>
            <volume>44</volume>
            <issue>2</issue>
            <fpage>137</fpage>
            <lpage>144</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16728949</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Computational methods for alternative splicing prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Bonizzoni</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Rizzi</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Pesole</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Brief Funct Genomic Proteomic</source>
            <pubdate>2006</pubdate>
            <volume>5</volume>
            <issue>1</issue>
            <fpage>46</fpage>
            <lpage>51</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bfgp/ell011</pubid>
                  <pubid idtype="pmpid" link="fulltext">16769678</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Identification of species, strains and clones of Leishmania by characterization of kinetoplast DNA minicircles</p>
            </title>
            <aug>
               <au>
                  <snm>Spithill</snm>
                  <fnm>TW</fnm>
               </au>
               <au>
                  <snm>Grumont</snm>
                  <fnm>RJ</fnm>
               </au>
            </aug>
            <source>Mol Biochem Parasitol</source>
            <pubdate>1984</pubdate>
            <volume>12</volume>
            <issue>2</issue>
            <fpage>217</fpage>
            <lpage>236</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0166-6851(84)90137-3</pubid>
                  <pubid idtype="pmpid">6090898</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Sensitive detection and schizodeme classification of Trypanosoma cruzi cells by amplification of kinetoplast minicircle DNA sequences: use in diagnosis of Chagas' disease</p>
            </title>
            <aug>
               <au>
                  <snm>Sturm</snm>
                  <fnm>NR</fnm>
               </au>
               <au>
                  <snm>Degrave</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Morel</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Simpson</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Mol Biochem Parasitol</source>
            <pubdate>1989</pubdate>
            <volume>33</volume>
            <issue>3</issue>
            <fpage>205</fpage>
            <lpage>214</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0166-6851(89)90082-0</pubid>
                  <pubid idtype="pmpid">2565018</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Widespread recombination in published animal mtDNA sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Tsaousis</snm>
                  <fnm>AD</fnm>
               </au>
               <au>
                  <snm>Martin</snm>
                  <fnm>DP</fnm>
               </au>
               <au>
                  <snm>Ladoukakis</snm>
                  <fnm>ED</fnm>
               </au>
               <au>
                  <snm>Posada</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Zouros</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2005</pubdate>
            <volume>22</volume>
            <issue>4</issue>
            <fpage>925</fpage>
            <lpage>933</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msi084</pubid>
                  <pubid idtype="pmpid" link="fulltext">15647518</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>A theoretical study of random segregation of minicircles in trypanosomatids</p>
            </title>
            <aug>
               <au>
                  <snm>Savill</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>Higgs</snm>
                  <fnm>PG</fnm>
               </au>
            </aug>
            <source>Proc Biol Sci</source>
            <pubdate>1999</pubdate>
            <volume>266</volume>
            <issue>1419</issue>
            <fpage>611</fpage>
            <lpage>620</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1689810</pubid>
                  <pubid idtype="pmpid" link="fulltext">10212451</pubid>
                  <pubid idtype="doi">10.1098/rspb.1999.0680</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Evolution of RNA editing in trypanosome mitochondria</p>
            </title>
            <aug>
               <au>
                  <snm>Simpson</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Thiemann</snm>
                  <fnm>OH</fnm>
               </au>
               <au>
                  <snm>Savill</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>Alfonzo</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Maslov</snm>
                  <fnm>DA</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>2000</pubdate>
            <volume>97</volume>
            <issue>13</issue>
            <fpage>6986</fpage>
            <lpage>6993</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">34374</pubid>
                  <pubid idtype="pmpid" link="fulltext">10860961</pubid>
                  <pubid idtype="doi">10.1073/pnas.97.13.6986</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>http://weblogo.berkeley.edu</p>
            </title>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Identification of common molecular subsequences</p>
            </title>
            <aug>
               <au>
                  <snm>Smith</snm>
                  <fnm>TF</fnm>
               </au>
               <au>
                  <snm>Waterman</snm>
                  <fnm>MS</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1981</pubdate>
            <volume>147</volume>
            <issue>1</issue>
            <fpage>195</fpage>
            <lpage>197</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(81)90087-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">7265238</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>The Design of Experiments</p>
            </title>
            <aug>
               <au>
                  <snm>Fisher</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <publisher>New York, Hafner</publisher>
            <pubdate>1935</pubdate>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Significance tests which may be applied to samples from any population</p>
            </title>
            <aug>
               <au>
                  <snm>Pitman</snm>
                  <fnm>EJG</fnm>
               </au>
            </aug>
            <source>Royal Statistical Society Supplement</source>
            <pubdate>1937</pubdate>
            <volume>4</volume>
            <fpage>119</fpage>
            <lpage>130,225-232</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2307/2984124</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Construction of permutation tests</p>
            </title>
            <aug>
               <au>
                  <snm>Welch</snm>
                  <fnm>WJ</fnm>
               </au>
            </aug>
            <source>Journal of the American Statistical Association</source>
            <pubdate>1990</pubdate>
            <volume>85</volume>
            <issue>411</issue>
            <fpage>693</fpage>
            <lpage>698</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2307/2290004</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Teoria statistica delle classi e calcolo delle probabilit&#224;</p>
            </title>
            <aug>
               <au>
                  <snm>Bonferroni</snm>
                  <fnm>CE</fnm>
               </au>
            </aug>
            <source>Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze</source>
            <pubdate>1936</pubdate>
            <volume>8</volume>
            <fpage>3</fpage>
            <lpage>62</lpage>
         </bibl>
         <bibl id="B44">
            <title>
               <p>A simple sequentially rejective multiple test procedure</p>
            </title>
            <aug>
               <au>
                  <snm>Holm</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Scand J Statist</source>
            <pubdate>1979</pubdate>
            <volume>6</volume>
            <fpage>65</fpage>
            <lpage>70</lpage>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Controlling the false discovery rate: a practical and powerful approach to multiple testing</p>
            </title>
            <aug>
               <au>
                  <snm>Benjamini</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Hochberg</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>J Roy Statist Soc Ser B</source>
            <pubdate>1995</pubdate>
            <volume>57</volume>
            <issue>1</issue>
            <fpage>289</fpage>
            <lpage>300</lpage>
         </bibl>
         <bibl id="B46">
            <title>
               <p>A tutorial on hidden Markov models and selected applications in speech recognition</p>
            </title>
            <aug>
               <au>
                  <snm>Rabiner</snm>
                  <fnm>LR</fnm>
               </au>
            </aug>
            <source>Proceedings of the IEEE</source>
            <pubdate>1989</pubdate>
            <volume>77</volume>
            <issue>2</issue>
            <fpage>257</fpage>
            <lpage>286</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1109/5.18626</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>http://www.tigr.org</p>
            </title>
         </bibl>
      </refgrp>
   </bm>
</art>
