Comparison of gene family content

| Gene family a | T.c. marinkellei (Tcm): size in assembly, bp b | Tcm: % short reads c | T.c. cruzi Sylvio X10 (Tcc X10): size in assembly, bp b | Tcc X10: % short reads c | SE d |
|---|---|---|---|---|---|
| DGF | 2,129,983 (6.22 %) | 3.433 | 1,265,650 (3.28 %) | 1.324 | Tcm |
| TS | 2,109,163 (6.16 %) | 6.291 | 2,953,602 (7.65 %) | 6.298 | Tcc X10 |
| MASP | 540,360 (1.58 %) | 1.317 | 727,537 (1.88 %) | 1.434 | Tcc X10 |
| RHS | 521,665 (1.52 %) | 2.234 | 1,314,589 (3.41 %) | 2.915 | Tcc X10 |
| GP63 | 452,732 (1.32 %) | 1.229 | 514,422 (1.33 %) | 0.898 | Tcm |
| TcMUC mucin | 273,890 (0.80 %) | 0.557 | 334,544 (0.87 %) | 0.515 | Tcc X10 |
| ABC | 37,490 (0.11 %) | 0.124 | 42,072 (0.11 %) | 0.162 | Tcc X10 |
| RBP | 25,946 (0.08 %) | 0.080 | 26,732 (0.07 %) | 0.074 | Tcc X10 |
a Gene family abbreviations: DGF, dispersed gene family; TS, trans-sialidase; MASP, mucin-associated surface protein; RHS, retrotransposon hot spot protein; GP63, surface protease; ABC, ABC transporter; RBP, RNA-binding protein.
b Total length, in base pairs, of the sequences from this gene family that were identified in the assembly. Sequences were identified with RepeatMasker using a repeat library of coding sequences from the Tcc CLBR genome. The totals include partial coding sequences. The value in parentheses is the percentage of the total assembly size.
c Percentage of short reads that mapped to sequences of this gene family.
d SE, significantly enriched: the genome that contains significantly more of this gene family. Significance was assessed against an empirical distribution of read-depth differences between homologous regions of Tcm and Tcc X10, corrected for genome size; this empirical distribution was used to calculate a p-value.
Franzén et al. BMC Genomics 2012 13:531 doi:10.1186/1471-2164-13-531
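Footnote d describes testing each gene family against an empirical null distribution of read-depth differences. The Python sketch below illustrates one common way to turn such a distribution into a p-value; it is not the authors' code. It assumes that the Tcm minus Tcc X10 read-depth differences have already been corrected for genome size, that the test is two-sided with a +1 pseudo-count, and that a positive difference means enrichment in Tcm. The threshold `alpha = 0.05`, the sign convention, and the function names `empirical_p_value` and `enriched_genome` are hypothetical.

```python
from typing import Sequence


def empirical_p_value(observed: float, null_diffs: Sequence[float]) -> float:
    """Two-sided empirical p-value.

    Counts the null differences at least as extreme (in absolute value) as
    the observed difference. A +1 pseudo-count in both numerator and
    denominator keeps the estimate from ever being exactly zero.
    """
    if not null_diffs:
        raise ValueError("null distribution is empty")
    extreme = sum(1 for d in null_diffs if abs(d) >= abs(observed))
    return (extreme + 1) / (len(null_diffs) + 1)


def enriched_genome(observed: float, null_diffs: Sequence[float],
                    alpha: float = 0.05) -> str:
    """Label which genome is significantly enriched for a gene family.

    Assumptions (hypothetical): `observed` and `null_diffs` are Tcm minus
    Tcc X10 read-depth differences already corrected for genome size, and
    `alpha` is an illustrative threshold, not the one used in the paper.
    """
    p = empirical_p_value(observed, null_diffs)
    if p >= alpha:
        return "n.s."
    return "Tcm" if observed > 0 else "Tcc X10"


if __name__ == "__main__":
    # Toy null distribution: 101 evenly spaced differences from -0.5 to 0.5.
    null = [i / 100 for i in range(-50, 51)]
    for obs in (0.8, -0.8, 0.1):
        print(obs, empirical_p_value(obs, null), enriched_genome(obs, null))
```

With this toy null distribution, an observed difference of 0.8 gives p = 1/102 (about 0.0098) and is labelled Tcm, and -0.8 is labelled Tcc X10. A difference of 0.1 is not significant. With N null values, the smallest attainable p-value is 1/(N + 1), so the null distribution has to be large enough for the chosen threshold.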