Table 3

Comparison of gene family content
              T.c. marinkellei                       T.c. cruzi Sylvio X10
Gene family a Size in assembly b   % Short reads c   Size in assembly b   % Short reads c   SE d
DGF           2,129,983 (6.22 %)   3.433             1,265,650 (3.28 %)   1.324             Tcm
TS            2,109,163 (6.16 %)   6.291             2,953,602 (7.65 %)   6.298             Tcc X10
MASP          540,360 (1.58 %)     1.317             727,537 (1.88 %)     1.434             Tcc X10
RHS           521,665 (1.52 %)     2.234             1,314,589 (3.41 %)   2.915             Tcc X10
GP63          452,732 (1.32 %)     1.229             514,422 (1.33 %)     0.898             Tcm
TcMUC mucin   273,890 (0.80 %)     0.557             334,544 (0.87 %)     0.515             Tcc X10
ABC           37,490 (0.11 %)      0.124             42,072 (0.11 %)      0.162             Tcc X10
RBP           25,946 (0.08 %)      0.080             26,732 (0.07 %)      0.074             Tcc X10

a Gene family abbreviations: DGF=Dispersed Gene Family, TS=trans-sialidase, MASP=Mucin-associated surface protein, GP63=Surface protease, RHS=Retrotransposon Hot Spot protein, ABC=ABC Transporter, RBP=RNA Binding Protein.

b The combined number of base pairs from this gene family identified in the assembly. Sequences were identified using RepeatMasker with a repeat library of coding sequences from the Tcc CLBR genome. These numbers include partial coding sequences. The number in parentheses is the percentage of total assembly size.
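
The relationship between the base-pair counts and the percentages in parentheses can be sketched as simple arithmetic. The table does not state the total assembly sizes, so the snippet below back-calculates them from the DGF and TS rows of the T.c. marinkellei columns. The function names are illustrative, and the implied sizes are only approximate because the published percentages are rounded to two decimals.

```python
def percent_of_assembly(family_bp: int, assembly_bp: int) -> float:
    """Percentage of the total assembly occupied by one gene family."""
    return 100.0 * family_bp / assembly_bp


def implied_assembly_size(family_bp: int, percent: float) -> float:
    """Back-calculate the assembly size (bp) from a table row.

    This is an inference from rounded percentages, not a value
    reported in the table.
    """
    return family_bp / (percent / 100.0)


# T.c. marinkellei rows from the table: DGF and TS.
tcm_from_dgf = implied_assembly_size(2_129_983, 6.22)
tcm_from_ts = implied_assembly_size(2_109_163, 6.16)

# The two estimates should agree closely if the rows share one denominator.
print(f"Implied Tcm assembly size (DGF row): {tcm_from_dgf:,.0f} bp")
print(f"Implied Tcm assembly size (TS row):  {tcm_from_ts:,.0f} bp")
```

Both rows imply nearly the same denominator, which is what one would expect if every percentage in a column is taken against the same total assembly size.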

c The percentage of short reads that mapped to these features.

d SE=Significantly Enriched. Indicates which genome contained significantly more of this gene family. Significance was assessed with a p-value calculated from an empirical distribution of read depth differences between homologous regions of Tcm and Tcc X10, corrected for genome size.
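
As a rough illustration of how a p-value can be read off an empirical distribution, the sketch below compares one observed read depth difference against a set of background differences. It is not the authors' implementation. The footnote does not say whether the test was one-sided or two-sided, how the genome size correction was applied, or whether a pseudocount was used, so the two-sided comparison, the add-one correction, and all numeric values here are assumptions. The input differences are taken to be already corrected for genome size.

```python
from typing import Sequence


def empirical_p_value(observed: float, null_diffs: Sequence[float]) -> float:
    """Two-sided empirical p-value with an add-one correction.

    observed   -- read depth difference for the gene family of interest
    null_diffs -- background differences from homologous regions
                  (assumed to be corrected for genome size already)
    """
    if not null_diffs:
        raise ValueError("null_diffs must not be empty")
    # Count background values at least as extreme as the observed one.
    extreme = sum(1 for d in null_diffs if abs(d) >= abs(observed))
    # The +1 terms keep the estimate above zero (a common convention,
    # not something stated in the source).
    return (extreme + 1) / (len(null_diffs) + 1)


# Hypothetical background differences, for illustration only.
null_diffs = [-0.2, -0.1, 0.0, 0.05, 0.1, 0.15, -0.05, 0.2, -0.15]
p = empirical_p_value(0.9, null_diffs)
print(f"empirical p-value: {p:.3f}")
```

With only nine background values the smallest attainable p-value is 0.1, so a real analysis would need a much larger background set to resolve small p-values.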

Franzén et al. BMC Genomics 2012 13:531   doi:10.1186/1471-2164-13-531