Pipeline used to reconstruct and analyze the S. mansoni phylome. Each protein sequence encoded in the parasite genome was compared against a database of proteins from 12 other fully sequenced eukaryotic proteomes (Table 1) to select putative homologs. Groups of potential homologs were aligned, and the alignments were then trimmed to remove gap-rich regions. Each refined alignment was used to build a neighbor-joining (NJ) tree, which then served as the “seed” tree for a maximum likelihood (ML) analysis as implemented in PhyML. In the ML analysis, up to five different evolutionary models were tested, and the model that best fit the data was selected using the Akaike Information Criterion (AIC). Different algorithms were used to identify homology relationships and lineage-specific duplications. To extract and interpret the large data set obtained, a Structured Query Language (SQL) relational database was built. This database was the main resource for data mining in this work. Adapted from .
Silva et al. BMC Genomics 2012 13:617 doi:10.1186/1471-2164-13-617
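The AIC-based model selection step mentioned in the caption can be illustrated with a minimal sketch. It assumes that each candidate model has already been fitted and that its maximized log-likelihood and number of free parameters are known. The model names (model_A, model_B, model_C) and all numeric values are hypothetical placeholders, not results from the study, and the function names are illustrative rather than part of PhyML.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def best_model_by_aic(fits):
    """Return the name of the lowest-AIC model and all AIC scores.

    `fits` maps a model name to (maximized log-likelihood, free parameters).
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical fits for one gene family (placeholder values only).
example_fits = {
    "model_A": (-10250.4, 41),
    "model_B": (-10231.7, 42),
    "model_C": (-10230.9, 45),
}

best_name, aic_scores = best_model_by_aic(example_fits)
print(best_name, aic_scores)
```

In this made-up example, model_C has the highest likelihood but model_B has the lowest AIC, because the criterion penalizes the extra free parameters of model_C.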