Figure 5.
The GeneTree pipeline. Colours and conventions are as in Figure 3. (A) Only the second half of the pipeline is shown, because the first part is very similar to the first two blocks of the Multiple alignment pipeline (Figure 4A). The main difference is that BLAST is run between whole proteins as a single block, instead of on proteins split into coding exons. In short, the proteins are clustered and aligned, and a phylogenetic tree is built from each alignment. The OrthoTree module then calls orthologues and paralogues, and the last three modules calculate dN/dS values for pairs of proteins. The pipeline contains alternative routes, shown in turquoise, that are taken when specific exceptions are raised: when Muscle is unable to align all the proteins in a cluster, or when TreeBeST cannot infer the phylogenetic tree. This can happen when a cluster of proteins is too large. In that case, the BreakPAFCluster module splits the cluster into sub-groups and the alignment is restarted.
(B) Timeline of the GeneTree pipeline for Ensembl release 49 (39 species). The progress of the pipeline is sampled approximately every 2 minutes. BLAST and SubmitPep jobs run concurrently in one phase of the pipeline; in another phase, Muscle, TreeBeST and OrthoTree jobs run at the same time.
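The exception-driven detour in panel A (a cluster that cannot be aligned or turned into a tree is broken into sub-groups, which re-enter the alignment step) can be sketched as a small recursive routine. This is a minimal illustration under stated assumptions, not the pipeline's code: `align`, `build_tree`, `break_cluster`, `run_cluster`, the two exception classes and the size limits `MAX_ALIGN`/`MAX_TREE` are all hypothetical stand-ins for Muscle, TreeBeST and BreakPAFCluster. In particular, the split here simply halves the cluster, whereas the real module presumably uses the BLAST similarity data.

```python
from typing import List


class AlignmentFailed(Exception):
    """Hypothetical stand-in for Muscle failing to align a cluster."""


class TreeInferenceFailed(Exception):
    """Hypothetical stand-in for TreeBeST failing to infer a tree."""


# Hypothetical size limits that trigger the failures in this sketch;
# real tool failures are not a fixed threshold.
MAX_ALIGN = 8
MAX_TREE = 6


def align(cluster: List[str]) -> List[str]:
    """Placeholder aligner: fails on large clusters, else returns a 'sorted alignment'."""
    if len(cluster) > MAX_ALIGN:
        raise AlignmentFailed(f"cannot align {len(cluster)} proteins")
    return sorted(cluster)


def build_tree(alignment: List[str]) -> str:
    """Placeholder tree builder: fails on large inputs, else returns a Newick star tree."""
    if len(alignment) > MAX_TREE:
        raise TreeInferenceFailed(f"cannot build tree for {len(alignment)} proteins")
    return "(" + ",".join(alignment) + ");"


def break_cluster(cluster: List[str]) -> List[List[str]]:
    """Hypothetical split into two sub-groups (here: simple halving)."""
    mid = len(cluster) // 2
    return [cluster[:mid], cluster[mid:]]


def run_cluster(cluster: List[str]) -> List[str]:
    """Normal route: align, then build a tree. Alternative route: on failure,
    split the cluster and restart alignment for each sub-group."""
    try:
        return [build_tree(align(cluster))]
    except (AlignmentFailed, TreeInferenceFailed):
        if len(cluster) < 2:
            raise  # nothing left to split
        trees: List[str] = []
        for sub in break_cluster(cluster):
            trees.extend(run_cluster(sub))
        return trees


if __name__ == "__main__":
    print(run_cluster(["a", "b", "c"]))
    print(run_cluster([f"p{i}" for i in range(10)]))
```

With these made-up limits, a 10-protein cluster fails at the alignment step and comes back as two trees of five proteins each, while a 7-protein cluster aligns but fails at tree building and is also split in two. Recursion keeps the detour local to the failing cluster, so clusters that succeed never leave the normal route.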
Severin et al. BMC Bioinformatics 2010 11:240 doi:10.1186/1471-2105-11-240