Schematic of the bioinformatics pipeline used to estimate genomic fluidity. (A) Genomes are annotated automatically to minimize curation bias; (B) For a given pair of genomes, all genes are compared using an all-versus-all protein alignment; (C) Shared genes are identified based on whether alignment identity and coverage exceed the thresholds i and c, respectively; (D) Gene families are calculated based on a maximal clustering rule; (E) The number of shared genes is found for each pair of genomes, Gi and Gj, from which the number of unique genes can be calculated. Refer to the Methods for complete details of the pipeline and to Additional file 1, Table S1 for a complete list of the genomes analyzed.
Kislyuk et al. BMC Genomics 2011 12:32 doi:10.1186/1471-2164-12-32
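
Step (E) of the schematic can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each genome has already been reduced to a set of gene family identifiers (the output of steps A to D), so paralogs within a family are counted once. The names `pairwise_unique` and `genomic_fluidity`, and the toy genomes, are hypothetical. The fluidity estimate is assumed here to be the ratio of unique genes to total genes, averaged over all genome pairs; the Methods give the exact definition used in the paper.

```python
from itertools import combinations


def pairwise_unique(families_i, families_j):
    """Return (unique_i, unique_j) for one pair of genomes.

    Each argument is a set of gene family identifiers (an assumption of
    this sketch). Unique genes are derived from the shared count, as in
    panel (E): unique = total - shared.
    """
    shared = len(families_i & families_j)
    return len(families_i) - shared, len(families_j) - shared


def genomic_fluidity(genomes):
    """Average, over all genome pairs, of unique genes / total genes.

    `genomes` maps a genome name to its set of gene family identifiers.
    """
    pairs = list(combinations(genomes.values(), 2))
    if not pairs:
        raise ValueError("at least two genomes are required")
    total = 0.0
    for g_i, g_j in pairs:
        u_i, u_j = pairwise_unique(g_i, g_j)
        total += (u_i + u_j) / (len(g_i) + len(g_j))
    return total / len(pairs)


# Toy data: three hypothetical genomes described by gene family IDs.
genomes = {
    "G1": {"f1", "f2", "f3"},
    "G2": {"f2", "f3", "f4"},
    "G3": {"f1", "f2", "f3"},
}
fluidity = genomic_fluidity(genomes)
```

In this toy example, G1 and G3 are identical and contribute 0, while each pair involving G2 contributes 2/6, so the average over the three pairs is 2/9.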