Open Access Methodology article

Proteinortho: Detection of (Co-)orthologs in large-scale analysis

Marcus Lechner1,2*, Sven Findeiß2,4, Lydia Steiner2,3,4, Manja Marz1, Peter F Stadler2,4,5,6,7,8,9 and Sonja J Prohaska3,4

Author Affiliations

1 RNA Bioinformatics Group, Department of Pharmaceutical Chemistry, Philipps-University Marburg, Marbacher Weg 6, D-35037 Marburg, Germany

2 Bioinformatics Group, Department of Computer Science, Härtelstraße 16-18, D-04107 Leipzig, Germany

3 Bioinformatics in EvoDevo Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany

4 Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany

5 Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany

6 Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, D-04103 Leipzig, Germany

7 Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria

8 Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark

9 Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501, USA

BMC Bioinformatics 2011, 12:124  doi:10.1186/1471-2105-12-124

Published: 28 April 2011



Orthology analysis is an important part of data analysis in many areas of bioinformatics, such as comparative genomics and molecular phylogenetics. The ever-increasing flood of sequence data, and hence the rapidly increasing number of genomes that can be compared simultaneously, calls for efficient software tools, as brute-force approaches with quadratic memory requirements become infeasible in practice. Furthermore, the rapid pace at which new data become available makes it desirable to compute genome-wide orthology relations for a given dataset rather than relying on relations listed in databases.


The program Proteinortho described here is a stand-alone tool that is geared towards large datasets and makes use of distributed computing techniques when run on multi-core hardware. It implements an extended version of the reciprocal best alignment heuristic. We apply Proteinortho to compute orthologous proteins in the complete set of all 717 eubacterial genomes available at NCBI at the beginning of 2009. We identified thirty proteins present in 99% of all bacterial proteomes.
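For readers unfamiliar with the reciprocal best alignment heuristic mentioned above, the following is a minimal sketch of its basic, unextended form for two proteomes. It is not Proteinortho's implementation and does not reproduce its extension; the function names, protein identifiers and scores are hypothetical and chosen only for illustration.

```python
def best_hits(scores):
    """Return the best-scoring subject for every query protein.

    `scores` maps (query, subject) pairs to an alignment score,
    where a higher score means a better alignment (an assumption
    of this sketch, e.g. a bit score).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Keep only pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical toy scores for proteome A searched against B, and B against A.
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
          ("a2", "b2"): 180.0, ("a3", "b1"): 120.0}
b_vs_a = {("b1", "a1"): 245.0, ("b1", "a3"): 118.0,
          ("b2", "a2"): 175.0, ("b2", "a1"): 88.0}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy data, a3 finds b1 as its best hit, but b1 prefers a1, so a3 is left without a partner. The plain heuristic therefore reports at most one partner per protein, which is presumably why an extended version is needed to detect co-orthologs.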


Proteinortho significantly reduces the required amount of memory for orthology analysis compared to existing tools, allowing such computations to be performed on off-the-shelf hardware.