Open Access Research article

Statistical distributions of optimal global alignment scores of random protein sequences

Hongxia Pang (1,2), Jiaowei Tang (1,2), Su-Shing Chen (2) and Shiheng Tao (1,2)*

Author Affiliations

1 School of Life Science, Northwest A&F University, Yangling, Shaanxi, China

2 Institute of Bioinformatics, Northwest A&F University, Yangling, Shaanxi, China

BMC Bioinformatics 2005, 6:257  doi:10.1186/1471-2105-6-257

Published: 15 October 2005

Abstract

Background

Inferring homology from statistically significant sequence similarity is a central problem in sequence alignment. However, the statistical distribution of optimal global alignment scores has not yet been fully determined.

Results

In this study, reference datasets of random sequences and of real but unrelated sequences, prepared in six different ways, were used to obtain the statistical distributions of their global alignment scores. All alignments were carried out with the Needleman-Wunsch algorithm, and the optimal scores were fitted to the Gumbel, normal and gamma distributions. The three-parameter gamma distribution performs best as the theoretical distribution function of global alignment scores, agreeing most closely with the observed score distribution. The normal distribution also agrees well with the observed score frequencies when the shape parameter of the gamma distribution is sufficiently large, because in that case the normal distribution is a good approximation of the gamma distribution.
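The procedure described above can be sketched roughly as follows: score pairs of random sequences with Needleman-Wunsch, then fit a three-parameter gamma distribution to the scores. This is a minimal illustration, not the authors' pipeline. The identity scoring scheme (`match`, `mismatch`), the linear `gap` penalty, the uniform amino acid composition and the method-of-moments fit are all assumptions introduced here, since the abstract does not state the substitution matrix, gap model or fitting method that were used.

```python
import random
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: uniform residue frequencies


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty).

    The scoring values are illustrative assumptions, not those of the study.
    Only two rows of the dynamic-programming table are kept in memory.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * gap]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def random_protein(length, rng):
    """Random sequence drawn uniformly from the 20 standard amino acids."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def random_scores(n_pairs, length, seed=0):
    """Optimal global scores for n_pairs of independent random sequences."""
    rng = random.Random(seed)
    return [
        nw_score(random_protein(length, rng), random_protein(length, rng))
        for _ in range(n_pairs)
    ]


def fit_gamma3_moments(scores):
    """Method-of-moments fit of a three-parameter gamma (shape, scale, loc).

    Uses mean = loc + shape*scale, variance = shape*scale**2 and
    skewness = 2/sqrt(shape). Requires positively skewed data.
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance")
    skew = sum((s - mean) ** 3 for s in scores) / (n * sd ** 3)
    if skew <= 0:
        raise ValueError("moment fit needs positive sample skewness")
    shape = 4.0 / skew ** 2
    scale = sd * skew / 2.0
    loc = mean - shape * scale
    return shape, scale, loc
```

A call such as `fit_gamma3_moments(random_scores(1000, 100))` would return the fitted parameters, provided the sample is positively skewed. Because the skewness of a gamma distribution is 2/sqrt(shape), a large fitted shape means a nearly symmetric distribution, which is the regime where the normal approximation applies. A maximum-likelihood fit would usually be preferred over moments for real analyses.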

Conclusion

We have shown that the optimal global alignment scores of random protein sequences fit the three-parameter gamma distribution function. This result can support homology inference for sequences whose relationship is unknown, by evaluating the significance of their alignment score against the fitted gamma distribution.
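One way to read "significance" here is as the upper-tail probability of an observed score under a fitted three-parameter gamma distribution. The sketch below computes that tail with a standard power series for the regularized lower incomplete gamma function. The function name `gamma_sf` and its parameters `shape`, `scale` and `loc` are hypothetical, and the abstract does not specify how the authors computed significance.

```python
import math


def gamma_sf(score, shape, scale, loc):
    """Upper-tail probability P(S >= score) for a three-parameter gamma.

    Evaluates the regularized lower incomplete gamma function P(shape, x)
    by its power series and returns 1 - P. Because of the subtraction,
    very small tail probabilities lose precision; this is a sketch only.
    """
    x = (score - loc) / scale
    if x <= 0:
        return 1.0  # the score lies at or below the distribution's lower bound
    # Series: sum over n >= 0 of x**n / (shape * (shape+1) * ... * (shape+n))
    term = 1.0 / shape
    total = term
    n = 0
    while term > 1e-16 * total and n < 10_000:
        n += 1
        term *= x / (shape + n)
        total += term
    log_prefix = shape * math.log(x) - x - math.lgamma(shape)
    lower = math.exp(log_prefix) * total
    return min(1.0, max(0.0, 1.0 - lower))
```

For example, `gamma_sf(observed_score, shape, scale, loc)` with parameters fitted to scores of unrelated sequences gives the probability of seeing a score at least that high by chance. A small value would suggest, but not prove, that the two sequences are related.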