BMC Bioinformatics Volume 8
Methodology article
A Bayesian nonparametric method for prediction in EST analysis
Antonio Lijoi1, Ramsés H Mena2 and Igor Prünster3
1 Department of Economics and Quantitative Methods, University of Pavia, 27100 Pavia and Institute for Applied Mathematics and Information Technology, National Research Council, 20133 Milan, Italy
2 Research Institute for Applied Mathematics and Systems, National Autonomous University of Mexico, Mexico City, A.P. 20-726, Mexico
3 Department of Statistics and Applied Mathematics and ICER, University of Turin, 10122 Turin and Carlo Alberto College, 10024 Moncalieri, Italy
BMC Bioinformatics 2007, 8:339 doi:10.1186/1471-2105-8-339
Published: 14 September 2007
Abstract
Background
Expressed sequence tag (EST) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of a given size and to determine the gene discovery rate. These estimates form the basis for deciding whether to proceed with sequencing the library and, if so, provide a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library.
Results
In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; b) the number of new unique genes to be observed in a future sample; c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt translates the available information into predictions in a statistically rigorous way. Our proposal has advantages over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries previously studied with frequentist methods are analyzed in detail.
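The three quantities listed above can be illustrated with a minimal sketch. It assumes, purely for illustration, a two-parameter Poisson-Dirichlet (Pitman-Yor) process prior with fixed parameters sigma (0 < sigma < 1) and theta (theta > -sigma); the abstract does not say which prior or parameter values are used, so the model choice, the function names and the sample figures in the usage lines are all hypothetical. Under that assumed prior, a basic sample of n reads containing k distinct genes gives closed-form posterior expectations:

```python
from math import lgamma, exp


def rising_ratio(a, b, m):
    """Return (a)_m / (b)_m, where (x)_m = x(x+1)...(x+m-1).

    Computed through log-gamma so that large m does not overflow.
    """
    return exp(lgamma(a + m) - lgamma(a) - lgamma(b + m) + lgamma(b))


def expected_new_genes(n, k, m, sigma, theta):
    """Posterior expected number of new distinct genes in m additional reads.

    Assumes a Pitman-Yor(sigma, theta) prior (an illustrative choice),
    with n reads and k distinct genes already observed.
    """
    growth = rising_ratio(theta + n + sigma, theta + n, m)
    return (k + theta / sigma) * (growth - 1.0)


def discovery_rate(n, k, m, sigma, theta):
    """Expected probability that read number n + m + 1 reveals a new gene,
    given only the basic sample of n reads with k distinct genes."""
    first = (theta + k * sigma) / (theta + n)
    return first * rising_ratio(theta + n + sigma, theta + n + 1, m)


def coverage(n, k, sigma, theta):
    """Estimated coverage: one minus the probability that the next read is new."""
    return 1.0 - discovery_rate(n, k, 0, sigma, theta)


if __name__ == "__main__":
    # Hypothetical basic sample and hypothetical prior parameters.
    n, k, sigma, theta = 1000, 400, 0.5, 10.0
    print("coverage:", coverage(n, k, sigma, theta))
    for m in (100, 1000, 10000):
        print(m, expected_new_genes(n, k, m, sigma, theta),
              discovery_rate(n, k, m, sigma, theta))
```

In practice sigma and theta would have to be chosen or estimated from the data rather than fixed by hand; the sketch only shows how, once they are given, all three quantities follow from n and k and remain well defined for any additional sample size m.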
Conclusion
The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not suffer from the drawbacks associated with frequentist estimators and are reliable for any size of the additional sample.