Genome-wide identification of specific oligonucleotides using artificial neural network and computational genomic analysis
1 Department of Computer Science, National Chung-Hsing University, Taichung, Taiwan, ROC
2 Institute of Biomedical Sciences, National Chung-Hsing University, Taichung, Taiwan, ROC
3 Institute of Statistical Science, Academia Sinica, Taipei, Taiwan, ROC
4 Institute of Molecular Biology, National Chung-Hsing University, Taichung, Taiwan, ROC
5 NTU Center for Genomic Medicine, National Taiwan University College of Medicine, Taipei, Taiwan, ROC
6 Departments of Biotechnology and Bioinformatics, Asia University, Taichung, Taiwan, ROC
BMC Bioinformatics 2007, 8:164 doi:10.1186/1471-2105-8-164
Published: 22 May 2007
Genome-wide identification of specific oligonucleotides (oligos) is a computationally intensive task and is required for designing microarray probes, primers, and siRNAs. An artificial neural network (ANN) is a machine learning technique that can effectively process complex and noisy data. Here, ANNs are applied to the unique subsequence distribution to predict specific oligos.
We present a novel and efficient algorithm, named the integration of ANN and BLAST (IAB) algorithm, to identify specific oligos. We established unique marker databases for the human and rat gene index databases using a hash table algorithm. We then created the input vectors from the unique marker database to train and test the ANN. The trained ANN predicted the specific oligos with high efficiency, and these oligos were subsequently verified by BLAST. To improve prediction performance, over-fitting of the ANN was avoided by early stopping at the best observed error, and k-fold validation was also applied. In the experimental results, the IAB algorithm was about 5.2, 7.1, and 6.7 times faster than a BLAST search without the ANN for 70-mer, 50-mer, and 25-mer specific oligos, respectively. In addition, polymerase chain reaction results showed that the primers predicted by the IAB algorithm could specifically amplify the corresponding genes. The IAB algorithm has been integrated into a previously published comprehensive web server that supports microarray analysis and genome-wide iterative enrichment analysis, through which users can identify a group of desired genes and then discover the specific oligos of these genes.
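As an illustration of the hash-table step, the following minimal Python sketch shows one way a unique marker database and an ANN input vector could be derived from it. It is not the authors' implementation: the function names (`build_unique_markers`, `marker_profile`), the toy sequences, the marker length `k = 4`, and the binary encoding are all assumptions made for the example, and a subsequence repeated within a single gene is still treated as unique to that gene.

```python
def build_unique_markers(sequences, k):
    """Return {k-mer: gene_id} for k-mers that occur in exactly one gene.

    `sequences` maps a (hypothetical) gene id to its nucleotide string.
    A Python dict plays the role of the hash table.
    """
    owners = {}
    for gene_id, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            owners.setdefault(seq[i:i + k], set()).add(gene_id)
    # Keep only k-mers seen in a single gene.
    return {kmer: next(iter(ids)) for kmer, ids in owners.items() if len(ids) == 1}


def marker_profile(gene_id, oligo, markers, k):
    """Binary vector: 1 where the k-mer starting at that position is unique to gene_id."""
    return [
        1 if markers.get(oligo[i:i + k]) == gene_id else 0
        for i in range(len(oligo) - k + 1)
    ]


# Toy data (assumed, not from the paper).
genes = {"geneA": "ACGTACGGA", "geneB": "TTACGTAAC"}
markers = build_unique_markers(genes, k=4)
profile = marker_profile("geneA", genes["geneA"], markers, k=4)
print(profile)
```

A vector of this kind, computed for each candidate oligo, is the sort of input that could be passed to a classifier, with the candidates it accepts then confirmed by BLAST as described above.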
The IAB algorithm has been developed to construct SpecificDB, a web server that provides a database of specific and valid oligos for probe, siRNA, and primer design in the human genome. We also demonstrate, through polymerase chain reaction experiments, the ability of the IAB algorithm to predict specific oligos. SpecificDB provides comprehensive information and a user-friendly interface.