A simple optimization can improve the performance of single feature polymorphism detection by Affymetrix expression arrays
1 Genetic Strains Research Center, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
2 Department of Mathematical Analysis and Statistical Inference, The Institute of Statistical Mathematics, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan
3 Transdisciplinary Research Integration Center, Research Organization of Information and Systems, Kamiyacho Central Place 2F, 4-3-13 Toranomon, Minatoku, Tokyo 105-0001, Japan
4 Graduate School of Information Science and Electrical Engineering, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan
5 Health Informatics and Biostatistics Lab., Oita University of Nursing and Health Sciences, 2944-9 Megusuno, Oita, Oita 870-1201, Japan
BMC Genomics 2010, 11:315 doi:10.1186/1471-2164-11-315Published: 20 May 2010
High-density oligonucleotide arrays are effective tools for genotyping numerous loci simultaneously. In small genome species (genome size: < ~300 Mb), whole-genome DNA hybridization to expression arrays has been used for various applications. In large genome species, transcript hybridization to expression arrays has been used for genotyping. Although rice is a fully sequenced model plant of medium genome size (~400 Mb), there are a few examples of the use of rice oligonucleotide array as a genotyping tool.
We compared the single feature polymorphism (SFP) detection performance of whole-genome and transcript hybridizations using the Affymetrix GeneChip® Rice Genome Array, using the rice cultivars with full genome sequence, japonica cultivar Nipponbare and indica cultivar 93-11. Both genomes were surveyed for all probe target sequences. Only completely matched 25-mer single copy probes of the Nipponbare genome were extracted, and SFPs between them and 93-11 sequences were predicted. We investigated optimum conditions for SFP detection in both whole genome and transcript hybridization using differences between perfect match and mismatch probe intensities of non-polymorphic targets, assuming that these differences are representative of those between mismatch and perfect targets. Several statistical methods of SFP detection by whole-genome hybridization were compared under the optimized conditions. Causes of false positives and negatives in SFP detection in both types of hybridization were investigated.
The optimizations allowed a more than 20% increase in true SFP detection in whole-genome hybridization and a large improvement of SFP detection performance in transcript hybridization. Significance analysis of the microarray for log-transformed raw intensities of PM probes gave the best performance in whole genome hybridization, and 22,936 true SFPs were detected with 23.58% false positives by whole genome hybridization. For transcript hybridization, stable SFP detection was achieved for highly expressed genes, and about 3,500 SFPs were detected at a high sensitivity (> 50%) in both shoot and young panicle transcripts. High SFP detection performances of both genome and transcript hybridizations indicated that microarrays of a complex genome (e.g., of Oryza sativa) can be effectively utilized for whole genome genotyping to conduct mutant mapping and analysis of quantitative traits such as gene expression levels.