
Open Access Highly Accessed Software

AnyExpress: Integrated toolkit for analysis of cross-platform gene expression data using a fast interval matching algorithm

Jihoon Kim1, Kiltesh Patel1, Hyunchul Jung1,2, Winston P Kuo3 and Lucila Ohno-Machado1,2*

Author Affiliations

1 Division of Biomedical Informatics, University of California, San Diego, CA, USA

2 Bioinformatics Program, University of California, San Diego, CA, USA

3 Laboratory for Innovative Translational Technologies, Harvard Medical School, Boston, MA, USA


BMC Bioinformatics 2011, 12:75  doi:10.1186/1471-2105-12-75

Published: 17 March 2011

Abstract

Background

Cross-platform analysis of gene expression data requires multiple intricate processing steps at different levels across various platforms. However, existing tools handle only a single platform and are not flexible enough to support custom changes, such as those arising from new statistical methods, updated versions of reference data, and better platforms released every month or year. Current tools are also tightly coupled with reference information, such as the reference genome, transcriptome databases, and SNP data. Because this information is often erroneous or outdated, its errors propagate into incorrect and misleading results.

Results

We developed AnyExpress, a software package that combines cross-platform gene expression data using a fast interval-matching algorithm. Supported platforms include next-generation sequencing (NGS), microarray, SAGE, MPSS, and more. Users can define custom target transcriptome database references for probe/read mapping in any species, as well as criteria for removing undesirable probes/reads.
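The abstract does not describe how the interval-matching algorithm works. As an illustration only, the sketch below shows one generic way to match probe or read intervals to transcript intervals: sort the target intervals by start coordinate and use binary search so that each probe is compared only against nearby targets. The same idea supports a feature-based filter, here dropping probes that cover a SNP position. The names `match_intervals` and `drop_snp_probes`, the `(start, end, name)` tuple layout, the single-chromosome/single-strand setting, and the "exon fully contains probe" rule are all assumptions. None of them are taken from AnyExpress.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple

# Hypothetical layout: (start, end, name), half-open [start, end),
# all intervals on one chromosome and strand.
Interval = Tuple[int, int, str]


def match_intervals(probes: List[Interval], exons: List[Interval]) -> Dict[str, List[str]]:
    """Map each probe/read to every exon interval that fully contains it."""
    exons = sorted(exons)
    starts = [start for start, _, _ in exons]
    max_len = max((end - start for start, end, _ in exons), default=0)
    hits: Dict[str, List[str]] = {}
    for p_start, p_end, p_name in probes:
        # A containing exon starts at or before p_start, and it cannot start
        # before p_end - max_len, or it would be too short to reach p_end.
        lo = bisect_left(starts, p_end - max_len)
        hi = bisect_right(starts, p_start)
        hits[p_name] = [name for _, end, name in exons[lo:hi] if end >= p_end]
    return hits


def drop_snp_probes(probes: List[Interval], snp_positions: List[int]) -> List[Interval]:
    """Hypothetical filter: remove probes whose interval covers any SNP position."""
    snps = sorted(snp_positions)
    kept = []
    for start, end, name in probes:
        i = bisect_left(snps, start)  # first SNP at or after the probe start
        if i == len(snps) or snps[i] >= end:  # no SNP inside [start, end)
            kept.append((start, end, name))
    return kept
```

With `exons = [(100, 500, "tx1_exon1"), (450, 900, "tx2_exon1"), (1000, 1200, "tx1_exon2")]` and `probes = [(120, 145, "probe_A"), (460, 485, "probe_B"), (880, 905, "probe_C")]`, `match_intervals(probes, exons)` returns `{"probe_A": ["tx1_exon1"], "probe_B": ["tx1_exon1", "tx2_exon1"], "probe_C": []}`. `probe_C` runs past the end of `tx2_exon1`. Each probe costs two binary searches plus a scan of the exons whose start lies within one maximum exon length of it, so the approach stays fast when target intervals are short relative to the genome.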

AnyExpress offers scalable processing features such as binding, normalization, and summarization that are not present in existing software tools.
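Normalization and summarization are standard microarray processing steps. The sketch below shows textbook versions of both in plain Python: quantile normalization, which gives every sample the same value distribution, and probe-to-gene summarization. The function names, the in-memory list-of-lists layout, and the use of the median as the summary statistic are assumptions for illustration. AnyExpress's scalable implementation for large sample counts may work quite differently.

```python
from statistics import median
from typing import Dict, List


def quantile_normalize(samples: List[List[float]]) -> List[List[float]]:
    """Give every sample the same value distribution (one inner list per sample)."""
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must contain the same probes")
    # Reference distribution: mean of the k-th smallest value across all samples.
    sorted_samples = [sorted(s) for s in samples]
    reference = [sum(col[k] for col in sorted_samples) / len(samples) for k in range(n)]
    normalized = []
    for s in samples:
        # Probe indices ordered by value; sorted() is stable, so ties keep probe order.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]  # replace each value by the reference at its rank
        normalized.append(out)
    return normalized


def summarize_by_gene(values: List[float], probe_genes: List[str]) -> Dict[str, float]:
    """Collapse probe-level values to one value per gene using the median."""
    groups: Dict[str, List[float]] = {}
    for value, gene in zip(values, probe_genes):
        groups.setdefault(gene, []).append(value)
    return {gene: median(vals) for gene, vals in groups.items()}
```

For example, `quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])` returns `[[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]`, so both samples now share the sorted values 1.5, 3.5, and 5.5. This version breaks ties by probe order. Some implementations instead average tied ranks.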

As a case study, we applied AnyExpress to published Affymetrix microarray and Illumina RNA-Seq data from human kidney and liver. The mean within-platform correlation coefficient was 0.98 for kidney and liver samples. The mean cross-platform correlation coefficient was 0.73. These results confirmed those of the original and secondary studies. Filtering produced higher agreement between microarray and NGS data, as measured by an agreement index calculated from differentially expressed genes.
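The within- and cross-platform figures above are means of pairwise correlation coefficients. The sketch below shows how such means can be computed. It assumes Pearson correlation on expression profiles that are already matched to the same genes and on a comparable (e.g. log) scale. The abstract states neither the correlation measure nor the transformation, and the function names are hypothetical. The agreement index is not reproduced because the abstract does not define it.

```python
from itertools import combinations, product
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles (undefined for a constant profile)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_within(samples: List[Sequence[float]]) -> float:
    """Mean correlation over all pairs of samples from one platform."""
    pairs = list(combinations(samples, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def mean_cross(platform_a: List[Sequence[float]], platform_b: List[Sequence[float]]) -> float:
    """Mean correlation over all (platform A sample, platform B sample) pairs."""
    pairs = list(product(platform_a, platform_b))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)
```

Within-platform means use every unordered pair of samples from one platform. Cross-platform means pair every microarray sample with every NGS sample. With this design, the two numbers summarize different sets of comparisons even when the sample counts match.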

Conclusion

AnyExpress can combine cross-platform gene expression data, process data from both open and closed platforms, select a custom target reference, filter out undesirable probes or reads based on custom-defined biological features, and perform quantile normalization on a large number of microarray samples. AnyExpress is fast, comprehensive, flexible, and freely available at http://anyexpress.sourceforge.net.