Probabilistic retrieval and visualization of biologically relevant microarray experiments

Caldas, José; Gehlenborg, Nils; Faisal, Ali; Brazma, Alvis; Kaski, Samuel

doi:10.1186/1471-2105-10-S13-P1

Volume 10 Supplement 13

Highlights from the Fifth International Society for Computational Biology (ISCB) Student Council Symposium

Poster presentation
Open access
Published: 19 October 2009

José Caldas¹, Nils Gehlenborg²,³, Ali Faisal¹, Alvis Brazma² & Samuel Kaski¹

BMC Bioinformatics volume 10, Article number: P1 (2009)

Background

Repositories of genome-wide expression studies such as ArrayExpress [1] have grown rapidly over the last few years and continue to do so. The more experimental data are deposited in these repositories, the more likely it becomes that some of them can provide a meaningful biological context to aid in the planning and analysis of new studies. Retrieving experiments based on their textual description and experimental design has two main shortcomings. First, a textual description of an experiment or its results is less information-rich than the data themselves. Second, information about the experimental design alone is of limited use in retrieving biologically relevant data, because it does not reflect the results, which contain the bulk of the information and may reveal unexpected relationships. We introduce novel retrieval methods that incorporate the actual gene expression measurements into the search process, along with visualization tools for interpreting and exploring the results [2].

Methods

We developed a two-stage procedure, first identifying differentially active gene sets in each experiment using a recent nonparametric statistical method [3], and then combining gene set activation patterns into higher-level structures, so-called biological topics, using a state-of-the-art probabilistic model [4]. The probabilistic formulation enables the use of a natural and rigorous metric for assessing the similarity between two experiments. For interpreting and exploring retrieval results, we have developed visualization methods that also provide insight into the model used to perform the retrieval.
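The retrieval step can be illustrated with a small sketch. It assumes that a topic model has already been fitted, so that each experiment has a vector of topic proportions and each topic has a distribution over gene sets. Ranking experiments by the log-likelihood of the query's gene set counts is one common way to score relevance under such a model; it is an assumption here, not necessarily the exact metric used in [2]. All names, gene sets, and numbers below are hypothetical.

```python
import math

def query_log_likelihood(query_counts, theta, phi):
    """Log-probability that an experiment with topic proportions `theta`
    generates the query's gene-set counts (hypothetical scoring rule)."""
    total = 0.0
    for gene_set, count in query_counts.items():
        # Marginalize over topics: p(gene_set | experiment) = sum_k theta[k] * phi[k][gene_set]
        p = sum(t * topic.get(gene_set, 0.0) for t, topic in zip(theta, phi))
        if p <= 0.0:
            return float("-inf")
        total += count * math.log(p)
    return total

def rank_experiments(query_counts, thetas, phi):
    """Return experiment names sorted from most to least relevant to the query."""
    scores = {name: query_log_likelihood(query_counts, theta, phi)
              for name, theta in thetas.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy model: two topics, each a distribution over three gene sets (made-up values).
phi = [
    {"cell_cycle": 0.7, "dna_repair": 0.2, "immune_response": 0.1},
    {"cell_cycle": 0.1, "dna_repair": 0.1, "immune_response": 0.8},
]
# Topic proportions of three stored experiments (made-up values).
thetas = {"exp_A": [0.9, 0.1], "exp_B": [0.2, 0.8], "exp_C": [0.5, 0.5]}
# Query experiment: counts of differentially active gene sets.
query = {"immune_response": 3, "dna_repair": 1}

ranking = rank_experiments(query, thetas, phi)
print(ranking)  # ['exp_B', 'exp_C', 'exp_A']
```

In this toy example the query is dominated by the immune response gene set, so the experiment whose topic proportions favour the second topic ranks first.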

Results

We show that gene sets corresponding to each biological topic form highly coherent and holistic components. Several case studies performed on a subset of ArrayExpress show that, provided sufficient data are available, our method retrieves experiments relevant to a biological question. The case studies also highlight relations between experiments, arising either because the same biological questions were targeted or because of unexpected relationships that were confirmed in the literature. The visualization methods allow us both to interpret the model efficiently and to put retrieval results in the context of the whole set of experiments (see Figure 1 for an example).

Conclusion

Using a combination of existing and novel methods for modeling and visualizing a heterogeneous collection of gene expression experiments, we were able to decompose and relate experiments via biologically meaningful components. Our approach allows search within a gene expression database to be driven by actual measurement data.

References

1. Parkinson H, Kapushesky M, Kolesnikov N, Rustici G, Shojatalab M, Abeygunawardena N, Berube H, Dylag M, Emam I, Farne A: ArrayExpress update – from an archive of functional genomics experiments to the atlas of gene expression. Nucleic Acids Res 2009, 37(Database Issue):D868-D872.
2. Caldas J, Gehlenborg N, Faisal A, Brazma A, Kaski S: Probabilistic retrieval and visualization of biologically relevant microarray experiments. Bioinformatics 2009, 25: i145-i153. 10.1093/bioinformatics/btp215
3. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, et al.: Gene set enrichment analysis – a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 2005, 102: 15545–15550. 10.1073/pnas.0506580102
4. Blei D, Ng A, Jordan MI: Latent Dirichlet Allocation. J Mach Learn Res 2003, 3: 993–1022. 10.1162/jmlr.2003.3.4-5.993
5. Venna J, Kaski S: Nonlinear dimensionality reduction as information retrieval. AISTATS'07 2007.

Acknowledgements

This work was supported by TEKES (grant no. 40101/07). JC, AF and SK are additionally partially supported by PASCAL 2 Network of Excellence, ICT 216886. JC is additionally supported by a doctoral grant from the Portuguese Foundation for Science and Technology (FCT). NG is supported by a PhD fellowship of the European Molecular Biology Laboratory (EMBL).

Author information

Authors and Affiliations

Helsinki Institute for Information Technology, Department of Information and Computer Science, Helsinki University of Technology, P.O. Box 5400, Helsinki, FI-02015, HUT, Finland
José Caldas, Ali Faisal & Samuel Kaski
European Bioinformatics Institute, Cambridge, CB10 1SD, UK
Nils Gehlenborg & Alvis Brazma
Graduate School of Life Sciences, University of Cambridge, Cambridge, CB2 1RX, UK
Nils Gehlenborg

Corresponding author

Correspondence to José Caldas.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Caldas, J., Gehlenborg, N., Faisal, A. et al. Probabilistic retrieval and visualization of biologically relevant microarray experiments. BMC Bioinformatics 10 (Suppl 13), P1 (2009). https://doi.org/10.1186/1471-2105-10-S13-P1
