Integrating automated literature searches and text mining in biomarker discovery

Ongenaert, Maté; Dehaspe, Luc

doi:10.1186/1471-2105-11-S5-O5

Volume 11 Supplement 5

Workshop on Advances in Bio Text Mining

Oral presentation
Open access
Published: 06 October 2010

BMC Bioinformatics volume 11, Article number: O5 (2010)

Background

Epigenetics, and more specifically DNA methylation, is a fast-evolving research area. In almost every cancer type, new publications appear each month that confirm the differential regulation of specific genes due to methylation and report novel methylation markers. Over the last decade, high-throughput methodologies have frequently been used to discover such methylation biomarkers. Examples of such analyses are re-expression experiments (treatment with the demethylating agent 5-Aza-2′-deoxycytidine, followed by expression microarray analysis), CpG microarrays such as the Illumina HumanMethylation27 BeadChip, and large-scale bisulfite sequencing.

To evaluate and prioritize candidate methylation biomarkers, a literature search is a good starting point. However, manual searches are time-consuming (hundreds of genes have to be searched, taking all their aliases into account), and summarizing the retrieved references is a real challenge. An annotated, reviewed, sorted and summarized overview of all published data on methylation in cancer would therefore be extremely useful.

Results

In a first stage, an automated literature retrieval and annotation tool, code-named GoldMine, was created. This web-based application accepts a list of genes, keywords and highlighting terms. For each gene, all of its aliases are combined with the keywords to search PubMed abstracts. Gene aliases, keywords and highlighting terms are highlighted in different colors, as are sentences that contain both a gene alias and a keyword. Each abstract is assigned a score, and abstracts are presented in order of decreasing score.
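
The abstract does not describe how GoldMine builds its queries or computes its scores. As a minimal sketch, assuming PubMed's boolean syntax with the `[tiab]` (title/abstract) field tag and a hypothetical scoring scheme (one point per alias or keyword hit, plus a bonus for every sentence mentioning both), the search-and-rank step could look like this. All function names and weights are illustrative, not GoldMine's own.

```python
from __future__ import annotations

import re


def build_query(aliases: list[str], keywords: list[str]) -> str:
    """PubMed-style boolean query: any gene alias AND any keyword, in title/abstract."""
    alias_part = " OR ".join(f'"{a}"[tiab]' for a in aliases)
    keyword_part = " OR ".join(f'"{k}"[tiab]' for k in keywords)
    return f"({alias_part}) AND ({keyword_part})"


def term_pattern(terms: list[str]) -> re.Pattern:
    """Case-insensitive whole-word matcher; longer terms are tried first.
    Assumes a non-empty list of terms that start and end with word characters."""
    alternatives = [re.escape(t) for t in sorted(terms, key=len, reverse=True)]
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def score_abstract(text: str, aliases: list[str], keywords: list[str],
                   co_bonus: int = 5) -> tuple[int, list[str]]:
    """Hypothetical score: 1 point per alias or keyword hit, plus `co_bonus`
    for each sentence mentioning both a gene alias and a keyword.
    Also returns those co-occurrence sentences (the ones to highlight)."""
    gene_re, kw_re = term_pattern(aliases), term_pattern(keywords)
    score, co_sentences = 0, []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        gene_hits = len(gene_re.findall(sentence))
        kw_hits = len(kw_re.findall(sentence))
        score += gene_hits + kw_hits
        if gene_hits and kw_hits:
            score += co_bonus
            co_sentences.append(sentence)
    return score, co_sentences


def rank_abstracts(abstracts: dict[str, str], aliases: list[str],
                   keywords: list[str]) -> list[tuple[str, int]]:
    """Return (abstract id, score) pairs in order of decreasing score."""
    scored = [(aid, score_abstract(text, aliases, keywords)[0])
              for aid, text in abstracts.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Trying longer terms first keeps a keyword such as "hypermethylation" from being counted a second time as "methylation".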

Based on this framework, a cancer methylation database was created: PubMeth (Figure 1). PubMeth [1] contains genes that are reported to be methylated in various cancer types. A query can be based either on genes (to check in which cancer types the genes are reported as methylated) or on cancer types (to find which genes are reported as methylated in the cancer (sub)types of interest).
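
The two query directions can be illustrated with a minimal in-memory sketch. The `Annotation` record, the `MethylationIndex` class and the subtype map are hypothetical: PubMeth's actual schema is not described here, and the identifiers in the usage comment are placeholders rather than curated data.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One curated statement: `gene` reported methylated in `cancer_type` by `pmid`."""
    gene: str
    cancer_type: str
    pmid: str


class MethylationIndex:
    """Hypothetical two-way index over curated annotations."""

    def __init__(self, annotations: list[Annotation],
                 subtype_of: dict[str, str] | None = None):
        # gene -> cancer type -> supporting PubMed IDs
        self.by_gene = defaultdict(lambda: defaultdict(set))
        # cancer type -> genes reported methylated in it
        self.by_cancer = defaultdict(set)
        # child subtype -> parent type, e.g. {"glioblastoma": "glioma"}
        self.subtype_of = subtype_of or {}
        for a in annotations:
            self.by_gene[a.gene][a.cancer_type].add(a.pmid)
            self.by_cancer[a.cancer_type].add(a.gene)

    def cancers_for_gene(self, gene: str) -> dict[str, list[str]]:
        """Gene query: cancer types in which `gene` is reported methylated, with evidence."""
        return {cancer: sorted(pmids)
                for cancer, pmids in self.by_gene.get(gene, {}).items()}

    def _with_subtypes(self, cancer: str) -> set[str]:
        """`cancer` plus all of its (transitive) subtypes."""
        found, frontier = {cancer}, [cancer]
        while frontier:
            parent = frontier.pop()
            for child, p in self.subtype_of.items():
                if p == parent and child not in found:
                    found.add(child)
                    frontier.append(child)
        return found

    def genes_for_cancer(self, cancer: str) -> list[str]:
        """Cancer-type query: genes reported methylated in `cancer` or any subtype."""
        genes = set()
        for c in self._with_subtypes(cancer):
            genes |= self.by_cancer.get(c, set())
        return sorted(genes)


# Usage with a placeholder record (illustrative only, not curated PubMeth data):
# idx = MethylationIndex([Annotation("MGMT", "glioblastoma", "ref1")],
#                        subtype_of={"glioblastoma": "glioma"})
# idx.genes_for_cancer("glioma")  -> ["MGMT"]
```

Keeping the supporting PubMed IDs for each gene/cancer pair means every answer can be traced back to the abstracts a curator annotated.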

More recently, in the context of the SBO project on Functional Peptidomics, the MouseMining tool was developed to further exploit PubMeth results and comparable literature summary data by combining them with experimental data. In a prototypical application, MouseMining was used to correlate co-occurrence statistics of anatomical categories and disease names with the expression profiles of candidate biomarkers.
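
The abstract does not specify which statistics MouseMining computes. Assuming each abstract has already been reduced to the set of anatomical and disease terms it mentions, one simple version counts, per anatomical category, the abstracts that co-mention a given disease, and then computes a Pearson correlation between those counts and a biomarker's expression in the same categories. The function names are hypothetical, and Pearson correlation is a stand-in for whatever measure the tool actually uses.

```python
from __future__ import annotations

import math


def cooccurrence_counts(docs: list[set[str]], disease: str,
                        categories: list[str]) -> list[int]:
    """For each anatomical category, count the documents that mention it together with `disease`."""
    return [sum(1 for terms in docs if disease in terms and cat in terms)
            for cat in categories]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)
```

With real data, the expression vector would be the candidate biomarker's measured profile across the same anatomical categories, listed in the same order as `categories`.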

Conclusions

The resulting cancer methylation database is freely accessible at http://www.pubmeth.org. PubMeth is based on text mining of Medline/PubMed abstracts, combined with manual reading and annotation of preselected abstracts. The text mining approach increases speed and sensitivity (for instance, many different aliases of a gene are searched at once), while manual screening significantly raises the specificity and quality of the database. The summarized overview of the results is particularly useful when several genes or cancer types are searched at the same time.
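
The division of labour between automated retrieval and manual screening can be made measurable with standard precision and recall. This is a minimal sketch with toy identifiers: broad alias-based retrieval tends to raise recall, and a curator's accept/reject decisions then remove false positives, raising precision. The function names and the `verdicts` mapping are hypothetical, and the toy usage assumes an error-free curator.

```python
from __future__ import annotations


def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision and recall of a retrieved set against a reference set."""
    tp = len(retrieved & relevant)
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall


def curate(candidates: set[str], verdicts: dict[str, bool]) -> set[str]:
    """Keep only candidates a curator explicitly accepted; unreviewed ones are dropped."""
    return {c for c in candidates if verdicts.get(c, False)}


relevant = {"a", "b", "c"}             # toy: abstracts that truly report methylation
by_alias = {"a", "b", "c", "x", "y"}   # toy: automated search over all aliases
print(precision_recall(by_alias, relevant))   # (0.6, 1.0)
curated = curate(by_alias, {"a": True, "b": True, "c": True, "x": False})
print(precision_recall(curated, relevant))    # (1.0, 1.0)
```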

References

[1] Ongenaert M, Van Neste L, De Meyer T, Menschaert G, Bekaert S, Van Criekinge W: PubMeth: a cancer methylation database combining text mining and expert annotation. Nucleic Acids Res 2008, 36: D842-D846. doi:10.1093/nar/gkm788

Author information

Authors and Affiliations

OncoMethylome Sciences, 4000 Liege, Belgium
Maté Ongenaert & Luc Dehaspe

Corresponding author

Correspondence to Maté Ongenaert.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Ongenaert, M., Dehaspe, L. Integrating automated literature searches and text mining in biomarker discovery. BMC Bioinformatics 11 (Suppl 5), O5 (2010). https://doi.org/10.1186/1471-2105-11-S5-O5
