Open Access Technical Note

Entropic Profiler – detection of conservation in genomes using information theory

Francisco Fernandes (1), Ana T Freitas (1,2), Jonas S Almeida (3) and Susana Vinga (1,4)

(1) Instituto de Engenharia de Sistemas e Computadores: Investigação e Desenvolvimento (INESC-ID), R Alves Redol 9, 1000-029 Lisboa, Portugal

(2) Instituto Superior Técnico – Universidade Técnica de Lisboa (IST/UTL), Av Rovisco Pais, 1049-001 Lisboa, Portugal

(3) Dept Biostat Appl Math, Univ Texas MDAnderson Cancer Center – unit 447, 1515 Holcombe Blvd, Houston TX 77030-4009, USA

(4) Dept Bioestatística e Informática, Faculdade de Ciências Médicas – Universidade Nova de Lisboa (FCM/UNL), Campo Mártires da Pátria 130, 1169-056 Lisboa, Portugal


BMC Research Notes 2009, 2:72. doi:10.1186/1756-0500-2-72

Published: 5 May 2009

Abstract

Background

Over the last decades, as whole genome sequences have become available, much research effort has gone into modelling DNA mathematically. Entropic Profiles (EP) were recently proposed as a new measure of the continuous entropy of genome sequences. EP are local information plots related to DNA randomness, grounded in information theory and statistics. They express the weighted relative abundance of motifs at each position in a genome. Their study is relevant because under- or over-represented segments are often associated with significant biological meaning.
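To make "weighted relative abundance of motifs at each position" more concrete, the following Python sketch scores a position by the counts of the motifs of length 1 to `max_len` that end there. It is a minimal illustration only: the function names, the parameters `max_len` and `phi`, the weighting by `(4 * phi) ** k` and the normalisation are assumptions made for this example and are not taken from the abstract. Motifs are counted by brute force here, not with the suffix trees the tool uses.

```python
from collections import Counter


def motif_counts(seq, max_len):
    """Count every substring of length 1..max_len in seq (brute force)."""
    counts = Counter()
    for k in range(1, max_len + 1):
        for start in range(len(seq) - k + 1):
            counts[seq[start:start + k]] += 1
    return counts


def weighted_abundance(seq, pos, max_len, phi, counts=None):
    """Illustrative score for the 0-based position `pos` (assumed formula).

    For k = 1..max_len, take the count of the length-k motif ending at
    `pos`, weight it by (4 * phi) ** k, then normalise by the sequence
    length and by the sum of phi ** k for k = 0..max_len.
    """
    if counts is None:
        counts = motif_counts(seq, max_len)
    n = len(seq)
    total = 0.0
    for k in range(1, max_len + 1):
        start = pos - k + 1
        if start < 0:
            break  # the motif would extend past the start of the sequence
        total += (4 * phi) ** k * counts[seq[start:pos + 1]]
    norm = sum(phi ** k for k in range(max_len + 1))
    return (1 + total / n) / norm


if __name__ == "__main__":
    seq = "ACGACGACGT"
    counts = motif_counts(seq, 3)
    for pos in range(len(seq)):
        print(pos, seq[pos], round(weighted_abundance(seq, pos, 3, 0.5, counts), 3))
```

In this toy sequence the position ending the repeated motif `ACG` scores higher than the final `T`, whose suffixes occur only once, which is the kind of over-representation such a profile is meant to highlight.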

Findings

The Entropic Profiler application presented here is a new tool designed to detect and extract under- and over-represented DNA segments in genomes using EP. It computes EP efficiently by means of improved algorithms and data structures, including modified suffix trees. It is available through a web interface (http://kdbio.inesc-id.pt/software/ep/) and as downloadable source code, and lets users study individual positions and search for motifs across the whole sequence or within a specified range. DNA sequences can be entered from different sources, including FASTA files, pre-loaded examples, or a previously saved session. In addition to the EP value plots, the tool computes p-values and z-scores for each motif, along with the Chaos Game Representation of the sequence.
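The Chaos Game Representation mentioned above maps a DNA sequence to points in the unit square by repeatedly moving halfway towards the corner assigned to each base. The Python sketch below shows the general technique, not the tool's own implementation. The corner assignment and the function name `cgr_points` are assumptions for this example, and conventions for the corners differ between authors.

```python
# Corner of the unit square assigned to each base (assumed convention).
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(seq):
    """Return the list of CGR coordinates, one per recognised base."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point to the base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in cgr_points("ACGT"):
        print(point)
```

Positions that share the same preceding motif land in the same sub-square of the plot, so dense regions of a CGR correspond to frequent motifs.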

Conclusion

EP are directly related to the statistical significance of motifs and can be considered a new method to extract and classify significant regions in genomes and to estimate local scales in DNA. The present implementation provides an efficient and useful tool for whole genome analysis.

