Multiple non-collinear TF-map alignments of promoter regions
1 Grup d'Algorísmica i Genètica. Departament de Llenguatges i Sistemes Informàtics. Universitat Politècnica de Catalunya. C/Jordi Girona, 1–3, 08034 Barcelona, Catalonia, Spain
2 Bioinformatics and Genomics Program, Centre de Regulació Genòmica. Pg. Marítim de la Barceloneta 37–49, 08003 Barcelona, Catalonia, Spain
BMC Bioinformatics 2007, 8:138 doi:10.1186/1471-2105-8-138
Published: 24 April 2007
The analysis of the promoter sequences of genes with similar expression patterns is a basic tool for annotating common regulatory elements. Multiple sequence alignments are the basis of most comparative approaches. However, characterizing the regulatory regions of co-expressed genes at the sequence level often fails to yield satisfactory results, because the promoter regions of genes that share similar expression programs frequently show no nucleotide sequence conservation.
In a recent approach to circumvent this limitation, we proposed aligning the maps of predicted transcription factors (referred to as TF-maps) instead of the nucleotide sequences of two related promoters, taking into account the label of the corresponding factor and its position in the primary sequence. We have now extended the basic algorithm to permit multiple promoter comparisons using the progressive alignment paradigm. In addition, non-collinear conservation blocks can now be identified in the resulting alignments. We have optimized the parameters of the algorithm on a small but well-characterized collection of human-mouse-chicken-zebrafish orthologous gene promoters.
Results on this dataset indicate that TF-map alignments can detect high-level regulatory conservation in the promoter and 3'UTR regions of genes, conservation that typical sequence alignments cannot detect. We present three examples to illustrate the power of multiple TF-map alignments to characterize conserved regulatory elements in the absence of sequence similarity. We believe this kind of approach can be extremely useful for annotating potential transcription factor binding sites in sets of co-regulated genes from high-throughput expression experiments.
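To make the idea of aligning TF-maps by label and position more concrete, the following is a minimal sketch of a pairwise, collinear comparison. It is not the published algorithm: the `Site` record, the `align_tf_maps` function, the scoring (sum of the two site scores) and the spacing penalty `mu` are all hypothetical choices made for illustration, and the progressive multiple alignment and non-collinear extensions are not shown.

```python
from typing import NamedTuple


class Site(NamedTuple):
    """One predicted site in a TF-map (hypothetical record layout)."""
    label: str    # name of the predicted transcription factor
    pos: int      # position of the site in the promoter sequence
    score: float  # score assigned by the site predictor


def align_tf_maps(map_a, map_b, mu=0.01):
    """Return (score, chain) for the best collinear chain of same-label pairs.

    Assumed scoring: each matched pair contributes the sum of its two site
    scores; extending a chain costs mu times the difference in spacing
    between consecutive matches in the two promoters.
    """
    a = sorted(map_a, key=lambda s: s.pos)
    b = sorted(map_b, key=lambda s: s.pos)
    best = {}  # (i, j) -> (chain score ending at this pair, previous pair)

    for i, sa in enumerate(a):
        for j, sb in enumerate(b):
            if sa.label != sb.label:
                continue  # only sites of the same factor can be matched
            top, prev = 0.0, None
            for (pi, pj), (pscore, _) in best.items():
                # Collinearity: the previous pair must lie upstream in both maps.
                if a[pi].pos < sa.pos and b[pj].pos < sb.pos:
                    shift = abs((sa.pos - a[pi].pos) - (sb.pos - b[pj].pos))
                    cand = pscore - mu * shift
                    if cand > top:
                        top, prev = cand, (pi, pj)
            best[(i, j)] = (sa.score + sb.score + top, prev)

    if not best:
        return 0.0, []

    # Trace back from the highest-scoring pair.
    node = max(best, key=lambda k: best[k][0])
    total = best[node][0]
    chain = []
    while node is not None:
        chain.append((a[node[0]], b[node[1]]))
        node = best[node][1]
    chain.reverse()
    return total, chain
```

With two toy maps in which one factor appears downstream in the first promoter and upstream in the second, this collinear sketch leaves that site unaligned. Rearranged blocks of this kind are what a non-collinear alignment is intended to recover.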