Research article

Prediction of TF target sites based on atomistic models of protein-DNA complexes

Vladimir Espinosa Angarica 1,2,3*, Abel González Pérez 4, Ana T Vasconcelos 5, Julio Collado-Vides 3 and Bruno Contreras-Moreira 3,6,7*

Author Affiliations

1 Departamento de Bioquímica y Biología Molecular y Celular, Facultad de Ciencias, Universidad de Zaragoza. Pedro Cerbuna 12, 50009 Zaragoza, España

2 Instituto de Biocomputación y Física de Sistemas Complejos, Universidad de Zaragoza. Corona de Aragón 42 Edificio Cervantes, 50009 Zaragoza, España

3 Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México. Av. Universidad s/n., Colonia Chamilpa 62210, Cuernavaca, Morelos, México

4 Centro Nacional de Bioinformática. Industria y San José, Capitolio Nacional, CP 10200, Habana Vieja, Ciudad de la Habana, Cuba

5 Laboratório Nacional de Computação Científica. Av. Getulio Vargas 333, Quitandinha, CEP 25651-075, Petrópolis, Rio de Janeiro, Brasil

6 Laboratory of Computational Biology, Estación Experimental de Aula Dei, Consejo Superior de Investigaciones Científicas, Av. Montañana 1.005. 50059 Zaragoza, España

7 Fundación ARAID, Paseo María Agustín 36, Zaragoza, España

BMC Bioinformatics 2008, 9:436  doi:10.1186/1471-2105-9-436

Published: 16 October 2008



The specific recognition of genomic cis-regulatory elements by transcription factors (TFs) plays an essential role in the regulation of coordinated gene expression. Studying the mechanisms that determine binding specificity in protein-DNA interactions is thus an important goal. Most current approaches for modeling TF-specific recognition rely on large sets of known cognate target sites and consider only the information contained in their primary sequence.


Here we describe a structure-based methodology for predicting sequence motifs starting from the coordinates of a TF-DNA complex. Our algorithm combines information regarding the direct and indirect readout of DNA into an atomistic statistical model, which is used to estimate the interaction potential. We first measure the ability of our method to correctly estimate the binding specificities of eight prokaryotic and eukaryotic TFs that belong to different structural superfamilies. Secondly, we apply the method to two homology models and find that sampling of interface side-chain rotamers markedly improves the results. Thirdly, we compare the algorithm with a reference structural method based on contact counts, obtaining comparable predictions for the experimental complexes and more accurate sequence motifs for the homology models.
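As an illustration only, the following minimal Python sketch shows one way that per-position, per-base scores from direct and indirect readout could be combined and turned into a sequence motif. It is not the implementation described in this article: the Boltzmann weighting, the function names (`combine_readouts`, `energies_to_motif`, `consensus`), the `weight` and `kT` parameters and all numerical values are assumptions chosen for the example.

```python
import math

BASES = "ACGT"

def combine_readouts(direct, indirect, weight=1.0):
    """Add direct and (weighted) indirect readout scores, position by position.

    Both inputs are lists of rows, one row per DNA position, with one
    hypothetical energy-like score per base in the order A, C, G, T.
    """
    return [[d + weight * i for d, i in zip(d_row, i_row)]
            for d_row, i_row in zip(direct, indirect)]

def energies_to_motif(energies, kT=1.0):
    """Convert energy-like scores to base probabilities with Boltzmann weights.

    This conversion is an assumption of the sketch, not the article's method.
    """
    motif = []
    for row in energies:
        lowest = min(row)  # shift by the minimum for numerical stability
        weights = [math.exp(-(e - lowest) / kT) for e in row]
        total = sum(weights)
        motif.append([w / total for w in weights])
    return motif

def consensus(motif):
    """Return the most probable base at each position."""
    return "".join(BASES[max(range(4), key=lambda b: row[b])] for row in motif)

# Toy scores for a three-position site (invented values, lower is better).
direct = [[-2.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, -1.5, 0.0],
          [0.0, -1.0, 0.0, -1.0]]
indirect = [[-0.5, 0.0, 0.0, 0.0],
            [0.0, 0.0, -0.5, 0.0],
            [0.0, 0.0, 0.0, -0.2]]

motif = energies_to_motif(combine_readouts(direct, indirect))
print(consensus(motif))  # prints "AGT"
```

In this toy case the third position is degenerate under direct readout alone (C and T score equally), and the indirect term breaks the tie in favour of T, which is the kind of complementary signal the two readout mechanisms are expected to provide.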


Our results demonstrate that atomic-detail structural information can feasibly be used to predict TF binding sites. The computational method presented here is universal and might be applied to other systems involving protein-DNA recognition.