Research article (Open Access)

Automatic workflow for the classification of local DNA conformations

Petr Čech1,2, Jaromír Kukal2,3, Jiří Černý4, Bohdan Schneider4* and Daniel Svozil1*

Author Affiliations

1 Laboratory of Informatics and Chemistry, ICT Prague, Technická 5, Prague 6, 166 28, Czech Republic

2 Department of Computing and Control Engineering, ICT Prague, Technická 5, Prague 6, 166 28, Czech Republic

3 Faculty of Nuclear Sciences and Physical Engineering, CTU Prague, Trojanova 13, Prague 2, 122 00, Czech Republic

4 Institute of Biotechnology AS CR, v. v. i., Vídeňská 1083, Prague 4, 142 00, Czech Republic

BMC Bioinformatics 2013, 14:205  doi:10.1186/1471-2105-14-205

Published: 25 June 2013

Abstract

Background

A growing number of crystal and NMR structures reveals considerable structural polymorphism in DNA architecture, going well beyond the usual image of a double-helical molecule. DNA is highly variable, with dinucleotide steps exhibiting substantial sequence-dependent flexibility. Analyzing the conformational space of the DNA backbone and improving our understanding of conformational dependencies in DNA are therefore important for a full comprehension of DNA structural polymorphism.

Results

A detailed classification of local DNA conformations based on the technique of Fourier averaging was published in our previous work. However, that procedure requires a considerable amount of manual work. To overcome this limitation, we developed an automatic classification method that combines supervised and unsupervised approaches. The proposed workflow consists of the k-NN method followed by a non-hierarchical single-pass clustering algorithm. We applied this workflow to analyze 816 X-ray and 664 NMR DNA structures released up to February 2013. We identified and annotated six new conformers, and we assigned four of these conformers to two structurally important DNA families: guanine quadruplexes and Holliday (four-way) junctions. We also compared the populations of the assigned conformers in the X-ray and NMR datasets.
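The two-stage idea described above (k-NN assignment followed by single-pass clustering of whatever remains unassigned) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each dinucleotide is represented as a vector of torsion angles in degrees, that distances are computed with periodic wrap-around, and that a point is rejected when its k-th nearest neighbour lies farther than a threshold. The function names, the toy two-angle data, and the values of `k`, `max_dist`, and `radius` are all hypothetical.

```python
import math
from collections import Counter


def angular_distance(a, b):
    """Euclidean distance between two angle vectors (degrees),
    with each component difference wrapped into [-180, 180)."""
    total = 0.0
    for x, y in zip(a, b):
        d = (x - y + 180.0) % 360.0 - 180.0
        total += d * d
    return math.sqrt(total)


def knn_classify(point, labelled, k=3, max_dist=30.0):
    """Majority label among the k nearest labelled points, or None
    when the k-th neighbour is farther than max_dist (assumed
    rejection rule, chosen only for illustration)."""
    ranked = sorted(labelled, key=lambda item: angular_distance(point, item[0]))
    nearest = ranked[:k]
    if angular_distance(point, nearest[-1][0]) > max_dist:
        return None
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def single_pass_cluster(points, radius=30.0):
    """Leader-style single-pass clustering: each point joins the first
    cluster whose leader is within radius, otherwise it starts a new one."""
    leaders, clusters = [], []
    for p in points:
        for i, leader in enumerate(leaders):
            if angular_distance(p, leader) <= radius:
                clusters[i].append(p)
                break
        else:
            leaders.append(p)
            clusters.append([p])
    return clusters


def classify_workflow(points, labelled, k=3, max_dist=30.0, radius=30.0):
    """Stage 1: k-NN with rejection. Stage 2: cluster the rejected points."""
    assigned, rejected = [], []
    for p in points:
        label = knn_classify(p, labelled, k, max_dist)
        if label is None:
            rejected.append(p)
        else:
            assigned.append((p, label))
    return assigned, single_pass_cluster(rejected, radius)


# Toy data: two known classes, described by two angles each (hypothetical).
labelled = [
    ((60.0, 180.0), "A"), ((62.0, 178.0), "A"), ((58.0, 183.0), "A"),
    ((300.0, 170.0), "B"), ((298.0, 172.0), "B"), ((302.0, 169.0), "B"),
]
queries = [(61.0, 181.0), (299.0, 171.0),
           (179.0, -179.0), (-178.0, 178.0), (0.0, 0.0)]

assigned, new_clusters = classify_workflow(queries, labelled)
print([label for _, label in assigned])
print([len(c) for c in new_clusters])
```

In this toy run the first two queries fall close to the known classes, while the remaining three are rejected and passed to the clustering stage. The points (179, -179) and (-178, 178) end up in the same candidate cluster only because the distance wraps around at ±180°. A single-pass scheme is order-dependent, so in practice the resulting clusters would still need inspection before being treated as new conformers.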

Conclusions

In the present work we developed a machine learning workflow for the automatic classification of dinucleotide conformations. Dinucleotides with unassigned conformations can either be classified into one of the 24 already known classes or be flagged as unclassifiable. The proposed workflow also permits the identification of new classes among data that were so far unclassifiable; using it, we identified and annotated six new conformations in the X-ray structures released since our previous analysis. The results illustrate the utility of machine learning approaches in the classification of local DNA conformations.

Keywords:
DNA; Dinucleotide conformation; Classification; Machine learning; Neural network; RBF; MLP; k-NN; Regularized regression; Cluster analysis