Automatic workflow for the classification of local DNA conformations
1 Laboratory of Informatics and Chemistry, ICT Prague, Technická 5, Prague 6, 166 28, Czech Republic
2 Department of Computing and Control Engineering, ICT Prague, Technická 5, Prague 6, 166 28, Czech Republic
3 Faculty of Nuclear Sciences and Physical Engineering, CTU Prague, Trojanova 13, Prague 2, 122 00, Czech Republic
4 Institute of Biotechnology AS CR, v. v. i., Vídeňská 1083, Prague 4, 142 00, Czech Republic
BMC Bioinformatics 2013, 14:205 doi:10.1186/1471-2105-14-205. Published: 25 June 2013
A growing number of crystal and NMR structures reveal considerable structural polymorphism in DNA architecture, going well beyond the usual image of a double-helical molecule. DNA is highly variable, with dinucleotide steps exhibiting substantial, sequence-dependent flexibility. Analyzing the conformational space of the DNA backbone and improving our understanding of the conformational dependencies in DNA are therefore important for a full comprehension of DNA structural polymorphism.
A detailed classification of local DNA conformations based on the technique of Fourier averaging was published in our previous work. However, that procedure requires a considerable amount of manual work. To overcome this limitation, we developed an automatic classification method that combines supervised and unsupervised approaches. The proposed workflow consists of the k-NN method followed by a non-hierarchical single-pass clustering algorithm. We applied this workflow to analyze 816 X-ray and 664 NMR DNA structures released up to February 2013. We identified and annotated six new conformers, and we assigned four of them to two structurally important DNA families: guanine quadruplexes and Holliday (four-way) junctions. We also compared the populations of the assigned conformers in the X-ray and NMR datasets.
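The two-stage idea described above (k-NN assignment to known classes, then single-pass clustering of the leftovers) can be illustrated with a minimal Python sketch. It is not the published implementation: the periodic Euclidean distance over torsion angles, the values of `k`, `max_dist` and `threshold`, the rule for flagging a point as unclassifiable, the two-angle toy vectors, and the labels `class_1` and `class_2` are all assumptions made for illustration only.

```python
import math
from collections import Counter


def angular_distance(a, b):
    """Euclidean distance between angle vectors in degrees.

    Assumption: each coordinate is a torsion angle, so differences
    wrap around at 360 degrees.
    """
    total = 0.0
    for x, y in zip(a, b):
        d = abs(x - y) % 360.0
        d = min(d, 360.0 - d)
        total += d * d
    return math.sqrt(total)


def knn_classify(point, training, k=3, max_dist=30.0):
    """Majority label of the k nearest training points.

    Returns None (unclassifiable) when the k-th neighbour is farther
    than max_dist; this rejection rule is a hypothetical choice.
    """
    neighbours = sorted(
        (angular_distance(point, p), label) for p, label in training
    )[:k]
    if neighbours[-1][0] > max_dist:
        return None
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def single_pass_cluster(points, threshold=30.0):
    """Non-hierarchical single-pass (leader) clustering.

    Each point joins the first cluster whose leader lies within
    threshold; otherwise it starts a new cluster.
    """
    clusters = []
    for p in points:
        for c in clusters:
            if angular_distance(p, c["leader"]) <= threshold:
                c["members"].append(p)
                break
        else:
            clusters.append({"leader": p, "members": [p]})
    return clusters


def classify_workflow(points, training, k=3, max_dist=30.0, threshold=30.0):
    """Stage 1: k-NN against known classes. Stage 2: cluster the rest."""
    assigned, leftover = [], []
    for p in points:
        label = knn_classify(p, training, k, max_dist)
        if label is None:
            leftover.append(p)
        else:
            assigned.append((p, label))
    return assigned, single_pass_cluster(leftover, threshold)


# Toy data: made-up two-angle vectors, not real DNA torsions.
training = [
    ((300.0, 180.0), "class_1"),
    ((295.0, 175.0), "class_1"),
    ((305.0, 185.0), "class_1"),
    ((60.0, 200.0), "class_2"),
    ((65.0, 205.0), "class_2"),
    ((55.0, 195.0), "class_2"),
]
queries = [
    (298.0, 178.0),
    (62.0, 202.0),
    (180.0, 60.0),
    (182.0, 62.0),
    (10.0, 350.0),
]

assigned, new_clusters = classify_workflow(queries, training)
for point, label in assigned:
    print(point, "->", label)
for i, cluster in enumerate(new_clusters):
    print("candidate new class", i, "with", len(cluster["members"]), "member(s)")
```

In this sketch, points rejected by the k-NN stage are the only input to the clustering stage, so any cluster it returns is a candidate new conformer that would still need manual inspection and annotation. The outcome of single-pass clustering depends on the order in which points are visited, which is a known property of this family of algorithms.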
In the present work, we developed a machine learning workflow for the automatic classification of dinucleotide conformations. Dinucleotides with unassigned conformations are either classified into one of the 24 already known classes or flagged as unclassifiable. The proposed workflow also permits the identification of new classes among previously unclassifiable data, and we identified and annotated six new conformations in the X-ray structures released since our previous analysis. The results illustrate the utility of machine learning approaches in the classification of local DNA conformations.