Email updates

Keep up to date with the latest news and content from BMC Bioinformatics and BioMed Central.

This article is part of the supplement: Proceedings of the 21st International Conference on Genome Informatics (GIW2010)

Open Access Research

Integer programming-based method for grammar-based tree compression and its application to pattern extraction of glycan tree structures

Yang Zhao, Morihiro Hayashida and Tatsuya Akutsu*

Author Affiliations

Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, 611-0011, Japan

For all author emails, please log on.

BMC Bioinformatics 2010, 11(Suppl 11):S4  doi:10.1186/1471-2105-11-S11-S4

Published: 14 December 2010

Abstract

Background

A bisection-type algorithm for the grammar-based compression of tree-structured data has been proposed recently. In this framework, an elementary ordered-tree grammar (EOTG) and an elementary unordered-tree grammar (EUTG) were defined, and an approximation algorithm was proposed.

Results

In this paper, we propose an integer programming-based method that finds the minimum context-free grammar (CFG) for a given string under the condition that at most two symbols appear on the right-hand side of each production rule. Next, we extend this method to find the minimum EOTG and EUTG grammars for given ordered and unordered trees, respectively. Then, we conduct computational experiments for the ordered and unordered artificial trees. Finally, we apply our methods to pattern extraction of glycan tree structures.

Conclusions

We propose integer programming-based methods that find the minimum CFG, EOTG, and EUTG for given strings, ordered and unordered trees. Our proposed methods for trees are useful for extracting patterns of glycan tree structures.