Comparing segmentations by applying randomization techniques
1 HIIT Basic Research Unit, Department of Computer Science, P.O.Box 68, FI-00014 University of Helsinki, Finland
2 Laboratory of Computer and Information Science, Helsinki University of Technology, FI-02015 TKK, Finland
BMC Bioinformatics 2007, 8:171, doi:10.1186/1471-2105-8-171. Published: 23 May 2007
Many segmentation techniques exist for genomic sequences, and a segmentation can be based on any of several biological features. We show how to evaluate and compare the quality of segmentations obtained with different techniques and different biological features.
We apply randomization techniques to evaluate the quality of a given segmentation. Our example applications are isochore detection and the discovery of coding-noncoding structure. We obtain segmentations of the relevant sequences by applying different techniques and by segmenting on alternative features. We show that some of the resulting segmentations are very similar to the underlying true segmentations, and that this similarity is statistically significant. For other segmentations, we show that equally good results are likely to arise by chance.
We introduce a framework for evaluating segmentation quality and demonstrate its use on two examples of segmental genomic structure. The framework replaces visual inspection of segmentations with p-values that quantify the significance of the similarity between segmentations.
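As an illustration of the general idea of a randomization test for segmentation similarity, the following minimal sketch computes an empirical p-value. It is not the procedure of the paper: the distance measure (`boundary_distance`), the null model (boundaries placed uniformly at random, `random_segmentation`), and all function and parameter names are assumptions made for this example only. A segmentation is represented as a sorted list of boundary positions.

```python
import random


def boundary_distance(seg_a, seg_b):
    """Symmetrized mean distance from each boundary to the nearest
    boundary in the other segmentation (an assumed similarity measure)."""
    def one_way(xs, ys):
        return sum(min(abs(x - y) for y in ys) for x in xs) / len(xs)
    return 0.5 * (one_way(seg_a, seg_b) + one_way(seg_b, seg_a))


def random_segmentation(length, n_boundaries, rng):
    """Assumed null model: boundaries drawn uniformly without replacement
    from the interior positions 1 .. length - 1."""
    return sorted(rng.sample(range(1, length), n_boundaries))


def empirical_p_value(candidate, truth, length, n_random=1000, seed=0):
    """Fraction of random segmentations that are at least as close to the
    true segmentation as the candidate is (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = boundary_distance(candidate, truth)
    hits = 0
    for _ in range(n_random):
        rand = random_segmentation(length, len(candidate), rng)
        if boundary_distance(rand, truth) <= observed:
            hits += 1
    return (hits + 1) / (n_random + 1)


if __name__ == "__main__":
    truth = [200, 450, 700, 900]       # hypothetical true boundaries
    candidate = [205, 440, 710, 895]   # hypothetical segmentation to test
    print(empirical_p_value(candidate, truth, length=1000))
```

A small p-value indicates that random segmentations with the same number of boundaries rarely match the true segmentation as well as the candidate does. A large p-value indicates that an equally good match is likely to appear by chance.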