Bioinformatics. 1999 Dec;15(12):974-9.

SEGMENT: identifying compositional domains in DNA sequences.

Department of Genetics, Faculty of Sciences, University of Granada, Spain. oliver@ugr.es

MOTIVATION: DNA sequences are formed by patches, or domains, of different nucleotide composition. In a few simple sequences, domains can be identified by eye; however, most DNA sequences show a complex compositional heterogeneity (fractal structure) that current methods cannot properly detect. Recently, a computationally efficient segmentation method for analysing such nonstationary sequence structures, based on the Jensen-Shannon entropic divergence, has been described. Specific algorithms implementing this method are now needed.

RESULTS: Here we describe a heuristic segmentation algorithm for DNA sequences, implemented in a Windows program (SEGMENT). The program divides a DNA sequence into compositionally homogeneous domains by iterating a local optimization procedure at a given statistical significance level. Once a sequence is partitioned into domains, a global measure of sequence compositional complexity (SCC) is derived that accounts for both the sizes and the compositional biases of all the domains in the sequence. SEGMENT computes SCC as a function of the significance level, which provides a multiscale view of sequence complexity.
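To make the idea of Jensen-Shannon segmentation and SCC concrete, here is a minimal Python sketch under several stated assumptions. It is not SEGMENT's algorithm. It uses a plain recursive binary split: find the cut that maximizes the Jensen-Shannon divergence D between the two halves, keep it if significant, and recurse. SEGMENT instead iterates a local optimization. For significance, the sketch applies a naive test. D (in bits) is converted to the likelihood-ratio statistic G = 2N ln2 D of the 2 x 4 count table and compared against a chi-square distribution with 3 degrees of freedom. This test ignores that the cut was chosen to maximize D, so it is anticonservative, and it is not the significance criterion SEGMENT uses. SCC is assumed here to be the entropy of the whole sequence minus the length-weighted mean entropy of its domains. All function names (`best_split`, `segment`, `scc`, `chi2_sf_3df`) and parameters (`alpha`, `min_len`) are hypothetical, and the input is assumed to be uppercase A/C/G/T only.

```python
import math

ALPHABET = "ACGT"  # assumption: input contains only uppercase A, C, G, T
INDEX = {b: k for k, b in enumerate(ALPHABET)}


def counts(seq):
    """Nucleotide counts in ALPHABET order."""
    return [seq.count(b) for b in ALPHABET]


def entropy(cnt, n):
    """Shannon entropy (bits) of the composition given by counts cnt with total n."""
    h = 0.0
    for c in cnt:
        if c:
            p = c / n
            h -= p * math.log2(p)
    return h


def best_split(seq, min_len):
    """Cut i maximizing the Jensen-Shannon divergence D between seq[:i] and seq[i:]."""
    n = len(seq)
    total = counts(seq)
    h_total = entropy(total, n)
    left = [0] * len(ALPHABET)
    best_i, best_d = None, 0.0
    for i in range(1, n - min_len + 1):
        left[INDEX[seq[i - 1]]] += 1  # update left-segment counts incrementally
        if i < min_len:
            continue
        right = [t - l for t, l in zip(total, left)]
        d = h_total - (i / n) * entropy(left, i) - ((n - i) / n) * entropy(right, n - i)
        if d > best_d:
            best_i, best_d = i, d
    return best_i, best_d  # best_i is None when no cut gives D > 0


def chi2_sf_3df(x):
    """P(X > x) for a chi-square variable with 3 degrees of freedom (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def segment(seq, alpha=0.01, min_len=2, offset=0):
    """Recursively split seq; returns (start, end) domains with end exclusive."""
    n = len(seq)
    if n < 2 * min_len:
        return [(offset, offset + n)]
    i, d = best_split(seq, min_len)
    if i is None:
        return [(offset, offset + n)]
    g = 2 * n * math.log(2) * d  # G statistic of the 2 x 4 table (D in bits)
    if chi2_sf_3df(g) >= alpha:  # naive test: ignores maximization over cuts
        return [(offset, offset + n)]
    return (segment(seq[:i], alpha, min_len, offset)
            + segment(seq[i:], alpha, min_len, offset + i))


def scc(seq, domains):
    """Assumed SCC: H(whole) minus the length-weighted mean entropy of the domains."""
    n = len(seq)
    weighted = sum((e - s) / n * entropy(counts(seq[s:e]), e - s) for s, e in domains)
    return entropy(counts(seq), n) - weighted


if __name__ == "__main__":
    seq = "A" * 50 + "C" * 50
    doms = segment(seq)
    print(doms)             # [(0, 50), (50, 100)]
    print(scc(seq, doms))   # 1.0
```

To mimic the multiscale view described in the abstract, you could call `segment` over a range of `alpha` values and record `scc` for each. Stricter levels accept fewer cuts and therefore yield a lower SCC.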

PMID: 10745986 [PubMed - indexed for MEDLINE]