Linking clusters and feature selection of new methylation classes. Summarization and feature selection of CGI profiles. A) Identification of nine CGI profiles by linking CGI sequence attribute clusters (lower left corner) and methylation clusters (upper left corner) by the probability of the intersection (PI), which is calculated from the hypergeometric distribution (blue color). The attributes were normalized within the colormap intervals. Notably, the relations are built on the PI (line color; dark blue: low p-value; light blue: high p-value), which differs substantially from the typical support-of-intersection measurement (line weight; thin: few observations; thick: many). For example, the fifth relation (fifth column from the left) is supported by only ~40 observations (thin line), but most of its CGI sequence attribute observations correspond to the fourth methylation class and only a few belong to other classes. This approach can therefore generate cohesive relations even when they are not highly supported. The nine methylation profiles are summarized by the similarity of their prototypes, constituting four final methylation classes (I-IV). These classes were used to label all CGI sequence attribute observations. B) Feature selection for each class based on the dataset labeled in A). This process was carried out locally using decision trees (Matlab), in which the desired class (labeled red leaf) was distinguished from all of the others (unlabeled black leaf).
Previti et al. BMC Bioinformatics 2009 10:116 doi:10.1186/1471-2105-10-116
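The distinction drawn in panel A between the PI (line color) and the raw support of an intersection (line weight) can be illustrated with a minimal Python sketch. It assumes the PI is the upper-tail hypergeometric probability of observing at least the given overlap between two clusters; the caption does not specify the exact form used, and the original analysis was done in Matlab. The function name, variable names, and all counts below are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_upper_tail(total, in_meth_cluster, in_seq_cluster, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(total, in_meth_cluster, in_seq_cluster).

    total            -- number of CGI observations overall
    in_meth_cluster  -- observations assigned to one methylation cluster
    in_seq_cluster   -- observations assigned to one sequence attribute cluster
    overlap          -- observations that fall in both clusters
    """
    if not (0 <= in_meth_cluster <= total and 0 <= in_seq_cluster <= total):
        raise ValueError("cluster sizes must lie between 0 and total")
    largest_overlap = min(in_meth_cluster, in_seq_cluster)
    all_draws = comb(total, in_seq_cluster)
    # Sum the probability mass of every overlap at least as large as the observed one.
    favourable = sum(
        comb(in_meth_cluster, i) * comb(total - in_meth_cluster, in_seq_cluster - i)
        for i in range(max(overlap, 0), largest_overlap + 1)
    )
    return favourable / all_draws


# Hypothetical counts, not taken from the paper.
# A small sequence cluster that falls almost entirely into one methylation cluster:
p_small_cluster = hypergeom_upper_tail(1000, 100, 40, 35)
# A large sequence cluster with more overlapping observations, close to chance level:
p_large_cluster = hypergeom_upper_tail(1000, 100, 400, 45)

print(f"small cluster, overlap 35: p = {p_small_cluster:.3g}")
print(f"large cluster, overlap 45: p = {p_large_cluster:.3g}")
```

Under these assumed counts, the relation with fewer supporting observations receives the much lower p-value, which corresponds to the thin but dark blue line described for the fifth relation.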