Figure 3.

Decision tree generated by training the J48 algorithm on the normal morbidity datasets. See “Methods” for details of the training. The uppermost ellipse is the root node of the tree, which represents the most important condition for discriminating morbid genes from non-morbid genes. In this case, that condition is the number of transcription factors regulating the gene (regin). The remaining ellipses are internal nodes that represent additional conditions for considering a gene as morbid or non-morbid. In the left branch of the tree, these conditions are a central position in a metabolic pathway (inbetmet), the extracellular or plasma membrane localization of the encoded protein, and the tendency of the encoded protein to form clusters with other proteins (c). The rectangles are leaves: they depict groups of genes that, under the conditions represented by the root node and the internal nodes leading to them, are predominantly classified as morbid (True) or non-morbid (Unknown). In the round brackets inside each rectangle, the number before the slash indicates the total number of genes assigned to that leaf, and the number after the slash indicates how many of those genes were incorrectly predicted.
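As an aid to reading the bracketed counts, the following minimal Python sketch converts a leaf label into a per-leaf accuracy. It is an illustration only, not part of the authors' analysis. It assumes that the first number is the count of genes at the leaf and the second is the count misclassified, and that a leaf with no errors may omit the slash and second number. The function names `parse_leaf` and `leaf_accuracy` and the example counts are hypothetical and are not taken from the figure.

```python
import re

# Pattern for a leaf label such as "True (120/15)" or "Unknown (40.0)".
# Assumption: the part after the slash is optional when no gene is misclassified.
_LEAF = re.compile(r"\s*(\S+)\s*\((\d+(?:\.\d+)?)(?:/(\d+(?:\.\d+)?))?\)\s*")


def parse_leaf(label):
    """Return (predicted_class, total_genes, misclassified_genes) for a leaf label."""
    match = _LEAF.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognized leaf label: {label!r}")
    predicted = match.group(1)
    total = float(match.group(2))
    errors = float(match.group(3)) if match.group(3) else 0.0
    return predicted, total, errors


def leaf_accuracy(label):
    """Fraction of genes at the leaf whose true class matches the leaf's class."""
    _, total, errors = parse_leaf(label)
    return (total - errors) / total


# Hypothetical counts, chosen only to show the arithmetic.
print(parse_leaf("True (120/15)"))
print(leaf_accuracy("Unknown (40/10)"))
```

With the hypothetical label `Unknown (40/10)`, 30 of the 40 genes at the leaf would be correctly classified as non-morbid, giving a leaf accuracy of 0.75.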

Costa et al. BMC Genomics 2010, 11(Suppl 5):S9. doi:10.1186/1471-2164-11-S5-S9