Iterative class discovery and feature selection using Minimal Spanning Trees
* Corresponding author: Sudhir Varma varmas@mail.nih.gov
- Equal contributors
Biometric Research Branch, National Cancer Institute, Rockville, USA
BMC Bioinformatics 2004, 5:126 doi:10.1186/1471-2105-5-126
Published: 8 September 2004
Additional files
Additional File 1:
Comparison of clustering measures. Synthetic data were created with 100 samples and 1000 genes, with clusters embedded in the first 50 genes. The other 950 genes were normally distributed noise. There are three clusters in the first 50 genes: samples 1 through 20, samples 21 through 70, and samples 71 through 100. For each binary partition of the points S1 = {1, 2, ..., i}, S2 = {i+1, i+2, ..., 100}, we calculated the clustering measure. The figure shows the value of the measure at each split point i. The Average Linkage and Xie-Beni [22] measures have weak minima and take extreme values for unbalanced splits. The Log-Likelihood measure performs similarly to the F-S measure but also takes extreme values for unbalanced splits.
Format: EPS Size: 13KB
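The split-point scan described for Additional File 1 can be sketched as follows. This is a minimal illustration, not the authors' code: the cluster mean offsets (`shift`), the random seed, and the function names are assumptions, and the pooled within-cluster sum of squares is used only as a stand-in for the measures compared in the figure, none of which is defined in this section.

```python
import numpy as np

def make_synthetic(n_samples=100, n_genes=1000, n_signal=50, shift=3.0, seed=0):
    """Noise matrix with three clusters planted in the first n_signal genes."""
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((n_samples, n_genes))
    # Cluster boundaries from the text: samples 1-20, 21-70, 71-100.
    labels = np.zeros(n_samples, dtype=int)
    labels[20:70] = 1
    labels[70:] = 2
    # Assumption: cluster k is offset by k * shift in every signal gene.
    data[:, :n_signal] += (labels * shift)[:, None]
    return data, labels

def within_ss(data, i):
    """Stand-in measure for the split S1 = first i samples, S2 = the rest."""
    s1, s2 = data[:i], data[i:]
    return ((s1 - s1.mean(axis=0)) ** 2).sum() + ((s2 - s2.mean(axis=0)) ** 2).sum()

def scan_splits(data):
    """Evaluate the measure at every split point i = 1, ..., n - 1."""
    n = data.shape[0]
    return {i: within_ss(data, i) for i in range(1, n)}

data, labels = make_synthetic()
scores = scan_splits(data)
best = min(scores, key=scores.get)  # split point with the lowest measure
```

Plotting `scores` against the split point gives a curve of the same kind as the one in the figure, although its shape depends on the stand-in measure and the assumed offsets.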
Additional File 2:
Detection of true partition for different data parameters. Sets of synthetic data were generated with 1000 and 10000 total genes, varying the fraction of signal genes ε and the distance between cluster means Dc. The figure shows detection of the planted partition for various values of ε and Dc. Blue points are data sets for which the percentage match between the first discovered partition and the planted partition is less than 75%. Red points are data sets for which the match is greater than 75%. Detection (match > 75%) depends only on the distance between the clusters for both hierarchical and iterative clustering.
Format: EPS Size: 128KB
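The 75% detection criterion for Additional File 2 could be computed along the following lines. The exact definition of "percentage match" is not given in this section, so the sketch assumes it is the fraction of samples assigned to the same side by both binary partitions, taking the better of the two possible labelings because cluster labels are arbitrary. The function names are hypothetical.

```python
import numpy as np

def percent_match(found, planted):
    """Percentage of samples on which two binary partitions agree.

    Assumption: labels are arbitrary, so the match is taken as the larger
    of the agreement and the agreement after swapping one partition's labels.
    """
    found = np.asarray(found, dtype=bool)
    planted = np.asarray(planted, dtype=bool)
    agree = np.mean(found == planted)
    return 100.0 * max(agree, 1.0 - agree)

def is_detected(found, planted, threshold=75.0):
    """Apply the match > 75% criterion described in the text."""
    return percent_match(found, planted) > threshold
```

Under this definition the match for a binary partition never falls below 50%, so the 75% threshold sits midway between chance-level and perfect recovery.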
