Open Access Highly Accessed Research article

Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data

Ian B Jeffery1*, Desmond G Higgins1 and Aedín C Culhane2

Author Affiliations

1 Bioinformatics, Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland

2 Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Mayer 232, 44 Binney Street, Boston, MA 02115, USA

BMC Bioinformatics 2006, 7:359  doi:10.1186/1471-2105-7-359

Published: 26 July 2006

Additional files

Additional File 1:

Overlap in gene lists produced by different feature selection methods where n = 5 samples per class. Each feature selection method was applied to datasets containing 5 samples per class. The overlap of genes ranked in the top 100 by each method was compared using a binary distance metric. Dendrograms show the results of average linkage hierarchical cluster analysis of these scores for each dataset. Percentage matrices below each of the dendrograms show the percentage similarity between each pair of feature selection methods.

Format: PDF Size: 43KB Download file

This file can be viewed with: Adobe Acrobat Reader

Open Data
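The comparison described for the overlap files can be illustrated with a minimal sketch. It assumes that the binary distance is the Jaccard-style distance 1 - |A ∩ B| / |A ∪ B| between two top-ranked gene lists, which is a common definition but is not spelled out on this page. The method names, gene identifiers and function names below are hypothetical, and the tiny lists stand in for the top 100 genes per method.

```python
from itertools import combinations


def binary_distance(list_a, list_b):
    """Assumed binary (Jaccard-style) distance: 1 - |A & B| / |A | B|."""
    a, b = set(list_a), set(list_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(names, dist):
    """Agglomerative clustering with average linkage.

    `dist` maps frozenset({name1, name2}) to a distance.
    Returns the merges in order as (cluster_a, cluster_b, height).
    """
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Average of all pairwise distances between the two clusters.
            total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
            height = total / (len(c1) * len(c2))
            if best is None or height < best[0]:
                best = (height, c1, c2)
        height, c1, c2 = best
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
        merges.append((c1, c2, height))
    return merges


# Hypothetical top-ranked gene lists (4 genes each instead of 100).
top_genes = {
    "method_A": ["g1", "g2", "g3", "g4"],
    "method_B": ["g1", "g2", "g3", "g5"],
    "method_C": ["g1", "g6", "g7", "g8"],
}

distances = {
    frozenset((m1, m2)): binary_distance(top_genes[m1], top_genes[m2])
    for m1, m2 in combinations(top_genes, 2)
}
merges = average_linkage(list(top_genes), distances)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

The merge heights are what a dendrogram would plot on its vertical axis: methods whose top lists share most genes join at a low height, and methods with little overlap join near 1.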

Additional File 2:

Overlap in gene lists produced by different feature selection methods where n = 10 samples per class. Each feature selection method was applied to datasets containing 10 samples per class. The overlap of genes ranked in the top 100 by each method was compared using a binary distance metric. Dendrograms show the results of average linkage hierarchical cluster analysis of these scores for each dataset. Percentage matrices below each of the dendrograms show the percentage similarity between each pair of feature selection methods.

Format: PDF Size: 43KB Download file

Additional File 3:

Overlap in gene lists produced by different feature selection methods where n = 50% of the samples per class. Each feature selection method was applied to datasets containing 50% of the samples per class. The overlap of genes ranked in the top 100 by each method was compared using a binary distance metric. Dendrograms show the results of average linkage hierarchical cluster analysis of these scores for each dataset. Percentage matrices below each of the dendrograms show the percentage similarity between each pair of feature selection methods.

Format: PDF Size: 44KB Download file

Additional File 4:

Overlap in gene lists produced by different feature selection methods when applied to each dataset. Each feature selection method was applied to each of the full datasets. The overlap of genes ranked in the top 100 by each method was compared using a binary distance metric. Dendrograms show the results of average linkage hierarchical cluster analysis of these scores for each dataset. Percentage matrices below each of the dendrograms show the percentage similarity between each pair of feature selection methods.

Format: PDF Size: 42KB Download file

Additional File 5:

The RCI scores for each of the individual datasets and individual classification methods where the top 80 genes are used and n = 5 samples per class. RCI values showing the success of the top 80 genes, selected by the feature selection methods, to form classifiers which can predict the class of blind test data for each of the 9 datasets. These figures show the results for each of the classification methods when a reduced training set of 10 (5 from each class) is used.

Format: PDF Size: 62KB Download file
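The reduced training sets mentioned in these captions (for example 5 samples from each class, with the remaining samples held out as blind test data) can be sketched as follows. This is only an illustration: it assumes the training samples are drawn at random within each class, which this page does not state, and the function name, class labels and seed are hypothetical.

```python
import random


def split_per_class(labels, n_per_class, seed=0):
    """Pick n_per_class training indices from each class; the rest are test.

    Random sampling within each class is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train = []
    for label, indices in by_class.items():
        if len(indices) <= n_per_class:
            raise ValueError(f"class {label!r} has too few samples to hold any out")
        train.extend(rng.sample(indices, n_per_class))

    train_set = set(train)
    test = [i for i in range(len(labels)) if i not in train_set]
    return sorted(train), test


# Hypothetical two-class dataset with 12 and 9 samples.
labels = ["tumour"] * 12 + ["normal"] * 9
train_idx, test_idx = split_per_class(labels, n_per_class=5)
print(len(train_idx), len(test_idx))
```

With n_per_class set to 5 or 10 this gives training sets of 10 or 20 samples, matching the sizes quoted in the captions.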

Additional File 6:

The RCI scores for each of the individual datasets and individual classification methods where the top 80 genes are used and n = 10 samples per class. RCI values showing the success of the top 80 genes, selected by the feature selection methods, to form classifiers which can predict the class of blind test data for each of the 9 datasets. These figures show the results for each of the classification methods when a reduced training set of 20 (10 from each class) is used.

Format: PDF Size: 62KB Download file

Additional File 7:

The RCI scores for each of the individual datasets and individual classification methods where the top 80 genes are used and n = 50% of the samples per class. RCI values showing the success of the top 80 genes, selected by the feature selection methods, to form classifiers which can predict the class of blind test data for each of the 9 datasets. These figures show the results for each of the classification methods when each dataset is split equally into training and test sets.

Format: PDF Size: 61KB Download file

Additional File 8:

The RCI scores for each of the individual datasets and individual classification methods where the top 40 genes are used and n = 5 samples per class. RCI values showing the success of the top 40 genes, selected by the feature selection methods, to form classifiers which can predict the class of blind test data for each of the 9 datasets. These figures show the results for each of the classification methods when a reduced training set of 10 (5 from each class) is used.

Format: PDF Size: 62KB Download file

Additional File 9:

The RCI scores for each of the individual datasets and individual classification methods where the top 40 genes are used and n = 10 samples per class. RCI values showing the success of the top 40 genes, selected by the feature selection methods, to form classifiers which can predict the class of blind test data for each of the 9 datasets. These figures show the results for each of the classification methods when a reduced training set of 20 (10 from each class) is used.

Format: PDF Size: 61KB Download file

Additional File 10:

The RCI scores for each of the individual datasets and individual classification methods where the top 40 genes are used and n = 50% of the samples per class. RCI values showing the success of the top 40 genes, selected by the feature selection methods, to form classifiers which can predict the class of blind test data for each of the 9 datasets. These figures show the results for each of the classification methods when each dataset is split equally into training and test sets.

Format: PDF Size: 61KB Download file

Additional File 11:

The RCI scores for each of the individual datasets and individual classification methods where the top 20 genes are used and n = 5 samples per class. RCI values showing the success of the top 20 genes, selected by the feature selection methods, to form classifiers which can predict the class of blind test data for each of the 9 datasets. These figures show the results for each of the classification methods when a reduced training set of 10 (5 from each class) is used.

Format: PDF Size: 62KB Download file

Additional File 12:

The RCI scores for each of the individual datasets and individual classification methods where the top 20 genes are used and n = 10 samples per class. RCI values showing the success of the top 20 genes, selected by the feature selection methods, to form classifiers which can predict the class of blind test data for each of the 9 datasets. These figures show the results for each of the classification methods when a reduced training set of 20 (10 from each class) is used.

Format: PDF Size: 61KB Download file

Additional File 13:

The RCI scores for each of the individual datasets and individual classification methods where the top 20 genes are used and n = 50% of the samples per class. RCI values showing the success of the top 20 genes, selected by the feature selection methods, to form classifiers which can predict the class of blind test data for each of the 9 datasets. These figures show the results for each of the classification methods when each dataset is split equally into training and test sets.

Format: PDF Size: 61KB Download file

Additional File 14:

The percentage accuracy scores for each of the individual datasets and individual classification methods where the top 80 genes are used and n = 5 samples per class. The percentage accuracy of the top 80 genes, selected by the feature selection methods, to form classifiers which can predict the class of blind test data for each of the 9 datasets. These figures show the results for each of the classification methods when a reduced training set of 10 (5 from each class) is used.

Format: PDF Size: 62KB Download file

Additional File 15:

The percentage accuracy scores for each of the individual datasets and individual classification methods where the top 80 genes are used and n = 10 samples per class. The percentage accuracy of the top 80 genes, selected by the feature selection methods, to form classifiers which can predict the class of blind test data for each of the 9 datasets. These figures show the results for each of the classification methods when a reduced training set of 20 (10 from each class) is used.

Format: PDF Size: 62KB Download file

Additional File 16:

The percentage accuracy scores for each of the individual datasets and individual classification methods where the top 80 genes are used and n = 50% of the samples per class. The percentage accuracy of the top 80 genes, selected by the feature selection methods, to form classifiers which can predict the class of blind test data for each of the 9 datasets. These figures show the results for each of the classification methods when each dataset is split equally into training and test sets.

Format: PDF Size: 61KB Download file

Additional File 17:

The percentage accuracy scores for each of the individual datasets and individual classification methods where the top 40 genes are used and n = 5 samples per class. The percentage accuracy of the top 40 genes, selected by the feature selection methods, to form classifiers which can predict the class of blind test data for each of the 9 datasets. These figures show the results for each of the classification methods when a reduced training set of 10 (5 from each class) is used.

Format: PDF Size: 76KB Download file

Additional File 18:

The percentage accuracy scores for each of the individual datasets and individual classification methods where the top 40 genes are used and n = 10 samples per class. The percentage accuracy of the top 40 genes, selected by the feature selection methods, to form classifiers which can predict the class of blind test data for each of the 9 datasets. These figures show the results for each of the classification methods when a reduced training set of 20 (10 from each class) is used.

Format: PDF Size: 61KB Download file

Additional File 19:

The percentage accuracy scores for each of the individual datasets and individual classification methods where the top 40 genes are used and n = 50% of the samples per class. The percentage accuracy of the top 40 genes, selected by the feature selection methods, to form classifiers which can predict the class of blind test data for each of the 9 datasets. These figures show the results for each of the classification methods when each dataset is split equally into training and test sets.

Format: PDF Size: 59KB Download file

Additional File 20:

The percentage accuracy scores for each of the individual datasets and individual classification methods where the top 20 genes are used and n = 5 samples per class. The percentage accuracy of the top 20 genes, selected by the feature selection methods, to form classifiers which can predict the class of blind test data for each of the 9 datasets. These figures show the results for each of the classification methods when a reduced training set of 10 (5 from each class) is used.

Format: PDF Size: 62KB Download file

Additional File 21:

The percentage accuracy scores for each of the individual datasets and individual classification methods where the top 20 genes are used and n = 10 samples per class. The percentage accuracy of the top 20 genes, selected by the feature selection methods, to form classifiers which can predict the class of blind test data for each of the 9 datasets. These figures show the results for each of the classification methods when a reduced training set of 20 (10 from each class) is used.

Format: PDF Size: 62KB Download file

Additional File 22:

The percentage accuracy scores for each of the individual datasets and individual classification methods where the top 20 genes are used and n = 50% of the samples per class. The percentage accuracy of the top 20 genes, selected by the feature selection methods, to form classifiers which can predict the class of blind test data for each of the 9 datasets. These figures show the results for each of the classification methods when each dataset is split equally into training and test sets.

Format: PDF Size: 61KB Download file
