Open Access Research article

Missing value imputation for epistatic MAPs

Colm Ryan1*, Derek Greene1, Gerard Cagney2 and Pádraig Cunningham1

Author Affiliations

1 School of Computer Science and Informatics, University College Dublin, Dublin, Ireland

2 Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland

BMC Bioinformatics 2010, 11:197  doi:10.1186/1471-2105-11-197

Published: 20 April 2010

Additional files

Additional file 1:

A table in pdf format showing the percentage of each type of data missing in the five datasets.

Format: PDF Size: 27KB

Open Data

Additional file 2:

A table in pdf format, containing accuracy figures for two alternative simple imputation methods - 'Gene Means' and 'Medians'.

Format: PDF Size: 44KB

Additional file 3:

An image in pdf format, showing the accuracy of KNNImpute with respect to the choice of K. This was generated using a symmetric implementation of the KNNImpute algorithm described in Troyanskaya et al. Neighbors are weighted in direct proportion to their similarity to the query gene, where similarity is measured using correlation. Unlike our wNN approach, which uses a different weighting scheme, KNNImpute remains very sensitive to the choice of K.

Format: PDF Size: 1.1MB
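
The weighting described for Additional file 3 can be illustrated with a minimal Python sketch. This is not the authors' code (that is in Additional file 4). It assumes a symmetric matrix stored as a list of lists with None for missing values, that "symmetric" means averaging the estimates obtained from each of the two genes in a pair, and that only positively correlated neighbors contribute. The function names pearson, estimate and impute_symmetric are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation over positions observed in both profiles."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(pairs) < 3:  # too few shared values to be meaningful (assumption)
        return 0.0
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def estimate(m, gene, partner, k):
    """Similarity-weighted mean of the scores that the k genes most
    similar to `gene` have with `partner`."""
    candidates = []
    for other in range(len(m)):
        if other in (gene, partner) or m[other][partner] is None:
            continue
        sim = pearson(m[gene], m[other])
        if sim > 0:  # assumption: ignore uncorrelated/anti-correlated genes
            candidates.append((sim, m[other][partner]))
    top = sorted(candidates, reverse=True)[:k]
    total = sum(sim for sim, _ in top)
    return sum(sim * v for sim, v in top) / total if total else None


def impute_symmetric(m, k):
    """Fill each missing off-diagonal pair (i, j) with the mean of the
    estimates made from gene i and from gene j, keeping m symmetric."""
    out = [row[:] for row in m]
    for i in range(len(m)):
        for j in range(i + 1, len(m)):
            if m[i][j] is None:
                ests = [e for e in (estimate(m, i, j, k),
                                    estimate(m, j, i, k)) if e is not None]
                if ests:
                    out[i][j] = out[j][i] = sum(ests) / len(ests)
    return out
```

Because the weights are raw similarities, every one of the K selected neighbors contributes, which is one plausible reason the result depends strongly on K.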

Additional file 4:

A zip file containing Python code implementing the nearest neighbor algorithms described in this article, instructions for its use, and a sample input file. This file is provided to ensure that the code remains available for as long as the journal article. However, the authors request that those wishing to use the code visit [21], where any updates to the code will be made available.

Format: ZIP Size: 609KB

Additional file 5:

An image in pdf format, showing the accuracy of LLS for higher values of K. As K is increased past 50, performance starts to degrade significantly, indicating the importance of local features.

Format: PDF Size: 130KB

Additional file 6:

A table in .xls format, showing the accuracy of imputation on genes with varying percentages of missing values. Interactions are sorted into bins based on the percentage of missing values in their corresponding genes. For example, an interaction between a pair of genes with 14% and 55% missing values would be counted in both the '10 - 20' and '50 - 60' bins. NRMSE and correlation are then calculated for each bin. These figures are calculated for every interaction in the RNA and ESP datasets, using K = 50 and K = 20 for the wNN and LLS methods respectively.

Format: XLS Size: 24KB
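
The binning described for Additional file 6 can be sketched in Python as follows. This is an illustration, not the authors' code. It assumes bins 10 percentage points wide, that a boundary value such as 20% falls in the higher bin, that an interaction whose two genes share a bin is counted once, and that NRMSE is the root mean squared error divided by the standard deviation of the true values (a common definition, not stated on this page). The names bin_label, bin_interactions and nrmse are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def bin_label(pct_missing, width=10):
    """Map a percentage such as 14 to a label such as '10 - 20'."""
    low = min(int(pct_missing // width) * width, 100 - width)
    return f"{low} - {low + width}"


def bin_interactions(interactions, pct_missing):
    """Group (true, imputed) score pairs by the bins of both genes.

    interactions: iterable of (gene_a, gene_b, true_score, imputed_score)
    pct_missing:  dict mapping gene name -> percentage of missing values
    """
    bins = defaultdict(list)
    for gene_a, gene_b, true, imputed in interactions:
        # A set, so a pair whose genes share a bin is counted only once.
        labels = {bin_label(pct_missing[gene_a]), bin_label(pct_missing[gene_b])}
        for label in labels:
            bins[label].append((true, imputed))
    return bins


def nrmse(true, imputed):
    """Root mean squared error normalised by the spread of the true values."""
    n = len(true)
    mean_true = sum(true) / n
    var_true = sum((t - mean_true) ** 2 for t in true) / n
    mse = sum((p - t) ** 2 for t, p in zip(true, imputed)) / n
    return sqrt(mse / var_true)
```

Correlation for a bin would be computed over the same list of (true, imputed) pairs.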

Additional file 7:

An image in pdf format, showing the fraction of each class of interaction which share an annotation. Generated on the Chromosome E-MAP, using LLS imputation. Labels are as in Figure 9.

Format: PDF Size: 236KB

Additional file 8:

An image in pdf format, showing the fraction of each class of interaction which share an annotation. Generated on the ESP E-MAP, using wNN imputation. Labels are as in Figure 9.

Format: PDF Size: 221KB

Additional file 9:

A table in .xls format, showing protein complexes identified using hierarchical clustering before and after imputation. Precision, recall and a p-value are given for each cluster which has a statistically significant overlap with a known protein complex. Values which differ before and after imputation are in bold.

Format: XLS Size: 19KB
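
As an illustration of the per-cluster figures reported in Additional file 9, the sketch below computes precision, recall and an overlap p-value for one cluster against one known complex. The page does not say which significance test was used; a one-sided hypergeometric test is assumed here purely for illustration, and the function name overlap_stats and its parameters are hypothetical.

```python
from math import comb


def overlap_stats(cluster, known_complex, n_genes):
    """Return (precision, recall, p_value) for a cluster vs. a complex.

    n_genes is the total number of genes in the dataset (the background).
    """
    cluster, known_complex = set(cluster), set(known_complex)
    shared = len(cluster & known_complex)
    precision = shared / len(cluster)       # fraction of cluster in complex
    recall = shared / len(known_complex)    # fraction of complex recovered

    # Assumed test: probability of seeing at least `shared` complex members
    # when len(cluster) genes are drawn at random from n_genes.
    n_c, n_k = len(cluster), len(known_complex)
    favourable = sum(comb(n_k, x) * comb(n_genes - n_k, n_c - x)
                     for x in range(shared, min(n_c, n_k) + 1))
    p_value = favourable / comb(n_genes, n_c)
    return precision, recall, p_value
```

For instance, a three-gene cluster sharing two genes with a four-gene complex has precision 2/3 and recall 1/2. In practice the p-values would also need correcting for the number of cluster and complex pairs tested.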