Open Access Research article

Bayesian probit regression model for the diagnosis of pulmonary fibrosis: proof-of-principle

Eric B Meltzer1, William T Barry2,3, Thomas A D'Amico4, Robert D Davis4, Shu S Lin4,5,6, Mark W Onaitis4, Lake D Morrison1, Thomas A Sporn6, Mark P Steele1 and Paul W Noble1*

1 Department of Medicine, Division of Pulmonary, Allergy and Critical Care Medicine, Duke University Medical Center, Durham, North Carolina, USA

2 Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, North Carolina, USA

3 Institute for Genome Science and Policy, Duke University Medical Center, Durham, North Carolina, USA

4 Department of Surgery, Division of Cardiovascular and Thoracic Surgery, Duke University Medical Center, Durham, North Carolina, USA

5 Department of Immunology, Duke University Medical Center, Durham, North Carolina, USA

6 Department of Pathology, Duke University Medical Center, Durham, North Carolina, USA


BMC Medical Genomics 2011, 4:70 doi:10.1186/1755-8794-4-70

Published: 5 October 2011

Additional files

Additional file 1:

Supplemental Methods. Complete summary of the statistical methods and data integration steps used to develop and validate the multi-gene models.

Format: DOC Size: 49KB

Open Data

Additional file 2:

Model Selection (Figure S1). To optimize the fitted models for IPF Biopsies and IPF Explants, two criteria were evaluated for model sizes from 50 to 250 genes: (A) and (C) the total deviance of the observed phenotype versus the posterior probabilities, and (B) and (D) the misclassification rate under leave-one-out re-sampling.
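The two selection criteria named in this caption can be illustrated with a minimal sketch. It is not the authors' R code from Additional file 6: the function names, the toy data, and the one-feature midpoint classifier with a probit link are all assumptions that stand in for the Bayesian probit regression model. In practice both quantities would be recomputed for each candidate model size.

```python
import math

def binomial_deviance(labels, probs, eps=1e-12):
    """Total deviance: -2 times the summed Bernoulli log-likelihood."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -2.0 * total

def normal_cdf(z):
    """Standard normal CDF, i.e. the probit link's inverse."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fit_midpoint(xs, ys):
    """Hypothetical stand-in model: midpoint between the two class means."""
    n1 = sum(ys)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / n1
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / (len(ys) - n1)
    return (mean0 + mean1) / 2.0, mean1 - mean0

def predict_midpoint(model, x):
    """Probability of class 1 for a single score x."""
    midpoint, gap = model
    return normal_cdf((x - midpoint) * gap)

def loo_misclassification(xs, ys, fit, predict):
    """Leave-one-out error rate: refit without sample i, then classify it."""
    errors = 0
    for i in range(len(ys)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predicted = int(predict(model, xs[i]) >= 0.5)
        errors += int(predicted != ys[i])
    return errors / len(ys)

# Toy data (made up): one summary score per sample, 1 = IPF, 0 = normal.
scores = [0.2, 0.5, 0.9, 1.8, 2.1, 2.6]
labels = [0, 0, 0, 1, 1, 1]

model = fit_midpoint(scores, labels)
fitted = [predict_midpoint(model, x) for x in scores]
deviance = binomial_deviance(labels, fitted)
loo_error = loo_misclassification(scores, labels, fit_midpoint, predict_midpoint)
```

A lower deviance indicates posterior probabilities closer to the observed phenotype, while the leave-one-out rate estimates how often a held-out sample would be misclassified.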

Format: PDF Size: 116KB

Additional file 3:

Mapping the ALL IPF Gene Signature to GSE10667 (Table S1). 148 out of 151 (98.0%) possible features from the training dataset were mapped to corresponding features of the validation dataset on a many-by-many basis.

Format: XLS Size: 75KB

Additional file 4:

Mapping the IPF Biopsy Gene Signature to GSE10667 (Table S2). 151 out of 153 (98.7%) possible features from the training dataset were mapped to corresponding features of the validation dataset on a many-by-many basis.

Format: XLS Size: 77KB

Additional file 5:

Mapping the IPF Explant Gene Signature to GSE10667 (Table S3). 69 out of 70 (98.6%) possible features from the training dataset were mapped to corresponding features of the validation dataset on a many-by-many basis.

Format: XLS Size: 43KB

Additional file 6:

Software code in the R programming language (Bioconductor), including the algorithm for Bayesian probit regression. The code was written for a specific machine; please contact the authors for instructions on how to run it on another machine.

Format: R Size: 26KB

Additional file 7:

Differentially Expressed Genes, IPF Biopsies versus IPF Explants (Table S4). Between IPF biopsies and IPF explants, 13 probe sets, corresponding to 11 unique genes, are differentially expressed at an FDR threshold of 10%. A positive t-statistic indicates up-regulation in the explants relative to the biopsies.
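As an illustration of what selecting features at a 10% FDR threshold involves, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of p-values. The caption does not state which FDR procedure the authors used, so Benjamini-Hochberg is an assumption here, and the function name and p-values are made up for the example.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Return the indices of tests declared significant at the given FDR.

    Assumed procedure: Benjamini-Hochberg step-up. Find the largest rank k
    with p_(k) <= fdr * k / m, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

# Hypothetical per-probe-set p-values from a two-group t-test.
example_p = [0.001, 0.02, 0.04, 0.30, 0.80]
significant = benjamini_hochberg(example_p, fdr=0.10)
```

With these made-up values the first three probe sets pass, because each of their p-values falls at or below its rank-scaled threshold (0.02, 0.04 and 0.06 respectively).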

Format: XLS Size: 29KB

Additional file 8:

Complete Gene List for the All IPF Model (Table S5). The top 151 probe sets identified by Student t-test correspond to 136 unique genes. A positive t-statistic indicates up-regulation in IPF relative to Normal.

Format: XLS Size: 45KB

Additional file 9:

Complete Gene List for the IPF Biopsy Model (Table S6). The top 153 probe sets identified by Student t-test correspond to 131 unique genes. A positive t-statistic indicates up-regulation in Biopsies relative to Normal.

Format: XLS Size: 45KB

Additional file 10:

Complete Gene List for the IPF Explant Model (Table S7). The top 70 probe sets identified by Student t-test correspond to 65 unique genes. A positive t-statistic indicates up-regulation in Explants relative to Normal.

Format: XLS Size: 35KB