Biostatistics Epidemiology Research Design Core, Center for Clinical and Translational Sciences, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA

Department of Computer Science, Sam Houston State University, Huntsville, Texas 77341, USA

Statistical Science Department, Southern Methodist University, Dallas, TX 75275, USA

Department of Pathology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA

Conjugate and Medicinal Chemistry Laboratory, Division of Nuclear Medicine and Molecular Imaging, Department of Radiology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA

Rush University Cancer Center, Rush University Medical Center, Chicago, IL 60612, USA

Abstract

Background

In microarray experiments with small sample sizes, it is a challenge to estimate p-values accurately and decide cutoff p-values for gene selection appropriately. Although permutation-based methods have proved to have greater sensitivity and specificity than the regular t-test, their p-values are highly discrete due to the limited number of permutations available in very small sample sizes. Furthermore, estimated permutation-based p-values for true nulls are highly correlated and not uniformly distributed between zero and one, making it difficult to use current false discovery rate (FDR)-controlling methods.

Results

We propose a model-based information sharing method (MBIS) that, after an appropriate data transformation, utilizes information shared among genes. We use a normal distribution to model the mean differences of true nulls across two experimental conditions. The parameters of the model are then estimated using all available data. Based on this model, we calculate p-values that are uniformly distributed for true nulls. Then, since FDR-controlling methods are generally not well suited to microarray data with very small sample sizes, we select genes for a given cutoff p-value and then estimate the false discovery rate.

Conclusion

Simulation studies and analysis using real microarray data show that the proposed method, MBIS, is more powerful and reliable than current methods. It has wide application to a variety of situations.

Background

Microarray technology has been successfully used by biological and biomedical researchers to investigate gene expression profiles at the genome-wide level. Usually, the sample sizes are small compared to the number of genes to be investigated, making estimation of standard error for statistical tests very inaccurate. Furthermore, thousands of hypotheses (one corresponding to each gene or set of genes, in general) are tested at once, which greatly increases the probability of Type I error. This problem is also called the "multiple comparison problem" in hypothesis testing. A very small cutoff p-value is then needed to avoid picking a large number of false positives (FP); however, the price of that decision is failing to find many true positives whose p-values are larger than the cutoff value. When the sample sizes are extremely small, the problem worsens because as the sample size decreases so do the detection power and the ability to estimate p-values.

When the sample sizes are large enough, even if the data across two conditions are not normally distributed, we can still use a two-sample t-test to estimate the p-value for each gene. In practice, to avoid the normal distribution assumption, we may also choose non-parametric (rank-based) or permutation-based procedures. However, when sample sizes are very small, the t-test is not reliable because the variances are poorly estimated; many genes will have small p-values only because their estimated variances are too small. Furthermore, the t-test treats each gene independently and does not utilize information shared among them. To borrow information from other genes, modified t-test methods have been proposed:

$$d_i = \frac{\bar{D}_i}{s_i + s_0} \qquad (1)$$

where $d_i$ is the modified statistic for gene $i$, $\bar{D}_i$ is its mean difference across the two conditions, $s_i$ is its estimated standard error, and $s_0$ is a constant, which is used to avoid overly large absolute values of regular t-statistics due to very small estimated standard errors.
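To illustrate why a constant is added to the denominator, the following minimal sketch compares a regular pooled t-statistic with a SAM-style modified statistic. The function names `pooled_se` and `moderated_stat`, the toy data, and the value `s0 = 0.1` are assumptions made for illustration only; they are not taken from SAM or from this paper.

```python
import math
from statistics import mean, variance


def pooled_se(x, y):
    """Standard error of mean(x) - mean(y) based on the pooled variance."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def moderated_stat(x, y, s0=0.0):
    """Mean difference divided by (standard error + s0); s0 = 0 gives the t-statistic."""
    return (mean(x) - mean(y)) / (pooled_se(x, y) + s0)


# Toy gene: tiny mean difference, but an even tinier estimated variance.
x = [8.01, 8.02, 8.03]
y = [7.91, 7.92, 7.93]

t_regular = moderated_stat(x, y, s0=0.0)    # inflated by the very small standard error
t_moderated = moderated_stat(x, y, s0=0.1)  # damped by the added constant (hypothetical value)
```

With three replicates per condition, a gene like this one gets a very large regular t-statistic purely because its estimated standard error happens to be small; the added constant shrinks the statistic toward zero.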

When we use test statistics in (1), we will lose the information about the distribution of true nulls since we do not know the distribution of (1). To overcome this problem, permutation-based procedures have been proposed.

The absolute values of statistics in (1) are usually smaller than those of regular t-statistics. When sample sizes are extremely small, the total number of distinct permutations is limited and, therefore, permutation-based methods, such as SAM, will have larger p-values than those from the regular t-test, especially for differentially expressed (DE) genes. For example, in experiments with only three replicates for each of two conditions (a typical scenario), there exist only ten distinct permutations. The coarseness of the possible selections creates a problem for finding a reasonable cut-off p-value.
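The count of ten permutations can be checked by enumeration. The sketch below is illustrative only: it assumes a two-sided statistic, so that swapping the two group labels gives the same absolute value, and it considers a single gene on its own rather than statistics pooled across genes.

```python
from itertools import combinations
from math import comb

n1, n2 = 3, 3
samples = range(n1 + n2)

# Every way of choosing which 3 of the 6 arrays are labelled "condition one".
assignments = list(combinations(samples, n1))

# With equal group sizes, a split and its mirror image give the same |statistic|,
# so each unordered pair of groups is counted once.
distinct_splits = {
    frozenset([frozenset(g), frozenset(set(samples) - set(g))]) for g in assignments
}

n_assignments = len(assignments)
n_distinct = len(distinct_splits)
min_two_sided_p = 1 / n_distinct  # smallest attainable per-gene permutation p-value
```

Under these assumptions, a per-gene two-sided permutation p-value cannot fall below 0.1, which is why small cutoffs are unreachable with so few replicates.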

To select DE genes, we use a cutoff p-value and pick those genes whose p-values are smaller than the given cutoff value. Implicit in this process, as in any gene selection, is the trade-off between false positives (type I error) and false negatives (type II error). If we want to control the family-wise error rate (FWER), we need a very small cutoff p-value that will fail to find many true positives. Instead of controlling FWER, some researchers have proposed controlling the false discovery rate (FDR): some FPs are allowed in the set of selected genes, but the mean of the ratio of the number of FPs to the total number of declared DE genes is controlled.

As part of SAM, Storey's FDR-controlling method has been proven to be more accurate than Benjamini and Hochberg's procedure and has been used extensively in microarray data analysis.
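For reference, the Benjamini and Hochberg step-up procedure mentioned above can be sketched as follows. This is a generic textbook version, not code from SAM or "qvalue"; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, q):
    """Return the indices rejected by the BH step-up procedure at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose p-value lies below its BH threshold
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k = rank
    return sorted(order[:k])


# Hypothetical p-values for ten genes.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(pvals, q=0.05)
```

Only the two smallest p-values fall below their rank-dependent thresholds here, so only those two genes are declared DE.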

In this paper, we show that when sample sizes are extremely small, the t-test performs poorly in terms of sensitivity and specificity, and SAM (and "qvalue") may not be applicable because of the difficulty of controlling FDR for GeneChip array data. To circumvent those problems, we propose a new method, which we call the model-based information sharing method (MBIS). To evaluate its performance, we compare it with the other methods using both simulated and real data.

Method

Fold change, equal variance, and data transformation

The ratio of the expression levels across two conditions is called fold change (FC); it was used in early comparative experiments.

One way to obtain equal variance from gene to gene is to transform the data, usually with a logarithmic transformation. After this transformation, an FC (log scale) can be calculated from the difference of means across two conditions. However, different data sets may require different variance-stabilization transformations. Several variance-stabilization and normalization transformation methods, which try to make the expression values of each gene normally distributed with equal variance, have been proposed.
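As a small worked example of the log-scale fold change, the sketch below uses made-up intensities and a base-2 logarithm; both are assumptions for illustration. It shows that the difference of log-scale means corresponds to the ratio of geometric means on the raw scale.

```python
import math
from statistics import mean

# Hypothetical raw intensities for one gene, three replicates per condition.
cond1 = [1200.0, 1500.0, 1350.0]
cond2 = [300.0, 420.0, 360.0]

log_cond1 = [math.log2(v) for v in cond1]
log_cond2 = [math.log2(v) for v in cond2]

# Log-scale fold change: difference of the means of the transformed values.
log_fc = mean(log_cond1) - mean(log_cond2)

# Back on the raw scale this equals the ratio of geometric means.
fold_change = 2 ** log_fc
geometric_ratio = (math.prod(cond1) / math.prod(cond2)) ** (1 / 3)
```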

Model-based information sharing (MBIS)

MBIS assumes that an appropriate data transformation is available and has been applied to the raw gene expression data, and that this transformation has stabilized the variance. Therefore, after transformation, the variance for each gene is a constant, denoted by $\sigma^2$. If we can estimate $\sigma^2$ from the data, then we can easily calculate a p-value for each gene.

Estimation of $\sigma^2$

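Suppose there are $n_1$ and $n_2$ replicates for conditions one and two, respectively, and let $s_i^2$ be the pooled sample variance for gene $i$ ($i = 1, \ldots, m$). Then $s_i^2$ is an unbiased estimate of the common variance $\sigma^2$, and $(n_1 + n_2 - 2)\,s_i^2/\sigma^2$ has a Chi-square distribution with degrees of freedom $n_1 + n_2 - 2$. Therefore the average of the estimated variances from all genes is also an unbiased estimate for $\sigma^2$:

$$\hat{\sigma}^2 = \frac{1}{m}\sum_{i=1}^{m} s_i^2 \qquad (2)$$

where $m$ is the total number of genes. For a true null, the mean difference across the two conditions, $\bar{D}_i$, is then modeled as normally distributed with mean zero and variance $\hat{\sigma}^2\,(1/n_1 + 1/n_2)$.

Based on this normal distribution, we calculate the p-value for gene $i$:

$$p_i = 2\left[1 - \Phi\!\left(\frac{|\bar{D}_i|}{\hat{\sigma}\sqrt{1/n_1 + 1/n_2}}\right)\right]$$

where $\bar{D}_i$ is the observed mean difference for gene $i$ and $\Phi$ is the standard normal cumulative distribution function.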
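The estimation steps described in this subsection can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a pooled per-gene variance, a two-sided normal p-value, and already-transformed data; the names `pooled_variance` and `mbis_pvalues` are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance


def pooled_variance(x, y):
    """Pooled sample variance of two groups (n1 + n2 - 2 degrees of freedom)."""
    n1, n2 = len(x), len(y)
    return ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)


def mbis_pvalues(group1, group2):
    """group1/group2: one list of replicates per gene, same gene order in both."""
    n1, n2 = len(group1[0]), len(group2[0])
    # Shared variance: average of the per-gene pooled variances.
    sigma2_hat = mean(pooled_variance(x, y) for x, y in zip(group1, group2))
    se = math.sqrt(sigma2_hat * (1 / n1 + 1 / n2))
    norm = NormalDist()
    # Two-sided p-value of each mean difference under the normal null model.
    pvals = [
        2 * (1 - norm.cdf(abs(mean(x) - mean(y)) / se))
        for x, y in zip(group1, group2)
    ]
    return sigma2_hat, pvals


# Check on simulated null genes with unit variance and 3 replicates per condition.
random.seed(0)
m, n = 2000, 3
g1 = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
g2 = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
sigma2_hat, pvals = mbis_pvalues(g1, g2)
frac_below_005 = sum(p < 0.05 for p in pvals) / m
```

On pure-null data the shared variance estimate should be close to the true value and roughly 5% of the p-values should fall below 0.05, in line with the uniform null distribution the method relies on.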

Estimation of total number of non-DE genes $m_0$

For a given value _{u}_{0 }is _{μ}_{0}.
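One common way to estimate the number of true nulls from uniformly distributed null p-values is a Storey-type estimator, sketched below. Whether this matches the estimator used in this paper is an assumption; the function name `estimate_m0`, the threshold `lam = 0.5`, and the simulated p-values are all hypothetical.

```python
import random


def estimate_m0(pvalues, lam=0.5):
    """Storey-type estimate of the number of true nulls.

    Null p-values are uniform on (0, 1), so about m0 * (1 - lam) of them
    exceed lam, while few p-values of DE genes do.
    """
    m = len(pvalues)
    n_above = sum(p > lam for p in pvalues)
    return min(m, n_above / (1 - lam))


# 9,000 simulated nulls (uniform p-values) and 1,000 DE genes (tiny p-values).
random.seed(1)
null_p = [random.random() for _ in range(9000)]
de_p = [random.random() * 0.01 for _ in range(1000)]
m0_hat = estimate_m0(null_p + de_p)
```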

Gene selection and estimations for false positives and FDR

For a given cutoff p-value, $p_0$, we pick those genes with p-values smaller than $p_0$ as DE genes. Suppose
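A minimal sketch of this selection step follows. It assumes that null p-values are uniform, so that the expected number of false positives at a cutoff is the estimated number of non-DE genes times the cutoff, and that the FDR is estimated by dividing this by the number of selected genes. The function name `select_and_estimate` and the inputs are hypothetical.

```python
def select_and_estimate(pvalues, p_cutoff, m0_hat):
    """Select genes with p < p_cutoff and estimate false positives and FDR."""
    selected = [i for i, p in enumerate(pvalues) if p < p_cutoff]
    # Uniform null p-values: a fraction p_cutoff of the m0 nulls falls below the cutoff.
    est_fp = m0_hat * p_cutoff
    est_fdr = min(1.0, est_fp / len(selected)) if selected else 0.0
    return selected, est_fp, est_fdr


# Hypothetical p-values for eight genes, of which four are assumed to be nulls.
pvals = [0.0004, 0.003, 0.02, 0.04, 0.2, 0.5, 0.7, 0.9]
selected, est_fp, est_fdr = select_and_estimate(pvals, p_cutoff=0.05, m0_hat=4)
```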

SAM, t-test and q-value

For the SAM method, we use the R package, SAMr

Simulation design

To restrict ourselves to small experiments, we set the sample size for each condition to 3, 5, or 8. We simulate 10,000 genes with normal distributions for two conditions. For non-DE genes, we assume they are normally distributed with a mean equal to 0; for DE genes, the absolute mean difference is uniformly distributed, with three ranges representing different degrees of differential expression:
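One setting of this design can be sketched as follows, using the values reported with the simulation results (1,000 DE genes out of 10,000, three replicates, mean differences between 3 and 6, per-gene variances between 1 and 1.5). The random sign of the mean difference, the seed, and the function name `simulate` are assumptions added for illustration.

```python
import random


def simulate(m=10000, n_de=1000, reps=3, diff=(3.0, 6.0), var_range=(1.0, 1.5), seed=0):
    """Simulate two-condition expression data; the first n_de genes are DE."""
    rng = random.Random(seed)
    cond1, cond2, is_de = [], [], []
    for i in range(m):
        de = i < n_de
        # Absolute mean difference is uniform on `diff`; the sign is an assumption.
        delta = rng.choice((-1, 1)) * rng.uniform(*diff) if de else 0.0
        sd = math_sqrt(rng.uniform(*var_range))  # per-gene variance drawn uniformly
        cond1.append([rng.gauss(0.0, sd) for _ in range(reps)])
        cond2.append([rng.gauss(delta, sd) for _ in range(reps)])
        is_de.append(de)
    return cond1, cond2, is_de


def math_sqrt(value):
    """Square root helper, kept local so the sketch has no extra imports."""
    return value ** 0.5


cond1, cond2, is_de = simulate()
```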

Real data set

We use Affymetrix GeneChip data sets selected from the GSE2350 series

To see which method gives more biologically meaningful results, we use the web-based tool, CLASSIFI algorithm

Results

Simulation results

**Figure: ROC curves.** ROC curves of MBIS, SAM with s0.perc = -1, 20, 40, 60, 80 and 100, and t-test from a simulated data set. There are three replicates for each condition. One thousand out of 10,000 genes are simulated as differentially expressed, with mean differences uniformly distributed between 3 and 6. The simulated variance for each gene is uniformly distributed between 1 and 1.5.

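Table: Simulation results of numbers of TPs and FPs from different methods (nde = 1000, rep = 3, b = 1.5, diff = c(3,6))

| q-value | Measure | MBIS | SAM-T | S0 = 0 | 20 | 40 | 60 | 80 | 100 |
|---|---|---|---|---|---|---|---|---|---|
| 0.05 | TP | 957 | 244 | 0 | 0 | 0 | 0 | 0 | 0 |
| | FP | 94 | 19 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Est. FP | 95 | 16 | | | | | | |
| | Obs. FDR | 0.09 | 0.07 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.10 | TP | 976 | 669 | 0 | 0 | 0 | 0 | 0 | 0 |
| | FP | 203 | 99 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Est. FP | 211 | 106 | | | | | | |
| | Obs. FDR | 0.17 | 0.13 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.15 | TP | 983 | 821 | 0 | 771 | 835 | 821 | 877 | 891 |
| | FP | 324 | 228 | 0 | 16 | 26 | 16 | 27 | 26 |
| | Est. FP | 289 | 232 | | | | | | |
| | Obs. FDR | 0.25 | 0.22 | 0 | 0 | 0.02 | 0.03 | 0.02 | 0.03 |
| 0.20 | TP | 992 | 896 | 474 | 893 | 910 | 909 | 917 | 932 |
| | FP | 488 | 379 | 44 | 80 | 92 | 81 | 85 | 75 |
| | Est. FP | 474 | 388 | | | | | | |
| | Obs. FDR | 0.33 | 0.30 | 0.08 | 0.08 | 0.09 | 0.08 | 0.08 | 0.07 |
| 0.25 | TP | 994 | 924 | 704 | 916 | 926 | 929 | 935 | 949 |
| | FP | 632 | 529 | 116 | 145 | 142 | 134 | 141 | 129 |
| | Est. FP | 620 | 552 | | | | | | |
| | Obs. FDR | 0.39 | 0.36 | 0.14 | 0.14 | 0.13 | 0.13 | 0.13 | 0.12 |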
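The observed FDR values for MBIS in the simulation table can be reproduced from the TP and FP counts as FP / (TP + FP). The short check below copies the MBIS counts from the table; the variable names are illustrative.

```python
# MBIS counts from the simulation table, keyed by the preset q-value.
tp = {0.05: 957, 0.10: 976, 0.15: 983, 0.20: 992, 0.25: 994}
fp = {0.05: 94, 0.10: 203, 0.15: 324, 0.20: 488, 0.25: 632}

# Observed FDR: share of false positives among all selected genes.
observed_fdr = {q: round(fp[q] / (tp[q] + fp[q]), 2) for q in tp}
```

In every row the observed FDR is well above the preset q-value, while the estimated numbers of false positives stay close to the observed ones.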

For the SAM methods with various s0.perc, no true positives are found when the preset q-value is small. For example, for a q-value of 0.1, none of the SAM methods finds any true positives. Interestingly, when the given q-value is small, a regular t-test performs better than a t-test with permutation in SAM; this implies that permutation-based methods are not appropriate in this situation. Table

Results from real data set

For the real data set, we use MBIS, regular t-test, and SAM to calculate the p-values for each gene and then use "qvalue" to select DE genes with cutoff q-values equal to 0.01, 0.025, 0.05, 0.075 and 0.1, respectively. By using "qvalue," we calculate the corresponding cutoff p-values from each cutoff q-value for these three methods. Since we know the distributions of nulls from MBIS and t-test (they have a uniform distribution for the p-values of nulls), and we can also estimate the number of true negatives for a given cutoff p-value, we can estimate the number of false positives and the false discovery rates.

Table: Results from real data for given cutoff q-values

| q-value | Method | 0.01 | 0.025 | 0.05 | 0.075 | 0.1 |
|---|---|---|---|---|---|---|
| p-cutoff (from "qvalue") | MBIS | 0.00685 | 0.0240 | 0.0617 | 0.108 | 0.162 |
| | T | 0.00144 | 0.0155 | 0.0613 | 0.123 | 0.192 |
| | SAM | 0 | 0 | 0.00741 | 0.0560 | 0.0969 |
| # DE genes | MBIS | 3075 | 4306 | 5550 | 6458 | 7276 |
| | T | 561 | 2402 | 4748 | 6345 | 7435 |
| | SAM | 0 | 0 | **3695** | **4734** | **5335** |
| # common DE genes | MBIS, T | 459 | 1954 | 3861 | 5261 | 6330 |
| | MBIS, SAM | 0 | 0 | **3694** | **4734** | **5335** |
| | T, SAM | 0 | 0 | 3327 | 4504 | 5228 |
| Est. FDR | MBIS | 0.0177 | 0.0443 | 0.0884 | 0.133 | 0.177 |
| | T | 0.0186 | 0.0468 | 0.0937 | 0.141 | 0.187 |
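As a consistency check on the table above, the sketch below back-calculates the number of non-DE genes implied by the MBIS rows. It assumes that the reported FDR estimate is the estimated number of nulls times the cutoff p-value, divided by the number of selected genes; that formula is an assumption here, and the variable names are illustrative.

```python
# MBIS rows of the real-data table (cutoff q-values 0.01 to 0.1).
p_cutoff = [0.00685, 0.0240, 0.0617, 0.108, 0.162]
n_selected = [3075, 4306, 5550, 6458, 7276]
est_fdr = [0.0177, 0.0443, 0.0884, 0.133, 0.177]

# If est_fdr = m0 * p_cutoff / n_selected, then m0 = est_fdr * n_selected / p_cutoff.
implied_m0 = [f * r / p for f, r, p in zip(est_fdr, n_selected, p_cutoff)]
```

Under that assumption, the five columns imply nearly the same number of non-DE genes, which suggests a single estimate of the number of nulls was used across cutoffs.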

The selected gene sets from MBIS and the t-test are usually different. For example, when the cutoff q-value is equal to 0.05, MBIS and the t-test select 5550 and 4748 genes, respectively; the number of genes common to these two methods is 3861. In other words, nearly 900 genes selected by the t-test are not in the MBIS list. However, genes selected by SAM are usually also selected by MBIS.

From the CLASSIFI output with cutoff q-value 0.05, the median p-values (-log10 scale) are 15.30, 7.05 and 6.01 for MBIS, SAM, and t-test, respectively, indicating that SAM performs better than the t-test but worse than MBIS in terms of co-clustering for genes with similar function according to GO.

Since the cutoff p-values from the same cutoff q-value are different for these three methods, we then use the same cutoff p-values for each method and compare their selected genes. Table

Results from real data for given cutoff p-values

| p-value | Method | 0.05 | 0.025 | 0.01 | 0.005 | 0.0025 |
|---|---|---|---|---|---|---|
| q-cutoff (from "qvalue") | MBIS | 0.0422 | 0.0257 | 0.0132 | 0.00788 | 0.00468 |
| | T | 0.0446 | 0.0313 | 0.0210 | 0.0158 | 0.0122 |
| | SAM | 0.0738 | 0.0600 | 0.0556 | 0.0546 | 0.0544 |
| # DE genes | MBIS | 5290 | 4352 | 3383 | 2835 | 2383 |
| | T | 4355 | 3096 | 1849 | 1230 | 792 |
| | SAM | **3613** | **2223** | **958** | **482** | **242** |
| # common DE genes | MBIS, T | 3503 | 2411 | 1371 | 890 | 556 |
| | MBIS, SAM | **3608** | **2223** | **958** | **482** | **242** |
| | T, SAM | 3145 | 1870 | 767 | 396 | 202 |
| Est. FDR | MBIS | 0.0742 | 0.0451 | 0.0232 | 0.0138 | 0.00823 |
| | T | 0.0834 | 0.0586 | 0.0393 | 0.0295 | 0.0229 |

Discussion

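When sample sizes are small, information shared by genes is helpful and should be used. While the t-test treats each gene independently, both SAM and MBIS use information shared among genes. Let $n_1$ and $n_2$ be the numbers of replicates in the two conditions, $m$ the number of genes, and $\sigma^2$ the common variance. When the equal variance assumption in MBIS is met, the estimated variance for gene $i$, $s_i^2$, follows a scaled Chi-square distribution with degrees of freedom $n_1 + n_2 - 2$:

$$\frac{(n_1 + n_2 - 2)\,s_i^2}{\sigma^2} \sim \chi^2_{n_1 + n_2 - 2}$$

The variance for $s_i^2$ is:

$$\mathrm{Var}(s_i^2) = \frac{2\sigma^4}{n_1 + n_2 - 2}$$

And the square of standard error estimated in t-test has variance:

$$\mathrm{Var}\!\left[\left(\frac{1}{n_1} + \frac{1}{n_2}\right)s_i^2\right] = \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{2}\frac{2\sigma^4}{n_1 + n_2 - 2} \qquad (7)$$

However, the average of the per-gene variance estimates in (2), $\hat{\sigma}^2$, after scaling by $m(n_1 + n_2 - 2)/\sigma^2$ and treating genes as independent, has a Chi-square distribution with degrees of freedom $m(n_1 + n_2 - 2)$, and its variance is:

$$\mathrm{Var}(\hat{\sigma}^2) = \frac{2\sigma^4}{m(n_1 + n_2 - 2)}$$

The square of standard error estimated for our new method is $(1/n_1 + 1/n_2)\,\hat{\sigma}^2$, with variance:

$$\mathrm{Var}\!\left[\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\hat{\sigma}^2\right] = \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{2}\frac{2\sigma^4}{m(n_1 + n_2 - 2)} \qquad (9)$$

In a typical microarray experiment, the number of genes, $m$, is in the thousands, so the variance in (9) is far smaller than that in (7).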

In comparing (7) with (9), we can see that, while the regular t-test gives a much larger variance for each estimated variance (each individual t-test loses two degrees of freedom due to variance estimation), MBIS, which utilizes information shared among genes, has a more precise estimate of the common variance. Therefore, when the equal variance assumption holds, MBIS is expected to outperform the t-test.
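The size of this gain can be illustrated numerically. The sketch below assumes the standard result that a variance estimate with k degrees of freedom has variance 2σ⁴/k, and that genes are independent so the shared estimate has m times as many degrees of freedom; the function names and the chosen values are hypothetical.

```python
def var_per_gene_estimate(sigma2, n1, n2):
    """Variance of one gene's pooled variance estimate (n1 + n2 - 2 d.f.)."""
    return 2 * sigma2 ** 2 / (n1 + n2 - 2)


def var_shared_estimate(sigma2, n1, n2, m):
    """Variance of the average of m independent per-gene estimates."""
    return 2 * sigma2 ** 2 / (m * (n1 + n2 - 2))


sigma2, n1, n2, m = 1.0, 3, 3, 10000
per_gene = var_per_gene_estimate(sigma2, n1, n2)
shared = var_shared_estimate(sigma2, n1, n2, m)
ratio = per_gene / shared  # equals the number of genes sharing information
```

With three replicates per condition and 10,000 genes, the shared estimate is 10,000 times less variable than a single gene's estimate under these assumptions.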

On the other hand, the Chi-square distribution is right skewed, implying that its mean is larger than its median. If _{i }

When sample sizes are extremely small, as mentioned before, SAM will have relatively larger p-values because of the limited number of permutations available, which affects the estimation of q-values by "qvalue". "qvalue" does not perform very well in this situation. For a given cutoff q-value, the corresponding cutoff p-value calculated by "qvalue" could be too large (as seen in the results from t-test and MBIS in simulation and real data) or too conservative (as in the results from SAM), a finding consistent with those from Jung and Jang.

Another difficulty for "qvalue" is that the number of selected genes can be very sensitive to the cutoff q-value, especially the very small preset q-value (see Table

Although we assume equal variance in the MBIS, we also evaluate this new method under situations when this assumption is violated. By simulation, we have shown that, when the variances of gene expressions are near constant, MBIS still outperforms both the t-test and SAM, making our method applicable in various situations.

From our experience, variances estimated from raw expression data are highly variable. We should transform data before applying MBIS. Several variance-stabilization and normalization transformation procedures, such as logarithm, Box-Cox transformation, generalized logarithm

Conclusions

For microarray data with extremely small sample sizes, a modified t-test like SAM performs better than a regular t-test in terms of sensitivity and specificity. However, to control FDR, for small preset q-values, SAM fails to select enough true positives and performs worse than the t-test. To circumvent this problem, we propose a model-based information sharing method (MBIS) that uses information shared by genes. We show, using both simulation and real microarray data, that this new method outperforms the t-test and SAM.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

ZC devised the basic idea of the new method and drafted the manuscript; QL participated in study design and manuscript preparation; MK participated in the analyses based on CLASSIFI; RHS participated in developing this new algorithm; MM, XH and YD assisted the study and co-wrote the manuscript. All authors read and approved the final manuscript.

Acknowledgements

The authors thank Ms. Linda Harrison and Ms. Kimberly Lawson for their editorial assistance. ZC acknowledges support from the NIH grant (UL1 RR024148) awarded to the University of Texas Health Science Center at Houston.