Table 1

A hypothetical EST count table that demonstrates CMH analysis and also serves as a contrived example of Simpson's paradox.

               Tissue I             Tissue II            Pooled
               Normal    Cancer     Normal    Cancer     Normal    Cancer
Gene A         280       580        20        20         300       600
Other genes    20,000    80,000     380,000   620,000    400,000   700,000

This hypothetical case illustrates both how the Cochran-Mantel-Haenszel (CMH) test is applied and how Simpson's paradox can occur. Gene A is the gene under investigation. Expression counts from all other genes are pooled into the "other genes" row. Bold typeface indicates columns showing higher cancer vs. normal propensities. CMH is applied to the stratified tissue columns (but not to the pooled data). A casual observation involving only the pooled data would suggest that Gene A has higher expression in cancer (χ² test p-value close to 0 when only the pooled data are analyzed). However, closer inspection of each tissue column reveals otherwise. The observed difference between cancer and normal in the "other genes" row is theoretically mostly due to sampling bias.
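The reversal described in the caption can be checked directly from the counts in Table 1. The following is a minimal sketch, not the authors' implementation: it assumes the textbook CMH statistic for 2x2 strata without continuity correction, together with the Mantel-Haenszel common odds ratio. The names `odds_ratio` and `cmh_test` are hypothetical helpers introduced here for illustration.

```python
import math

# Counts from Table 1, ordered as
# (gene_a_cancer, gene_a_normal, other_cancer, other_normal).
STRATA = {
    "Tissue I": (580, 280, 80_000, 20_000),
    "Tissue II": (20, 20, 620_000, 380_000),
}
POOLED = (600, 300, 700_000, 400_000)


def odds_ratio(a, b, c, d):
    """Cancer-vs-normal odds ratio of Gene A relative to the other genes."""
    return (a * d) / (b * c)


def cmh_test(tables):
    """Return (CMH chi-square, p-value, Mantel-Haenszel common odds ratio).

    Assumes 2x2 strata and no continuity correction; the statistic is
    referred to a chi-square distribution with 1 degree of freedom.
    """
    diff = var = num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        expected = (a + b) * (a + c) / n  # E[a] under the null hypothesis
        diff += a - expected
        # Hypergeometric variance of a given the table margins.
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        num += a * d / n
        den += b * c / n
    stat = diff * diff / var
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value, num / den


stratum_ors = {name: odds_ratio(*t) for name, t in STRATA.items()}
pooled_or = odds_ratio(*POOLED)
cmh_stat, cmh_p, common_or = cmh_test(STRATA.values())

for name, value in stratum_ors.items():
    print(f"{name}: odds ratio = {value:.3f}")
print(f"Pooled: odds ratio = {pooled_or:.3f}")
print(f"CMH: chi-square = {cmh_stat:.1f}, p = {cmh_p:.3g}, common OR = {common_or:.3f}")
```

An odds ratio below 1 here means Gene A is relatively less expressed in cancer than in normal. Both per-tissue odds ratios and the Mantel-Haenszel common odds ratio fall below 1, while the pooled odds ratio is above 1, which is the Simpson's paradox pattern the table was constructed to show.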

Wu et al. BMC Genomics 2012 13(Suppl 7):S12   doi:10.1186/1471-2164-13-S7-S12
