Statistical learning techniques applied to epidemiology: a simulated case-control comparison study with logistic regression
1 H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, 33612, USA
2 Binghamton University, Bioengineering Department, Binghamton, NY, USA
BMC Bioinformatics 2011, 12:37 doi:10.1186/1471-2105-12-37
Published: 27 January 2011
When investigating covariate interactions and group associations with standard regression analyses, the relationship between the response variable and exposure may be difficult to characterize. When that relationship is nonlinear, linear modeling techniques fail to capture the nonlinear information. Statistical learning (SL) techniques with kernels can address nonlinear problems without making parametric assumptions. However, these techniques do not by themselves produce findings suited to epidemiologic interpretation. A simulated case-control study was used to contrast the information-embedding characteristics and separation boundaries produced by a specific SL technique with those of logistic regression (LR) modeling, which represents a parametric approach. The SL technique consisted of a kernel mapping in combination with a perceptron neural network. Because the LR model has an important epidemiologic interpretation, the SL method was modified to produce the analogous interpretation and to generate odds ratios for comparison.
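To make the epidemiologic interpretation of LR concrete, the following minimal sketch fits a logistic model with a single binary exposure and reads the odds ratio off the exposure coefficient as exp(b1). It is an illustration only, not the simulation or code used in the study: the 2x2 counts and the function name `fit_logistic_2x2` are hypothetical, and the fit uses a plain Newton-Raphson iteration written out by hand.

```python
import math

def fit_logistic_2x2(a, b, c, d, iterations=25):
    """Fit logit P(case | x) = b0 + b1 * x for a binary exposure x.

    Hypothetical 2x2 table layout (illustrative only):
        a = exposed cases,   b = exposed controls,
        c = unexposed cases, d = unexposed controls.
    Returns the maximum-likelihood estimates (b0, b1).
    """
    # Grouped data: (exposure value, number of cases, group size).
    groups = [(1.0, a, a + b), (0.0, c, c + d)]
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # entries of the information matrix X'WX
        for x, cases, n in groups:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            residual = cases - n * p
            weight = n * p * (1.0 - p)
            g0 += residual
            g1 += residual * x
            h00 += weight
            h01 += weight * x
            h11 += weight * x * x
        # Newton-Raphson update: beta <- beta + (X'WX)^{-1} * gradient.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical counts chosen only for illustration.
a, b, c, d = 40, 20, 60, 80
b0, b1 = fit_logistic_2x2(a, b, c, d)

odds_ratio = math.exp(b1)                  # OR implied by the LR coefficient
cross_product_ratio = (a * d) / (b * c)    # classical 2x2 odds ratio

print(f"exp(b1) = {odds_ratio:.4f}, ad/bc = {cross_product_ratio:.4f}")
```

With one binary exposure the model is saturated, so exp(b1) coincides with the cross-product ratio ad/bc. That agreement is what makes LR coefficients directly interpretable as odds ratios, and it is the interpretation the modified SL method is meant to reproduce.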
The SL approach can generate odds ratios for main effects and risk factor interactions that capture nonlinear relationships between exposure variables and outcome better than LR does.
The integration of SL methods in epidemiology may improve both the understanding and interpretation of complex exposure/disease relationships.