Control flow of data cleaning, sampling, training, and prediction. The flow shows a single run of k-fold cross-validation: one fold of the data is used for testing, and the remaining k-1 folds are used for training. To keep predictions consistent with the real class distribution, the test data retain the original class distribution, and data cleaning is performed only on the training data. The process is repeated k times so that each fold serves once as the test data, with the remaining folds used for training.
Hu et al. BMC Medical Informatics and Decision Making 2012 12:131 doi:10.1186/1472-6947-12-131
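
The control flow described in the caption can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the `clean`, `train`, and `predict` functions are hypothetical placeholders (cleaning drops rows with a missing feature, and the learner simply predicts the majority class), and the folds are formed by a plain shuffled split. The point it shows is that cleaning touches only the training folds, while each test fold is left exactly as it was drawn.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def clean(rows):
    """Hypothetical cleaning step: drop rows whose feature is missing."""
    return [(x, y) for x, y in rows if x is not None]


def train(rows):
    """Placeholder learner: remember the majority class of the training rows."""
    return Counter(y for _, y in rows).most_common(1)[0][0]


def predict(model, x):
    """Placeholder predictor: always return the stored majority class."""
    return model


def cross_validate(data, k=5, seed=0):
    """Run k-fold cross-validation, cleaning only the training folds."""
    folds = k_fold_indices(len(data), k, seed)
    results = []
    for i, test_idx in enumerate(folds):
        # The test fold is used as drawn: no cleaning, no resampling.
        test = [data[j] for j in test_idx]
        # The remaining k-1 folds form the training data, which is cleaned.
        train_rows = [data[j] for f, fold in enumerate(folds) if f != i for j in fold]
        cleaned = clean(train_rows)
        model = train(cleaned)
        correct = sum(predict(model, x) == y for x, y in test)
        results.append({
            "test_size": len(test),
            "train_size": len(cleaned),
            "accuracy": correct / len(test),
        })
    return results


# Toy data (made up for illustration): 20 rows, 4 with a missing feature.
data = [(i if i % 5 else None, "pos" if i % 4 == 0 else "neg") for i in range(20)]
results = cross_validate(data, k=5)
for r in results:
    print(r)
```

Each row appears in exactly one test fold and in the training data of the other k-1 runs, so rows removed by cleaning still count toward the test results when their fold is held out.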