Figure 1.

Rule generation workflow. The initial dataset of 182 patients is randomly divided into a training set and a test set. The training set is used by the supervised learning procedure to iteratively tune the LLM parameter "maximum error allowed for a rule" by performing a complete 10-fold cross validation for each candidate value. The whole training set is randomly subdivided into 10 non-overlapping subsets, nine of which are used to train the classifier with ADID and LLM. The classifier is then used to predict the outcome of the patients in the excluded subset. This procedure is repeated 10 times, so that every subset is classified exactly once. Each parameter value is evaluated according to the mean classification accuracy obtained in the cross validation, and the value with the highest mean accuracy is selected to generate the final optimal classification rules. The rules are then tested on an independent cohort to assess their ability to predict patient outcome.
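
The parameter selection loop described in this caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: ADID and LLM are not reproduced here, and the hypothetical `train_threshold_rule` function stands in for them with a toy one-feature rule whose error on the training samples it covers must not exceed `max_error`. The names `k_fold_indices` and `select_parameter`, the single numeric feature, and the binary 0/1 outcome are all assumptions made for the example.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=10, seed=0):
    """Randomly split range(n_samples) into k non-overlapping folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_threshold_rule(xs, ys, max_error):
    """Toy stand-in for ADID + LLM (hypothetical, for illustration only).

    Learns the rule "x >= t implies class 1" with the widest coverage whose
    error on the covered training samples is at most max_error. If no such
    threshold exists, it falls back to predicting the majority class.
    """
    majority = int(sum(ys) * 2 >= len(ys))
    for t in sorted(set(xs)):
        covered = [y for x, y in zip(xs, ys) if x >= t]
        error = sum(1 for y in covered if y != 1) / len(covered)
        if error <= max_error:
            return lambda x, t=t: int(x >= t)
    return lambda x: majority


def select_parameter(xs, ys, candidates, k=10, seed=0):
    """Return the candidate with the highest mean k-fold accuracy.

    Assumes len(xs) >= k so that no fold is empty.
    """
    folds = k_fold_indices(len(xs), k, seed)
    mean_accuracy = {}
    for value in candidates:
        fold_accuracies = []
        for held_out in folds:
            held = set(held_out)
            train_idx = [i for i in range(len(xs)) if i not in held]
            # Train on the other k - 1 folds, then classify the excluded fold.
            predict = train_threshold_rule(
                [xs[i] for i in train_idx],
                [ys[i] for i in train_idx],
                value,
            )
            correct = sum(predict(xs[i]) == ys[i] for i in held_out)
            fold_accuracies.append(correct / len(held_out))
        mean_accuracy[value] = mean(fold_accuracies)
    best = max(candidates, key=lambda v: mean_accuracy[v])
    return best, mean_accuracy
```

In this sketch the same fold assignment is reused for every candidate value, so the mean accuracies are directly comparable. The selected value would then be used to retrain on the whole training set before evaluating the resulting rules on the independent test cohort.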

Cangelosi et al. BMC Bioinformatics 2014 15(Suppl 5):S4   doi:10.1186/1471-2105-15-S5-S4