Reducing the probability of false positive research findings by pre-publication validation – Experience with a large multiple sclerosis database
1 Sylvia Lawry Centre for MS Research, Hohenlindener Str. 1, 81677 Munich, Germany
2 Department of Statistics, University of Dortmund, 44221 Dortmund, Germany
3 Institute for Medical Biometry, Epidemiology and Computer Science, Clinic of Gutenberg's University Mainz, Germany
4 Department of Statistics, University of Dortmund, 44221 Dortmund, Germany
5 University Dept of Clinical Neurology, Oxford University, UK
BMC Medical Research Methodology 2008, 8:18 doi:10.1186/1471-2288-8-18. Published: 10 April 2008
Published false positive research findings are a major problem in the process of scientific discovery. Results in clinical research in general frequently fail to replicate, and multiple sclerosis research is no exception. Our aim was to develop and implement a policy that reduces the probability of publishing false positive research findings.
We assessed the utility of working with a pre-publication validation policy after several years of research with a large multiple sclerosis database.
The large database of the Sylvia Lawry Centre for Multiple Sclerosis Research was split into two parts: one part for hypothesis generation and a validation part for confirming selected results. We present case studies from five finalized projects that used the validation policy, together with results from a simulation study.
In one project, the "relapse and disability" project as described in section II (example 3), findings could not be confirmed in the validation part of the database. The simulation study showed that the percentage of false positive findings can exceed 20% depending on variable selection.
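The effect of variable selection on the false positive rate can be illustrated with a small simulation. The following is a minimal sketch and not the simulation study reported here: the sample size, the number of candidate variables, the 50/50 split, the significance level, and the correlation test based on the Fisher z approximation are all assumptions chosen for illustration. Every candidate variable is generated independently of the outcome, so any "significant" finding is a false positive by construction.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def p_value(x, y):
    """Two-sided p-value for zero correlation (Fisher z approximation)."""
    z = math.atanh(pearson_r(x, y)) * math.sqrt(len(x) - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_patients=200, n_candidates=10, n_sim=500, alpha=0.05, seed=1):
    """Return (naive, validated) false positive rates under the null.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    half = n_patients // 2  # assumed 50/50 split into exploration/validation
    naive_hits = 0
    validated_hits = 0
    for _ in range(n_sim):
        # Outcome and candidate predictors are independent noise.
        y = [rng.gauss(0, 1) for _ in range(n_patients)]
        xs = [[rng.gauss(0, 1) for _ in range(n_patients)]
              for _ in range(n_candidates)]
        # Exploration part: select the candidate with the smallest p-value.
        p_explore = [p_value(x[:half], y[:half]) for x in xs]
        best = min(range(n_candidates), key=p_explore.__getitem__)
        if p_explore[best] < alpha:
            naive_hits += 1
            # Validation part: re-test only the selected candidate.
            if p_value(xs[best][half:], y[half:]) < alpha:
                validated_hits += 1
    return naive_hits / n_sim, validated_hits / n_sim


naive_rate, validated_rate = simulate()
print(f"false positive rate without validation: {naive_rate:.3f}")
print(f"false positive rate with validation:    {validated_rate:.3f}")
```

With ten independent candidate variables tested at the 5% level, the chance that at least one is significant in the exploration part is about 1 - 0.95^10, or roughly 40%. Requiring the selected variable to be significant again in the independent validation part brings the expected rate down to about 0.40 × 0.05, or roughly 2%, at the cost of analysing only half of the data in each step.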
We conclude that, over the past three years, the validation policy has prevented the publication of at least one research finding that could not be validated in an independent data set (and probably would have been a "true" false-positive finding), and has led to improved data analysis, statistical programming, and selection of hypotheses. The advantages outweigh the lost statistical power inherent in the process.