Email updates

Keep up to date with the latest news and content from BMC Medical Informatics and Decision Making and BioMed Central.

Open Access Open Badges Research article

Concordance and predictive value of two adverse drug event data sets

Aurel Cami12* and Ben Y Reis12

Author Affiliations

1 Division of Emergency Medicine, Boston Children’s Hospital, 1 Autumn Street, 5th Floor, Boston, MA 02215, USA

2 Department of Pediatrics, Harvard Medical School, Boston, MA, USA

For all author emails, please log on.

BMC Medical Informatics and Decision Making 2014, 14:74  doi:10.1186/1472-6947-14-74

Published: 22 August 2014



Accurate prediction of adverse drug events (ADEs) is an important means of controlling and reducing drug-related morbidity and mortality. Since no single “gold standard” ADE data set exists, a range of different drug safety data sets are currently used for developing ADE prediction models. There is a critical need to assess the degree of concordance between these various ADE data sets and to validate ADE prediction models against multiple reference standards.


We systematically evaluated the concordance of two widely used ADE data sets – Lexi-comp from 2010 and SIDER from 2012. The strength of the association between ADE (drug) counts in Lexi-comp and SIDER was assessed using Spearman rank correlation, while the differences between the two data sets were characterized in terms of drug categories, ADE categories and ADE frequencies. We also performed a comparative validation of the Predictive Pharmacosafety Networks (PPN) model using both ADE data sets. The predictive power of PPN using each of the two validation sets was assessed using the area under Receiver Operating Characteristic curve (AUROC).


The correlations between the counts of ADEs and drugs in the two data sets were 0.84 (95% CI: 0.82-0.86) and 0.92 (95% CI: 0.91-0.93), respectively. Relative to an earlier snapshot of Lexi-comp from 2005, Lexi-comp 2010 and SIDER 2012 introduced a mean of 1,973 and 4,810 new drug-ADE associations per year, respectively. The difference between these two data sets was most pronounced for Nervous System and Anti-infective drugs, Gastrointestinal and Nervous System ADEs, and postmarketing ADEs. A minor difference of 1.1% was found in the AUROC of PPN when SIDER 2012 was used for validation instead of Lexi-comp 2010.


In conclusion, the ADE and drug counts in Lexi-comp and SIDER data sets were highly correlated and the choice of validation set did not greatly affect the overall prediction performance of PPN. Our results also suggest that it is important to be aware of the differences that exist among ADE data sets, especially in modeling applications focused on specific drug and ADE categories.

Adverse drug events; Prediction; Concordance