Open Access Research article

Use of the i2b2 research query tool to conduct a matched case–control clinical research study: advantages, disadvantages and methodological considerations

Emilie K Johnson1,2*, Sarabeth Broder-Fingert3, Pornthep Tanpowpong4, Jonathan Bickel5,6, Jenifer R Lightdale7 and Caleb P Nelson1

Author Affiliations

1 Department of Urology, Boston Children’s Hospital, 300 Longwood Ave, HU 3rd Floor, Boston, MA 02115, USA

2 Harvard-Wide Pediatric Health Services Research Fellowship, Boston, MA, USA

3 Department of Pediatrics, Massachusetts General Hospital for Children, Boston, USA

4 Department of Pediatrics, Ramathibodi Hospital, Mahidol University, Bangkok, Thailand

5 Informatics Program, Boston Children’s Hospital, Boston, USA

6 Information Systems Department, Boston Children’s Hospital, Boston, USA

7 Division of Gastroenterology and Nutrition, Boston Children’s Hospital, Boston, USA


BMC Medical Research Methodology 2014, 14:16 doi:10.1186/1471-2288-14-16

Published: 30 January 2014



A major aim of the i2b2 (Informatics for Integrating Biology and the Bedside) clinical data informatics framework is to create an efficient structure within which patients can be identified for clinical and translational research projects.

Our objective was to describe the respective roles of the i2b2 research query tool and the electronic medical record (EMR) in conducting a case–control clinical study at our institution.


We analyzed the process of using i2b2 and the EMR together to generate a complete research database for a case–control study that sought to examine risk factors for kidney stones among gastrostomy tube (G-tube) fed children.


Our final case cohort consisted of 41 of the 177 potential cases (23%) initially identified by i2b2, who were matched with 80 of the 486 potential controls (17%). Cases were 10 times more likely to be excluded for inaccurate coding regarding stones than for inaccurate coding regarding G-tubes. A majority (67%) of cases were excluded due to not meeting clinical inclusion criteria, whereas a majority of control exclusions (72%) occurred due to inadequate clinical data necessary for study completion. Full dataset assembly required complementary information from i2b2 and the EMR.
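The attrition from i2b2-identified candidates to the final cohort can be checked with simple arithmetic. The following is a minimal sketch in Python that uses only the counts reported above; the function name `retention` and the `cohorts` dictionary are illustrative and are not part of i2b2 or of the study's own analysis. The values are computed directly from the counts, so they may differ in the last digit from the rounded percentages quoted in the text.

```python
def retention(kept: int, screened: int) -> float:
    """Return the fraction of screened candidates kept in the final cohort."""
    if screened <= 0:
        raise ValueError("screened must be a positive count")
    if not 0 <= kept <= screened:
        raise ValueError("kept must lie between 0 and screened")
    return kept / screened


# Counts taken from the results reported above: (final cohort, i2b2 candidates).
cohorts = {"cases": (41, 177), "controls": (80, 486)}

for name, (kept, screened) in cohorts.items():
    share = retention(kept, screened)
    excluded = screened - kept
    print(f"{name}: {kept}/{screened} retained ({share:.1%}), {excluded} excluded")

# Ratio of controls to cases in the final matched cohort.
controls_per_case = cohorts["controls"][0] / cohorts["cases"][0]
print(f"controls per case: {controls_per_case:.2f}")
```

The same function could be applied to any later screening step, for example to compare how many candidates remain after code verification versus after clinical review, provided those counts are available.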


i2b2 was critical as a query analysis tool for patient identification in our case–control study. Patient identification via procedural coding appeared more accurate than identification via diagnosis coding. Completion of our investigation required iterative interplay of i2b2 and the EMR to assemble the study cohort.

Keywords: Case–control studies; Methodology; Administrative data; Informatics