Open Access Highly Accessed Research article

Applying data mining techniques to improve diagnosis in neonatal jaundice

Duarte Ferreira1*, Abílio Oliveira1 and Alberto Freitas2,3

Author Affiliations

1 Centro Hospitalar Tâmega e Sousa, EPE, Lugar do Tapadinho, Penafiel, 4564-007, Portugal

2 Department of Health Information and Decision Sciences, Faculty of Medicine, University of Porto, Porto, Portugal

3 CINTESIS - Center for Research in Health Technologies and Information Systems, Porto, Portugal

BMC Medical Informatics and Decision Making 2012, 12:143  doi:10.1186/1472-6947-12-143

Published: 7 December 2012



Hyperbilirubinemia is becoming an increasingly common problem in newborns because of shorter hospital stays after birth. Jaundice is the most common disease of the newborn and, although benign in most cases, it can lead to severe neurological consequences if poorly evaluated. In several areas of medicine, data mining has improved on the results obtained with other methodologies.

Hence, the aim of this study was to improve the diagnosis of neonatal jaundice with the application of data mining techniques.


This study followed the phases of the Cross Industry Standard Process for Data Mining (CRISP-DM) model.

This observational study was performed at the Obstetrics Department of a central hospital (Centro Hospitalar Tâmega e Sousa, EPE) from February to March 2011. A total of 227 healthy newborn infants with 35 or more weeks of gestation were enrolled. Over 70 variables were collected and analyzed. In addition, transcutaneous bilirubin levels were measured with a noninvasive bilirubinometer from birth to hospital discharge, with no more than 8 hours between measurements.

Different attribute subsets were used to train and test classification models using algorithms included in the Weka data mining software, such as decision trees (J48) and neural networks (multilayer perceptron). The resulting accuracies were compared with those of traditional methods for predicting hyperbilirubinemia.
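
The train-and-test procedure over attribute subsets can be illustrated with a minimal sketch. It is not the Weka workflow used in the study: it is plain Python, it substitutes a small hand-written Gaussian naive Bayes classifier for the Weka algorithms, and the two attributes and all numeric values are made up for illustration only (they are not study data). The function names `fit_gaussian_nb`, `predict`, `accuracy` and `select` are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a class prior and per-attribute mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, var + 1e-9))  # small floor avoids division by zero
        model[label] = (len(members) / len(rows), stats)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior for one row."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (value - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


def accuracy(model, rows, labels):
    """Fraction of rows whose predicted class matches the true class."""
    hits = sum(predict(model, r) == y for r, y in zip(rows, labels))
    return hits / len(labels)


def select(rows, indices):
    """Keep only the attributes (columns) listed in indices."""
    return [[row[i] for i in indices] for row in rows]


# Made-up toy values, NOT study data. Hypothetical columns:
# [bilirubin-like reading, gestational age in weeks]; label 1 = later hyperbilirubinemia.
train_rows = [[4.0, 39.0], [4.5, 40.0], [5.0, 38.0], [5.5, 39.0],
              [8.0, 36.0], [8.5, 35.0], [9.0, 36.0], [9.5, 37.0]]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]
test_rows = [[4.2, 39.0], [9.2, 36.0], [5.2, 38.0], [8.2, 35.0]]
test_labels = [0, 1, 0, 1]

# Train and test one model per attribute subset, then compare accuracies.
subsets = {"first attribute only": [0], "both attributes": [0, 1]}
results = {}
for name, idx in subsets.items():
    model = fit_gaussian_nb(select(train_rows, idx), train_labels)
    results[name] = accuracy(model, select(test_rows, idx), test_labels)

for name, acc in results.items():
    print(f"{name}: accuracy = {acc:.2f}")
```

The same loop could wrap any other classifier with a fit and predict step. The toy classes are deliberately well separated, so the accuracies printed here say nothing about real clinical performance.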


Applying different classification algorithms to the collected data made it possible to predict subsequent hyperbilirubinemia with high accuracy. In particular, at 24 hours of life, the accuracy of the prediction of hyperbilirubinemia was 89%. The best results were obtained with the following algorithms: naive Bayes, multilayer perceptron and simple logistic.


The findings of our study support the view that new approaches, such as data mining, may support medical decision-making and help improve diagnosis in neonatal jaundice.

Keywords: Data mining; Classification and prediction; Neonatal hyperbilirubinemia; Prognosis