Email updates

Keep up to date with the latest news and content from BMC Bioinformatics and BioMed Central.

Open Access Research article

A semiparametric modeling framework for potential biomarker discovery and the development of metabonomic profiles

Samiran Ghosh1*, David F Grant2*, Dipak K Dey3 and Dennis W Hill2

Author Affiliations

1 Department of Mathematical Sciences, Indiana University Purdue University, Indianapolis, Indiana, USA

2 Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut, USA

3 Department of Statistics, University of Connecticut, Storrs, Connecticut, USA

For all author emails, please log on.

BMC Bioinformatics 2008, 9:38  doi:10.1186/1471-2105-9-38

Published: 23 January 2008



The discovery of biomarkers is an important step towards the development of criteria for early diagnosis of disease status. Recently electrospray ionization (ESI) and matrix assisted laser desorption (MALDI) time-of-flight (TOF) mass spectrometry have been used to identify biomarkers both in proteomics and metabonomics studies. Data sets generated from such studies are generally very large in size and thus require the use of sophisticated statistical techniques to glean useful information. Most recent attempts to process these types of data model each compound's intensity either discretely by positional (mass to charge ratio) clustering or through each compounds' own intensity distribution. Traditionally data processing steps such as noise removal, background elimination and m/z alignment, are generally carried out separately resulting in unsatisfactory propagation of signals in the final model.


In the present study a novel semi-parametric approach has been developed to distinguish urinary metabolic profiles in a group of traumatic patients from those of a control group consisting of normal individuals. Data sets obtained from the replicates of a single subject were used to develop a functional profile through Dirichlet mixture of beta distribution. This functional profile is flexible enough to accommodate variability of the instrument and the inherent variability of each individual, thus simultaneously addressing different sources of systematic error. To address instrument variability, all data sets were analyzed in replicate, an important issue ignored by most studies in the past. Different model comparisons were performed to select the best model for each subject. The m/z values in the window of the irregular pattern are then further recommended for possible biomarker discovery.


To the best of our knowledge this is the very first attempt to model the physical process behind the time-of flight mass spectrometry. Most of the state of the art techniques does not take these physical principles in consideration while modeling such data. The proposed modeling process will apply as long as the basic physical principle presented in this paper is valid. Notably we have confined our present work mostly within the modeling aspect. Nevertheless clinical validation of our recommended list of potential biomarkers will be required. Hence, we have termed our modeling approach as a "framework" for further work.