
Technical advance (Open Access)

A metadata approach for clinical data management in translational genomics studies in breast cancer

Irene Papatheodorou¹*, Charles Crichton², Lorna Morris¹, Peter Maccallum¹, METABRIC Group, Jim Davies², James D Brenton¹ and Carlos Caldas¹

Author Affiliations

1 Department of Oncology, University of Cambridge and Cancer Research UK Cambridge Research Institute, Li Ka Shing Centre, Cambridge, CB2 0RE, UK

2 Oxford University Computing Laboratory, Wolfson Building, Parks Road, Oxford, OX1 3QD, UK

BMC Medical Genomics 2009, 2:66  doi:10.1186/1755-8794-2-66

Published: 30 November 2009



In molecular profiling studies of cancer patients, experimental and clinical data are combined to understand the clinical heterogeneity of the disease: the clinical information for each subject must be linked to the tumour samples taken, the macromolecules extracted from them, and the experimental results obtained. This may require integrating clinical data sets from several sources, which may use different data definitions and some of which may be incomplete.
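The chain of links described above (subject, tumour sample, extracted macromolecule, experimental result) can be pictured as a small linked data model. The following is a minimal sketch under stated assumptions: the class names, field names and the helper `clinical_for_result` are hypothetical illustrations, not the schema used by the authors.

```python
from dataclasses import dataclass

# Hypothetical record types; the field names are illustrative assumptions only.
@dataclass
class Subject:
    subject_id: str
    clinical: dict  # clinical attributes as supplied by the source hospital

@dataclass
class TumourSample:
    sample_id: str
    subject_id: str  # link back to the Subject the sample was taken from

@dataclass
class Extract:
    extract_id: str
    sample_id: str  # link back to the TumourSample
    molecule: str   # e.g. "DNA" or "RNA"

@dataclass
class Result:
    result_id: str
    extract_id: str  # link back to the Extract that was assayed
    platform: str

def clinical_for_result(result, extracts, samples, subjects):
    """Follow result -> extract -> sample -> subject and return the clinical record."""
    extract = extracts[result.extract_id]
    sample = samples[extract.sample_id]
    return subjects[sample.subject_id].clinical

# Usage with made-up identifiers.
subjects = {"P1": Subject("P1", {"age_at_diagnosis": 54})}
samples = {"S1": TumourSample("S1", "P1")}
extracts = {"E1": Extract("E1", "S1", "RNA")}
result = Result("R1", "E1", "expression array")
print(clinical_for_result(result, extracts, samples, subjects))
```

Keeping each link as an explicit identifier means a single experimental result can always be traced back to the clinical record it should be analysed against, whichever hospital supplied that record.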


In this work we use semantic web techniques developed within the CancerGrid project, in particular metadata elements and logic-based inference, to annotate, integrate, and query heterogeneous clinical information.
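To illustrate how annotation plus inference can support querying, here is a minimal sketch in plain Python. It swaps in a tiny forward-chaining closure over a hypothetical "refines" relation between metadata elements in place of a semantic web reasoner; the CDE identifiers, column names and the functions `refinement_closure` and `columns_for` are all assumptions for illustration, not the CancerGrid implementation.

```python
# Hypothetical annotations: (dataset, local column) -> CDE identifier.
ANNOTATIONS = {
    ("hospital_A", "ER"): "cde:er_status_ihc",
    ("hospital_B", "oestrogen_receptor"): "cde:er_status",
    ("hospital_B", "grade"): "cde:histological_grade",
}

# Hypothetical facts: (specific CDE, general CDE) means the first refines the second.
REFINES = {
    ("cde:er_status_ihc", "cde:er_status"),
}

def refinement_closure(facts):
    """Forward-chain the transitive closure of the 'refines' relation."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def columns_for(cde, annotations, refines):
    """Return every (dataset, column) annotated with `cde` or with a CDE that refines it."""
    closure = refinement_closure(refines)
    return sorted(
        key for key, annotated in annotations.items()
        if annotated == cde or (annotated, cde) in closure
    )

print(columns_for("cde:er_status", ANNOTATIONS, REFINES))
# → [('hospital_A', 'ER'), ('hospital_B', 'oestrogen_receptor')]
```

A query phrased against the general element thus also retrieves columns annotated only with a more specific one, without either hospital renaming its own fields.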


We show how this integration can be achieved automatically once appropriate metadata elements have been declared for each clinical data set. We demonstrate the practicality of this approach by applying it, as part of the METABRIC project (Molecular Taxonomy of Breast Cancer International Consortium), to experimental results and clinical data from five hospitals in the UK and Canada.
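Once each data set has declared which metadata element each of its columns corresponds to, merging can proceed mechanically. The sketch below assumes a hypothetical declaration format (local column mapped to a CDE identifier plus a table translating local codes into the CDE's permitted values); the names `DECLARATIONS`, `CDES` and `integrate` are illustrative only and do not describe the authors' tooling.

```python
# Hypothetical per-dataset declarations:
# local column -> (CDE identifier, mapping from local codes to the CDE's permitted values).
DECLARATIONS = {
    "hospital_A": {
        "ER": ("cde:er_status", {"pos": "Positive", "neg": "Negative"}),
        "grade": ("cde:histological_grade", {"G1": "1", "G2": "2", "G3": "3"}),
    },
    "hospital_B": {
        "oestrogen_receptor": ("cde:er_status", {1: "Positive", 0: "Negative"}),
        # hospital_B declares no grade column, so that CDE stays missing.
    },
}

CDES = ["cde:er_status", "cde:histological_grade"]

def integrate(datasets, declarations, cdes):
    """Translate every record into the shared CDE vocabulary, one row per subject."""
    rows = []
    for name, records in datasets.items():
        for rec in records:
            row = {"dataset": name, "subject_id": rec["subject_id"]}
            row.update({cde: None for cde in cdes})  # undeclared CDEs stay None
            for column, (cde, mapping) in declarations[name].items():
                # Absent or unrecognised local codes also become None.
                row[cde] = mapping.get(rec.get(column))
            rows.append(row)
    return rows

# Usage with made-up records.
DATASETS = {
    "hospital_A": [{"subject_id": "A1", "ER": "pos", "grade": "G3"}],
    "hospital_B": [{"subject_id": "B1", "oestrogen_receptor": 0}],
}
for row in integrate(DATASETS, DECLARATIONS, CDES):
    print(row)
```

Recording missing values explicitly as `None`, rather than dropping the column, keeps incomplete data sets in the merged table while making their gaps visible to later queries.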


We describe a metadata approach that uses Common Data Elements (CDEs) to manage similarities and differences between clinical data sets in a standardized way. We apply and evaluate the approach by integrating the five clinical data sets of METABRIC.