Open Access Methodology article

Cluster analysis of protein array results via similarity of Gene Ontology annotation

Cheryl Wolting1,2*, C Jane McGlade1,2 and David Tritchler1,3

Author Affiliations

1 Department of Medical Biophysics, University of Toronto, Toronto, Canada

2 Arthur and Sonia Labatt Brain Tumour Research Centre, Department of Cell Biology, Hospital for Sick Children, 555 University Avenue, Toronto M5G 1X8, Canada

3 Ontario Cancer Institute, Princess Margaret Hospital, 610 University Avenue, Toronto M5G 2M9, Canada

BMC Bioinformatics 2006, 7:338  doi:10.1186/1471-2105-7-338

Published: 12 July 2006



With the advent of high-throughput proteomic experiments such as arrays of purified proteins comes the need to analyse sets of proteins as an ensemble, as opposed to the traditional one-protein-at-a-time approach. Although several publicly available tools facilitate the analysis of protein sets, they either do not display integrated results in an easily interpreted image or do not allow the user to specify the proteins to be analysed.


We developed a novel computational approach to analyse the annotation of sets of molecules. As proof of principle, we analysed two sets of proteins identified in published protein array screens. The distance between any two proteins was measured using the graph similarity between their Gene Ontology (GO) annotations. These distances were then clustered to highlight subsets of proteins sharing related GO annotation. In the first set, proteins found to bind small molecule inhibitors of rapamycin, we identified three subsets of four or five proteins each that may help to elucidate how rapamycin affects cell growth; the original authors, by contrast, chose only one novel protein from the array results for further study. In the second set, phosphoinositide-binding proteins, we identified subsets of proteins associated with different intracellular structures that were not highlighted by the analysis performed in the original publication.
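The general idea of turning GO annotations into pairwise distances and then clustering them can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy ontology in `PARENTS`, the proteins in `ANNOTATIONS`, the use of a Jaccard overlap of induced ancestor subgraphs as the graph similarity, and the average-linkage clustering. The abstract does not specify the similarity measure or clustering algorithm actually used, so this should not be read as the authors' implementation.

```python
from itertools import combinations

# Hypothetical toy ontology: term -> parent terms (a DAG, as in GO).
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "lipid_binding": ["binding"],
    "pi_binding": ["lipid_binding"],
    "kinase": ["catalysis"],
    "lipid_kinase": ["kinase", "lipid_binding"],
}

# Hypothetical proteins and their direct annotations.
ANNOTATIONS = {
    "P1": {"pi_binding"},
    "P2": {"lipid_binding"},
    "P3": {"kinase"},
    "P4": {"lipid_kinase"},
}


def induced_terms(terms):
    """Return the given terms plus all of their ancestors in the DAG."""
    seen, stack = set(), list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS[term])
    return seen


def distance(a, b):
    """1 - Jaccard overlap of the two induced ancestor node sets
    (an assumed stand-in for a graph similarity measure)."""
    ga = induced_terms(ANNOTATIONS[a])
    gb = induced_terms(ANNOTATIONS[b])
    return 1.0 - len(ga & gb) / len(ga | gb)


def cluster(proteins, n_clusters):
    """Naive average-linkage agglomerative clustering on the distances."""
    clusters = [[p] for p in proteins]

    def linkage(c1, c2):
        total = sum(distance(a, b) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (i < j) and merge j into i.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    print(cluster(sorted(ANNOTATIONS), 2))
```

In this toy example the two binding-related proteins group together and the two kinase-related proteins group together, even though no two proteins share an identical end-point annotation. That is the kind of relationship a distance over the annotation graph can expose.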


By determining the distances between annotations, our methodology reveals trends and enrichment of proteins of particular functions within high-throughput datasets with greater sensitivity than perusal of end-point annotations alone. In an era of increasingly complex datasets, such tools will help in the formulation of new, testable hypotheses from high-throughput experimental data.