Cluster analysis of protein array results via similarity of Gene Ontology annotation
1 Department of Medical Biophysics, University of Toronto, Toronto, Canada
2 Arthur and Sonia Labatt Brain Tumour Research Centre, Department of Cell Biology, Hospital for Sick Children, 555 University Avenue, Toronto M5G 1X8, Canada
3 Ontario Cancer Institute, Princess Margaret Hospital, 610 University Avenue, Toronto M5G 2M9, Canada
BMC Bioinformatics 2006, 7:338 doi:10.1186/1471-2105-7-338
Published: 12 July 2006
With the advent of high-throughput proteomic experiments such as arrays of purified proteins comes the need to analyse sets of proteins as an ensemble, as opposed to the traditional one-protein-at-a-time approach. Although several publicly available tools facilitate the analysis of protein sets, they either do not display integrated results in an easily interpreted image or do not allow the user to specify the proteins to be analysed.
We developed a novel computational approach to analyse the annotation of sets of molecules. As proof of principle, we analysed two sets of proteins identified in published protein array screens. The distance between any two proteins was measured as the graph similarity between their Gene Ontology (GO) annotations. These distances were then clustered to highlight subsets of proteins sharing related GO annotation. In the first set, proteins found to bind small-molecule inhibitors of rapamycin, we identified three subsets of four or five proteins each that may help to elucidate how rapamycin affects cell growth, whereas the original authors chose only one novel protein from the array results for further study. In the second set, phosphoinositide-binding proteins, we identified subsets of proteins associated with different intracellular structures that were not highlighted by the analysis performed in the original publication.
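The two steps described above (a pairwise distance derived from GO annotation graphs, followed by clustering) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy ontology `PARENTS`, the protein names in `ANNOTATIONS`, the use of Jaccard overlap between ancestor-closed term sets as a stand-in for graph similarity, and average-linkage clustering with a fixed cut-off. It is not the measure or the clustering procedure used in this work.

```python
from itertools import combinations

# Hypothetical toy ontology: each term maps to its parent terms (a small DAG).
PARENTS = {
    "root": [],
    "binding": ["root"],
    "lipid_binding": ["binding"],
    "protein_binding": ["binding"],
    "catalysis": ["root"],
    "kinase": ["catalysis"],
}

# Hypothetical annotations: protein name -> directly assigned terms.
ANNOTATIONS = {
    "P1": {"lipid_binding"},
    "P2": {"lipid_binding", "protein_binding"},
    "P3": {"kinase"},
    "P4": {"kinase", "catalysis"},
}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def induced_terms(protein):
    """Terms of the sub-graph induced by a protein's annotations."""
    terms = set()
    for term in ANNOTATIONS[protein]:
        terms |= ancestors(term)
    return terms


def distance(a, b):
    """Assumed distance: 1 minus the Jaccard overlap of the induced term sets."""
    ga, gb = induced_terms(a), induced_terms(b)
    return 1.0 - len(ga & gb) / len(ga | gb)


def linkage(c1, c2):
    """Average pairwise distance between the members of two clusters."""
    return sum(distance(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def cluster(proteins, threshold):
    """Merge the closest clusters until the closest pair exceeds the threshold."""
    clusters = [[p] for p in proteins]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


if __name__ == "__main__":
    for group in cluster(sorted(ANNOTATIONS), threshold=0.5):
        print(sorted(group))
```

In this toy example the two binding proteins and the two catalytic proteins fall into separate groups, because proteins annotated to nearby terms share most of their ancestor terms even when their end-point annotations differ.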
By determining the distances between annotations, our methodology reveals trends and enrichment of proteins of particular functions within high-throughput datasets with higher sensitivity than inspection of end-point annotations alone. In an era of increasingly complex datasets, such tools will help in the formulation of new, testable hypotheses from high-throughput experimental data.