Gene analogue finder: a GRID solution for finding functionally analogous gene products
- Equal contributors
1 Dipartimento Interateneo di Fisica, Università e Politecnico di Bari, via Amendola 173, 70126 Bari, Italy
2 INFN Bari, Via Amendola 173, Bari, Italy
3 Istituto di Tecnologie Biomediche, CNR, Via Amendola 122/D, Bari, Italy
BMC Bioinformatics 2007, 8:329 doi:10.1186/1471-2105-8-329
Published: 3 September 2007
To date, more than 2.1 million gene products from more than 100,000 different species have been described in terms of their function, the processes they are involved in and their cellular localization, using a well-defined and structured vocabulary, the Gene Ontology (GO). Such vast, well-defined knowledge opens the possibility of comparing gene products at the level of functionality, finding gene products that have a similar function or are involved in similar biological processes without relying on the conventional sequence similarity approach. Comparisons within such a large space of knowledge are highly data- and computing-intensive. For this reason, this project was based on the computational GRID, a technology offering large computing and storage resources.
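To make the idea of comparing gene products by annotation rather than by sequence concrete, the following is a minimal sketch in Python. It is not the scoring used by ENGINE, which this abstract does not specify. It assumes a simple Jaccard index over sets of GO term identifiers, and the gene names, annotation sets and the helper `best_analogue` are hypothetical.

```python
def jaccard(terms_a, terms_b):
    """Jaccard index of two GO-term sets; 0.0 if both are empty."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical annotations: gene product -> set of GO term identifiers.
# The assignments are illustrative only, not taken from the GO database.
annotations = {
    "geneA": {"GO:0003677", "GO:0006355", "GO:0005634"},
    "geneB": {"GO:0003677", "GO:0006355", "GO:0005737"},
    "geneC": {"GO:0016301"},
}


def best_analogue(query, annotations):
    """Return (name, score) of the most similar other gene product."""
    candidates = (
        (name, jaccard(annotations[query], terms))
        for name, terms in annotations.items()
        if name != query
    )
    return max(candidates, key=lambda pair: pair[1])


print(best_analogue("geneA", annotations))
```

In this toy data set, geneA and geneB share two of four distinct terms, so geneB is reported as the closest candidate. No sequence information is consulted at any point.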
We have developed a tool, GENe AnaloGue FINdEr (ENGINE), that parallelizes the search process and distributes the calculation and data over the computational GRID. It splits the process into many sub-processes and brings the calculation and the data together on the same machine, completing the whole search in about 3 days instead of occupying a single machine for more than 5 CPU years. The results of the functional comparison contain potential functional analogues for more than 79,000 gene products from the most important species. 46% of the analyzed gene products are described well enough for such an analysis to identify functional analogues, such as well-known members of the same gene family, or gene products with similar functions that would never have been associated by standard methods.
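The splitting step and the quoted timings can be illustrated with a short Python sketch. It assumes that each sub-process receives a contiguous chunk of query gene products to compare against the full set. The abstract does not describe the actual partitioning scheme, so `split_into_jobs` and `implied_speedup` are hypothetical helpers.

```python
def split_into_jobs(product_ids, n_jobs):
    """Split a list of ids into n_jobs contiguous, near-equal chunks."""
    base, extra = divmod(len(product_ids), n_jobs)
    jobs, start = [], 0
    for i in range(n_jobs):
        # The first `extra` jobs take one additional item each.
        size = base + (1 if i < extra else 0)
        jobs.append(product_ids[start:start + size])
        start += size
    return jobs


def implied_speedup(cpu_years, wall_days):
    """Ratio of single-machine CPU time to elapsed wall-clock time."""
    return cpu_years * 365 / wall_days


jobs = split_into_jobs(list(range(10)), 3)
print(jobs)
print(round(implied_speedup(5, 3)))
```

Taking the figures above at face value, 5 CPU years compressed into 3 days corresponds to a speed-up of roughly 600-fold. This is a lower bound on the average number of sub-processes running concurrently, since the single-machine estimate is given as "more than" 5 CPU years.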
ENGINE has produced a list of potential functionally analogous relations between gene products within and between species using the GO description of each gene product in place of its sequence, thus demonstrating the potential of the GO. However, the current limiting factor is the quality of the associations for many gene products from non-model organisms, which often have electronic associations because experimental information is missing. With future improvements of the GO, this limitation will be reduced. ENGINE will show its full power when it is applied to the whole GO database (GODB) of more than 2.1 million gene products from more than 100,000 organisms. The data produced by this search are planned to be made available as a supplement to the GO database as soon as we are able to provide regular updates.