Application of a sensitive collection heuristic for very large protein families: Evolutionary relationship between adipose triglyceride lipase (ATGL) and classic mammalian lipases
1 IMP - Research Institute of Molecular Pathology, Dr. Bohr-Gasse 7, A-1030 Vienna, Republic of Austria
2 Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford str., M-105, 02138 Cambridge, MA, USA
BMC Bioinformatics 2006, 7:164 doi:10.1186/1471-2105-7-164Published: 21 March 2006
Manually finding subtle yet statistically significant links to distantly related homologues becomes practically impossible for very populated protein families due to the sheer number of similarity searches to be invoked and analyzed. The unclear evolutionary relationship between classical mammalian lipases and the recently discovered human adipose triglyceride lipase (ATGL; a patatin family member) is an exemplary case for such a problem.
We describe an unsupervised, sensitive sequence segment collection heuristic suitable for assembling very large protein families. It is based on fan-like expanding, iterative database searches. To prevent inclusion of unrelated hits, additional criteria are introduced: minimal alignment length and overlap with starting sequence segments, finding starting sequences in reciprocal searches, automated filtering for compositional bias and repetitive patterns. This heuristic was implemented as FAMILYSEARCHER in the ANNIE sequence analysis environment and applied to search for protein links between the classical lipase family and the patatin-like group.
The FAMILYSEARCHER is an efficient tool for tracing distant evolutionary relationships involving large protein families. Although classical lipases and ATGL have no obvious sequence similarity and differ with regard to fold and catalytic mechanism, homology links detected with FAMILYSEARCHER show that they are evolutionarily related. The conserved sequence parts can be narrowed down to an ancestral core module consisting of three β-strands, one α-helix and a turn containing the typical nucleophilic serine. Moreover, this ancestral module also appears in numerous enzymes with various substrate specificities, but that critically rely on nucleophilic attack mechanisms.