Table 1

Automated Candidate Gene Prediction Systems

Semi-Automated Systems
GeneSeeker is a semi-automated web-server tool which selects positional candidates based on expression and phenotypic data from human and mouse. Queries must be formulated by the end-user using Boolean expressions [13,33]. ♠◇
Systems Biology Techniques
Prioritizer uses pathway and interaction data from KEGG [17,34], Reactome [35], and HPRD [36]. Interactions are also predicted using a Bayesian technique based on GO keywords [23] and other databases [5].
In Gentrepid Common Pathway Scanning (CPS), pathways are associated with phenotypes either by using known disease genes or by searching for enrichment of pathways across multiple disease intervals associated with the phenotype [4]. ♠◇
Oti et al. use protein-protein interaction data from HPRD [36], Y2H [37,38], and PCP [39,40], giving coverage of 10,894 human genes [24].
Genotype-Phenotype Mapping Methods
G2D [32] uses biomedical literature to associate pathological conditions with GO terms [23]. Candidate genes are identified by homology to GO-annotated disease-associated genes. ♠◇
Gentrepid Common Module Profiling (CMP) searches for enrichment of particular domains in gene clusters associated with a particular phenotype. Domains are extracted either from known disease genes or by comparison of multiple disease intervals [4]. ♠◇
POCUS searches for over-representation of functional annotation among multiple loci associated with the same disease. Functional annotation is based on keywords from InterPro domains [22] and GO [23]. No a priori knowledge of the phenotype is required [3]. ♠
Techniques based on a bipartite distribution of "disease" and "non-disease" genes
The eVOC system uses text mining of biomedical literature to associate a phenotype with anatomy terms and links these with human expression data to produce a ranked list of disease genes. The classifier is a machine-learning technique, based on a bipartite training set of 17 known "disease genes" and 306 "non-disease genes" [30]. ♠
DGP (Disease Gene Prediction) is a web tool which selects genes based on protein sequence properties. The properties analysed by DGP include protein length, degree of sequence conservation, phylogenetic extent, and paralogy patterns [31,41]. ♠
PROSPECTR (PRiOrization by Sequence and Phylogenetic Extent of CandidaTe Regions) uses an alternating decision tree classifier to discriminate "disease genes" from "non-disease genes" based on sequence features such as gene length, protein length, and similarity of homologs in other species [12]. ♠
Hybrid techniques
SUSPECTS combines a genotype-phenotype mapping method based on disease-gene-associated keywords from InterPro and GO, and expression libraries, with the PROSPECTR Boolean classifier. Disease genes are prioritized [21]. ♠◇

♠ Assessed here, ◇ Webserver.
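
Several of the systems in Table 1 (POCUS, Gentrepid CMP, and Gentrepid CPS) search for annotations that are over-represented among genes pooled from multiple loci associated with the same disease. The table does not state which statistic each tool uses. As a minimal illustrative sketch, assuming a one-sided hypergeometric test, the idea might look like the following. The function names, gene identifiers, and annotation terms are hypothetical and are not taken from any of the tools.

```python
from math import comb


def hypergeom_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn without replacement from a
    background of N genes, K of which carry the annotation."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def rank_annotations(candidates, annotations, background):
    """Rank annotation terms by over-representation among candidate genes.

    candidates  -- set of gene ids pooled from all disease loci
    annotations -- dict mapping a term (e.g. a domain or pathway) to a set of gene ids
    background  -- set of all gene ids considered
    Returns (term, shared_gene_count, p_value) tuples, smallest p-value first.
    """
    pool = candidates & background
    N, n = len(background), len(pool)
    results = []
    for term, genes in annotations.items():
        annotated = genes & background
        k = len(annotated & pool)
        if k == 0:
            continue  # term absent from the candidate loci
        results.append((term, k, hypergeom_tail(k, len(annotated), n, N)))
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    # Hypothetical toy data: 20 background genes, two annotation terms.
    background = {f"g{i}" for i in range(20)}
    annotations = {
        "domain_A": {"g0", "g1", "g2", "g3", "g4"},
        "domain_B": {"g0"} | {f"g{i}" for i in range(10, 19)},
    }
    candidates = {"g0", "g1", "g2", "g9"}  # genes pooled from the loci
    for term, k, p in rank_annotations(candidates, annotations, background):
        print(term, k, round(p, 4))
```

In this sketch a term shared by many candidate genes but rare in the background receives a small p-value and ranks first. A real analysis would also need a correction for multiple testing, which is omitted here.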

Teber et al. BMC Bioinformatics 2009 10(Suppl 1):S69   doi:10.1186/1471-2105-10-S1-S69