Table 1

The feature set. Features made available to the machine learning application (Weka) to build the alternating decision tree.

| Feature | Source | Description |
|---|---|---|
| Gene length | EnsemblMart 22.1 | Length of gene in bp. |
| CDS length | EnsemblMart 22.1 | Length of coding sequence in bp. |
| cDNA length | EnsemblMart 22.1 | Length of complementary DNA in bp. |
| Protein length | EnsemblMart 22.1 | Length of protein in aa. |
| Length of 3' UTR | EnsemblMart 22.1 | The length of the 3' untranslated region (UTR) in bp |
| Length of 5' UTR | EnsemblMart 22.1 | The length of the 5' untranslated region (UTR) in bp |
| Distance to nearest neighbouring gene | EnsemblMart 22.1 | Distance to the next known gene on the same chromosome on either strand in bp. |
| Number of exons | EnsemblMart 22.1 | Number of exons in the gene. |
| GC | EnsemblMart 22.1 | GC content (as a %) of gene |
| Transmembrane | EnsemblMart 22.1 | Prediction of transmembrane domains (1 for yes or 0 for no) |
| Signal peptide | EnsemblMart 22.1 | Prediction of signal peptide (1 for yes or 0 for no) |
| Paralog | EnsemblMart 22.1 | If the gene has a paralog in the human genome (1 for yes or 0 for no) |
| Paralog % identity | EnsemblMart 22.1 | % protein identity of best paralog in the human genome. Genes without paralogs have "unknown" entered here. |
| Mouse homolog % identity | Homologene | % protein identity of mouse homolog. Genes without a mouse homolog have "0" entered here. |
| Rat homolog % identity | Homologene | % protein identity of rat homolog. Genes without a rat homolog have "0" entered here. |
| Worm homolog % identity | Homologene | % protein identity of worm homolog (potentially 0, see above) |
| Fly homolog % identity | Homologene | % protein identity of fly homolog (potentially 0, see above) |
| Yeast homolog % identity | Homologene | % protein identity of yeast homolog (potentially 0, see above) |
| Arabidopsis homolog % identity | Homologene | % protein identity of Arabidopsis homolog (potentially 0, see above) |
| Mouse homolog Ka | Homologene | Measure of non-synonymous changes between human and mouse homolog. |
| Mouse homolog Ks | Homologene | Measure of synonymous changes between human and mouse homolog. |
| Mouse homolog Ka / Ks | Homologene | Ratio of above two fields. |
| CpG island at 3' end of gene | EnsemblMart 22.1 | If a CpG island exists at the 3' end of the gene (1 or 0) |
| CpG island at 5' end of gene | EnsemblMart 22.1 | If a CpG island exists at the 5' end of the gene (1 or 0) |
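
The encoding conventions listed in the table (1/0 flags, "0" for a missing homolog, "unknown" for a missing paralog identity, and Ka / Ks as a ratio of two other fields) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names and the example values are hypothetical, and writing "unknown" as `?` (the missing-value token in Weka's ARFF format) is an assumption, since the table does not say how the value was stored.

```python
from typing import Optional


def binary_flag(present: bool) -> int:
    """Encode a yes/no feature (e.g. signal peptide, CpG island) as 1 or 0."""
    return 1 if present else 0


def homolog_identity(identity: Optional[float]) -> float:
    """Homolog % identity; the table enters 0 when no homolog exists."""
    return 0.0 if identity is None else identity


def paralog_identity(identity: Optional[float]) -> str:
    """Paralog % identity; the table enters "unknown" when there is no paralog.

    Assumption: the unknown value is written as "?", ARFF's missing-value token.
    """
    return "?" if identity is None else f"{identity:g}"


def ka_ks_ratio(ka: Optional[float], ks: Optional[float]) -> Optional[float]:
    """Ratio of non-synonymous (Ka) to synonymous (Ks) change.

    Returns None when either value is absent or Ks is zero (assumed handling;
    the table does not describe these cases).
    """
    if ka is None or ks is None or ks == 0:
        return None
    return ka / ks


# Hypothetical gene record, for illustration only (not data from the paper).
row = [
    binary_flag(True),                 # Signal peptide
    binary_flag(False),                # Paralog
    paralog_identity(None),            # Paralog % identity -> "?"
    homolog_identity(None),            # Yeast homolog % identity -> 0.0
    ka_ks_ratio(0.1, 0.5),             # Mouse homolog Ka / Ks
]
print(",".join(str(value) for value in row))
```

Note the asymmetry the table itself describes: a missing homolog is a numeric 0, whereas a missing paralog identity is a non-numeric placeholder, so the two cases need different handling when the data file is written.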

Adie et al. BMC Bioinformatics 2005 6:55   doi:10.1186/1471-2105-6-55