The feature set. The list of features which were made available to the machine learning application (Weka) to build the alternating decision tree.

| Feature | Source | Description |
|---|---|---|
| Gene length | EnsemblMart 22.1 | Length of gene in bp. |
| CDS length | EnsemblMart 22.1 | Length of coding sequence in bp. |
| cDNA length | EnsemblMart 22.1 | Length of complementary DNA in bp. |
| Protein length | EnsemblMart 22.1 | Length of protein in aa. |
| Length of 3' UTR | EnsemblMart 22.1 | The length of the 3' untranslated region (UTR) in bp. |
| Length of 5' UTR | EnsemblMart 22.1 | The length of the 5' untranslated region (UTR) in bp. |
| Distance to nearest neighbouring gene | EnsemblMart 22.1 | Distance to the next known gene on the same chromosome on either strand in bp. |
| Number of exons | EnsemblMart 22.1 | Number of exons in the gene. |
| GC | EnsemblMart 22.1 | GC content (as a %) of gene. |
| Transmembrane | EnsemblMart 22.1 | Prediction of transmembrane domains (1 for yes or 0 for no). |
| Signal peptide | EnsemblMart 22.1 | Prediction of signal peptide (1 for yes or 0 for no). |
| Paralog | EnsemblMart 22.1 | If the gene has a paralog in the human genome (1 for yes or 0 for no). |
| Paralog % identity | EnsemblMart 22.1 | % protein identity of best paralog in the human genome. Genes without paralogs have "unknown" entered here. |
| Mouse homolog % identity | Homologene | % protein identity of mouse homolog. Genes without a mouse homolog have "0" entered here. |
| Rat homolog % identity | Homologene | % protein identity of rat homolog. Genes without a rat homolog have "0" entered here. |
| Worm homolog % identity | Homologene | % protein identity of worm homolog (potentially 0, see above). |
| Fly homolog % identity | Homologene | % protein identity of fly homolog (potentially 0, see above). |
| Yeast homolog % identity | Homologene | % protein identity of yeast homolog (potentially 0, see above). |
| Arabidopsis homolog % identity | Homologene | % protein identity of Arabidopsis homolog (potentially 0, see above). |
| Mouse homolog Ka | Homologene | Measure of non-synonymous changes between human and mouse homolog. |
| Mouse homolog Ks | Homologene | Measure of synonymous changes between human and mouse homolog. |
| Mouse homolog Ka / Ks | Homologene | Ratio of above two fields. |
| CpG island at 3' end of gene | EnsemblMart 22.1 | If a CpG island exists at the 3' end of the gene (1 or 0). |
| CpG island at 5' end of gene | EnsemblMart 22.1 | If a CpG island exists at the 5' end of the gene (1 or 0). |

Adie et al. BMC Bioinformatics 2005 6:55 doi:10.1186/1471-2105-6-55
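To make the encoding conventions in the table concrete, the following minimal Python sketch derives a few of the listed features for a single gene. It is an illustration only, not the authors' pipeline: the function names (`gc_percent`, `ka_ks_ratio`, `build_row`), the dictionary keys, and the example values are all hypothetical. The sketch follows the table in using "unknown" for a missing paralog identity and 0 for a missing homolog identity; returning `None` for an undefined Ka / Ks ratio is an assumption, since the table does not say how that case is handled.

```python
from typing import Optional


def gc_percent(sequence: str) -> float:
    """Return the GC content of a nucleotide sequence as a percentage."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    gc_count = sum(1 for base in seq if base in "GC")
    return 100.0 * gc_count / len(seq)


def ka_ks_ratio(ka: float, ks: float) -> Optional[float]:
    """Return Ka / Ks, or None when Ks is zero (assumed handling)."""
    if ks == 0:
        return None
    return ka / ks


def build_row(
    sequence: str,
    paralog_identity: Optional[float],
    mouse_identity: Optional[float],
    mouse_ka: float,
    mouse_ks: float,
) -> dict:
    """Assemble a partial feature row using the table's conventions.

    The keys are hypothetical labels chosen for this sketch.
    """
    return {
        "gene_length": len(sequence),
        "gc": gc_percent(sequence),
        # 1 for yes, 0 for no, as in the table.
        "paralog": 1 if paralog_identity is not None else 0,
        # Genes without paralogs have "unknown" entered.
        "paralog_identity": (
            "unknown" if paralog_identity is None else paralog_identity
        ),
        # Genes without a mouse homolog have 0 entered.
        "mouse_identity": 0 if mouse_identity is None else mouse_identity,
        "mouse_ka": mouse_ka,
        "mouse_ks": mouse_ks,
        "mouse_ka_ks": ka_ks_ratio(mouse_ka, mouse_ks),
    }


# Hypothetical example: a toy 8 bp sequence with no paralog.
row = build_row("ATGCGCTA", None, 85.0, 0.05, 0.5)
```

Note that the two missing-value conventions differ: "unknown" marks the paralog identity as absent, whereas 0 for a homolog identity is an ordinary numeric value that a learner will treat as a very low identity.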