The training set deficit for EST projects. Around 85% of species with representation in dbEST (>100 ESTs) have less than 100 complete CDS entries in the EMBL database. These species comprise ~45% of all ESTs. Sixty-six species, with 246263 dbEST sequences, have no full-length CDS. Source: dbEST and EMBL database (July 2004).
Wasmuth and Blaxter BMC Bioinformatics 2004 5:187 doi:10.1186/1471-2105-5-187