Schematic Representation of Identification of 1227 Feline cDNA Sequences. An initial set of 3035 cDNA sequences was clustered in nucleotide and protein space to identify the longest representative sequence for each cluster. The intersection of the cDNA and protein cluster sets yielded 2831 cDNA sequence clusters. All sequences in this set that contained N's were removed, resulting in a set of 2081 high-quality, non-redundant cDNA sequences. These sequences were searched with BLAST against (1) the set of Ensembl human known cDNA and protein sequences and (2) feline known cDNA and protein sequences. Global alignments were generated for each cDNA BLAST hit and manually inspected for quality. The final set of 1227 cDNA sequences corresponded to 913 known feline cDNA sequences and 314 novel feline sequences. BLAST searches against dog, human, and mouse sequences identified a total of 914 orthologs, corresponding to 70 novel and 844 known sequences.
Irizarry et al. BMC Genomics 2012 13:31 doi:10.1186/1471-2164-13-31
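
The representative-selection and N-filtering steps described in the caption can be illustrated with a minimal sketch. This is not the authors' pipeline: the caption does not name the clustering tool or the data format, so the function names (`longest_per_cluster`, `drop_ambiguous`), the dictionary layout, and the toy sequences below are all assumptions made for illustration.

```python
def longest_per_cluster(records, clusters):
    """Return the name of the longest sequence in each cluster (hypothetical helper)."""
    return [max(members, key=lambda name: len(records[name])) for members in clusters]


def drop_ambiguous(records):
    """Keep only sequences that contain no ambiguous 'N' base calls (hypothetical helper)."""
    return {name: seq for name, seq in records.items() if "N" not in seq.upper()}


# Toy data invented for illustration; not taken from the study.
records = {"a": "ATGGCC", "b": "ATGGCCTTA", "c": "ATGNCC", "d": "GGGTTT"}
clusters = [["a", "b"], ["c"], ["d"]]  # assumed output of an upstream clustering step

# Step 1: pick the longest representative per cluster.
representatives = longest_per_cluster(records, clusters)

# Step 2: discard representatives that contain N's.
clean = drop_ambiguous({name: records[name] for name in representatives})
```

With the toy input, cluster `["a", "b"]` is represented by `b`, and the representative `c` is then discarded because it contains an N, mirroring the reduction from 2831 clusters to 2081 sequences in the caption.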