Methodology of the approach and statistics. Initially, 726 protein sequences from the Mycoplasma gallisepticum genome were considered; together they contain 620 unassigned regions of varying lengths. Of these, 434 unassigned regions are at least 70 residues long. Of those 434, 364 passed transmembrane and coiled-coil filtering, and 359 remained after secondary structure filtering. These 359 unassigned regions were subjected to PSI-BLAST searches, but only 230 picked up at least two hits. We extracted the full-length sequence for each PSI-BLAST hit and used it in an HMMpfam search. Here again, only 62 unassigned regions could be associated indirectly with pre-existing domains, which correspond to 48 different domain families.
Reddy et al. BMC Research Notes 2010 3:98 doi:10.1186/1756-0500-3-98
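
The filtering funnel described in the caption can be tallied in a few lines of Python. This is only an illustrative sketch: the counts are copied from the caption, while the stage labels are paraphrased and the names FUNNEL and funnel_report are hypothetical, not part of the authors' pipeline. It reports, for each stage, how many unassigned regions remain relative to the previous stage and to the initial 620.

```python
# Counts are taken from the caption; stage labels are paraphrased.
# FUNNEL and funnel_report are illustrative names, not the authors' code.
FUNNEL = [
    ("unassigned regions", 620),
    ("at least 70 residues long", 434),
    ("after transmembrane and coiled-coil filtering", 364),
    ("after secondary structure filtering", 359),
    ("at least two PSI-BLAST hits", 230),
    ("indirectly associated with a pre-existing domain", 62),
]


def funnel_report(stages):
    """Return (label, count, fraction_of_previous, fraction_of_first) per stage."""
    first = stages[0][1]
    prev = first
    rows = []
    for label, count in stages:
        rows.append((label, count, count / prev, count / first))
        prev = count
    return rows


if __name__ == "__main__":
    for label, count, step, overall in funnel_report(FUNNEL):
        print(f"{count:4d}  {step:6.1%} of previous  {overall:6.1%} of start  {label}")
```

Under these numbers, the 62 regions that end up linked to a known domain are one tenth of the 620 unassigned regions the analysis started with.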