Different levels of computational coverage in protein sequences. Three representative proteins from the human genome are shown. (1) A tyrosine kinase (GI: 307508) is comprehensively covered by five Pfam domains (shown as colored rectangles with their respective names); sequence regions shorter than 50 aa are shown as grey lines. (2) A hypothetical protein (GI: 341913853) has no matches to any known protein domain or region and is considered part of the “dark matter” (shown as a black line with a question mark above). (3) A leucine-rich repeat-containing protein is only partly characterized, by a match to the LRR_8 (leucine-rich repeat) domain; two large portions of its sequence (90% of total amino acid residues) show no matches to any domain or region and should therefore be considered part of the “dark matter” (black lines with question marks above).
Rekapalli et al. BMC Genomics 2012 13:634 doi:10.1186/1471-2164-13-634
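
The coverage idea in this caption can be illustrated with a minimal sketch. It assumes that domain matches are given as 1-based inclusive (start, end) residue positions and that only unmatched stretches of at least 50 aa count as "dark matter", which is one reading of the 50 aa threshold mentioned above. The function names and the example coordinates are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Assumption: 1-based, inclusive (start, end) residue positions.
Interval = Tuple[int, int]


def uncovered_regions(length: int, domains: List[Interval]) -> List[Interval]:
    """Return the stretches of a protein not covered by any domain match."""
    gaps: List[Interval] = []
    pos = 1  # next residue not yet accounted for
    for start, end in sorted(domains):
        if start > pos:
            gaps.append((pos, start - 1))
        pos = max(pos, end + 1)  # also handles overlapping matches
    if pos <= length:
        gaps.append((pos, length))
    return gaps


def dark_fraction(length: int, domains: List[Interval], min_len: int = 50) -> float:
    """Fraction of residues in unmatched regions of at least min_len residues."""
    dark = sum(
        end - start + 1
        for start, end in uncovered_regions(length, domains)
        if end - start + 1 >= min_len
    )
    return dark / length


# Hypothetical 1000-residue protein with a single 100-residue domain match.
print(uncovered_regions(1000, [(451, 550)]))
print(dark_fraction(1000, [(451, 550)]))
```

With these made-up coordinates, the two flanking regions together account for 900 of 1000 residues, a situation comparable to protein (3) in the figure.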