Schematic overview of the classification process for all human membrane proteins. The classification had two general steps: an automatic step and a manual or semi-manual step. The automatic step can be divided into four parts, represented by blue boxes. First, a dataset representing the human proteome was downloaded from the International Protein Index (IPI). Transmembrane (TM) proteins were predicted from the proteome using three different TM helix prediction programs: Phobius, SOSUI, and TMHMM. Proteins predicted to contain at least one TM helix by at least two of the programs were selected for further analysis. Splice variants were removed by aligning all protein sequences to the human genome with BLAT. The longest protein sequence at each genomic location, defined as a gene, was selected, and these sequences were clustered using a local implementation of the ISODATA algorithm. Pfam and GO terms describing molecular function were downloaded from IPI and used to give an initial view of each cluster's function and family affiliation. This information was used to divide the clusters into three functional classes (receptors, enzymes, and transporters) and one miscellaneous class. In the manual classification step, the clusters were compared with group databases specialized in the three functional groups and with family databases that provide information about protein families and their members. These resources are shown as green bars in the figure. By combining the clustering results with the members found in these databases, a final result was compiled for the different protein families and groups.
Almén et al. BMC Biology 2009 7:50 doi:10.1186/1741-7007-7-50
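Two of the automatic filtering steps described in the caption, the two-of-three consensus on TM helix predictions and the selection of the longest sequence per genomic location, can be illustrated with a minimal Python sketch. It assumes that the Phobius, SOSUI and TMHMM outputs have already been parsed into per-protein helix counts and that each sequence's genomic locus has already been derived from its BLAT alignment. The function names and input layouts are hypothetical and this is not the authors' pipeline. The ISODATA clustering and the Pfam/GO-based classification are not shown.

```python
from collections import Counter


def consensus_tm(predictions, min_votes=2):
    """Return the IDs of proteins that at least `min_votes` predictors call as TM proteins.

    Hypothetical input layout: `predictions` maps a predictor name
    (e.g. "Phobius", "SOSUI", "TMHMM") to a dict of protein ID -> number
    of predicted TM helices. A predictor "votes" for a protein when it
    predicts at least one TM helix for it.
    """
    votes = Counter()
    for per_protein in predictions.values():
        for protein_id, n_helices in per_protein.items():
            if n_helices >= 1:
                votes[protein_id] += 1
    return {pid for pid, n in votes.items() if n >= min_votes}


def longest_per_locus(proteins):
    """Keep one protein per genomic locus: the one with the longest sequence.

    Hypothetical input layout: an iterable of (protein_id, locus, sequence)
    tuples, where `locus` is assumed to come from a prior BLAT alignment.
    On equal lengths the first protein seen for that locus is kept.
    Returns a dict of locus -> selected protein ID.
    """
    best = {}  # locus -> (protein_id, sequence)
    for protein_id, locus, seq in proteins:
        if locus not in best or len(seq) > len(best[locus][1]):
            best[locus] = (protein_id, seq)
    return {locus: pid for locus, (pid, _) in best.items()}


if __name__ == "__main__":
    # Toy data with made-up IDs, for illustration only.
    predictions = {
        "Phobius": {"P1": 2, "P2": 0, "P3": 1},
        "SOSUI": {"P1": 1, "P2": 1, "P3": 0},
        "TMHMM": {"P1": 3, "P2": 0, "P3": 1},
    }
    print(sorted(consensus_tm(predictions)))
    isoforms = [
        ("P1a", "chr1:100-200", "MKTA"),
        ("P1b", "chr1:100-200", "MKTAYL"),
        ("P3", "chr2:50-90", "MLL"),
    ]
    print(longest_per_locus(isoforms))
```

In this toy run P1 (three votes) and P3 (two votes) pass the consensus filter while P2 (one vote) does not, and the longer isoform P1b is kept for its locus. Keeping the longest sequence is a simple, deterministic way to pick one representative per gene, as the caption describes.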