Email updates

Keep up to date with the latest news and content from BMC Genomics and BioMed Central.

Open Access Methodology article

RSpred, a set of Hidden Markov Models to detect and classify the RIFIN and STEVOR proteins of Plasmodium falciparum

Nicolas Joannin1*, Yvonne Kallberg23, Mats Wahlgren1* and Bengt Persson23

Author affiliations

1 Department of Microbiology, Cell and Tumor biology (MTC), Karolinska Institutet, SE-17177 Stockholm, Sweden

2 Department of Cell and Molecular Biology (CMB), Karolinska Institutet, SE-17177 Stockholm, Sweden

3 IFM Bioinformatics and Swedish e-Science Research Centre (SeRC), Linköping University, SE-58183 Linköping, Sweden

For all author emails, please log on.

Citation and License

BMC Genomics 2011, 12:119  doi:10.1186/1471-2164-12-119

Published: 18 February 2011

Abstract

Background

Many parasites use multicopy protein families to avoid their host's immune system through a strategy called antigenic variation. RIFIN and STEVOR proteins are variable surface antigens uniquely found in the malaria parasites Plasmodium falciparum and P. reichenowi. Although these two protein families are different, they have more similarity to each other than to any other proteins described to date. As a result, they have been grouped together in one Pfam domain. However, a recent study has described the sub-division of the RIFIN protein family into several functionally distinct groups. These sub-groups require phylogenetic analysis to sort out, which is not practical for large-scale projects, such as the sequencing of patient isolates and meta-genomic analysis.

Results

We have manually curated the rif and stevor gene repertoires of two Plasmodium falciparum genomes, isolates DD2 and HB3. We have identified 25% of mis-annotated and ~30 missing rif and stevor genes. Using these data sets, as well as sequences from the well curated reference genome (isolate 3D7) and field isolate data from Uniprot, we have developed a tool named RSpred. The tool, based on a set of hidden Markov models and an evaluation program, automatically identifies STEVOR and RIFIN sequences as well as the sub-groups: A-RIFIN, B-RIFIN, B1-RIFIN and B2-RIFIN. In addition to these groups, we distinguish a small subset of STEVOR proteins that we named STEVOR-like, as they either differ remarkably from typical STEVOR proteins or are too fragmented to reach a high enough score. When compared to Pfam and TIGRFAMs, RSpred proves to be a more robust and more sensitive method. We have applied RSpred to the proteomes of several P. falciparum strains, P. reichenowi, P. vivax, P. knowlesi and the rodent malaria species. All groups were found in the P. falciparum strains, and also in the P. reichenowi parasite, whereas none were predicted in the other species.

Conclusions

We have generated a tool for the sorting of RIFIN and STEVOR proteins, large antigenic variant protein groups, into homogeneous sub-families. Assigning functions to such protein families requires their subdivision into meaningful groups such as we have shown for the RIFIN protein family. RSpred removes the need for complicated and time consuming phylogenetic analysis methods. It will benefit both research groups sequencing whole genomes as well as others working with field isolates. RSpred is freely accessible via http://www.ifm.liu.se/bioinfo/ webcite.