Taenia solium taeniasis/cysticercosis is a zoonotic helminth infection mainly found in rural regions of Africa, Asia and Latin America. In endemic areas, diagnosis of cysticercosis largely depends on serology, but these methods have their drawbacks and require improvement. This implies better knowledge of the proteins secreted and excreted by the parasite. In a previous study, we used a custom protein database containing protein sequences from related helminths to identify T. solium metacestode excretion/secretion proteins. An alternative or complementary approach would be to use expressed sequence tags combined with BLAST and protein mapping to supercontigs of Echinococcus granulosus, a closely related cestode. In this study, we evaluate this approach and compare the results to those obtained in the previous study.
We report 297 proteins organized in 106 protein groups based on homology. Additional classification was done using Gene Ontology information on biological process and molecular function. Of the 106 protein groups, 58 groups were newly identified, while 48 groups confirmed previous findings. Blast2GO analysis revealed that the majority of the proteins were involved in catalytic activities and binding.
In this study, we used translated expressed sequence tags combined with BLAST and mapping strategies to both confirm and complement previous research. Our findings are comparable to recent studies on other helminth genera like Echinococcus, Schistosoma and Clonorchis, indicating similarities between helminth excretion/secretion proteomes.
Keywords: Expressed sequence tag; Excretion/secretion proteins; Taenia solium; Proteomics
Taenia solium taeniasis/cysticercosis is a zoonotic helminth infection mainly found in poor and rural regions of Africa, Asia and Latin America where it has a large impact on public health [1-3]. The adult tapeworm develops in the small intestine of humans (taeniasis). Mature proglottids full of eggs break off from the distal end of the worm and leave the body with the stool. Both humans and pigs can act as intermediate hosts when the infective larval stages (oncospheres) inside the eggs are ingested and liberated in the stomach. The oncospheres then enter the blood flow through the intestinal mucosa. Cysticercosis is caused when oncospheres lodge themselves in the subcutaneous and muscle tissues and the central nervous system, where they develop into metacestode larval stages (cysts). In humans, epilepsy and other neurological symptoms can be provoked by immunological reactions against degenerating cysts that have developed in the central nervous system (neurocysticercosis).
Diagnosis of porcine and human (neuro)cysticercosis largely depends on antigen and/or antibody detection, but these serological methods also have their specific drawbacks. Improving current diagnostic assays requires better knowledge of the proteins secreted and excreted by the metacestodes.
Proteomic experiments involving liquid chromatography and tandem mass spectrometry (LC-MS/MS) typically attempt to match the generated experimental spectra to in silico spectra from a (target) protein database. Ideally, this database contains every protein likely to be in the sample, but obtaining such an all-inclusive protein database proves difficult when little to no genomic information is available, as was the case for T. solium until recently. In our previous study, we bypassed this limitation by using a custom database with known proteins from related helminths (Taenia, Echinococcus, Schistosoma and Trichinella) as a target database in the LC-MS/MS experiments. We deliberately did not use translated expressed sequence tags (ESTs), because we wanted to investigate the usefulness of a target database made up of protein sequences originating mostly from (closely) related helminths.
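To illustrate what "in silico" means here, the sketch below shows only the first step a search engine performs on a target database: cutting each protein sequence into the peptides that trypsin would be expected to produce. It is a minimal illustration, assuming the common rule of cleavage after K or R unless the next residue is P; the function name and the example sequence are hypothetical, and real search engines such as X!Tandem additionally handle missed cleavages, modifications and fragment-ion prediction.

```python
from typing import List


def tryptic_digest(sequence: str, min_length: int = 1) -> List[str]:
    """Cut a protein sequence after K or R, except when followed by P.

    Simplified rule (an assumption of this sketch): no missed cleavages
    and no modifications are considered.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the C-terminal peptide that does not end in K or R.
        peptides.append(sequence[start:])
    return [p for p in peptides if len(p) >= min_length]


# Hypothetical toy sequence, not a T. solium protein.
example_peptides = tryptic_digest("MKRPAKGG")
```

For the toy sequence, `example_peptides` is `["MK", "RPAK", "GG"]`: the R followed by P is not cleaved. Any protein missing from the target database contributes no such peptides, which is why the composition of the database matters so much.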
The usefulness of ESTs for the identification of helminth proteins has already been described for e.g. Haemonchus contortus [7,8] and Echinococcus granulosus. In the case of T. solium, ESTs from different parasite stages have been made available by different research groups, both published [10,11] and unpublished (Huang J. et al., Analysis of Taenia solium and Taenia saginata adult gene expression profile, 2009 and Aguilar-Diaz H. et al., Taenia solium larva/adult ESTs, 2007). In this study, we use T. solium ESTs combined with the Basic Local Alignment Search Tool (BLAST) and protein mapping to supercontigs of E. granulosus (a member of the Taeniidae family) to investigate whether we could increase the number of T. solium metacestode excretion/secretion protein identifications from the previous study.
Materials and methods
Generation of the data set
The in vitro production of the T. solium metacestode excretion/secretion proteins from Peru and Zambia at 24 h and 48 h and the generation of line spectra mzXML files have been previously described.
Database design and data analysis
To construct the target database, 30,700 expressed sequence tags were downloaded from the National Center for Biotechnology Information (NCBI) website in April 2012 and a six-frame translation was performed using transeq. A Sus scrofa database with 1,388 Swiss-Prot sequences (http://www.uniprot.org/) and the common Repository of Adventitious Proteins database (112 protein sequences; http://ftp.thegpm.org/fasta/cRAP/crap.fasta) were also included to assist detection of host proteins and accidental contaminations, respectively. A decoy database with 185,700 reversed sequences was created using decoyfasta. These databases were fused into one final database. Database searching with X!Tandem (2010.10.01.1) and subsequent analyses with PeptideProphet [14,15], iProphet and ProteinProphet were performed as previously described. All of the above-mentioned tools, except transeq, are included with the Trans-Proteomic Pipeline v4.5 RAPTURE rev 2. The identified translated ESTs were further filtered to a false discovery rate of < 1% and ESTs with an individual probability of zero were discarded. The remaining ESTs were searched with BLAST against the NCBI nonredundant database (E-value < 10^-10) and, for each recognized EST, the best matching protein was retained. The resulting proteins were then screened by mapping them to the E. granulosus supercontigs using TBLASTN (http://www.sanger.ac.uk/cgi-bin/blast/submitblast/Echinococcus). Identifications with a Score > 200 were considered valid. Identifications with a lower score were manually evaluated and proteins originating from T. solium were retained. This step also helped to filter out host contaminations. Finally, proteins were grouped based on homology. All proteins that could not be grouped and were identified by only one EST were also discarded.
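The construction of the target and decoy databases can be sketched in a few lines. The code below is an illustration only, not the transeq or decoyfasta implementation: it assumes the standard genetic code, marks stop codons with `*` and unknown codons with `X`, and uses a hypothetical `DECOY_` prefix for reversed entries; the frame ordering may differ from that of transeq. It also checks that the sequence counts given in the text add up to the reported decoy database size.

```python
from itertools import product
from typing import Dict, List

# Standard genetic code in TCAG order (assumption: no alternative code is needed).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate one reading frame; incomplete trailing codons are dropped."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> List[str]:
    """Three forward frames followed by three reverse-complement frames."""
    strands = (dna, reverse_complement(dna))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


def make_decoys(proteins: Dict[str, str], prefix: str = "DECOY_") -> Dict[str, str]:
    """Reverse every target sequence (the prefix is a hypothetical label)."""
    return {prefix + name: seq[::-1] for name, seq in proteins.items()}


# Counts taken from the text: ESTs, Sus scrofa Swiss-Prot entries, cRAP entries.
N_ESTS, N_HOST, N_CRAP = 30_700, 1_388, 112
n_target_sequences = 6 * N_ESTS + N_HOST + N_CRAP  # one decoy per target sequence
```

With these numbers, `n_target_sequences` equals 185,700, which matches the size of the decoy database and suggests that every translated frame, host protein and contaminant entry received exactly one reversed counterpart.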
Finally, Blast2GO was used for Gene Ontology (GO) annotations (biological process, molecular function and cellular component) and the construction of level 2 pie charts. In order to gain more specific information, the largest categories were analyzed to levels 3 and 4.
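The hit-filtering rules described under Database design and data analysis (E-value below 10^-10, one best protein per EST, TBLASTN Score above 200) can be summarised as a small sketch. The record layout, the function names and the tie-breaking rule (lowest E-value first, then highest score) are assumptions made for illustration; the text does not specify how "best matching" was decided, and the manual evaluation of low-scoring hits is represented only as a list of candidates.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

E_VALUE_CUTOFF = 1e-10       # from the text: E-value < 10^-10
TBLASTN_SCORE_CUTOFF = 200   # from the text: Score > 200 considered valid


@dataclass(frozen=True)
class BlastHit:
    """Hypothetical minimal record for one EST-to-protein BLAST hit."""
    est_id: str
    protein_id: str
    evalue: float
    score: float


def best_hit_per_est(hits: Iterable[BlastHit]) -> Dict[str, BlastHit]:
    """Keep, per EST, the hit with the lowest E-value (ties: highest score)."""
    best: Dict[str, BlastHit] = {}
    for hit in hits:
        if hit.evalue >= E_VALUE_CUTOFF:
            continue  # fails the E-value threshold
        current = best.get(hit.est_id)
        if current is None or (hit.evalue, -hit.score) < (current.evalue, -current.score):
            best[hit.est_id] = hit
    return best


def triage_by_mapping(tblastn_scores: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split proteins into accepted (Score > 200) and manual-review lists."""
    accepted = sorted(p for p, s in tblastn_scores.items() if s > TBLASTN_SCORE_CUTOFF)
    manual = sorted(p for p, s in tblastn_scores.items() if s <= TBLASTN_SCORE_CUTOFF)
    return accepted, manual
```

Keeping the low-scoring proteins in a separate list rather than discarding them mirrors the procedure in the text, where such identifications were inspected by hand and T. solium proteins were retained.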
Results and discussion
Identified proteins and gene ontology annotation
In this study, 297 proteins (from 1,787 translated ESTs) were identified and organized in 106 protein groups based on homology (Additional file 1). For simplicity, each protein group is represented by one protein. The groups were further organized by Gene Ontology annotation information on biological process and molecular function. A total of 48 protein groups are labelled with an asterisk, indicating that they were also identified in the previous study (Additional file 2). For brevity, Table 1 shows only the 58 newly identified protein groups. For a number of proteins/protein groups, no Gene Ontology information was available. Nonetheless, many of them, like the 8 kDa protein family, have been extensively studied and used in diagnostic assays.
Additional file 1. List of all 297 proteins identified in this study, grouped based on homology, including the 1,787 translated ESTs that are linked to those proteins as well as the protein that represents each group and the TBLASTN scores of the queries to the Echinococcus granulosus supercontigs.
Additional file 2. Protein groups (n = 106) identified in Taenia solium metacestode excretion/secretion proteins, organized by Gene Ontology annotation information on biological process and molecular function. Groups marked with an asterisk have been identified in the previous analysis as well. For simplicity, all protein groups are represented by one protein.
Table 1. Protein groups (n = 58) newly identified in Taenia solium metacestode excretion/secretion proteins, organized by Gene Ontology annotation information on biological processes and molecular functions
Most of the identified protein groups could be categorized in miscellaneous binding activities (e.g. actin binding, calcium binding and metal ion binding), various metabolic processes, gluconeogenesis (Triosephosphate isomerase, Enolase, Phosphoenolpyruvate carboxykinase and Phosphoglucose isomerase), glycolysis (Glyceraldehyde-3-phosphate dehydrogenase, Phosphoglycerate kinase, Phosphoglycerate mutase and Fructose-bisphosphate aldolase) and proteins with (endo)peptidase activity, including cysteine-type (Calpain, UDP-glucose 4-epimerase and Cathepsin), threonine-type (Proteasome subunits) and serine-type endopeptidase activity (Trypsin-like protein). Endopeptidase inhibitors with serine-type (Kunitz protein 8 and Leukocyte elastase inhibitor) or cysteine-type endopeptidase inhibitor activity (Immunogenic protein Ts11) and components of the enzymatic antioxidant system of Taeniidae (Cu/Zn Superoxide dismutase, Glutathione S-transferase and Peroxiredoxin) were also identified.
Gene Ontology level 2 pie charts were created for biological process (Figure 1A), molecular function (Figure 1B) and cellular component (Figure 1C). To avoid overly busy charts, the sequence filter was set to 10. The two largest categories of the biological process chart were cellular and metabolic processes. Others included biological regulation, response to stimulus, multicellular organismal processes and cellular component organization or biogenesis. Further investigation of the general cellular and metabolic processes revealed primary and cellular metabolic processes at level 3 and protein, cellular macromolecule and cellular nitrogen compound metabolic processes at level 4 (Additional file 3, tab 1). Molecular function was clearly divided between binding and catalytic activity. GO level 3 showed protein binding and hydrolase activity, while level 4 entailed mostly nucleotide binding, hydrolase activity (acting on acid anhydrides), cation binding, peptidase activity, cytoskeletal protein binding and identical protein binding (Additional file 3, tab 2). The level 2 pie chart for the cellular component indicated cell and organelle as the largest categories. Further analyses showed mostly cell part and membrane-bound organelle GO terms at level 3 and intracellular (part) GO terms at level 4 (Additional file 3, tab 3). Human Keratin and porcine Trypsin were identified in all samples. As Keratin is a common contaminant and Trypsin was deliberately added during the LC-MS/MS experiments, both were omitted from the final results.
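The effect of a sequence filter on such charts can be illustrated with a short sketch. This is not Blast2GO code: it assumes that the filter hides every GO term annotated to fewer than the chosen number of sequences, and that a sequence counts once per term. The function names and the toy annotations in the test data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def count_sequences_per_term(annotations: Dict[str, Iterable[str]]) -> Counter:
    """Count how many sequences carry each GO term (once per sequence)."""
    counts: Counter = Counter()
    for terms in annotations.values():
        counts.update(set(terms))  # a set avoids double counting within a sequence
    return counts


def apply_sequence_filter(counts: Counter, min_sequences: int = 10) -> Dict[str, int]:
    """Keep only terms supported by at least `min_sequences` sequences (assumed rule)."""
    return {term: n for term, n in counts.items() if n >= min_sequences}
```

Because one protein can carry several GO terms, the slices of a chart built this way do not sum to the number of proteins, which is one reason why the counts of the two studies should not be compared directly.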
Figure 1. Gene Ontology level 2 pie charts displaying the biological processes (A), the molecular functions (B) and the cellular components (C) of the 297 proteins that were identified in the Taenia solium metacestode excretion/secretion proteins. Values within parentheses are the number of sequences associated with each Gene Ontology term. The biological processes are mostly metabolic and cellular processes, while the molecular functions are predominantly catalytic activity and binding. The cellular components reveal a number of intracellular proteins. All charts were created using Blast2GO with the sequence filter set to 10.
Additional file 3. Gene Ontology information on biological process (tab 1), molecular function (tab 2) and cellular component (tab 3) including graph levels, GO terms, number of sequences (#Seq), node scores and parents.
The presence of intracellular/non-secreted proteins in the ESPs is interesting and has been observed in other ESP studies before [22,23]. Although it is highly likely that the majority of those proteins are indeed excreted or secreted by the parasite, the possibility that they are the result of leakage due to cyst damage or death should not be excluded.
In general, the findings reported in this study are comparable to recent studies on other helminth genera like Echinococcus, Schistosoma and Clonorchis, indicating that excretion/secretion proteomes are not very different between helminth genera/species.
Comparison between the two studies
When comparing the level 2 GO terms identified in both studies (Table 2), all GO terms from the previous study were identified here as well. Additionally, we identified 6 new GO terms with the EST analyses: rhythmic process (GO:0048511), antioxidant activity (GO:0016209), molecular transducer activity (GO:0060089), protein binding transcription factor activity (GO:0000988), receptor activity (GO:0004872) and synapse (GO:0045202). A direct comparison between numbers should be avoided, because proteins can have multiple GO terms and because homologous proteins are present in the protein groups (especially in the previous study, where this is a logical result of the target database construction). Nevertheless, the general levels of abundance (i.e. the number of proteins in each GO term) are largely comparable between the two studies. In both studies, cellular process, metabolic process and biological stimulation are the largest groups for ‘biological process’, binding and catalytic activity are the largest groups for ‘molecular function’, and cell and organelle are the largest groups for ‘cellular component’. The 6 new GO terms were identified by a very small number of proteins and may be a result of proteins being linked to multiple GO terms. This is supported by the fact that the proteins linked to these GO terms are homologous to other proteins identified in both studies, so none of these GO terms was identified by a ‘new’ protein group.
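The comparison of level 2 GO terms between the two studies amounts to simple set operations. The sketch below uses the six new terms named above together with an illustrative subset of shared terms mentioned in the text; the full term lists are in Table 2, so the sets shown here are incomplete by design and the function name is hypothetical.

```python
from typing import Dict, Set

# Illustrative subset of terms reported for both studies (not the full Table 2).
PREVIOUS_STUDY: Set[str] = {
    "cellular process", "metabolic process",
    "binding", "catalytic activity",
    "cell", "organelle",
}

# The six terms reported as new in the EST-based analysis.
NEW_IN_EST_STUDY: Set[str] = {
    "rhythmic process", "antioxidant activity", "molecular transducer activity",
    "protein binding transcription factor activity", "receptor activity", "synapse",
}

EST_STUDY: Set[str] = PREVIOUS_STUDY | NEW_IN_EST_STUDY


def compare_term_sets(previous: Set[str], current: Set[str]) -> Dict[str, Set[str]]:
    """Return the terms shared by both studies, new in `current`, and lost from `previous`."""
    return {
        "shared": previous & current,
        "new": current - previous,
        "lost": previous - current,
    }


comparison = compare_term_sets(PREVIOUS_STUDY, EST_STUDY)
```

An empty `lost` set corresponds to the statement that all GO terms from the previous study were recovered, while `new` holds the six additional terms.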
Table 2. Gene Ontology level 2 annotations identified in this study alongside the ones identified in the previous study
In this study, we have used a library of translated ESTs combined with BLAST and mapping strategies not only to confirm previously identified T. solium metacestode excretion/secretion proteins, but also to identify several new proteins, thereby increasing the overall number of protein identifications.
The larger and more complete the EST database, the better the proteomic coverage is likely to be. No ESTs from other Taeniidae were used in this study, since the available T. solium ESTs were already a merger of EST submissions by different groups and were therefore likely to offer decent proteome coverage. However, in cases where only a small EST library with low coverage is available, one could also include protein sequences and/or ESTs from related organisms in a combined database. This may be particularly advantageous in proteomic studies on less studied, unsequenced organisms. It should be noted that research on non-sequenced organisms mostly relies on homology to already existing proteins from other (preferably closely related) organisms. Therefore, there is no possibility of finding unique proteins, unless (i) de novo sequencing is performed on the good-quality unmatched experimental spectra or (ii) ESTs that were identified by spectra but remained unmatched during BLAST are further investigated.
Finally, it is important to realize that, although the mapping to the E. granulosus supercontigs helped to remove S. scrofa host proteins (e.g. Albumin, Protegrin and Hemopexin), some may still be present. Heat shock protein 70, for example, is identified both in S. scrofa and E. granulosus.
In future T. solium work, it would be sensible to make use of the T. solium genome sequence that was recently published. However, since no curated protein database or convenient mapping solution is currently available and, for many other helminths, no complete genome sequence is available, the method described here is still valid.
Availability of supporting data
BLAST: Basic Local Alignment Search Tool; ESPs: Excretion/Secretion Proteins; ESTs: Expressed Sequence Tags; GO: Gene Ontology; LC-MS/MS: Liquid Chromatography and tandem Mass Spectrometry; NCBI: National Center for Biotechnology Information.
The authors declare that they have no competing interests.
BV carried out the LC-MS/MS experiments and the data analyses and drafted the manuscript. AMD and MP supervised the LC-MS/MS experiments and initial bioinformatic efforts. SG, PD, KP, KK designed the study. JL participated in the analysis of the ESTs. All authors have participated in the manuscript preparation. All authors read and approved the final manuscript.
The authors thank the University of Zambia, School of Veterinary Medicine (Zambia) and the Universidad Nacional Mayor de San Marcos (Lima, Peru) for assistance and use of facilities, and Hans Dalebout (LUMC, Leiden, The Netherlands) for technical support.
The research leading to these results has received funding from The Research Foundation - Flanders (FWO) (project number: G.0192.10N) and the European Union’s Seventh Framework Program (FP7/2007-2013) under grant agreement no. 221948 (ICONZ). The contents of this publication are the sole responsibility of the authors and do not necessarily reflect the views of the European Commission.
García HH, Gilman RH, Gonzalez AE, Verastegui M, Rodrìguez S, Gavidia CM, Tsang VC, Falcon N, Lescano AG, Moulton LH, Bernal T, Tovar M, Cysticercosis Working Group in Perú: Hyperendemic human and porcine Taenia solium infection in Perú.
Phiri IK, Ngowi H, Afonso S, Matenga E, Boa M, Mukaratirwa S, Githigia S, Saimo M, Sikasunge C, Maingi N, Lubega GW, Kassuku A, Michael L, Siziya S, Krecek RC, Noormahomed E, Vilhena M, Dorny P, Willingham 3rd AL: The emergence of Taenia solium cysticercosis in Eastern and Southern Africa as a serious agricultural problem and public health risk.
Tsai IJ, Zarowiecki M, Holroyd N, Garciarrubio A, Sanchez-Flores A, Brooks KL, Tracey A, Bobes RJ, Fragoso G, Sciutto E, Aslett M, Beasley H, Bennett HM, Cai J, Camicia F, Clark R, Cucher M, De Silva N, Day TA, Deplazes P, Estrada K, Fernández C, Holland PWH, Hou J, Hu S, Huckvale T, Hung SS, Kamenetzky L, Keane JA, Kiss F, Koziol U, Lambert O, Liu K, Luo X, Luo Y, Macchiaroli N, Nichol S, Paps J, Parkinson J, Pouchkina-Stantcheva N, Riddiford N, Rosenzvit M, Salinas G, Wasmuth JD, Zamanian M, Zheng Y, Cai X, Soberón X, Olson PD, Laclette JP, Brehm K, Berriman M, The Taenia solium Genome Consortium: The genomes of four tapeworm species reveal adaptations to parasitism.
Yatsuda AP, Krijgsveld J, Cornelissen AWCA, Heck AJR, de Vries E: Comprehensive analysis of the secreted proteins of the parasite Haemonchus contortus reveals extensive sequence variation and differential immune recognition.
Millares P, Lacourse EJ, Perally S, Ward DA, Prescott MC, Hodgkinson JE, Brophy PM, Rees HH: Proteomic profiling and protein identification by MALDI-TOF mass spectrometry in unsequenced parasitic nematodes.
Almeida CR, Stoco PH, Wagner G, Sincero TCM, Rotava G, Bayer-Santos E, Rodrigues JB, Sperandio MM, Maia AAM, Ojopi EPB, Zaha A, Ferreira HB, Tyler KM, Dávila AMR, Grisard EC, Dias-Neto E: Transcriptome analysis of Taenia solium cysticerci using Open Reading Frame ESTs (ORESTES).
PLoS Neglected Trop Dis 2010, 4(12):e919.
Shteynberg D, Deutsch EW, Lam H, Eng JK, Sun Z, Tasman N, Mendoza L, Moritz RL, Aebersold R, Nesvizhskii AI: iProphet: Multi-level integrative analysis of shotgun proteomic data improves peptide and protein identification rates and error estimates.
Hancock K, Khan A, Williams FB, Yushak ML, Pattabhi S, Noh J, Tsang VC: Characterization of the 8-kilodalton antigens of Taenia solium metacestodes and evaluation of their use in an enzyme-linked immunosorbent assay for serodiagnosis.
Mulvenna J, Sripa B, Brindley PJ, Gorman J, Jones MK, Colgrave ML, Jones A, Nawaratna S, Laha T, Suttiprapa S, Smout MJ, Loukas A: The secreted and surface proteomes of the adult stage of the carcinogenic human liver fluke Opisthorchis viverrini.
Zheng M, Hu K, Liu W, Hu X, Hu F, Huang L, Wang P, Hu Y, Huang Y, Li W, Liang C, Yin X, He Q, Yu X: Proteomic analysis of excretory secretory products from Clonorchis sinensis adult worms: molecular characterization and serological reactivity of a excretory-secretory antigen-fructose-1,6-bisphosphatase.