PChopper: high throughput peptide prediction for MRM/SRM transition design

Afzal, Vackar; Huang, Jeffrey T-J; Atrih, Abdel; Crowther, Daniel J

doi:10.1186/1471-2105-12-338

Software
Open access
Published: 15 August 2011

Vackar Afzal^1,2, Jeffrey T-J Huang^1,3, Abdel Atrih^1,2 & Daniel J Crowther^1,3

BMC Bioinformatics volume 12, Article number: 338 (2011)

Abstract

Background

The use of selective reaction monitoring (SRM) based LC-MS/MS analysis for the quantification of phosphorylation stoichiometry has been rapidly increasing. At the same time, the number of sites that can be monitored in a single LC-MS/MS experiment is also increasing. The manual processes associated with running these experiments have highlighted the need for computational assistance to quickly design MRM/SRM candidates.

Results

PChopper has been developed to predict peptides that can be produced via enzymatic protein digest; this includes single-enzyme digests and combinations of enzymes. It also allows digests to be simulated in 'batch' mode and can combine information from these simulated digests to suggest the most appropriate enzyme(s) to use. PChopper also allows users to define the characteristics of their target peptides, and can automatically identify phosphorylation sites that may be of interest. Two application endpoints are available for interacting with the system: the first is a web-based graphical tool, and the second is an API endpoint based on HTTP REST.

Conclusions

Service oriented architecture was used to rapidly develop a system that can consume and expose several services. A graphical tool was built to provide an easy to follow workflow that allows scientists to quickly and easily identify the enzymes required to produce multiple peptides in parallel via enzymatic digests in a high throughput manner.

Background

Selective reaction monitoring-mass spectrometry (SRM-MS) has become a key proteomics technology. It is used in the quantification of post-translational modifications, discrimination of homologous protein isoforms and often as the final step in biomarker discovery. A typical SRM assay consists of two parts: the first involves selecting enzymes that can produce peptides with some target characteristics, and the second involves experimental testing to verify the predictions from the first phase. The manual processes associated with the first phase often make it prohibitively time-consuming to identify the optimal enzyme to give the best peptide characteristics and SRM transitions for mass spectrometry, especially if there are multiple protein targets involved. In response to this, a number of software tools have been developed to assist with this process [1–4]. An in-depth review of current software is given in [5].

In more complex situations, such as quantification of post-translational modifications, there are often multiple target sites on multiple proteins of interest, and it is at this point that existing software solutions fall short of what is required. Here we present PChopper, which has been developed to aid SRM-assay design with a focus on studies investigating protein phosphorylation stoichiometry, although the tool can be used to support batch SRM-assay design for any study. Unlike most currently available software solutions, PChopper is not limited to trypsin-based digests. PChopper can simulate digests involving a single enzyme, or any combination of two supported enzymes. Each digest can also be parameterised with the target characteristics required of the resultant peptides. Digests can be performed in batch mode, and the output from each digest can be combined into a single dashboard for export.

Implementation

Architecture

PChopper utilises a Service Oriented Architecture (SOA) [6] to consume and expose several services. This allows for rapid development, since several core services are immediately available with no internal maintenance or development overhead (additional SOA benefits are outlined elsewhere [7, 8]). However, the use of a service oriented architecture is not without caveats: it creates external system dependencies that PChopper must rely on but cannot control. Despite this drawback, a service oriented approach was adopted as the benefits outweighed the risks. PChopper also exposes two application endpoints. The first is a graphical user interface that provides an easy-to-follow workflow for running simulated digests, and the second is an API endpoint that allows other developers to make use of the PChopper engine programmatically. Figure 1 provides an overview of the system architecture.

Workflow

PChopper provides a web-based graphical interface with an easy-to-follow workflow for running simulated digests. The workflow begins by specifying the name of the experiment. PChopper uses the term 'experiment' to describe the sequence that is to be digested, together with the desired characteristics of the resultant peptides. For example, an experiment may involve a digest of AKT1 targeting phosphorylation sites at positions 473 and 308, and so might be named 'AKT1 - S473, T308'. Once an experiment has been added, the user is prompted for a gene/protein name. This search term is then passed to the Phospho.ELM web service, as shown in Figure 2. The web service returns a list of matching entries, or an empty result if the search term could not be mapped to a gene/protein. For unsuccessful searches, users are shown a popup stating that no search results could be found and are prompted to search using a different term. For successful searches, users are presented with a list of potential matches and are asked to select the correct entry based on the additional information that the search yielded. When the user has selected an entry, its amino acid sequence is displayed and the user can progress to the next step in the workflow (see Figure 3).

The second step in the workflow asks the user to select the sites within the sequence they would like to target. This would typically be used for selecting regions of interest within the sequence, or sites with post-translational modifications that are of interest. Users can select these manually; in addition, PChopper can automatically identify known phosphorylation sites for human and mouse sequences. This automated process identifies all known phosphorylation sites, and the user can simply remove sites that are not of interest (see Figure 4).

The third step in the workflow asks the user to specify any additional characteristics of the resultant peptides (length, exclusion criteria) and additional digest parameters. Users can adjust these based on their own requirements, or they can simply select the default settings and run the digest (see Figure 5). Once a digest has been performed, users are presented with the results in a matrix format (see Figure 6). Detailed information on each of the resultant peptides is also available on the peptide details tab (see Figure 7). This workflow can then be repeated for multiple proteins, and the results can be combined from the 'Advanced Options' screen (see Figures 8 and 9).

Result Formats

Once a simulated digest has been run, users are presented with an enzyme versus target site matrix. Each entry within the matrix shows the peptide that was produced by an enzyme for a specific target site. Additional details are also available for each of the resultant peptides. These include:

1. The starting position of the peptide within the sequence
2. The end position of the peptide within the sequence
3. The length of the peptide
4. The predicted charge state
5. The % of hydrophobic amino acids
6. The mass of the phospho-peptide
7. The mass of the non phospho-peptide
8. The predicted m/z ratio of the phospho-peptide
9. The predicted m/z ratio of the non phospho-peptide
10. The predicted retention time of the peptide (via the API)
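
Several of the properties listed above follow from simple arithmetic on the peptide sequence. The sketch below is illustrative only and is not PChopper's code: it assumes monoisotopic residue masses, one added HPO3 group (79.96633 Da) per phosphorylated residue, and an arbitrary definition of which residues count as hydrophobic. The charge state is passed in by the caller, because the paper does not describe how PChopper predicts it. All function names are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values, not read from PChopper).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056     # H2O accounting for the free N- and C-termini
PROTON = 1.007276    # mass of one charge-carrying proton
PHOSPHO = 79.96633   # HPO3 added per phosphorylated residue
HYDROPHOBIC = set("AVILMFWC")  # assumption: one of several possible definitions


def peptide_mass(peptide: str, phospho_sites: int = 0) -> float:
    """Neutral monoisotopic mass, optionally with phosphate groups added."""
    residues = sum(RESIDUE_MASS[aa] for aa in peptide)
    return residues + WATER + phospho_sites * PHOSPHO


def mz(mass: float, charge: int) -> float:
    """m/z of a positive ion carrying `charge` protons."""
    return (mass + charge * PROTON) / charge


def percent_hydrophobic(peptide: str) -> float:
    """Percentage of residues that fall in the assumed hydrophobic set."""
    return 100.0 * sum(aa in HYDROPHOBIC for aa in peptide) / len(peptide)


def describe(sequence: str, start: int, end: int,
             charge: int = 2, phospho_sites: int = 1) -> dict:
    """Collect the listed properties for the peptide at 1-based [start, end]."""
    peptide = sequence[start - 1:end]
    plain = peptide_mass(peptide)
    phospho = peptide_mass(peptide, phospho_sites)
    return {
        "peptide": peptide, "start": start, "end": end,
        "length": len(peptide), "charge": charge,
        "percent_hydrophobic": percent_hydrophobic(peptide),
        "mass_phospho": phospho, "mass_non_phospho": plain,
        "mz_phospho": mz(phospho, charge), "mz_non_phospho": mz(plain, charge),
    }
```

Retention time is left out of this sketch; in PChopper it is available only through the API.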

In situations where users would like to monitor multiple sites on multiple proteins, it is useful to know the enzyme (or combination of enzymes) that is required to produce peptides with the required characteristics. This is especially true in large studies. PChopper's advanced results combination engine allows results from multiple digests to be combined into a single detailed summary view. From this view users can quickly identify the enzymes that can or cannot be used to target specific sites of interest. Users can then manually select/deselect enzymes, and export the combined results in CSV (spreadsheet compatible) format. Additionally, PChopper can automatically identify the most appropriate combination of enzymes and present this to the user in the form of a summarised datasheet. An additional datasheet is available as an export option, which provides full details on the digest, the protein/sequence that was digested, the enzymes that yielded peptides and the details of each of the peptides produced.

Implementation Details

PChopper was developed as a Java application consisting of three distinct modules. Module 1 is responsible for running simulated digests and has no external dependencies other than the Java runtime environment. This has the advantage of cleanly separating the core business logic from any presentation or interaction logic. To run simulated digests, the module requires a protein sequence and a set of parameters describing the characteristics of the final peptide sequences. The system then 'digests' the sequence using the system's supported enzymes. The combination of a protein sequence and its digest parameters is called an 'experiment' and PChopper has the capability of running multiple experiments to identify suitable enzymes for use in monitoring multiple sites in multiple proteins.

PChopper makes use of PeptideCutter's digest predictions, and stores them in a redefined XML format. PeptideCutter [2] is a web-based tool from the ExPASy Proteomics Server that can predict potential cleavage sites caused by proteases and chemicals. When running a simulated digest, known cleavage patterns for 34 supported enzymes, as defined by PeptideCutter, are loaded from an XML file. The XML file stores the patterns as regular expressions, as shown in Figure 10. Defining the patterns in this manner separates them from the pattern processing engine, making them easier to update and extend as new patterns become available. The patterns are applied by running a regular expression match of each cleavage pattern against the sequence being processed to identify the start of a pattern match. To determine the actual location of a cleavage site, the DistanceToCleavagePoint is added to the start index of the regular expression match. For example, for the regular expression WKP, a distance of zero would define the cleavage as occurring before the W, a distance of 1 would define it as occurring between W and K, and so on. Once the cleavage sites are known, the peptides are defined as the amino acid sequences occurring between any two consecutive cleavage sites, or between the first/last cleavage site and the beginning/end of the protein sequence. These peptides are then filtered based on the criteria specified by the user and presented as the output of the core module. Examples of the filter criteria available in PChopper are presented in Table 1. The reasoning behind these filter criteria is described in [9].
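
The pattern-plus-offset scheme described above can be sketched in a few lines. This is a minimal illustration rather than PChopper's implementation: the `CleavageRule` type stands in for one entry of the XML pattern file (whose actual schema appears only in Figure 10), and the trypsin-like pattern in the usage note is an illustrative assumption, not a rule copied from PChopper or PeptideCutter.

```python
import re
from typing import List, NamedTuple


class CleavageRule(NamedTuple):
    """Hypothetical stand-in for one enzyme entry in the XML pattern file."""
    enzyme: str
    pattern: str                     # regular expression matched against the sequence
    distance_to_cleavage_point: int  # offset from the match start to the cut


def cleavage_sites(sequence: str, rule: CleavageRule) -> List[int]:
    """Return sorted cut indices; a cut at index p falls just before sequence[p]."""
    # Wrapping the pattern in a zero-width lookahead reports overlapping matches too.
    finder = re.compile(f"(?=({rule.pattern}))")
    sites = {m.start() + rule.distance_to_cleavage_point
             for m in finder.finditer(sequence)}
    # A cut at either end of the sequence would not create a new peptide.
    return sorted(p for p in sites if 0 < p < len(sequence))


def digest(sequence: str, rule: CleavageRule) -> List[str]:
    """Split the sequence at every cleavage site, keeping both terminal peptides."""
    bounds = [0] + cleavage_sites(sequence, rule) + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]
```

With the WKP example from the text, `digest("AAWKPAA", CleavageRule("demo", "WKP", 0))` cuts before the W and `CleavageRule("demo", "WKP", 1)` cuts between W and K. An illustrative trypsin-like rule such as `CleavageRule("trypsin-like", "[KR](?!P)", 1)` cuts after K or R unless the next residue is P.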

Table 1 Available filters and parameters for simulated digests

The second module has been developed as a search library whose primary role is to provide protein sequences and corresponding phosphorylation sites as parameters to Module 1. In keeping with the SOA theme, this module makes use of an existing search service, wrapping several of its methods behind an internal façade and making them available via a simple Java interface. The service is provided by Phospho.ELM [10], which is a publicly available database of experimentally verified phosphorylation sites. It was chosen due to its wide usage [11, 12], acclaimed accuracy [13–15] and because it exposes a web service [16]. It is also worth noting that Phospho.ELM is commonly used as a baseline for testing other phosphorylation prediction methods [11, 14, 17]. Figure 2 illustrates the information flow associated with this part of the system.

The third module has been developed as an interaction module to hide the complexities of interfacing Module 1 with Module 2. This module has been designed in two parts, one focussing on human interactions and the other focussing on machine/programmatic interactions. For programmatic interactions, a REST-based application endpoint was developed [18, 19] which wraps the methods available from Modules 1 and 2, allowing them to be invoked via simple HTTP requests. For example, a GET request to the URL protein/akt1/digest results in the system invoking a simulated digest for AKT1, with the results being returned as an XML report. Details of the additional advantages of REST-based architectures are described in [8, 19, 20]. For a full list of available REST methods provided by PChopper, see Tables 1, 2, 3 and 4. For human interactions, a Flex-based application endpoint was developed to provide a simple and intuitive system interface. The Flex GUI endpoint allows for a rich web-based solution that eliminates the need for client-side installations and dependencies on natively installed software libraries. Since Flex compiles to Flash, it ensures the highest possible accessibility when compared to other rich browser-based plugins. The use of Flash as a runtime environment also eliminates the traditional problems associated with developing a web-based system, such as having to account for differences in how browsers interpret and execute HTML and JavaScript functions. However, Flash inhibits the use of PChopper on some tablet PCs, as there is currently limited support for Flash on those devices. Another limitation of Flash is that it cannot be easily indexed by search engines such as Google. While deep linking can be utilised to allow Flash content to be indexed, this is not a concern for PChopper as the application's 'states' do not require indexing.
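
As a rough illustration of the programmatic endpoint, the sketch below builds the `protein/akt1/digest` URL from the example above and fetches the XML report. Only that relative path comes from the text. The base URL is assumed to be the project home page listed under Availability and requirements, lower-casing the protein name is an assumption based on the single example, and the function names are hypothetical. The schema of the XML report is not described here, so the sketch simply returns the parsed root element.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the REST endpoint is rooted at the project home page.
BASE_URL = "http://pchopper.lifesci.dundee.ac.uk"


def digest_url(protein: str, base_url: str = BASE_URL) -> str:
    """Build the URL for a simulated digest, e.g. .../protein/akt1/digest."""
    name = quote(protein.lower(), safe="")  # lower-casing mirrors the 'akt1' example
    return f"{base_url.rstrip('/')}/protein/{name}/digest"


def fetch_digest(protein: str, base_url: str = BASE_URL,
                 timeout: float = 30.0) -> ET.Element:
    """Issue the GET request and parse the XML report that is returned."""
    with urlopen(digest_url(protein, base_url), timeout=timeout) as response:
        return ET.fromstring(response.read())
```

Because the call is a plain HTTP GET returning XML, the same request can be made from any language or from a command-line HTTP client.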

Table 2 REST: Obtaining protein information

Table 3 REST: Obtaining a protein sequence

Table 4 REST: Running a simulated digest

Results

To demonstrate the capabilities of PChopper, we provide an example where monitoring of 52 phosphorylation sites in nine proteins (AKT1, AKT2, AKT3, GSK3α, GSK3β, FOXO1, TSC2, MAPK3, IRS1) is required. This would be a typical study where the phosphorylation sites of multiple enzymes in a signalling pathway need to be analysed in parallel and where we believe existing software would struggle to provide a simple solution. The proteins were analysed using experiments with the following parameters:

No 'M' or 'C' in final peptides
Peptide length between 5 and 30
Ignore cleavages next to phosphorylation sites: True
Only include results with all sites: False
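
A minimal sketch of how parameters like these might be applied to the peptides produced by one enzyme is given below. The types and names are hypothetical. The reading of 'Only include results with all sites' as "discard an enzyme's results unless every target site is covered" is an assumption, and 'Ignore cleavages next to phosphorylation sites' is omitted because it acts during the digest rather than on finished peptides.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Peptide:
    start: int      # 1-based position of the first residue in the protein
    sequence: str

    @property
    def end(self) -> int:
        return self.start + len(self.sequence) - 1


@dataclass(frozen=True)
class Filters:
    excluded_residues: str = "MC"    # no 'M' or 'C' in final peptides
    min_length: int = 5
    max_length: int = 30
    require_all_sites: bool = False  # assumed meaning: all targets must be covered


def covered_sites(peptide: Peptide, targets: Sequence[int]) -> List[int]:
    """Target positions that fall inside this peptide."""
    return [s for s in targets if peptide.start <= s <= peptide.end]


def passes(peptide: Peptide, f: Filters) -> bool:
    """Length and excluded-residue checks for a single peptide."""
    if not f.min_length <= len(peptide.sequence) <= f.max_length:
        return False
    return not any(aa in f.excluded_residues for aa in peptide.sequence)


def filter_digest(peptides: Sequence[Peptide], targets: Sequence[int],
                  f: Filters = Filters()) -> List[Peptide]:
    """Keep peptides that pass the filters and contain at least one target site."""
    kept = [p for p in peptides if passes(p, f) and covered_sites(p, targets)]
    if f.require_all_sites:
        covered = {s for p in kept for s in covered_sites(p, targets)}
        if covered != set(targets):
            return []  # this enzyme cannot reach every target site
    return kept
```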

The results of these nine experiments were presented in the web-based viewer, which allows users to view the results of each experiment quickly and easily, and to combine the results from the nine individual experiments into a single unified summary view. Additionally, users can selectively export datasheets for additional information on each of the simulated experiments. Features of the single/combined results and the datasheets are outlined below.

Single Digest Results

The results for any particular digest are presented immediately after the digest is completed. The results screen shows a list of enzymes, and the peptides that can be produced for each of the target sites. By scanning along a particular row in this table, it is easy to identify the enzyme (or combination of enzymes) that is required to produce peptides for each of the required target sites (see Figure 6). A tab with further peptide details allows users to view the properties of each of the predicted peptides (see Figure 7).

Combined Digest Results

PChopper can combine the results from multiple experiments into a single unified view. This view lists all proteins and their associated target sites, and maps these against the list of enzymes that were used to produce a selection matrix (see Figure 8). This matrix uses colour coding to help identify enzymes that can (or cannot) be used to produce a peptide containing a particular target site. A green box labelled 'Y' indicates that an enzyme was able to produce a peptide which included the target site, and a red box labelled 'N' indicates that the enzyme was not able to produce a peptide with the target site. Users can then select and de-select enzymes and export these as a CSV report. The CSV report reconfigures the data to group the results by enzyme, making it easier to see the enzymes that can be used to target specific sites of interest. Figure 8 shows the complete matrix, and Figure 9 shows the cut-down matrix.
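
The selection matrix, and one possible way of suggesting a small set of enzymes from it, can be sketched as follows. The paper does not state how PChopper chooses the 'most appropriate' combination, so the greedy set-cover heuristic used here is a swapped-in technique for illustration only, and all names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Target = Tuple[str, int]  # (protein name, target site position)


def selection_matrix(coverage: Dict[str, Set[Target]],
                     targets: List[Target]) -> Dict[Target, Dict[str, str]]:
    """Map each target site to a 'Y'/'N' flag per enzyme."""
    return {t: {enzyme: ("Y" if t in sites else "N")
                for enzyme, sites in coverage.items()}
            for t in targets}


def suggest_enzymes(coverage: Dict[str, Set[Target]],
                    targets: List[Target]) -> List[str]:
    """Greedy set cover: repeatedly take the enzyme covering most remaining targets."""
    remaining = set(targets)
    chosen: List[str] = []
    while remaining:
        # Iterating in sorted order makes ties resolve alphabetically.
        best = max(sorted(coverage), key=lambda e: len(coverage[e] & remaining))
        gained = coverage[best] & remaining
        if not gained:
            break  # the leftover targets cannot be reached by any enzyme
        chosen.append(best)
        remaining -= gained
    return chosen
```

A greedy choice is not guaranteed to find the smallest possible enzyme set, but it is simple and usually close for matrices of this size.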

Datasheets

The details of each experiment can be downloaded as a datasheet. The datasheet contains additional information not included in the summary CSV file. For each simulated experiment the datasheet contains the following metadata used for the simulated digest:

The name of the experiment
The search term that was used to find the protein sequence
The name of the matched protein that was used to retrieve the sequence
The fragment filter criterion
The peptide length criterion
The sequence of the target protein, with the phosphorylation sites highlighted
A list of all enzymes that yielded peptides that had the required characteristics.

The datasheets can be downloaded as a PDF report, and saved for future reference. Additional file 1 and additional file 2 are the datasheets associated with this series of experiments.

Retention time calculations

Some scientists utilise retention time predictions in the prediction of SRM candidates. A challenge is that while tools are available to predict retention times for tryptic peptides, we are not aware of a tool which robustly predicts retention time for peptides including post-translational modifications, a key focus of PChopper.

At this point we have not implemented a retention time prediction algorithm in the GUI of PChopper, but we have made available the method published by Palmblad et al. through the API [21]. A retention time prediction is generated as a property of each predicted peptide (see Table 4). It should be noted that this method makes assumptions about the experimental conditions which may not be universally applicable.

Conclusions

PChopper was developed to assist with designing studies for SRM-based protein phosphorylation analysis. While it includes features that are specific to phosphorylation, it is not constrained solely to digests involving this post-translational modification. PChopper can be used to target other post-translational modifications (which the user would have to enter manually) or simply to target regions of interest within a protein sequence. This can be done using a single enzyme, or with combinations of multiple enzymes. It was implemented using a service oriented architecture to produce a tool that is capable of quickly and easily predicting suitable enzymes and resulting peptides for SRM experiments. Other systems are available, such as MRMaid, PeptideCutter, Skyline and ATAQS, but PChopper differs from each of them. MRMaid does not include support for phosphopeptides, as it actively filters out peptides with mass-altering post-translational modifications. PeptideCutter can predict cleavage sites for enzymatic digests, but it lacks the ability to highlight peptides with phosphorylated amino acids. Skyline provides a complete end-to-end design workflow for SRM, but it is implemented using Microsoft's .NET client framework, making it inaccessible to platforms that cannot run .NET client applications; in comparison, PChopper is fully web based. Similarly, ATAQS provides a complete end-to-end design workflow and additionally provides an application programming interface; however, that interface is non-declarative and is bound to the implementation technologies. In comparison, PChopper's programmatic access is declarative and programming language agnostic.

Availability and requirements

Project name: PChopper
Project home page: http://pchopper.lifesci.dundee.ac.uk
Operating system(s): Platform independent
Programming language: Java, Flex
Other requirements: Web Browser with Flash player 10
License: GPL
Any restrictions to use by non-academics: None

References

1. Mead JA, Bianco L, Ottone V, Barton C, Kay RG, Lilley KS, Bond NJ, Bessant C: MRMaid, the web-based tool for designing multiple reaction monitoring (MRM) transitions. Molecular & Cellular Proteomics 2009, 8: 696–705. 10.1074/mcp.M800192-MCP200
2. Walker JM: The Proteomics Protocols Handbook. Humana Press; 2005:571–607.
3. MacLean B, Tomazela DM, Shulman N, Chambers M, Finney GL, Frewen B, Kern R, Tabb DL, Liebler DC, MacCoss MJ: Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 2010, 26: 966–8. 10.1093/bioinformatics/btq054
4. Brusniak M-YK, Kwok S-T, Christiansen M, Campbell D, Reiter L, Picotti P, Kusebauch U, Ramos H, Deutsch EW, Chen J, Moritz RL, Aebersold R: ATAQS: A computational software tool for high throughput transition optimization and validation for selected reaction monitoring mass spectrometry. BMC Bioinformatics 2011, 12: 78. 10.1186/1471-2105-12-78
5. Cham J, Bianco L, Bessant C: Free computational resources for designing selected reaction monitoring transitions. Proteomics 2010, 10: 1106–1126. 10.1002/pmic.200900396
6. Papazoglou MP, Georgakopoulos D: Service-oriented computing. Communications of the ACM 2003, 46: 24–28.
7. O'Reilly T: What is Web 2.0: Design patterns and business models for the next generation of software. 2005. [http://papers.ssrn.com]
8. Schroth C, Janner T: Web 2.0 and SOA: Converging Concepts Enabling the Internet of Services. IT Professional 2007, 9: 36–41.
9. Anderson L, Hunter CL: Quantitative Mass Spectrometric Multiple Reaction Monitoring Assays for Major Plasma Proteins. Mol Cell Proteomics 2006, 5: 573–88.
10. Diella F, Cameron S, Gemünd C, Linding R, Via A, Kuster B, Sicheritz-Ponten T, Blom N, Gibson TJ: Phospho.ELM: a database of experimentally verified phosphorylation sites in eukaryotic proteins. BMC Bioinformatics 2004, 5: 79. 10.1186/1471-2105-5-79
11. Lee T-Y, Huang H-D, Hung J-H, Huang H-Y, Yang Y-S, Wang T-H: dbPTM: an information repository of protein post-translational modification. Nucleic Acids Research 2006, 34: D622-D627. 10.1093/nar/gkj083
12. Davey NE, Edwards RJ, Shields DC: Estimation and efficient computation of the true probability of recurrence of short linear protein sequence motifs in unrelated proteins. BMC Bioinformatics 2010, 11: 14. 10.1186/1471-2105-11-14
13. Gould CM, Diella F, Via A, Puntervoll P, Gemünd C, Chabanis-Davidson S, Michael S, Sayadi A, Bryne JC, Chica C, Seiler M, Davey NE, Haslam N, Weatheritt RJ, Budd A, Hughes T, Pas J, Rychlewski L, Trave G, Aasland R, Helmer-Citterich M, Linding R, Gibson TJ: ELM: the status of the 2010 eukaryotic linear motif resource. Nucleic Acids Research 2010, 38: D167-D180. 10.1093/nar/gkp1016
14. Dang TH, Van Leemput K, Verschoren A, Laukens K: Prediction of kinase-specific phosphorylation sites using conditional random fields. Bioinformatics 2008, 24: 2857–64. 10.1093/bioinformatics/btn546
15. Xue Y, Ren J, Gao X, Jin C, Wen L, Yao X: GPS 2.0: Prediction of kinase-specific phosphorylation sites in hierarchy. Mol Cell Proteomics 2008, 7: 1598–1608. 10.1074/mcp.M700574-MCP200
16. Diella F, Gould CM, Chica C, Via A, Gibson TJ: Phospho.ELM: a database of phosphorylation sites - update 2008. Nucleic Acids Research 2008, 36: D240-D244.
17. Zhou FF, Xue Y, Chen GL, Yao X: GPS: a novel group-based phosphorylation predicting and scoring method. Biochemical and Biophysical Research Communications 2004, 325: 1443–1448. 10.1016/j.bbrc.2004.11.001
18. Battle R, Benson E: Bridging the semantic Web and Web 2.0 with Representational State Transfer (REST). Web Semantics: Science, Services and Agents on the World Wide Web 2008, 6: 61–69. 10.1016/j.websem.2007.11.002
19. Fielding RT, Taylor RN: Principled design of the modern Web architecture. ACM Transactions on Internet Technology (TOIT) 2002, 2: 115–150. 10.1145/514183.514185
20. Goth G: Critics Say Web Services Need a REST. IEEE Distributed Systems Online 2004, 5: 1–1.
21. Palmblad M, Ramström M, Markides KE, Håkansson P, Bergquist J: Prediction of Chromatographic Retention and Protein Identification in Liquid Chromatography/Mass Spectrometry. Analytical Chemistry 2002, 74: 5826–5830. 10.1021/ac0256890

Acknowledgements

This work was supported by the Translational Medicine Research Collaboration - a consortium made up of the Universities of Aberdeen, Dundee, Edinburgh and Glasgow, the four associated NHS Health Boards (Grampian, Tayside, Lothian and Greater Glasgow & Clyde), Scottish Enterprise and Pfizer.

The authors would like to thank Selcuk Bozdag and Tim Bath for comments on the manuscript. They would also like to thank the University of Dundee School of Life Sciences for hosting the application. DC and JH were employed by Pfizer while the research was completed. DC is now employed by Sanofi Aventis. Finally they would like to acknowledge Erick Ghaumez who designed the freely available 'Summer Sky' flex theme.

Author information

Authors and Affiliations

Translational Medicine Research Collaboration, Dundee, DD1 9SY, UK
Vackar Afzal, Jeffrey T-J Huang, Abdel Atrih & Daniel J Crowther
College of Life Sciences, University of Dundee, DD1 5EH, UK
Vackar Afzal & Abdel Atrih
Sanofi-Aventis, Industriepark Höchst, 65926 Frankfurt am Main, Germany
Jeffrey T-J Huang & Daniel J Crowther

Corresponding author

Correspondence to Daniel J Crowther.

Additional information

Authors' contributions

VA was the developer for the application. DC was the project manager for the system. JH and AA were involved in the requirements for the biological aspect of the system specification. All authors contributed to the final manuscript.

Electronic supplementary material

Additional file 1: Combined results from 9 experiments that target 52 sites. (CSV 42 KB)

Additional file 2: The datasheet for the 9 experiments. (PDF 117 KB)

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Afzal, V., Huang, J.TJ., Atrih, A. et al. PChopper: high throughput peptide prediction for MRM/SRM transition design. BMC Bioinformatics 12, 338 (2011). https://doi.org/10.1186/1471-2105-12-338

Received: 02 March 2011
Accepted: 15 August 2011
Published: 15 August 2011
DOI: https://doi.org/10.1186/1471-2105-12-338
