Abstract
Background
Ion mobility-mass spectrometry (IMMS), an analytical technique that combines the features of ion mobility spectrometry (IMS) and mass spectrometry (MS), can rapidly separate ions on a millisecond timescale. IMMS has become a powerful tool for analyzing complex mixtures, especially peptides in proteomics. The high-throughput nature of this technique poses a challenge for the identification of peptides in complex biological samples. As an important parameter, peptide drift time can be used to enhance downstream data analysis in IMMS-based proteomics.
Results
In this paper, a model based on the least squares support vector regression (LSSVR) method is presented to predict peptide ion drift time in IMMS from sequence-based features of peptides. Four descriptors were extracted from the peptide sequence to represent each peptide ion as a 34-component vector. The parameters of LSSVR were selected by a grid search strategy, and a 10-fold cross-validation approach was employed for model training and testing. Our proposed method was tested on three datasets with different charge states. The high prediction performance achieved demonstrates the effectiveness and efficiency of the prediction model.
Conclusions
Our proposed LSSVR model can predict peptide drift time from sequence information with relatively high prediction accuracy, as shown by a test on a dataset of 595 peptides. This work can enhance the confidence of protein identification when combined with current protein searching techniques.
Background
Ion mobility spectrometry (IMS) has gained significant attention over the past few decades for its rapid, high-resolution separation power, which can separate ions on a millisecond timescale [1-3]. As a separation technique based on differences in the size and shape of analytes, IMS has proven powerful in the fields of metabolomics, glycomics and proteomics [1,2]. Ion mobility-mass spectrometry (IMMS), an analytical technique in which IMS is coupled with mass spectrometry (MS), has emerged as a powerful tool for analyzing biological mixtures, especially in current proteomics studies [4-7]. By combining the advantages of IMS and MS, IMMS opens up avenues for the detailed structural analysis of large and heterogeneous protein complexes, providing information on the stoichiometry, topology and cross section of their components [8,9].
A typical proteomics experimental setup using IMMS consists of five components: sample introduction, compound ionization, ion mobility separation, mass separation, and peptide and protein ion detection [10]. Although all five components play essential roles in the process, ion mobility separation is crucial because of its impact on the subsequent mass analysis and peptide ion detection [11]. Ion mobility separation, by which peptide ions with different cross-sections and molecular charges are separated, adds a new dimension of separation and makes IMMS an attractive method for analyzing complex proteomics samples. Peptide ion separation can be enhanced by changing the buffer gas, altering electric field strengths, and adopting nonlinear electric field gradients, which facilitates peptide identification with high confidence [12]. Even though these efforts improve the separation capability of IMMS, they are time-consuming and difficult to reproduce under different experimental conditions.
Although IMMS separates peptide ions based on differing cross-sections and molecular charge, what is measured experimentally is the time that each peptide ion takes to travel through the drift tube. It has been reported that the measurement of peptide ion drift time using IMMS is very reproducible [13-18]. Any two measurements of mobilities (or cross sections) recorded on the same instrument usually agree to within 1% relative uncertainty, and measurements performed by different groups usually agree to within 2%. As a characteristic of each ion, peptide ion drift time can be used to enhance confidence in protein identifications.
Several efforts have attempted to computationally determine the mobility behaviour of peptide ions in IMS. Valentine et al. predicted peptide ion cross sections using intrinsic size parameters (ISPs) and tested the approach on 271 singly-charged peptides [19]. Liu et al. proposed a quantitative structure-property relationship (QSPR) based approach for the prediction of peptide drift time and found that structural effects and the charge state of the peptide ion contribute substantially to the drift time [20]. Shah et al. employed partial least squares (PLS) and support vector regression (SVR) based approaches to predict the drift time of a large number of peptide ions with different charge states, and demonstrated that both techniques significantly outperform the ISP-based calculation in a test on a high-confidence database of 8,675 peptide sequences [21]. Zhang et al. presented a quantitative structure-spectrum relationship (QSSR) study to predict peptide drift time and found that the sequence-based approach achieves better fitting ability and predictive power but worse interpretability than the structure-based approach [22]. Our previous works also addressed the same problem by employing artificial neural networks and multiple linear regression models [23-25]. Although these studies contributed substantially to the drift time prediction of peptide ions, ISP-based calculations did not perform well for peptides with high charge states, and structure-based methods have to construct and optimize the geometrical structures of peptides, which inevitably introduces errors into the prediction models.
In this paper, a least squares support vector regression (LSSVR) model is presented to predict peptide ion drift time in IMMS solely from sequence-based features of the peptide. The sequence pattern of each peptide was represented as a 34-component vector consisting of four descriptors, i.e., molecular weight, sequence length, amino acid composition and pseudo-amino acid composition. In constructing the LSSVR models, a 10-fold cross-validation strategy was employed to determine the optimized values of the regression parameters. Our proposed LSSVR method was applied to three peptide ion datasets with different charge states, i.e., +1, +2 and +3.
Results and discussion
In this work, all the raw data generated by IMMS were processed using MassLynx V4.1, an instrument control software package, to obtain the drift time for each peptide ion peak. MassLynx is a powerful software package, developed by Waters Corporation, for analyzing and processing the data acquired from mass spectrometers. The peptides generated from tryptic digestion of 20 pure proteins were used for model development and testing in this study. Peptide charge state was manually assigned based on the m/z spacing between isotopic peaks. As a result, a total of 595 assigned peptide ions from the 20 proteins formed the dataset for this work. Within this dataset, 212 peptides were singly charged, 306 were doubly charged and 77 were triply charged. More details can be found in our previous work [12,26].
IMS separates ions based on the fact that ions with different shapes and charge states travel through the drift tube at different velocities. In the drift tube, the ions are pulled by a weak electric field and opposed by the inert buffer gas. The charge state is therefore a very important factor for the drift time, and we developed separate LSSVR models for singly-, doubly- and triply-charged peptides. In this work we denote the dataset of singly-charged peptides as DataS, that of doubly-charged peptides as DataD, and that of triply-charged peptides as DataT.
Table 1 shows the distributions of peptide molecular weight, sequence length and drift time in each of the three datasets. The smallest peptide, which consists of only 3 amino acids, is singly charged, whereas the largest peptides, with 34 amino acids, occur in DataD and DataT, indicating that peptides with large molecular weights and long amino acid sequences tend to have high charge states. The peptide ion drift time is also strongly related to the overall ion charge state. The mean drift time of the singly-charged peptides is 7.48 s, while those of the doubly-charged and triply-charged peptides are 3.07 s and 2.28 s, respectively. One reason is that peptides with high charge states drift through the cell at a relatively high velocity. Another reason is that the higher the charge state of a peptide, the higher the probability that it forms a three-dimensional spatial structure.
Table 1. Distribution of peptide molecular weight, sequence length and drift time in original datasets with different charge states
Prediction performance evaluation
In this study, we developed LSSVR models for predicting peptide drift time for the singly-, doubly- and triply-charged peptides, respectively. A 10-fold cross-validation strategy was employed in the training and testing of the regression models, so that all observations in each dataset are used for both training and validation. This cross-validation provides a reliable estimate of how well our model learns from the original data.
The purpose of this work is to predict the ion drift time of peptides by elucidating the relationship between the dependent variable, i.e., peptide drift time, and the sequence-based peptide features we used, i.e., peptide molecular weight, sequence length, AAC and PseAAC. There are many criteria by which regression models can be evaluated and compared. The root mean square error (RMSE) and the coefficient of determination (R^{2}) are selected in this work to evaluate the predictive performance of our LSSVR models:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(dt_{i} - dt'_{i}\right)^{2}}$$

$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(dt_{i} - dt'_{i}\right)^{2}}{\sum_{i=1}^{n}\left(dt_{i} - \overline{dt}\right)^{2}}$$

where n is the number of peptides in the dataset, dt_{i} is the experimentally observed drift time of the i-th peptide ion, dt'_{i} is the drift time predicted by the LSSVR models, and \overline{dt} is the overall average value of the observed peptide drift times. R^{2} typically takes a value between 0 and 1, with a value closer to 1 indicating better performance of the regression model.
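As a minimal sketch, the two criteria can be computed as follows. The sketch assumes the standard definitions (RMSE as the root of the mean squared residual, R^{2} as one minus the ratio of the residual to the total sum of squares); the function names and the toy values are illustrative only and are not taken from this study.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed and predicted drift times."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Toy values for illustration only (not data from this study).
dt_obs = [2.0, 4.0, 6.0, 8.0]
dt_pred = [2.5, 3.5, 6.5, 7.5]
print(rmse(dt_obs, dt_pred), r_squared(dt_obs, dt_pred))
```

Each toy prediction is off by 0.5, so the RMSE is 0.5 regardless of the spread of the observed values, whereas R^{2} also depends on how widely the observed drift times vary.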
Furthermore, in order to assess the prediction accuracy of the LSSVR models, a prediction variation threshold, η_{t}, was defined on the relative variation of the predicted drift time from the experimentally observed value. If the relative variation between the observed and predicted drift time is smaller than η_{t}, the prediction is regarded as reliable; otherwise, it is regarded as unreliable.

$$\eta = \frac{\left|dt' - dt\right|}{dt}$$

where η is the prediction variation, dt' is the predicted peptide ion drift time and dt is the experimentally observed peptide ion drift time.
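The reliability criterion can be sketched as follows, assuming the relative variation is the absolute difference between predicted and observed drift time divided by the observed value, and that prediction accuracy is the fraction of peptides whose variation is below η_{t}. The function names and toy values are illustrative assumptions.

```python
def prediction_variation(dt_observed, dt_predicted):
    """Relative variation eta = |dt' - dt| / dt for one peptide ion."""
    return abs(dt_predicted - dt_observed) / dt_observed

def prediction_accuracy(observed, predicted, eta_t=0.15):
    """Fraction of peptides whose relative variation is smaller than eta_t."""
    reliable = [
        prediction_variation(o, p) < eta_t
        for o, p in zip(observed, predicted)
    ]
    return sum(reliable) / len(reliable)

# Toy values for illustration only (not data from this study).
dt_obs = [2.0, 4.0, 5.0, 10.0]
dt_pred = [2.1, 5.0, 5.5, 10.5]
print(prediction_accuracy(dt_obs, dt_pred, eta_t=0.15))
```

Evaluating `prediction_accuracy` over a range of `eta_t` values gives the kind of fraction-versus-threshold curve used to compare datasets.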
Parameter selection
As stated in the Methods section, LSSVR models with a Gaussian kernel were adopted to predict peptide drift time. There are two important parameters for this kind of regression model, i.e., the width of the Gaussian kernel, σ, and the regularization factor, γ. The correct setting of these two parameters is of critical importance for achieving good regression performance. In this work, a grid search scheme based on cross-validation is used to determine these two parameters. Specifically, σ^{2} and γ were tuned simultaneously over a grid of 2^{-5}, 2^{-4}, ..., 2^{15} for σ^{2} and 2^{-5}, 2^{-4}, ..., 2^{9} for γ. The prediction accuracy of the LSSVR models for each peptide dataset was taken as the objective function to determine the optimum combination of σ^{2} and γ, where the value of η_{t} was set to 0.15.
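The grid search can be sketched as below. The sketch assumes the grid runs over powers of two with exponents from -5 to 15 for σ^{2} and from -5 to 9 for γ, and that the objective is supplied as a callable returning the cross-validated prediction accuracy. The function `toy_cv_accuracy` is a hypothetical stand-in with an artificial peak; in practice it would train and validate an LSSVR model for each parameter pair.

```python
from itertools import product
from math import log2

def grid_search(score_fn, sigma2_exponents=range(-5, 16), gamma_exponents=range(-5, 10)):
    """Return (best_score, sigma2, gamma) over a power-of-two parameter grid."""
    best = None
    for s_exp, g_exp in product(sigma2_exponents, gamma_exponents):
        sigma2, gamma = 2.0 ** s_exp, 2.0 ** g_exp
        score = score_fn(sigma2, gamma)
        # Keep the first parameter pair that reaches the highest score.
        if best is None or score > best[0]:
            best = (score, sigma2, gamma)
    return best

def toy_cv_accuracy(sigma2, gamma):
    """Hypothetical objective: peaks at sigma^2 = 2^11 and gamma = 2^8."""
    return 1.0 - 0.01 * ((log2(sigma2) - 11) ** 2 + (log2(gamma) - 8) ** 2)

best_score, best_sigma2, best_gamma = grid_search(toy_cv_accuracy)
print(best_score, log2(best_sigma2), log2(best_gamma))
```

With these ranges the search evaluates 21 × 15 = 315 parameter pairs, each of which would require a full cross-validation run in a real setting.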
The accuracy curves for different combinations of σ^{2} and γ on the three peptide datasets are shown in Figure 1. The regression performance of the LSSVR models depends heavily on the selection of the parameters σ^{2} and γ. When γ is fixed, the prediction accuracy rises with increasing σ^{2} to a peak and then declines. For DataS, the top 5 prediction accuracy values correspond to the [σ^{2}, γ] combinations [2^{10}, 2^{6}], [2^{11}, 2^{7}], [2^{12}, 2^{8}], [2^{13}, 2^{9}], and [2^{9}, 2^{5}]. The top 5 LSSVR models for DataD have the parameter combinations [2^{9}, 2^{5}], [2^{10}, 2^{6}], [2^{11}, 2^{7}], [2^{11}, 2^{8}], and [2^{9}, 2^{6}]. For the triply-charged peptide dataset, DataT, the top 5 combinations are [2^{11}, 2^{8}], [2^{12}, 2^{9}], [2^{10}, 2^{7}], [2^{11}, 2^{8}], and [2^{12}, 2^{9}]. Across the three datasets, the combination [2^{11}, 2^{8}] achieves the best overall prediction accuracy for the LSSVR models when η_{t} = 0.15. Therefore, σ^{2} = 2^{11} and γ = 2^{8} were selected for the subsequent analysis in this work.
Figure 1. Prediction accuracy curves of LSSVR models for the three peptide ion datasets when η_{t} = 0.15, where σ^{2} ranges from 2^{-5} to 2^{15} and γ ranges from 2^{-5} to 2^{9}. (A) DataS, (B) DataD and (C) DataT.
Prediction performance
A 10-fold cross-validation was implemented in the construction of the LSSVR models, so different random partitions of the original dataset change the predicted drift time for each peptide. To evaluate the uncertainty in the regression performance that arises from the randomness of the dataset partition, the regression procedure was repeated ten times. The mean of the predicted drift times for each peptide over these ten repeats was taken as the final predicted value. The variation across the ten repeats was also studied to examine the stability of our proposed LSSVR models.
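The averaging over repeated cross-validation runs can be sketched as follows. The sketch assumes each repeat yields one predicted drift time per peptide, in the same peptide order; the function name and the toy values (three repeats, two peptides) are illustrative assumptions.

```python
from statistics import mean, stdev

def combine_repeats(predictions_per_repeat):
    """Average each peptide's prediction over repeated cross-validation runs.

    predictions_per_repeat: list of runs, each a list of predicted drift
    times in the same peptide order. Returns (final, spread), where final
    holds the per-peptide mean and spread the per-peptide sample standard
    deviation across runs.
    """
    per_peptide = list(zip(*predictions_per_repeat))
    final = [mean(values) for values in per_peptide]
    spread = [stdev(values) for values in per_peptide]
    return final, spread

# Toy values for illustration only (not data from this study).
runs = [[2.0, 4.0], [2.2, 3.8], [1.8, 4.2]]
final, spread = combine_repeats(runs)
print(final, spread)
```

A small per-peptide spread suggests that the prediction is insensitive to how the dataset happened to be partitioned.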
The prediction performance is shown in Table 2. Our models achieved very good prediction accuracy on the different peptide datasets, i.e., 0.9811 for DataS, 0.9379 for DataD, and 0.8312 for DataT. Compared with DataS and DataD, the prediction accuracy for the triply-charged peptide ions in DataT is somewhat lower. One reason is that the dataset is small, i.e., 77 peptides in DataT, which cannot provide sufficient information for model training. Another reason, we believe, is that peptides with higher charge states, as in DataT, are usually longer than those in DataS and DataD. The mean length of peptides in DataT is 18.3, which is 1.4 times that in DataD and 2.3 times that in DataS. The longer the peptide, the greater the chance that it forms secondary structure. Such changes in spatial conformation affect the peptide's velocity in the drift cell and therefore the peptide ion's drift time.
Table 2. Prediction performance of LSSVR models under a variation threshold of 15% on the three peptide ion datasets
Table 2 shows that the prediction accuracy obtained from the mean of the predicted drift times is better than the mean accuracy of the ten repeated experiments. The improvement is 0.0075, 0.0039 and 0.0479 for DataS, DataD, and DataT, respectively, which indicates that combining the regression models improves the predictive power of the predictors. Table 2 also shows that the standard deviation of the prediction accuracy over the ten repeated experiments is very small, i.e., 0.081, 0.061 and 0.025 for the three datasets. This demonstrates that our LSSVR models are stable and statistically valid: for an unstable model, a small change in the data, such as a different split into training and test sets, could lead to large changes in prediction performance.
The relatively small RMSE and high R^{2} values shown in Table 2 also indicate the strong regression performance of the LSSVR models in predicting peptide ion drift times in IMMS. We obtained very small RMSE values for DataD and DataT, and a slightly higher value, 0.52, for DataS, which is reasonable given the wide range of the original drift times, from 2.17 s to 24.5 s. The R^{2} values of around 0.97 for DataS and DataD and 0.87 for DataT show a high correlation between the predicted and experimentally observed peptide drift times. More details about the regression results can be found in Figure 2, where the line shows the linear least-squares fit between the predicted and observed drift times. The high correlation coefficients, i.e., 0.987 for both DataS and DataD, and 0.943 for DataT, signify that the LSSVR model proposed here captures the general properties by which different peptides travel through the drift cell at different velocities.
Figure 2. Regression performance between the observed and predicted drift times for the peptide ions with different charge states. (A) DataS, (B) DataD, and (C) DataT. The linear function in each subfigure is the linear fitted function between the observed and predicted drift time for every datapoint in each dataset, and the line is the corresponding fitted curve. R denotes the correlation coefficient of observed vs. predicted drift time.
After the LSSVR models have completed the regression analysis for the three datasets with different charge states, the variation threshold η_{t} determines which peptides are considered correctly predicted. Figure 3 displays the relation between the fraction of peptide ions whose drift times are predicted correctly and the accuracy threshold η_{t}. Our proposed method achieves the best prediction performance on DataS. We believe the reason is that the peptides in DataS are small and are more likely to adopt elongated conformations in order to minimize Coulomb repulsion, while the peptides in DataT are usually large and are more likely to form secondary structure as they pass through the drift cell of the IMMS instrument. Even when the variation threshold is set to 0.10, more than 90% of peptides are predicted correctly, which demonstrates the prediction performance of our LSSVR model. If conformational information could be added to the regression model, the predictive power for doubly- and triply-charged peptides would be expected to increase.
Figure 3. Fraction of peptide ions correctly predicted at different accuracy variation levels. A higher curve indicates a larger number of peptides for a given threshold value.
Conclusions
To enhance the confidence of peptide identification, an LSSVR model was developed in this study to predict peptide ion drift time for IMMS measurements. In LSSVR, two parameters, i.e., the width of the Gaussian kernel, σ, and the regularization factor, γ, have to be selected because of their influence on the regression accuracy. A grid search strategy was employed to optimize the selection of these two parameters. Based on the peptide sequence, a 34-component vector was extracted as the representation used to construct our LSSVR models on three peptide ion datasets with different charge states. With the prediction accuracy threshold η_{t} set to 0.15, we achieved very high prediction accuracy, i.e., 0.9811 and 0.9379, for the singly- and doubly-charged peptide ions, which indicates the prediction capability of the LSSVR models. The relatively lower prediction accuracy of 0.8312 for DataT is reasonable, because peptides with higher charge states have a higher probability of forming secondary structure. This situation could be improved if structural information were added to our proposed LSSVR models, although at a higher computational cost.
Methods
Peptide dataset
A total of 595 peptides from 20 pure proteins, reported in our previous work [12], were used in this work. The proteins were purchased from Sigma Aldrich and used without further purification. The peptide fragments were produced from the pure proteins as detailed in the sample preparation section of that report, and were then analyzed by direct electrospray into the Synapt HDMS instrument (Waters). Peptide ion assignments were obtained from a peptide mass fingerprint for each tryptic digest. As a result, of the 595 peptide ions in the dataset, 212 were singly charged, 306 were doubly charged and 77 were triply charged. More details about the experimental processing of samples can be found in [12,26].
Support vector regression
Support vector machines (SVMs), a specific class of machine learning algorithms first proposed by Vapnik and his coworkers in 1995 [12], have proven very effective for solving pattern classification problems, even for small datasets. For a binary classification problem, the main idea of SVM is to select a hyperplane that separates the positive from the negative samples while maximizing the minimum margin. SVM has become one of the most popular machine learning methods and has been applied to various domains, such as bioinformatics, cheminformatics, image processing, data mining and knowledge discovery. In many applications, SVM achieves excellent performance because the capacity of the SVM system is controlled by parameters that do not depend on the dimensionality of the feature space [27-32].
As with classification tasks, SVM can also be applied to regression, in which case it is called support vector regression (SVR). In statistics, regression analysis is a technique for estimating the relationships among variables. Any regression task can be formulated as seeking an estimation function that approximates the observations within an acceptable error range. In this study, least squares support vector regression (LSSVR), a version of SVR that reduces the complexity of the optimization process, was adopted for drift time prediction [33].
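Given a training dataset D = {(x_{i}, y_{i})} (i = 1, 2, ..., n), x_{i} ∈ R^{m}, y_{i} ∈ R, where x_{i} is the m-dimensional input vector, y_{i} is its corresponding target value and n is the size of the dataset, SVR constructs the regression model by using a nonlinear mapping function ϕ(·) as follows:

$$f(x) = w^{T}\phi(x) + b$$

where w is the vector of coefficients and b is a constant. Usually, w and b are obtained by minimizing the upper bound of the generalization error. Accordingly, the regression problem in LSSVR can be transformed into the following optimization problem [34]:

$$\min_{w,b,e}\; J(w,e) = \frac{1}{2}w^{T}w + \frac{\gamma}{2}\sum_{i=1}^{n}e_{i}^{2} \quad \text{subject to} \quad y_{i} = w^{T}\phi(x_{i}) + b + e_{i},\quad i = 1, 2, \ldots, n$$

where γ is the regularization parameter, which controls the trade-off between minimizing the estimation error and the smoothness of the function, and e_{i} is the error between the actual output and the predicted output for the i-th input. A high value of γ stresses a good fit to the training data points, and in the case of noisy data a smaller γ value should be taken to avoid overfitting. In order to solve the optimization problem, the Lagrangian function is formulated as follows:

$$L(w,b,e,\alpha) = J(w,e) - \sum_{i=1}^{n}\alpha_{i}\left(w^{T}\phi(x_{i}) + b + e_{i} - y_{i}\right)$$

where α = (α_{1}, α_{2}, ..., α_{n}) is the vector of Lagrange multipliers. The KKT conditions for optimality are obtained by differentiating L with respect to the variables w, b, e and α, as follows:

$$\frac{\partial L}{\partial w} = 0 \Rightarrow w = \sum_{i=1}^{n}\alpha_{i}\phi(x_{i}), \qquad \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{n}\alpha_{i} = 0,$$

$$\frac{\partial L}{\partial e_{i}} = 0 \Rightarrow \alpha_{i} = \gamma e_{i}, \qquad \frac{\partial L}{\partial \alpha_{i}} = 0 \Rightarrow w^{T}\phi(x_{i}) + b + e_{i} - y_{i} = 0$$

By solving the above linear system, the final solution of the primal problem can be represented in the following form:

$$f(x) = \sum_{i=1}^{n}\alpha_{i}K(x, x_{i}) + b$$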
where K(·,·) is a kernel function that satisfies Mercer's condition and corresponds to a dot product in some feature space [34]. The most commonly used kernel functions include the Gaussian RBF kernel K(x, x_{i}) = exp(−‖x − x_{i}‖^{2} / (2σ^{2})) with a width of σ, the sigmoid kernel, and the polynomial kernel K(x, x_{i}) = (a_{1}x^{T}x_{i} + a_{2})^{d} with an order of d and constants a_{1} and a_{2}. The Gaussian RBF kernel is employed in this study, so the kernel parameter σ^{2} and the regularization parameter γ have to be determined first. Many approaches have been applied to parameter optimization of SVR, such as experience [27], grid search [35], particle swarm optimization (PSO) [36], genetic algorithms (GA) [37] and simulated annealing [38]. Considering computational complexity, cross-validation grid search, the most widely used method, is selected to determine the parameters σ^{2} and γ in the LSSVR model.
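A minimal NumPy sketch of LSSVR training and prediction with the Gaussian RBF kernel is given below. It assumes the standard LSSVR formulation, in which eliminating w and e from the KKT conditions leaves a single linear system in b and α: the first row enforces Σα_{i} = 0 and the remaining rows enforce (K + I/γ)α + b = y. The function names and the synthetic data are illustrative assumptions, not the implementation used in this study.

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    """Gaussian RBF kernel matrix: exp(-||a - b||^2 / (2 * sigma2))."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma2))

def lssvr_fit(X, y, sigma2, gamma):
    """Solve the LSSVR linear system; return the bias b and multipliers alpha."""
    n = X.shape[0]
    system = np.zeros((n + 1, n + 1))
    system[0, 1:] = 1.0   # row enforcing sum(alpha) = 0
    system[1:, 0] = 1.0   # column carrying the bias term b
    system[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(system, rhs)
    return solution[0], solution[1:]

def lssvr_predict(X_train, b, alpha, X_new, sigma2):
    """Evaluate f(x) = sum_i alpha_i * K(x, x_i) + b for each row of X_new."""
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b

# Synthetic data for illustration only (not data from this study).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(40, 3))
y = X[:, 0] + 0.5 * X[:, 1] ** 2
b, alpha = lssvr_fit(X, y, sigma2=1.0, gamma=100.0)
y_fit = lssvr_predict(X, b, alpha, X, sigma2=1.0)
print(float(np.max(np.abs(y_fit - y))))
```

Because every training point receives a nonzero multiplier, the system has n + 1 unknowns; for datasets of a few hundred peptides a direct solve of this size is inexpensive.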
Peptide representation
To apply the LSSVR model to predict peptide drift time in IMMS, each peptide has to be represented as a vector of specific peptide features. Because peptides differ in length, and their shape is affected by the charge state, only features extracted from the peptide sequence are used to represent the peptide in this work.
Peptide molecular weight
In IMMS, the ions are pulled by a uniform electric field through the buffer gas in the drift cell. Therefore, the molecular weight of a peptide is one of the most important parameters affecting ion mobility. Karasek et al. found a linear relationship between the reduced mobility of alkylamines and molecular weight under a specific experimental setting [39]. Other studies have also reported that the reduced mobility is inversely proportional to ion mass [40]. A peptide P whose sequence consists of N amino acid residues can be written as follows:

$$P = R_{1}R_{2}R_{3}\cdots R_{N}$$

where R_{i} denotes the i-th amino acid in the peptide. The molecular weight of P can be calculated as:

$$MW = \sum_{i=1}^{N} mw_{i}$$

where mw_{i} is the molecular weight of the i-th amino acid in the peptide sequence.
Sequence length
The sequence length (SL) of a peptide, N, plays an important role in the formation of the peptide's structure. The longer the peptide sequence, the greater the chance that the peptide folds into a secondary or tertiary structure. Apart from charge state, IMS distinguishes ions based on their shapes, which are affected by the sequence length. Previous work indicated that peptides with only primary structure have smaller ion mobility than those with secondary structure, and smaller still than those with tertiary structure.
Amino acid composition
All the information about a peptide is contained in its complete amino acid sequence, which would therefore be the ideal representation of each peptide. Amino acid composition (AAC) is one of the popular approaches to the protein or peptide representation problem because it is a simple yet powerful feature for predicting protein structure, interactions, and functional sites. Generally, only the twenty standard amino acid residues are considered in AAC. Therefore, AAC is a 20-component vector, where each component gives the occurrence number of an amino acid type in the peptide sequence (in many works, AAC is expressed by occurrence frequencies rather than numbers). For peptide P, AAC can be expressed by

$$\mathrm{AAC}(P) = [a_{1}, a_{2}, \ldots, a_{20}]$$

where a_{i} denotes the normalized frequency of the i-th type of amino acid in peptide P.
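The AAC vector can be computed with a short sketch such as the following. The alphabetical ordering of the one-letter codes and the example sequence are arbitrary illustrative choices; residues outside the 20 standard amino acids are ignored here, which is an assumption rather than a rule stated in this work.

```python
from collections import Counter

# The 20 standard amino acids as one-letter codes, in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_composition(sequence, normalize=True):
    """Return the 20-component AAC vector of a peptide sequence.

    With normalize=True the components are occurrence frequencies;
    otherwise they are raw occurrence counts.
    """
    counts = Counter(sequence.upper())
    values = [counts.get(aa, 0) for aa in AMINO_ACIDS]
    total = sum(values)
    if normalize and total > 0:
        return [v / total for v in values]
    return values

# Arbitrary example sequence for illustration only.
aac = amino_acid_composition("AGAK")
print(aac)
```

Passing `normalize=False` returns occurrence numbers instead of frequencies, matching the two conventions mentioned in the text.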
Pseudo-amino acid composition
Though AAC can represent peptides in a very simple way, it ignores all information about amino acid sequence-order effects, which determine the local environment of each amino acid in the peptide. Pseudo-amino acid composition (PseAAC) was originally introduced by Kuo-Chen Chou for representing proteins to address this limitation, and has demonstrated its effectiveness in improving protein subcellular localization prediction, membrane protein type prediction and other tasks [41]. For peptide P, PseAAC can be formulated as

$$\mathrm{PseAAC}_{P} = [p_{1}, p_{2}, \ldots, p_{20}, p_{20+1}, \ldots, p_{20+\lambda}]^{T}$$

where p_{1}, p_{2}, ..., p_{20} are associated with the conventional amino acid composition of P, which is already represented by the sequence length and AAC described above, and p_{20+1}, ..., p_{20+λ} are the λ correlation factors that reflect the 1st-tier, 2nd-tier, ..., and λth-tier sequence-order correlation patterns. Therefore, only p_{20+1}, ..., p_{20+λ} in PseAAC_{P} have been adopted for representing peptides. In this work, six properties of the 20 amino acids, i.e., hydrophobicity, hydrophilicity, mass, pK1 (α-COOH), pK2 (NH3) and pI (at 25 °C), were used to calculate PseAAC_{P}, and λ was set to 2.
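One way to obtain sequence-order correlation factors is sketched below. It assumes a squared-difference coupling between residues k positions apart, averaged along the sequence, in the spirit of the original PseAAC formulation, and it computes the factors separately for one property scale. The exact coupling function, the scaling of the property values and the weighting used in this work are not specified above, so all of these, together with the toy property scale, are assumptions.

```python
def sequence_order_factors(sequence, scale, max_tier=2):
    """Tier-k correlation factors of a peptide for one property scale.

    theta_k is the mean of (scale[R_i] - scale[R_(i+k)])**2 over all residue
    pairs that are k positions apart, for k = 1 .. max_tier.
    """
    values = [scale[residue] for residue in sequence]
    factors = []
    for k in range(1, max_tier + 1):
        pairs = list(zip(values, values[k:]))
        if not pairs:
            raise ValueError("sequence is too short for tier %d" % k)
        factors.append(sum((a - b) ** 2 for a, b in pairs) / len(pairs))
    return factors

# Hypothetical property values for three residues, for illustration only.
toy_scale = {"A": 0.0, "G": 1.0, "K": 3.0}
theta = sequence_order_factors("AGKA", toy_scale, max_tier=2)
print(theta)
```

Applying such a function to six property scales with two tiers each yields 6 × 2 = 12 values per peptide, which matches the dimension of the PseAAC part of the feature vector.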
Feature normalization
As described in the sections above, four types of sequence-based features were used to represent peptides. However, these four features have different physical dimensions and different value ranges. The imbalance in the scale of the features would cause them to contribute unequally to the drift time predictor. To remove this bias, all feature values have to be normalized so that the influence of each feature is reflected as equally as possible. In this work, all values of each feature are mapped into the fixed interval [−1, 1] by

$$f_{normalized} = \frac{2\,(f - f_{min})}{f_{max} - f_{min}} - 1$$

where f is the raw value of the feature, f_{normalized} denotes the normalized value of this feature, and f_{min} and f_{max} are the minimum and maximum values of the corresponding feature category.
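The normalization can be sketched as a linear min-max mapping onto [−1, 1]. The handling of a constant feature (mapped to 0) is an assumption added so that the sketch never divides by zero; the example uses sequence lengths between 3 and 34 residues, the range reported for the datasets.

```python
def scale_to_unit_interval(values):
    """Map raw feature values linearly onto the interval [-1, 1]."""
    f_min, f_max = min(values), max(values)
    if f_max == f_min:
        # Assumption: a constant feature carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [2.0 * (f - f_min) / (f_max - f_min) - 1.0 for f in values]

# Example: sequence lengths spanning the reported range of 3 to 34 residues.
lengths = [3, 10, 34]
scaled = scale_to_unit_interval(lengths)
print(scaled)
```

In a cross-validation setting, f_{min} and f_{max} would normally be taken from the training portion and reused for the held-out peptides, although this detail is not specified above.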
Regression model construction
In our experiment, the regression predictor is built with the LSSVR model to predict drift time from peptide sequence-based features. Based on the peptide representation described above, the LSSVR models are constructed on a vector consisting of four sequence-based features, of which MW has 1 dimension, SL has 1 dimension and AAC has 20 dimensions. For PseAAC, the dimension is 12, because we employed a 2-tier sequence correlation pattern with 6 amino acid properties. As a result, each peptide is represented in the predictor by a 34-component vector. Because of the determinative effect of charge state on ion mobility, we constructed three LSSVR models, one for each of the peptide datasets DataS, DataD and DataT.
Cross-validation
To evaluate the prediction performance of each regression model, a 10-fold cross-validation strategy was adopted. Taking singly-charged peptides as an example, DataS is randomly partitioned into 10 sub-datasets, of which a single sub-dataset is retained as the validation data for testing the model, and the remaining 9 sub-datasets are used as training data. After training is finished, the LSSVR model is applied to the prediction task on the retained sub-dataset. This process is repeated until each of the ten sub-datasets has been used exactly once as the testing data. The 10 results from the folds are combined to evaluate the prediction performance.
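The partitioning scheme can be sketched as follows. The sketch assumes a simple random shuffle followed by a round-robin assignment of indices to folds; the function names and the fixed seed are illustrative choices, and the example uses 212 samples, the size of DataS.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Randomly partition the sample indices into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def train_test_splits(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 212 is the number of singly-charged peptides in DataS.
splits = list(train_test_splits(212, k=10, seed=0))
print(len(splits), [len(test) for _, test in splits])
```

Changing `seed` produces a different random partition, which is the source of the run-to-run variation examined when the procedure is repeated.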
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
BW and JZ conceived of the study; ZJ, SD and CL participated in the experiment design; BW, JZ and PC carried it out and drafted the manuscript. All authors revised the manuscript critically.
Acknowledgements
This work was funded by the National Science Foundation of China (No.61272269 and No.61133010).
Declarations
This article has been published as part of BMC Bioinformatics Volume 14 Supplement 8, 2013: Proceedings of the 2012 International Conference on Intelligent Computing (ICIC 2012). The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/14/S8.
References

Henderson SC, Valentine SJ, Counterman AE, Clemmer DE: ESI/ion trap/ion mobility/timeofflight mass spectrometry for rapid and sensitive analysis of biomolecular mixtures.
Anal Chem 1999, 71(2):291301. PubMed Abstract  Publisher Full Text

Hoaglund CS, Valentine SJ, Sporleder CR, Reilly JP, Clemmer DE: Threedimensional ion mobility/TOFMS analysis of electrosprayed biomolecules.
Anal Chem 1998, 70(11):22362242. PubMed Abstract  Publisher Full Text

Kanu AB, Wu C, Hill HH Jr: Rapid preseparation of interferences for ion mobility spectrometry.
Anal Chim Acta 2008, 610(1):125134. PubMed Abstract  Publisher Full Text

Harry EL, Weston DJ, Bristow AW, Wilson ID, Creaser CS: An approach to enhancing coverage of the urinary metabonome using liquid chromatographyion mobilitymass spectrometry.
J Chromatogr B Analyt Technol Biomed Life Sci 2008, 871(2):357361. PubMed Abstract  Publisher Full Text

Budimir N, Weston DJ, Creaser CS: Analysis of pharmaceutical formulations using atmospheric pressure ion mobility spectrometry combined with liquid chromatography and nanoelectrospray ionisation.
Analyst 2007, 132(1):3440. PubMed Abstract  Publisher Full Text

Li H, Giles K, Bendiak B, Kaplan K, Siems WF, Hill HH Jr: Resolving structural isomers of monosaccharide methyl glycosides using drift tube and traveling wave ion mobility mass spectrometry.
Anal Chem 2012, 84(7):32313239. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Mechref Y, Hu Y, Garcia A, Hussein A: Identifying cancer biomarkers by mass spectrometrybased glycomics.
Electrophoresis 33(12):17551767. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Zinnel NF, Pai PJ, Russell DH: Ion mobility-mass spectrometry (IM-MS) for top-down proteomics: increased dynamic range affords increased sequence coverage.
Anal Chem 2012, 84(7):3390-3397.

Zhong Y, Hyung SJ, Ruotolo BT: Ion mobility-mass spectrometry for structural proteomics.
Expert Rev Proteomics 2012, 9(1):47-58.

Uetrecht C, Rose RJ, van Duijn E, Lorenzen K, Heck AJ: Ion mobility mass spectrometry of proteins and protein assemblies.
Chem Soc Rev 39(5):1633-1655.

Kanu AB, Dwivedi P, Tam M, Matz L, Hill HH Jr: Ion mobility-mass spectrometry.
J Mass Spectrom 2008, 43(1):1-22.

Wang B, Valentine S, Plasencia M, Raghuraman S, Zhang X: Artificial neural networks for the prediction of peptide drift time in ion mobility mass spectrometry.
BMC Bioinformatics 2010, 11:182.

van Duijn E, Barendregt A, Synowsky S, Versluis C, Heck AJ: Chaperonin complexes monitored by ion mobility mass spectrometry.
J Am Chem Soc 2009, 131(4):1452-1459.

Thalassinos K, Grabenauer M, Slade SE, Hilton GR, Bowers MT, Scrivens JH: Characterization of phosphorylated peptides using traveling wave-based and drift cell ion mobility mass spectrometry.
Anal Chem 2009, 81(1):248-254.

Venne K, Bonneil E, Eng K, Thibault P: Improvement in peptide detection for proteomics analyses using NanoLC-MS and high-field asymmetry waveform ion mobility mass spectrometry.
Anal Chem 2005, 77(7):2176-2186.

Williams JP, Scrivens JH: Coupling desorption electrospray ionisation and neutral desorption/extractive electrospray ionisation with a travelling-wave based ion mobility mass spectrometer for the analysis of drugs.
Rapid Commun Mass Spectrom 2008, 22(2):187-196.

Verbeck GF, Ruotolo BT, Gillig KJ, Russell DH: Resolution equations for high-field ion mobility.
J Am Soc Mass Spectrom 2004, 15(9):1320-1324.

Baker ES, Tang K, Danielson WF, Prior DC, Smith RD: Simultaneous fragmentation of multiple ions using IMS drift time dependent collision energies.
J Am Soc Mass Spectrom 2008, 19(3):411-419.

Hoaglund-Hyzer CS, Counterman AE, Clemmer DE: Anhydrous protein ions.
Chem Rev 1999, 99(10):3037-3080.

Valentine SJ, Counterman AE, Clemmer DE: A database of 660 peptide ion cross sections: use of intrinsic size parameters for bona fide predictions of cross sections.
J Am Soc Mass Spectrom 1999, 10(11):1188-1211.

Liu XH, Liang J, Fan JC, Shang ZC: Prediction of Ion Drift Times for a Proteome-Wide Peptide Set Using Partial Least Squares Regression, Least-Squares Support Vector Machine and Gaussian Process.
QSAR & Combinatorial Science 2009, 28(11-12):1386-1393.

Shah AR, Agarwal K, Baker ES, Singhal M, Mayampurath AM, Ibrahim YM, Kangas LJ, Monroe ME, Zhao R, Belov ME, et al.: Machine learning based prediction for peptide drift times in ion mobility spectrometry.
Bioinformatics 26(13):1601-1607.

Zhang Y, Jin Q, Wang S, Ren R: Modeling and prediction of peptide drift times in ion mobility spectrometry using sequence-based and structure-based approaches.
Comput Biol Med 41(5):272-277.

Wang B, Valentine S, Raghuraman S, Plasencia M, Zhang X: Prediction of peptide drift time in ion mobility-mass spectrometry.

Wang B, Valentine S, Plasencia M, Zhang XA: Prediction of Drift Time in Ion Mobility-Mass Spectrometry Based on Peptide Molecular Weight.
Protein and Peptide Letters 2010, 17(9):1143-1147.

Wang B, Valentine S, Plasencia M, Zhang X: Prediction of drift time in ion mobility-mass spectrometry based on Peptide molecular weight.
Protein Pept Lett 2010, 17(9):1143-1147.

Vapnik VN: The nature of statistical learning theory. New York: Springer; 1995.

Wang B, Chen P, Huang DS, Li JJ, Lok TM, Lyu MR: Predicting protein interaction sites from residue spatial sequence profile and evolution rate.
FEBS Letters 2006, 580(2):380-384.

Wang B, Wong HS, Huang DS: Inferring protein-protein interacting sites using residue conservation and evolutionary information.
Protein and Peptide Letters 2006, 13(10):999-1005.

Chen P, Wang B, Wong HS, Huang DS: Prediction of protein B-factors using multiclass bounded SVM.
Protein Pept Lett 2007, 14(2):185-190.

Huang DS, Zheng CH: Independent component analysis-based penalized discriminant method for tumor classification using gene expression data.
Bioinformatics 2006, 22(15):1855-1862.

Zheng CH, Zhang L, Ng VT, Shiu SC, Huang DS: Molecular pattern discovery based on penalized matrix decomposition.
IEEE/ACM Trans Comput Biol Bioinform 8(6):1592-1603.

Chen P, Liu C, Burge L, Li J, Mohammad M, Southerland W, Gloster C, Wang B: DomSVR: domain boundary prediction with support vector regression from sequence information alone.
Amino Acids 39(3):713-726.

Suykens JAK: Least squares support vector machines. River Edge, NJ: World Scientific; 2002.

Khemchandani R, Chandra S: Regularized least squares fuzzy support vector regression for financial time series forecasting.
Expert Systems with Applications 2009, 36(1):132-138.

Kavaklioglu K: Modeling and prediction of Turkey's electricity consumption using Support Vector Regression.
Applied Energy 2011, 88(1):368-375.

Hong WC: Chaotic particle swarm optimization algorithm in a support vector regression electric load forecasting model.
Energy Conversion and Management 2009, 50(1):105-117.

Fernandez M, Miranda-Saavedra D: Genome-wide enhancer prediction from epigenetic signatures using genetic algorithm-optimized support vector machines.
Nucleic Acids Res 40(10):e77.

Zhang QL, Shan GL, Duan XS, Zhang ZN: Parameters Optimization of Support Vector Machine based on Simulated Annealing and Genetic Algorithm.
2009 IEEE International Conference on Robotics and Biomimetics (ROBIO 2009), Vols 1-4 2009, 1302-1306.

Karasek FW, Hill HH Jr, Kim SH: Plasma chromatography of heroin and cocaine with mass-identified mobility spectra.
J Chromatogr 1976, 117(2):327-336.

Tuovinen K, Paakkanen H, Hänninen O: Detection of pesticides from liquid matrices by ion mobility spectrometry.
Analytica Chimica Acta 2000, 404(1):7.