Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China

Acupuncture and Moxibustion College, Chengdu University of Traditional Chinese Medicine, Chengdu 610075, China

National Center for Biomedical Analysis, Beijing 100850, China

National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences, Beijing 100190, China

Abstract

Background

Acupuncture has been practiced in China for thousands of years as part of Traditional Chinese Medicine (TCM) and has gradually been accepted in western countries as an alternative or complementary treatment. However, the underlying mechanism of acupuncture, especially whether there is any difference between various acupoints, remains largely unknown, which hinders its widespread use.

Results

In this study, we develop a novel Linear Programming based Feature Selection method (LPFS) to understand the mechanism of the acupuncture effect at the molecular level by revealing the metabolite biomarkers for acupuncture treatment. Specifically, we generate and investigate high-throughput metabolic profiles of acupuncture treatment at several acupoints in humans. To select the subsets of metabolites that best characterize the acupuncture effect for each meridian point, an optimization model is proposed to identify biomarkers from high-dimensional metabolic data from case and control samples. Importantly, we use the nearest centroid as the prototype to simultaneously minimize the number of selected features and the leave-one-out cross validation error of the classifier. We compare the performance of LPFS with several state-of-the-art methods, such as SVM recursive feature elimination (SVM-RFE) and the sparse multinomial logistic regression approach (SMLR). We find that our LPFS method tends to reveal a small set of metabolites with small standard deviations and large shifts, which is exactly what we require of a good biomarker. Biologically, several metabolite biomarkers for acupuncture treatment are revealed and serve as candidates for further mechanism investigation. In addition, biomarkers derived from five meridian points, Zusanli (ST36), Liangmen (ST21), Juliao (ST3), Yanglingquan (GB34), and Weizhong (BL40), are compared for their similarities and differences, which provides evidence for the specificity of acupoints.

Conclusions

Our results demonstrate that metabolic profiling might be a promising method to investigate the molecular mechanism of acupuncture. Compared with other existing methods, LPFS shows better performance in selecting a small set of key molecules. In addition, LPFS is a general methodology and can be applied to other high-dimensional data analysis, for example in cancer genomics.

Background

Acupuncture, an important therapeutic method in Traditional Chinese Medicine (TCM), has been used to treat various diseases for thousands of years in China. Recently it has gradually been accepted in western countries as an alternative or complementary treatment. However, how acupuncture works remains an open question, even though acupuncture is one of the oldest continuous systems of medicine, dating back 4,000 years. Extensive studies have been conducted on the mechanism of acupuncture to explain its effects on various systems and symptoms.

In this paper, we use a systems biology method to study the acupuncture treatment effect by identifying a subset of important molecules from high-throughput metabolic data. Specifically, we separate acupuncture from moxibustion and study only the effect of acupuncture on healthy people by investigating the difference between acupuncture at a particular acupoint and no acupuncture. Towards this aim, we utilize ^{1}H nuclear magnetic resonance (^{1}H NMR) to investigate the effects of acupuncture at several meridian points on plasma metabolites. Metabolite profiles (vectors) are then generated from a collection of case samples (with acupuncture at a meridian point) and control samples (without acupuncture). These high-dimensional profile data are very similar to SNP (sequence), gene expression (transcriptome), mass spectrum (proteome), and small molecule (metabolome) data at different levels. The straightforward task is then to identify differentially expressed molecules and further to classify and predict the diagnostic category of a sample based on its metabolite profile.

Generally speaking, there are two difficulties in analyzing these high-dimensional profile data. First, a large number of features (metabolites in our case) are available to predict classes for a relatively small number of samples. The presence of a significant number of irrelevant features that are unrelated to the case status makes such analysis prone to the curse of dimensionality. Second, predictive accuracy is not the only goal: further biological validation and mechanism understanding call for explanatory power rather than black-box predictive results. Thus it is especially important to know which molecules contribute most to the classification. Ideally, we can improve the generalization performance of the classifier by identifying only the molecules that contribute significantly to it, because doing so overcomes the curse of dimensionality. For example, if it is possible to identify a small set of metabolites that is indeed capable of providing complete discriminatory information, inexpensive diagnostic assays for only a few metabolites might be developed and widely deployed in clinical settings. Knowledge of a small set of diagnostically relevant metabolites may also provide important insights into the mechanisms responsible for the acupuncture treatment itself. Such molecules are usually termed biomarkers. The procedure to reveal them is referred to as feature selection, biomarker identification, or feature ranking.

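Feature selection is known to be NP-hard. For example, an exhaustive search over all subsets of $m$ features has to examine $O(2^{m})$ candidate subsets. Thus, this method is not practical for realistic applications.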

Existing feature selection strategies can be roughly categorized into three types: filter, wrapper, and embedded methods.

In this paper, we propose a novel linear programming (LP) model to address this important problem. The feature selection problem is cast as an optimization problem with two objectives: one is to minimize the number of chosen features and the other is to maximize the predictive accuracy based on the centroid classification framework. In other words, our feature selection method simultaneously improves classification accuracy and selects features. Compared with several state-of-the-art feature selection methods, our Linear Programming based Feature Selection (LPFS) method can select a small set of features by applying strong regularization while keeping high accuracy. We then apply our method to analyze the metabolite profile data generated for acupuncture treatment. We identify important molecules (biomarkers) related to the acupuncture treatment for several meridian points. Further characterization of the biomarkers, and of their commonalities and differences among several meridian points, provides biological insights into acupuncture mechanisms at the molecular level. Preliminary results in this paper were presented in our conference paper.

Method

Analytic workflow

In this paper, the acupuncture treatment effect is investigated in the framework of systems biology. The basic analytic workflow is shown in the figure below. Metabolite profiles are first generated by ^{1}H NMR from control samples and from samples with acupuncture treatment at meridian points. Then we develop a linear programming based feature selection method to compare the two groups of metabolite profiles. Finally, a small set of metabolites is selected as biomarkers for the acupuncture treatment effect.


**Our analytic workflow to identify biomarkers for acupuncture treatment effect**. Metabolite profiles are originally generated by ^{1}H NMR from control samples and the samples with acupuncture treatment. Then we develop a linear programming based feature selection method to compare the two groups of metabolite profiles. Finally a small set of metabolites are selected as biomarkers for acupuncture effect.

Overview of the linear programming based feature selection

To investigate the high-dimensional data for the acupuncture treatment effect, we develop a novel method, LPFS, to select a small set of metabolites that characterize the acupuncture treatment effect. The schematic illustration of LPFS is shown in the figure below.


**The schematic illustration of LPFS**. Under the nearest centroid framework, LPFS seeks to minimize both the classification error and the number of selected features, which leads to a multi-objective program. To ensure computational efficiency, a linear program is solved to identify biomarkers.

Centroid classification prototype

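A fast and simple algorithm for classification is the centroid method. A prototype pattern for class $C_j$ is defined as the arithmetic mean:

$$\boldsymbol{\mu}_j = \frac{1}{|C_j|} \sum_{\mathbf{x}_i \in C_j} \mathbf{x}_i$$

where $\mathbf{x}_i$ is the $i$-th training sample labeled as class $C_j$. Recall that each training sample is a metabolite spectrum represented as a multi-dimensional vector (denoted in bold). In a similar fashion, we can obtain a prototypical vector for all the other classes. During classification, the class label of an unknown sample $\mathbf{s}$ is determined as:

$$C(\mathbf{s}) = \arg\min_{C_j} d(\boldsymbol{\mu}_j, \mathbf{s})$$

where $d(\mathbf{x}, \mathbf{y})$ is a distance function, or alternatively as:

$$C(\mathbf{s}) = \arg\max_{C_j} \operatorname{sim}(\boldsymbol{\mu}_j, \mathbf{s})$$

where $\operatorname{sim}(\mathbf{x}, \mathbf{y})$ is a similarity function. Here we use the $L_1$ distance

$$d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|_1$$

with $\|\mathbf{x} - \mathbf{y}\|_1 = \sum_{i} |x_i - y_i|$.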
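The centroid rule described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names and the two-feature toy values are hypothetical and are not taken from the study.

```python
def centroid(samples):
    """Component-wise arithmetic mean of equal-length vectors (the class prototype)."""
    n = len(samples[0])
    return [sum(s[i] for s in samples) / len(samples) for i in range(n)]


def l1_distance(x, y):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def nearest_centroid(sample, centroids):
    """Return the label whose centroid has the smallest L1 distance to `sample`."""
    return min(centroids, key=lambda label: l1_distance(sample, centroids[label]))


# Hypothetical two-feature toy data, not taken from the study.
case = [[1.0, 5.0], [1.2, 4.8], [0.8, 5.2]]
control = [[3.0, 5.1], [3.2, 4.9], [2.8, 5.0]]
centroids = {"case": centroid(case), "control": centroid(control)}
label = nearest_centroid([1.1, 5.0], centroids)
```

In this toy setting only the first feature separates the groups, which is the situation feature selection is meant to exploit.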

Multi-objective optimization model

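Suppose we have two groups in the training dataset, the case group and the control group, which serve as the gold-standard data to classify new samples. We denote them as sets $T$ and $F$ with $|T| = m_1$ and $|F| = m_2$, and the computed centroids are $\boldsymbol{\mu}_T$ and $\boldsymbol{\mu}_F$ respectively. A simple classification scheme is as follows. Given a new sample $\mathbf{s}$, we want to decide which group it belongs to. The sample is assigned to the case group if $d(\mathbf{s}, \boldsymbol{\mu}_T) < d(\mathbf{s}, \boldsymbol{\mu}_F)$ and to the control group otherwise.

Let the feature number be $n$. We introduce a binary vector $\mathbf{x} = (x_1, x_2, \cdots, x_n)$, where $x_i = 1$ if feature $i$ is selected and $x_i = 0$ otherwise, so that only the selected features contribute to the distance.

Suppose the test dataset is $S$, composed of the case group $S_T$ and the control group $S_F$, with $S = S_T \cup S_F$, $|S_T| = p_1$, and $|S_F| = p_2$. Given a case sample $\mathbf{s}_l = (s_{l1}, s_{l2}, \cdots, s_{ln})$, $l \in \{1, 2, \cdots, p_1\}$, if it is classified correctly, we should have

$$\sum_{i=1}^{n} x_i \left| s_{li} - \mu_{Ti} \right| < \sum_{i=1}^{n} x_i \left| s_{li} - \mu_{Fi} \right|, \qquad \mu_{Ti} = \frac{1}{m_1} \sum_{k=1}^{m_1} t_{ki}, \quad \mu_{Fi} = \frac{1}{m_2} \sum_{k=1}^{m_2} f_{ki}$$

where $\mathbf{t}_k = (t_{k1}, t_{k2}, \cdots, t_{kn}) \in T$, $k = 1, \cdots, m_1$, and $\mathbf{f}_k = (f_{k1}, f_{k2}, \cdots, f_{kn}) \in F$, $k = 1, \cdots, m_2$.

Similarly, given a control sample $\mathbf{s}_l = (s_{l1}, s_{l2}, \cdots, s_{ln})$, $l \in \{p_1 + 1, p_1 + 2, \cdots, p_1 + p_2\}$, if it is classified correctly, we should have

$$\sum_{i=1}^{n} x_i \left| s_{li} - \mu_{Fi} \right| < \sum_{i=1}^{n} x_i \left| s_{li} - \mu_{Ti} \right|$$

With the above constraints for the variable $\mathbf{x} = (x_1, x_2, \cdots, x_n)$, we minimize the number of selected features:

$$\begin{aligned} \min\ & \sum_{i=1}^{n} x_i \\ \text{s.t.}\ & \sum_{i=1}^{n} x_i \left| s_{li} - \mu_{Ti} \right| < \sum_{i=1}^{n} x_i \left| s_{li} - \mu_{Fi} \right|, \quad l = 1, \cdots, p_1 \\ & \sum_{i=1}^{n} x_i \left| s_{li} - \mu_{Fi} \right| < \sum_{i=1}^{n} x_i \left| s_{li} - \mu_{Ti} \right|, \quad l = p_1 + 1, \cdots, p_1 + p_2 \\ & x_i \in \{0, 1\}, \quad i = 1, \cdots, n \end{aligned} \tag{10}$$

Thus the feature selection problem is formulated as an integer linear programming problem in Equation (10).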
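To make the integer program concrete, the sketch below solves a tiny instance by exhaustive search rather than with an integer programming solver. It assumes the classification rule is the nearest-centroid rule with the $L_1$ distance restricted to the selected features, and it uses the training samples themselves as test samples. The function names and toy values are hypothetical.

```python
from itertools import combinations


def centroid(samples):
    """Component-wise arithmetic mean of equal-length vectors."""
    n = len(samples[0])
    return [sum(s[i] for s in samples) / len(samples) for i in range(n)]


def masked_l1(sample, center, selected):
    """L1 distance restricted to the selected feature indices."""
    return sum(abs(sample[i] - center[i]) for i in selected)


def smallest_separating_subset(train_case, train_ctrl, test_case, test_ctrl):
    """Exhaustively find a smallest feature subset that classifies every test sample correctly."""
    mu_t, mu_f = centroid(train_case), centroid(train_ctrl)
    n = len(mu_t)
    for size in range(1, n + 1):                      # fewest features first
        for subset in combinations(range(n), size):
            ok_case = all(masked_l1(s, mu_t, subset) < masked_l1(s, mu_f, subset)
                          for s in test_case)
            ok_ctrl = all(masked_l1(s, mu_f, subset) < masked_l1(s, mu_t, subset)
                          for s in test_ctrl)
            if ok_case and ok_ctrl:
                return subset
    return None                                       # no subset separates the groups


# Hypothetical toy data: only the feature with index 1 is informative.
case = [[5.0, 1.0, 2.0], [4.0, 1.2, 9.0], [6.0, 0.8, 5.0]]
ctrl = [[5.5, 3.0, 3.0], [4.5, 3.2, 8.0], [5.0, 2.8, 6.0]]
best = smallest_separating_subset(case, ctrl, case, ctrl)
```

The search visits up to $2^n$ subsets, so it is only usable for a handful of features; this is the cost that motivates a mathematical programming formulation for 400 metabolites.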

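When we consider the noise in the measured data, not all the test samples can be classified exactly. We introduce the tolerable error $\mathbf{y} = \{y_1, y_2, \cdots, y_{p_1 + p_2}\}$ with $y_l \geq 0$, one entry for each test sample, where $p_1$ and $p_2$ are the numbers of case and control test samples.

Given a case sample $\mathbf{s}_l = (s_{l1}, s_{l2}, \cdots, s_{ln})$, $l \in \{1, 2, \cdots, p_1\}$, we should have the following constraint considering the tolerable error

$$\sum_{i=1}^{n} x_i \left| s_{li} - \mu_{Ti} \right| < \sum_{i=1}^{n} x_i \left| s_{li} - \mu_{Fi} \right| + y_l$$

Similarly, given a control sample $\mathbf{s}_l = (s_{l1}, s_{l2}, \cdots, s_{ln})$, $l \in \{p_1 + 1, p_1 + 2, \cdots, p_1 + p_2\}$, we should have the following constraint considering the tolerable error

$$\sum_{i=1}^{n} x_i \left| s_{li} - \mu_{Fi} \right| < \sum_{i=1}^{n} x_i \left| s_{li} - \mu_{Ti} \right| + y_l$$

Here $\mu_{Ti}$ and $\mu_{Fi}$ denote the $i$-th components of the case and control centroids, and $x_i \in \{0, 1\}$ indicates whether feature $i$ is selected.

Thus the objective function is composed of two parts, i.e., we want to choose as few features $\mathbf{x} = (x_1, x_2, \cdots, x_n)$ as possible while keeping the total tolerable error as small as possible:

$$\begin{aligned} \min\ & \sum_{i=1}^{n} x_i, \qquad \min\ \sum_{j=1}^{p_1 + p_2} y_j \\ \text{s.t.}\ & \sum_{i=1}^{n} x_i \left| s_{li} - \mu_{Ti} \right| < \sum_{i=1}^{n} x_i \left| s_{li} - \mu_{Fi} \right| + y_l, \quad l = 1, \cdots, p_1 \\ & \sum_{i=1}^{n} x_i \left| s_{li} - \mu_{Fi} \right| < \sum_{i=1}^{n} x_i \left| s_{li} - \mu_{Ti} \right| + y_l, \quad l = p_1 + 1, \cdots, p_1 + p_2 \\ & x_i \in \{0, 1\}, \quad y_l \geq 0 \end{aligned} \tag{13}$$

The first term of the objective function in Equation (13) minimizes the number of chosen features, and the second one minimizes the total classification error.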

Mixed integer linear programming

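The optimal solutions of the two-objective optimization problem form a Pareto set, which can be explored by transforming the two objectives of (13) into a single objective. One typical technique is the weighted sum method, which combines the two objectives with a positive parameter $\lambda$:

$$\min\ \sum_{i=1}^{n} x_i + \lambda \sum_{j=1}^{p_1 + p_2} y_j \quad \text{subject to the constraints of (13)} \tag{14}$$

Model (14) is a mixed integer linear program (MILP). Its objective function is a weighted sum in which $\lambda$ controls the trade-off between the number of selected features and the total classification error.

By solving the proposed mixed integer linear programming model (14), we obtain solutions for the feature selection variables $x_i$, $i \in \{1, \cdots, n\}$, and the error variables $y_j$, $j \in \{1, \cdots, p_1 + p_2\}$. Checking whether $x_i$ is equal to 1 tells us whether the corresponding feature should be selected in the classifier. Meanwhile, checking the values of all the $y_j$ lets us estimate the classification accuracy. For example, suppose the number of all $y_j = 0$ is $N_1$ and the number of all $y_j > 0$ is $N_2$. We can simply calculate the classification accuracy and the error rate as $N_1/(N_1 + N_2)$ and $N_2/(N_1 + N_2)$, respectively.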
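As a small worked example of the accuracy estimate, the sketch below counts the zero and positive error variables in a solution vector. The numerical tolerance `tol` and the sample values are assumptions added for illustration, since solvers rarely return exact zeros.

```python
def accuracy_from_errors(y, tol=1e-9):
    """Estimate accuracy and error rate from the error variables y_j of a solved model."""
    n_correct = sum(1 for v in y if v <= tol)   # samples with y_j = 0 (within tolerance)
    n_wrong = len(y) - n_correct                # samples with y_j > 0
    total = n_correct + n_wrong
    return n_correct / total, n_wrong / total


# Hypothetical error variables for ten test samples.
y = [0.0, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0, 1.2, 0.0, 0.0]
accuracy, error_rate = accuracy_from_errors(y)
```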

Leave-one-out cross validation

The above model (14) is based on the general idea of cross validation, thus it depends on the choice of the training and test datasets. If the training data themselves are used as the test data, the model minimizes the resubstitution error.

The resubstitution error rate indicates only how good our biomarkers are on the training data. Such a model has an "information leak" and will underestimate the classification error. In our implementation, we choose the model for leave-one-out cross validation in Equation (16).

We adopt the leave-one-out experiment since this particular form of cross validation is a nearly unbiased estimator of the generalization performance of a classifier. It makes the best use of the available data and involves no random subsampling. Each time, we pick out one sample ($p_1 = 1$ or $p_2 = 1$) from the training data, compute the centroids from the remaining samples, and try to classify the held-out sample correctly. Repeating this test $m_1 + m_2$ times adds $m_1 + m_2$ constraints.
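The leave-one-out procedure can be sketched as follows for a fixed set of selected features. The sketch assumes the nearest-centroid rule with an $L_1$ distance restricted to those features; the function names and toy values are hypothetical.

```python
def centroid(samples):
    """Component-wise arithmetic mean of equal-length vectors."""
    n = len(samples[0])
    return [sum(s[i] for s in samples) / len(samples) for i in range(n)]


def masked_l1(sample, center, selected):
    """L1 distance restricted to the selected feature indices."""
    return sum(abs(sample[i] - center[i]) for i in selected)


def loocv_accuracy(case, ctrl, selected):
    """Leave-one-out accuracy of the nearest-centroid rule on the selected features."""
    correct = 0
    for group, other in ((case, ctrl), (ctrl, case)):
        mu_other = centroid(other)
        for l, held_out in enumerate(group):
            rest = group[:l] + group[l + 1:]          # training data without the held-out sample
            mu_own = centroid(rest)
            if masked_l1(held_out, mu_own, selected) < masked_l1(held_out, mu_other, selected):
                correct += 1
    return correct / (len(case) + len(ctrl))


# Hypothetical toy data: only the feature with index 1 is informative.
case = [[5.0, 1.0, 2.0], [4.0, 1.2, 9.0], [6.0, 0.8, 5.0]]
ctrl = [[5.5, 3.0, 3.0], [4.5, 3.2, 8.0], [5.0, 2.8, 6.0]]
acc_informative = loocv_accuracy(case, ctrl, selected=(1,))
acc_noise = loocv_accuracy(case, ctrl, selected=(0,))
```

Because the held-out sample never contributes to its own centroid, the estimate avoids the information leak of the resubstitution error.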

Linear programming approximation

In general, mixed integer linear programming is difficult to solve. To ensure computational efficiency, the mixed ILP in Equation (16) can be relaxed to the corresponding linear program (LP) by replacing the integrality requirement $x_i \in \{0, 1\}$ with $0 \leq x_i \leq 1$. Linear programming is the simplest type of mathematical programming and has been widely used in systems biology studies.

After relaxing to continuous values, the value of the optimal solution $x_i$ (the LPFS score) indicates the importance of feature $i$. We use the $L_1$ distance in our model to achieve a nonlinear classification effect. The parameter λ can be determined by checking the output leave-one-out predictive accuracy. We also note that the LPFS model can be extended to multi-class classification tasks and to n-fold cross validation.
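One way to assemble the relaxed leave-one-out model as plain arrays is sketched below. Several choices here are assumptions rather than details from the paper: the small positive `margin` that stands in for the strict inequality, the function name `build_lpfs_lp`, and the array layout, which follows the `c`, `A_ub`, `b_ub`, `bounds` convention accepted by general LP solvers such as `scipy.optimize.linprog`. The variable vector is $(x_1, \cdots, x_n, y_1, \cdots, y_m)$ with $m = m_1 + m_2$.

```python
def centroid(samples):
    """Component-wise arithmetic mean of equal-length vectors."""
    n = len(samples[0])
    return [sum(s[i] for s in samples) / len(samples) for i in range(n)]


def build_lpfs_lp(case, ctrl, lam=1.0, margin=1e-3):
    """Build min c.z subject to A_ub.z <= b_ub for the relaxed leave-one-out model.

    Each held-out sample l contributes one row:
        sum_i x_i * (|s_li - mu_own_i| - |s_li - mu_other_i|) - y_l <= -margin
    """
    n = len(case[0])
    m = len(case) + len(ctrl)
    c = [1.0] * n + [lam] * m                  # feature count plus weighted total error
    A_ub, b_ub = [], []
    row_id = 0
    for group, other in ((case, ctrl), (ctrl, case)):
        mu_other = centroid(other)
        for l, s in enumerate(group):
            mu_own = centroid(group[:l] + group[l + 1:])   # centroid without sample l
            coeffs = [abs(s[i] - mu_own[i]) - abs(s[i] - mu_other[i]) for i in range(n)]
            slack = [0.0] * m
            slack[row_id] = -1.0               # -y_l on the left-hand side
            A_ub.append(coeffs + slack)
            b_ub.append(-margin)
            row_id += 1
    bounds = [(0.0, 1.0)] * n + [(0.0, None)] * m   # relaxed x in [0, 1], y >= 0
    return c, A_ub, b_ub, bounds


# Hypothetical toy data: three features, three samples per group.
case = [[5.0, 1.0, 2.0], [4.0, 1.2, 9.0], [6.0, 0.8, 5.0]]
ctrl = [[5.5, 3.0, 3.0], [4.5, 3.2, 8.0], [5.0, 2.8, 6.0]]
c, A_ub, b_ub, bounds = build_lpfs_lp(case, ctrl, lam=2.0)
```

A negative coefficient in a row means that selecting the feature pulls the held-out sample toward its own group, so features with consistently negative coefficients are the ones an LP solver would tend to give large $x_i$ values.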

Metabonomics profiling by ^{1}H NMR spectra

Venous blood (3 ml) was collected into a heparin sodium tube, and the plasma was collected by centrifugation at 1000 × g at 4°C for 10 minutes. An aliquot of 300 μl of plasma was mixed with D_{2}O and 50 μl of TSP (3-trimethylsilyl-^{2}H_{4}-propionic acid) in D_{2}O (1 mg/ml) in a 5 mm NMR tube. The D_{2}O provided a field-frequency lock solvent for the NMR spectrometer and the TSP served as an internal reference of chemical shift. ^{1}H NMR spectra of the plasma samples were acquired on a Varian INOVA 600 MHz NMR spectrometer at 27°C by using the Carr-Purcell-Meiboom-Gill (CPMG) spin-echo pulse sequence with a total spin-spin relaxation delay (2nτ) and, for the diffusion-edited (BPP-LED) spectra, a diffusion delay of 100 ms. A total of 128 transients and 16k data points were collected with a spectral width of 8000 Hz. A line-broadening factor of 1 Hz was applied to the FIDs before Fourier transformation.

All plasma ^{1}H NMR spectra were manually phased and baseline corrected using VNMR 6.1C software (Varian, Inc.). For CPMG spectra, each spectrum over the range of 0.4 to 4.4 ppm was data-reduced into integrated regions of equal width (0.01 ppm). For BPP-LED data, each spectrum over the range of 0.1 to 6.0 ppm was segmented into regions of equal width (0.01 ppm). The regions containing the resonance from residual water (4.6-5.1 ppm) were excluded. The integral values of each spectrum were normalized to a constant sum of all integrals in the spectrum in order to reduce any significant concentration differences between samples.
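The data reduction described above (equal-width binning, exclusion of the water region, and constant-sum normalization) can be sketched as follows. This is not the VNMR procedure itself: the function names, the choice of summing intensities inside each bin, and the toy spectrum are assumptions for illustration.

```python
import math


def bin_spectrum(ppm, intensity, lo, hi, width=0.01, exclude=None):
    """Sum intensities into equal-width chemical-shift bins covering [lo, hi)."""
    n_bins = round((hi - lo) / width)
    bins = [0.0] * n_bins
    for p, v in zip(ppm, intensity):
        if exclude is not None and exclude[0] <= p <= exclude[1]:
            continue                                   # e.g. the residual water region
        k = math.floor((p - lo) / width)
        if 0 <= k < n_bins:                            # drop points outside the range
            bins[k] += v
    return bins


def normalize_constant_sum(bins, total=1.0):
    """Scale the bin integrals so that they sum to `total`."""
    s = sum(bins)
    return [v * total / s for v in bins]


# Hypothetical spectrum points (chemical shift in ppm, intensity).
ppm = [0.405, 0.415, 0.416, 4.395, 4.8]
intensity = [1.0, 2.0, 3.0, 4.0, 100.0]

cpmg_bins = bin_spectrum(ppm, intensity, lo=0.4, hi=4.4)
cpmg_profile = normalize_constant_sum(cpmg_bins)
led_bins = bin_spectrum(ppm, intensity, lo=0.1, hi=6.0, exclude=(4.6, 5.1))
```

With the CPMG range of 0.4 to 4.4 ppm and a bin width of 0.01 ppm, this yields a 400-dimensional profile per sample.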

Results

Metabonomics data generation

To investigate the acupuncture treatment effects, we obtained metabonomics data of plasma metabolites in healthy males at five meridian points using proton NMR. Proton NMR (also known as hydrogen-1 NMR, or ^{1}H NMR) is the application of nuclear magnetic resonance spectroscopy to the hydrogen-1 nuclei within the molecules of a substance in order to determine the structure of those molecules.

As a result, most organic compounds are characterized by chemical shift values, which are usually expressed in parts per million (ppm) by frequency and lie in the range +14 to -4 ppm. Chemical shift values are not exact and are typically treated as approximate guides. The exact value of a chemical shift depends on the molecular structure and on the solvent in which the spectrum is recorded. These chemical shift values can be mapped to eight metabolic subsets (amino acids, carbohydrates, energy, glycans, lipids, nucleotides, secondary metabolites/xenobiotics, and vitamins and cofactors). In our experiment, the intensities at 400 chemical shift values are measured in plasma, and mathematically every sample is represented by a vector in a 400-dimensional space.
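The feature IDs and ppm values reported together in the result tables (for example ID 86 at 3.55 ppm and ID 308 at 1.33 ppm in the t-test, fold change, SVM-RFE, and LPFS columns) are consistent with numbering the 400 bins from high to low chemical shift in steps of 0.01 ppm. The sketch below encodes that observation. The formula is inferred from those pairs and is not stated by the authors, so `top` and the function name are assumptions.

```python
def bin_id_to_ppm(bin_id, top=4.41, width=0.01):
    """Map a 1-based feature ID to its chemical shift, assuming descending 0.01 ppm bins."""
    return round(top - width * bin_id, 2)


# ID/ppm pairs taken from the LPFS results: 86 -> 3.55, 87 -> 3.54, 92 -> 3.49, 308 -> 1.33.
shifts = {bin_id: bin_id_to_ppm(bin_id) for bin_id in (86, 87, 92, 308)}
```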

Fifty healthy young males were randomly allocated to the Zusanli (ST36), Liangmen (ST21), Juliao (ST3), Yanglingquan (GB34), and Weizhong (BL40) groups (the locations of the meridian points are shown in the accompanying figure). Plasma samples were analyzed by ^{1}H NMR to derive metabolic profiles (see details in the Method section). Furthermore, to exclude possible noise, all seventy males were strictly trained to make sure that their metabolic profiles were measured under very similar conditions. The detailed experimental method is reported elsewhere.


**Overall design of the biomarker identification experiments**. a) Metabolite profiles are originally generated by ^{1}H NMR from five acupuncture points (Zusanli (ST36), Liangmen (ST21), Juliao (ST3), Yanglingquan (GB34), and Weizhong (BL40)). b) The metabolite profiles are grouped into 7 sets and the biomarker identification problem is designed as 7 binary classification experiments.

Classification experiments design

With these data, we design experiments to identify biomarkers for the acupuncture treatment of each meridian point. The overall design of the biomarker identification experiments is shown in panel b) of the figure above.

Global characterization of the data

We first perform hierarchical clustering on the 80 metabolic profiles. The results are shown in panel a) of the figure below.


**Global characterization of the metabolite profiles**. a) Hierarchical clustering of the metabolic profiles of the 80 samples. b) Centroids for the seven datasets. The horizontal units are expression values for the metabolites. The metabolites are sorted by their chemical shift values.

Furthermore, we calculate the centroids for the seven groups of samples (panel b of the figure above).

Together, the above results demonstrate that the global pattern in the metabolic profiles cannot discriminate the Zusanli, Yanglingquan, Liangmen, Juliao, Weizhong, Pre1, Pre2, and Control I groups. Thus it is necessary to find local patterns in the profile data. Our strategy is to find a subset of metabolites as biomarkers to achieve clear discrimination.

Comparison with other approaches

Before we conduct the acupuncture biomarker identification, we benchmark our LPFS method against several existing state-of-the-art methods. There are many existing feature selection methods, and they can be roughly categorized into three types: filter, wrapper, and embedded methods. To keep the comparison simple and comprehensive, we pick representative methods of each type and compare them on the same dataset.

Filter methods select features as a preprocessing step, and the feature selection part is independent of the machine learning algorithm (classifier). This is computationally efficient. Fold change and the t-test are the simplest and most popular methods to identify biomarkers, and they are usually taken as the representative filter methods.

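Let $a_{ij}$ and $b_{ij}$ denote the log expression values of metabolite $i$ in the $j$-th case sample and the $j$-th control sample, respectively. The t-score of metabolite $i$ is

$$t_i = \frac{\bar{a}_i - \bar{b}_i}{s_i}$$

where $\bar{a}_i$, $\bar{b}_i$, and $s_i$ are the mean of case, the mean of control, and the standard deviation of the samples for metabolite $i$. From $t_i$ we can easily calculate a p-value. Usually a feature is selected if its corresponding p-value is smaller than a predefined threshold such as 0.05.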
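A per-metabolite t-score can be computed as in the sketch below. The exact variant used in the paper is not recoverable from the text, so Welch's unequal-variance form is used here as an assumption; the function name and the toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(case_values, control_values):
    """Welch's t-statistic for one metabolite (difference of means over its standard error)."""
    standard_error = sqrt(variance(case_values) / len(case_values)
                          + variance(control_values) / len(control_values))
    return (mean(case_values) - mean(control_values)) / standard_error


# Hypothetical log expression values of one metabolite in five case and five control samples.
case_values = [2.0, 2.2, 1.8, 2.1, 1.9]
control_values = [1.0, 1.1, 0.9, 1.2, 0.8]
t_score = welch_t(case_values, control_values)
```

Ranking all metabolites by the absolute value of this score gives the kind of top-ten list used in the comparison.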

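The standard definition of the fold change of metabolite $i$ is the ratio of its mean raw expression value in the case samples to that in the control samples:

$$FC_i = \frac{\bar{a}'_i}{\bar{b}'_i}$$

where $\bar{a}'_i$ and $\bar{b}'_i$ are the means of $a'_{ij}$ and $b'_{ij}$, the raw expression values of metabolite $i$ in the $j$-th case sample and the $j$-th control sample.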
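The fold change filter can be sketched as follows, assuming the ratio-of-means definition and a symmetric cutoff, so that a metabolite passes if it changes at least twofold in either direction. The function names and values are hypothetical.

```python
from statistics import mean


def fold_change(case_raw, control_raw):
    """Ratio of the mean raw expression in case samples to that in control samples."""
    return mean(case_raw) / mean(control_raw)


def passes_cutoff(fc, cutoff=2.0):
    """True if the metabolite changes by at least `cutoff`-fold up or down."""
    return fc >= cutoff or fc <= 1.0 / cutoff


# Hypothetical raw expression values of one metabolite.
fc = fold_change([4.0, 6.0], [2.0, 3.0])
selected = passes_cutoff(fc)
```

Unlike the t-score, this statistic ignores the spread of the samples, so a metabolite with a large but noisy shift can still pass the cutoff.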

On the other hand, wrapper methods rank features based on their effects on classification accuracy. They take the dependency of the feature subset on the learning algorithm into account and are computationally more demanding. Support Vector Machine-Recursive Feature Elimination (SVM-RFE) is one of the most successful wrapper algorithms for feature selection.

Since our LPFS method is an embedded method that simultaneously optimizes classification accuracy and the number of selected features, we specifically choose to compare it with an existing method that follows a similar strategy, the sparse multinomial logistic regression approach (SMLR). SMLR was developed to jointly identify the optimal nonlinear classifier and select the optimal set of features via the optimization of a single posterior objective function.

Without loss of generality, we take Exp1, which discriminates ST36 from Pre1, as the example.

First, we compare the different methods using the volcano-like plots in the figure below.


**Methods comparison via volcano like plots**. Comparison of our LPFS method with existing methods with regard to the identified biomarkers. All 400 metabolites are plotted in a two-dimensional plane. The selected biomarkers are highlighted in red. The x-axis denotes the difference of means and the y-axis denotes the standard deviation. Good biomarkers should be located in either the bottom left corner or the bottom right corner. a) Volcano-like plot of the t-test method. The top 10 features are in red. b) Volcano-like plot of the fold change method. The top 10 features are in red. c) Volcano-like plot of the SMLR method. d) Volcano-like plot of the SVM-RFE method. e) Volcano-like plot of our LPFS method.
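The coordinates of such a volcano-like plot can be computed per metabolite as sketched below. The caption does not say which standard deviation is plotted, so the pooled within-group standard deviation is used here as an assumption; the function name and the toy profiles are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def volcano_coordinates(case, control):
    """Return (difference of means, pooled within-group standard deviation) per feature."""
    coords = []
    for i in range(len(case[0])):
        a = [s[i] for s in case]
        b = [s[i] for s in control]
        pooled_var = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) \
            / (len(a) + len(b) - 2)
        coords.append((mean(a) - mean(b), sqrt(pooled_var)))
    return coords


# Hypothetical profiles: feature 0 shifts with little spread, feature 1 is noisy with no shift.
case = [[2.0, 5.0], [2.2, 1.0], [1.8, 9.0]]
control = [[1.0, 5.0], [1.1, 2.0], [0.9, 8.0]]
coords = volcano_coordinates(case, control)
```

A feature such as the first one, with a large shift and a small spread, lands in a bottom corner of the plot, which is where good biomarkers are expected.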

The t-test based method identifies 172 metabolites if we choose a cutoff of 1.73 (corresponding to a p-value of 0.05). A stricter threshold still selects 84 metabolites with a cutoff of 2.84 (corresponding to a p-value of 0.005). We list the top 10 in the table below.

The top ten identified biomarkers by different methods on the ST36 meridian point.

| Student t-test ID | Student t-test ppm | t-score | Fold change ID | Fold change ppm | FC-score | SMLR ID | SMLR ppm | SVM-RFE ID | SVM-RFE ppm | LPFS ID | LPFS ppm | LPFS score | Metabolite name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 86 | 3.55 | 15.29 | 86 | 3.55 | 73.48 | 45 | 4 | 86 | 3.55 | 86 | 3.55 | 0.015 | |
| 195 | 2.46 | 11.91 | 87 | 3.54 | 68.52 | 50 | 3.95 | 68 | 3.73 | 92 | 3.49 | 0.008 | |
| 251 | 1.9 | 11.07 | 308 | 1.33 | 46.58 | 52 | 3.93 | 87 | 3.54 | 87 | 3.54 | 0.006 | a-glucose/glycine |
| 45 | 3.96 | 10.96 | 310 | 1.31 | 41.61 | 58 | 3.87 | 69 | 3.72 | 308 | 1.33 | 0.002 | lactate |
| 229 | 2.12 | 10.86 | 70 | 3.71 | 40.75 | 60 | 3.85 | 102 | 3.39 | | | | |
| 81 | 3.6 | 10.80 | 92 | 3.49 | 38.61 | 67 | 3.78 | 70 | 3.71 | | | | |
| 18 | 4.23 | 10.03 | 293 | 1.48 | 37.45 | 68 | 3.77 | 295 | 1.46 | | | | |
| 127 | 3.14 | 9.75 | 295 | 1.46 | 33.26 | 69 | 3.76 | 116 | 3.25 | | | | |
| 17 | 4.24 | 9.71 | 71 | 3.7 | 28.55 | 70 | 3.75 | 229 | 2.12 | | | | |
| 232 | 2.09 | 9.35 | 69 | 3.72 | 26.30 | 71 | 3.74 | 88 | 3.53 | | | | |

The fold change based method identifies 159 metabolites if we choose the commonly used cutoff of 2.

Identified biomarkers from different meridian points by our LPFS method. Columns are grouped by meridian point: Zusanli (ST36), Liangmen (ST21), Juliao (ST3), Yanglingquan (GB34), and Weizhong (BL40).

| ST36 Metabolite | ST36 ppm | ST36 ID | ST21 Metabolite | ST21 ppm | ST21 ID | ST3 Metabolite | ST3 ppm | ST3 ID | GB34 Metabolite | GB34 ppm | GB34 ID | BL40 Metabolite | BL40 ppm | BL40 ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 3.55 | 86 | | 2.11 | 230 | | 3.55 | 86 | | 3.55 | 86 | | 3.78 | 63 |
| a-glucose/glycine | 3.54 | 87 | | 0.88 | 353 | a-glucose/glycine | 3.54 | 87 | a-glucose/glycine | 3.54 | 87 | | 3.99 | 42 |
| | 3.49 | 92 | histidine/taurine | 3.25 | 116 | | | | threonine | 1.32 | 309 | | 3.88 | 53 |
| lactate | 1.33 | 308 | | 3.55 | 86 | | | | | | | lipid | 1.3 | 311 |
| | | | lactate | 1.33 | 308 | | | | | | | lysine/arginine | 1.91 | 250 |
| | | | a-glucose/glycine | 3.54 | 87 | | | | | | | | 3.92 | 49 |
| | | | | 3.2 | 121 | | | | | | | | 3.49 | 92 |
| | | | | | | | | | | | | | 3.2 | 121 |

SMLR selects 37 features to achieve 100% leave-one-out predictive accuracy. These 37 metabolites are plotted in panel c) of the volcano-like plots.

SVM-RFE selects 15 metabolites in total to achieve 100% leave-one-out predictive accuracy. These metabolites are illustrated in panel d) of the volcano-like plots.

Our LPFS method finally selects 4 features as the biomarkers to discriminate ST36 from Pre1. By using only 4 features we can achieve 100% leave-one-out predictive accuracy. To show that these four important biomarkers do not depend on the nearest centroid classifier, we use an SVM to perform five-fold cross validation, and the predictive accuracy is still 100%. This demonstrates that, by applying strong regularization, we can select a small set of features that really matter. The selected 4 metabolites are listed in the LPFS columns of the top-ten table above.

Second, we compare the results of these five methods in the Venn diagrams below.


**Venn diagram for the results obtained by different methods**. a) Comparing t-test, SMLR, SVM-RFE, and LPFS methods by checking the overlaps of their selected biomarkers. b) Comparing fold change, SMLR, SVM-RFE, and LPFS methods.

In addition to the overall Venn diagrams, the top ten biomarkers obtained by the t-test, fold change, and SVM-RFE methods are summarized in the top-ten table above.

Biological insights for the identified biomarkers

We then applied the proposed LPFS method to identify the biomarkers from the seven designed experiments. As a result, we identified 4, 7, 2, 3, and 8 biomarkers for the acupuncture treatment effects of ST36, ST21, ST3, GB34, and BL40 respectively. These selected biomarkers achieve 100%, 100%, 100%, 100%, and 95% leave-one-out cross validation accuracy. The results are summarized in the table of identified biomarkers above.

The results in this table, obtained by combining ^{1}H NMR profiling with our biomarker identification method, provide experimental evidence for distinguishing between Yangming meridian points and other meridian points from the metabolic aspect. This may become a useful new source of information for studying the specificities of meridian points.

To reveal the similarities and differences of the identified biomarkers across meridian points, we calculate the overlaps of these biomarkers and present them in the Venn diagram below.


**Venn diagram for identified biomarkers from different acupuncture points**. The similarity and difference of those identified biomarkers from Zusanli, Liangmen, Yanglingquan, and Weizhong are shown in a venn diagram. The overlapped biomarkers are indicated by their ppm and known annotation.
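The overlaps can be recomputed directly from the feature IDs listed in the table of identified biomarkers. The sketch below uses those IDs with plain Python sets; the dictionary layout and the function name are illustrative choices.

```python
# Feature IDs per meridian point, copied from the table of LPFS biomarkers.
biomarker_ids = {
    "ST36": {86, 87, 92, 308},
    "ST21": {230, 353, 116, 86, 308, 87, 121},
    "ST3": {86, 87},
    "GB34": {86, 87, 309},
    "BL40": {63, 42, 53, 311, 250, 49, 92, 121},
}


def common_ids(points):
    """IDs selected as biomarkers at every one of the given meridian points."""
    return set.intersection(*(biomarker_ids[p] for p in points))


shared_four = common_ids(["ST36", "ST21", "ST3", "GB34"])
shared_st36_st21 = common_ids(["ST36", "ST21"])
shared_st36_bl40 = common_ids(["ST36", "BL40"])
```

With the tabulated IDs, `shared_four` contains ID 86 (3.55 ppm) together with ID 87 (3.54 ppm, a-glucose/glycine), and `shared_st36_st21` additionally contains ID 308 (1.33 ppm, lactate).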

Our results show that the metabolite with chemical shift value 3.55 is clearly a common biomarker for ST36, ST21, ST3, and GB34. This metabolite is highlighted in the 2D plot below.


**Highlight the selected biomarker in 2D plot**. Each metabolic sample is visualized as a two-dimensional image. Each grid cell denotes a group of metabolites with similar profiles. Red means a highly expressed metabolite group and blue means a lowly expressed metabolite group. In particular, the metabolite with chemical shift value 3.55 is highlighted in white and indicated by the star.

Importantly, our LPFS method reveals the metabolite at 1.33 ppm as a biomarker for meridian points ST36 and ST21. This molecule is annotated as lactate. Lactate has been extensively studied over the years for its many important functions. For example, lactate has long been regarded as a metabolic waste product of the central nervous system and a sign of hypoxia.

Discussions and conclusions

Biomarker identification, or feature selection, considers the problem of constructing a prediction rule from only a subset of features and accurately classifying observations in the context of diagnosis and treatment (e.g. with vs. without acupuncture treatment). Such problems have become increasingly important and quite general in genomics (identifying differentially expressed genes in microarray data), proteomics (finding promising protein markers from mass spectrometry data), metabolomics (selecting metabolite markers from NMR and GC-MS data), and other areas of computational biology. Because the number of features is much larger than the number of observations, simple, highly regularized approaches are in pressing need. Here, we proposed a novel linear programming based feature selection (LPFS) model to address this important problem. The feature selection problem is cast as an optimization problem with two objectives: one is to minimize the number of chosen features and the other is to maximize the predictive accuracy. Mathematically, the feature selection problem is formulated as a mixed integer linear programming problem. The model is then relaxed to a linear program to ensure the efficient identification of a feature subset. In this way we can solve an essentially combinatorial optimization problem in a computationally reasonable way. In summary, our LPFS method selects features and learns the classifier jointly, and it selects a small set of features by applying strong regularization. Our methodology is general and can be easily applied to other scenarios.

We extensively compared our LPFS method with existing methods on the real datasets of acupuncture treatment at different acupoints. We find that (1) our method selects the fewest features while achieving accurate predictions; (2) our method is free of arbitrary threshold choices; (3) a close check of the selected features shows that our method can identify biologically meaningful features; and (4) the cross-validation results show that our method achieves relatively high prediction accuracy.

Prior information allows further improvement of our method. Currently the identified biomarkers are independent of each other. We can move a step further toward interpretation by considering a group of biologically meaningful biomarkers. For example, we can incorporate network information (interactions among features) into the feature selection procedure. As a result, a pathway or module in the network, rather than a single molecule, will finally be selected as the biomarker, a so-called network biomarker. We note that prior information can be easily incorporated into our optimization model either by adding constraints or by adding penalty terms to the objective function.

In this paper, the biomarker identification for each acupuncture point is treated as a single binary classification task. We then compare the revealed biomarkers for their similarities and differences across different acupuncture points. We note that a multi-class classifier can be developed to systematically integrate all the profiles from different points. This topic is in progress as a future direction.

Finally, the metabolic profile is known for its high variance. We note that the main source of variance is the NMR technology rather than the acupuncture effect.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

YW proposed the computational method. QFW, XZY, SGY, and FRL designed the experimental study and generated the data. YW, CC, LYW, and XSZ implemented the method, performed the experiments and analyzed the data. YW and QFW wrote the manuscript. All authors revised the manuscript and approved the final version.

Acknowledgements

The authors would like to thank Prof. Luonan Chen, Dr. Ruisheng Wang, and ZHANGroup members for insightful discussions. YW, LYW, and XSZ are supported by NSFC grants 61171007, 11131009, and 60970091, and CAS grant kjcx-yw-s7. QFW and FRL are supported by NSFC grant 30901933 and the National Basic Research Program of China (no. 2012CB518500). YW is also supported by SRF for ROCS, SEM and the Shanghai Key Laboratory of Intelligent Information Processing (No. IIPL-2010-008).

This article has been published as part of