MSACompro: protein multiple sequence alignment using predicted secondary structure, solvent accessibility, and residue-residue contacts

Deng, Xin; Cheng, Jianlin

doi:10.1186/1471-2105-12-472

Research article
Open access
Published: 14 December 2011


BMC Bioinformatics volume 12, Article number: 472 (2011)

Abstract

Background

Multiple Sequence Alignment (MSA) is a basic tool for bioinformatics research and analysis. It is used in almost all bioinformatics tasks, such as protein structure modeling, gene and protein function prediction, DNA motif recognition, and phylogenetic analysis. Therefore, improving the accuracy of multiple sequence alignment is important for advancing many bioinformatics fields.

Results

We designed and developed a new method, MSACompro, to synergistically incorporate predicted secondary structure, relative solvent accessibility, and residue-residue contact information into the currently most accurate posterior probability-based MSA methods to improve the accuracy of multiple sequence alignments. The method differs from multiple sequence alignment methods (e.g. 3D-Coffee) that use the tertiary structure information of some sequences, since the structural information used by our method is predicted entirely from sequences. To the best of our knowledge, applying predicted relative solvent accessibility and contact maps to multiple sequence alignment is novel. Rigorous benchmarking of our method on the standard benchmarks (i.e. BAliBASE, SABmark and OXBENCH) clearly demonstrated that incorporating predicted protein structural information improves multiple sequence alignment accuracy over the leading multiple protein sequence alignment tools that do not use this information, such as MSAProbs, ProbCons, Probalign, T-coffee, MAFFT and MUSCLE. The performance of the method is comparable to that of the state-of-the-art method PROMALS, which uses structural features and additional homologous sequences, with only slightly lower scores.

Conclusion

MSACompro is an efficient and reliable multiple protein sequence alignment tool that can effectively incorporate predicted protein structural information into multiple sequence alignment. The software is available at http://sysbio.rnet.missouri.edu/multicom_toolbox/.

Background

Aligning multiple evolutionarily related protein sequences is a fundamental technique for studying protein function, structure, and evolution. Multiple sequence alignment methods are often an essential component for solving challenging bioinformatics problems such as protein function prediction, protein homology identification, protein structure prediction, protein interaction study, mutagenesis analysis, and phylogenetic tree construction. During the last thirty years or so, a number of methods and tools have been developed for multiple sequence alignment, which have made fundamental contributions to the development of the bioinformatics field.

State-of-the-art multiple sequence alignment methods adopt several popular techniques to improve alignment accuracy, such as iterative alignment [1], progressive alignment [2], alignment based on profile hidden Markov models [3], and posterior alignment probability transformation [4, 5]. Some alignment methods, such as 3D-Coffee [6] and PROMALS3D [7], use 3D structure information to improve multiple sequence alignment, and therefore cannot be applied to the majority of protein sequences, which have no known tertiary structure. To overcome this problem, we have developed a method that incorporates secondary structure, relative solvent accessibility, and contact map information predicted from protein sequences into multiple sequence alignment. Predicted secondary structure information has been used to improve pairwise sequence alignment [8, 9], but few attempts have been made to use predicted secondary structure information in multiple sequence alignment [10–15]. To the best of our knowledge, applying predicted relative solvent accessibility and residue-residue contact maps to multiple sequence alignment is novel.

In order to use the predicted structural information to advance the state of the art of multiple sequence alignment, we first compared the existing multiple sequence alignment tools [16–31, 4, 5, 32–37] on standard benchmark data sets such as BAliBASE [38], SABmark [39] and OXBENCH [40], which showed that MAFFT [30], T-coffee [31], MSAProbs [4], and ProbCons [5] yielded the best performance. We then developed MSACompro, a new multiple sequence alignment method that uses predicted secondary structure, relative solvent accessibility, and residue-residue contact maps together with posterior alignment probabilities produced by both pair hidden Markov models and a partition function, as in MSAProbs [4]. The assessment of MSACompro on the benchmark data sets from BAliBASE, SABmark and OXBENCH showed that incorporating predicted structural information improves the accuracy of multiple sequence alignment over most existing tools that do not use structural features, and that the improvement is sometimes substantial.

Method

Following the general scheme in MSAProbs [4], MSACompro has five main steps: (1) compute the pairwise posterior alignment probability matrices based on both pair-HMM and partition function, considering the similarity in amino acids, secondary structure, and relative solvent accessibility; (2) generate the pairwise distance matrix from both the pairwise posterior probability matrices constructed in the first step and the pairwise contact map similarity matrices; (3) construct a guide tree based on pairwise distance matrix, and calculate sequence weights; (4) transform all the pairwise posterior matrices by a weighting scheme; (5) perform a progressive alignment by computing the profile-profile alignment from the probability matrices of all sequence pairs, and then an iterative alignment to refine the results from progressive alignment. Our method is different from MSAProbs in that it adds secondary structure and solvent accessibility information to the calculation of the posterior residue-residue alignment probabilities and computes the pairwise distance matrix with the help of predicted residue-residue contact information.

Construction of pairwise posterior probability matrices based on amino acid sequence, secondary structure and solvent accessibility information

For two protein sequences X and Y in a sequence group S to be aligned, we denote X = (x₁, x₂, ..., x_{n₁}) and Y = (y₁, y₂, ..., y_{n₂}), where x₁, x₂, ..., x_{n₁} and y₁, y₂, ..., y_{n₂} are the residues in X and Y, respectively. n₁ is the length of sequence X, and n₂ is the length of sequence Y. Suppose x_i is the i-th amino acid in sequence X, and y_j is the j-th amino acid in sequence Y. We let aln denote a global alignment between X and Y, ALN the set of all possible global alignments of X and Y, and aln* ∈ ALN the true pairwise alignment of X and Y. The posterior probability that the i-th residue in X (x_i) is aligned to the j-th residue in Y (y_j) in aln* is defined as:

$$ p(x_i \sim y_j \in aln^{*} \mid X, Y) = \sum_{aln \in ALN} P(aln \mid X, Y)\, I\{x_i \sim y_j \in aln\}, \quad 1 \leq i \leq n_1,\ 1 \leq j \leq n_2 \tag{1} $$

$$ I\{x_i \sim y_j \in aln\} = \begin{cases} 1, & \text{if } x_i \sim y_j \in aln \\ 0, & \text{otherwise} \end{cases} $$

P(aln | X, Y) denotes the probability that aln is the true alignment aln*. Thus, the n₁ × n₂ posterior probability matrix P_XY is the collection of all the values p(x_i ~ y_j ∈ aln* | X, Y) (p(x_i ~ y_j) for short) for 1 ≤ i ≤ n₁, 1 ≤ j ≤ n₂. The calculation of the pairwise posterior probability matrix is described as follows.

As in MSAProbs, two different methods (a partition function and a pair hidden Markov model) are used to compute the pairwise posterior probability matrices ($P_{X Y}^{1}$ and $P_{X Y}^{2}$), respectively. The first kind of pairwise probability matrix $P_{X Y}^{1}$ is calculated by a partition function (F) of alignments based on dynamic programming. F(i, j) denotes the probability of all partial global alignments of X and Y ending at position (i, j). F_M(i, j) is the probability of all partial global alignments with x_i aligned to y_j, F_Y(i, j) is the probability of all partial global alignments with y_j aligned to a gap, and F_X(i, j) is the probability of all partial global alignments with x_i aligned to a gap. Accordingly, the partition function can be calculated recursively as follows:

$$ \begin{aligned} F_M(i, j) &= F(i-1, j-1)\, e^{W_1 \beta s(x_i, y_j) + W_2 SS(ss(x_i), ss(y_j)) + W_3 SA(sa(x_i), sa(y_j))} \\ F_Y(i, j) &= F_M(i, j-1)\, e^{\beta\, gap} + F_Y(i, j-1)\, e^{\beta\, ext} \\ F_X(i, j) &= F_M(i-1, j)\, e^{\beta\, gap} + F_X(i-1, j)\, e^{\beta\, ext} \\ F(i, j) &= F_M(i, j) + F_Y(i, j) + F_X(i, j) \end{aligned} \tag{2} $$

subject to the constraint W₁ + W₂ + W₃ = 1.

In the formula above, s(x_i, y_j) is the amino acid similarity score between x_i and y_j, which is one element of the substitution matrix s. SS(ss(x_i), ss(y_j)) is the similarity score between the secondary structure (ss(x_i)) of residue x_i in protein X and that of residue y_j in protein Y according to the secondary structure similarity matrix SS. SA(sa(x_i), sa(y_j)) is the similarity score between the relative solvent accessibility (sa(x_i)) of residue x_i in protein X and that of residue y_j in protein Y according to the solvent accessibility similarity matrix SA. W₁, W₂, and W₃ are weights used to control the influence of the amino acid substitution score, secondary structure similarity score, and solvent accessibility similarity score. The secondary structure and solvent accessibility can be automatically predicted by SSpro/ACCpro [41] (http://sysbio.rnet.missouri.edu/multicom_toolbox/) using a multi-threading technique implemented in MSACompro, or alternatively be provided by the user. The values of the three weights are set to 0.4, 0.5, and 0.1 by default, and can be adjusted by users. The ensembles of bidirectional recurrent neural network architectures in ACCpro are used to discriminate between two states of relative solvent accessibility, higher or lower than the accessibility cutoff of 25% of the total surface area of a residue [42], corresponding to e (exposed) or b (buried). As in MSAProbs, β is a parameter measuring the deviation between suboptimal and optimal alignments, gap (gap ≤ 0) is the gap open penalty, and ext (ext ≤ 0) is the gap extension penalty.
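The recursion in formula (2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MSACompro's implementation: the boundary condition F_M(0, 0) = 1, the function name `partition_function`, the toy substitution function, and the parameter values in the usage lines are all assumptions, and the secondary structure and solvent accessibility terms use identity scoring (1 for identical states, 0 otherwise).

```python
import math

def partition_function(x, y, ss_x, ss_y, sa_x, sa_y, sub,
                       beta, gap, ext, w=(0.4, 0.5, 0.1)):
    """Return F(n1, n2) by following recursion (2).

    sub(a, b) is a hypothetical amino acid similarity function.
    The boundary condition F_M(0, 0) = 1 is an assumption of this sketch.
    """
    w1, w2, w3 = w
    n1, n2 = len(x), len(y)
    fm = [[0.0] * (n2 + 1) for _ in range(n1 + 1)]
    fx = [[0.0] * (n2 + 1) for _ in range(n1 + 1)]
    fy = [[0.0] * (n2 + 1) for _ in range(n1 + 1)]
    fm[0][0] = 1.0  # assumed: the empty alignment has weight 1
    open_w = math.exp(beta * gap)
    ext_w = math.exp(beta * ext)
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            if i > 0 and j > 0:
                # F(i-1, j-1) = F_M + F_X + F_Y at the previous diagonal cell
                prev = fm[i - 1][j - 1] + fx[i - 1][j - 1] + fy[i - 1][j - 1]
                score = (w1 * beta * sub(x[i - 1], y[j - 1])
                         + w2 * (ss_x[i - 1] == ss_y[j - 1])
                         + w3 * (sa_x[i - 1] == sa_y[j - 1]))
                fm[i][j] = prev * math.exp(score)
            if j > 0:  # y_j aligned to a gap
                fy[i][j] = fm[i][j - 1] * open_w + fy[i][j - 1] * ext_w
            if i > 0:  # x_i aligned to a gap
                fx[i][j] = fm[i - 1][j] * open_w + fx[i - 1][j] * ext_w
    return fm[n1][n2] + fx[n1][n2] + fy[n1][n2]

# Toy usage: the similarity function and parameter values are placeholders,
# not the Gonnet 160 matrix or MSACompro's settings.
toy_sub = lambda a, b: 1.0 if a == b else -1.0
print(partition_function("ACD", "AD", "HHC", "HC", "eeb", "eb",
                         toy_sub, beta=0.2, gap=-4.0, ext=-1.0))
```

Because the structural terms enter the exponent directly, raising W₂ or W₃ increases the weight of every alignment path that pairs residues with matching predicted states.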

We used the Gonnet 160 matrix as a substitution matrix to generate the similarity scores between two amino acids in proteins [43]. The 3 × 3 secondary structure similarity matrix SS contains the similarity scores of three kinds of secondary structures (E, H, C) as follows:

$$ SS = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} $$

where two identical secondary structures receive a score of 1 and different ones receive a score of 0.

The 2 × 2 solvent accessibility similarity matrix SA contains the similarity scores of two kinds of relative solvent accessibilities (e, b) as follows:

$$ SA = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} $$

where two identical solvent accessibilities receive a score of 1 and different ones receive a score of 0. It is worth noting that we used the simple identity scoring matrix for secondary structure and solvent accessibility here. Employing the more advanced scoring matrices defined in [44] may lead to further improvement. Each posterior residue-residue alignment probability element in the first kind of posterior probability matrix ($P_{X Y}^{1}$) can be calculated from the partition function as:

$$ p^{1}(x_i \sim y_j) = \frac{F_M(i-1, j-1)\, F_M'(i+1, j+1)}{F} \cdot e^{W_1 \beta s(x_i, y_j) + W_2 SS(ss(x_i), ss(y_j)) + W_3 SA(sa(x_i), sa(y_j))} \tag{3} $$

where $F_{M}'(i, j)$ denotes the partition function of all the reverse alignments starting from position (n₁, n₂) and ending at position (i, j) with x_i aligned to y_j.

As in MSAProbs, the second kind of pairwise probability matrix $P_{X Y}^{2}$ is calculated by a pair hidden Markov model (HMM) using the Forward and Backward algorithms [4, 5, 45]. The pairwise probabilities are generated under the guidance of the pair HMM, which involves state emissions and transitions. $P_{X Y}^{2}$ is derived from the protein sequences only, without using secondary structure and solvent accessibility. This differs from PROMALS [15], which lets the HMM emit both amino acids and secondary structure alphabets.

The final posterior probability matrix P_XY is calculated as the root mean square of the corresponding values in $P_{X Y}^{1}$ and $P_{X Y}^{2}$ as follows.

$$ p(x_i \sim y_j) = \sqrt{\frac{p^{1}(x_i \sim y_j)^{2} + p^{2}(x_i \sim y_j)^{2}}{2}} \tag{4} $$

where p¹(x_i ~ y_j) and p²(x_i ~ y_j) denote a posterior probability element in the two kinds of posterior probability matrices ($P_{X Y}^{1}$ and $P_{X Y}^{2}$), respectively.
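As a small worked illustration of formula (4), the element-wise root mean square of two posterior matrices can be computed as below. The function name `combine_posteriors` and the nested-list matrix representation are assumptions made for this sketch.

```python
import math

def combine_posteriors(p1, p2):
    """Element-wise root mean square of two equally sized matrices (formula 4)."""
    return [[math.sqrt((a * a + b * b) / 2.0) for a, b in zip(row1, row2)]
            for row1, row2 in zip(p1, p2)]

# Toy 1 x 2 matrices; the values are illustrative only.
print(combine_posteriors([[0.6, 0.0]], [[0.8, 0.0]]))
```

Compared with an arithmetic mean, the root mean square gives more weight to the larger of the two probabilities, so a residue pair that one model supports strongly is not fully averaged away by the other.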

Construction of pairwise distance matrices based on pairwise posterior probabilities and pairwise contact map scores

The posterior probability matrix P_XY is used as a scoring function to generate a pairwise global alignment between sequences X and Y. The optimal global alignment score Optscore(X, Y) is computed from an optimal sub-alignment score matrix AS. The optimal sub-alignment score AS(i, j) denotes the score of the optimal sub-alignment ending at residues i and j in X and Y. The AS matrix is recursively calculated as:

$$ AS(i, j) = \max \begin{cases} AS(i-1, j-1) + P_{XY}(x_i \sim y_j) \\ AS(i-1, j) \\ AS(i, j-1) \end{cases} \tag{5} $$

AS (n₁, n₂) is the optimal score of the full global alignment between X and Y, which is denoted as Optscore(X,Y).
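The recursion in formula (5) is a gap-penalty-free dynamic program over the posterior matrix. A minimal sketch follows, assuming the boundary values AS(0, j) = AS(i, 0) = 0 (the text does not state them) and a hypothetical function name `optimal_alignment_score`.

```python
def optimal_alignment_score(p_xy):
    """Return AS(n1, n2) for an n1 x n2 posterior matrix given as nested lists.

    Assumes AS(0, j) = AS(i, 0) = 0, which is not stated in the text.
    """
    n1 = len(p_xy)
    n2 = len(p_xy[0]) if n1 else 0
    score = [[0.0] * (n2 + 1) for _ in range(n1 + 1)]
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            score[i][j] = max(score[i - 1][j - 1] + p_xy[i - 1][j - 1],
                              score[i - 1][j],
                              score[i][j - 1])
    return score[n1][n2]

# Toy posterior matrix; values are illustrative only.
print(optimal_alignment_score([[0.9, 0.1], [0.2, 0.8]]))
```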

In addition to the optimal alignment score, we introduce a contact map score, CMscore(X, Y), for the optimal pairwise alignment of X and Y, assuming that the spatially neighboring residues of two aligned residues should have a higher tendency to be aligned together. CMscore(X, Y) is calculated from the contact map correlation score matrix CMap_XY, which is based on the residue-residue contact map matrices CMap_X and CMap_Y of X and Y.

Assuming the optimal global alignment of X and Y is represented as,

$$ \begin{matrix} x_1 & x_2 & \cdots & - & x_m & \cdots & x_p & \cdots & x_{n_1} \\ y_1 & - & \cdots & y_k & y_{k+1} & \cdots & - & \cdots & y_{n_2} \end{matrix} $$

we can generate a new alignment after removing the pairs containing gaps:

$$ \begin{matrix} x_1 & \cdots & x_m & \cdots & x_{n_1} \\ y_1 & \cdots & y_{k+1} & \cdots & y_{n_2} \end{matrix} $$

which can be denoted as

$$ \begin{matrix} x_1' & x_2' & \cdots & x_n' \\ y_1' & y_2' & \cdots & y_n' \end{matrix} $$

where n is the length of the new alignment without gaps.

From this alignment, we can construct two contact map matrices, CMap_X and CMap_Y, shown below:

$$ CMap_X = \begin{bmatrix} x_{11}' & x_{12}' & \cdots & x_{1n}' \\ x_{21}' & x_{22}' & \cdots & x_{2n}' \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1}' & x_{n2}' & \cdots & x_{nn}' \end{bmatrix}, \quad CMap_Y = \begin{bmatrix} y_{11}' & y_{12}' & \cdots & y_{1n}' \\ y_{21}' & y_{22}' & \cdots & y_{2n}' \\ \vdots & \vdots & \ddots & \vdots \\ y_{n1}' & y_{n2}' & \cdots & y_{nn}' \end{bmatrix} \tag{6} $$

$x_{i j}^{'}$ is the contact probability score between amino acids $x_{i}^{'}$ and $x_{j}^{'}$ in protein sequence X, and $y_{i j}^{'}$ is the contact probability score between amino acids $y_{i}^{'}$ and $y_{j}^{'}$ in protein sequence Y. The residue-residue contact probabilities are predicted from the sequence by NNcon [46] (http://sysbio.rnet.missouri.edu/multicom_toolbox/). The contact map correlation score matrix CMap_XY is calculated as the matrix product of CMap_X and CMap_Y:

$$ CMap_{XY} = CMap_X \times CMap_Y = \begin{bmatrix} xy_{11}' & xy_{12}' & \cdots & xy_{1n}' \\ xy_{21}' & xy_{22}' & \cdots & xy_{2n}' \\ \vdots & \vdots & \ddots & \vdots \\ xy_{n1}' & xy_{n2}' & \cdots & xy_{nn}' \end{bmatrix} \tag{7} $$

$xy_{i i}^{'}$ is the contact map score for an aligned residue pair (amino acid $x_{i}^{'}$ in protein X and amino acid $y_{i}^{'}$ in protein Y). The contact map score for the global alignment of two sequences X and Y is calculated as

$$ CMscore(X, Y) = \frac{1}{n^{2}} \sum_{i=1}^{n} CMap_{XY}(i, i) = \frac{1}{n^{2}} \sum_{i=1}^{n} xy_{ii}' = \frac{1}{n^{2}} \sum_{i=1}^{n} \sum_{j=1}^{n} x_{ij}'\, y_{ji}' \tag{8} $$

In practice, we only need to calculate the diagonal values in CMap_XY.
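The diagonal-only computation of formula (8) can be sketched as follows. The function name `cm_score` and the nested-list representation of the gap-free contact maps are assumptions; the inputs are the n × n matrices CMap_X and CMap_Y restricted to aligned residue pairs, with n > 0.

```python
def cm_score(cmap_x, cmap_y):
    """Contact map score from formula (8), using only the diagonal of CMap_X x CMap_Y.

    cmap_x and cmap_y are n x n nested lists of contact probabilities for the
    aligned (gap-free) residues of X and Y; n must be positive.
    """
    n = len(cmap_x)
    diagonal_sum = sum(cmap_x[i][j] * cmap_y[j][i]
                       for i in range(n) for j in range(n))
    return diagonal_sum / (n * n)

# Toy contact maps in which residues 1 and 2 are in contact in both proteins.
print(cm_score([[0.0, 1.0], [1.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]]))
```

Computing only the diagonal costs O(n²) multiplications instead of the O(n³) needed for the full matrix product.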

Finally, we define the pairwise distance between sequences X and Y as

$$ d(X, Y) = 1 - \frac{W_4\, Optscore(X, Y)}{\min\{n_1, n_2\}} - W_5\, CMscore(X, Y) \tag{9} $$

where W₄ + W₅ = 1. The weights W₄ and W₅ control the relative influence of the optimal alignment score and the contact map score on the distance between sequences X and Y.
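A short worked instance of formula (9) is given below. The function name `pairwise_distance` and the input numbers are illustrative assumptions, not values from the paper.

```python
def pairwise_distance(opt_score, cm, n1, n2, w4, w5):
    """Distance from formula (9); w4 + w5 is expected to equal 1."""
    return 1.0 - w4 * opt_score / min(n1, n2) - w5 * cm

# Toy inputs: Optscore = 1.7 for two sequences of length 2, CMscore = 0.5.
print(pairwise_distance(1.7, 0.5, n1=2, n2=2, w4=0.5, w5=0.5))
```

Dividing Optscore by the shorter sequence length keeps the first term on a per-residue scale, so a high-confidence alignment and a high contact map score both pull the distance toward 0.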

Construction of guide tree and transformation of posterior probability

Akin to MSAProbs [4], a guide tree is constructed by the UPGMA method that uses the linear combinatorial strategy [47]. The distance between a new cluster Z formed by merging clusters X and Y, and another cluster W is calculated as (10):

$$ d(W, Z) = \frac{d(W, X) \times Num(X) + d(W, Y) \times Num(Y)}{Num(X) + Num(Y)} \tag{10} $$

where Num(X) is the number of leaves in cluster X.
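The cluster distance update of formula (10) is a size-weighted average. A minimal sketch, with the hypothetical function name `merged_distance`:

```python
def merged_distance(d_wx, d_wy, num_x, num_y):
    """Distance from cluster W to Z = X merged with Y, following formula (10)."""
    return (d_wx * num_x + d_wy * num_y) / (num_x + num_y)

# Toy example: X has 1 leaf at distance 0.2 from W, Y has 3 leaves at distance 0.6.
print(merged_distance(0.2, 0.6, num_x=1, num_y=3))
```

Weighting by the number of leaves means every original sequence contributes equally to the new distance, regardless of the order in which clusters were merged.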

After the guide tree is constructed, sequences are weighted according to the schemes inferred in [4].

To reduce the bias of sampling similar sequences, we use a weighted scheme to transform the former posterior probability as

$$ P_{XY}' = \frac{1}{wN} \left( (w_X + w_Y)\, P_{XY} + \sum_{Z \in S,\ Z \neq X, Y} w_Z\, P_{XZ}\, P_{ZY} \right) \tag{11} $$

where w_X and w_Y are the weights of sequences X and Y, respectively, w_Z is the weight of a sequence Z other than X or Y in the given group of sequences, and wN is the sum of the sequence weights in dataset S.
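The weighted transformation in formula (11) can be sketched with plain nested lists. The function names, the dictionary layout `post[(a, b)]` for the matrix P_ab, and the toy weights are assumptions made for illustration only.

```python
def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transform_posterior(post, weights, x, y):
    """Apply formula (11) to P_XY.

    post[(a, b)] holds the posterior matrix P_ab for sequence ids a and b;
    weights maps every sequence id in S to its sequence weight.
    """
    w_total = sum(weights.values())  # wN: sum of all sequence weights in S
    p_xy = post[(x, y)]
    out = [[(weights[x] + weights[y]) * v for v in row] for row in p_xy]
    for z in weights:
        if z == x or z == y:
            continue
        through_z = matmul(post[(x, z)], post[(z, y)])
        for i, row in enumerate(through_z):
            for j, v in enumerate(row):
                out[i][j] += weights[z] * v
    return [[v / w_total for v in row] for row in out]

# Toy example with three length-1 sequences and equal weights.
post = {("X", "Y"): [[0.5]], ("X", "Z"): [[1.0]], ("Z", "Y"): [[0.4]]}
print(transform_posterior(post, {"X": 1.0, "Y": 1.0, "Z": 1.0}, "X", "Y"))
```

Each product P_XZ P_ZY accumulates the support for aligning x_i with y_j through an intermediate residue of Z, so evidence from third sequences adjusts the direct pairwise estimate.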

Combination of progressive and iterative alignment

We first use the guide tree to generate a multiple sequence alignment by progressively aligning the two clusters of most similar sequences, so that sequences are aligned along the guide tree from closely related to distantly related. As in MSAProbs [4], we apply a weighted profile-profile alignment to align two clusters of sequences. The sequence weights are the same as in the previous step. The posterior alignment probability matrix of two clusters/profiles is averaged from the probability matrices of all sequence pairs (X, Y), where X and Y are from the two different clusters. The global profile-profile alignment is generated by formula (5), using the posterior alignment probability matrices of the profiles. To further improve the alignment accuracy, we then use a randomized iterative alignment to refine the result of the progressive alignment. This randomized iterative refinement randomly partitions the given sequence group S into two separate groups and performs a profile-profile alignment of the two groups. The refinement runs for 10 iterations by default, or for a fixed number of iterations set by the user. In addition, multi-threading based on OpenMP is used to improve the efficiency of the program [48].
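The control flow of the randomized iterative refinement can be outlined as below. This is only a skeleton under stated assumptions: `realign` is a hypothetical callback standing in for the weighted profile-profile alignment, and the way the random partition is drawn (shuffle, then a random cut that leaves both groups non-empty) is an illustrative choice that the text does not specify.

```python
import random

def refine_alignment(aligned_rows, realign, iterations=10, rng=None):
    """Randomized iterative refinement skeleton.

    aligned_rows: list of aligned sequences (any row representation).
    realign(group_a, group_b): hypothetical profile-profile alignment
    callback that returns the re-aligned rows of both groups as one list.
    """
    rng = rng or random.Random()
    rows = list(aligned_rows)
    for _ in range(iterations):
        if len(rows) < 2:
            break  # nothing to partition
        order = list(range(len(rows)))
        rng.shuffle(order)
        cut = rng.randint(1, len(rows) - 1)  # both groups stay non-empty
        group_a = [rows[k] for k in order[:cut]]
        group_b = [rows[k] for k in order[cut:]]
        rows = realign(group_a, group_b)
    return rows

# Placeholder callback that simply concatenates the groups (no real alignment).
print(refine_alignment(["s1", "s2", "s3"], lambda a, b: a + b,
                       rng=random.Random(0)))
```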

Results and discussion

Evaluation of MSACompro and other tools on the standard benchmarks

We tested MSACompro on three benchmarks, BAliBASE, SABmark and OXBENCH, and evaluated the alignment results in terms of the sum-of-pairs (SP) score and the true column (TC) score. The SP score is the number of correctly aligned pairs of residues in the test alignment divided by the total number of aligned pairs of residues in core blocks of the reference alignment [49]. The TC score is the number of correctly aligned columns in the test alignment divided by the total number of aligned columns in core blocks of the reference alignment [49]. We used the application bali_score provided by BAliBASE 3.0 to calculate these scores. We compared MSACompro to 11 other MSA tools that do not have access to structural information: ClustalW 2.0.12, DIALIGN-TX 1.0.2 [27], FSA 1.15.5, MAFFT 6.818, MSAProbs 0.9.4, MUSCLE 3.8.31, Opal 0.2.0, POA 2, Probalign 1.3, Probcons and T-coffee 8.93. It is worth noting that a fair comparison between our method and these multiple sequence alignment methods is not possible, because they use less input information. The goal of the comparison is therefore to show that structural information-based alignment may contain valuable information that is not available in sequence-based multiple sequence alignments and can thus supplement them. To make the evaluation fairer and more comprehensive, we also compared MSACompro with four tools which use structural information, including MUMMALS 1.01 [14], PROMALS [15] and PROMALS3D [7].
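To make the two definitions concrete, the sketch below computes simplified SP and TC scores for alignments given as lists of gapped strings. It is not bali_score: it ignores the core-block annotation and scores every reference column that contains at least two residues, and the function names are assumptions made for this illustration.

```python
from itertools import combinations

def _columns(alignment):
    """Convert gapped rows into columns of residue indices (None marks a gap)."""
    counters = [0] * len(alignment)
    columns = []
    for c in range(len(alignment[0])):
        column = []
        for s, row in enumerate(alignment):
            if row[c] == "-":
                column.append(None)
            else:
                column.append(counters[s])
                counters[s] += 1
        columns.append(tuple(column))
    return columns

def _aligned_pairs(columns):
    """Set of (seq s, residue i, seq t, residue j) pairs placed in the same column."""
    pairs = set()
    for column in columns:
        for s, t in combinations(range(len(column)), 2):
            if column[s] is not None and column[t] is not None:
                pairs.add((s, column[s], t, column[t]))
    return pairs

def sp_tc(reference, test):
    """Simplified SP and TC scores of a test alignment against a reference."""
    ref_cols, test_cols = _columns(reference), _columns(test)
    ref_pairs = _aligned_pairs(ref_cols)
    sp = len(ref_pairs & _aligned_pairs(test_cols)) / len(ref_pairs)
    scored = [c for c in ref_cols if sum(r is not None for r in c) >= 2]
    test_set = set(test_cols)
    tc = sum(c in test_set for c in scored) / len(scored)
    return sp, tc

# Toy alignments of the same two sequences ("ACG" and "ACTG").
print(sp_tc(["AC-G", "ACTG"], ["ACG-", "ACTG"]))
```

TC is the stricter of the two measures: a single misplaced residue invalidates its whole column, whereas SP still credits the other residue pairs that remain correctly aligned.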

To understand how the parameters of MSACompro affect alignment accuracy, we carried out experiments on variants of the method based on two algorithmic changes: (1) combining amino acid, secondary structure, and relative solvent accessibility information in the partition function calculation, each with its own weight; (2) computing the pairwise distance from both the pairwise posterior probability matrices and the pairwise contact map similarity matrices by introducing the weight w_c for contact map information. To optimize the parameters, we used the BAliBASE 3.0 data sets as training sets, and the SABmark 1.65 and OXBENCH data sets as testing sets. First, we examined the effect of secondary structure and solvent accessibility information by testing different values of the weight w₁ for amino acid similarity and the weight w₂ for secondary structure information on the BAliBASE 3.0 data sets. MSACompro performed best overall when w₁ and w₂ were 0.4 and 0.5, respectively. Since the sum of w₁, w₂ and w₃ is 1, the weight w₃ for solvent accessibility is then 0.1. We then examined the effect of residue-residue contact map information under two scenarios: using secondary structure and relative solvent accessibility information by keeping w₁, w₂, and w₃ at their optimum values (0.4, 0.5, 0.1), or excluding that information by setting both w₂ and w₃ to 0. On the BAliBASE 3.0 database, the results improved the most when w_c was 0.9 and both secondary structure and relative solvent accessibility information were integrated. Additionally, to check for over-fitting, we tested MSACompro independently on the SABmark 1.65 and OXBENCH data sets using this set of parameters, and found that a significant improvement was also gained in comparison to other leading protein multiple sequence alignment tools. More details can be found in the next section, "A comprehensive study of the effect of predicted structural information on the alignment accuracy". Consequently, the weights w₁, w₂, w₃ and w_c are set to 0.4, 0.5, 0.1 and 0.9, respectively, by default in MSACompro. All other tools were also evaluated with their default parameters.

First, we evaluated these methods on BAliBASE [16], the most widely used multiple sequence alignment benchmark. The latest version, BAliBASE 3.0, contains 218 reference alignments, which are distributed into five reference sets. Reference set 1 is a set of equidistant sequences, organized into two reference subsets, RV11 and RV12. RV11 contains sequences sharing <20% identity and RV12 contains sequences sharing 20% to 40% identity. Reference set 2 contains families with >40% identity and a significantly divergent orphan sequence that shares <20% identity with the rest of the family members. Reference set 3 contains families with >40% identity within each sub-family and <20% identity between any two different sub-families. Reference set 4 is a set of sequences with large N/C-terminal extensions. Reference set 5 is a set of sequences with large internal insertions. Tables 1, 2, and 3 report the mean SP scores and TC scores of MSACompro and the tools that do not use structural information for the six subsets and the whole database. All the scores in the tables are multiplied by 100, and the highest scores in each column are marked in bold. The results show that MSACompro received the highest SP and TC scores on the whole database and on all the subsets, except for the SP score on the subset RV40. In some cases, MSACompro's improvement was substantial.

Table 1 Total SP scores on the full-length BAliBASE 3.0 subsets.
