Abstract
Background
Sequence alignment has become an indispensable tool in modern molecular biology research, and probabilistic sequence alignment models have been shown to provide an effective framework for building accurate sequence alignment tools. One such example is the pair hidden Markov model (pairHMM), which has been especially popular in comparative sequence analysis for several reasons, including its effectiveness in modeling and detecting sequence homology, its simplicity, and the existence of efficient algorithms for applying the model to sequence alignment problems. However, despite these advantages, pairHMMs also have a number of practical limitations that may degrade their alignment performance or render them unsuitable for certain alignment tasks.
Results
In this work, we propose a novel scheme for comparing and aligning biological sequences that can effectively address the shortcomings of the traditional pairHMMs. The proposed scheme is based on a simple message-passing approach, where messages are exchanged between neighboring symbol pairs that may be potentially aligned in the optimal sequence alignment. The message-passing process yields probabilistic symbol alignment confidence scores, which may be used for predicting the optimal alignment that maximizes the expected number of correctly aligned symbol pairs.
Conclusions
Extensive performance evaluation on protein alignment benchmark datasets shows that the proposed message-passing scheme clearly outperforms the traditional pairHMM-based approach in terms of both alignment accuracy and computational efficiency. Furthermore, the proposed scheme is numerically robust and amenable to massive parallelization.
Background
Sequence alignment has become an indispensable tool in modern molecular biology research, as it provides an effective and intuitive way of comparing and analyzing biological sequences. Given a set of biological sequences, the primary objective of sequence alignment is to predict the best overall mapping between the sequences, which accurately aligns the homologous regions that are embedded in them. This provides an effective means for detecting conserved sequence regions with potentially important functional roles. The concept of sequence alignment has had diverse applications in biomedical research [1-7], which include homology search, function and structure prediction of biomolecules, phylogenetic analysis, and detecting sequence motifs, among others.
Typically, sequence alignment is carried out by formulating and solving an optimization problem, either implicitly or explicitly, where the goal is to maximize an objective function that measures the overall quality of the sequence alignment. For example, one simple way of aligning a sequence pair would be to score each potential alignment by assigning a "substitution score" to every aligned symbol pair and penalty scores for gaps, and then find the optimal alignment that maximizes the overall score through dynamic programming [1]. In the past, various ad hoc scoring schemes have been proposed to obtain intuitive and biologically meaningful sequence alignment results. As an alternative to heuristic scoring schemes, there have also been research efforts to develop probabilistic models for sequence alignment that can be used to evaluate and compare potential alignments and to estimate the symbol-to-symbol alignment probabilities.
Examples of such probabilistic schemes include the pair hidden Markov models (pairHMMs) [1] and the partition function based scheme [8]. Given two biological sequences, these methods can be used to estimate the posterior symbol alignment probability for each symbol pair that may be aligned in the final sequence alignment. Based on the estimated probabilities, we can predict the optimal sequence alignment that contains the largest expected number of correctly aligned symbol pairs, rather than an alignment that maximizes an ad hoc score. This is typically referred to as the maximum expected accuracy (MEA) alignment [9-11], and as before, it can also be found through dynamic programming.
Among a number of probabilistic sequence alignment models, pairHMMs have been especially popular, and they have been widely adopted by many multiple sequence alignment (MSA) algorithms, including ProbCons [9] and PicXAA [10]. Despite the simplicity of the model, pairHMMs have been shown to be very effective in modeling sequence homology, as reflected in the well-rounded overall performance of various MSA algorithms that utilize the symbol alignment probabilities estimated by pairHMMs. Furthermore, these probabilities can be estimated in a relatively efficient manner, making the pairHMMs an attractive choice for various sequence alignment problems. However, pairHMMs also have a number of shortcomings, which may negatively affect their alignment performance or make them impractical for certain alignment tasks.
In this paper, we propose a novel scheme for comparing and aligning biological sequences that can effectively address the limitations of pairHMMs. The proposed scheme computes probabilistic symbol alignment confidence scores based on a simple and computationally efficient message-passing approach. As we will demonstrate in this paper, this message-passing scheme has a number of important advantages over the traditional pairHMMs, and it clearly outperforms pairHMMs in terms of both speed and accuracy on protein alignment benchmark datasets.
Methods
A brief overview of pair hidden Markov models
The pairHMM [1,12] is a generative sequence model that can simultaneously generate a pair of aligned symbol sequences. This is different from the traditional HMMs, which generate only a single symbol sequence at a time [13]. Figure 1 shows two examples of pairHMMs that are widely used in biological sequence analysis. As shown in Figure 1, a typical pairHMM consists of three hidden states I_{x}, I_{y}, and M, which are used to model insertions in sequence x, insertions in sequence y, and matched (i.e., aligned) symbols in both sequences, respectively. The pairHMM generates an aligned sequence pair (x, y) by making transitions between the hidden states according to the specified state transition probabilities. At state I_{x}, the model emits a symbol only to sequence x, while at I_{y}, a symbol is emitted only to sequence y. On the other hand, at state M, the model emits a pair of aligned symbols, where one symbol is added to x and the other symbol is added to y. Figure 1(C) gives an example of a sequence pair (x, y) that is generated by a pairHMM. In this example, the underlying hidden state sequence that gives rise to the two sequences x = AACCG and y = CCGTT is I_{x}I_{x}MMMI_{y}I_{y}. This indicates that the first two symbols (i.e., AA) in x and the last two symbols in y (i.e., TT) are "insertions," which do not have any matching counterpart in the other sequence, while the last three symbols in x and the first three symbols in y (i.e., CCG in both sequences) are jointly generated by the pairHMM, hence closely match each other. As we can see from this example, we can unambiguously identify the alignment of a given sequence pair (x, y), once the underlying hidden state sequence yielding the sequence pair is known. Of course, the hidden state sequence is generally not known, but there exist efficient algorithms that can be used for its prediction. 
For example, we can use the Viterbi algorithm [14] to predict the optimal hidden state sequence that maximizes the observation probability of the sequence pair (x, y). Alternatively, we can also predict the state sequence that maximizes the expected number of correct states, by first estimating the alignment probabilities between the symbols in x and y through the forward and backward procedures [13] and then applying the Needleman-Wunsch algorithm [15]. This will lead to the MEA alignment between the two sequences x and y.
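The second step of the MEA procedure described above can be sketched as a small dynamic program. The following is a minimal illustration, not the implementation used in this work: it assumes the posterior alignment probabilities are already available as an L × M array (here called `post`, a hypothetical name), uses no gap penalties, and maximizes the sum of the probabilities of the aligned pairs.

```python
import numpy as np

def mea_alignment(post):
    """Return (pairs, total): the 0-based aligned index pairs that maximize
    the summed alignment probability, and that maximum sum."""
    L, M = post.shape
    score = np.zeros((L + 1, M + 1))
    for i in range(1, L + 1):
        for j in range(1, M + 1):
            score[i, j] = max(
                score[i - 1, j - 1] + post[i - 1, j - 1],  # align x_i with y_j
                score[i - 1, j],                           # leave x_i unaligned
                score[i, j - 1],                           # leave y_j unaligned
            )
    # Traceback: take the diagonal move only when it is strictly better than
    # both gap moves, so zero-probability pairs are never reported as aligned.
    pairs = []
    i, j = L, M
    while i > 0 and j > 0:
        if score[i, j] == score[i - 1, j]:
            i -= 1
        elif score[i, j] == score[i, j - 1]:
            j -= 1
        else:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
    return pairs[::-1], float(score[L, M])

# Hypothetical posterior matrix for two sequences of length 3.
post = np.array([[0.9, 0.1, 0.0],
                 [0.1, 0.8, 0.1],
                 [0.0, 0.1, 0.7]])
pairs, total = mea_alignment(post)
```

The same routine can be applied to any matrix of scores that is proportional to the posterior alignment probabilities, since scaling all entries by a positive constant does not change the maximizing alignment.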
Figure 1. Pair hidden Markov models. (A) The state transition diagram of a widely used pairHMM. (B) An alternative pairHMM implementation that does not allow transitions between the two insertion states I_{x }and I_{y}. (C) An example of a sequence pair (x, y) that is generated by a pairHMM.
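The example in Figure 1(C) can be traced in a few lines of code. The sketch below is only an illustration of how a known hidden state sequence determines the alignment; the function name `decode_state_path` and the state labels "M", "Ix", and "Iy" are naming choices made here, not part of the original model description.

```python
def decode_state_path(states, x, y):
    """Convert a pairHMM state path into two gapped rows and the list of
    aligned (0-based) index pairs."""
    row_x, row_y, pairs = [], [], []
    i = j = 0
    for s in states:
        if s == "M":        # one symbol emitted to each sequence (aligned)
            row_x.append(x[i])
            row_y.append(y[j])
            pairs.append((i, j))
            i += 1
            j += 1
        elif s == "Ix":     # symbol emitted only to x
            row_x.append(x[i])
            row_y.append("-")
            i += 1
        elif s == "Iy":     # symbol emitted only to y
            row_x.append("-")
            row_y.append(y[j])
            j += 1
        else:
            raise ValueError(f"unknown state: {s}")
    if i != len(x) or j != len(y):
        raise ValueError("state path does not consume both sequences")
    return "".join(row_x), "".join(row_y), pairs

path = ["Ix", "Ix", "M", "M", "M", "Iy", "Iy"]
row_x, row_y, pairs = decode_state_path(path, "AACCG", "CCGTT")
print(row_x)  # AACCG--
print(row_y)  # --CCGTT
```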
Limitations of pairHMMs
Although the hidden state sequence of a pairHMM unambiguously points to a specific sequence alignment, this is not necessarily true the other way around. In fact, several different state sequences can lead to the same sequence alignment, hence we may not always be able to unambiguously determine the underlying state sequence for a given pairwise sequence alignment. For example, let us consider two sequences x = AAACGG and y = AAATTA. Suppose the "true" alignment aligns only the first three symbols (i.e., AAA) of x and y, hence the last three symbols in the respective sequences are regarded as insertions that do not have any matching counterpart in the other sequence. This is illustrated below, where the vertical lines correspond to the aligned symbols:
x: A A A C G G
   | | |
y: A A A T T A          (1)
For the pairHMM shown in Figure 1(A), any hidden state sequence such that s_{1} = s_{2} = s_{3} = M and s_{4} ⋯ s_{9} is a permutation of I_{x} I_{x} I_{x} I_{y} I_{y} I_{y} would lead to the sequence alignment shown in (1). When using this pairHMM for predicting the optimal alignment of a sequence pair with the largest probability, this ambiguity may lead to performance degradation as these potential state sequences compete against each other. For this reason, it is generally more desirable to estimate the symbol alignment probabilities via the pairHMM by considering all potential alignments and state sequences and use the estimated probabilities to find the MEA alignment that is expected to have the maximum number of correctly aligned symbols [9-11]. However, the aforementioned ambiguity also negatively affects the quality of the estimated symbol alignment probabilities, which is especially noticeable for sequence pairs with low percentage identity. In some cases, the alternative pairHMM shown in Figure 1(B) is used to avoid such ambiguity. This alternative pairHMM blocks transitions between the insertion states I_{x} and I_{y}, thereby prohibiting the model from inserting unaligned symbols into both sequences. For example, the alignment shown in (1) would not be allowed based on this alternative pairHMM. However, due to this restriction, the pairHMM in Figure 1(B) has a relatively stronger tendency to align unrelated sequence regions by treating them as mutations. This may again negatively affect the quality of the symbol alignment probabilities estimated based on the pairHMM.
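To make the degree of ambiguity concrete, the competing state sequences for the example with x = AAACGG and y = AAATTA can simply be enumerated. The sketch below is an illustration under the assumption that every ordering of the three I_{x} and three I_{y} states is permitted, as in the pairHMM of Figure 1(A); the helper name `aligned_pairs` is hypothetical.

```python
from itertools import permutations
from math import comb

def aligned_pairs(states):
    """Aligned (0-based) index pairs implied by a pairHMM state path."""
    i = j = 0
    pairs = []
    for s in states:
        if s == "M":
            pairs.append((i, j))
            i += 1
            j += 1
        elif s == "Ix":
            i += 1
        else:  # "Iy"
            j += 1
    return tuple(pairs)

# All distinct orderings of three Ix and three Iy states after M M M.
tails = set(permutations(["Ix"] * 3 + ["Iy"] * 3))
paths = [("M", "M", "M") + tail for tail in tails]

n_paths = len(paths)                              # 20 distinct state paths
alignments = {aligned_pairs(p) for p in paths}    # but only one alignment
print(n_paths, comb(6, 3), len(alignments))       # 20 20 1
```

All 20 state sequences describe the same set of aligned symbol pairs, so the probability mass of this single alignment is split among 20 paths when the most probable path is sought.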
Another potential drawback of pairHMMs is that the associated algorithms (i.e., the Viterbi, forward, and backward algorithms) can become numerically unstable for long sequences. Application of pairHMMs to biological sequence analysis involves computing extremely small probabilities, which decrease exponentially with the sequence length. For example, based on the pairHMM that was used in [9], the observation probability (i.e., the probability that the HMM generates a given sequence pair) of a protein pair is typically on the order of 10^{-230} for proteins of length 80, 10^{-280} for proteins of length 100, and 10^{-320} for proteins of length 120. As a result, pairHMM algorithms are prone to underflow errors unless they are carefully implemented to keep them numerically robust. So far, a number of schemes have been proposed to address this issue, such as using log transformations of the probabilities or normalizing the probabilities to keep them within a reasonable numerical range, and these have been shown to work well for relatively long sequences [1]. However, log transformations can make the forward and backward algorithms considerably slower, and the normalization approach can still lead to underflow errors as the sequences get longer.
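The underflow problem is easy to reproduce. IEEE 754 double-precision normal numbers bottom out near 2.2 × 10^{-308}, so a probability on the order of 10^{-320} already falls below the normal range. The sketch below is only an illustration: the per-position factor of 10^{-3} is a hypothetical value chosen to make the effect visible, not a quantity taken from any trained pairHMM. It also shows the log-domain addition that the forward and backward algorithms need when probabilities are stored as logarithms.

```python
import math
import sys
import numpy as np

p = 1e-3   # hypothetical per-position probability factor
n = 120    # number of positions

# Naive product of probabilities: underflows to exactly 0.0.
naive = 1.0
for _ in range(n):
    naive *= p

# The same quantity kept in the log domain remains representable.
log_prob = n * math.log(p)

# Summing two probabilities stored as logs (needed by the forward and
# backward recursions) without leaving the log domain.
log_a, log_b = -800.0, -801.0
log_sum = float(np.logaddexp(log_a, log_b))

smallest_normal = sys.float_info.min   # about 2.2e-308
print(naive, log_prob, log_sum, smallest_normal)
```

The log-domain sum requires an exponential and a logarithm for every addition, which is the source of the slowdown mentioned above.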
One further disadvantage of pairHMMs is that the algorithms that are used with the model cannot be easily parallelized. Although the Viterbi, forward, and backward algorithms for pairHMMs are relatively efficient, they are still too computationally expensive to be used with very long sequences. Moreover, because the algorithms are not amenable to massive parallelization, pairHMMs are not suitable for large-scale sequence analysis tasks, such as whole genome alignment, despite their superior performance compared to other heuristic methods.
A message-passing scheme for estimating symbol alignment confidence scores
Here, we propose a novel method for aligning biological sequences that can effectively address the aforementioned shortcomings of pairHMMs. The proposed method is based on a message-passing scheme, where messages are iteratively exchanged between neighboring symbol pairs to estimate the level of confidence for potential pairwise symbol alignments. The main underlying motivation is to develop an "analytical" method that can directly estimate the symbol alignment probabilities, without specifically modeling symbol insertions and deletions. This stands in contrast to the pairHMM approach, which is essentially based on a "generative" sequence model that tries to explicitly model symbol insertions/deletions, in addition to symbol alignments. As discussed before, modeling symbol insertions in pairHMMs can lead to subtle issues with potentially negative effects, and considering that our ultimate goal lies in finding an accurate sequence alignment through effective estimation of the symbol alignment probabilities, a method that can directly estimate these probabilities without explicitly modeling insertions/deletions would be desirable.
Suppose x = x_{1}x_{2} ⋯ x_{L} and y = y_{1}y_{2} ⋯ y_{M} are the two sequences to be aligned. We define c_{xy}(i, j) as the symbol alignment confidence score between x_{i} (the i-th symbol in x) and y_{j} (the j-th symbol in y). The score c_{xy}(i, j) provides a quantitative measure of confidence as to whether x_{i} and y_{j} should be aligned to each other or not, and we assume c_{xy}(i, j) ∝ P(x_{i} ~ y_{j} | x, y), where P(x_{i} ~ y_{j} | x, y) is the posterior symbol alignment probability between x_{i} and y_{j} given the sequences x and y. We estimate the alignment confidence score by iteratively passing messages between neighboring symbol pairs, where each symbol pair (x_{i}, y_{j}) corresponds to a potential symbol alignment in the true (unknown) sequence alignment between x and y. For example, during the estimation process, the symbol pair (x_{i}, y_{j}) will exchange messages with its two neighbors (x_{i-1}, y_{j-1}) and (x_{i+1}, y_{j+1}), and similarly, the pair (x_{i+1}, y_{j+1}) will exchange messages with (x_{i}, y_{j}) and (x_{i+2}, y_{j+2}). The message-passing process is illustrated in Figure 2, where the solid lines indicate the messages that are used to update the alignment confidence score c_{xy}(i, j) of the symbol pair (x_{i}, y_{j}). The dashed lines correspond to messages that are used to update the confidence scores of other symbol pairs.
Figure 2. Illustration of the proposed message-passing scheme. At iteration n, the alignment confidence score c_{xy}(i, j) of the symbol pair (x_{i}, y_{j}) is updated based on the messages received from its neighbors (x_{i-1}, y_{j-1}) and (x_{i+1}, y_{j+1}) and the joint occurrence probability P(x_{i}, y_{j}) of the symbols x_{i} and y_{j}. Solid lines indicate the messages that are used to update c_{xy}(i, j), while the dashed lines correspond to messages that are used to update the alignment confidence scores of other symbol pairs.
The pseudocode of the proposed messagepassing algorithm is as follows:
STEP1 Initialize c_{xy }(i, j).
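STEP2 Update the alignment confidence score c_{xy}(i, j) of every symbol pair (x_{i}, y_{j}) based on the scores of its two neighbors (x_{i-1}, y_{j-1}) and (x_{i+1}, y_{j+1}), the joint occurrence probability P(x_{i}, y_{j}), and the weight parameter λ.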
STEP3 Normalize c_{xy}(i, j).
STEP4 If c_{xy}(i, j) has converged, then terminate the algorithm.
Otherwise, go to STEP2.
In STEP1, we first initialize the alignment confidence score c_{xy}(i, j), for which we can simply use random initialization. If a preliminary sequence alignment of x and y is available (e.g., obtained from a simple heuristic method), we can also initialize the score based on this alignment such that c_{xy}(i, j) = 1 if x_{i} and y_{j} are aligned, and c_{xy}(i, j) = 0 otherwise. Next, in STEP2, the alignment confidence score c_{xy}(i, j) of the symbol pair (x_{i}, y_{j}) is updated based on the scores of its two neighbors (x_{i-1}, y_{j-1}) and (x_{i+1}, y_{j+1}). Note that the score is set to c_{xy}(i, j) = 0 if i ∉ {1, ⋯ , L} or j ∉ {1, ⋯ , M}. P(x_{i}, y_{j}) is the joint occurrence probability of the symbol pair (x_{i}, y_{j}), which is essentially equivalent to the joint emission probability of an aligned symbol pair (x_{i}, y_{j}) at the match state M of a pairHMM. It should be noted that this probability P(x_{i}, y_{j}) is not location-dependent and is simply determined by the symbols x_{i} and y_{j}. The weight parameter λ ∈ [0, 1] is used to balance the contribution from the neighbors and that from the joint probability of (x_{i}, y_{j}) in estimating the alignment confidence score. A large λ gives more weight to the "messages" received from the neighbors in estimating the scores, which tends to penalize gaps more heavily, and it generally leads to longer aligned regions with fewer gaps. In contrast, a small λ gives more weight to the joint symbol occurrence probability P(x_{i}, y_{j}) while giving less weight to the messages received from the neighbors, which tends to be more lenient to gaps. Once the symbol alignment confidence score c_{xy}(i, j) is updated for all i = 1, ⋯ , L and j = 1, ⋯ , M, we normalize the scores to keep them within a proper numerical range, as shown in STEP3. For example, a simple way would be to divide the score matrix C = [c_{xy}(i, j)] by its matrix norm to normalize the confidence scores.
After normalization, the updated scores are compared to the scores in the last iteration, and the algorithm terminates if the specified convergence criterion has been met. Otherwise, the algorithm goes back to STEP2 and repeats the message-passing process.
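The four steps can be sketched in code as follows. This is a hedged illustration rather than the reference implementation: the exact update formula of STEP2 is not reproduced in this text, so the sketch assumes a convex combination of the averaged neighbor scores and the joint occurrence probability, weighted by λ. The initialization from P, the convergence test (largest absolute change below `tol`), and the match probabilities 0.19 and 0.02 are likewise assumptions made only for this example.

```python
import numpy as np

def message_passing_scores(P, lam=0.5, tol=1e-6, max_iter=500):
    """Iteratively estimate alignment confidence scores from the matrix P of
    joint occurrence probabilities P[i, j] = P(x_i, y_j)."""
    # STEP1 (assumption): initialize from P instead of random values,
    # so that the example is reproducible.
    C = P / np.linalg.norm(P, 2)
    for n_iter in range(1, max_iter + 1):
        # Scores outside the matrix are treated as 0 (zero padding).
        padded = np.pad(C, 1)
        upper_left = padded[:-2, :-2]    # c(i-1, j-1)
        lower_right = padded[2:, 2:]     # c(i+1, j+1)
        # STEP2 (ASSUMED form of the update, see the lead-in above).
        C_new = lam * 0.5 * (upper_left + lower_right) + (1.0 - lam) * P
        # STEP3: normalize by the matrix 2-norm.
        C_new /= np.linalg.norm(C_new, 2)
        # STEP4 (assumed criterion): stop when the largest change is small.
        delta = np.max(np.abs(C_new - C))
        C = C_new
        if delta < tol:
            break
    return C, n_iter

# Hypothetical joint occurrence probabilities: 0.19 for identical symbols
# and 0.02 otherwise (placeholder values, not trained emission parameters).
x, y = "ACGT", "ACGT"
P = np.array([[0.19 if a == b else 0.02 for b in y] for a in x])
C, n_iter = message_passing_scores(P, lam=0.5)
```

Every entry of the score matrix is updated from the previous iteration only, which is why the whole update can be written as a single array expression; the same property is what makes the scheme suitable for parallel hardware.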
Results and Discussion
Dataset and experimental setup
In order to evaluate the performance of the proposed message-passing scheme, we carried out pairwise sequence alignment experiments based on the BAliBASE 3.0 protein alignment benchmark [16]. BAliBASE is arguably the most widely used benchmark for multiple sequence alignment, and it has been utilized by most multiple sequence alignment algorithms for assessing their performance. The benchmark consists of five reference sets, where Reference 1 consists of two subsets: V1 and V2. Each reference set consists of multiple sequence alignments that satisfy specific criteria, such that different reference sets can be used to test the performance of multiple sequence alignment algorithms under different conditions. For example, each alignment in Reference 2 consists of sequences that share reasonably high identity (> 40%) and "orphan sequences" that share little identity (< 20%) with other sequences in the alignment. Reference sets 4 and 5 are constructed such that every sequence has at least one other sequence in the same alignment whose identity exceeds 20%. Sequences in Reference 4 and Reference 5 may contain large N/C-terminal extensions or internal insertions, respectively. Further details of the BAliBASE 3.0 benchmark can be found in [16].
For every sequence family in BAliBASE 3.0, we performed pairwise sequence alignment for all possible sequence pairs in the given family. The pairwise alignment was performed in the following manner. First, we estimated the probabilistic symbol alignment confidence score using the proposed message-passing scheme. In our experiments, we used three different values of λ (0.25, 0.5, and 0.75) to investigate the effect of λ on the overall sequence alignment performance. For the joint symbol occurrence probability P(x_{i}, y_{j}), we used the joint emission probability (at state M) of the pairHMM that was used in [9]. At the end of each iteration, we normalized the alignment confidence score by dividing the confidence score matrix C by the matrix 2-norm: C ← C/‖C‖_{2}. We terminated the message-passing process once the convergence criterion was met, which compares the current score c_{xy}(i, j) with the score obtained in the previous iteration. Once the scores converged, based on our assumption that c_{xy}(i, j) ∝ P(x_{i} ~ y_{j} | x, y), we used the confidence score c_{xy}(i, j) to find the MEA alignment through dynamic programming. The predicted alignment was compared to the benchmark alignment in BAliBASE 3.0 to compute the sensitivity SN = TP/(TP + FN) and the positive predictive value PPV = TP/(TP + FP), where TP is the number of correctly aligned symbol pairs, FP is the number of incorrectly aligned pairs, and FN is the number of symbol pairs that are aligned in the benchmark alignment but not aligned in the predicted alignment. For comparison, we repeated similar experiments using the pairHMM with the same set of parameters as the one used in [9].
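The two accuracy measures follow directly from the definitions of TP, FP, and FN given above. The sketch below assumes that both the predicted and the benchmark alignment are represented as collections of aligned (i, j) index pairs; the function name `sn_ppv` and the example pairs are hypothetical, and returning 0.0 for an empty denominator is a convention chosen here.

```python
def sn_ppv(predicted, reference):
    """Sensitivity and positive predictive value of a predicted alignment,
    both given as collections of aligned (i, j) index pairs."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)   # correctly aligned pairs
    fp = len(predicted - reference)   # aligned in prediction only
    fn = len(reference - predicted)   # aligned in benchmark only
    sn = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sn, ppv

# Hypothetical example: the benchmark aligns four pairs, the prediction three.
reference = [(0, 0), (1, 1), (2, 2), (3, 3)]
predicted = [(0, 0), (1, 1), (2, 3)]
sn, ppv = sn_ppv(predicted, reference)   # TP = 2, FP = 1, FN = 2
```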
Performance of the proposed message-passing scheme
Table 1 summarizes the pairwise sequence alignment performance of the proposed message-passing scheme and the traditional pairHMM approach. Each row shows the evaluation results on each of the six reference sets (i.e., RV11, RV12, RV20, RV30, RV40, RV50) in BAliBASE 3.0. For each reference set, we estimated the average SN, PPV, and CPU time (for estimating the alignment scores/probabilities) of different alignment schemes based on all possible pairwise sequence alignments: 943 alignments for the reference set RV11, 2,335 alignments for RV12, 50,062 alignments for RV20, 76,370 alignments for RV30, 23,445 alignments for RV40, and 7,538 alignments for RV50. All experiments were performed using Matlab on a Mac Pro workstation with two 2.8 GHz Quad-Core Intel Xeon processors and 32 GB memory.
Table 1. Pairwise sequence alignment performance evaluated on the BAliBASE 3.0 benchmark.
From Table 1, we can clearly see that the proposed message-passing scheme significantly outperforms the pairHMM approach in terms of SN and PPV, for all three values of λ. For example, the message-passing scheme achieved up to 0.23 higher SN and 0.09 higher PPV for λ = 0.25, and up to 0.37 higher SN and 0.19 higher PPV for λ = 0.75. Our experiments showed that a larger λ tends to yield more accurate alignments, while a smaller λ tends to make the algorithm converge faster and is hence computationally more efficient. For example, when the weight parameter was set to λ = 0.25, the message-passing scheme was around 2.3 to 2.5 times faster than the pairHMM, while still yielding much more accurate alignments.
The results in Table 1 demonstrate that, on average, the proposed message-passing scheme considerably improves the quality of sequence alignment over the traditional pairHMM approach. In order to see whether the proposed scheme also leads to a consistent improvement for most sequence pairs, we calculated the difference between SN_{MP} (the sensitivity of the message-passing scheme) and SN_{HMM} (the sensitivity of the pairHMM-based approach) for every pairwise sequence alignment that we performed in our experiments. Similarly, we calculated the difference between PPV_{MP} (the PPV of the message-passing scheme) and PPV_{HMM} (the PPV of the pairHMM approach) for all sequence pairs in BAliBASE 3.0. Figure 3 shows the distributions of SN_{MP} - SN_{HMM} and PPV_{MP} - PPV_{HMM} for all sequence pairs. To avoid any bias from unsuccessful alignments, sequence pairs for which neither method yielded an alignment with at least one correct symbol alignment were excluded. The plots in the left column of Figure 3 show the distributions of SN_{MP} - SN_{HMM}, and those in the right column show the distributions of PPV_{MP} - PPV_{HMM}. The results obtained from the same reference set are shown in the same row, where the first row shows the results on RV11 and the last row shows the results on RV50. As we can see in Figure 3, every single distribution shown in the figure has a much larger probability mass in the right-half plane, which clearly demonstrates that the proposed message-passing scheme outperforms the pairHMM-based approach for most (though not all) sequence pairs. In many cases, the improvements in SN and PPV were quite large (0.4 to 0.8), which shows that the proposed scheme can often find an accurate sequence alignment even when the pairHMM has difficulty aligning the sequences.
Figure 3. Performance comparison between the proposed message-passing scheme and the traditional pairHMM approach. The plots in the left column show the distributions of the sensitivity difference SN_{MP} - SN_{HMM} between the message-passing scheme and the pairHMM-based approach. In the right column, the distributions of the difference between the positive predictive values PPV_{MP} - PPV_{HMM} of the two schemes are shown. Each row shows the evaluation results obtained from each of the six reference sets in BAliBASE 3.0.
Conclusions
In this paper, we proposed a novel method for sequence alignment based on an efficient message-passing approach. Given two biological sequences, the proposed method estimates the symbol alignment confidence scores for all possible symbol pairs. These scores are iteratively computed by exchanging messages between neighboring symbol pairs, and empirical evidence shows that these scores converge within several iterations. The proposed message-passing scheme effectively addresses a number of limitations of the traditional pairHMM-based approach, and extensive performance assessment based on BAliBASE 3.0 shows that the proposed scheme consistently outperforms the pairHMM approach, both in terms of alignment accuracy and computational efficiency. Considering that pairHMMs have been widely adopted by many modern multiple sequence alignment algorithms [9-11], the proposed scheme has the potential to further improve the current state of the art. Furthermore, the proposed scheme is numerically stable even for extremely long sequences. Unlike the pairHMM approach, there is no global measure or quantity (such as the observation probability P(x, y) of the entire sequence pair) to be estimated, and the exchanged messages (i.e., symbol alignment confidence scores) are normalized after each iteration, which ensures that they lie within a reasonable numerical range. Finally, the simple iterative estimation process, in which the neighboring symbol pairs only exchange "local" messages, makes the proposed message-passing scheme amenable to massive parallelization through the utilization of modern GPU (graphics processing unit) architecture. These characteristics open up the possibility of applying the proposed message-passing scheme to accurate probabilistic alignment of genome-scale sequences, which has not been possible using traditional pairHMMs.
It is also worth noting that the formula that is used to update c_{xy}(i, j) in the proposed message-passing algorithm bears conceptual similarity to the eigenvalue equation used by the network alignment algorithm called IsoRank [17] for estimating the functional similarity between proteins across different protein-protein interaction (PPI) networks. As demonstrated in [18,19], techniques that were originally developed for sequence alignment may also have the potential to improve network alignment methods. Conversely, techniques used in network alignment may also lead to better sequence alignment methods. For example, the scoring scheme used by IsoRank can be viewed as a random walk [20], and it was shown that the use of a different random walk scheme can lead to more accurate network alignment results [19]. Similarly, it may be possible to modify the update formula for c_{xy}(i, j) to further improve the performance of the proposed message-passing scheme, and we are currently in the process of investigating several different implementations.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
BJY conceived the idea, performed the simulations, analyzed the results, and wrote the paper.
Acknowledgements
This work was supported by the National Science Foundation through NSF Award CCF-1149544.
Declarations
Publication of this article was funded by the National Science Foundation through NSF Award CCF-1149544.
This article has been published as part of BMC Genomics Volume 15 Supplement 1, 2014: Selected articles from the Twelfth Asia Pacific Bioinformatics Conference (APBC 2014): Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/15/S1.
References

1. Durbin R, Eddy SR, Krogh A, Mitchison G: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press; 1998.
2. Phillips A, Janies D, Wheeler W: Multiple sequence alignment in phylogenetic analysis. Mol Phylogenet Evol 2000, 16:317-330.
3. Cuff JA, Barton GJ: Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins 2000, 40:502-511.
4. Notredame C: Recent progress in multiple sequence alignment: a survey. Pharmacogenomics 2002, 3:131-144.
5. Edgar RC, Batzoglou S: Multiple sequence alignment. Curr Opin Struct Biol 2006, 16:368-373.
6. Pei J: Multiple protein sequence alignment. Curr Opin Struct Biol 2008, 18:382-386.
7. Kumar S, Filipski A: Multiple sequence alignment: in pursuit of homologous DNA positions. Genome Res 2007, 17:127-135.
8. Miyazawa S: A reliable sequence alignment method based on probabilities of residue correspondences. Protein Eng 1995, 8(10):999-1009.
9. Do CB, Mahabhashyam MS, Brudno M, Batzoglou S: ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res 2005, 15(2):330-340.
10. Sahraeian SM, Yoon BJ: PicXAA: greedy probabilistic construction of maximum expected accuracy alignment of multiple sequences. Nucleic Acids Res 2010, 38(15):4917-4928.
11. Hamada M, Asai K: A classification of bioinformatics algorithms from the viewpoint of maximizing expected accuracy (MEA). J Comput Biol 2012, 19(5):532-549.
12. Yoon BJ: Hidden Markov models and their applications in biological sequence analysis. Curr Genomics 2009, 10(6):402-415.
13. Rabiner LR: A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 1989, 77(2):257-286.
14. Viterbi A: Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. Information Theory, IEEE Transactions on 1967, 13(2):260-269.
15. Needleman SB, Wunsch CD: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 1970, 48(3):443-453.
16. Thompson JD, Koehl P, Ripp R, Poch O: BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark. Proteins 2005, 61:127-136.
17. Singh R, Xu J, Berger B: Global alignment of multiple protein interaction networks with application to functional orthology detection. Proc Natl Acad Sci USA 2008, 105(35):12763-12768.
18. Flannick J, Novak A, Srinivasan BS, McAdams HH, Batzoglou S: Graemlin: general and robust alignment of multiple large interaction networks. Genome Res 2006, 16(9):1169-1181.
19. Sahraeian SM, Yoon BJ: SMETANA: Accurate and Scalable Algorithm for Probabilistic Alignment of Large-Scale Biological Networks. PLoS ONE 2013, 8(7):e67995.
20. Yoon BJ, Qian X, Sahraeian SME: Comparative analysis of biological networks: Hidden Markov model and Markov chain-based approach.