
This article is part of the supplement: Symposium of Computations in Bioinformatics and Bioscience (SCBB06)

Open Access Research

Maximum common subgraph: some upper bound and lower bound results

Xiuzhen Huang1*, Jing Lai2 and Steven F Jennings3

Author Affiliations

1 Department of Computer Science, Arkansas State University, State University, Arkansas 72467, USA

2 Department of Applied Science, University of Arkansas at Little Rock, Little Rock, Arkansas 72204, USA

3 Department of Information Science, University of Arkansas at Little Rock, Little Rock, Arkansas 72204, USA


BMC Bioinformatics 2006, 7(Suppl 4):S6  doi:10.1186/1471-2105-7-S4-S6

The electronic version of this article is the complete one and can be found online at:


Published: 12 December 2006

© 2006 Huang et al; licensee BioMed Central Ltd

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Structure matching plays an important part in understanding the functional role of biological structures. Bioinformatics assists in this effort by reformulating this process into a problem of finding a maximum common subgraph between graphical representations of these structures. Among the many different variants of the maximum common subgraph problem, the maximum common induced subgraph of two graphs is of special interest.

Results

Based on current research in the area of parameterized computation, we derive a new lower bound for the exact algorithms of the maximum common induced subgraph of two graphs which is the best currently known. Then we investigate the upper bound and design techniques for approaching this problem, specifically, reducing it to one of finding a maximum clique in the product graph of the two given graphs. Considering the upper bound result, the derived lower bound result is asymptotically tight.

Conclusion

Parameterized computation is a viable approach with great potential for investigating many applications within bioinformatics, such as the maximum common subgraph problem studied in this paper. With an improved hardness result and the proposed approaches in this paper, future research can be focused on further exploration of efficient approaches for different variants of this problem within the constraints imposed by real applications.

Background

Introduction

Of the many challenging problems related to understanding the biological function of DNA, RNA, proteins, and metabolic and signalling pathways, one of the most important is comparing the structure of different molecules. The hypothesis is that structure determines function and therefore it should follow that molecules with similar structure should have similar function. Evaluating the similarity of structures can be reduced to a comparison of a set of abstracted graphs if the biological structures can be abstracted as graphs.

Using bioinformatic techniques, biological structure matching can be formulated as a problem of finding the maximum common subgraph. The solution to this problem has important practical applications in many areas of bioinformatics as well as in other areas, such as pattern recognition and image processing [1-3]. For example, protein threading, an effective method to predict protein tertiary structure [4-8], and RNA structural homology searching, a method for annotating and identifying new non-coding RNAs [9-12], both align a target structure against structure templates in a template database.

Song et al. [13] make the following definitions and propose the following graphical models for RNA structural homology searching. A structural unit in a biopolymer sequence is a stretch of contiguous residues (nucleotides or amino acids). A non-structural stretch between two consecutive structural units is called a loop. A structure of the sequence is characterized by interactions among structural units. For example, the structural units in a protein tertiary structure are α helices and β strands, called cores. Given a biopolymer sequence, a structure graph H = (V, E, A) can be defined such that each vertex in V(H) represents a structural unit, each edge in E(H) represents the interaction between two structural units, and each arc in A(H) represents the loop bounded by two structural units. Similarly, the target sequence can also be represented as a mixed graph G, called a sequence graph. Based on these graphical representations, the structure-sequence alignment problem can be formulated as the problem of finding in the sequence graph G a subgraph isomorphic to the structure graph H such that the alignment score is optimized.

Problem Definition

Throughout this paper, we use the basic definitions and terminology from [1]. All graphs are simple, undirected graphs. Two graphs are isomorphic if there is a one-to-one correspondence between their vertices such that there is an edge between two vertices in one graph if and only if there is an edge between the two corresponding vertices in the other graph. An induced subgraph G' of a graph G = (V, E) consists of a vertex subset V' ⊆ V together with all edges (u, v) ∈ E for which u, v ∈ V'. A graph G12 is a common induced subgraph of two given graphs G1 and G2 if G12 is isomorphic to an induced subgraph G'1 of G1 as well as to an induced subgraph G'2 of G2. A maximum common induced subgraph (MCIS) of two given graphs G1 and G2 is a common induced subgraph G12 with the maximum number of vertices. Similarly, a maximum common edge subgraph (MCES) is a subgraph with the maximum number of edges common to the two given graphs. The MCIS (or MCES) problem can be further divided into a connected case and a disconnected case. All of these cases are useful within different biological contexts.
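The definitions above can be made concrete with a small brute-force sketch. It is an illustration only, not the method studied in this paper: the representation (vertex lists plus a set of frozenset edges) and the names adjacent and mcis_size are assumptions introduced here, and the search is exponential, so it is usable only on very small graphs.

```python
from itertools import combinations, permutations

def adjacent(edges, u, v):
    """True if the undirected edge {u, v} is in the edge set."""
    return frozenset((u, v)) in edges

def mcis_size(v1, e1, v2, e2):
    """Brute-force size of a maximum common induced subgraph (tiny graphs only)."""
    for k in range(min(len(v1), len(v2)), 0, -1):
        for s1 in combinations(v1, k):
            # Try every ordered choice of k vertices from the second graph;
            # position i of s1 is mapped to position i of s2.
            for s2 in permutations(v2, k):
                # Induced: edges AND non-edges must agree under the mapping.
                if all(adjacent(e1, s1[i], s1[j]) == adjacent(e2, s2[i], s2[j])
                       for i in range(k) for j in range(i + 1, k)):
                    return k
    return 0

# Hypothetical example: a path on four vertices versus a triangle.
path_v = [1, 2, 3, 4]
path_e = {frozenset(p) for p in [(1, 2), (2, 3), (3, 4)]}
tri_v = ["a", "b", "c"]
tri_e = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("a", "c")]}
```

For the path and the triangle, the largest common induced subgraph is a single edge, since the path contains no triangle and the triangle contains no pair of non-adjacent vertices.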

Figure 1 gives an illustration of MCIS of two graphs. In this figure, the maximum common induced subgraph of G1 and G2 contains four vertices (2, 3, 4 and 5) and the maximum common edge subgraph of them involves five vertices (1 through 5).

Figure 1. MCIS of two graphs. For G1 and G2, the maximum common induced subgraph contains four vertices, and the maximum common edge subgraph involves five vertices.

MCES can be transformed into a formulation of MCIS. Interested readers are referred to [1] for details of the transformation. Here we focus on the maximum common induced subgraph (MCIS) problem. For convenience, we call it the maximum common subgraph problem.

The maximum common subgraph problem is NP-complete [14], and therefore no polynomial-time algorithm for it exists unless P = NP. In fact, the maximum common subgraph problem is APX-hard [15], which means that it admits no polynomial-time approximation scheme unless P = NP. It is a well-known intractable combinatorial problem. Approaches for the maximum common subgraph problem and its variants have been studied intensively in the literature [1].

In this paper, we derive a strong lower bound for the maximum common subgraph problem in light of recent progress in parameterized computation. We then present approaches for addressing this problem.

Methods

Parameterized Computation and Recent Progress on Parameterized Intractability

Many problems with important real-world applications in life science are NP-hard within the context of the theory of NP-completeness. This excludes the possibility of solving them in polynomial time unless P = NP. For example, the problems of cleaning up data, aligning multiple sequences, finding the closest string, and identifying the maximum common substructure are all famous NP-hard problems in bioinformatics [16-18,1]. A number of approaches have been proposed in dealing with these NP-hard problems. For example, the highly-acclaimed approximation approach [19] tries to come up with a "good enough" solution in polynomial time instead of an optimal solution for an NP-hard optimization problem [20-23].

The theory of parameterized computation [17] is a recently developed approach for addressing NP-hard problems with small parameters. It seeks exact algorithms for an NP-hard problem when its natural parameter is small (even if the problem size is large). A parameterized problem Q is a decision problem consisting of instances of the form (x, k), where x is the problem description and the integer k ≥ 0 is called the parameter. The parameterized problem Q is fixed-parameter tractable [17] if it can be solved in time f(k)|x|^{O(1)}, where f is a recursive function. The class FPT contains all the problems that are fixed-parameter tractable. In this paper, we assume that complexity functions are "nice", with both the domain and range being non-negative integers and with the values of the functions and their inverses being easily computed. For two functions f and g, we write f(n) = o(g(n)) if there is a nondecreasing and unbounded function λ such that f(n) ≤ g(n)/λ(n). A function f is subexponential if f(n) = 2^{o(n)}.

For a problem in the class FPT, research focuses on identifying more efficient parameterized algorithms. There are many effective techniques for designing parameterized algorithms, including the methods of "bounded search tree" and "reduction to a problem kernel". A well-known example of a problem in FPT is the vertex cover problem.

Definition

Vertex cover problem: given a graph G and an integer k, determine if G has a vertex cover C of k vertices, i.e., a subset C of k vertices in G such that every edge in G has at least one endpoint in C. Here, the parameter is k.

Given a graph of n vertices, there is a parameterized algorithm that can solve the vertex cover problem in time O(kn + 1.286^k) [24].
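To illustrate the "bounded search tree" technique on vertex cover, the following is a minimal sketch of the textbook two-way branching, which explores at most 2^k branches. It is not the O(kn + 1.286^k) algorithm of [24]; the function name has_vertex_cover and the edge-list representation are assumptions made for this example, and the sketch decides whether a cover of at most k vertices exists.

```python
def has_vertex_cover(edges, k):
    """Decide whether the edge list has a vertex cover of at most k vertices."""
    if not edges:
        return True   # nothing left to cover
    if k == 0:
        return False  # edges remain but the budget is exhausted
    u, v = edges[0]
    # Any cover must contain u or v, so branch on both choices.
    for pick in (u, v):
        remaining = [e for e in edges if pick not in e]
        if has_vertex_cover(remaining, k - 1):
            return True
    return False

# Hypothetical example: a triangle needs two vertices to cover all edges.
triangle = [(1, 2), (2, 3), (1, 3)]
```

The depth of the recursion is bounded by the parameter k rather than by the size of the graph, which is what makes the running time of the form f(k)n^{O(1)}.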

Accompanying the work on designing efficient and practical parameterized algorithms, a theory of parameterized intractability has also been developed [17]. In parameterized complexity, to classify fixed-parameter intractable problems, a hierarchy of classes (the W-hierarchy ∪_{t ≥ 0} W[t], where W[t] ⊆ W[t+1] for all t ≥ 0) has been introduced, in which the 0-th level W[0] is the class FPT. Hardness and completeness have been defined for each level W[i] of the W-hierarchy for i ≥ 1, and a large number of W[i]-hard parameterized problems have been identified [17]. For example, the clique problem is W[1]-hard.

Definition

Clique problem: given a graph G and an integer k, determine if G has a clique C of k vertices, i.e., a subset C of k vertices in G such that there is an edge in G between any two of these k vertices, i.e., the k vertices induce a complete subgraph of G. Here the parameter is k.

The clique problem can be solved in time O(n^k), based on the enumeration of all the vertex subsets of size k for a given graph with n vertices.
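The enumeration just described can be sketched as follows. The function name find_k_clique and the graph representation (a vertex list and a set of frozenset edges) are assumptions made for this illustration; each candidate subset additionally costs O(k^2) adjacency tests.

```python
from itertools import combinations

def find_k_clique(vertices, edges, k):
    """Return some k-clique as a tuple of vertices, or None if none exists."""
    for subset in combinations(vertices, k):
        # A subset is a clique when every pair of its vertices is adjacent.
        if all(frozenset(pair) in edges for pair in combinations(subset, 2)):
            return subset
    return None

# Hypothetical example: a triangle {1, 2, 3} with a pendant vertex 4.
example_vertices = [1, 2, 3, 4]
example_edges = {frozenset(p) for p in [(1, 2), (2, 3), (1, 3), (3, 4)]}
```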

It has become commonly accepted that no W[1]-hard (and W[i]-hard, i > 1) problem can be solved in time f(k)n^{O(1)} for any function f (i.e., W[1] ≠ FPT). W[1]-hardness has thus served as the working hypothesis for fixed-parameter intractability. An example is a recent result by Papadimitriou and Yannakakis [25], showing that the database query evaluation problem is W[1]-hard. This provides strong evidence that the problem cannot be solved by an algorithm whose running time is of the form f(k)n^{O(1)}, thus excluding the possibility of a practical algorithm for the problem even if the parameter k (the size of the query) is small, as in most practical cases.

Based on the W[1]-hardness of the clique problem, the computational intractability of problems in bioinformatics has been derived [26-31]. In [31], the authors point out that "Unless an unlikely collapse in the parameterized hierarchy occurs, the results proved in [31] that the problems longest common subsequence and shortest common supersequence are W[1]-hard rule out the existence of exact algorithms with running time f(k)n^{O(1)} (i.e., exponential only in k) for those problems. This does not mean that there are no algorithms with much better asymptotic time-complexity than the known O(n^k) algorithms based on dynamic programming, e.g., algorithms with running time n^{√k} are not deemed impossible by our results."

Recent investigations have derived stronger computational lower bounds for well-known NP-hard parameterized problems [32,33]. For example, for the clique problem – which asks if a given graph of n vertices has a clique of size k – it is proved that unless an unlikely collapse occurs in parameterized complexity theory, the problem is not solvable in time f(k)n^{o(k)} for any function f. Note that this lower bound is asymptotically tight in the sense that the trivial algorithm that enumerates all subsets of k vertices in a given graph to test the existence of a clique of size k runs in time O(n^k).

Based on the hardness of the clique problem, lower bound results for a number of bioinformatics problems have been derived [34]. For example, our results for the problems longest common subsequence and shortest common supersequence have significantly strengthened the results in [31] and advanced the understanding of the complexity of these problems. We show that it is actually unlikely that the problems can be solved in time n^{γ(k)} for any sublinear function γ(k), and that the known dynamic programming algorithms of running time O(n^k) for the problems are actually asymptotically optimal.

In the following section, we derive the lower bound for exact algorithms of the maximum common subgraph problem.

Lower Bound for Maximum Common Subgraph Problem

The formal parameterized version of the maximum common subgraph problem is defined below; we choose the number of vertices in the common subgraph as the parameter. Based on a reduction from the parameterized clique problem to the parameterized common subgraph problem, we derive the hardness result for the parameterized common subgraph problem.

An NP optimization problem Q is a four-tuple (IQ, SQ, fQ, optQ) [19], where:

1. IQ is the set of input instances. It is recognizable in polynomial time;

2. For each instance x ∈ IQ, SQ(x) is the set of feasible solutions for x, which is defined by a polynomial p and a polynomial time computable predicate π (p and π only depend on Q); SQ(x) = {y: |y| ≤ p(|x|) and π(x, y)};

3. fQ(x, y) is the objective function mapping a pair x ∈ IQ and y ∈ SQ(x) to a non-negative integer; the function fQ is computable in polynomial time;

4. optQ∈ {max, min}. Q is called a maximization problem if optQ = max and a minimization problem if optQ = min.

An NP optimization problem Q can be parameterized in a natural way as follows [35,32]:

Definition

Let Q = (IQ, SQ, fQ, optQ) be an NP optimization problem. The parameterized version of Q is defined as:

1. If Q is a maximization problem, then the parameterized version of Q is defined as Q≥ = {(x, k) | x ∈ IQ ∧ optQ(x) ≥ k};

2. If Q is a minimization problem, then the parameterized version of Q is defined as Q≤ = {(x, k) | x ∈ IQ ∧ optQ(x) ≤ k}.

We now provide the definitions of the maximum common subgraph problem and the parameterized common subgraph problem.

Definition

Maximum common subgraph problem:

Input: two graphs G1 = (V1, E1) and G2 = (V2, E2).

Output: the maximum common vertex-induced subgraph of the two graphs G1 and G2.

Definition

Parameterized common subgraph problem:

Input: two graphs G1 = (V1, E1) and G2 = (V2, E2), and a positive integer k;

Parameter: k;

Output: "Yes", if there is a common vertex-induced subgraph of k vertices, i.e., a common subgraph of size k of the two graphs G1 and G2. Otherwise, output "No".

Lemma 1

The parameterized common subgraph problem is W[1]-hard.

Proof: We will give an FPT-reduction from clique to the parameterized common subgraph problem as follows.

Given an instance (G, k) of the clique problem, where the graph G has n vertices and k is a positive integer, we construct an instance of the parameterized common subgraph problem as follows: let G1 be the graph G, and let G2 be a complete graph of k vertices. The question is then: "Is there a common vertex-induced subgraph of k vertices for the graphs G1 and G2?"

We can verify that the graph G has a clique of size k if and only if the graphs G1 and G2 have a common subgraph of k vertices. Since the reduction may be finished in polynomial time O(nk), the reduction is an FPT-reduction from clique to parameterized common subgraph problem.
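The construction in this proof is simple enough to sketch directly. In the sketch below, the function name clique_to_common_subgraph, the labels used for the vertices of the complete graph, and the representation of a graph as a pair (vertex list, set of frozenset edges) are all assumptions made for illustration.

```python
from itertools import combinations

def clique_to_common_subgraph(vertices, edges, k):
    """Map a clique instance (G, k) to a common subgraph instance (G1, G2, k)."""
    # G1 is simply a copy of the input graph G.
    g1 = (list(vertices), set(edges))
    # G2 is the complete graph on k fresh vertices (labels are arbitrary).
    kv = [("c", i) for i in range(k)]
    ke = {frozenset(pair) for pair in combinations(kv, 2)}
    # The parameter is passed through unchanged.
    return g1, (kv, ke), k

# Hypothetical example: a triangle with a pendant vertex, asking for a 3-clique.
g_vertices = [1, 2, 3, 4]
g_edges = {frozenset(p) for p in [(1, 2), (2, 3), (1, 3), (3, 4)]}
instance = clique_to_common_subgraph(g_vertices, g_edges, 3)
```

Because every induced subgraph of G2 on k vertices is the complete graph itself, a common induced subgraph of k vertices exists exactly when G contains a k-clique, and the parameter of the new instance equals the original k.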

To prove our main result, we will use the definitions of linear FPT-reduction and Wl[1]-hardness [36]:

Definition

A parameterized problem Q is linear FPT-reducible, or more precisely, FPTl-reducible, to a parameterized problem Q' if there exist a function f and an algorithm A of running time f(k)n^{O(1)} that, on each (k, n)-instance x of Q, produces a (k', n')-instance x' of Q', where k' = O(k), n' = n^{O(1)}, and x is a yes-instance of Q if and only if x' is a yes-instance of Q'.

Linear FPT-reduction has the transitivity property [36,34]. The transitivity of the FPTl-reduction is proved in the following lemma:

Lemma 2

Let Q1, Q2 and Q3 be three parameterized problems. If Q1 is FPTl-reducible to Q2, and Q2 is FPTl-reducible to Q3, then Q1 is FPTl-reducible to Q3.

Proof: If Q1 is FPTl-reducible to Q2, then there exist a function f1 and an algorithm A1 of running time f1(k1) n1^{o(k1)} m1^{O(1)} such that, for each (k1, n1, m1)-instance x1 of Q1, the algorithm A1 produces a (k2, n2, m2)-instance x2 of Q2, where n2 = n1^{O(1)}, m2 = m1^{O(1)}, and k2 = c1k1, where c1 is a constant.

If Q2 is FPTl-reducible to Q3, then there exist a function f2 and an algorithm A2 of running time f2(k2) n2^{O(k2)} m2^{O(1)} such that, on each (k2, n2, m2)-instance x2 of Q2, the algorithm A2 produces a (k3, n3, m3)-instance x3 of Q3, where k3 = O(k2), n3 = n2^{O(1)}, and m3 = m2^{O(1)}.

We now have an algorithm A that reduces Q1 to Q3, as follows. For a given (k1, n1, m1)-instance x1 of Q1, A first calls the algorithm A1 on x1 to construct a (k2, n2, m2)-instance x2 of Q2, where k2 = c1k1, n2 = n1^{O(1)}, and m2 = m1^{O(1)}. Then A calls the algorithm A2 on x2 to construct a (k3, n3, m3)-instance x3 of Q3. Since each of the two reductions preserves yes-instances, x3 is a yes-instance of Q3 if and only if x1 is a yes-instance of Q1. Moreover, from k2 = c1k1 and k3 = O(k2), we have k3 = O(k1), and from n2 = n1^{O(1)}, m2 = m1^{O(1)}, n3 = n2^{O(1)}, and m3 = m2^{O(1)}, we get n3 = n1^{O(1)} and m3 = m1^{O(1)}. Finally, since the invocation of algorithm A1 on x1 takes time f1(k1) n1^{o(k1)} m1^{O(1)}, the invocation of algorithm A2 on x2 takes time f2(k2) n2^{O(k2)} m2^{O(1)}, and k2 = c1k1, n2 = n1^{O(1)}, and m2 = m1^{O(1)}, we conclude that the running time of algorithm A is bounded by f(k1) n1^{O(k1)} m1^{O(1)}, where f(k1) = f1(k1) + f2(c1k1). By definition, A is an FPTl-reduction from Q1 to Q3; i.e., Q1 is FPTl-reducible to Q3.

Definition

A parameterized problem Q is W[1]-hard under the FPTl-reduction, or more precisely Wl[1]-hard, if the Weighted antimonotone CNF 2SAT (abbreviated WCNF 2-SAT⁻) problem is FPTl-reducible to Q.

In particular, it has been shown [32,33] that the clique problem is Wl[1]-hard.

Lemma 3

(From Theorem 5.2 of [33]) Unless all SNP problems are solvable in subexponential time, no Wl[1]-hard problem can be solved in time f(k)n^{o(k)} for any recursive function f.

Note: Papadimitriou and Yannakakis [30] have introduced the class SNP, which contains many well-known NP-hard problems. Some of these problems have been the major targets in the study of exact algorithms, but have so far resisted all efforts to develop subexponential time algorithms for them. Thus, it is commonly agreed that it is unlikely that all SNP problems are solvable in subexponential time. A recent result showed the equivalence between the statement that "all SNP problems are solvable in subexponential time" and the collapse of a parameterized class called Mini[1] [37] to FPT, which is also considered an unlikely collapse in parameterized computation.

Lemma 4

The parameterized common subgraph problem is Wl[1]-hard.

Proof: In the reduction given in the proof of Lemma 1, the parameter is unchanged and the size of the constructed instance is polynomial in n, so the reduction from the clique problem to the parameterized common subgraph problem is a linear FPT-reduction.

Based on the transitivity property of the linear FPT-reduction (Lemma 2) and the fact that the clique problem is Wl[1]-hard, the parameterized common subgraph problem is Wl[1]-hard. Consequently, it cannot be solved in time f(k)n^{o(k)}, where k is the number of vertices in the common subgraph and f is any recursive function, unless some unlikely collapse (Mini[1] = FPT) occurs in parameterized computation.

From Lemma 4 and Lemma 3, we have the following theorem:

Theorem

Given two graphs G1 and G2 with each graph having n vertices, there is no algorithm of time f(k)n^{o(k)} for the parameterized common subgraph problem, where k is the number of vertices in the common subgraph and f is any recursive function, unless some unlikely collapse (Mini[1] = FPT) occurs in parameterized computation.

In consideration of the upper-bound result, we now show that our lower-bound result for the maximum common subgraph problem presented here is asymptotically tight.

Upper Bound – Clique Based Approaches

The following approach for the maximum common subgraph problem is based on the reduction [15,1] from a maximum common subgraph problem to the maximum clique problem.

From two graphs G1 = (V1, E1) and G2 = (V2, E2), a new graph G = (V, E) is derived as follows. Let V = V1 × V2 and call V a set of pairs. Call two pairs <u1, u2> and <v1, v2> compatible if u1 ≠ v1 and u2 ≠ v2 and if they preserve the edge relation, that is, there is an edge between u1 and v1 if and only if there is an edge between u2 and v2. Let E contain an edge between two pairs exactly when they are compatible. A k-clique in the new graph G can be interpreted as a matching between two induced k-node subgraphs. The two subgraphs are isomorphic since the compatible pairs preserve the edge relations. The new graph G is called the modular product graph of the two graphs G1 and G2.
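A minimal sketch of this construction follows. The function name modular_product and the graph representation (vertex lists and sets of frozenset edges) are assumptions made for illustration; the compatibility test mirrors the definition in the text.

```python
from itertools import combinations, product

def modular_product(v1, e1, v2, e2):
    """Build the modular product graph of G1 = (v1, e1) and G2 = (v2, e2)."""
    pairs = list(product(v1, v2))  # V = V1 x V2
    edges = set()
    for (u1, u2), (w1, w2) in combinations(pairs, 2):
        # Compatible: distinct in both coordinates, and the edge relation
        # between u1 and w1 in G1 matches that between u2 and w2 in G2.
        if u1 != w1 and u2 != w2:
            if (frozenset((u1, w1)) in e1) == (frozenset((u2, w2)) in e2):
                edges.add(frozenset(((u1, u2), (w1, w2))))
    return pairs, edges

# Hypothetical example: two graphs that each consist of a single edge.
a_vertices, a_edges = [1, 2], {frozenset((1, 2))}
b_vertices, b_edges = ["a", "b"], {frozenset(("a", "b"))}
prod_vertices, prod_edges = modular_product(a_vertices, a_edges, b_vertices, b_edges)
```

In this example the product graph has four vertices and two edges, one for each of the two ways of matching the single edge of the first graph onto the single edge of the second.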

We suppose n = |V1| = |V2| (the analysis for the case |V1| ≠ |V2| is similar and is thus omitted). From the construction of G, we have |V| = n^2. On closer inspection, G is an n-partite graph whose vertices are partitioned into n disjoint partitions, each containing n vertices.

We may use a matrix to denote the n^2 vertices of the n-partite graph with n vertices in each partition.

v{1,1}, v{1,2}, ..., v{1,n}

v{2,1}, v{2,2}, ..., v{2,n}

... ...

v{n,1}, v{n,2}, ..., v{n,n}

The n vertices of the first row v{1,i}, 1 ≤ i ≤ n, belong to partition one of the n-partite graph. The n vertices of the second row v{2,i}, 1 ≤ i ≤ n, belong to partition two, and so on.

There is no edge between any two vertices within the same partition. Edges appear only between vertices in different partitions, so at most one vertex from each of the n partitions can be in a clique of the graph. Therefore, to find a clique of size k, there are n^k possible ways of choosing the clique vertices. For each possible way, the algorithm needs O(k^2) time to check whether the chosen vertices form a clique of size k. This gives an algorithm of time O(n^k k^2) for the maximum common subgraph problem. We call this algorithm ALG-COMMON SUBGRAPH for the convenience of the following discussion.
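The enumeration behind ALG-COMMON SUBGRAPH can be sketched as below. This is one possible reading of the description rather than the authors' implementation: the function name alg_common_subgraph and the input format (a list of partitions plus a set of frozenset edges) are assumptions, and the sketch also enumerates which k partitions contribute a vertex, so for k < n it may examine more candidates than the n^k counted in the text.

```python
from itertools import combinations, product

def alg_common_subgraph(partitions, edges, k):
    """Search an n-partite graph for a k-clique with one vertex per partition."""
    # Choose which k partitions contribute a vertex to the clique.
    for rows in combinations(partitions, k):
        # Choose one vertex from each of the selected partitions.
        for choice in product(*rows):
            # O(k^2) check that all chosen vertices are pairwise adjacent.
            if all(frozenset(p) in edges for p in combinations(choice, 2)):
                return choice
    return None

# Hypothetical example: two partitions with a single cross edge {1, 4}.
example_partitions = [[1, 2], [3, 4]]
example_cross_edges = {frozenset((1, 4))}
```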

This problem – when the maximum clique size k is equal to n – has been studied by Sze et al [38]:

Definition

Given an n-partite graph G with n vertices in each part, the n-CLIQUEnp problem finds an n-clique in the graph G.

For this problem, they developed a fast and exact divide-and-conquer approach. The basic idea of this novel approach is to subdivide the given n-partite graph into several n0-partite subgraphs with n0 < n and solve each smaller subproblem independently using a branch-and-bound approach as long as the number of cliques of size n0 in each subproblem is not too high. The reader is referred to [38] for the details of this divide-and-conquer approach. However, their approach in the worst case still has the same upper bound.

Given this O(n^k k^2)-time algorithm for the maximum common subgraph problem, the lower bound result of our Theorem is asymptotically tight.

When the number of vertices k in the common subgraph is not far from the value of n, we write k = n – c, where c is a constant. We illustrate the basic idea for c = 1 as follows [39]: Suppose the n-partite graph G has a clique C of size n – 1. We add one more vertex to each of the n partitions, and we add edges from each new vertex to all vertices (except the newly added vertices) that are not in the same partition. This yields a new graph G', which is an n-partite graph with n + 1 vertices in each partition. The new graph G' has a clique C' of size n if and only if the original n-partite graph G has a clique of size n – 1. The vertices of this clique C' are the vertices of the original clique C together with one newly added vertex.

For the newly constructed graph G', we can now apply the algorithm ALG-COMMON SUBGRAPH without any change, which takes time O((n+1)^n n^2). After we find the clique C', we simply remove the newly added vertex and return the other vertices of C'.

Similarly, if the n-partite graph G has a clique of size n – c, where c is a positive integer constant, we can find the clique by adding c new vertices and associated edges as described above and then applying the algorithm ALG-COMMON SUBGRAPH, which runs in time O((n+c)^n n^2).

This simple idea of dealing with cliques of a size less than n is useful since it makes the algorithm ALG-COMMON SUBGRAPH work uniformly for finding cliques of different sizes on n-partite graphs. We now give an algorithm for finding cliques of size n – c.

Algorithm for (K-C)-CLIQUE

INPUT: an n-partite graph G, with n vertices in each partition, and a small constant c, where c is a positive integer;

OUTPUT: a clique of size no less than n – c;

Step 1: For i = 0 to c do

• Step 1.1: Construct a new graph G1, by adding i new vertices to each partition of the graph G and adding edges from each of the new vertices to any vertices (except the newly added vertices) that are not in the same partition.

• Step 1.2: Apply the algorithm ALG-COMMON SUBGRAPH on the graph G1.

• Step 1.3: If a clique C1 is found, then return "a clique C of size n – i has been found" (C is constructed by removing all the newly-added vertices from the clique C1).

• Endfor

Step 2: Return "no clique has been found".

We now propose two approaches for the maximum common subgraph problem which are based on the relationship between the vertex cover problem and the clique problem:

Algorithm 1: ALG-APPROX-CLIQUE

INPUT: an n-partite graph G, with n vertices in each partition, and a small constant c, where c is a positive integer;

OUTPUT: a clique for the graph G.

Step 1. Compute the complement graph G' of the modular product graph G = (V, E) of the graphs G1 and G2;

Step 2. Apply the approximation algorithm for the vertex cover problem on G' to get a vertex cover C;

Step 3. Return V – C as the clique vertex set.

ALG-APPROX-CLIQUE gives an approximate solution for the maximum common subgraph problem in polynomial time. This approach uses the following approximation algorithm for the vertex cover problem with an approximation ratio 2 in [40]:

ALG-APPROX-VERTEX COVER

INPUT: a graph G = (V, E);

OUTPUT: a vertex cover C of approximation ratio 2 for the graph G.

Step 1. C ← ∅;

Step 2. E' ← E(G);

Step 3. While E' ≠ ∅

• Step 3.1. Let (u, v) be an arbitrary edge of E';

• Step 3.2. C ← C ∪ {u, v};

• Step 3.3. Remove from E' every edge incident on either u or v;

Step 4. Return C as the vertex cover set.

ALG-APPROX-VERTEX COVER repeatedly selects an arbitrary edge (u, v) from the remaining edge set E' of the graph G = (V, E), adds both endpoints u and v to C, and deletes from E' every edge covered by u or v. With an adjacency-list representation, the procedure runs in time O(|V| + |E|).
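The steps of ALG-APPROX-VERTEX COVER translate almost line for line into code. The sketch below assumes an edge-list input and uses the hypothetical name approx_vertex_cover; for brevity it refilters the edge list on every iteration, so it does not achieve the linear running time obtainable with adjacency lists.

```python
def approx_vertex_cover(edges):
    """Return a vertex cover of size at most twice the minimum."""
    cover = set()                # Step 1: C starts empty
    remaining = list(edges)      # Step 2: E' starts as all edges
    while remaining:             # Step 3: while E' is not empty
        u, v = remaining[0]      # Step 3.1: pick an arbitrary edge
        cover.update((u, v))     # Step 3.2: add both endpoints to C
        # Step 3.3: drop every edge incident on u or v
        remaining = [e for e in remaining if u not in e and v not in e]
    return cover                 # Step 4

# Hypothetical example: the path 1 - 2 - 3 - 4.
path_edges = [(1, 2), (2, 3), (3, 4)]
path_cover = approx_vertex_cover(path_edges)
```

On this path the sketch selects the edges (1, 2) and (3, 4) and returns all four vertices, whereas the minimum cover {2, 3} has two vertices, which matches the ratio-2 guarantee.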

Algorithm 2: ALG-EXACT-MAXCLIQUE

INPUT: an n-partite graph G, with n vertices in each partition, and a small constant c;

OUTPUT: a clique for the graph G.

Step 1. Compute the complement graph G' of the modular product graph G = (V, E) of the graphs G1 and G2;

Step 2. Apply the parameterized exact algorithm for the Vertex Cover problem on G' and compute the minimum vertex cover C0.

Step 3. Return the maximum clique with the vertex set V – C0.

In Step 2, ALG-EXACT-MAXCLIQUE can apply the current best algorithm for vertex cover [24], which runs in time O(kn + 1.286^k). By running the vertex cover algorithm at most n times, we obtain the minimum vertex cover of the complement graph G' of the product graph G.
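The relationship that ALG-EXACT-MAXCLIQUE relies on, namely that the vertices outside a minimum vertex cover of the complement graph form a maximum clique, can be sketched as follows. The function names and the graph representation are assumptions made for illustration, and the simple two-way branching below stands in for the faster vertex cover algorithm of [24].

```python
from itertools import combinations

def cover_within(edges, k, chosen):
    """Bounded search tree: a cover of size <= k extending `chosen`, or None."""
    if not edges:
        return chosen
    if k == 0:
        return None
    u, v = edges[0]
    for pick in (u, v):  # one endpoint of the first edge must be in the cover
        rest = [e for e in edges if pick not in e]
        result = cover_within(rest, k - 1, chosen | {pick})
        if result is not None:
            return result
    return None

def max_clique_via_vertex_cover(vertices, edges):
    """Return a maximum clique of G = (vertices, edges) as a set of vertices."""
    # Step 1: edges of the complement graph G'.
    comp = [p for p in combinations(vertices, 2) if frozenset(p) not in edges]
    # Step 2: try increasing cover sizes until a cover of G' is found.
    for k in range(len(vertices) + 1):
        cover = cover_within(comp, k, frozenset())
        if cover is not None:
            # Step 3: the vertices outside a minimum cover form a maximum clique.
            return set(vertices) - cover

# Hypothetical example: a triangle {1, 2, 3} with a pendant vertex 4.
demo_vertices = [1, 2, 3, 4]
demo_edges = {frozenset(p) for p in [(1, 2), (2, 3), (1, 3), (3, 4)]}
```

Here the complement graph has the edges (1, 4) and (2, 4), its minimum vertex cover is {4}, and the remaining vertices {1, 2, 3} form the maximum clique.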

Results

In this paper we investigated the lower bound for the maximum common subgraph problem. We proved that it is unlikely that there is an algorithm of time f(k)n^{o(k)} for the problem, where k is the number of vertices in the common subgraph and f is any recursive function. We then presented an upper bound for algorithms that solve this problem: O(n^k k^2) time, where k is the number of vertices in the common subgraph. In consideration of the upper-bound result, we point out that our lower-bound result for the maximum common subgraph problem is asymptotically tight.

Conclusion

Parameterized computation is a viable approach with great potential for investigating many applications within bioinformatics, such as the maximum common subgraph problem studied in this paper. With an improved hardness result and the proposed approaches in this paper, future research can be focused on further exploration of efficient approaches for different variants of this problem within the constraints imposed by real applications.

Authors' contributions

XH carried out the study on the lower bound and approaches for the maximum common subgraph problem and helped to provide background information on parameterized computation theory. JL and SFJ participated in the design and expression of the algorithms for the maximum common subgraph problem. All authors have read and approved the final manuscript.

Acknowledgements

This publication was made possible in part by NIH Grant #P20 RR-16460 from the IDeA Networks of Biomedical Research Excellence (INBRE) Program of the National Center for Research Resources.

This article has been published as part of BMC Bioinformatics Volume 7, Supplement 4, 2006: Symposium of Computations in Bioinformatics and Bioscience (SCBB06). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/7?issue=S4.

References

  1. Raymond JW, Willett P: Maximum common subgraph isomorphism algorithms for the matching of chemical structures.

    Journal of Computer-aided Molecular Design 2002, 16:521-533. PubMed Abstract | Publisher Full Text OpenURL

  2. Horaud R, Skordas T: Stereo correspondence through feature grouping and maximal cliques.

    IEEE Trans Pattern Anal Mach Intell 1989, 11(11):1168-1180. OpenURL

  3. Shearer K, Bunke H, Venkatesh S: Video indexing and similarity retrieval by largest common subgraph detection using decision trees.

    No. IDIAP-RR 00–15, Dalle Molle Institute for Perceptual Artificial Intelligence, Martigny, Valais, Switzerland 2000. OpenURL

  4. Bowie J, Luthy R, Eisenberg D: A method to identify protein sequences that fold into a known three-dimensional structure.

    Science 1991, 253:164-170. PubMed Abstract | Publisher Full Text OpenURL

  5. Bryant SH, Altschul SF: Statistics of sequence-structure threading.

    Curr Opin Struct Biol 1995, 5:236-244. PubMed Abstract | Publisher Full Text OpenURL

  6. Xu Y, Xu D, Uberbacher EC: An efficient computational method for globally optimal threading.

    Journal of Computational Biology 1998, 5(3):597-614. PubMed Abstract OpenURL

  7. Lathrop RH, Rogers RG Jr, Bienkowska J, Bryant BMK, Buturovic LJ, Gaitatzes C, Nambudripad R, White JV, Smith TF: Analysis and algorithms for protein sequence-structure alignment. In Computational Methods in Molecular Biology. Edited by Salzberg S, Searls D, Kasif S. Elsevier; 1998.

  8. Xu J, Li M, Kim D, Xu Y: RAPTOR: optimal protein threading by linear programming.

    J Bioinform Comput Biol 2003, 1(1):95-117.

  9. Doudna JA: Structural genomics of RNA.

    Nature Structural Biology 2000, 7(11 supp):954-956.

  10. Eddy SR: Computational genomics of non-coding RNA genes.

    Cell 2002, 109:137-140.

  11. Rivas E, Eddy SR: Noncoding RNA gene detection using comparative sequence analysis.

    BMC Bioinformatics 2001, 2:8.

  12. Lowe TM, Eddy SR: tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence.

    Nucleic Acids Research 1997, 25:955-964.

  13. Song Y, Liu C, Huang X, Malmberg R, Xu Y, Cai L: Efficient parameterized algorithm for biopolymer structure-sequence alignment.

    Proceedings of 5th Workshop on Algorithms in BioInformatics (WABI 2005), Lecture Notes in Bioinformatics 2005, 3692:376-388.

  14. Garey MR, Johnson DS: Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Co; 1979.

  15. Kann V: On the approximability of the maximum common subgraph problem. In Proc 9th Annual Symposium on Theoretical Aspects of Computer Science, Lecture Notes in Computer Science 577. Springer-Verlag; 1992:377-388.

  16. Cheetham J, Dehne F, Rau-Chaplin A, Stege U, Taillon PJ: Solving large FPT problems on coarse-grained parallel machines.

    J Comput Syst Sci 2003, 67:691.

  17. Downey R, Fellows M: Parameterized Complexity. Springer; 1999.

  18. Lanctot JK, Li M, Ma B, Wang S, Zhang L: Distinguishing string selection problems.

    Inf Comput 2003, 185:41.

  19. Ausiello G, Crescenzi P, Gambosi G, Kann V, Marchetti-Spaccamela A, Protasi M: Complexity and Approximation, Combinatorial Optimization Problems and Their Approximability Properties. New York: Springer-Verlag; 1999.

  20. Deng X, Li G, Li Z, Ma B, Wang L: A PTAS for distinguishing (sub)string selection.

    LNCS 2002, 2380:740.

  21. Deng X, Li G, Li Z, Ma B, Wang L: Genetic design of drugs without side-effects.

    SIAM Journal on Computing 2003, 32:1073.

  22. Jiang T, Li M: On the approximation of shortest common supersequences and longest common subsequences.

    SIAM J Comput 1995, 24:1122.

  23. Li M, Ma B, Wang L: On the closest string and substring problems.

    Journal of the ACM 2002, 49:157.

  24. Chen J, Kanj I, Jia W: Vertex cover: further observations and further improvements.

    Journal of Algorithms 2001, 41:280-301.

  25. Papadimitriou C, Yannakakis M: On the complexity of database queries.

    J Comput Syst Sci 1999, 58.

  26. Bodlaender HL, Downey RG, Fellows MR, Hallett MT, Wareham HT: Parameterized complexity analysis in computational biology.

    Comput Appl Biosci 1995, 11:49-57.

  27. Bodlaender H, Downey R, Fellows M, Wareham M: The parameterized complexity of sequence alignment and consensus.

    Theoretical Computer Science 1995, 147:31.

  28. Fellows M, Gramm J, Niedermeier R: Parameterized intractability of motif search problems.

    LNCS 2002, 2285:262.

  29. Hallett M: An Integrated Complexity Analysis of Problems for Computational Biology. Ph.D. Thesis, University of Victoria; 1996.

  30. Papadimitriou C, Yannakakis M: On limited nondeterminism and the complexity of VC dimension.

    J Comput Syst Sci 1996, 53:161.

  31. Pietrzak K: On the parameterized complexity of the fixed alphabet shortest common supersequence and longest common subsequence problems.

    J Comput Syst Sci 2003, 67:757.

  32. Chen J, Chor B, Fellows M, Huang X, Juedes D, Kanj I, Xia G: Tight lower bounds for parameterized NP-hard problems.

    Proc of the 19th Annual IEEE Conference on Computational Complexity 2004, 150-160.

  33. Chen J, Huang X, Kanj I, Xia G: Linear FPT reductions and computational lower bounds.

    Proc of the 36th ACM Symposium on Theory of Computing 2004, 212-221.

  34. Huang X: Parameterized Complexity and Polynomial-time Approximation Schemes. Ph.D. Dissertation, Texas A&M University; 2004.

  35. Cai L, Chen J: On fixed-parameter tractability and approximability of NP optimization problems.

    J Comput Syst Sci 1997, 54:465-474.

  36. Chen J, Huang X, Kanj I, Xia G: W-hardness linear FPT-reductions: structural properties and further applications.

    Proceedings of the Eleventh International Computing and Combinatorics Conference (COCOON 2005), Lecture Notes in Computer Science 2005, 3595:975-984.

  37. Downey R, Estivill-Castro V, Fellows M, Prieto E, Rosamond F: Cutting up is hard to do: the parameterized complexity of k-Cut and related problems.

    Electr Notes Theor Comput Sci 2003, 78.

  38. Sze S-H, Lu S, Chen J: Integrating sample-driven and pattern-driven approaches in motif finding.

    WABI 2004 2004, 438-449.

  39. Sze S-H: Lecture notes of Special Topics in Computational Biology. Fall 2002.

  40. Cormen TH, Leiserson CE, Rivest RL, Stein C: Introduction to Algorithms. 2nd edition. MIT Press; 2001.