Abstract
Background
Clearly visualized biopathways greatly help in understanding biological systems. However, manually drawing large-scale biopathways is time consuming. We previously proposed a grid layout algorithm that can handle gene-regulatory networks and signal transduction pathways by considering edge-edge crossings, node-edge crossings, a distance measure between nodes, and subcellular localization information from Gene Ontology. That layout algorithm drastically reduced these crossings in the apoptosis model. However, for larger-scale networks, we encountered three problems: (i) the initial layout is often very far from any local optimum because nodes are initially placed at random; (ii) from a biological viewpoint, human layouts are still easier to understand than automatic layouts because, apart from subcellular localization, the algorithm does not fully utilize the biological information of pathways; and (iii) the algorithm employs a local search strategy in which the neighborhood is obtained by moving one node at each step, whereas automatic layouts suggest that simultaneous movements of multiple nodes are necessary for better layouts, although such an extension may worsen the time complexity.
Results
We propose a new grid layout algorithm. To address problem (i), we devised a new force-directed algorithm whose output is suitable as the initial layout. For problem (ii), we considered that an appropriate alignment of nodes having the same biological attribute is one of the most important factors for comprehension, and we defined a new score function that rewards such configurations. For problem (iii), we developed a search strategy that considers swapping nodes as well as moving a node, while keeping the order of the time complexity. Although a naïve implementation increases the time complexity by one order, we overcame this difficulty by devising a method that caches the differences between the score of a layout and those of its possible updates.
Conclusion
Layouts from the new grid layout algorithm are compared with those of the previous algorithm and with a human layout on an endothelial cell model, which is about three times as large as the apoptosis model. The total cost of the result from the new grid layout algorithm is similar to that of the human layout. In addition, its convergence time is drastically reduced (40% reduction).
Background
Modeling and simulation of large-scale biological pathways are among the most important tasks in bioinformatics. Many applications, e.g., Cell Illustrator [1,2], Cytoscape [3], Pajek [4], PATIKA [5,6], and CADLIVE [7,8], have been developed in this area. In this context, the visualization of biopathways is considered to play a key role in understanding biological systems. However, manually drawing large-scale biopathways is time consuming; hence, suitable biopathway layout algorithms and their applications are in strong demand.
Biopathways are categorized into three types, i.e., metabolic pathways, signal transduction pathways, and gene-regulatory networks. For metabolic pathways, several algorithms have already been proposed [9-13], and some of them succeeded in capturing the flow of the reactions well. In contrast, few layout algorithms that support a convenient biological understanding have been proposed for signal transduction pathways [14,15] and gene-regulatory networks [16,17]. Thus, our new layout algorithm focuses on signal transduction pathways and gene-regulatory networks. For these two kinds of pathways, extant layout algorithms can be categorized into two types: force-directed and grid layout algorithms.
Force-directed algorithms are used in [16,17], taking into account directional constraints for different molecular types and simple regional constraints derived from subcellular localizations. These algorithms have been successfully integrated into PATIKA. However, as pointed out in [14], force-directed algorithms may not be suitable for compact layouts of complex biopathways. Furthermore, intricately shaped regions such as torus-shaped regions cannot be handled well as regional constraints in these force-directed algorithms. Hence, they are not suitable for models containing a torus-shaped plasma membrane and nuclear membrane, although such models are common among biopathways.
A grid layout algorithm (referred to as LK-grid layout algorithm) was initially proposed by Li and Kurata [14]. The grid layout algorithm restricts the positions of all nodes to grid points. Li and Kurata defined a cost function for two nodes that depends on a distance between these nodes and on the topology of their connections in the graph. They applied LK-grid layout algorithm to a yeast cell-cycle pathway and concluded that this algorithm can geometrically classify the pathway into functional categories without using biological information. Moreover, they noticed that the algorithm generates compact layouts while avoiding overlaps between nodes. Subsequently, CB-grid layout algorithm was proposed in [15]; to reduce edge-edge crossings and node-edge crossings, it adds a penalty for these cases to the cost function. The algorithm can also deal with arbitrarily complex regional constraints following subcellular localizations, and the search space is reduced owing to these constraints. As a result, in the apoptosis model, the layout algorithm drastically reduced edge-edge crossings and node-edge crossings while placing nodes in biologically proper regions.
However, in the case of larger-scale networks, this algorithm encountered three problems. First, a layout with randomly placed nodes is used as the initial layout. This random layout contains a large number of edge-edge crossings and node-edge crossings; consequently, many iterations are required to obtain a locally optimal layout. Secondly, although one of the features of CB-grid layout algorithm is its use of subcellular localization information, it still does not fully utilize biological characteristics. For example, it does not consider biological attributes such as types of entities (protein, mRNA, and microRNA) or types of processes (phosphorylation, binding, and translation), although in human layouts these attributes tend to make the biopathways of interest easier to comprehend. Thirdly, following a greedy strategy, CB-grid layout algorithm updates a layout by moving one node at each step until the layout reaches an optimum. However, the resulting layouts are only local optima; hence, their quality fundamentally depends on the initial layout. Although a multi-step CB-grid layout algorithm was also proposed in [15] to address this drawback, it has a higher time complexity and hence is not suitable for practical applications.
To overcome these three problems, we propose a new grid layout algorithm. For the first problem, we propose a new force-directed algorithm whose output is suitable as the initial layout of grid layout algorithms. For the second problem, we introduce a score, i.e., a negative cost, that is assigned to a layout depending on how nodes with the same attribute are aligned. This concept is realized with a combo score function, which is combined with the cost function defined in CB-grid layout algorithm. For the third problem, the search strategy of CB-grid layout algorithm is improved by adding a swap operation while keeping the time complexity. With the swap operation, the new grid layout algorithm can also consider, at each step, layouts generated by exchanging the positions of two nodes in the current layout.
The Methods section is organized as follows: (i) first, we introduce the previous grid layout algorithm, i.e., CB-grid layout algorithm; (ii) for the first improvement, concerning the initial layout of CB-grid layout algorithm, we describe the new force-directed algorithm, termed Eades initial layout algorithm; (iii) for the second improvement, we describe CCB-grid layout algorithm, which is CB-grid layout algorithm with the combo score function; (iv) for the third improvement, we present SCCB-grid layout algorithm, which enhances CCB-grid layout algorithm by adding the swap operation. In the Results and Discussion section, the performances of these new algorithms are compared and verified by applying them to the signal transduction pathway of an endothelial cell, which is larger than the pathways in [14] and [15].
Methods
CB-grid layout algorithm: Introduction of the grid layout algorithm
Given a graph G = (V, E) with nodes V and edges E, a layout L = (V, E, U, P) of G consists of the underlying graph G, grid points U and a function P : V → U such that P (v_{α}) ≠ P (v_{β}) for any two distinct nodes v_{α}, v_{β }∈ V. This definition does not allow overlaps between nodes in the layout. For a layout L, this paper uses the following notations.
• W_{L}: a set of vacant points of L.
• E_{v}: the set of all edges connected to node v.
• V: the number of nodes in V.
• W: the number of vacant points in L, instead of W_{L} if there is no confusion possible.
We define the following operations.
• T_{v → p }L: the layout generated by moving a node v to a vacant point p ∈ W_{L}.
• L: the layout generated by swapping nodes v_{α }and v_{β}.
• D_{v }L: the layout generated by removing a node v and all edges connected to v.
In addition, we define the following functions.
• (L): a binary function that returns 1 if an edge e_{i }crosses with an edge e_{j }and 0 otherwise.
• (L): a binary function that returns 1 if an edge e_{j }crosses with a node v_{i }and 0 otherwise.
• (L): a function that returns , where is the weight to the couple of nodes v_{i }and v_{j}, and md (v_{i}, v_{j}) is the Manhattan distance between v_{i }and v_{j}.
In our previous approach [15] (hereafter referred to as CB-grid layout algorithm), the layout cost C (L) of L was defined as follows:
where W_{ee}, W_{ne}, and W_{d }are called respectively edgeedge crossing weight, nodeedge crossing weight, and distance cost weight.
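Read together with the three functions listed above, the layout cost is presumably a weighted sum of an edge-edge crossing term, a node-edge crossing term, and a distance term. The following is only a sketch of that structure: the function names EE, NE, and Dist, the pair weight w_{ij}, and the index ranges are placeholders assumed here, not notation taken from [15].

```latex
C(L) \;=\; W_{ee} \sum_{\substack{e_i, e_j \in E \\ i < j}} \mathrm{EE}_{ij}(L)
      \;+\; W_{ne} \sum_{\substack{v_i \in V \\ e_j \in E}} \mathrm{NE}_{ij}(L)
      \;+\; W_{d} \sum_{\substack{v_i, v_j \in V \\ i < j}} \mathrm{Dist}_{ij}(L),
\qquad
\mathrm{Dist}_{ij}(L) \;=\; w_{ij}\,\mathrm{md}(v_i, v_j)
```

Under this reading, a larger W_{ee} or W_{ne} relative to W_{d} makes the search trade longer edges for fewer crossings.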
The CB-grid layout algorithm repeatedly moves a single node to a vacant point, one by one, until it reaches a locally optimal layout. At each step, the algorithm calculates the costs of all layouts that can be generated by moving any one node to any vacant point. The layout with the lowest cost is selected as the starting layout for the next step. After convergence, the algorithm outputs a locally optimal layout. If the costs of all possible adjacent layouts are calculated in a naïve way, the time complexity is high. To overcome this problem, the previous method [15] introduced a Δ matrix that stores each possible cost difference at the previous step, and succeeded in reducing the time complexity at each step from O (W (V^{2} + E^{2})) to O (V^{2} + E^{2} + W (V + E)), where v_{β} is the node moved at the previous step.
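The greedy move-only search described above can be sketched in Python as follows. This is a deliberately naïve illustration, not the implementation of [15]: it recomputes the full cost for every candidate instead of using a Δ matrix, it omits node-edge crossings and regional constraints, and it assumes the distance weight is 1 for adjacent nodes and 0 otherwise. The function names and the default weights (taken from the values reported later for this study) are choices made for this sketch.

```python
from itertools import combinations


def manhattan(p, q):
    """Manhattan distance between two grid points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def _orient(a, b, c):
    """Signed area test: >0 if c is left of a->b, <0 if right, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1, p2, q1, q2):
    """True if the two segments properly cross (touching endpoints do not count)."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def layout_cost(pos, edges, w_ee=70, w_d=1):
    """Simplified cost: edge-length term plus a penalty per edge-edge crossing.

    ASSUMPTION: node-edge crossings are left out and only adjacent pairs
    contribute to the distance term.
    """
    cost = w_d * sum(manhattan(pos[u], pos[v]) for u, v in edges)
    for (a, b), (c, d) in combinations(edges, 2):
        if len({a, b, c, d}) == 4 and segments_cross(pos[a], pos[b], pos[c], pos[d]):
            cost += w_ee
    return cost


def greedy_move_search(pos, edges, grid_points):
    """Repeat the best single-node move to a vacant point until no move helps."""
    pos = dict(pos)
    best = layout_cost(pos, edges)
    while True:
        occupied = set(pos.values())
        vacant = [p for p in grid_points if p not in occupied]
        candidate = None
        for v in pos:
            old = pos[v]
            for p in vacant:
                pos[v] = p  # try the move T_{v -> p}
                c = layout_cost(pos, edges)
                if c < best:
                    best, candidate = c, (v, p)
            pos[v] = old  # undo before trying the next node
        if candidate is None:
            return pos, best  # local optimum of the move neighborhood
        pos[candidate[0]] = candidate[1]
```

Because every accepted move strictly lowers the cost, the loop terminates; the result is only a local optimum, which is why the choice of the initial layout matters.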
When CB-grid layout algorithm was applied to several biopathways, we encountered three problems. We therefore propose new grid layout algorithms that solve these problems. The problems and solutions are summarized as follows:
1. Improving the choice of the initial layout: since a locally optimal layout depends noticeably on the initial layout, we first apply Eades initial layout algorithm to a random layout, and use its output as the initial layout. In the previous approach, a random layout was directly used as the initial layout.
2. Improving the cost function: we introduce the concept of a combo score that gives a good score, i.e., a negative cost, when nodes with the same biological attribute are aligned (CCB-grid layout algorithm). In CB-grid layout algorithm, the biological attributes, except subcellular localization, were ignored.
3. Improving the search strategy: we propose a better search strategy, which allows us to obtain improved results while keeping the time complexity. To obtain a better layout, the search space is extended by adding the swap operation. At each step, all layouts obtained by swapping two nodes are also considered (SCCB-grid layout algorithm).
In the remainder of this section, we describe these three new algorithms mentioned above.
Eades initial layout algorithm: generating a new initial layout for grid layout algorithms
In the previous paper [15], a random layout was used as the initial layout for CB-grid layout algorithm. When the initial layout is far from the global optimum, the local optimum obtained tends to be unacceptable. Therefore, we decided to adapt Eades algorithm [18] and use its output as the initial layout. Eades algorithm is a force-directed algorithm consisting of the following two steps.
1. Two types of forces are defined for each pair of nodes. If two nodes are adjacent, there exists an attractive force a_{c1 }log(d/a_{c2}) between them, where a_{c1 }and a_{c2 }are constants, and d is the distance between the two nodes. On the other hand, if two nodes are not adjacent, there exists a repulsive force r_{c}/ between them, where r_{c }is a constant. At each step, the positions of all the nodes are updated according to the sum of the repulsive and attractive forces between them.
2. The above step is iterated a predetermined number of times, and the final result is obtained.
We have customized two points in Eades algorithm. First, nodes in Eades algorithm can be placed anywhere. All the nodes in the initial layout for CB-grid layout algorithm, however, should be placed on grid points that satisfy the subcellular localization. Thus, the output of Eades algorithm cannot be used directly as an input for CB-grid layout algorithm.
To handle this problem, we propose to move each node to the closest vacant point that satisfies the subcellular localization after moving nodes at each step.
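The force step and this first customization can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the repulsive force is assumed to follow an inverse-square law, the constants and the step size are arbitrary, and `snap_to_grid` resolves conflicts greedily in node order, which is only one of several possible tie-breaking rules. Both function names and the `allowed` mapping (node to the grid points of its subcellular localization) are hypothetical.

```python
import math


def eades_step(pos, edges, a_c1=2.0, a_c2=1.0, r_c=1.0, step=0.1):
    """One force-directed update on continuous coordinates.

    Adjacent pairs attract with a_c1 * log(d / a_c2).
    Non-adjacent pairs repel with r_c / d**2 (ASSUMED inverse-square form).
    """
    adjacent = {frozenset(e) for e in edges}
    new_pos = {}
    for u in pos:
        fx = fy = 0.0
        for v in pos:
            if u == v:
                continue
            dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
            d = math.hypot(dx, dy) or 1e-9  # guard against coincident nodes
            if frozenset((u, v)) in adjacent:
                f = a_c1 * math.log(d / a_c2)  # positive pulls u toward v
            else:
                f = -r_c / d ** 2  # negative pushes u away from v
            fx += f * dx / d
            fy += f * dy / d
        new_pos[u] = (pos[u][0] + step * fx, pos[u][1] + step * fy)
    return new_pos


def snap_to_grid(pos, allowed):
    """Move each node to the closest still-vacant grid point it is allowed on.

    `allowed[v]` lists the grid points of v's subcellular localization.
    Raises ValueError if a node has no vacant allowed point left.
    """
    taken, snapped = set(), {}
    for v, (x, y) in pos.items():
        free = [p for p in allowed[v] if p not in taken]
        p = min(free, key=lambda g: (g[0] - x) ** 2 + (g[1] - y) ** 2)
        taken.add(p)
        snapped[v] = p
    return snapped
```

Calling `snap_to_grid(eades_step(pos, edges), allowed)` a fixed number of times mirrors the "iterate a predetermined number of times" structure of the algorithm while keeping every intermediate layout valid for the grid.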
The second customization is as follows. Since Eades algorithm does not consider edge-edge crossings and node-edge crossings, the resulting layout may contain many such crossings. For example, consider a biological pathway with a subcellular localization, such as a membrane, that thinly surrounds other subcellular localizations, as shown in Figure 1(a); the graph in (a) could be a layout resulting from Eades algorithm. In this case, the layout might contain a large number of edge-edge crossings and node-edge crossings because edges cross over other subcellular localizations. To avoid this problem, we propose to gather nodes around a particular grid point for each subcellular localization, as shown in Figure 1(b). Eades algorithm with the above improvements is called Eades initial layout algorithm.
Figure 1. Two layouts with the same canvas and three subcellular localizations. The grid canvases (a) and (b) have the same biological subcellular localizations: extracellular space, plasma membrane, and cytoplasm. Both canvases contain the same graph with four nodes that are located in plasma membrane, which surrounds cytoplasm. In (a), nodes are spread apart in plasma membrane, and edges among these nodes cross over cytoplasm. In (b), the nodes are gathered in the top-left corner, and no edge crosses over cytoplasm. Because of the crossing pattern in (a), these edges have a higher probability of crossing other nodes in cytoplasm. This is the drawback of using the layout in (a) as the initial layout for Eades initial layout algorithm.
CCB-grid layout algorithm: utilizing various biological attributes
When humans draw biopathway models, nodes with the same attribute are usually arranged according to a rule. In CB-grid layout algorithm, this type of information is completely ignored. To capture this property, we introduce two candidate combo scores called combo1 and combo2 (see Figure 2). Note that a combo score is applied only to nodes having an attribute, since some nodes do not have any attribute. We denote the set of nodes having an attribute by V' ⊆ V. In the pseudocode, (i) upperGrid(p, i)/lowerGrid(p, i) returns the ith grid point above/below a grid point p ∈ U, (ii) Attr(v) is the attribute of a node v ∈ V', and (iii) CW_{a} = 1 + C/|V'_{a}|, where C is a constant and normally set to V, and V'_{a} is the set of nodes having attribute a.
Figure 2. Pseudocode of the combo score functions combo1 and combo2. (a) combo1: a score function that considers nodes at a vertical grid distance of one from the target node. (b) combo2: a score function that considers nodes at a vertical grid distance of up to two from the target node. (c) isCombo: a boolean function that takes a node and a grid point as its arguments and returns "true" if the attribute of the node and that of the node on the grid point are the same.
The combo score is designed such that the more nodes with the same attribute are aligned vertically, the higher the score is. A combo score is defined between two nodes, and the combo score of a layout L is defined as the sum of all the combo scores occurring in L. We say that two nodes have a combo relation when a combo score occurs between them. Note that a horizontal alignment score is not implemented: if the combo score supported both the vertical and horizontal directions, the numbers of edge-edge crossings and node-edge crossings would increase considerably. Therefore, only one direction should be chosen for combo scores, and in this paper we define combo scores in the vertical direction. We have considered two types of combo scores, combo1 and combo2, and compare them on the layouts in Figure 3(a) and 3(b). Let nodes v_{a} to v_{f} in Figure 3 have the same attribute. The combo1 score considers only the nodes at a vertical grid distance of one from the target node. In contrast, combo2 considers the nodes at a vertical grid distance of up to two from the target node. For the layout in Figure 3(a), the numbers of combo relations with combo1 and combo2 are 8 and 12, respectively. If node v_{f} is moved as shown in Figure 3(b), the number of combo relations with combo1 is the same as before, whereas that with combo2 is 14. Thus, only combo2 rewards moving node v_{f} from its position in Figure 3(a) to that in Figure 3(b). As shown in the dotted rectangle in Figure 3(a), a pair of vertically aligned nodes often occurs during the process of updating a layout. In this case, Figure 3(b) should be a better layout than Figure 3(a). For this reason, we decided to employ combo2. Henceforth, for a node v ∈ V in a layout L, Combo_{v} (L) denotes the same combo score as combo2 (v, L). The total score for L is denoted by Combo (L).
Figure 3. An example that compares the features of the combo1 and combo2 score functions. (a) An intermediate layout of CCB-grid layout algorithm. In this layout, all six nodes have the same attribute. (b) The next candidate layout, generated from (a) by moving node v_{f} below node v_{d}. With the combo1 score function, the combo scores of (a) and (b) are the same. In contrast, with the combo2 score function, the combo score of (b) is better than that of (a).
If CW_{a }returns the same value for any attribute a, many of the nodes with the same attribute will be vertically aligned easily since they have a greater chance to neighbor one another. So as to reduce the biases among the attributes, we define CW_{a }to be inversely related to the total number of the nodes whose attribute is a.
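The per-node combo score and the attribute weight can be sketched as follows. Since the pseudocode of Figure 2 is not reproduced in the text, this is only one plausible reading: it assumes that CW_{a} equals 1 + C divided by the number of nodes with attribute a, that a node at vertical distance two counts even if the point in between is empty, and that the layout total halves the per-node sum because each relation is seen from both of its nodes. The function names are hypothetical; the default C = 12 is the value reported for this study.

```python
from collections import Counter


def combo_weight(a, counts, C=12):
    """CW_a: larger for rare attributes (ASSUMED form 1 + C / #nodes with a)."""
    return 1 + C / counts[a]


def combo_score(v, pos, attr, counts, C=12, reach=2):
    """Score of node v: reach=1 mimics combo1, reach=2 mimics combo2.

    Checks the grid points up to `reach` steps above and below v, so at most
    2 * reach points are inspected regardless of the graph size.
    """
    if attr.get(v) is None:
        return 0.0  # nodes without an attribute never score
    node_at = {p: u for u, p in pos.items()}
    x, y = pos[v]
    score = 0.0
    for i in range(1, reach + 1):
        for p in ((x, y - i), (x, y + i)):
            u = node_at.get(p)
            if u is not None and attr.get(u) == attr[v]:
                score += combo_weight(attr[v], counts, C)
    return score


def total_combo(pos, attr, C=12, reach=2):
    """Layout score: half the per-node sum, since each relation is counted twice."""
    counts = Counter(a for a in attr.values() if a is not None)
    return 0.5 * sum(combo_score(v, pos, attr, counts, C, reach) for v in pos)
```

For three vertically stacked nodes sharing one attribute, `reach=2` yields three combo relations (including the one between the top and bottom nodes), while `reach=1` yields two.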
By modifying the layout cost of CB-grid layout algorithm, we can define the layout cost C (L) of a layout L with the new concept of the combo score as follows:
where W_{cs} is called the combo score weight. CB-grid layout algorithm improved by the above modification is named Combo score, Cross cost and Biological information grid layout algorithm (CCB-grid layout algorithm). The sum of the combo scores is multiplied by 1/2 because combo scores are counted twice: a combo score between nodes v_{α} and v_{β} is included in both Combo_{v_{α}} (L) and Combo_{v_{β}} (L). The algorithm is the same as the Coptimization (L) step in [15] except for the use of the above layout cost C (L), i.e., the algorithm for calculating Δ matrix is also the same.
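From the description above (a score acting as a negative cost, weighted by W_{cs} and halved to undo double counting), the modified cost presumably has the following shape. This is a sketch only; C_{CB}(L) is a label introduced here for the cost of CB-grid layout algorithm and is not notation from the source.

```latex
C(L) \;=\; C_{\mathrm{CB}}(L) \;-\; \frac{W_{cs}}{2} \sum_{v \in V'} \mathrm{Combo}_{v}(L)
```

The minus sign reflects that a higher combo score should lower the total cost, so aligned layouts are preferred whenever the gain outweighs any added crossing or distance cost.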
For calculating the combo score for each node, at most four nodes need to be checked, i.e., its time complexity is constant, whereas the time complexities of calculating the edge-edge crossing cost, the node-edge crossing cost, and the distance cost for each node depend on E, V, and W, respectively. Thus, without using Δ matrix, the time complexity related to combo scores is O (VW) at each step.
At each step, we need to calculate the difference between the combo score of the previous layout L and that of the current layout that is generated by moving a node v to a vacant point p, i.e., Combo(T_{v→p }L) – Combo(L). We can efficiently calculate the difference of the combo score (L) as follows:
where
We introduced Adj_{v }(L) due to the following reason. First, suppose that three nodes with the same attribute are aligned vertically. We call them v_{α}, v_{β}, and v_{γ }beginning from the bottom. There are three combo relations among the three nodes: one is between v_{α }and v_{β}, another between v_{β }and v_{γ}, and the third between v_{α }and v_{γ}. Although v_{β }is involved in these three combo relations, the combo relation between v_{α }and v_{γ }is not considered in (L). Therefore, Adj_{v }(L) is needed to correct this type of undercount.
SCCB-grid layout algorithm: extension of the search space due to the swap operation
Another drawback of CB-grid layout algorithm is that only one node can be moved to a vacant point at each step. For example, the layout shown in Figure 4(a) is optimal for CB-grid layout algorithm even though the layout in Figure 4(b) should be selected as the better layout. This limitation is due to the search strategy of CB-grid layout algorithm. Thus, we have devised a new algorithm that allows swap operations between two nodes while keeping the time complexity. With this improvement, the layout in Figure 4(a) is rearranged as shown in Figure 4(b). The new algorithm is named CCB-grid layout with the swap operation (SCCB-grid layout algorithm). The layout cost function is the same as in CCB-grid layout algorithm. However, a naïve implementation would increase the time complexity of calculating the layout cost for swapped layouts.
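The effect of enlarging the neighborhood with swaps can be illustrated with a small sketch. It is a naïve illustration that recomputes the cost of every candidate (the situation the extended Δ matrix is meant to avoid) and uses only a Manhattan edge-length cost; the function names and the toy layout are assumptions made for this example.

```python
from itertools import combinations


def distance_cost(pos, edges):
    """Sum of Manhattan edge lengths (stand-in for the full layout cost)."""
    return sum(abs(pos[u][0] - pos[v][0]) + abs(pos[u][1] - pos[v][1])
               for u, v in edges)


def best_neighbor(pos, edges, grid_points, allow_swap):
    """Return (layout, cost) of the best strictly improving neighbor.

    The layout is None when no neighbor improves on the current cost,
    i.e., the current layout is a local optimum of the chosen neighborhood.
    """
    occupied = set(pos.values())
    vacant = [p for p in grid_points if p not in occupied]
    best_cost, best_layout = distance_cost(pos, edges), None
    candidates = [{**pos, v: p} for v in pos for p in vacant]  # moves
    if allow_swap:
        candidates += [{**pos, u: pos[v], v: pos[u]}  # swaps
                       for u, v in combinations(pos, 2)]
    for cand in candidates:
        c = distance_cost(cand, edges)
        if c < best_cost:
            best_cost, best_layout = c, cand
    return best_layout, best_cost


# Toy case: a fully occupied 1 x 4 strip, so no single-node move exists.
strip = [(x, 0) for x in range(4)]
layout = {"a": (0, 0), "c": (1, 0), "b": (2, 0), "d": (3, 0)}
links = [("a", "b"), ("c", "d")]
stuck, stuck_cost = best_neighbor(layout, links, strip, allow_swap=False)
improved, improved_cost = best_neighbor(layout, links, strip, allow_swap=True)
```

In this toy case the move-only neighborhood is empty, so the search stops at cost 4, whereas one swap lowers the cost to 2. The swap neighborhood adds one candidate per node pair, which is why cost differences need to be cached to keep each step affordable.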
Figure 4. An optimal layout of CB-grid layout algorithm and an improved layout with the swap operation. (a) An optimal layout for CB-grid layout algorithm. (b) A better layout generated from (a) with the swap operation.
In the previous approach [15], Δ matrix stores cost differences that are induced only by moving nodes to vacant points. As a result, if a grid point of interest was occupied at the previous step, we cannot exploit Δ matrix to calculate cost differences corresponding to that grid point. Since grid points of interest on the swap operation are obviously occupied at the previous step, Δ matrix cannot be used. However, if Δ matrix also stores cost differences related to occupied points, Δ matrix can be exploited for this problematic case, too. We then propose an extended Δ matrix, which considers occupied points as well as vacant points. Since the definition of the cost differences for vacant points cannot be applied directly to occupied points, we decide to calculate the cost differences for the occupied points by calculating it without taking into account the node occupying that grid point and all edges connected to it. In the remainder of this section, we will show how to calculate the extended Δ matrix and then compare the time complexity of the extended Δ matrix and the original Δ matrix.
Henceforth, let us refer to the extended Δ matrix as Δ matrix. Given a layout L, at the first step, we update Δ (L) matrix as follows:
If the previous layout is updated by moving node v_{β }to vacant point q, Δ (L) can be updated efficiently by using Δ (L) as follows:
where DIFF_{0 }to DIFF_{4 }are defined in the following way:
where Q shall be defined below.
If the previous layout is updated by swapping two nodes and , Δ (L) is then updated efficiently by using Δ (L) as follows:
where DIFF_{5 }to DIFF_{9 }are defined in the following way:
The case of v_{α }= is not considered in Equation (13) because equations of this case can be obtained by simply replacing with in case 1 and 3.
(·) and (·) in DIFF_{0} to DIFF_{9} are partial cost functions depending on the two nodes v_{a} and v_{b} and the three nodes v_{a}, v_{b}, and v_{c}, respectively; they are the sums of the corresponding partial edge-edge crossing costs, node-edge crossing costs, and distance costs as follows:
where (·) and (·) are related to edge-edge crossings, while (·) and (·) are related to node-edge crossings, and (·) and (·) are related to the distance cost. The details are described below.
(a) (·) is a partial edge-edge crossing cost function of and , and is defined as follows:
Similarly, (·) is a partial edge-edge crossing cost function of , , and , and is defined as follows:
(b) is a partial node-edge crossing cost function of v_{a}, v_{b}, , and , and is defined as follows:
Similarly, (·) is a partial node-edge crossing cost function of v_{a}, v_{b}, v_{c}, , , and , and is defined as follows:
(c) is a partial distance cost function of v_{a }and v_{b}, and is defined as follows:
Similarly, (·) is a partial distance cost function of v_{a}, v_{b}, and v_{c}, and is defined as follows:
Thus far, we found out a method to efficiently calculate Δ matrix. The purpose of extending Δ matrix is to calculate the cost difference of the swap operation. When nodes and are swapped, we can calculate using these Δ costs as follows:
where
In SCCB-grid layout algorithm, the combo score also needs to be considered. Given a layout such that a node v_{α} is moved to a vacant point p, can be calculated as shown in Equation (3). In contrast, if two nodes and are swapped, the difference of combo scores, Combo (L) – Combo (L), is efficiently calculated as follows:
where
Pseudocode of SCCB-grid layout algorithm is given in Figure 5.
Figure 5. SCCB-grid layout algorithm. Pseudocode of SCCB-grid layout algorithm.
If node v_{β} is moved at the previous step, the time complexity of calculating Δ matrix is O ((V + E)U). If two nodes and are swapped at the previous step, the time complexity of calculating Δ matrix is O ((V + E) ( + ) U) = O ((V + E) U), where  = ( + )/2. In addition, the time complexity of all the swap operations considered at each step is O (E^{2}). Therefore, the time complexity of SCCB-grid layout algorithm is O (E^{2} + U (V + E)) at each step.
Since the time complexity of CB-grid layout algorithm is O (V^{2} + E^{2} + W (V + E)) at each step [15], the time complexity of SCCB-grid layout algorithm is O(V (V + E)) larger than that of CB-grid layout algorithm (note that v_{β} and v_{β'} are not distinguished here). Here, we consider two cases, V ≤ W (case 1) and V > W (case 2), and show that these two algorithms have the same time complexity with high probability. For case 1, the above difference is negligible since O (V (V + E)) ≤ O (W(V + E)). In contrast, the O(V (V + E)) difference cannot be neglected in case 2. However, if we assume that all nodes can be moved to form the next layout with equal probability, V = 2 E, and O(V (V + E)) = O (V^{2} + E^{2}) subsequently. Therefore, the time complexity of SCCB-grid layout algorithm is the same as that of CB-grid layout algorithm even in case 2. For the above reasons, the time complexities of SCCB-grid and CB-grid layout algorithms are the same in practice.
Results and Discussion
Data and Parameters
To evaluate our algorithms on a large-scale signal transduction pathway with a gene-regulatory network, we created the pathway model of an endothelial cell with Cell Illustrator [1,2] by extracting information from [19]. The model consists of 309 nodes and 371 edges (about three times as large as the apoptosis model in [15], which consists of 117 nodes and 126 edges), and the maximum degree of a node is ten (eight in the apoptosis model). Grid widths and heights are fixed to 100 pixels; the total numbers of vertical and horizontal grid points are 36 and 40, respectively. We used the following seven GO subcellular localizations: extracellular space (GO:0005615), cytoplasm (GO:0005737), nucleus (GO:0005634), mitochondrion (GO:0005739), plasma membrane (GO:0005886), nuclear membrane (GO:0005635), and mitochondrial membrane (GO:0005740). We also used the following sixteen processes and entities as attributes of nodes: migration, phosphorylation, protein with a modification, ligand, assembly, transcription, translation, mRNA, ligand and receptor, receptor, unknown, protein, exchange, trimer, ubiquitination, and degradation.
Biological models of this type usually contain many degradation nodes, and a degradation process always has exactly one edge. To exploit this property, we apply the layout algorithms after removing the degradation nodes (97 nodes). After applying the layout algorithms, we attach each removed degradation node just below the entity to which it was initially connected. Thus, in practice, the numbers of nodes and edges in the model given to the layout algorithms are 212 and 274, respectively. Note that when the performances of the algorithms are compared by the numbers of edge-edge crossings and node-edge crossings in the latter part of this section, crossings caused by degradation nodes and the edges connected to them are not taken into account.
We apply the following rule to the edge-edge crossing weight W_{ee}, node-edge crossing weight W_{ne}, combo score weight W_{cs}, and distance cost weight W_{dc} of the layout cost in Equation (2), to ensure that the distance cost is less important than the others:
In our study, W_{dc}, W_{ee}, W_{ne}, and W_{cs }were set to 1, 70, 150, and 110, respectively. Also, the constant C in CW_{a }was set to 12.
Using the combo score, many nodes can be aligned vertically. However, in many cases, the nodes cannot be moved once they have combo relations. Plasma membrane, nuclear membrane, and mitochondrial membrane are thin and torus shaped, thus, vertical alignments of the nodes on these subcellular localizations will not be of interest for users (e.g., the width of plasma membrane in our model is only two grids). Therefore, in this paper, we decided to ignore combo scores in plasma membrane, nuclear membrane, and mitochondrial membrane.
Comparison of layouts
Figure 6 shows the number of edgeedge crossings, the number of nodeedge crossings, combo scores, and total costs of the layouts with CBgrid, CCBgrid, and SCCBgrid layout algorithms, and the human layout. We generate ten initial layouts by applying Eades initial layout algorithm to ten random layouts. These initial layouts are commonly used for each layout algorithm (CB Eades, CCB Eades, and SCCB Eades in Figure 6). In addition, we use the ten random layouts directly as initial layouts of CBgrid layout algorithms (CB random in Figure 6, which corresponds to the previous layout algorithm) to confirm the significance of preparing proper initial layouts. Figure 8 and 9 respectively show the best layouts of CBgrid and SCCBgrid layout algorithms, which have the lowest total cost among ten resulting layouts of each algorithm. The human layout is shown in Figure 10.
Figure 6. Comparisons of edge-edge crossings, node-edge crossings, combo score, and total cost among the results of four grid layout algorithms and the human layout. Costs and scores of the generated layouts with the CB random, CB Eades, CCB Eades, SCCB Eades, and human layout from the same initial layout. These algorithms are applied to ten initial layouts. (a) the number of edge-edge crossings. (b) the number of node-edge crossings. (c) combo score. (d) total cost.
Figure 8. A resulting layout of the CB-grid layout algorithm. A resulting layout of the CB-grid layout algorithm in an endothelial signal transduction pathway. The pathway model is the same as that in Figure 10.
Figure 9. A resulting layout of the SCCB-grid layout algorithm. A resulting layout of the SCCB-grid layout algorithm in an endothelial signal transduction pathway. The pathway model is the same as that in Figure 10.
Figure 10. The human layout. The human layout of an endothelial signal transduction pathway. This pathway model is arranged with the CB-grid and SCCB-grid layout algorithms in Figure 8 and Figure 9, respectively.
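For readers who want to reproduce a count like the one in Figure 6(a), the following is a minimal sketch of counting edge-edge crossings for straight-line edges between grid points. The function names are hypothetical, and the sketch assumes that only proper crossings (a single interior intersection point) are counted and that edges sharing a node are skipped; the exact definition used in the paper may differ.

```python
from itertools import combinations

def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, 0, or -1."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def edges_cross(a, b, c, d):
    """True if segments ab and cd intersect at a single interior point."""
    return (orient(a, b, c) * orient(a, b, d) < 0
            and orient(c, d, a) * orient(c, d, b) < 0)

def count_edge_crossings(pos, edges):
    """pos: dict node -> (x, y); edges: list of (node, node) pairs."""
    total = 0
    for (u, v), (s, t) in combinations(edges, 2):
        if len({u, v, s, t}) < 4:
            continue  # edges that share a node are not counted
        if edges_cross(pos[u], pos[v], pos[s], pos[t]):
            total += 1
    return total

pos = {"a": (0, 0), "b": (2, 2), "c": (0, 2), "d": (2, 0)}
crossing = count_edge_crossings(pos, [("a", "b"), ("c", "d")])  # two diagonals
parallel = count_edge_crossings(pos, [("a", "c"), ("b", "d")])  # two vertical edges
```

This pairwise test takes quadratic time in the number of edges, which is adequate for illustration only.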
In [15], the initial layout for the CB-grid layout algorithm was a random layout, which had a large number of edge-edge crossings and node-edge crossings. Many iterations were therefore needed until convergence. This prompted us to use the output of the Eades initial layout algorithm as the initial layout. Figure 7 shows the number of iterations until convergence. As shown in this figure, CB-grid Eades requires fewer iterations than CB-grid random (40% reduction on average). Moreover, the total cost of CB-grid Eades is greatly improved over that of CB-grid random (see Figure 6(d)).
The discussion in [15] suggested that reducing edge-edge crossings and node-edge crossings leads to a better approximation of the human layout. In contrast, as shown in Figure 6(a) and 6(b), the human layout also has several edge-edge and node-edge crossings, and it has a higher combo score than the layouts of the CB-grid layout algorithm. Based on these facts, we introduced an additional scoring criterion, the combo score, in the CCB-grid layout algorithm. As the combo scores show (see Figure 6(c)), the CCB-grid layout algorithm drastically improves this score and brings it closer to that of the human layout. However, the numbers of edge-edge crossings and node-edge crossings increase in the CCB-grid layout algorithm compared with CB-grid Eades (see Figure 6(a) and 6(b)).
In this paper, the swap operation is proposed to increase the number of candidate layouts at each step. As shown in Figure 6(a) and 6(b), the SCCB-grid layout algorithm succeeds in reducing edge-edge crossings and node-edge crossings, i.e., the above drawback of the CCB-grid layout algorithm is partially mitigated. In addition, as shown in Figure 6(c), the combo score of the SCCB-grid layout algorithm is also slightly improved.
Figure 7. Comparisons of the total numbers of iterations for optimal layouts among four grid layout algorithms. Total number of iterations for optimal layouts with CB random, CB Eades, and SCCB Eades from the same initial layout. Ten initial layouts are applied with these algorithms.
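The effect of adding swaps to the neighborhood, and the iteration counts compared in Figure 7, can be illustrated with a small steepest-descent sketch. It is not the authors' implementation: the cost is a toy stand-in (total Manhattan edge length) rather than the paper's cost function, every candidate is rescored from scratch instead of using cached score differences, and all names are hypothetical.

```python
from itertools import combinations, product

def cost(pos, edges):
    """Toy cost: total Manhattan length of all edges (a stand-in only)."""
    return sum(abs(pos[u][0] - pos[v][0]) + abs(pos[u][1] - pos[v][1])
               for u, v in edges)

def neighbours(pos, width, height, use_swap):
    """Yield every layout reachable by one move or, optionally, one swap."""
    occupied = set(pos.values())
    cells = product(range(width), range(height))
    for node, cell in product(pos, cells):
        if cell not in occupied:
            yield {**pos, node: cell}  # move one node to an empty grid point
    if use_swap:
        for a, b in combinations(pos, 2):
            yield {**pos, a: pos[b], b: pos[a]}  # exchange two nodes

def local_search(pos, edges, width, height, use_swap=True):
    """Apply the best improving candidate until none exists.

    Returns (layout, number of iterations).
    """
    current, steps = cost(pos, edges), 0
    while True:
        best = min(neighbours(pos, width, height, use_swap),
                   key=lambda p: cost(p, edges), default=None)
        if best is None or cost(best, edges) >= current:
            return pos, steps
        pos, current, steps = best, cost(best, edges), steps + 1

# A fully occupied 2 x 2 grid: no node can move, but swaps are still possible.
start = {"a": (0, 0), "b": (1, 1), "c": (1, 0), "d": (0, 1)}
edges = [("a", "b"), ("c", "d")]
moved, moved_steps = local_search(start, edges, 2, 2, use_swap=False)
swapped, swapped_steps = local_search(start, edges, 2, 2, use_swap=True)
```

In this crowded toy grid the move-only search is stuck at the initial cost, whereas one swap reaches a cheaper layout. Rescoring every candidate as done here is the naive approach; the paper avoids the resulting increase in time complexity by caching score differences.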
We also applied the grid layout algorithms to the Fas-induced apoptosis pathway model [20] and the ASE cell fate simulation model [21] to obtain a more general comparison. The resulting layouts and the number of crossings in each layout are summarized in Additional file 1. These models, including the endothelial cell model, are also available as Additional file 2, and the application implementing the SCCB-grid layout algorithm for these models can be downloaded from [22].
Additional file 1. Resulting layouts obtained by applying the LK-grid, CB-grid, and SCCB-grid layout algorithms to the Fas-induced apoptosis pathway model and the ASE cell fate simulation model. A comparison of these results is also included.
Format: PDF. Size: 3.1 MB.
Additional file 2. Biopathway model files. The endothelial cell model, the Fas-induced apoptosis pathway model, and the ASE cell fate simulation model are included.
Format: ZIP. Size: 76 KB.
Conclusion
For better biopathway layouts, three improvements to the CB-grid layout algorithm were proposed: (i) an improvement of the initial layouts, (ii) an improvement of the cost function, and (iii) an improvement of the search strategy itself without increasing the time complexity. For (i), the Eades initial layout algorithm was proposed, and the improvement was confirmed with a signal transduction pathway of an endothelial cell. For (ii), the CCB-grid layout algorithm, which includes the combo score function, was proposed, and the improvement was verified with the same signal transduction pathway. For (iii), the SCCB-grid layout algorithm was proposed. Owing to (i) and (iii), our layout algorithm can start from a better layout and is more robust to the condition of the initial layout than extant methods. In addition, thanks to the combo score, we succeeded in utilizing biological attributes that are not considered in extant methods.
However, our layout algorithm has limitations and problems that should be addressed in future work. Firstly, if the parameters of the combo score are not selected correctly, a node that gets a combo relation no longer moves to other grid points. Thus, it is important to devise a method that automatically selects suitable parameters for the combo score function, the edge-edge crossing function, and the node-edge crossing function. Secondly, our algorithm lays out only undirected graphs. For metabolic pathways, on the other hand, [11,13] proposed layout algorithms that decompose a digraph into hierarchical structural parts and directed cycle parts by considering the direction of edges in order to capture the flow of reactions. The grid layout algorithm will therefore also need to handle digraphs, exploiting properties of digraphs that are especially effective in a grid-based layout. Finally, grid layout algorithms, including our new approach, have high time complexity and are not suitable for real-time drawing. Thus, we would like to devise a further optimized grid layout algorithm to enable real-time drawing.
Authors' contributions
The basic idea was conceived by MK and MN. This idea was developed by KK and MN who then conceived a new idea and developed it. EJ created the endothelial model in Figure 10. SM supervised the whole study. The final manuscript was read and approved by all authors.
Acknowledgements
Computation time was provided by the Super Computer System, Human Genome Center, Institute of Medical Science, University of Tokyo.
References

Nagasaki M, Doi A, Matsuno H, Miyano S: Genomic Object Net: I. A platform for modelling and simulating biopathways.
Applied Bioinformatics 2003, 2(3):181-184.

Doi A, Nagasaki M, Fujita S, Matsuno H, Miyano S: Genomic Object Net: II. Modelling biopathways by hybrid functional Petri net with extension.
Applied Bioinformatics 2003, 2(3):185-188.

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks.
Genome Research 2003, 13(11):2498-2504.

Networks/Pajek [http://vlado.fmf.unilj.si/pub/networks/pajek/]

Demir E, Babur O, Dogrusoz U, Gursoy A, Nisanci G, Atalay RC, Ozturk M: PATIKA: an integrated visual environment for collaborative construction and analysis of cellular pathways.
Bioinformatics 2002, 18(7):996-1003.

Dogrusoz U, Erson EZ, Giral E, Demir E, Babur O, Cetintas A, Colak R: PATIKAweb: a Web interface for analyzing biological pathways through advanced querying and visualization.
Bioinformatics 2006, 22(3):374-375.

Kurata H, Matoba N, Shimizu N: CADLIVE for constructing a large-scale biochemical network based on a simulation-directed notation and its application to yeast cell cycle.
Nucleic Acids Research 2003, 31(14):4071-4084.

Kurata H, Masaki K, Sumida Y, Iwasaki R: CADLIVE dynamic simulator: direct link of biochemical networks to dynamic models.
Genome Research 2005, 15(4):590-600.

Brandes U, Dwyer T, Schreiber F: Visualizing related metabolic pathways in two and a half dimensions.
Proceedings of the 11th International Symposium on Graph Drawing 2003, 111-122.

Karp PD, Paley SM: Automated drawing of metabolic pathways.
Proceedings of the 3rd International Conference on Bioinformatics and Genome Research 1994, 225-238.

Becker MY, Rojas I: A graph layout algorithm for drawing metabolic pathways.
Bioinformatics 2001, 17(5):461-467.

Sirava M, Schafer T, Eiglsperger M, Kaufmann M, Kohlbacher O, Bornberg-Bauer E, Lenhof HP: BioMiner - modeling, analyzing, and visualizing biochemical pathways and networks.
Bioinformatics 2002, 18(Suppl 2):S219-230.

Wegner K, Kummer U: A new dynamical layout algorithm for complex biochemical reaction networks.
BMC Bioinformatics 2005, 6:212.

Li W, Kurata H: A grid layout algorithm for automatic drawing of biochemical networks.
Bioinformatics 2005, 21(9):2036-2042.

Kato M, Nagasaki M, Doi A, Miyano S: Automatic drawing of biological networks using cross cost and subcomponent data.
Genome Informatics 2005, 16(2):22-31.

Genc B, Dogrusoz U: A constrained, force-directed layout algorithm for biological pathways.
Proceedings of the 11th International Symposium on Graph Drawing 2003, 314-319.

Dogrusoz U, Giral E, Cetintas A, Civril A, Demir E: A compound graph layout algorithm for biological pathways.
Proceedings of the 12th International Symposium on Graph Drawing 2004, 442-447.

Pober JS: Endothelial activation: Intracellular signaling pathways.
Arthritis Research 2002, 4(Suppl 3):S109-116.

Matsuno H, Tanaka Y, Aoshima H, Doi A, Matsui M, Miyano S: Biopathways representation and simulation on hybrid functional Petri net.
In Silico Biology 2003, 3(3):389-404.

Saito A, Nagasaki M, Doi A, Ueno K, Miyano S: Cell fate simulation model of gustatory neurons with microRNAs double-negative feedback loop by hybrid functional Petri net with extension.

[http://www.csml.org/download/SCCBLayout_BMC_inst.exe]