Clearly visualized biopathways are a great help in understanding biological systems. However, manually drawing large-scale biopathways is time-consuming. We previously proposed a grid layout algorithm that can handle gene-regulatory networks and signal transduction pathways by considering edge-edge crossings, node-edge crossings, the distance between nodes, and subcellular localization information from Gene Ontology. This layout algorithm succeeded in drastically reducing these crossings in the apoptosis model. However, for larger-scale networks, we encountered three problems: (i) the initial layout is often very far from any local optimum because nodes are initially placed at random; (ii) from a biological viewpoint, human layouts are still easier to understand than automatic layouts because, apart from subcellular localization, the algorithm does not fully utilize the biological information of pathways; and (iii) the algorithm employs a local search strategy in which the neighborhood is obtained by moving one node at each step, whereas the resulting automatic layouts suggest that moving multiple nodes simultaneously is necessary for better layouts, although such an extension may worsen the time complexity.
We propose a new grid layout algorithm. To address problem (i), we devised a new force-directed algorithm whose output is suitable as the initial layout. For (ii), we regarded an appropriate alignment of nodes having the same biological attribute as one of the most important factors for comprehension, and we defined a new score function that favors such configurations. For (iii), we developed a search strategy that considers swapping two nodes as well as moving a single node, while keeping the order of the time complexity. Although a naïve implementation increases the time complexity by one order, we avoided this by devising a method that caches the differences between the score of a layout and the scores of its possible updates.
Layouts produced by the new grid layout algorithm are compared with those of the previous algorithm and with a human layout for an endothelial cell model three times as large as the apoptosis model. The total cost of the layout produced by the new grid layout algorithm is similar to that of the human layout. In addition, its convergence time is drastically reduced (a 40% reduction in the number of iterations on average).
Modeling and simulation of large-scale biological pathways are among the most important tasks in bioinformatics. Many applications, e.g., Cell Illustrator [1,2], Cytoscape , Pajek , PATIKA [5,6], and CADLIVE [7,8], have been developed in this area. In this context, the visualization of biopathways is considered to play a key role in understanding biological systems. However, manually drawing large-scale biopathways is time-consuming, so suitable biopathway layout algorithms and applications are in strong demand.
Biopathways are categorized into three types: metabolic pathways, signal transduction pathways, and gene-regulatory networks. For metabolic pathways, several algorithms have already been proposed [9-13], and some of them capture the flow of the reactions well. In contrast, few layout algorithms that support biological understanding have been proposed for signal transduction pathways [14,15] and gene-regulatory networks [16,17]. Our new layout algorithm therefore focuses on signal transduction pathways and gene-regulatory networks. For these two types, existing layout algorithms fall into two categories: force-directed and grid layout algorithms.
Force-directed algorithms are used in [16,17], taking into account directional constraints that depend on the types of molecules, as well as simple regional constraints derived from subcellular localizations. These algorithms have been successfully integrated into PATIKA. However, as pointed out in , force-directed algorithms may not be suitable for compact layouts of complex biopathways. Furthermore, these force-directed algorithms cannot handle intricately shaped regions, such as torus-shaped regions, as regional constraints. Hence, they are not suitable for models containing a torus-shaped plasma membrane and nuclear membrane, although such models are common among biopathways.
A grid layout algorithm (referred to as the LK-grid layout algorithm) was first proposed by Li and Kurata. The grid layout algorithm restricts the positions of all nodes to grid points. Li and Kurata defined a cost function for each pair of nodes that depends on the distance between the nodes and the topology of their connections in the graph. They applied the LK-grid layout algorithm to a yeast cell-cycle pathway and concluded that it can geometrically classify the pathway into functional categories without using biological information. Moreover, they noted that the algorithm generates compact layouts while avoiding overlaps between nodes. The CB-grid layout algorithm, proposed in , adds a penalty for edge-edge crossings and node-edge crossings to the cost function so as to reduce them. The algorithm can also handle arbitrarily complex regional constraints derived from subcellular localizations, and these constraints also reduce the search space. As a result, in the apoptosis model, the algorithm drastically reduced edge-edge crossings and node-edge crossings while placing nodes in biologically proper regions.
However, for larger-scale networks, this algorithm encountered three problems. First, a layout with randomly placed nodes is used as the initial layout. This random layout contains a large number of edge-edge crossings and node-edge crossings, so many iterations are required to reach a locally optimal layout. Second, although one feature of the CB-grid layout algorithm is its use of subcellular localization information, it still does not fully utilize biological characteristics. For example, it ignores biological attributes such as entity types (protein, mRNA, and microRNA) and process types (phosphorylation, binding, and translation), although in human layouts these attributes tend to make biopathways easier to understand. Third, following a greedy strategy, the CB-grid layout algorithm updates a layout by moving one node at each step until the layout reaches an optimum. However, the resulting layouts are only local optima, so their quality fundamentally depends on the initial layout. Although a multi-step CB-grid layout algorithm was also proposed in  to address this drawback, it has a higher time complexity and hence is not suitable for practical applications.
To overcome these three problems, we propose a new grid layout algorithm. For the first problem, we propose a new force-directed algorithm whose output is suitable as the initial layout of grid layout algorithms. For the second problem, we introduce a score, i.e., a negative cost, that is assigned to a layout depending on how nodes with the same attribute are aligned. This concept is realized by a combo score function, which is combined with the cost function defined in the CB-grid layout algorithm. For the third problem, the search strategy of the CB-grid layout algorithm is improved by adding a swap operation without increasing the time complexity. With the swap operation, the new grid layout algorithm also considers, at each step, the layouts generated by exchanging the positions of two nodes in the current layout.
The Methods section is organized as follows: (i) first, we introduce the previous grid layout algorithm, i.e., the CB-grid layout algorithm; (ii) for the first improvement, concerning the initial layout of the CB-grid layout algorithm, we describe a new force-directed algorithm termed the Eades initial layout algorithm; (iii) for the second improvement, we describe the CCB-grid layout algorithm, which is the CB-grid layout algorithm with the combo score function; (iv) for the third improvement, we present the SCCB-grid layout algorithm, which extends the CCB-grid layout algorithm with the swap operation. In the Results and Discussion section, the performance of these new algorithms is compared and verified by applying them to the signal transduction pathway of an endothelial cell, which is larger than the pathways in  and .
CB-grid layout algorithm: Introduction of the grid layout algorithm
Given a graph G = (V, E) with nodes V and edges E, a layout L = (V, E, U, P) of G consists of the underlying graph G, grid points U and a function P : V → U such that P (vα) ≠ P (vβ) for any two distinct nodes vα, vβ ∈ V. This definition does not allow overlaps between nodes in the layout. For a layout L, this paper uses the following notations.
• $W_L$: the set of vacant points of L.
• $E_v$: the set of all edges connected to node v.
• |V|: the number of nodes in V.
• |W|: the number of vacant points in L, used instead of $|W_L|$ when no confusion is possible.
We define the following operations.
• $T_{v \to p} L$: the layout generated by moving a node v to a vacant point $p \in W_L$.
• $D_v L$: the layout generated by removing a node v and all edges connected to v.
In addition, we define the following functions.
In our previous approach  (hereafter referred to as the CB-grid layout algorithm), the layout cost C(L) of L was defined as follows:
where Wee, Wne, and Wd are called respectively edge-edge crossing weight, node-edge crossing weight, and distance cost weight.
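The defining equation of C(L) did not survive in this text. As a minimal sketch, assuming C(L) is the weighted sum Wee·(number of edge-edge crossings) + Wne·(number of node-edge crossings) + Wd·(a distance term), the Python below counts both crossing types with integer orientation tests on grid coordinates. The function names, the reading of a node-edge crossing as "an edge passes through another node's grid point", and the use of total Euclidean edge length as the distance term are our assumptions, not the CB-grid paper's definitions.

```python
from itertools import combinations
from math import dist


def orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1 left turn, -1 right turn, 0 collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def segments_cross(p1, p2, q1, q2):
    """True if segments p1-p2 and q1-q2 cross at a point interior to both."""
    return (orient(p1, p2, q1) * orient(p1, p2, q2) < 0
            and orient(q1, q2, p1) * orient(q1, q2, p2) < 0)


def strictly_on_segment(a, b, c):
    """True if point c lies on segment a-b, excluding its endpoints."""
    return (orient(a, b, c) == 0 and c != a and c != b
            and min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def crossing_counts(pos, edges):
    """(edge-edge crossings, node-edge crossings) of a layout.
    pos: node -> grid point (the map P); edges: list of (u, v) node pairs."""
    ee = sum(1 for (a, b), (c, d) in combinations(edges, 2)
             if len({a, b, c, d}) == 4  # edges sharing a node are not counted
             and segments_cross(pos[a], pos[b], pos[c], pos[d]))
    ne = sum(1 for a, b in edges for w in pos
             if w not in (a, b) and strictly_on_segment(pos[a], pos[b], pos[w]))
    return ee, ne


def layout_cost(pos, edges, w_ee, w_ne, w_d):
    """Assumed weighted-sum cost; the distance term here is total Euclidean edge length."""
    ee, ne = crossing_counts(pos, edges)
    d = sum(dist(pos[a], pos[b]) for a, b in edges)
    return w_ee * ee + w_ne * ne + w_d * d
```

For two diagonals of a 2×2 square with a fifth node at the center, this reports one edge-edge crossing and two node-edge crossings (both diagonals pass through the center node).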
The CB-grid layout algorithm repeatedly moves a single node to a vacant point until it reaches a locally optimal layout. At each step, the algorithm calculates the costs of all layouts that can be generated by moving any node to any vacant point, and the layout with the lowest cost is selected as the starting layout for the next step. After convergence, the algorithm outputs a locally optimal layout. A naïve calculation of the costs of all adjacent layouts has a high time complexity. To overcome this problem, the previous method  introduced the Δ matrix, which stores each possible cost difference from the previous step, and reduced the time complexity of each step from $O(|W|(|V|^2 + |E|^2))$ to $O(|V|^2 + |E|^2 + |W||E_{v_\beta}|(|V| + |E|))$, where $v_\beta$ is the node moved at the previous step.
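To make the search loop concrete, here is a hypothetical and deliberately naïve Python sketch of the best-improvement strategy described above: every (node, vacant point) move is tried, the cheapest improving move is applied, and the loop stops at a local optimum. It re-evaluates the full cost for every candidate, which is exactly the work the Δ matrix avoids; the function name and its interface are illustrative assumptions.

```python
def greedy_single_moves(pos, vacant, cost, max_steps=10_000):
    """Naive best-improvement search over single-node moves (no Delta matrix).

    pos    : dict node -> grid point (the injective map P)
    vacant : iterable of vacant grid points W_L
    cost   : function mapping a pos dict to a layout cost C(L)
    Returns (final pos, final cost, number of moves applied).
    """
    pos, vacant = dict(pos), set(vacant)
    current = cost(pos)
    for step in range(max_steps):
        best = None  # (cost, node, target point) of the best improving move
        for v in list(pos):
            old = pos[v]
            for p in vacant:
                pos[v] = p  # tentatively apply T_{v->p}
                c = cost(pos)
                if c < current and (best is None or c < best[0]):
                    best = (c, v, p)
            pos[v] = old  # undo the tentative move
        if best is None:  # no improving move: local optimum reached
            return pos, current, step
        current, v, p = best
        vacant.remove(p)
        vacant.add(pos[v])  # the node's old point becomes vacant
        pos[v] = p
    return pos, current, max_steps
```

Each step performs |V|·|W| full cost evaluations, which is why caching cost differences pays off.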
When the CB-grid layout algorithm was applied to several biopathways, we encountered three problems, so we propose new grid layout algorithms that solve them. The problems and solutions are summarized as follows:
1. Improving the choice of the initial layout: since a locally optimal layout depends strongly on the initial layout, we first apply the Eades initial layout algorithm to a random layout and use its output as the initial layout. In the previous approach, a random layout was used directly as the initial layout.
2. Improving the cost function: we introduce the concept of a combo score that gives a good score, i.e., a negative cost, when nodes with the same biological attribute are aligned (CCB-grid layout algorithm). The CB-grid layout algorithm ignored all biological attributes except subcellular localization.
3. Improving the search strategy: we propose a better search strategy that yields improved results without increasing the time complexity. To obtain better layouts, the search space is extended by adding the swap operation: at each step, all layouts obtained by swapping two nodes are also considered (SCCB-grid layout algorithm).
In the remainder of this section, we describe these three new algorithms mentioned above.
Eades initial layout algorithm: generating a new initial layout for grid layout algorithms
In the previous paper , a random layout was used as the initial layout for the CB-grid layout algorithm. When the initial layout is far from the global optimum, the resulting local optimum tends to be unacceptable. Therefore, we decided to adapt Eades algorithm  and use its output as the initial layout. Eades algorithm is a force-directed algorithm consisting of the following two steps.
1. Two types of forces are defined for each pair of nodes. If two nodes are adjacent, an attractive force $a_{c1}\log(d/a_{c2})$ acts between them, where $a_{c1}$ and $a_{c2}$ are constants and d is the distance between the two nodes. If two nodes are not adjacent, a repulsive force $r_c/\sqrt{d}$ acts between them, where $r_c$ is a constant. At each step, the positions of all nodes are updated according to the sum of the attractive and repulsive forces acting on them.
2. The above step is iterated a predetermined number of times, and the final result is obtained.
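A minimal Python sketch of the two steps above, assuming the standard form of Eades' spring embedder: logarithmic springs $a_{c1}\log(d/a_{c2})$ between adjacent nodes, a repulsion $r_c/\sqrt{d}$ between non-adjacent nodes, and every node moved simultaneously by a small factor (called `step` here, a hypothetical parameter) times its net force. The default constants are illustrative, not the values used in this paper, and the two customizations described next (grid snapping and gathering by localization) are not included.

```python
import math


def eades(pos, edges, ac1=2.0, ac2=1.0, rc=1.0, step=0.1, iterations=100):
    """Spring-embedder sketch. pos: dict node -> (x, y); edges: (u, v) pairs."""
    pos = {v: (float(p[0]), float(p[1])) for v, p in pos.items()}
    adjacent = {frozenset(e) for e in edges}
    nodes = list(pos)
    for _ in range(iterations):
        force = {v: [0.0, 0.0] for v in nodes}
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[v][0] - pos[u][0]
                dy = pos[v][1] - pos[u][1]
                d = math.hypot(dx, dy)
                if d == 0.0:
                    continue  # coincident nodes: direction undefined, skip the pair
                if frozenset((u, v)) in adjacent:
                    f = ac1 * math.log(d / ac2)  # spring: pulls together when d > ac2
                else:
                    f = -rc / math.sqrt(d)  # non-adjacent nodes repel
                fx, fy = f * dx / d, f * dy / d  # force on u, positive toward v
                force[u][0] += fx
                force[u][1] += fy
                force[v][0] -= fx
                force[v][1] -= fy
        # Move every node simultaneously by step * (net force).
        pos = {v: (pos[v][0] + step * force[v][0], pos[v][1] + step * force[v][1])
               for v in nodes}
    return pos
```

With these defaults, an isolated connected pair settles at distance $a_{c2} = 1$, the natural length of the logarithmic spring.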
We customized Eades algorithm in two respects. First, nodes in Eades algorithm can be placed anywhere, whereas all nodes in the initial layout for the CB-grid layout algorithm must be placed on grid points that satisfy their subcellular localizations. Thus, the output of Eades algorithm cannot be used directly as input for the CB-grid layout algorithm.
To handle this problem, after the nodes are moved at each step, we move each node to the closest vacant point that satisfies its subcellular localization.
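The snapping rule can be sketched in Python as follows. This is one simple reading, not the authors' implementation: nodes are processed in a fixed (sorted) order, and each takes the nearest still-free grid point inside its own localization, so earlier nodes get first pick. The `allowed` mapping from a localization label to its grid points and all function names are hypothetical.

```python
import math


def snap_to_grid(pos, localization, allowed):
    """Move each node to the nearest free grid point allowed for its localization.

    pos          : dict node -> (x, y) continuous position (e.g. after an Eades step)
    localization : dict node -> subcellular localization label
    allowed      : dict label -> list of grid points (x, y) inside that region
    """
    occupied = set()
    snapped = {}
    for v in sorted(pos, key=str):  # fixed order makes the result reproducible
        free = [g for g in allowed[localization[v]] if g not in occupied]
        if not free:
            raise ValueError(f"no free grid point left for {v!r}")
        g = min(free, key=lambda q: math.dist(q, pos[v]))
        snapped[v] = g
        occupied.add(g)
    return snapped
```

A greedy order is the simplest choice; an assignment-based variant would avoid penalizing nodes processed late, at higher cost.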
The second customization is as follows. Since Eades algorithm does not consider edge-edge crossings and node-edge crossings, the resulting layout can contain many such crossings. For example, consider a biological pathway with a membrane localization that thinly surrounds other subcellular localizations, as shown in Figure 1(a); the graph in Figure 1(a) could be a layout produced by Eades algorithm. Such a layout may contain a large number of edge-edge crossings and node-edge crossings because edges cross over other subcellular localizations. To avoid this problem, we gather the nodes of each subcellular localization around a particular grid point, as shown in Figure 1(b). Eades algorithm with these two customizations is called the Eades initial layout algorithm.
Figure 1. Two layouts with the same canvas and three subcellular localizations. The grid canvases (a) and (b) have the same three subcellular localizations: extracellular space, plasma membrane, and cytoplasm. Both canvases contain the same graph with four nodes located in the plasma membrane, which surrounds the cytoplasm. In (a), the nodes are spread apart in the plasma membrane, and the edges among them cross over the cytoplasm. In (b), the nodes are gathered in the top-left corner, and no edge crosses over the cytoplasm. Because of this crossing pattern, the edges in (a) are more likely to cross other nodes in the cytoplasm. This is the drawback of using the layout in (a) as the initial layout for Eades initial layout algorithm.
CCB-grid layout algorithm: utilizing various biological attributes
When humans draw biopathway models, nodes with the same attribute are usually arranged according to a rule. The CB-grid layout algorithm completely ignores this type of information. To capture it, we introduce two combo scores, called combo1 and combo2 (see Figure 2). Note that a combo score applies only to nodes that have an attribute, since some nodes have none. We denote the set of nodes having an attribute by $V' \subseteq V$. In Figure 2, (i) upperGrid(p, i)/lowerGrid(p, i) returns the ith grid point above/below a grid point $p \in U$, and (ii) Attr(v) is the attribute of a node $v \in V'$, and $CW_a = 1 + C/|V'_a|$, where C is a constant normally set to |V|, and $V'_a$ is the set of nodes having attribute a.
Figure 2. Pseudocode of the combo score functions combo1 and combo2. (a) combo1: a score function that considers nodes at a vertical grid distance of one from the target node. (b) combo2: a score function that considers nodes up to a vertical grid distance of two from the target node. (c) isCombo: a Boolean function that takes a node and a grid point as arguments and returns "true" if the attribute of the node equals that of the node on the grid point.
The combo score is designed so that the more nodes with the same attribute are aligned vertically, the higher the score. A combo score is defined between two nodes, and the combo score of a layout L is the sum of all combo scores occurring in L. We say that two nodes have a combo relation when a combo score occurs between them. No horizontal alignment score is implemented: if the combo score supported both the vertical and horizontal directions, the numbers of edge-edge crossings and node-edge crossings would increase considerably. Therefore, only one direction should be chosen, and in this paper we define combo scores in the vertical direction. We considered two types of combo scores, combo1 and combo2, and compare them using the layouts in Figure 3(a) and 3(b). Let nodes va to vf in Figure 3 have the same attribute. combo1 considers only the nodes at a vertical grid distance of one from the target node, whereas combo2 considers the nodes up to a vertical grid distance of two. For the layout in Figure 3(a), the numbers of combo relations with combo1 and combo2 are 8 and 12, respectively. If node vf is moved as shown in Figure 3(b), the number with combo1 stays the same, whereas the number with combo2 becomes 14. Thus, only combo2 rewards moving node vf from its position in Figure 3(a) to that in Figure 3(b). As shown in the dotted rectangle in Figure 3(a), a pair of vertically aligned nodes often occurs while a layout is being updated, and in this case the layout in Figure 3(b) should be preferred to that in Figure 3(a). For this reason, we employ combo2. Henceforth, for a node v ∈ V in a layout L, $\mathrm{Combo}_v(L)$ denotes combo2(v, L), and the total score of L is denoted by Combo(L).
Figure 3. An example comparing the combo1 and combo2 score functions. (a) An intermediate layout of the CCB-grid layout algorithm, in which all six nodes have the same attribute. (b) The next candidate layout, generated from (a) by moving node vf below node vd. With the combo1 score function, (a) and (b) have the same combo score; with the combo2 score function, (b) scores better than (a).
If $CW_a$ took the same value for every attribute a, nodes with a frequent attribute would easily be aligned vertically, since they have a greater chance of neighboring one another. To reduce this bias among attributes, we define $CW_a$ to be inversely related to the total number of nodes whose attribute is a.
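Because the pseudocode of Figure 2 is not reproduced here, the following Python sketch is only one possible reading of combo2 and $CW_a$. It assumes that a node scores $CW_a$ for each same-attribute node directly above or below it, and scores $CW_a$ again for a same-attribute node two grid points away only when the grid point in between also holds a same-attribute node (a reading suggested by the discussion of $\mathrm{Adj}_v(L)$ later in this section). Nodes without an attribute score zero, and all names are hypothetical.

```python
from collections import Counter


def combo_weights(attr, C):
    """CW_a = 1 + C / |V'_a|, where V'_a is the set of nodes with attribute a."""
    counts = Counter(attr.values())
    return {a: 1 + C / n for a, n in counts.items()}


def combo_v(v, pos, grid, attr, cw):
    """Sketch of combo2(v, L).
    pos: node -> (x, y); grid: (x, y) -> node; attr: node -> attribute
    (nodes without an attribute are simply absent from attr)."""
    a = attr.get(v)
    if a is None:
        return 0.0
    x, y = pos[v]
    score = 0.0
    for dy in (1, -1):  # look above, then below: at most four grid points in total
        if attr.get(grid.get((x, y + dy))) != a:
            continue  # no same-attribute neighbour on this side
        score += cw[a]
        if attr.get(grid.get((x, y + 2 * dy))) == a:
            score += cw[a]  # distance-2 relation through a same-attribute middle node
    return score


def combo_total(pos, attr, cw):
    """Combo(L): the sum of Combo_v(L); every relation is counted from both ends."""
    grid = {p: v for v, p in pos.items()}
    return sum(combo_v(v, pos, grid, attr, cw) for v in pos)
```

For three vertically aligned nodes with the same attribute and $CW_a = 1$, each node scores 2 and the total is 6, i.e., twice the three relations.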
By modifying the layout cost of the CB-grid layout algorithm, we define the layout cost C(L) of a layout L with the combo score as follows:
where Wcs is called the combo score weight. The CB-grid layout algorithm with this modification is named the Combo score, Cross cost and Biological information grid layout algorithm (CCB-grid layout algorithm). The sum of the combo scores is multiplied by 1/2 because every combo score is counted twice: the combo score between nodes $v_\alpha$ and $v_\beta$ is included in both $\mathrm{Combo}_{v_\alpha}(L)$ and $\mathrm{Combo}_{v_\beta}(L)$. The algorithm is the same as the C-optimization(L) step in  except for the use of the above layout cost C(L); in particular, the Δ matrix is calculated in the same way.
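The displayed equation for the CCB-grid cost is not reproduced in this text. Reading the description literally (the combo score acts as a negative cost weighted by $W_{cs}$, and its sum is halved because each relation is counted from both endpoints), it has the following shape, where $C_{\mathrm{CB}}(L)$ stands for the CB-grid layout cost defined earlier. This is a reconstruction from the prose, not the paper's own equation:

$$C(L) = C_{\mathrm{CB}}(L) - \frac{W_{cs}}{2} \sum_{v \in V} \mathrm{Combo}_v(L)$$

For instance, three vertically aligned nodes with attribute a form three combo relations; each node has the other two within a vertical distance of two and scores $2\,CW_a$, so the sum is $6\,CW_a$ and the cost decreases by $3\,W_{cs}\,CW_a$.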
For calculating the combo score of each node, at most four nodes need to be checked, i.e., its time complexity is constant, whereas the time complexities of calculating the edge-edge crossing cost, the node-edge crossing cost, and the distance cost of each node depend on |E|, |V|, and |W|, respectively. Thus, without the Δ matrix, the time complexity related to combo scores is O(|V||W|) at each step.
At each step, we need to calculate the difference between the combo score of the previous layout L and that of the layout generated by moving a node v to a vacant point p, i.e., $\mathrm{Combo}(T_{v \to p} L) - \mathrm{Combo}(L)$. This difference can be calculated efficiently as follows:
The correction term $\mathrm{Adj}_v(L)$ is needed for the following reason. Suppose that three nodes with the same attribute are aligned vertically, and call them $v_\alpha$, $v_\beta$, and $v_\gamma$ from the bottom. There are three combo relations among them: between $v_\alpha$ and $v_\beta$, between $v_\beta$ and $v_\gamma$, and between $v_\alpha$ and $v_\gamma$. Although all three relations involve $v_\beta$, the relation between $v_\alpha$ and $v_\gamma$ is not counted in $\mathrm{Combo}_{v_\beta}(L)$. $\mathrm{Adj}_v(L)$ corrects this type of undercount.
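The exact form of Equation (3) with $\mathrm{Adj}_v(L)$ is not reproduced here. As an alternative sketch, the Python below computes $\mathrm{Combo}(T_{v \to p} L) - \mathrm{Combo}(L)$ by re-scoring only the rows within two grid points of v's old and new positions. It assumes a combo2 reading in which a same-attribute pair scores $CW_a$ at vertical distance one, or at distance two when the node in between also has that attribute. Under this reading, every relation that can change, including one that passes through v as the middle node (the case $\mathrm{Adj}_v(L)$ handles), lies entirely inside those windows, while unchanged relations inside the windows cancel out. The helper names are hypothetical, and `combo_full` is included only as a from-scratch reference for checking.

```python
def relations_in_rows(grid, attr, cw, x, y_lo, y_hi):
    """Weighted count of combo relations lying entirely in column x, rows y_lo..y_hi.
    Relation: same-attribute pair at vertical distance 1, or at distance 2 with a
    same-attribute node in between (assumed combo2 reading)."""
    total = 0.0
    for y in range(y_lo, y_hi):
        a = attr.get(grid.get((x, y)))
        if a is None or attr.get(grid.get((x, y + 1))) != a:
            continue
        total += cw[a]  # distance-1 relation (y, y + 1)
        if y + 2 <= y_hi and attr.get(grid.get((x, y + 2))) == a:
            total += cw[a]  # distance-2 relation (y, y + 2) through a matching middle
    return total


def _windows(points):
    """Per column, the row range within 2 grid points of any changed point."""
    rows = {}
    for x, y in points:
        lo, hi = rows.get(x, (y, y))
        rows[x] = (min(lo, y), max(hi, y))
    return {x: (lo - 2, hi + 2) for x, (lo, hi) in rows.items()}


def combo_delta(grid, attr, cw, v, old, new):
    """Combo(T_{v->new} L) - Combo(L) by local re-scoring; grid is restored afterwards."""
    assert grid.get(old) == v and new not in grid
    win = _windows([old, new])

    def score():
        return sum(relations_in_rows(grid, attr, cw, x, lo, hi)
                   for x, (lo, hi) in win.items())

    before = score()
    del grid[old]
    grid[new] = v
    try:
        after = score()
    finally:
        del grid[new]
        grid[old] = v
    return 2 * (after - before)  # Combo(L) counts each relation twice


def combo_full(pos, attr, cw):
    """Reference value of Combo(L), recomputed from scratch."""
    grid = {p: v for v, p in pos.items()}
    ys = [y for _, y in pos.values()]
    return 2 * sum(relations_in_rows(grid, attr, cw, x, min(ys), max(ys))
                   for x in {x for x, _ in pos.values()})
```

Moving the middle node of a three-node chain away removes all three relations, including the $v_\alpha$–$v_\gamma$ one that the middle node's own score never counted, and the windowed difference picks this up.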
SCCB-grid layout algorithm: extension of the search space due to the swap operation
Another drawback of the CB-grid layout algorithm is that only one node can be moved to a vacant point at each step. For example, the layout shown in Figure 4(a) is optimal for the CB-grid layout algorithm even though the layout in Figure 4(b) is better. This limitation stems from the single-node search strategy of the CB-grid layout algorithm. We have therefore devised a new algorithm that also allows swapping two nodes while keeping the time complexity. With this improvement, the layout in Figure 4(a) is rearranged as shown in Figure 4(b). The new algorithm is named the CCB-grid layout algorithm with the swap operation (SCCB-grid layout algorithm). Its layout cost function is the same as that of the CCB-grid layout algorithm. However, a naïve implementation would increase the time complexity of calculating the layout costs of swapped layouts.
Figure 4. An optimal layout of CB-grid and improved layout with the swap operation. (a) An optimal layout for CB-grid layout algorithm. (b) From (a) a better layout will be generated with the swap operation.
In the previous approach , the Δ matrix stores only cost differences induced by moving nodes to vacant points. As a result, if a grid point of interest was occupied at the previous step, the Δ matrix cannot be used to calculate the cost differences for that grid point. Since the grid points involved in a swap operation are by definition occupied at the previous step, the Δ matrix cannot be used for swaps. However, if the Δ matrix also stores cost differences for occupied points, it can be exploited in this case too. We therefore propose an extended Δ matrix, which covers occupied points as well as vacant points. Since the definition of the cost difference for vacant points cannot be applied directly to occupied points, we calculate the cost difference for an occupied point while ignoring the node occupying that grid point and all edges connected to it. In the remainder of this section, we show how to calculate the extended Δ matrix and then compare its time complexity with that of the original Δ matrix.
Henceforth, let us refer to the extended Δ matrix as Δ matrix. Given a layout L, at the first step, we update Δ (L) matrix as follows:
where DIFF0 to DIFF4 are defined in the following way:
where Q shall be defined below.
where DIFF5 to DIFF9 are defined in the following way:
The two partial cost functions appearing in DIFF0 to DIFF9 depend on the two nodes va and vb and on the three nodes va, vb, and vc, respectively. Each is the sum of the corresponding partial edge-edge crossing cost, node-edge crossing cost, and distance cost, as follows:
We have now shown how to calculate the Δ matrix efficiently. The purpose of extending the Δ matrix is to calculate the cost difference of the swap operation. When two nodes are swapped, the resulting cost difference can be calculated from these Δ values as follows:
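The swap equation itself is missing from this text. The identity behind it can be illustrated under a simplifying assumption: the cost is a sum of symmetric pairwise terms f (for example, a distance cost between connected nodes; the paper's crossing terms involve more nodes and are not covered). Each "move while ignoring the current occupant of the target" term in the hypothetical Python below is the kind of entry the extended Δ matrix caches for occupied points; here it is computed on demand, and only the direct term between the two swapped nodes needs a correction.

```python
from itertools import combinations


def total_cost(pos, f):
    """C(L) for a pairwise cost: the sum of f over all unordered node pairs."""
    return sum(f(u, pos[u], v, pos[v]) for u, v in combinations(sorted(pos), 2))


def move_delta_ignoring(pos, f, a, q, ignored):
    """Cost change of moving node a to point q, computed as if node `ignored`
    (the current occupant of q, in a swap) and its pair terms were removed."""
    return sum(f(a, q, w, pos[w]) - f(a, pos[a], w, pos[w])
               for w in pos if w != a and w != ignored)


def swap_delta(pos, f, a, b):
    """C(L after swapping a and b) - C(L): two 'ignore-the-occupant' move deltas
    plus a correction for the direct a-b term (f is assumed symmetric)."""
    pa, pb = pos[a], pos[b]
    return (move_delta_ignoring(pos, f, a, pb, b)
            + move_delta_ignoring(pos, f, b, pa, a)
            + f(a, pb, b, pa) - f(a, pa, b, pb))
```

The identity holds because every pair term not involving a or b is unchanged, the a–w and b–w terms are exactly the two ignoring deltas, and the a–b term is handled separately.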
In the SCCB-grid layout algorithm, the combo score also needs to be considered. When a node $v_\alpha$ is moved to a vacant point p, the combo score difference $\mathrm{Combo}(T_{v_\alpha \to p} L) - \mathrm{Combo}(L)$ can be calculated as shown in Equation (3). In contrast, when two nodes are swapped, the difference between the combo score of the swapped layout and Combo(L) is calculated efficiently as follows:
A pseudo code of SCCB-grid layout algorithm is described in Figure 5.
Figure 5. SCCB-grid layout algorithm. A pseudo code of SCCB-grid layout algorithm.
If node $v_\beta$ was moved at the previous step, the time complexity of updating the Δ matrix is $O((|V| + |E|)|E_{v_\beta}||U|)$. If two nodes $v_\beta$ and $v_{\beta'}$ were swapped at the previous step, the time complexity is $O((|V| + |E|)(|E_{v_\beta}| + |E_{v_{\beta'}}|)|U|) = O((|V| + |E|)\,\bar{e}\,|U|)$, where $\bar{e} = (|E_{v_\beta}| + |E_{v_{\beta'}}|)/2$. In addition, the time complexity of evaluating all the swap operations at each step is $O(|E|^2)$. Therefore, the time complexity of the SCCB-grid layout algorithm is $O(|E|^2 + |U||E_{v_\beta}|(|V| + |E|))$ at each step.
Since the time complexity of the CB-grid layout algorithm is $O(|V|^2 + |E|^2 + |W||E_{v_\beta}|(|V| + |E|))$ at each step , and $|U| = |V| + |W|$ because every grid point is either occupied or vacant, the time complexity of the SCCB-grid layout algorithm exceeds that of the CB-grid layout algorithm by $O(|V||E_{v_\beta}|(|V| + |E|))$ (note that $v_\beta$ and $v_{\beta'}$ are not distinguished here). We consider two cases, $|V| \le |W|$ (case 1) and $|V| > |W|$ (case 2), and show that the two algorithms have the same time complexity under a mild assumption. In case 1, the difference is negligible since $O(|V||E_{v_\beta}|(|V| + |E|)) \le O(|W||E_{v_\beta}|(|V| + |E|))$. In case 2, the difference cannot be neglected. However, if we assume that every node is equally likely to be the one moved to form the next layout, the expected value of $|E_{v_\beta}|$ is the average degree $2|E|/|V|$, so $|V||E_{v_\beta}| = 2|E|$ in expectation and $O(|V||E_{v_\beta}|(|V| + |E|)) = O(|V|^2 + |E|^2)$. Therefore, the time complexity of the SCCB-grid layout algorithm is the same as that of the CB-grid layout algorithm even in case 2. For these reasons, the time complexities of the SCCB-grid and CB-grid layout algorithms are the same in practice.
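The expectation step in this argument can be written out. The first identity is the handshake lemma (each edge belongs to exactly two sets $E_v$, assuming no self-loops), and the final bound uses $2|V||E| \le |V|^2 + |E|^2$:

$$\mathbb{E}\bigl[|E_{v_\beta}|\bigr] = \frac{1}{|V|}\sum_{v \in V} |E_v| = \frac{2|E|}{|V|}$$

$$|V|\,\mathbb{E}\bigl[|E_{v_\beta}|\bigr]\,(|V| + |E|) = 2|E|(|V| + |E|) = 2|V||E| + 2|E|^2 \le |V|^2 + 3|E|^2 = O(|V|^2 + |E|^2)$$

For the endothelial cell model used in the Results (212 nodes and 274 edges after removing degradation nodes), the average degree is $2 \cdot 274 / 212 \approx 2.58$.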
Results and Discussion
Data and Parameters
To evaluate our algorithms on a large-scale signal transduction pathway with a gene-regulatory network, we created a pathway model of an endothelial cell with Cell Illustrator [1,2] using information extracted from . The model consists of 309 nodes and 371 edges (three times as large as the apoptosis model in , which consists of 117 nodes and 126 edges), and the maximum node degree is ten (eight in the apoptosis model). The grid width and height are fixed to 100 pixels; the numbers of vertical and horizontal grid points are 36 and 40, respectively. We used the following seven GO subcellular localizations: extracellular space (GO:0005615), cytoplasm (GO:0005737), nucleus (GO:0005634), mitochondrion (GO:0005739), plasma membrane (GO:0005886), nuclear membrane (GO:0005635), and mitochondrial membrane (GO:0005740). We also used the following sixteen processes and entities as node attributes: migration, phosphorylation, protein with a modification, ligand, assembly, transcription, translation, mRNA, ligand and receptor, receptor, unknown, protein, exchange, trimer, ubiquitination, and degradation.
Models of this type usually contain many degradation nodes, and each degradation node has exactly one edge. To exploit this property, we apply the layout algorithms after removing the degradation nodes (97 nodes). After the layout algorithms are applied, each removed degradation node is attached just below the entity to which it was originally connected. Thus, the model actually given to the layout algorithms has 212 nodes and 274 edges. Note that when the algorithms are compared by the numbers of edge-edge crossings and node-edge crossings later in this section, crossings caused by degradation nodes and their edges are not counted.
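The remove-then-reattach step can be sketched in Python as follows. It assumes, as stated above, that each degradation node has exactly one edge, and it takes "below" to mean one grid row further in +y (screen coordinates). It does not handle collisions, for example two degradation nodes attached to the same entity, which the text does not address. All names are hypothetical.

```python
def strip_degradation(nodes, edges, kind):
    """Remove degradation nodes (each assumed to have exactly one edge).
    Returns the kept nodes, the kept edges, and a map degradation node -> partner."""
    removed = {v for v in nodes if kind.get(v) == "degradation"}
    partner = {}
    kept_edges = []
    for a, b in edges:
        if a in removed:
            partner[a] = b
        elif b in removed:
            partner[b] = a
        else:
            kept_edges.append((a, b))
    kept_nodes = [v for v in nodes if v not in removed]
    return kept_nodes, kept_edges, partner


def reattach(pos, partner):
    """Place each degradation node one grid row below its partner entity.
    'Below' is taken as +1 in y (screen coordinates); collisions are not handled."""
    placed = dict(pos)
    for d, p in partner.items():
        x, y = pos[p]
        placed[d] = (x, y + 1)
    return placed
```

Applied to the endothelial model, this reduction takes 309 nodes and 371 edges down to 309 − 97 = 212 nodes and 371 − 97 = 274 edges, since each removed node takes exactly one edge with it.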
We apply the following rule to the edge-edge crossing weight Wee, node-edge crossing weight Wne, combo score weight Wcs, and distance cost weight Wdc of the layout cost in Equation (2), so that the distance cost is less important than the other terms:
In our study, Wdc, Wee, Wne, and Wcs were set to 1, 70, 150, and 110, respectively. Also, the constant C in CWa was set to 12.
The combo score aligns many nodes vertically. However, once nodes have combo relations, they often cannot be moved. The plasma membrane, nuclear membrane, and mitochondrial membrane are thin and torus-shaped, so vertical alignments of nodes in these subcellular localizations are of little interest to users (e.g., the plasma membrane in our model is only two grid points wide). Therefore, in this paper we ignore combo scores in the plasma membrane, nuclear membrane, and mitochondrial membrane.
Comparison of layouts
Figure 6 shows the numbers of edge-edge crossings, the numbers of node-edge crossings, the combo scores, and the total costs of the layouts produced by the CB-grid, CCB-grid, and SCCB-grid layout algorithms, and of the human layout. We generated ten initial layouts by applying the Eades initial layout algorithm to ten random layouts. The same initial layouts are used for each layout algorithm (CB Eades, CCB Eades, and SCCB Eades in Figure 6). In addition, we use the ten random layouts directly as initial layouts of the CB-grid layout algorithm (CB random in Figure 6, which corresponds to the previous layout algorithm) to confirm the importance of preparing proper initial layouts. Figures 8 and 9 show the best layouts of the CB-grid and SCCB-grid layout algorithms, respectively, i.e., those with the lowest total cost among the ten resulting layouts of each algorithm. The human layout is shown in Figure 10.
Figure 6. Comparisons of edge-edge crossings, node-edge crossings, combo score, and total cost among the results of four grid layout algorithms and the human layout. Costs and scores of the layouts generated by CB random, CB Eades, CCB Eades, and SCCB Eades, and of the human layout; CB Eades, CCB Eades, and SCCB Eades start from the same initial layouts. Each algorithm is applied to ten initial layouts. (a) Number of edge-edge crossings. (b) Number of node-edge crossings. (c) Combo score. (d) Total cost.
Figure 8. A resulting layout of CB-grid layout algorithm. A resulting layout of CB-grid layout algorithm in an endothelial signal transduction pathway. The pathway model is the same as that in Figure 10.
Figure 9. A resulting layout of SCCB-grid layout algorithm. A resulting layout of SCCB-grid layout algorithm in an endothelial signal transduction pathway. The pathway model is the same as that in Figure 10.
Figure 10. The human layout. The human layout of an endothelial signal transduction pathway. This pathway model is arranged with CB-grid and SCCB-grid layout algorithms in Figure 8 and Figure 9, respectively.
In , the initial layout for the CB-grid layout algorithm was a random layout, which had a large number of edge-edge crossings and node-edge crossings, so many iterations were needed until convergence. This prompted us to use the output of the Eades initial layout algorithm as the initial layout. Figure 7 shows the number of iterations until convergence. As shown in this figure, CB Eades reduces the number of iterations compared with CB random (a 40% reduction on average). Moreover, the total cost of CB Eades is greatly improved over that of CB random (see Figure 6(d)). The discussion in  suggested that reducing edge-edge crossings and node-edge crossings leads to a better approximation of the human layout. However, as shown in Figure 6(a) and 6(b), the human layout also has several edge-edge and node-edge crossings, and it has a higher combo score than the layouts of the CB-grid layout algorithm. Based on these facts, we proposed an additional scoring criterion, the combo score, in the CCB-grid layout algorithm. As the combo scores in Figure 6(c) show, the CCB-grid layout algorithm drastically improves this score, bringing it closer to that of the human layout. However, the numbers of edge-edge crossings and node-edge crossings of the CCB-grid layout algorithm increase compared with CB Eades (see Figure 6(a) and 6(b)). In this paper, the swap operation is proposed to increase the number of candidate layouts at each step. As shown in Figure 6(a) and 6(b), the SCCB-grid layout algorithm succeeds in reducing edge-edge crossings and node-edge crossings, i.e., it partially mitigates the above drawback of the CCB-grid layout algorithm. In addition, as shown in Figure 6(c), the combo score of the SCCB-grid layout algorithm also improves slightly.
Figure 7. Comparison of the total numbers of iterations until an optimal layout is reached among four grid layout algorithms. Total numbers of iterations until an optimal layout is reached with CB random, CB Eades, and SCCB Eades from the same initial layout. Each algorithm was applied to ten initial layouts.
We also applied the grid layout algorithms to the Fas-induced apoptosis pathway model  and the ASE cell fate simulation model  to obtain a more general comparison. The resulting layouts and the number of crossings in each layout are summarized in Additional file 1. These models, including the endothelial cell model, are also available as Additional file 2, and the application implementing the SCCB-grid layout algorithm for these models can be downloaded from .
Additional file 1. Resulting layouts from applying the LK-grid, CB-grid, and SCCB-grid layout algorithms to the Fas-induced apoptosis pathway model and the ASE cell fate simulation model, together with a comparison of these results.
Format: PDF Size: 3.1MB
Format: ZIP Size: 76KB
For better biopathway layouts, three improvements to the CB-grid layout algorithm were proposed: (i) an improved initial layout, (ii) an improved cost function, and (iii) an improved search strategy that does not increase the time complexity. For (i), the Eades initial layout algorithm was proposed, and the improvement was confirmed with a signal transduction pathway of an endothelial cell. For (ii), the CCB-grid layout algorithm, which includes the combo score function, was proposed, and the improvement was verified with the same signal transduction pathway. For (iii), the SCCB-grid layout algorithm was proposed. Owing to (i) and (iii), our layout algorithm starts from a better layout and is more robust to the choice of initial layout than existing methods. In addition, the combo score allows us to utilize biological attributes that existing methods do not consider.
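Improvement (i) replaces a random start with a force-directed one. As a rough illustration only, the following Python sketch uses the classic spring embedder of Eades (1984), with logarithmic springs between adjacent nodes, inverse-square repulsion between non-adjacent nodes, and the constants Eades suggested (c1 = 2, c2 = 1, c3 = 1, c4 = 0.1, 100 iterations), and then snaps the continuous coordinates onto a grid so that each node occupies a distinct grid point. The force-directed variant developed in this work may differ from the classic formulation. The rescaling, the ring-by-ring search for a nearby free grid point, the random seed, and the names `eades` and `snap_to_grid` are all hypothetical choices made for this sketch.

```python
import math
import random


def eades(nodes, edges, iters=100, c1=2.0, c2=1.0, c3=1.0, c4=0.1, seed=0):
    """Classic Eades spring embedder on continuous coordinates.

    Adjacent nodes attract with force c1*log(d/c2); non-adjacent nodes repel with c3/d**2.
    Each iteration moves every node by c4 times its net force.
    """
    nodes = list(nodes)
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0)] for n in nodes}
    adjacent = {frozenset(e) for e in edges}
    for _ in range(iters):
        force = {n: [0.0, 0.0] for n in nodes}
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
                d = max(math.hypot(dx, dy), 1e-3)  # guard against division by zero
                if frozenset((u, v)) in adjacent:
                    f = c1 * math.log(d / c2)      # positive when d > c2: pull u toward v
                else:
                    f = -c3 / d ** 2               # always negative: push u away from v
                fx, fy = f * dx / d, f * dy / d
                force[u][0] += fx
                force[u][1] += fy
                force[v][0] -= fx
                force[v][1] -= fy
        for n in nodes:
            pos[n][0] += c4 * force[n][0]
            pos[n][1] += c4 * force[n][1]
    return pos


def snap_to_grid(pos, width, height):
    """Rescale continuous positions onto a width x height grid and give each node a
    nearby free grid point, searching square rings of increasing radius."""
    xs = [p[0] for p in pos.values()]
    ys = [p[1] for p in pos.values()]
    x0, y0 = min(xs), min(ys)
    sx = (width - 1) / max(max(xs) - x0, 1e-9)
    sy = (height - 1) / max(max(ys) - y0, 1e-9)
    taken, grid = set(), {}
    for node, (x, y) in pos.items():
        gx, gy = round((x - x0) * sx), round((y - y0) * sy)
        for r in range(max(width, height)):
            ring = [(gx + i, gy + j)
                    for i in range(-r, r + 1) for j in range(-r, r + 1)
                    if max(abs(i), abs(j)) == r]
            free = [p for p in ring
                    if 0 <= p[0] < width and 0 <= p[1] < height and p not in taken]
            if free:
                cell = min(free, key=lambda p: (p[0] - gx) ** 2 + (p[1] - gy) ** 2)
                break
        else:
            raise ValueError("grid has fewer points than nodes")
        taken.add(cell)
        grid[node] = cell
    return grid


if __name__ == "__main__":
    ring6 = ["a", "b", "c", "d", "e", "f"]
    cycle = [(ring6[k], ring6[(k + 1) % 6]) for k in range(6)]
    print(snap_to_grid(eades(ring6, cycle), 4, 4))
```

The snapped layout would then replace the random placement as the starting point of the grid local search; because the spring embedder already separates non-adjacent nodes and shortens edges, the search starts closer to a local optimum.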
However, our layout algorithm has limitations that should be addressed in future work. First, if the parameters of the combo score are not chosen appropriately, a node that has formed a combo relation no longer moves to other grid points. It is therefore important to devise a method that automatically selects suitable parameters for the combo score function, the edge-edge crossing function, and the node-edge crossing function. Second, our algorithm lays out only undirected graphs. For metabolic pathways, by contrast, [11,13] proposed layout algorithms that decompose a digraph into hierarchical structural parts and directed-cycle parts by considering edge direction, in order to capture the flow of reactions. The grid layout algorithm will therefore also need to handle digraphs and exploit edge direction, which is particularly effective in grid-based layouts. Finally, grid layout algorithms, including our new approach, have high time complexity and are not suitable for real-time drawing. We would therefore like to devise a further optimized grid layout algorithm that enables real-time drawing.
The basic idea was conceived by MK and MN. KK and MN developed this idea further, and then conceived and developed a new one. EJ created the endothelial model shown in Figure 10. SM supervised the whole study. All authors read and approved the final manuscript.
Computation time was provided by the Super Computer System, Human Genome Center, Institute of Medical Science, University of Tokyo.
Applied Bioinformatics 2003, 2(3):181-184.
Applied Bioinformatics 2003, 2(3):185-188.
Genome Informatics 2005, 16(2):22-31.
In Silico Biology 2003, 3(3):389-404.