Abstract
Background
Graph drawing is one of the important techniques for understanding biological regulations in a cell or among cells at the pathway level. Among many available layout algorithms, the spring embedder algorithm is widely used not only for pathway drawing but also for circuit placement and www visualization and so on because of the harmonized appearance of its results. For pathway drawing, location information is essential for its comprehension. However, complex shapes need to be taken into account when torusshaped location information such as nuclear inner membrane, nuclear outer membrane, and plasma membrane is considered. Unfortunately, the spring embedder algorithm cannot easily handle such information. In addition, crossings between edges and nodes are usually not considered explicitly.
Results
We proposed a new gridlayout algorithm based on the spring embedder algorithm that can handle location information and provide layouts with harmonized appearance. In gridlayout algorithms, the mapping of nodes to grid points that minimizes a cost function is searched. By imposing positional constraints on grid points, location information including complex shapes can be easily considered. Our layout algorithm includes the spring embedder cost as a component of the cost function. We further extend the layout algorithm to enable dynamic update of the positions and sizes of compartments at each step.
Conclusions
The new spring embedderbased gridlayout algorithm and a spring embedder algorithm are applied to three biological pathways; endothelial cell model, Fasinduced apoptosis model, and C. elegans cell fate simulation model. From the positional constraints, all the results of our algorithm satisfy location information, and hence, more comprehensible layouts are obtained as compared to the spring embedder algorithm. From the comparison of the number of crossings, the results of the gridlayoutbased algorithm tend to contain more crossings than those of the spring embedder algorithm due to the positional constraints. For a fair comparison, we also apply our proposed method without positional constraints. This comparison shows that these results contain less crossings than those of the spring embedder algorithm. We also compared layouts of the proposed algorithm with and without compartment update and verified that latter can reach better local optima.
Background
For biological pathways such as signal transduction pathways, gene regulatory networks, and metabolic pathways, one of the crucial techniques for understanding their characteristics is to use graph visualization. Both publicly [1] and commercially available pathway databases [2] display retrieved pathways in the form of graphs to enable users to understand them easily. Usually, in these databases, a large number of pathways are retrieved with various types of criteria according to biologists' purposes. However, it is laborious to manually draw graphs for each request, and hence, automatic layout algorithms specialized for biological pathways are strongly desired.
Thus far, several types of drawing algorithms have been designed for biological pathways and they have been integrated in biological modeling and/or simulation software, e.g., Cell Illustrator [3,4], Pajek [5], PATIKA [6,7], and CADLIVE [8,9].
Karp and Paley extracted biological topologies such as linear, cyclic, and branching pathways and used them as the backbone of the layout [10]. For chemical reaction networks, Becker and Rojas proposed a method [11] that uses the longest directed cycle as the backbone of the layout to capture the flow of reactions. On the other hand, Wegner and Kummer used recursively extracted small cycles as the backbone of the layout [12], because such cycles are known to participate in important recycling processes. Their work has been implemented as an SBML application [13].
Several biological properties are considered in spring embedder approaches. The use of edge directions and simple positional constraints has been proposed for more general metabolic pathways [14,15]. In GOlorize [16], an additional attractive force is applied to nodes belonging to the same Gene Ontology class. An SBML layout extension, SBWAutoLayout [17], employs the spring embedder approach as its layout algorithm. Schreiber et al. [18] proposed to a generic layout algorithm where the spring force cost is optimized independently in horizontal and vertical directions. Due to the optimization strategy, the algorithm can handle placement constraints for the biological comprehension such as horizontal or vertical aligning of nodes, nonoverlapping of nodes, and keeping the same network motifs to some formation. Spring embedder approaches are very popular in the field of Bioinformatics because of the harmonized appearance of their results. However, Li and Kurata noted that spring embedder approaches are not suitable for generating compact layouts of complex pathways [19]. In addition, such approaches have a difficulty in handling complicated positional constraints such as arranging some nodes only on a toursshaped region, which corresponds to cellular membranes, e.g., nuclear inner membrane, nuclear outer membrane, and plasma membrane.
A grid layout algorithm for biological networks was first proposed by Li and Kurata, in which nodes of the given graph are mapped to grid points and the locally optimal mapping of nodes in terms of the defined cost function is searched over all possible mappings [19]. The cost function is defined by the weighted sum of several components: node distances weighted according to the graph structure and Manhattan distance [19], edgeedge and nodeedge crossings [20], rewarding scores for the aligned nodes possessing the same biological attributes [21], and negative inner product between directions of inedges and outedges that induces the traceability of the flows [22]. Because finding optimal mapping is NPhard [23] even when only edgeedge crossings are considered in the cost function, the basic grid layout algorithm repeatedly updates the layout by moving nodes one by one under a greedy search strategy, and a locally optimal layout is obtained after convergence. For efficient calculation, the cost differences calculated by checking movements of a node to a grid point at the current step are cached for calculating the cost differences at the next step [19]. Swapping the positions of two nodes is additionally considered at update steps for the better local optimum without increasing the time complexity [21], whereas Barsky et al. restricted movements of a node to stochastically selected grid points at update steps [24]. Further, the reduction of time complexity is accomplished by using sweep calculation algorithm [22], which can efficiently calculate edgeedge and nodeedge crossings and the Manhattan distance. In addition, grid layout algorithms can deal with complicated positional constraints that are often assumed in biological networks as subcellular localization information, and thus, they succeed in generating compact and biologically comprehensible layouts.
We propose a new gridlayoutbased spring embedder algorithm that considers the spring force cost as a component of the cost function of a gridlayout algorithm. Hence, the cost function consists of the spring force cost and edgeedge and nodeedge crossings. As stated above, the sweep calculation can be used to efficiently count the number of edgeedge and nodeedge crossings and calculate the Manhattan distance. However, the sweep calculation cannot be used to calculate the spring force. Therefore, we devise a new caching approach for calculating the spring force and propose a new layout algorithm having the same time complexity.
The remainder of this paper is organized as follows. In the Results and Discussion section, we discuss the performance of the proposed algorithm by comparing it with that of the conventional spring embedder approach on three biological networks. The conclusions of our work are presented in the Conclusions section. Finally, the Methods section describes the procedure of the proposed algorithm and its time complexity.
Results and Discussion
Experimental settings and results
We compare our proposed algorithm (Grid Layout) with a spring embedder layout algorithm [25] (Spring). As the attraction force F_{a}(d) and repulsion force F_{r}(d), d^{2 }and 1/d^{2 }are used, respectively, where d is the distance between nodes (here, the Euclidean distance is adopted). We use three biological networks that were constructed from curated knowledge in biological literatures:
• Endothelial cell model [26]: 221 nodes and 274 edges.
• Fasinduced apoptosis model [27]: 84 nodes and 93 edges.
• Cell fate simulation model of C. elegans [28]: 53 nodes and 59 edges.
Grid resolutions of 73 × 81, 39 × 29, and 26 × 21 are used for the endothelial cell model, Fasinduced apoptosis model, and cell fate simulation model of C. elegans, respectively. Both algorithms are implemented in Java and experiments were performed on a Core microarchitecturebased Xeon 3.0 GHz processor. For each model, ten random layouts are generated and applied to Grid Layout and Spring. Since nodeedge crossings cause the difficulty on distinguishing the node connections, we consider nodeedge crossings as more problematic factor than edgeedge crossings and then set more weight for nodeedge crossings than edgeedge crossings in the cost function. Specifically, weight for nodeedge crossings w_{n }is two times as much as that for edgeedge crossing w_{e}, i.e., w_{n }= 2 × w_{e}. For the three pathway networks, the numbers of rows and columns of the grid are determined by setting the grid interval as the size of basic elements and setting the canvas size as the size used in the manually created layout. Li and Kurata stated that the desirable numbers of rows and columns of grid are proportional to [19].
Since by calculating (the number of row + the number of columns) for the three pathways we have (73 + 81)/ = 10.76, (39 + 39)/ = 7.42, (26 + 21)/ = 6.46, our grid resolutions somehow follow the assumption in [19]. The repulsion force is defined to be inversely proportional to the square of distance between nodes, and the force comes from all nodes. As we discussed, the grid size is proportional to and the repulsion force does not depend on the grid size if nodes are evenly distributed in the canvas ((1/)^{2 }× V = 1). Thus, we use the same weight for repulsion force (w_{r }= 1) among three networks. Remnant weights to be adjusted are attraction force w_{a }and edgeedge crossings w_{e}. We empirically select their parameter ranges as w_{a }= {1, 5, 10, 12} and w_{e }= {10, 50}, respectively. Figures 1, 2, and 3 respectively show the layouts of the Fasinduced apoptosis model, cell fate simulation model of C. elegans, and endothelial cell model obtained from Grid Layout and Spring, which has the minimum cost among the results from ten random layouts. Note that here only a resulting layout under one of the above parameter sets is given for each model (for Fasinduced apoptosis model w_{a }= 1, w_{r }= 1, w_{e }= 10, w_{n }= 20; cell fate simulation model of C. elegans w_{a }= 1, w_{r }= 1, w_{e }= 50, w_{n }= 100; and endothelial cell model w_{a }= 12, w_{r }= 1, w_{e }= 50, w_{n }= 100). For results under other parameter sets, see Section 1 of Additional file 1.
Figure 1. Resulting layouts of Fasinduced apoptosis model obtained from Grid Layout (a) and Spring (b). Because the spring embedder algorithm does not consider location information, this location information is not shown in its resulting layout.
Figure 2. Resulting layouts of cell fate simulation model of C. elegans obtained from Grid Layout (a) and Spring (b). Because the spring embedder algorithm does not consider location information, this location information is not shown in its resulting layout.
Figure 3. Resulting layouts of endothelial cell model obtained from Grid Layout (a) and Spring (b). Because the spring embedder algorithm does not consider location information, this location information is not shown in its resulting layout.
Additional file 1. Comparison of the resulting layouts under several parameter sets (Section 1) and among three cost functions (Section 2). Layouts of Fasinduced apoptosis model, cell fate simulation model of C. elegans, and endothelial cell model obtained by the proposed algorithm under several parameter sets are compared in Section 1. From the comparison, the influence of parameters to positions of nodes and the number of crossings are discussed. In Section 2, resulting layouts of Grid Layout, Grid Layout without considering spring force cost, and Grid Layout considering spring force cost are compared on the three models. By using box plots for the numbers of edgeedge and nodeedge crossings on layouts from these algorithms, the effectiveness of spring force cost is discussed.
Format: PDF Size: 7.8MB Download file
This file can be viewed with: Adobe Acrobat Reader
The resulting layouts are generated using an XML format called Cell System Markup Language (CSML), and these can be directly displayed by using the Cell Illustrator Player in a Web browser. All URL links are listed in Table 1. In the layouts, the cellular membrane, nucleus, mitochondria, and Golgi apparatus are depicted by a blue frame, yellow circle, red oval, and brown crabshaped object, respectively. In order to analyze the number of crossings in the layouts and the running time, ten random layouts for each model are also applied to Grid Layout without positional constraints (hereafter called Grid Layout NL). For these random layouts, we compare the numbers of edgeedge and nodeedge crossings in the resulting layouts and the running time of Grid Layout and Spring. These comparisons are summarized in Figures 4, 5, and 6 for the Fasinduced apoptosis model, cell fate simulation model of C. elegans, and endothelial cell model, respectively.
Table 1. Summary of layout results.
Figure 4. Comparisons of number of edgeedge crossings (left), number of nodeedge crossings (middle), and running time (right) for Fasinduced apoptosis model. Numbers of edgeedge intersections and nodeedge intersections and running time for Grid Layout, Spring, and Grid Layout NL (Grid Layout considering no location information) are compared using box plots. These indicators are obtained by applying these three algorithms to ten randomly obtained layouts of the Fasinduced apoptosis model.
Figure 5. Comparisons of number of edgeedge crossings (left), number of nodeedge crossings (middle), and running time (right) for cell fate simulation model of C. elegans. Numbers of edgeedge intersections and nodeedge intersections and running time for Grid Layout, Spring, and Grid Layout NL (Grid Layout considering no location information) are compared using box plots. These indicators are obtained by applying these three algorithms to ten randomly obtained layouts of the cell fate simulation model C. elegans.
Figure 6. Comparisons of number of edgeedge crossings (left), number of nodeedge crossings (middle), and running time (right) for endothelial cell model. Numbers of edgeedge intersections and nodeedge intersections and running time for Grid Layout, Spring, and Grid Layout NL (Grid Layout considering no location information) are compared using box plots. These indicators are obtained by applying these three algorithms to ten randomly obtained layouts of the endothelial cell model.
As shown in the two lefthand side plots of Figures 4, 5, and 6, the resulting layouts from Grid Layout tend to contain more edgeedge and nodeedge crossings than those from Spring. This may be because positional constraints restrict the search space of Grid Layout. This hypothesis is also reinforced by the results of Grid Layout NL, which contains a lesser or, occasionally a comparable number of crossings as compared to those of the spring embedder algorithm among three cases on both edgeedge and nodeedge crossings.
Although the above comparison appears to suggest that positional constraints degrade the quality of the resulting layouts, in the next subsection, we show that how location information serves to improve the understandability of biological networks while surveying the results of Grid Layout.
Dynamic resizing and repositioning of compartments
Dynamic resizing and repositioning of compartments are considered in our proposed algorithm. Li and Kurata [19] stated that as an empirical rule, setting vertical and horizontal sizes of canvas proportional to the square root of the number of nodes is suitable for most networks. This rule can also be applied to size the compartment according to the nodes localized in it. However, if nodes in a compartment are densely connected, they tend to create cluster, and thus they do not fill out the space optimally. By making the size of the compartment smaller, better quality layout will be obtained. On the other hand, if nodes are distributed uniformly enough in the compartment, its enlargement might be required for the better quality of the layout. Also, if the center of the compartment is away from the nodes' center of gravity, it might spoil the quality of the resulting layout. Thus, we consider dynamic update of sizes and positions of compartments iteratively at each step. Hereafter, we call Grid Layout with dynamic compartment update as GDC. Figures 7, 8, and 9 show the minimum cost resulting layouts obtained from GDC for Fasinduced apoptosis model, cell fate simulation model of C. elegans, and endothelial cell model, respectively. Weights for the cost functions are the same as those of the experiments in the previous section. Initial sizes and positions of the compartments are the same as in layouts for Grid Layout. In this study, we keep the size and position of the extracellular or cellular membrane, and then update the sizes and positions of other components inside of the cellular membrane such as nucleus, mitochondria, and Golgi apparatus. The detailed procedures of dynamic compartment update are in the following method section. As a common property in the layouts of the three models, nodes of the layouts of GDC are centered on each compartment, whereas nodes of the layouts Grid Layout tend to be positioned only on a part of each compartment. In addition, the compartments are well resized and then are filled out with the nodes enough, e.g., nodes on nucleus in Figure 9, comparing to nodes on nucleus in the layout image of Figure 3(a). We also compare the total cost, the number of edgeedge crossings, the number of nodeedge crossings, and computational time of the resulting layouts of Grid Layout and GDC. The box plots of total cost, the number of edgeedge crossings, the number of nodeedge crossings, and computational time are summarized in Figures 10, 11, and 12. Although GDC requires slightly more computational time than Grid Layout, GDC provides better or competitive results in other indicators. Since the repositioning of compartments is allowed in GDC, the positions of compartments are moved to more desirable positions, which contributes to the better cost, the number of edgeedge crossings, and the number of nodeedge crossings. On the other hand, since the consideration of the dynamic compartment update spreads out the search space of the layouts, the more steps are required to reach local optima.
Figure 7. Resulting layouts of Fasinduced apoptosis model obtained from GDC (Grid Layout with dynamic compartment update). Iterative update of the sizes and positions of nucleus and mitochondria is considered at each step.
Figure 8. Resulting layouts of cell fate simulation model of C. elegans obtained from GDC (Grid Layout with dynamic compartment update). Iterative update of the size and position of nucleus is considered at each step.
Figure 9. Resulting layouts of endothelial cell model obtained from GDC (Grid Layout with dynamic compartment update). Iterative update of the sizes and positions of nucleus, mitochondria, and Gologi apparatus are considered at each step.
Figure 10. Comparisons of total cost (left), number of edgeedge crossings (middle left), number of nodeedge crossings (middle right), and running time (right) for Fasinduced apoptosis model. Total costs, Numbers of edgeedge intersections and nodeedge intersections and running time for Grid Layout and GDC (Grid Layout with dynamic compartment update) are compared using box plots. These indicators are obtained by applying these two algorithms to ten randomly obtained layouts of the Fasinduced apoptosis model.
Figure 11. Comparisons of total cost (left), number of edgeedge crossings (middle left), number of nodeedge crossings (middle right), and running time (right) for cell fate simulation model of C. elegans. Total costs, Numbers of edgeedge intersections and nodeedge intersections and running time for Grid Layout and GDC (Grid Layout with dynamic compartment update) are compared using box plots. These indicators are obtained by applying these two algorithms to ten randomly obtained layouts of the cell fate simulation model C. elegans.
Figure 12. Comparisons of total cost (left), number of edgeedge crossings (middle left), number of nodeedge crossings (middle right), and running time (right) for endothelial cell model. Total costs, Numbers of edgeedge intersections and nodeedge intersections and running time for Grid Layout and GDC (Grid Layout with dynamic compartment update) are compared using box plots. These indicators are obtained by applying these two algorithms to ten randomly obtained layouts of the endothelial cell model.
Discussion
The first model shown in Figure 1 is a famous signal transduction pathway, apoptosis, which is known to participate in various biological processes such as development, maintenance of tissue homeostasis, and elimination of cancer cells. Malfunctions of apoptosis have been implicated in many forms of human diseases such as neurodegenerative diseases, AIDS, and ischemic stroke. Apoptosis is reportedly caused by various inducers such as chemical compounds, proteins, or removal of NGF. The biochemical pathways of apoptosis are complex and depend on both the cells and the inducers. In particular Fasinduced apoptosis has been studied in detail and its simulation model has been proposed [27]. Fas ligands, which usually exist as trimmers in the extracellular region, bind and activate their receptors by inducing receptor trimerization in the cytoplasm membrane region. Activated receptors recruit adaptor molecules such as Fasassociating protein with death domain (FADD), which recruit procaspase8 to the receptor complex, where it undergoes autocatalytic activation in the cytoplasm. Activated caspase8 activates caspase3 through two pathways. In the complex pathway, caspase8 cleaves the Bcl2 interacting protein and its COOHterminal part translocates to the mitochondria where it triggers the release of cytochrome c. The cytochrome c released from the mitochondria binds to apoptotic protease activating factor1 (Apaf1) together with dATP and procaspase9 and activates caspase9 in the cytoplasm. Caspase9 cleaves procaspase3 and activates caspase3. In other pathway, caspase8 cleaves procaspase3 directly and activates it. In the nucleus, caspase3 cleaves DNA fragmentation factor (DFF) 45 in a heterodimeric factor of DFF40 and DFF45. The cleaved DFF45 dissociates from DFF40, inducing the oligomerization of DFF40 that has DNase activity. The active DFF40 oligomer causes internucleosomal DNA fragmentation, which is an apoptotic hallmark indicative of chromatin condensation. As stated above, these reaction events are strictly regulated in specific cellular locations, and therefore, the corresponding location information cannot be ignored in the resulting layout. Figure 1(a) clearly shows the regulation of these events in each cellular location, i.e., plasma membrane, cytoplasm, mitochondria, and nucleus. In contrast, Figure 1(b) shows the two different flows by Fasinduced apoptosis; however, it is difficult to capture the location information of each event.
Figure 2 shows the cell fate determination model of two gustatory neurons of C. elegans  ASE left (ASEL) and ASE right (ASER) [28]. These neurons are morphologically bilaterally symmetric but physically asymmetric in function, and their fates are strictly regulated by the double negative feedback loop (DNFL), the main path of which consists of four steps: (i) activation of DIE1 protein leads to the activation of lsy6 miRNA in the nucleus; (ii) lsy6 miRNA is transported from the nucleus and inhibits the translation of cog1 mRNA into COG1 protein; (iii) if the COG1 protein is not suppressed, then it is translocated into the nucleus and activates the transcription of mir273 miRNA in the nucleus; and (iv) mir273 miRNA is transported from the nucleus and inhibits the translation of die1 mRNA; this completes the loop to (i). In a manner similar to apoptosis, these DNFL reaction events are strictly regulated in specific cellular locations, and therefore, the corresponding location information cannot be ignored in the resulting layout. Although Figure 2(b) shows these steps, it is difficult to capture the location information of each step. For instance, most of the proteins and miRNAs in this model, e.g., COG1 protein, LIM6 protein, DIE1 protein, mir273 miRNA, and lsy6 miRNA, translocate between the nucleus and the cytoplasm. However, such information cannot be inferred from this figure. In contrast, Figure 2(a) shows the regulations of these steps while keeping the cellular location of each step, i.e., cytoplasm and nucleus.
These differences can be observed more clearly in larger models. Figure 3 shows the responses of endothelial cells to the tumor necrosis factor, with an emphasis on the induction of endothelial leukocyte adhesion molecules with more elements than in the other two models [26]. Since adhesion molecules are usually localized on the plasma membrane, many molecules should be on the plasma membrane domain. Usually, external signals are received by these adhesion molecules and transferred into the molecules located in the cytoplasm. Finally, these signals trigger the translation of mRNAs in the nucleus. From the viewpoint of the density of nodes, Figure 3(b) appears to suitably keep a uniform density of nodes. Unfortunately, in terms of the understanding of cascading events, Figure 3(b) does not provide completely useful information because, due to the lack of location information, it is difficult to interpret the network as the response model by the tumor necrosis factor from the external region to the nucleus via the cytoplasm. On the other hand, as shown in Figure 3(a), our layout requires no difficulty in tracing the flow of biological cascading events in our layout, i.e., a reader can easily interpret the network as the response model of a cell from the external region to the nucleus via the cytoplasm as a signal flow.
Conclusions
We propose a gridlayoutbased spring embedder algorithm that exploits the advantages on both methods, i.e., consideration of location information and harmonized layouts. Not only the harmonized appearance of resulting layouts, spring force also contributes the reduction of crossings, which is verified by comparing two cases of grid layout algorithms: (i) without considering distance cost and (ii) considering only spring force. (Section 2 of Additional file 1). Although only spring force is considered as the distance cost in this study, we can incorporate other distance cost such as the Manhattan distance cost in the cost function simultaneously.
In addition, to explicitly consider the reduction of crossings, edgeedge and nodeedge crossing costs are included in the cost function. To calculate spring forces among nodes, we proposed an efficient calculation method for spring force cost with O(V^{2}·h·w), and to calculate other costs, we employ the sweep calculation [22], which can count the crossings for all the possible movements of a node at once. By applying the proposed algorithm and spring embedder algorithm to three biological networks, we verified that the consideration of location information significantly improves the understandability of a network from a biological viewpoint.
In order to realize better biological pathway layouts, under the framework of grid layout, several useful cost functions were proposed, e.g., (i) rewarding score for aligned nodes in one line with the same attribute and (ii) negative inner product of directions of inedge and outedges. Feature (i) is very important for biologists because the nodes in a biological pathway usually have biological attributes, e.g., a node is mRNA, protein, modified protein, or complex of proteins, and they explicitly distinguish these components. Feature (ii) is also very important for biochemists because it helps in understanding the reaction flows of the biological pathway. These cost functions can be easily plugged in to our grid layout algorithm without increasing the time complexity. Furthermore, to obtain a better resulting layout, we can also introduce the swapping operation of nodes at each step of moving a node to a vacant grid point to increase the search space while keeping the time order.
Our proposed algorithm succeeded in realizing the required features for biological pathway layouts; however, several enhancements are still required. For example, usually, the combination and order of some biological reactions can be grouped, and thus, this set of reactions, e.g., phosphorylation and dephosphorylation, and related biological elements, e.g., protein and modified protein, can be considered as subgraphs that consume several grid points. Although in our search strategy the final result might fall into bad local optima, for the better local optimum, we can use simulated annealing or other techniques which enables escape from the bad local optima although more computational time is required for the final result. If we could extend the current grid layout algorithm to allow the movement of multiple fixed structured nodes at once, then the required feature would be realized. Our layout framework assumes that compartments representing subcellular localizations are allocated by users in advance and then the layout algorithm is applied, but we also considered dynamic adjustment of sizes and positions of these compartments. In this work, the initial state of compartments are given in advance. For automatically providing their initial state, the following approach can be considered as an example. The size of compartment can be determined by the square root of the number of nodes that localize in the same compartment. For the positions of the compartments, we put pair of compartments in close positions if many edges are bridging them.
In addition, the bending of edges that enables bypassing edgeedge and nodeedge crossings has not been considered in the current grid layout algorithms. This could be achieved by considering bends as virtual nodes and handling them in a manner similar to normal nodes in search steps.
Methods
Given a graph G = (V, E) and a grid of h rows and w columns, we define a cost function for mappings of nodes to grid points and show an algorithm that finds the mapping of nodes, minimizing the cost function in a greedy manner. The cost function is defined by the weighted sum of four components:
(a) Attraction force F_{a}(d(P (v), P (u))) between pairs of adjacent nodes v and u in the graph G, where P (v) and P (u) are grid points to which v and u are mapped, respectively, and d(P (v), P (u)) is the distance between two grid points P (v) and P (u).
(b) Repulsion force F_{r}(d(P (v), P (u))) between any pairs of nodes v and u.
(c) Number of edgeedge crossings ∑_{e.f∈E}I_{e}(e, f), where I_{e}(e, f ) is a binary function that returns 1 if e and f cross with each other and 0 otherwise.
(d) Number of nodeedge crossings ∑_{u∈V,e∈E}I_{n}(v, e) where I_{n}(v, e) is a binary function that returns 1 if v and e cross with each other and 0 otherwise.
Formally, the cost function is given by
where (v) is the set of adjacent nodes of v, and w_{a}, w_{r}, w_{e}, and w_{n }∈ R^{+ }are weights for the components.
Search algorithm
In grid layout, nodes are mapped to different grid points, i.e., no grid point is occupied by more than one node. Our algorithm optimizes the cost function by moving a node to an empty grid point at each step in a greedy manner. Note that, given positional constraints, nodes are allowed to be moved only to empty grid points satisfying the positional constraints, e.g., if a node is localized only in the cellular membrane, it can be mapped only to those grid points corresponding to cellular membrane. The above operation can be performed by calculating delta cost, which is the cost difference by the movement of a node to a grid point, for all nodes and for all vacant grid points. Although a naïve algorithm requires O(V^{2}·h·w) time to find the movement that reduces the cost most at each step, we devise an efficient method that requires O(E^{2}·min(h, w) + h·w) time for finding the movement, which is described below.
Efficient calculation of spring force
Repulsion force for a node v is given by
where the function P (i) returns the grid point to which i ∈ is mapped. Checking the movement of a node to all the vacant grid points requires V·h·w calculations, and hence, O(V^{2}·h·w) time is required in total at each step.
Although the above naïve calculation has a higher time complexity than existing grid layout algorithms, we propose an efficient calculation. When v is moved from P (v) to q, the repulsion force for v is given by:
Because the term ∑_{u∈V}F_{r}(d(q, P(u))) in the above equation depends on q, but not on v, by calculating ∑_{u∈V}F_{r}(d(q, P(u))) for all the vacant points q initially, the calculation of c_{r}(v) requires a constant time. The term ∑_{u∈V}F_{r}(d(q, P(u))) for all the vacant points requires O(V·h·w) time, and V·h·w movements are considered at each step. Therefore, in total, O(V·h·w) time is required at each step to calculate the repulsion force.
For the attraction force, the delta cost Δ_{v, p }induced by the movement of a node v to grid point p can be calculated by considering the attraction force between v and its adjacent nodes. In addition, the movement of a node v influences the delta costs only for v and its adjacent nodes, i.e., the delta costs for its nonadjacent nodes at the previous and current steps are the same. Thus, by using the cached delta costs obtained at the previous step, we can calculate the delta costs efficiently. If v is moved from p to q at the previous step, the delta cost for the movement of v to r can be updated by
and for a node u in (v) to r,
Efficient counting of edgeedge and nodeedge crossings
The delta cost caching technique is used for counting crossings as well. When v is moved at the previous step, the following cases need to be considered for calculating the delta costs induced by the movement of node u.
(i) edgeedge crossing between e_{u }∈ E_{u }and e_{v }∈ E_{v}, where E_{v }and E_{u }are the sets of edges connected to v and u, respectively.
(ii) nodeedge crossing between e_{u }∈ E_{u }and v.
(iii) nodeedge crossing between e_{v }∈ E_{v }and u.
(iv) edgeedge crossing between edge e(u, v) and E\(E_{u }∪ E_{v}) if edge e(v, u) exists.
(v) nodeedge crossing between edge e(u, v) and V\{v, u} if edge e(v, u) exists.
In a naïve way, the crossings of the above cases are counted in each movement of a node to a grid point. Thus, the above cases (i), (ii), (iii), (iv), and (v) may respectively require O(E_{u}E_{v}), O(E_{u}), (E_{v}), O(E), and O(V) time. Thus, each movement of a node u requires O(E_{u}E_{v}) time if u ∈ (v) and O(E_{u}E_{v} + E) time otherwise. Hence, in total, O(h·w·deg(v)E) time is required at each step, where deg(v) is the degree of v.
These time complexities can be reduced by using more sophisticated crossing counting algorithms [2931]. In this study, we employ the sweep calculation algorithm [22], which is known to require less time complexity than even sophisticated crossing counting algorithms under the assumption that h and w are proportional to and the average degree is bounded by O(V^{1/4}). The grid resolution in the former assumption is commonly employed in existing grid layout algorithms [1922]. In addition, because the biological networks we are motivated to tackle can be modeled as scalefree networks whose average degree is bounded by a constant value [32], the latter assumption is reasonable.
Given an edge e, a node v connected with e, and a set of edges F ⊆ E on the grid, we consider the counting of crossings between e and edges in F for the movement of v to each grid point. Unlike conventional crossing counting algorithms, the sweep calculation can simultaneously count the crossings for all the movements of v in O(F·min(h, w) + h·w) time [22]. Because nodeedge crossings can be counted in a manner similar to the case of edgeedge crossings, by replacing the number of edges with the number of nodes, the time complexity for counting nodeedge crossings is obtained. Therefore, for the five cases mentioned above, the sweep calculation simultaneously counts crossings for mappings of u to q for all grid points q in O(E_{u}E_{v}·min(h, w) + h·w), O(E_{u}·min(h, w) + h·w), O(E_{u}·min(h, w) + h·w), O(E·min(h, w) + h·w), and for (v) O(V·min(h, w) + h·w) time, respectively. Thus, the algorithm using sweep calculation requires O(deg(v)E·min(h, w) + h·w·V) time at each step.
Time complexity at the initial step
The calculation of delta costs at the initial step requires more computational time than those at latter steps because no cached delta costs are available. Here, the time complexity for the first step is analyzed for each component.
(a) Repulsion force: The computation of repulsion forces does not rely on the cached delta costs. Thus, O(V·h·w) time is required.
(b) Attraction force: Because attraction forces between a node v and its adjacent nodes (v) are calculated, O(deg(v)) time is required for each movement of v. Thus, O(E·h·w) time is required in total.
(c) Edgeedge crossing: Because crossings between edges in E_{v }and other edges are checked for the movement of a node v, O(E^{2}·min(h, w) + h·w) time is required by sweep calculation.
(d) Nodeedge crossing: When a node v is moved, we need to consider two cases: (i) crossings between edges in E_{v }and all nodes other than v, and (ii) crossings between v, and all the edges other than edges in E_{v}. Thus, O(EV·min(h, w) + h·w) time is required by sweep calculation.
From the above analysis, the proposed algorithm requires O(E^{2}·min(h, w) + h·w) time at the initial step.
Procedures for resizing and repositioning of compartments
The resizing and repositioning of compartments are mainly comprised of the following procedures:
(i) The size of each compartment is updated according to the distribution range of nodes localized in the compartment.
(ii) The position of the compartment is updated in such a way that the center of the compartment is close to the center of gravity of nodes localized to it.
For the resizing of each compartment in step (i), we fist calculate and where v_{c }is a node localized to the compartment c, b_{c }is the center of gravity of v_{c }(d the nodes localized to c, and d_{v}(·,·) and d_{h}(·,·) return vertical and horizontal distance of v_{c }and b_{c}, respectively. Then, if s_{v }< 0.4 × the width of the compartment and s_{h }< 0.4 × the height of the compartment, the compartment is shrunk to one level smaller size (0.95 times as large as the current size, in our setting). On the other hand, s_{v }< 0.9 × the width of the compartment and 2 value s_{h }< 0.9 × the height of the compartment, the compartment is enlarged to one level larger size (1/0.95 time as large as the current size). For the limitation of the scaling, the compartment cannot be shrunk if its current size is smaller than 0.6 times of its original size, while it cannot be enlarged if its size is larger than 1.5 times of its original size.
For step (ii), the position of the compartment that minimizes the distance of the center of compartment and the center of gravity of nodes are searched. For an easier implementation, we discredited the center of compartment and the center of gravity of nodes to some grid points and employed the Manhattan distance for the distance measure. Positioning is searched in the limited distance from the center of gravity, which is set to 10 in our setting. if the compartment is resized. Also, for the search procedure, the following two conditions must be satisfied:
• Every node satisfies its localization information.
• No compartments are allow to overlap.
For the efficiency and simplicity of checking the second condition, we only consider overlapping of the rectangles that surround the compartments. Overlapping of these rectangles can be detected by checking if at one of four corners are in the other rectangle. If no valid position can be found in the above procedure, the size of the compartment is turned back to its previous size of step (i) and then step (ii) is applied again. If no valid position is still not found, then its current size and position are used for the next step. When several nodes are located close to the surface of a compartment, its size and position cannot be updated to a better condition as resizing and repositioning of the compartment violate the localization of these nodes. In order to avoid the case, we introduce the following cost function to nodes located within one grid distance from the surface of the compartments defined as α·exp(βl), where α and β are respectively set to 20·(w + r) and 0.002 from an empirical rule and l is the number of updated steps. Due to the above cost function, the placement of nodes close to the surface of the compartments is avoided and then the compartments can be updated to a better size and position with higher probability. In addition, since the above cost function converges to zero with increasing update steps l, the convergence of the search is guaranteed.
Next, we consider the time complexity of the dynamic compartment update. For step (i), the calculation of s_{h }and s_{c }require O(V_{c}) time for a compartment c, where V_{c }is the set of nodes localized to c. Resizing the compartment c requires O(w_{c}·h_{c}) time, where w_{c }and h_{c }are width and height of the compartment c. Thus, in total, O(V + w·h) = O(w·h) time is required for step (i). For step (ii), checking the violation of localization information of every node requires O(V) time for each movement of a compartment even in a naïve way. In addition, at worst, each compartment is moved to all the grid points in the limited distance from the center of gravity and the number of them are obviously less than the number of grid points. Checking the overlapping of a pair of compartment requires constant time. Since the number of compartments are limited (in our setting, at most three), which can be considered as a constant, the time complexity of step (ii) requires O(w·h·V) time at worst case. Actually, since the number of grid points searched for the repositioning of compartments are limited, the time complexity for the dynamic compartment update is not heavy in practice, which is supported by the comparison of running time of the proposed algorithm with and without the dynamic compartment update in Figure 10, 11, and 12.
Authors' contributions
KK and MN discussed the main direction of this work. SM gave idea to this work for the further improvement. The main engine of search algorithm was implemented by KK and the visualization engine was implemented by MN as a Cell Illustrator plugin. The final manuscript was read and approved by all authors.
Acknowledgements
Computational resources were provided by Human Genome Center, Institute of Medical Science, University of Tokyo.
References

Kanehisa M: The KEGG database.
Novartis Found Symposium 2002, 247:91101. Publisher Full Text

Schacherer F, Choi C, Götze U, Krull M, Pistor S, Wingender E: The TRANSPATH signal transduction database: a knowledge base on signal transduction networks.
Bioinformatics 2001, 17(11):10531057. PubMed Abstract  Publisher Full Text

Doi A, Nagasaki M, Fujita S, Matsuno H, Miyano S: Genomic Object Net: II. Modelling biopathways by hybrid functional Petri net with extension.
Applied Bioinformatics 2003, 2(3):185188. PubMed Abstract

Nagasaki M, Doi A, Matsuno H, Miyano S: Genomic Object Net: I. A platform for modelling and simulating biopathways.
Applied Bioinformatics 2003, 2(3):181184. PubMed Abstract

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks.
Genome Research 2003, 13(11):24982504. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Demir E, Babur O, Dogrusoz U, Gursoy A, Nisanci G, Atalay RC, Ozturk M: PATIKA: an integrated visual environment for collaborative construction and analysis of cellular pathways.
Bioinformatics 2002, 18(7):9961003. PubMed Abstract  Publisher Full Text

Dogrusoz U, Erson EZ, Giral E, Demir E, Babur O, Cetintas A, Colak R: PATIKAweb: a Web interface for analyzing biological pathways through advanced querying and visualization.
Bioinformatics 2006, 22(3):374375. PubMed Abstract  Publisher Full Text

Kurata H, Matoba N, Shimizu N: CADLIVE for constructing a largescale biochemical network based on a simulationdirected notation and its application to yeast cell cycle.
Nucleic Acids Research 2003, 31(14):40714084. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Kurata H, Masaki K, Sumida Y, Iwasaki R: CADLIVE dynamic simulator: direct link of biochemical networks to dynamic models.
Genome Research 2005, 15(4):590600. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Karp PD, Paley SM: Automated drawing of metabolic pathways.
Proceedings of the 3rd International Conference on Bioinformatics and Genome Research 1994, 225238.

Becker MY, Rojas I: A graph layout algorithm for drawing metabolic pathways.
Bioinformatics 2001, 17(5):461467. PubMed Abstract  Publisher Full Text

Wegner K, Kummer U: A new dynamical layout algorithm for complex biochemical reaction networks.
BMC Bioinformatics 2005., 6(212) PubMed Abstract  PubMed Central Full Text

Gauges R, Rost U, Sahle S, Wegner K: A model diagram layout extension for SBML.
Bioinformatics 2006, 22(15):18791885. PubMed Abstract  Publisher Full Text

Genc B, Dogrusoz U: A constrained, forcedirected layout algorithm for biological pathways.

Dogrusoz U, Gral E, Cetintas A, Civril A, Demir E: A compound graph layout algorithm for biological pathways.

Garcia O, Saveanu C, Cline M, FromontRacine M, Jacquier A, Schwikowski B, Aittokallio T: GOlorize: a Cytoscape plugin for network visualization with Gene Ontologybased layout and coloring.
Bioinformatics 2007, 23(3):394396. PubMed Abstract  Publisher Full Text

Deckard A, Bergmann FT, Sauro HM: Supporting the SBML layout extension.
Bioinformatics 2006, 22(23):29662967. PubMed Abstract  Publisher Full Text

Schreiber F, Dwyer T, Marriott K, Wybrow M: A generic algorithm for layout of biological networks.

Li W, Kurata H: A grid layout algorithm for automatic drawing of biochemical networks.
Bioinformatics 2005, 21(9):20362042. PubMed Abstract  Publisher Full Text

Kato K, Nagasaki M, Doi A, Miyano S: Automatic drawing of biological networks using cross cost and subcomponent data.

Kojima K, Nagasaki M, Jeong E, Kato M, Miyano S: An efficient grid layout algorithm for biological networks utilizing various biological attributes.
BMC Bioinformatics 2007., 8(76) PubMed Abstract  PubMed Central Full Text

Kojima K, Nagasaki M, Miyano S: Fast grid layout algorithm with sweep calculation.
Bioinformatics 2008, 24(12):14331441. PubMed Abstract  Publisher Full Text

Gary MR, Johnson DS: Crossing number is NPcomplete.
SIAM Journal on Algebraic and Discrete Methods 1983, 4:312316. Publisher Full Text

Barsky A, Gardy JL, Hancock REW, Munzner T: Cerebral: a Cytoscape plugin for layout of and interaction with biological networks using subcellular localization annotation.
Bioinformatics 2007, 23:10401042. PubMed Abstract  Publisher Full Text

Tunkelang D: JIGGLE: Java interactive graph layout environment.

Pober JS: Endothelial activation: Intracellular signaling pathways.
Arthritis Research 2002, 4(Suppl 3):S109116. PubMed Abstract  BioMed Central Full Text

Matsuno H, Tanaka Y, Aoshima H, Doi A, Matsui M, Miyano S: Biopathways representation and simulation on hybrid functional Petri net.
In Silico Biology 2003, 3(3):389404. PubMed Abstract  Publisher Full Text

Saito A, Nagasaki M, Doi A, Ueno K, Miyano S: Cell fate simulation model of gustatory neurons with microRNAs doublenegative feedback loop by hybrid functional Petri net with extension.
Genome Informatics 2006, 17:100111. PubMed Abstract

Chazelle B: Reporting and counting segment intersections.
Journal of Computer and System Sciences 1986, 32:156182. Publisher Full Text

Palazzi L, Snoeyink J: Counting and reporting red/blue segment intersections.
CVGIP: Graphical Models and Image Processing 1993, 56:304310. Publisher Full Text

Cheng SW, Janardan R: Spaceefficient rayshooting and intersection searching: Algorithms, dynamization, and applications.
2nd annual ACMSIAM symposium on Discrete algorithms 1991, 716.

Jeong L, Mason S, Barabási AL, Oltvai NZ: Lethality and centrality in protein networks.
Nature 2001, 411:4142. PubMed Abstract  Publisher Full Text