Abstract
Background
Graphs and networks are common analysis representations for biological systems. Many traditional graph algorithms such as k-clique, k-coloring, and subgraph matching have great potential as analysis techniques for newly available data in biology. Yet, as the amount of genomic and bionetwork information rapidly grows, scientists need advanced new computational strategies and tools for dealing with the complexities of bionetwork analysis and the volume of the data.
Results
We introduce a computational framework for graph analysis called the Biological Graph Environment (BioGraphE), which provides a general, scalable integration platform for connecting graph problems in biology to optimized computational solvers and high-performance systems. This framework enables biology researchers and computational scientists to identify and deploy network analysis applications and to easily connect them to efficient and powerful computational software and hardware that are specifically designed and tuned to solve complex graph problems. In our particular application of BioGraphE to support network analysis in genome biology, we investigate the use of a Boolean satisfiability solver known as Survey Propagation as a core computational solver executing on standard high-performance parallel systems, as well as multithreaded architectures.
Conclusion
In our application of BioGraphE to conduct bionetwork analysis of homology networks, we found that BioGraphE and a custom, parallel implementation of the Survey Propagation SAT solver were capable of solving very large bionetwork problems at high rates of execution on different high-performance computing platforms.
Background
Networks and graphs are well-studied constructs in computer science and applied mathematics. As the sizes and complexities of networks grow, however, critical limitations arise that restrict a scientist's ability to analyze a large network, both computationally and cognitively. From our observations, humans appear to be capable of comprehending networks that contain up to about 100 nodes. Single-processor computers, on the other hand, can often process graphs of up to 100,000s of nodes, depending on the algorithm. Yet, in many scientific fields, networks often arise that may consist of millions to hundreds of millions of nodes. In a world of continually growing scientific data and information spaces, scientists need advanced new strategies, tools, and computer systems to effectively and efficiently process and analyze large-scale scientific networks.
Data complexity and volume issues have recently become especially significant in biology as the result of high-throughput data production in genomics and proteomics. Networks are a fundamental concept in biological systems, and their analysis is an essential capability. Of great interest are methods for discovering bionetworks from diverse types of data; storing, representing, comparing, and extracting features from them; and computing upon dynamic bionetworks. Yet, despite these trends, it is also increasingly common to hear biologists use the term "giant hairball" to convey their experiences working with large, complex bionetworks in research.
We believe that unlocking the real potential of these networks for biologists will require a significant investment in high-performance computing (HPC) and innovative computational systems and techniques. These approaches have been applied less often in biology, yet biological systems have many intriguing and ideal properties for driving the development of advanced computational strategies and high-powered solutions. Biological computing involves critical issues in data-intensive computing, defined as "applications to explore, query, analyze, visualize, and in general process very large-scale datasets" [1]. In this context, scale relates to both absolute data size and the algorithms and applications that can process large-scale data.
In this paper, we introduce the Biological Graph Environment (BioGraphE), which offers novel graph approaches, techniques, and tools in an integrated framework to address data-intensive computing challenges associated with large-scale bionetworks. BioGraphE will enable biology researchers and computational scientists to identify and deploy network analysis applications and to easily connect them to efficient and powerful computational software and hardware that are specifically designed and tuned to solve complex graph problems.
Results
Graph problems in genome biology
Many areas of biological research are poised to benefit from advances in network analysis. Sources of biological networks include protein-protein interactions, gene expression from microarrays, metabolic pathways, and signal transduction cascades. We have selected genome biology problems in this project for several reasons. These results are important to biologists in the study of how proteins function in the cellular machinery. We are able to build networks in genome biology in a highly tunable fashion: they can be made as large or small, as simple or complex as needed, in large numbers, and with correct biological encodings. The data are freely available in public databases, with large quantities of nodes for these networks as well as diverse types of interconnections. Edges based on homology can be defined in high throughput using the BLAST algorithm [2], and these comparisons can be performed very efficiently with the ScalaBLAST implementation [3]. Other types of edges may also be retrieved from the databases. Finally, we are able to extract right or wrong answers from these networks to provide useful feedback on the analysis.
We have produced a series of protein homology networks for testing BioGraphE based on sigma 70 signaling pathways. Sigma 70 regulatory proteins are of interest for bioenergy applications. Sigma and anti-sigma factor proteins are modular protein pairs that function as on/off switches in many microbial signaling pathways. Recently, we have shown that seemingly discrete clusters of anti-sigma factors are linked together by a previously unrecognized protein domain [4]. In a protein homology network this result can be detected as bridge proteins that link densely clustered regions of anti-sigma factor homology networks. We illustrate the use of several common graph algorithms with protein homology networks, including k-clique, subgraph matching, and k-coloring. Protein families that are functional subtypes are often fully connected in a homology network, and thus appear as cliques. Figure 1 shows a Similarity Box [5] visualization of the maximum cliques found from a bionetwork derived from chromosomal neighbors of sigma 70 proteins.
Figure 1. K-cliques in homology networks. Large cliques identify families of proteins in a homology network. Cliques representing different protein subtypes are shown in the graph in different colors.
Subgraph pattern matching is illustrated in Figure 2. The subgraph highlighted in green consists of gene products with a particular orientation in the chromosome near sigma 70 factors. The highlighted proteins are not randomly placed throughout the network, but rather fall within specific clusters that represent different protein families.
Figure 2. Subgraph pattern matching in homology networks. Subgraph pattern matching may be applied to search for sets or classes of proteins within homology networks. Nodes of the identified subgraph are plotted in green. In this example, the identified proteins have a particular chromosome orientation and fall within specific clusters representing different protein families.
K-coloring is illustrated in Figure 3. The patterns in this graph identify subtle evolutionary relationships between proteins from three strains of Rhodobacter sphaeroides, each marked in a different color. The 3-cliques display orthologous proteins based on bidirectional best hits, while proteins in the open subgraphs are homologs but not likely to be orthologs.
Figure 3. K-coloring of homology networks. K-coloring may be used to study bionetworks generated from bidirectional best hits. The color of a protein specifies the strain to which that protein belongs. The network is designed such that adjacent nodes will never be of the same color.
Computational framework for largescale graph analysis
Graph algorithms are known to be computationally expensive for large graphs because they often involve comparisons of each node to every other node in the graph. As the number of nodes increases, the number of comparisons grows rapidly, and for many problems the worst-case running time grows exponentially. In fact, many general graph problems such as k-clique, Hamiltonian paths and cycles, and subgraph isomorphism are NP-complete (i.e., there is no known polynomial-time algorithm).
To facilitate complex, large-scale graph analysis, we seek a general solution and computational framework to enable scientists to deploy and integrate new graph algorithms and techniques as needed. Rather than continuously building customized, one-off graph solutions, we wish to develop and provide an extensible and uniform analysis platform built on modern computational approaches and systems.
We introduce BioGraphE as a highperformance computational framework for the analysis of complex networks from biology and potentially other domains as well. As shown in Figure 4, BioGraphE is designed to integrate a variety of graph analysis applications into a common suite of tools. Our approach is to identify relevant graph problems and reduce them into equivalent problems that may be addressed by efficient solver implementations. Examples of potential general solvers are ones built for Boolean satisfiability equations and integer linear programming. As an alternative to using a general solver, graph problems may also be directly solved by implementing specific solutions on highperformance systems.
Figure 4. The computational framework of BioGraphE. Different graph problems are applied in network analysis and may be reduced to a form for input into general solvers or directly solved on underlying high-performance systems using specific data representations.
To achieve the best performance, direct and indirect graph solvers should be implemented on high-performance computing (HPC) architectures such as compute clusters, shared-memory SMP and NUMA platforms, as well as multithreaded architectures such as the Sun Niagara and Cray MTA-2. To best utilize the features and capabilities of individual HPC platforms, computational solvers should employ the data structures and synchronization code that are most efficient for each platform.
As a computational framework for bionetwork analysis, BioGraphE has distinct benefits including:
• The BioGraphE framework is extensible and allows core solvers to be added and new graph algorithms to be built on top of the algorithmic toolkit.
• Core solvers should address well-studied problems in computer science, where efficient heuristics and algorithms are available. Specific domain problems and applications, on the other hand, have received much less scrutiny and refinement, and thus are much less likely to have optimized solutions.
• Core solvers should provide a very expressive language or interface to which many different kinds of graph problems may be translated.
• Once the core solvers have been ported and optimized to run on an HPC machine, domain problems that are passed to the solvers should automatically gain the performance and benefits of the HPC machine without requiring recoding to that platform.
Bionetwork analysis
When applying BioGraphE to support bionetwork analysis, we had to make some choices regarding the software and hardware to be utilized. We sought a core solver that would be capable of efficiently solving NP-complete problems, since many of the graph algorithms we wished to apply were also NP-complete. We also sought a general solver that would accept a wide range of problems, had known formulations for graph problems, and had strong potential for code parallelization. For these reasons, we selected a modern solver for Boolean satisfiability (SAT) equations called Survey Propagation (SP) [6,7] as a core solver.
SAT is a well-known NP-complete constraint-satisfaction problem. Various graph problems such as k-clique, k-coloring, and Hamiltonian paths and cycles may be reduced to SAT equations using known efficient reduction algorithms [8]. SP is a modern SAT solver originally developed to study the physics of complex systems such as spin glasses. It has proven to be successful on many large satisfiability problems beyond the reach of other SAT methods.
SAT solving using Survey Propagation
Given randomly generated formulas of M clauses of exactly K literals over N Boolean variables, research has shown that hard instances of random K-SAT exist when the formulas are near a specific threshold value of the ratio α = M/N. K-SAT problems with α moderately below the threshold are underconstrained and easily satisfiable, while those moderately above it are overconstrained and generally unsatisfiable. K-SAT problems near the threshold fall in a critical phase-transition region where solutions are difficult and computationally expensive to obtain. In the case of random 3-SAT, the threshold was found to be approximately α = 4.2 [7]. More specifically, α values between 3.921 and 4.267 were considered to represent the hard region for random 3-SAT.
SP was designed to solve SAT problems that fall within the hard region. To initially evaluate SP, we executed the algorithm on a personal computer running Windows XP on a single Intel 2.00 GHz Pentium M processor. We tested randomly generated 3-SAT formulas consisting of various numbers of variables, with the number of clauses equal to 4.2 times the number of variables; thus, the 3-SAT problems fell within the hard region. As shown in Table 1, the SP + Walksat combination had increasingly better performance than executing Walksat alone for 3-SAT problems consisting of 2500 variables and above. For every test, SP computed a partial solution with a subset of variables defined and clauses satisfied. With our test cases, partial solutions passed to Walksat always had α values much lower than the critical range of 3.921 to 4.267, and thus were easily satisfiable.
Table 1. Sequential SAT solver statistics and execution times on random 3-SAT equations. Initial statistics and wall clock times for baseline testing of the Mézard and Zecchina sequential implementation of the Survey Propagation and Walksat SAT solvers on a personal computer running Windows XP on a single Intel 2.00 GHz Pentium M processor.
A number of issues should be noted when using SP to solve hard random K-SAT problems. First, SP is a heuristic algorithm that has no guarantee of convergence, but still has been found to converge for many hard K-SAT problems that other K-SAT implementations were unable to solve. Furthermore, SP has been found to be particularly effective in determining the satisfiability of K-SAT formulas, but has some difficulty determining unsatisfiability [7]. Finally, SP has mainly been tested for randomly generated K-SAT formulas. Evaluations of SP's performance on nonrandom K-SAT formulas have been limited.
Parallel implementation of Survey Propagation
Although rapid progress has been made in SAT solvers, most are sequential and few are parallel [10], and sequential solvers are thus limited to the capabilities of a single workstation. To enable the solution of very large sets of SAT equations (such as those derived from large biologically originated graphs), we need the parallel processing capabilities and large aggregated memory spaces of HPC systems.
Given the advantages exhibited by SP over traditional SAT solvers for very large sets of equations, we implemented a parallel version of SP. SP operates by repeatedly updating weights associated with the variable and clause nodes until a fixed point is reached, when the difference between successive updates falls below a specific threshold. The variables must be updated in a random order that changes at each iteration step. The fine-grained nature of the synchronization in this computation makes it more suitable for a parallel implementation on a multiprocessor with a shared address space than on a network of commodity workstations.
Parallel computing environments
We implemented two versions of the parallel SP application:
• A portable, shared-memory OpenMP implementation.
• An experimental shared-memory, multithreaded version for the Cray MTA-2 and XMT systems [12,13]. The MTA platform has been found to be very effective for irregular and graph-based applications [14-16].
Our testing platforms were a Sun Fire T2000 server with an 8-core Sun UltraSPARC T1 ("Niagara") processor and a 40-processor Cray MTA-2 system located at the Cray facilities. More details about these platforms can be found in [14]. On the Sun Niagara server, we executed the OpenMP version compiled with the Sun compiler suite. On the MTA-2 system, we executed our custom multithreaded version.
As part of our characterization of the SP algorithm, we tested both parallel implementations with a series of problems designed to fall in the "hard" SAT space where SP is especially known to outperform other SAT solvers. All of the cases we tested were random instances generated with α = 4.2, which places the problem within the "hard" SAT region. As shown by the actual clock times of Figure 5, the execution time of this algorithm implementation scales nonlinearly with the size of the problem instance. Overall, we found the MTA-2 to have a 2.66 to 4.98 times improvement in execution times over the Niagara. The speedup charts of Figure 5 also illustrate that the parallel SP code exhibits an almost linear speedup within the same problem size when utilizing from one to eight processors on both the Niagara and the MTA-2. The efficiency generally degrades on both machines, however, when the number of processors exceeds eight, except in the case of the MTA-2 running the largest SAT problem (1,000,000 variables, 4,200,000 clauses). In this one case, the MTA-2 further exhibits linear speedup up to sixteen processors. In addition, the MTA-2 was able to achieve higher overall speedup rates than the Niagara: for the largest SAT problem, the MTA-2 achieved a speedup of up to 13.17, while the Niagara's highest speedup was about 8.65.
Figure 5. Parallel SP solver execution statistics on random 3-SAT equations. Initial wall clock times and speedup rates for the parallel SP algorithm on an 8-core Sun Microsystems Niagara and 40-processor Cray MTA-2.
K-clique analysis of homology networks
As an initial, proof-of-principle problem, we chose to apply the k-clique problem on a collected set of protein homology networks, which are graphs where nodes represent proteins and edges represent similarity between proteins measured by the BLAST algorithm [2]. Correspondingly, we have built software to perform the transformation or reduction of a protein homology network specification into a set of Boolean satisfiability equations for finding a k-clique in the network. To test this software, we prepared a series of graphs that represent specific biological relationships that we understood well.
K-clique to SAT reduction
We implemented a k-clique to SAT reduction algorithm that takes a particular input graph and produces SAT equations. The SAT equations, in turn, will be solved by the SAT solvers in BioGraphE. A k-clique is a subgraph consisting of k nodes, all of which have edges to one another.
As shown in Tables 2 and 3, we applied the k-clique to SAT reduction to a number of homology networks. In the first set of test cases, we started by looking for 3-cliques in a specific homology network. To find the maximum clique, we incrementally increased the size of the clique until no cliques of that size were found. An interesting side effect of the reduction and the SP algorithm was that the set of true variables produced by SP was always associated with the same clique in the network. We used this information to adjust the target clique size more efficiently. For instance, if we analyzed a homology network for 3-cliques and the SP algorithm returned eight true variables, then we knew that at least one clique of size eight existed in the network, and we would immediately start searching for 9-cliques on our way to finding the maximum clique.
Table 2. Impact of clique size on k-clique to SAT reduction. Statistics from the reduction of nodes and edges from the k-clique problem to the equivalent variables and clauses of SAT equations for different values of k.
Table 3. Impact of graph size on k-clique to SAT reduction. Statistics from the reduction of nodes and edges from the k-clique graph problem to the equivalent variables and clauses of SAT equations for different graph sizes.
Table 2 also shows how the size of the SAT problem rapidly increases with clique size. Our clique-to-SAT reduction approach produces O(n^{2}k^{2}) clauses with O(nk) variables, where n is the number of nodes in the graph and k is the size of the clique.
In the second set of test cases, we applied SP to homology networks of different sizes. Table 3 shows that the SAT problem also rapidly increases with the overall size and complexity of the homology network.
In both the cases of increasing clique size k or the number of nodes n, a very large number of clauses are generated, and the corresponding α values of the SAT equations are very high compared to the 4.2 phase-transition region for 3-SAT equations. One might expect the produced SAT equations to be overconstrained and unsolvable. Nevertheless, the SAT equations from the reduction are generally solvable using SP. This interesting result requires further investigation as well as the exploration of alternative reduction strategies to generate SAT equations from k-clique problems.
We should also note that the Boolean formulas produced from the reduction are in the form of 2-clauses and n-clauses. Since most SAT phase-transition studies have focused on SAT equations with fixed clause sizes (e.g., 3-SAT), a relevant research effort would be to examine and characterize the phase-transition region of our particular set of SAT equations derived directly from bionetworks. Part of this research should involve simplifying the SAT equations to more efficient forms.
In addition, we are evaluating three additional reduction algorithms for k-clique in the literature [8] that require a smaller number of variables for the reduction, as well as reductions for other graph methods. It is important to note that these more sophisticated reductions will not necessarily lead to faster execution speeds for the overall graph analysis procedure, since we expect the overall runtime to be dominated by the SP solver. In particular, the runtime should be dominated by how "hard" the SAT instances are, as defined by the ratio of clauses to variables. For 3-SAT, it has been experimentally determined that a ratio of clauses to variables of approximately 4.2 may lead to SAT instances with exponential solution times. We are in the process of studying the behavior of our reduction-generated SAT instances and characterizing them with respect to their solution difficulty.
Parallel performance of Survey Propagation
In Figure 6, the performance of the parallel SP code is compared between the Sun Niagara and Cray MTA-2. The parallel SP code was applied to SAT equations that were generated by the k-clique to SAT reduction algorithm. The k-clique problems were derived from the bionetwork analysis of protein homology networks.
Figure 6. Parallel SP solver execution statistics on SAT equations generated from genome biology graphs. Initial wall clock times and speedup rates for the parallel implementation of the Survey Propagation algorithm on an 8-core Sun Microsystems Niagara and 40-processor Cray MTA-2 solving SAT equations reduced from k-clique problems in genome biology.
The first problem was a 67-clique problem performed on a graph consisting of 148 nodes and 6,186 edges. This clique problem reduced to SAT equations consisting of 10,064 variables and 36,117,724 clauses. The second problem was a 3-clique problem performed on a graph consisting of 3,741 nodes and 217,350 edges. This problem reduced to SAT equations consisting of 11,223 variables and 61,979,313 clauses.
As shown by the actual clock times of Figure 6, the parallel SP algorithm implementation scales nonlinearly with the size of the problem instance. Overall, the MTA-2 had a 2.28 to 5.79 times improvement in execution times over the Niagara. The speedup charts of Figure 6 also show that the parallel SP algorithm exhibits an almost linear speedup within the same problem size when utilizing from one to eight processors on the Niagara and from one to sixteen processors on the MTA-2. The efficiency generally degrades on both machines, however, after reaching those processor thresholds. Furthermore, the MTA-2 more than doubled the speedup rates of the Niagara for the larger SAT problems when running on sixteen or more processors. For the bionetwork-related SAT equations, the MTA-2 was able to achieve a speedup rate of up to 15.97, while the Niagara's highest speedup rate was 6.33.
Conclusion
In this paper, we introduced a high-performance computational framework for graph analysis called BioGraphE. The general approach of BioGraphE is to identify and deploy complex graph algorithms, which may be computationally and/or data-intensive, and to integrate those algorithms with powerful and efficient computational solvers and HPC systems. The goal is to bring high-performance software and hardware capabilities to bear on challenging graph problems without requiring the scientist to become an expert in specific computing environments.
For our initial implementation, we selected a SAT solver known as Survey Propagation (SP) to serve as our first core solver. Both SAT and SP have desirable properties such as high expressivity, potential for parallelization, a strong history of research, and existing tools and technologies. Our initial findings on the performance of SAT and SP were promising in that we were able to solve large k-clique problems and see definitive improvements in execution speed.
We will continue to optimize the parallel SP code to achieve the highest performance possible. We will also continue to study graph-to-SAT reductions to better understand the phase transition of generated SAT equations and how SAT solving performance maps back to the structure and properties of bionetworks. Also, we will continue to explore and experiment with other computational solvers to evolve and extend BioGraphE's overall graph analysis capabilities.
Methods
Serial Survey Propagation SAT solver
SP is an iterative message-passing algorithm that represents Boolean formulas as bipartite graphs, where Boolean variables correspond to one type of node and clauses to the second. Edges between the two types of nodes correspond to variables appearing in the clauses of the original formula. The messages passed among the nodes of the graph consist of probability distributions, which are used to compute weights on the individual nodes. The SP algorithm finds the variable node that is most biased towards a true or false value and sets that node accordingly. It then updates the probability distributions, recomputes the weights, and finds and sets the next most biased node. Normally, SP will partially solve the equations to the point where a traditional SAT solver can easily complete the remaining parts of the solution. For the traditional SAT solver, we employed Walksat [9], which is a well-known local search algorithm.
Parallel Survey Propagation SAT solver
Our initial analysis of the serial SP code available from Mézard & Zecchina [6] indicated that the main opportunities for parallelization in this C implementation are found in the computation of the per-variable information and the per-clause information, which can be executed in parallel. However, the overall structure of the application involves a large number of sequential iterations over the short parallel steps described above as well as several sequential steps such as choosing the most biased variables. Thus, this original structure would not ultimately be conducive to scalable performance.
We implemented a parallel version of SP from first principles based on the distributed SP algorithm presented in [11]. The main difference between the serial and parallel SP algorithms is that the decisions about which Boolean variables to fix to a particular value are made locally instead of globally. In the original serial algorithm, the variables that are fixed to particular Boolean values are selected globally after updating the weights for all variables. In the parallel version, the decision to fix a variable to a particular value is made with local information that does not depend on the state of other Boolean variables. For this reason, the distributed SP algorithm can be structured as a set of serial iterations over two large parallel steps: updating the weights of the variable nodes and updating the weights of the clause nodes. The fixed-point stopping conditions can be computed locally with respect to each variable and clause node.
OpenMP implementation
Our OpenMP implementation reuses part of the code from the serial SP implementation to handle file input and output and most data structures; however, the main computational structure has been replaced with the new parallel structure described above. The computation is based on two main parallel loop nests that update the nodes representing Boolean variables and the nodes representing clauses. OpenMP threads update a set of variables and a set of clauses using a block-dynamic scheme to reduce load imbalance. Each OpenMP thread has a separate condition flag to indicate whether a fixed point has been reached, i.e., whether no node under the control of the thread changed its weight by more than the threshold. Special care must be taken to safely simplify the overall Boolean formula when a variable is fixed to a particular value: not only are the clauses in which the variable appears affected, but other variables in the same clauses could have their values fixed as a consequence.
Cray multithreaded implementation
Our Cray multithreaded version was derived from the OpenMP version and has basically the same computational structure. The main difference is that this version can take advantage of the memory latency hiding capabilities inherent in Cray's multithreaded processors, thus enabling very high processor utilization and scalability. However, since SP depends on a final postprocessing step executed by a traditional sequential SAT solver (Walksat), our experimental Cray version only implements the SP algorithm itself. Sequential code sections execute with very poor performance on the multithreaded processors.
K-clique to SAT reduction algorithm
We implemented a simple k-clique to SAT reduction algorithm based on the following three rules:
1. Every member of the k-clique must be some node in the graph.
2. Any given node of the graph can be at most one member of the k-clique.
3. Two nodes that do not share an edge cannot both be members of the k-clique.
Each rule leads to the generation of a set of disjunctive Boolean clauses, which may be viewed as constraints on the solution. The reduction software applies these rules and generates the clauses in a form suitable for the BioGraphE SAT solver programs.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
GCJ, DGC, and HJS conceived and developed the BioGraphE approach. DGC implemented parallel versions of the SP code. GCN implemented visual analysis tools and kclique to SAT reduction algorithms. GCJ and DGC conducted performance testing of algorithms and tools on HPC machines. HJS directed and managed overall research. All authors contributed to and approved the final manuscript.
Acknowledgements
This work was funded in part by the U. S. Dept. of Energy Office of Advanced Scientific Computing Research. PNNL is operated by Battelle for the U. S. Dept. of Energy.
This article has been published as part of BMC Bioinformatics Volume 9 Supplement 6, 2008: Symposium of Computations in Bioinformatics and Bioscience (SCBB07). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/9?issue=S6.
References

Cannataro M, Talia D, Srimani PK: Parallel data intensive computing in scientific and commercial applications.

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: A new generation of protein database search programs.
Nucleic Acids Research 1997, 25:3389-3402.

Oehmen CS, Nieplocha J: ScalaBLAST: A scalable implementation of BLAST for high-performance data-intensive bioinformatics analysis.
IEEE Transactions on Parallel and Distributed Systems 2006, 17:740-749.

Campbell EA, Greenwell R, Anthony JR, Wang S, Lim L, Sofia HJ, Donohue TJ, Darst SA: A conserved structural module regulates transcriptional response to diverse stress signals in eubacteria.

Sofia HJ, Nakamura GC: Similarity Box: visual analytics for large sequence sets.
Bioinformatics, in press.

Mézard M, Zecchina R: Random K-satisfiability problem: from an analytic solution to an efficient algorithm.
Phys Rev E Stat Nonlin Soft Matter Phys 2002, 66(5 Pt 2):056126.

Braunstein A, Mézard M, Zecchina R: Survey propagation: An algorithm for satisfiability.

Iwama K, Miyazaki S: SAT-variable complexity of hard combinatorial problems.
IFIP Transactions A-Computer Science and Technology 1994, 51:253-258.

Selman B, Kautz H, Cohen B: Local search strategies for satisfiability testing. In Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge. Edited by Johnson DS, Trick MA. Providence, RI: American Mathematical Society; 1996:521-532.
[DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol 26.]

Singer D: Parallel resolution of the Satisfiability Problem: A survey. In Parallel Combinatorial Optimization. Edited by El-Ghazali T. New York: Wiley-Interscience; 2006.
[Zomaya AY (Series Editor): Wiley Series on Parallel and Distributed Computing, 10 Nov 2006.]

Chavas J, Furtlehner C, Mézard M, Zecchina R: Survey-propagation decimation through distributed local computations.

Anderson W, Briggs P, Hellberg CS, Hess DW, Khokhlov A, Lanzagorta M, Rosenberg R: Early experience with scientific programs on the Cray MTA-2. In Proceedings of the 2003 ACM/IEEE conference on Supercomputing: 15–21 November 2003; Phoenix. Edited by Koelbel C, Horner-Miller B. IEEE Computer Society; 2003:46.

Feo J, Harper D, Kahan S, Konecny P: Eldorado. In Proceedings of the 2nd Conference on Computing Frontiers: 4–6 May 2005; Ischia, Italy. Edited by Valero M, Ramirez A. ACM; 2005:28-34.

Nieplocha J, Marquez A, Feo J, Chavarria-Miranda D, Chin G, Scherrer C, Beagley N: Evaluating the potential of multithreaded platforms for irregular scientific computations. In Proceedings of the 4th Conference on Computing Frontiers: 7–9 May 2007; Ischia, Italy. Edited by Dubois M, Stenström P. ACM; 2007:47-58.

Bader DA, Feo J: On the architectural requirements for efficient execution of graph algorithms. In Proceedings of the 2005 International Conference on Parallel Processing: 14–17 June 2005; Oslo, Norway. Edited by Feng W, Duato J. IEEE Computer Society; 2005:547-556.

Bader DA, Madduri K: Designing multithreaded algorithms for breadth-first search and st-connectivity on the Cray MTA-2. In Proceedings of the 2006 International Conference on Parallel Processing: 14–18 August 2006; Columbus, Ohio. Edited by Feng W. IEEE Computer Society; 2006:523-530.