This article is part of the supplement: Proceedings of the Tenth Annual MCBIOS Conference

Open Access Proceedings

Towards human-computer synergetic analysis of large-scale biological data

Rahul Singh1,2*, Hui Yang1, Ben Dalziel1, Daniel Asarnow1, William Murad1, David Foote3, Matthew Gormley4, Jonathan Stillman5,6 and Susan Fisher7

Author Affiliations

1 Department of Computer Science, San Francisco State University, San Francisco, CA, USA

2 Center for Discovery and Innovation in Parasitic Diseases, University of California, San Francisco, CA, USA

3 Open University Program, San Francisco State University, CA, USA

4 Department of Obstetrics, Gynecology, and Reproductive Sciences, University of California, San Francisco, CA, USA

5 Romberg Tiburon Center and Department of Biology, San Francisco State University, San Francisco, CA, USA

6 Department of Integrative Biology, University of California Berkeley, CA, USA

7 Department of Anatomy and Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, USA

BMC Bioinformatics 2013, 14(Suppl 14):S10  doi:10.1186/1471-2105-14-S14-S10

Published: 9 October 2013

Abstract

Background

Advances in technology have led to the generation of massive amounts of complex and multifarious biological data in areas ranging from genomics to structural biology. The volume and complexity of such data lead to significant challenges in its analysis, especially when one seeks to generate hypotheses or explore the underlying biological processes. In the current state of the art, the predominant paradigm for analyzing biological data remains the application of automated algorithms followed by perusal and analysis of the results by an expert. This paradigm works well in many problem domains. However, it is also limiting, since domain experts can apply their instincts and expertise, such as contextual reasoning, hypothesis formulation, and exploratory analysis, only after the algorithm has produced its results. In many areas where the organization and interaction of the biological processes are poorly understood and exploratory analysis is crucial, domain expertise needs to be integrated into the data analysis process and used to drive the analysis itself.

Results

Against this background, the results presented in this paper describe advancements along two methodological directions. First, given the context of biological data, we utilize and extend a design approach called experiential computing from multimedia information system design. This paradigm combines information visualization and human-computer interaction with algorithms for exploratory analysis of large-scale and complex data. The proposed approach emphasizes: (1) allowing users to directly visualize, interact with, experience, and explore the data through interoperable visualization-based and algorithmic components, (2) supporting unified query and presentation spaces to facilitate experimentation and exploration, (3) providing external contextual information by assimilating relevant supplementary data, and (4) encouraging user-directed information visualization, data exploration, and hypothesis formulation. Second, to illustrate the proposed design paradigm and measure its efficacy, we describe two prototype web applications. The first, called XMAS (Experiential Microarray Analysis System), is designed for analysis of time-series transcriptional data. The second, called PSPACE (Protein Space Explorer), is designed for holistic analysis of structural and structure-function relationships using interactive low-dimensional maps of the protein structure space. Both systems promote and facilitate human-computer synergy, where cognitive elements such as domain knowledge, contextual reasoning, and purpose-driven exploration are integrated with a host of powerful algorithmic operations that support large-scale data analysis, multifaceted data visualization, and multi-source information integration.

Conclusions

The proposed design philosophy combines visualization, algorithmic components, and cognitive expertise into a seamless processing-analysis-exploration framework that facilitates sense-making, exploration, and discovery. Using XMAS, we present case studies that analyze transcriptional data from two highly complex domains: gene expression in the placenta during human pregnancy and the reaction of marine organisms to heat stress. With PSPACE, we demonstrate how complex structure-function relationships can be explored. These results demonstrate the novelty, advantages, and distinctions of the proposed paradigm. Furthermore, the results highlight how domain insights can be combined with algorithms to discover meaningful knowledge and formulate evidence-based hypotheses during the data analysis process. Finally, user studies against comparable systems indicate that both XMAS and PSPACE deliver results with better interpretability while placing lower cognitive loads on the users. XMAS is available at: http://tintin.sfsu.edu:8080/xmas. PSPACE is available at: http://pspace.info/.