Data visualization is critical for interpreting biological data. However, in practice it can prove to be a bottleneck for untrained researchers; this is especially true for three-dimensional (3D) data representation. Whilst existing software can provide all the functionality needed to represent and manipulate biological 3D datasets, very few tools are browser based, cross-platform and accessible to non-expert users.
Using basic 3D representation techniques in a technologically innovative context, we provide a program that is not intended to compete with professional 3D representation software, but that instead enables a quick and intuitive representation of reasonably large 3D datasets.
Visualisation is a key challenge in the analysis of large biological datasets, especially when analysing organized structures with distinct sub-clusters. This is particularly important when analysing three-dimensional (3D) datasets. When a biological process or feature has been described spatially by a set of 3D referenced points, either via laboratory work (confocal microscopy for example) or generated within a simulation, with some data attached to each point in space, the first step in interpreting the data is to visualise it. Once the data are visualised and their quality assessed, downstream analysis can proceed. For example, a typical second step is to cluster the observations into different classes based upon the information associated with each point; those results will also need visualisation.
While various 3D visualisation tools have been developed, they have typically been made available as locally installed software, such as BioLayout Express 3D, Arena3D, 3D Genome Tuner, Amira 3D, V3D, the Allen Brain Atlas or Cytoscape. These tools are feature-rich but usually complex to operate for non-expert users. Moreover, they must be installed on every machine on which they are used, which makes sharing inconvenient. To address this issue, some 3D visualisation tools have been built online and are accessible directly through the browser, such as AstexViewer, which is utilised by the Protein Databank Europe via a Java Applet. More recently, visualisation tools developed using HTML5/WebGL capabilities have been described, although they have focused on very specific applications, such as analysing radiology data.
bioWeb3D allows the user to represent any 3D dataset in their browser by providing only two files. The two files can be formatted either as JSON or XML files, two widely used structured formats on the web [11,12], or directly as Comma Separated Values (CSV) files.
The first file used by the application, referred to as the “dataset file”, contains the coordinates of every point in the dataset. The second type of file used, the “information layer” file, describes one or several information layers that are associated with the points defined in the first file. For example, if each point defines the location of a cell within a tissue, the second file could describe whether a particular gene is expressed in each cell. That way the tissue expression profile can be represented in the spatial context of the tissue.
Listing 1 JSON dataset file
Defining the input file formats
JSON is the recommended format for input files in bioWeb3D because of its rigorous structure and its fast object generation, which is built directly into the interpreters of all the primary internet browsers. Compared to other data-interchange languages, such as XML, JSON is also easily human readable thanks to a lightweight syntax. However, some applications might output data only in an XML format and not JSON, as the latter is generally more web oriented. For this reason bioWeb3D can also accept XML as an input format. Furthermore, much data generated in the biological sciences is stored within CSV files. Converting CSV documents to the JSON or XML format is not always trivial. To facilitate this process, the application is also able to directly render simple CSV files that follow a certain format.
Dataset file specification
When the user adds a new Dataset file, a new Dataset section is created in the “Data” panel of the application. Each dataset file contains one dataset.
The dataset file should have a root object called “dataset” which contains:
•The “name” property of the dataset (e.g., “my dataset”);
•The “chain” parameter, which should be set to true if the points are connected (the default value is false) - the data will be considered sequentially, with each point connected by a solid line to the previous and next point according to its order in the dataset file;
•The “points” property, which is a two dimensional array representing a list of (x,y,z) vectors that define the co-ordinates of the points.
Listing 1 is an example of a minimal three-point dataset file.
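As a minimal sketch of the structure described above, the following Python snippet builds a dataset object with the “name”, “chain” and “points” properties and serialises it to JSON. The helper name `make_dataset` and the coordinates are illustrative assumptions; they are not taken from Listing 1 or from the bioWeb3D source code.

```python
import json


def make_dataset(points, name="my dataset", chain=False):
    """Build a dict following the dataset file structure described in the text.

    `make_dataset` is a hypothetical helper, not part of bioWeb3D.
    """
    for p in points:
        if len(p) != 3:
            raise ValueError(f"each point needs exactly 3 coordinates, got {p!r}")
    return {
        "dataset": {
            "name": name,
            "chain": chain,  # True would connect consecutive points with a line
            "points": [[float(c) for c in p] for p in points],
        }
    }


# Illustrative coordinates only (assumption, not the values of Listing 1).
example_points = [(0, 0, 0), (1, 2, 3), (4, 5, 6)]
dataset_json = json.dumps(make_dataset(example_points), indent=2)
print(dataset_json)
```

Validating the length of each coordinate triple before serialising is a design choice of this sketch; it catches malformed rows before the file reaches the browser.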
Listing 2 XML dataset file
The dataset XML format used is very similar to the previously defined JSON format. The file must have a root object called “<dataset>” which contains:
•The “<name>” property of the dataset (e.g., “my dataset”);
•The “<chain>” parameter, which should be set to true if the points are linked (the default value is false) - the data will be considered sequentially, with each point connected by a solid line to the previous and next point according to its order in the dataset file;
•The “<points>” property, which contains all the single “<point>” elements that define the dataset. Each “<point>” has three properties to define its spatial location, namely “<x>”, “<y>” and “<z>”.
Listing 2 contains the same minimal dataset as Listing 1 but formatted in XML.
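The XML structure described above could be generated along the following lines with Python's standard `xml.etree.ElementTree` module. This is a sketch under stated assumptions: the helper name `make_dataset_xml`, the coordinates, and the serialisation of the “<chain>” value as the literal text `true`/`false` are ours and are not confirmed by Listing 2.

```python
import xml.etree.ElementTree as ET


def make_dataset_xml(points, name="my dataset", chain=False):
    """Serialise points into the <dataset> layout described in the text.

    Hypothetical helper; writing <chain> as "true"/"false" is an assumption.
    """
    root = ET.Element("dataset")
    ET.SubElement(root, "name").text = name
    ET.SubElement(root, "chain").text = "true" if chain else "false"
    points_el = ET.SubElement(root, "points")
    for x, y, z in points:
        point_el = ET.SubElement(points_el, "point")
        for tag, value in (("x", x), ("y", y), ("z", z)):
            ET.SubElement(point_el, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Illustrative coordinates only (assumption).
dataset_xml = make_dataset_xml([(0, 0, 0), (1, 2, 3), (4, 5, 6)])
print(dataset_xml)
```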
In a CSV dataset file, each line represents a point, and the three coordinates on each line must be separated by commas.
As an example, Listing 3 carries the same information as the JSON file in Listing 1. We note that, although the spatial information remains the same, it is not possible to set a name or to connect the points within a CSV file input.
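Because a CSV input cannot carry a name or the “chain” setting, one option is to convert the CSV coordinates into the JSON dataset structure and add those properties there. The following sketch assumes a header-less CSV with exactly three comma-separated numbers per line; the function name `csv_to_dataset` and the sample coordinates are hypothetical and do not reproduce Listing 3.

```python
import csv
import io
import json


def csv_to_dataset(csv_text, name, chain=False):
    """Convert header-less "x,y,z" CSV text into the JSON dataset structure.

    Hypothetical helper, not part of bioWeb3D.
    """
    reader = csv.reader(io.StringIO(csv_text))
    points = []
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 values, got {len(row)}")
        points.append([float(v) for v in row])
    return {"dataset": {"name": name, "chain": chain, "points": points}}


# Illustrative CSV content only (assumption).
csv_text = "0,0,0\n1,2,3\n4,5,6\n"
converted = csv_to_dataset(csv_text, name="my dataset", chain=True)
print(json.dumps(converted, indent=2))
```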
Listing 3 CSV dataset file
Information layer file specification
The Information layer file contains information about the points described in the Dataset file. The information in this file has to be given in the same order as the points defined in the Dataset file.
The information layer files must have a root element named “information”. Since one information file can define multiple information sets, the structure below “information” is a list. Each element of the list is structured as follows:
•The “name” property (optional);
•The “numClass” property, which indicates the number of different classes the data will be assigned to;
•The “labels” property, which defines a list of names for the “numClass” classes previously defined (optional);
•The “values” property, which defines the class of each point in the dataset. As points do not have single IDs, this property must be in the same order and have the same length as the points defined in the dataset file.
For example, returning to the three points defined in Listing 1, two information layers could correspond to:
•one clustering algorithm that puts the first two points together in class one and the third point alone in a second class;
•a second clustering algorithm that puts each point in a separate class.
Listing 4 JSON information layer file
In this case the Information layer file would look like Listing 4.
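The two clusterings described above can be sketched in Python as follows, together with the consistency checks implied by the specification (one value per point, and labels matching “numClass”). The helper names, the layer names and the use of 0-based class indices are assumptions made for illustration; the text does not state whether bioWeb3D numbers classes from 0 or from 1.

```python
import json


def make_layer(values, num_class, name=None, labels=None):
    """Build one information layer; optional properties are omitted when unset.

    Hypothetical helper. Class indices are assumed to be 0-based.
    """
    if labels is not None and len(labels) != num_class:
        raise ValueError("labels must contain exactly numClass entries")
    if any(v not in range(num_class) for v in values):
        raise ValueError("every value must be a class index below numClass")
    layer = {"numClass": num_class, "values": list(values)}
    if name is not None:
        layer["name"] = name
    if labels is not None:
        layer["labels"] = list(labels)
    return layer


def make_information(layers, n_points):
    """Wrap layers in the root "information" list, checking lengths."""
    for layer in layers:
        if len(layer["values"]) != n_points:
            raise ValueError("each layer needs one value per dataset point")
    return {"information": list(layers)}


information = make_information(
    [
        # First two points share a class; the third is alone in a second class.
        make_layer([0, 0, 1], 2, name="first clustering",
                   labels=["class one", "class two"]),
        # Each point in its own class.
        make_layer([0, 1, 2], 3, name="second clustering"),
    ],
    n_points=3,
)
information_json = json.dumps(information, indent=2)
print(information_json)
```

Since points carry no identifiers, the length check is the only safeguard against a layer silently drifting out of step with the dataset file.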
The information layer XML format used is very similar to the previously defined JSON format. The information layer files must have a root element named “<information>”. Since one information file can define multiple information sets, the structure below “<information>” is a list of “<set>” elements. Each “<set>” element is structured as follows:
•The “<name>” property (optional);
•The “<numClass>” property, which indicates the number of different classes the data will be assigned to;
•The “<labels>” property, which contains as many individual “<label>” properties as the number of different classes. Each “<label>” defines the names for one class (optional);
•The “<values>” property, which contains all the single “<value>” properties, each one defining the class of one point in the dataset. As points do not have individual IDs, the “<value>” properties must be in the same order as, and equal in number to, the points defined in the dataset file.
Listing 5 XML information layer file
Listing 6 CSV information layer file
Listing 5 carries the exact same information as Listing 4.
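The XML layout of an information layer file can be sketched in the same way with `xml.etree.ElementTree`. The function name `make_information_xml`, the layer names and the 0-based class indices are illustrative assumptions and are not taken from Listing 5.

```python
import xml.etree.ElementTree as ET


def make_information_xml(layers):
    """Serialise layers into <information>/<set> elements as described in the text.

    Hypothetical helper; each layer is a dict with "numClass" and "values",
    plus optional "name" and "labels".
    """
    root = ET.Element("information")
    for layer in layers:
        set_el = ET.SubElement(root, "set")
        if layer.get("name") is not None:
            ET.SubElement(set_el, "name").text = layer["name"]
        ET.SubElement(set_el, "numClass").text = str(layer["numClass"])
        if layer.get("labels"):
            labels_el = ET.SubElement(set_el, "labels")
            for label in layer["labels"]:
                ET.SubElement(labels_el, "label").text = label
        values_el = ET.SubElement(set_el, "values")
        for value in layer["values"]:
            ET.SubElement(values_el, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Illustrative layers only; 0-based class indices are an assumption.
information_xml = make_information_xml([
    {"name": "first clustering", "numClass": 2,
     "labels": ["class one", "class two"], "values": [0, 0, 1]},
    {"name": "second clustering", "numClass": 3, "values": [0, 1, 2]},
])
print(information_xml)
```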
In a CSV information layer file, each column represents the class to which a point belongs. The separation character between columns must be a comma. Listing 6 carries the same information as Listing 4. Note that it is not possible to use the “labels” or “name” properties available in Listing 4 within a CSV information layer file.
Results and discussion
The goal of bioWeb3D is to allow scientists unfamiliar with visualisation software to explore 3D data very quickly without having to install any software. To illustrate its utility we applied bioWeb3D to study heterogeneity in gene expression levels across cells in the brain of the marine annelid Platynereis dumerilii. Using a newly developed technique called PrImR, Tomer and colleagues were able to generate a map of pseudo-cells within the Platynereis dumerilii brain, before determining whether each gene in a pre-defined set was expressed in each pseudo-cell. In the context of bioWeb3D, the locations of the pseudo-cells are used to generate the “Dataset” file, and information about the sets of cells that define clusters with similar gene expression profiles is used to generate the “Information Layer” file. In Figure 1 we illustrate the results: each point represents a pseudo-cell and its colour indicates the class (or cluster) to which it belongs.
Figure 1. An example of the application of bioWeb3D. The 3D location of cells within the brain of the marine annelid Platynereis dumerilii is shown. Two classes are displayed (in green and blue) along with the shadow of the remaining cells. The user interface is visible on the right of the screen and can be hidden. Data for this figure were taken from the study by Tomer and colleagues.
bioWeb3D can be used to visualise datasets derived from a wide variety of biological assays. Examples are shown on the Github wiki, where we display a 3D representation of a Principal Component Analysis (PCA) carried out with R and the 3D structure of a protein extracted from the PDBe database.
More generally, the user can interact with the visualisation via an interface on the right of the screen, which contains three panels. In the “dataset” panel, the user can choose the datasets and information layer files that should be represented in each world. This panel also allows the user to show/hide specific classes of the selected information layers. Each dataset file entered will create a new sub-panel where the user can input information layer files for that world. Selecting an information layer in the drop-down list will display the data in the current world and generate a list of classes whose visibility and colour the user can modify. The “View” panel enables the user to choose how many worlds are shown on the screen, from 1 to 4. Finally, the “Settings” panel provides the user with a number of options that affect all worlds and all datasets, such as modifying the axis scales and the transparency and size of raw data points and information layer coloured points. The user can also choose to enable centering of the data around 0 or leave the coordinates as entered.
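To make the centering option concrete, the sketch below shifts a set of points so that their per-axis mean lies at 0. This is only one plausible reading: the text does not specify how bioWeb3D computes the centre (it could, for instance, use the midpoint of the bounding box instead), and the function name `center_points` is hypothetical.

```python
def center_points(points):
    """Translate points so that the mean of each axis is 0.

    Assumption: "centering around 0" is read here as mean-centering;
    the actual bioWeb3D behaviour is not specified in the text.
    """
    n = len(points)
    if n == 0:
        return []
    means = [sum(p[i] for p in points) / n for i in range(3)]
    return [[p[i] - means[i] for i in range(3)] for p in points]


# Illustrative coordinates only (assumption).
centered = center_points([(0, 0, 0), (2, 4, 6), (4, 8, 12)])
print(centered)
```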
bioWeb3D and local software
Many 3D visualisation software tools exist, most of which require local installation, and provide similar functionality with standard 3D input formats such as Wavefront .OBJ. Some are extremely generic and powerful, like Blender or Amira 3D. However, these tools are not typically oriented towards a scientific audience. Moreover, those that are more focused on science are often targeted towards a very specific application, especially in the medical sciences. In this context, we believe that bioWeb3D can be useful as it is completely generic and browser based. It should also be noted that recent browser improvements regarding GPU acceleration through the WebGL paradigm allow bioWeb3D to visualise several hundred thousand points. Additionally, local software is usually platform specific, which is not the case for browser based applications.
bioWeb3D and java applets
As mentioned previously, browser based 3D visualisation tools currently exist mainly in the form of Java Applets. This technology attracted much criticism in 2012 regarding security flaws, leading the “United States Computer Emergency Readiness Team” to advise that all Java Applets should be disabled due to current and future Java vulnerabilities. WebGL is viewed by many as a candidate for replacing Applets.
The main current limitation of a WebGL based application is machine and browser compatibility. Only computers with fairly recent graphics cards will be able to run a 3D environment. It should also be noted that Microsoft has notified the developer community that Internet Explorer is not scheduled to support WebGL in the near future. Importantly, however, Chrome, Firefox, Safari and Opera all now support WebGL applications. Moreover, we note that WebGL is also supported on mobile platforms such as iOS or Android.
Open source and collaborative development
As fully open source software, bioWeb3D has its source code available on Github, a web platform that allows interested parties to collaborate on the development of the project. On the wiki page “Contribute to bioWeb3D”, directions to alter or add capabilities to bioWeb3D are provided for users who wish to get involved.
bioWeb3D is designed to be a simple and quick way to view 3D data with a specific focus on biological applications. Being browser-based, the software can be easily used from any computer without the need to install a piece of software. Importantly, bioWeb3D has been designed to offer a very straightforward and easy-to-use working environment. Despite current limitations in terms of compatibility or rendering performance for large numbers of points, we believe that bioWeb3D will enable non-experts in 3D data representation to quickly visualise their data and the information attached to it in many biological contexts, thus facilitating downstream analyses.
Availability and requirements
The full source code is available on the Github page of the project. A live version of the software is online. You will require a graphics card and a browser with WebGL capabilities to run bioWeb3D.
Project name: bioWeb3D
Operating system: Platform independent
Other requirements: Browser and graphics card with WebGL capabilities
License: Academic Free License ("AFL") v. 3.0
The authors declare no competing interests.
JBP developed the presented software and drafted the manuscript. JCM has been involved in writing and editing the manuscript. He has also given the final approval of this version to be submitted. All authors read and approved the final manuscript.
The authors would like to acknowledge Samuel Croset, Stijn van Dongen, Konrad Rudolph and Sergio Martinez Cuesta for helpful discussion and criticisms. We also thank Raju Tomer and Detlev Arendt for providing the Platynereis data displayed in Figure 1. Funding for this study was provided by the European Molecular Biology Laboratory (EMBL).
PLoS Comput Biol 2007, 3(10):e206.
Stalling D: Amira: a highly interactive system for visual data analysis. In The Visualization Handbook, chapter 38. Edited by Charles D. Hansen and Chris R. Johnson. Elsevier; 2005:749-767.
Nature Biotechnol 2010, 28(4):348-353.
Int J Comput Appl 2012, 48(24):40-44. [Published by Foundation of Computer Science, New York, USA]