Informatics and Biocomputing Platform, Ontario Institute for Cancer Research, MaRS Centre, South Tower, 101 College Street, Suite 800, Toronto, Ontario, M5G 0A3, Canada

Abstract

Background

Visualization of orthogonal (disjoint) or overlapping datasets is a common task in bioinformatics. Few tools exist to automate the generation of extensively customizable, high-resolution Venn and Euler diagrams in the R statistical environment. To fill this gap we introduce

Results

The

Conclusions

The

Background

The visualization of complex datasets is an increasingly important part of biology. Many experiments involve the integration of multiple datasets to understand complementary aspects of biology. These overlapping results can be visualized in a number of ways, including textual tables (e.g. two-way tables), network diagrams, and Venn diagrams, which display all 2^{n}-1 possible areas created by the interaction of n sets. The use of simple geometrical shapes reduces figure complexity and size relative to space-consuming tables or network layouts.
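As a minimal illustration of the 2^{n}-1 count, the Python sketch below enumerates every non-empty combination of n sets and collects the elements that belong to exactly that combination. The function name `partial_areas` and the example sets are hypothetical; they are not taken from any of the packages discussed here.

```python
from itertools import combinations


def partial_areas(sets):
    """Map each non-empty combination of set names to the elements found in
    exactly those sets and in no others (hypothetical helper)."""
    names = sorted(sets)
    areas = {}
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            # Elements shared by every set in the combination...
            inside = set.intersection(*(sets[n] for n in combo))
            # ...minus anything that also appears in a set outside it.
            outside = set().union(*(sets[n] for n in names if n not in combo))
            areas[combo] = inside - outside
    return areas


# Illustrative data only.
example = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {4, 5, 6}}
areas = partial_areas(example)
print(len(areas))  # 7 == 2**3 - 1
for combo, members in areas.items():
    print("&".join(combo), sorted(members))
```

Some areas in this example are empty (for instance, the elements found only in B), which is the situation where the choice between a Venn and an Euler diagram arises.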

However, despite this popularity, few packages currently exist for generating Venn diagrams in the widely used R statistical environment. These packages are limited in their ability to generate high-resolution, publication-quality Venn diagrams because they allow little customization of colours, line types, label placement, and label font. Numerous special cases are handled inappropriately, and the output is not usually a high-resolution, publication-quality TIFF file. Other local or web-based software outside R can also generate Venn diagrams, such as Venny

Additionally, if some intersecting or non-intersecting areas in a Venn diagram do not exist, another class of diagrams called Euler diagrams may be more desirable. Euler diagrams are equivalent to Venn diagrams when all intersecting and non-intersecting areas exist. However, areas containing zero elements are shown on Venn diagrams (by definition), whereas Euler diagrams show only non-zero areas. In many cases, Euler diagrams further reduce figure complexity, increase graphical accuracy and improve overall readability relative to Venn diagrams. Unfortunately, almost none of the existing packages can generate publication-quality Euler diagrams in R, although VennEuler does generate Euler diagrams.
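The distinction can be made concrete for two sets. The Python sketch below, under the assumption that only the two set sizes and the size of their intersection are known, reports which layout an Euler diagram would need. The function `two_set_relation` and its return labels are hypothetical and do not reflect how any particular package makes this decision.

```python
def two_set_relation(area1, area2, cross_area):
    """Classify two sets by their sizes and intersection size (hypothetical helper).

    Returns "disjoint" when the sets share nothing, "inclusion" when the
    smaller set lies entirely inside the larger, and "overlap" otherwise.
    """
    if min(area1, area2, cross_area) < 0:
        raise ValueError("areas must be non-negative")
    if cross_area > min(area1, area2):
        raise ValueError("intersection cannot exceed the smaller set")
    if cross_area == 0:
        return "disjoint"      # Euler diagram: two separate shapes
    if cross_area == min(area1, area2):
        return "inclusion"     # Euler diagram: one shape nested in the other
    return "overlap"           # Venn and Euler diagrams coincide


print(two_set_relation(10, 4, 0))  # disjoint
print(two_set_relation(10, 4, 4))  # inclusion
print(two_set_relation(10, 4, 2))  # overlap
```

Only in the "overlap" case are all three areas non-zero, so only there would the Venn and Euler diagrams look the same.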

To address these issues we introduce

Implementation

The

The


The


Results

Almost all graphical options in the


**The four types of Venn diagrams drawn by the VennDiagram package**. A) A one-set Venn diagram showing rudimentary customizable features such as label font size, label font face, and shape-fill. B) A two-set Venn diagram showing more advanced features such as scaling, individual shape-fill specifications, and individual caption label placement. C) A three-set Venn diagram showing a different shape-line type ("transparent") and the "text" option of caption label placement where the caption labels are attached to area labels. D) A four-set Venn diagram showing a combination of all previous features plus the ability to customize titles. The code to generate all diagrams shown here is included in Additional File

Beyond these specific graphic elements,


**Selected Venn diagram special cases and Euler diagrams drawn by the VennDiagram package**. Row 1, column 1: automatically drawn, customizable lines that optimize display of partial areas when individual partial areas become too small in two-set Venn diagrams. Row 1, column 2: a two-set Euler diagram showing total inclusion of one of the sets. Row 1, column 3: a two-set Euler diagram showing two distinct sets. Row 2, column 1: a three-set Euler diagram where one set has no discrete elements. Row 2, column 2: a three-set Euler diagram where one set has no discrete elements and is totally included in one of the other two sets. Row 2, column 3: a three-set Euler diagram where two sets have no discrete elements and are included in a larger third set. Row 3, column 1: a three-set Euler diagram showing total inclusion of two sets that are distinct from the third set. Row 3, column 2: a three-set Euler diagram where one set is totally included in another set, which is itself totally included in the third set. Row 3, column 3: a three-set Euler diagram showing three distinct sets. The code to generate all diagrams shown here is included in Additional File

Code to generate all Venn diagrams in Figures


Illustration of the parameters available in


Discussion

During development of the VennDiagram package, a conundrum arose in scaling three-set Venn diagrams. In a system of two circles A and B, the distance between their centres, d_{AB}, can be determined as long as the areas of the circles (A_{A} and A_{B} respectively) and the intersection area (A_{A} ∩ A_{B}) are all known. This is possible because in a two-circle system a single A_{A} ∩ A_{B} corresponds to a unique value for d_{AB}. Therefore, in a system of three circles A, B, and C, the distances d_{AB}, d_{BC}, and d_{AC} can be calculated as long as A_{A}, A_{B}, A_{C}, A_{A} ∩ A_{B}, A_{A} ∩ A_{C}, and A_{B} ∩ A_{C} are all known. However, d_{AB}, d_{BC}, and d_{AC} make a unique triangle, implying that a Venn diagram can be drawn without ever knowing the overall intersection A_{A} ∩ A_{B} ∩ A_{C}. In other words, the size of the overlap between all three circles does not alter the presentation of scaled Venn diagrams: the drawn areas are unchanged even if one system has zero overall intersection (i.e. A_{A} ∩ A_{B} ∩ A_{C} = 0). This conundrum results from the (arbitrary) choice of circles to represent set size, which reduces the degrees of freedom by one. Unique solutions can be identified by using ellipses or polygons to draw Venn diagrams, but the resulting diagrams would lose the instant recognisability and familiarity associated with circular Venn diagrams, defeating the point of a convenient display of information. Non-circular diagrams would also require iterative algorithms to compute the positions and sizes of the shapes, greatly increasing computational burdens, as has been discussed by others.
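The two-circle case can be sketched numerically. The Python example below uses the standard formula for the area shared by two circles and recovers d_{AB} from A_{A}, A_{B} and A_{A} ∩ A_{B} by bisection. This is an illustrative sketch under stated assumptions, not the package's implementation: the function names `lens_area` and `centre_distance`, the bisection approach, and the example areas are all chosen here for demonstration.

```python
import math


def lens_area(r1, r2, d):
    """Area shared by two circles of radii r1 and r2 whose centres are d apart."""
    if d >= r1 + r2:          # circles are separate or touch at one point
        return 0.0
    if d <= abs(r1 - r2):     # smaller circle lies entirely inside the larger
        return math.pi * min(r1, r2) ** 2
    # Clamp to guard against rounding error just outside acos's domain.
    c1 = max(-1.0, min(1.0, (d * d + r1 * r1 - r2 * r2) / (2 * d * r1)))
    c2 = max(-1.0, min(1.0, (d * d + r2 * r2 - r1 * r1) / (2 * d * r2)))
    kite = max(0.0, (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2))
    return r1 * r1 * math.acos(c1) + r2 * r2 * math.acos(c2) - 0.5 * math.sqrt(kite)


def centre_distance(area_a, area_b, area_ab, tol=1e-10):
    """Centre distance that gives two circles of the stated areas the stated
    overlap. Assumes a partial overlap: 0 < area_ab < min(area_a, area_b)."""
    if not 0 < area_ab < min(area_a, area_b):
        raise ValueError("expected a partial overlap")
    r1 = math.sqrt(area_a / math.pi)
    r2 = math.sqrt(area_b / math.pi)
    lo, hi = abs(r1 - r2), r1 + r2   # full inclusion ... just touching
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if lens_area(r1, r2, mid) > area_ab:
            lo = mid                 # overlap too large: move the circles apart
        else:
            hi = mid
    return (lo + hi) / 2


# Illustrative areas only.
d_ab = centre_distance(area_a=100.0, area_b=60.0, area_ab=20.0)
r_a, r_b = math.sqrt(100.0 / math.pi), math.sqrt(60.0 / math.pi)
print(round(lens_area(r_a, r_b, d_ab), 6))  # 20.0
```

Bisection works because the shared area decreases steadily as the centres move apart, so each overlap corresponds to exactly one distance. Applying the same step to the pairs (A, B), (B, C) and (A, C) fixes all three sides of the triangle of centres without the three-way intersection ever being consulted, which is the constraint described above.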

A general caveat when using Euler diagrams is that although they reduce the graphical complexity of some Venn diagrams, their non-traditional shapes may also be less recognizable in some cases. When empty areas are present, the user needs to choose between the familiarity of Venn diagrams and the increased accuracy of Euler diagrams. Figure


**A side-by-side comparison of an Euler diagram and a Venn diagram for the same hypothetical sets**. A) The Euler diagram shows only non-zero areas and can therefore be more graphically accurate. B) The Venn diagram shows the non-existent area as an area with zero content. Though this is not graphically accurate, it preserves the recognisability of a Venn diagram.

The

After comparing with other programs capable of generating Venn diagrams (Table

• Drawing Euler diagrams using circles and/or ellipses with two or three sets

• Offering greater customizability to generate more elegant diagrams

• Availability in the widely-used R statistical environment

• Generating high resolution TIFF files that are standard in publications

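A comparison of the features of various programs capable of generating Venn diagrams.

| Feature | DrawVenn | Venny | gplots::venn | venneuler | limma::vennDiagram | Google Chart | GeneVenn | VennMaster | BioVenn | VennDiagram |
|---|---|---|---|---|---|---|---|---|---|---|
| Output type | None | PNG | R graphics | R graphics | R graphics | PNG/GIF | PNG | SVG/JPEG | SVG/PNG | TIFF/PNG/JPEG/BMP/others |
| Environment | Java | Web | R | R | R | Web | Web | Java | Web | R |
| Input format | Direct (slider) | Lists | Lists | Partial areas | R object | Partial areas | Lists | Lists/GoMiner output | Lists | Lists |
| Maximum sets | 3 | 4 | 5 | 3 | 3 | 3 | 3 | >5 | 3 | 4 |
| Shapes used | Circles/Rectangles | Circles/Ellipses | Circles/Ellipses | Circles | Circles | Circles | Circles | Polygons | Circles | Circles/Ellipses |

| Category | Feature | Entries |
|---|---|---|
| Shape-fill | Colour | X, X, X, X, X, X |
| Shape-line | Style | X |
| Shape-line | Width | X, X |
| Shape-line | Colour | X |
| Caption labels | Content | X, X, X, X |
| Caption labels | Colour | X, X |
| Caption labels | Font | X, X, X |
| Caption labels | Size | X, X, X |
| Caption labels | Style | X |
| Caption labels | Location | X, X (SVG only), X |
| Caption labels | Position | X, X (SVG only), X |
| Caption labels | Distance | X, X (SVG only), X |
| Caption labels | Justification | X |
| Area labels | Colour | X, X, X |
| Area labels | Font | X, X, X, X |
| Area labels | Size | X, X, X, X, X |
| Area labels | Style | X |
| Titles | Main title | X, X, X, X |
| Titles | Subtitle | X, X |
| Titles | Position | X (SVG only), X |
| Titles | Colour | X, X, X |
| Titles | Font | X, X |
| Titles | Size | X, X, X |
| Titles | Style | X |
| Titles | Justification | X |
| Background-fill | Colour | X, X |
| Background-fill | Style | X |
| File options | Figure resolution | X, X, X |
| Data processing | Built-in gene ID recognition | X, X |
| Data processing | Figure from file(s) | X, X, X |
| Data processing | Specific optimizations | Gene Ontology |
| General | Scaling | X, X*, X*, X (iterative), X*, X (2-set only) |
| General | Euler diagrams | X, X, X, X |
| General | Margin size | X, X, X |
| General | Rotation | X |
| General | Two-set external lines | X |
| General | Other set-specific parameters | X, X |

* uses inaccurate 3-set scaling with circles

This table highlights the improvements that the VennDiagram package possesses over other notable Venn diagram-generating software. The highly customizable nature of the VennDiagram package is evident.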

Conclusions

The

Availability and Requirements

The

Authors' contributions

HC and PCB conceived of the project. HC wrote the software, which HC and PCB tested and debugged. HC wrote the first draft of the manuscript, which all authors revised and approved.

Acknowledgements

The authors thank all members of the Boutros lab for support, and especially Dr. Kenneth Chu and Daryl Waggott for help in generating the Windows-compatible version of this package. This study was conducted with the support of the Ontario Institute for Cancer Research to PCB through funding provided by the Government of Ontario. This work was financially supported by grant number MOP57903 from the Canadian Institutes of Health Research (to PCB and Dr. Allan B. Okey).