Table 1

Variability in biological data

| Dataset | Description | File type | Number of files | Total size (GB) | Min file size (MB) | Max file size (MB) | δ |
|---|---|---|---|---|---|---|---|
| E61/dat | Ensembl v61 genome annotation (DAT) and DNA sequence (FASTA) files in both compressed (gzip) and uncompressed forms. | dat | 5544 | 169.57 | 5.04 | 1385.14 | 0.782 |
| E61/dat.gz | | dat.gz | 5544 | 42.92 | 1.02 | 400.21 | 0.996 |
| E61/fa | | fa | 1484 | 498.51 | 3.47 | 13306.96 | 0.015 |
| E61/fa.gz | | fa.gz | 1484 | 95.25 | 1.0 | 973.15 | 0.594 |
| GPL570/cel | Microarray files for the HG U133 Plus chip from GEO (all files of GPL570 platform as of 03.2011). Affymetrix CEL and CHP format files, in compressed (gzip) and uncompressed form. | cel | 59892 | 1022.29 | 1.92 | 173.27 | 0.000 |
| GPL570/cel.gz | | cel.gz | 59892 | 330.09 | 1.13 | 48.84 | 0.000 |
| GPL570/chp | | chp | 2535 | 63.30 | 1.67 | 36.50 | 0.209 |
| GPL570/chp.gz | | chp.gz | 2535 | 26.36 | 1.02 | 23.05 | 0.995 |
| BioC2.7/BSGenome | Raw DNA sequence from the Bioconductor package BSGenome, in compressed and uncompressed forms. | rda | 513 | 8.45 | 1.00 | 117.17 | 0.981 |
| BioC2.7/BSGenome/u | | unpacked | 513 | 32.41 | 1.62 | 447.40 | 0.000 |
| YaleTFBS/bedGraph4 | Raw ChIP-seq data from the YaleTFBS dataset of the ENCODE project. Four different file types, both in compressed and uncompressed forms. | bedGraph4 | 171 | 139.91 | 216.73 | 2447.62 | 0.924 |
| YaleTFBS/bedGraph4.gz | | bedGraph4.gz | 171 | 31.45 | 52.89 | 551.80 | 0.996 |
| YaleTFBS/fastq | | fastq | 388 | 541.99 | 199.25 | 4469.89 | 0.919 |
| YaleTFBS/fastq.gz | | fastq.gz | 388 | 160.75 | 49.55 | 1564.84 | 0.996 |
| YaleTFBS/tagAlign | | tagAlign | 520 | 279.45 | 79.95 | 2357.32 | 0.544 |
| YaleTFBS/tagAlign.gz | | tagAlign.gz | 520 | 96.70 | 27.86 | 815.63 | 0.994 |
| YaleTFBS/wig | | wig | 33 | 10.66 | 188.92 | 693.66 | 0.912 |
| YaleTFBS/wig.gz | | wig.gz | 33 | 3.27 | 59.76 | 207.93 | 0.996 |

Measurements of δ-variability in several biological datasets. An exact description of the experiment is available in the online Supplementary material [14].
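As a rough illustration of how the size columns of such a table (number of files, total size, minimum and maximum file size) could be gathered for one dataset directory, here is a minimal Python sketch. It is not the authors' procedure: the function name `size_summary`, the directory layout, and the unit convention (1 MB = 1024² bytes, 1 GB = 1024 MB) are all assumptions made for the example. The δ column is not computed, because its definition is given only in the paper and its supplementary material.

```python
from pathlib import Path


def size_summary(root, suffix, bytes_per_mb=1024 ** 2):
    """Summarise sizes of files under `root` whose names end with `suffix`.

    Hypothetical helper: returns the file count, the total size in GB, and
    the smallest and largest file size in MB. The unit convention
    (1 MB = 1024**2 bytes, 1 GB = 1024 MB) is an assumption.
    """
    # Collect sizes in bytes, searching subdirectories recursively.
    sizes = [
        p.stat().st_size
        for p in Path(root).rglob("*")
        if p.is_file() and p.name.endswith(suffix)
    ]
    if not sizes:
        return {"n_files": 0, "total_gb": 0.0, "min_mb": None, "max_mb": None}
    return {
        "n_files": len(sizes),
        "total_gb": sum(sizes) / (bytes_per_mb * 1024),
        "min_mb": min(sizes) / bytes_per_mb,
        "max_mb": max(sizes) / bytes_per_mb,
    }
```

Because matching uses the end of the file name, calling `size_summary(some_dir, ".fa")` counts only uncompressed FASTA files, while `size_summary(some_dir, ".fa.gz")` counts only their gzip-compressed counterparts, mirroring the separate rows for compressed and uncompressed forms.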

Tretyakov et al. BMC Genomics 2013 14(Suppl 2):S8   doi:10.1186/1471-2164-14-S2-S8
