Table 1

Data available in UniParc

Source Name             Source Description                               Number of Releases  Number of Entries
----------------------  -----------------------------------------------  ------------------  -----------------
EMBL                    EMBL Nucleotide Sequence Database                               883          4,776,027
EMBLWGS                 Whole Genome Shotgun                                            256          2,894,683
EMBL_ANNCON             Annotated CON entries                                            63          6,773,092
EMBL_TPA                Third Party Annotation                                           74              5,497
ENSEMBL_ARMADILLO       Ensembl Dasypus novemcinctus                                      8             15,552
ENSEMBL_BUSHBABY        Ensembl Otolemur garnettii                                        3             15,449
ENSEMBL_CAT             Ensembl Felis catus                                               4             14,846
ENSEMBL_CBRIGGSAE       Ensembl Caenorhabditis briggsae                                  14             14,713
ENSEMBL_CELEGANS        Ensembl Caenorhabditis elegans                                   35             39,090
ENSEMBL_CHICKEN         Ensembl Gallus gallus                                            29             67,610
ENSEMBL_CHIMP           Ensembl Pan troglodytes                                          30             83,636
ENSEMBL_CIONA           Ensembl Ciona intestinalis                                       18             40,996
ENSEMBL_COMMON_SHREW    Ensembl Sorex araneus                                             2             13,195
ENSEMBL_COW             Ensembl Bos taurus                                               17             82,819
ENSEMBL_DOG             Ensembl Canis familiaris                                         22             52,106
ENSEMBL_ELEPHANT        Ensembl Loxodonta africana                                        8             15,717
ENSEMBL_ERINACEUS       Ensembl Erinaceus europaeus                                       4             14,593
ENSEMBL_FLY             Ensembl Drosophila melanogaster                                  35             25,934
ENSEMBL_FUGU            Ensembl Fugu rubripes                                            36            112,525
ENSEMBL_GUINEA_PIG      Ensembl Cavia porcellus                                           4             28,438
ENSEMBL_HEDGEHOG        Ensembl Echinops telfairi                                         8             16,582
ENSEMBL_HONEYBEE        Ensembl Apis mellifera                                           18             43,953
ENSEMBL_HUMAN           Ensembl Homo sapiens                                             35            115,689
ENSEMBL_MEDAKA          Ensembl Oryzias latipes                                           6             25,880
ENSEMBL_MICROBAT        Ensembl Myotis lucifugus                                          3             16,234
ENSEMBL_MOSQUITO        Ensembl Anopheles gambiae                                        35             55,270
ENSEMBL_MOUSE           Ensembl Mus musculus                                             37            127,637
ENSEMBL_OPOSSUM         Ensembl Monodelphis domestica                                    13             54,269
ENSEMBL_PLATYPUS        Ensembl Ornithorhynchus anatinus                                  5             32,001
ENSEMBL_RABBIT          Ensembl Oryctolagus cuniculus                                     8             15,441
ENSEMBL_RAT             Ensembl Rattus norvegicus                                        35             89,524
ENSEMBL_RHESUS_MACAQUE  Ensembl Macaca mulatta                                           11             61,299
ENSEMBL_SQUIRREL        Ensembl Spermophilus tridecemlineatus                             3             14,833
ENSEMBL_STICKLEBACK     Ensembl Gasterosteus aculeatus                                    8             27,671
ENSEMBL_TETRAODON       Ensembl Tetraodon nigroviridis                                   27             28,004
ENSEMBL_TREE_SHREW      Ensembl Tupaia belangeri                                          4             15,462
ENSEMBL_XENOPUS         Ensembl Xenopus tropicalis                                       21             76,758
ENSEMBL_YF_MOSQUITO     Ensembl Aedes aegypti                                             8             16,789
ENSEMBL_ZEBRAFISH       Ensembl Danio rerio                                              37            161,469
EPO                     European Patent Office                                           11            780,113
FLYBASE                 FlyBase                                                           3             18,549
H_INV                   H-Invitational Database                                          25            864,262
IPI                     International Protein Index                                      58            910,640
JPO                     Japan Patent Office                                              15            404,695
PDB                     Protein Data Bank                                               261            112,882
PIR                     PIR-PSD                                                          17            283,420
PIRARC                  PIR-PSD archive                                                   2            342,752
PRF                     Protein Research Foundation                                      77            791,254
REFSEQ                  RefSeq release + updates                                        847          5,598,926
REFSEQ_HUMAN            REFSEQ Homo sapiens                                             154            105,699
REFSEQ_MOUSE            REFSEQ Mus musculus                                             153            152,647
REFSEQ_RAT              REFSEQ Rattus norvegicus                                        151             97,753
REFSEQ_ZEBRAFISH        REFSEQ Danio rerio                                              141             63,183
SGD                     SGD Protein                                                      16              6,002
SWISSPROT               UniProtKB/Swiss-Prot                                            213            333,918
SWISSPROT_VARSPLIC      SWISS-PROT alternative splicing                                 132             38,756
TAIR_ARABIDOPSIS        TAIR Arabidopsis thaliana                                         5             33,914
TREMBL                  UniProtKB/TrEMBL                                                118          5,877,814
TREMBL_VARSPLIC         TrEMBL alternative splicing                                      78              1,051
TROME_CE                TROME Caenorhabditis elegans                                     18             84,895
TROME_DM                TROME Drosophila melanogaster                                    20            116,588
TROME_HS                TROME Homo sapiens                                               25          1,180,511
TROME_MM                TROME Mus musculus                                               24            675,662
UNIMES                  UniProt Metagenomic and Environmental Sequences                   1          6,028,191
USPTO                   US Patent and Trademark Office                                   14            724,428
VEGA_DOG                Vega Canis familiaris                                             1                 50
VEGA_HUMAN              Vega Homo sapiens                                                 7             58,931
VEGA_MOUSE              Vega Mus musculus                                                 7             20,750
VEGA_ZEBRAFISH          Vega Danio rerio                                                  8             13,293
WORMBASE                WormBase                                                         65             30,438

Data sources warehoused in UniParc. Use the source name when querying through the REST and SOAP interfaces. The number of releases is how many times the source files have been parsed and loaded into UniParc, counting both incremental and full releases. The number of entries is the total number of protein entries parsed across all releases. UniParc groups sequences at 100% identity, so a single protein entry may be counted several times as its versions are updated. Replaced entries are never deleted. They are simply marked as inactive, so that UniParc provides complete archival coverage.

Côté et al. BMC Bioinformatics 2007, 8:401. doi:10.1186/1471-2105-8-401
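The archival behaviour described in the caption can be illustrated with a toy model. This is not UniParc's implementation. `ToyArchive`, `XRef` and `KNOWN_SOURCES` are hypothetical names, and so are the accessions and sequences. Exact string equality stands in for the 100% identity criterion, and only a few Table 1 source names are included. The sketch shows three things: identical sequences from different sources share one record, superseded versions are flagged inactive rather than deleted, and the entry count grows with every parsed entry even when no new sequence is stored.

```python
from dataclasses import dataclass, field

# Hypothetical subset of the Table 1 source names; the real list is much longer.
KNOWN_SOURCES = {"EMBL", "PDB", "REFSEQ", "SWISSPROT", "TREMBL", "ENSEMBL_HUMAN"}


@dataclass
class XRef:
    """One cross-reference from a source database entry to an archived sequence."""
    source: str
    accession: str
    version: int
    active: bool = True


@dataclass
class ToyArchive:
    """Toy sequence archive: one record per distinct sequence (100% identity)."""
    # Exact sequence string -> every cross-reference ever loaded for it.
    records: dict[str, list[XRef]] = field(default_factory=dict)
    # Counts every parsed entry, including the same entry seen in later releases.
    entries_parsed: int = 0

    def load(self, source: str, accession: str, version: int, sequence: str) -> None:
        if source not in KNOWN_SOURCES:
            raise ValueError(f"unknown source name: {source!r}")
        self.entries_parsed += 1

        # Older versions of this accession are flagged inactive, never deleted.
        for xrefs in self.records.values():
            for x in xrefs:
                if x.source == source and x.accession == accession and x.version < version:
                    x.active = False

        # Identical sequences share one record; a changed sequence gets a new record.
        xrefs = self.records.setdefault(sequence, [])
        key = (source, accession, version)
        if not any((x.source, x.accession, x.version) == key for x in xrefs):
            xrefs.append(XRef(source, accession, version))


archive = ToyArchive()
archive.load("SWISSPROT", "P00001", 1, "MKTAYIAK")
archive.load("TREMBL", "Q00001", 1, "MKTAYIAK")      # identical sequence: same record
archive.load("SWISSPROT", "P00001", 2, "MKTAYIAKQ")  # new version with a changed sequence
archive.load("SWISSPROT", "P00001", 2, "MKTAYIAKQ")  # same entry re-parsed in a later release

print(len(archive.records), archive.entries_parsed)  # prints: 2 4
print([(x.source, x.version, x.active) for x in archive.records["MKTAYIAK"]])
# prints: [('SWISSPROT', 1, False), ('TREMBL', 1, True)]
```

The gap between the four parsed entries and the two stored sequences mirrors the caption's point that "Number of Entries" counts parsed entries, not distinct sequences. The type hints use built-in generics, which need Python 3.9 or later.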