Pig genome sequence - analysis and publication strategy

Archibald, Alan L; Bolund, Lars; Churcher, Carol; Fredholm, Merete; Groenen, Martien AM; Harlizius, Barbara; Lee, Kyung-Tai; Milan, Denis; Rogers, Jane; Rothschild, Max F; Uenishi, Hirohide; Wang, Jun; Schook, Lawrence B

doi:10.1186/1471-2164-11-438

Correspondence
Open access
Published: 19 July 2010

BMC Genomics volume 11, Article number: 438 (2010)

Abstract

Background

The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing.

Results

Assemblies of the BAC clone derived genome sequence have been annotated using the Pre-Ensembl and Ensembl automated pipelines and made accessible through the Pre-Ensembl/Ensembl browsers. The current annotated genome assembly (Sscrofa9) was released with Ensembl 56 in September 2009. A revised assembly (Sscrofa10) is under construction and will incorporate whole genome shotgun sequence (WGS) data providing > 30× genome coverage. The WGS data, most of which comprise short Illumina/Solexa reads, were generated from DNA from the same single Duroc sow that was the source of the BAC library from which clones were preferentially selected for sequencing. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement, the data have been released into public sequence repositories (GenBank/EMBL, NCBI/Ensembl trace repositories) in a timely manner and in advance of publication.

Conclusions

In this marker paper, the Swine Genome Sequencing Consortium (SGSC) sets out its plans for the analysis of the pig genome sequence and for the application and publication of the results.

Background

The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium [1]. A Data Release Workshop convened in Toronto in May 2009 by Genome Canada and other funding agencies affirmed and extended the commitments to prepublication release of large data sets in the life sciences, which were originally developed in the context of the Human Genome Project. The Toronto Statement [2] places obligations on the producers of such data sets, including genome sequence data, in respect of prepublication release of the data and confirms the principle that allows the data producers to publish the first global analyses of the data set. The data producers are encouraged to produce a citable statement or "marker paper" in which they describe the data set and their intentions in respect of analysis and publication. In this marker paper, the Swine Genome Sequencing Consortium (SGSC) sets out its plans for the analysis of the pig genome sequence and for the application and publication of the results. These plans were presented to participants in the Pig Genome III conference held at the Wellcome Trust Sanger Institute, 2-4 November 2009.

Results

Pig genome sequence data

The sequence data from which a draft pig genome sequence will be assembled comprise hierarchical shotgun sequence data providing 4-6× genome coverage from BAC clones representing a minimal tile path across the genome, plus > 30× genome coverage in whole genome shotgun sequence (WGS) data generated using Sanger (capillary) and next-gen (Illumina) technologies. The minimal tile path was identified from a high quality physical (BAC contig) map [3] and provides coverage of 98.3% of this physical map. As at 5 July 2010 the total length of the BAC-derived sequence contigs, prior to the removal of sequence redundancy between overlapping BAC clones, was 3.01 Gbp, of which 156.3 Mbp was at finished quality. These sequence data were generated from 16,707 BAC clones, of which 15,895 have been subjected to one round of automated pre-finishing.
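The proportions implied by these totals can be checked with a few lines of arithmetic. The following is a minimal sketch and not part of the consortium's pipeline: the variable and function names are illustrative, and the only inputs are the figures quoted above. Because the 3.01 Gbp total still includes redundancy between overlapping clones, the finished fraction is relative to that redundant total rather than to the genome size.

```python
# Totals quoted in the text (as at 5 July 2010).
TOTAL_BAC_SEQUENCE_BP = 3.01e9   # BAC-derived contigs, redundancy not yet removed
FINISHED_BP = 156.3e6            # sequence at finished quality
BAC_CLONES_SEQUENCED = 16_707
BAC_CLONES_PREFINISHED = 15_895


def fraction(part: float, whole: float) -> float:
    """Return part / whole, rejecting a non-positive denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


finished_fraction = fraction(FINISHED_BP, TOTAL_BAC_SEQUENCE_BP)
prefinished_fraction = fraction(BAC_CLONES_PREFINISHED, BAC_CLONES_SEQUENCED)

print(f"Finished-quality sequence: {finished_fraction:.1%}")
print(f"Clones pre-finished:       {prefinished_fraction:.1%}")
```

On these figures, roughly 5% of the BAC-derived sequence was at finished quality and roughly 95% of the sequenced clones had been through one round of pre-finishing.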

Prepublication data release

In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement [2], the data have been released into public sequence repositories (GenBank/EMBL, NCBI/Ensembl trace repositories) in a timely manner and in advance of publication. Assemblies of the genome sequence have been annotated using the Pre-Ensembl and Ensembl automated pipelines and made accessible through the Pre-Ensembl/Ensembl browsers. The current annotated genome assembly (Sscrofa9) was released with Ensembl 56 in September 2009 and was constructed entirely from the BAC-derived sequence data.

Analysis strategy

A revised assembly (Sscrofa10) is being constructed from the BAC clone derived sequence together with the WGS data. The publication of a draft genome sequence for the pig will be based on this new assembly. A series of analysis working groups have been established in consultation with the pig genome research community under the auspices of the SGSC in order to undertake genome-wide analyses of the genome sequence. These groups with their respective lead contacts are summarised in Table 1. Details of the work of these groups will be posted on the SGSC website at http://www.piggenome.org.

Table 1 Swine Genome Sequencing Consortium genome sequence analysis groups

Publication strategy

The Swine Genome Sequencing [1] and Swine HAPMAP [4] consortia respectively propose to develop two summary papers for publication describing a) the sequencing and analysis of the pig genome and b) genetic variation and haplotype structures across a range of pig breeds and related Sus species. In addition, the consortia propose to develop a series of companion papers describing the results from the analysis groups and/or results from other research projects that have been enabled by the publication of a draft sequence of the pig genome. The consortia would be pleased to hear from research groups with plans for manuscripts that could be included within the list of companion papers. Please address correspondence to either Alan Archibald (alan.archibald@roslin.ed.ac.uk) or Larry Schook (schook@illinois.edu).

Discussion

The value of the pig genome sequence lies not only in shaping the continued use of pigs in agriculture and medical research but also in the realm of evolution and domestication (natural and artificial selection) [5]. The pig is an economically important species not only as a major source of meat-based protein but also increasingly as a model for biomedical research. For example, the pig has value as a model of a spectrum of human diseases that may be modelled less well in rodents, including obesity, arthritis and cardiovascular disease.

The domestic pig (Sus scrofa) is a eutherian mammal and a member of the Cetartiodactyla order, a clade distinct from rodents and primates that last shared a common ancestor with humans between 79 and 87 million years ago. The domestic pig belongs to the Suidae family, which consists of multiple species, all found in Asia, Europe and Africa. The availability of this wide variety of pig species that diverged over a period of around 2 to 15 million years provides a rich resource to study genomic changes in relation to speciation. A well characterised pig genome sequence forms a template for the study of within- and between-species genetic variation. Our analysis of the pig genome sequence will be set in the context of parallel research on the genomes of closely related and contemporary Suids (e.g. Sus verrucosus, Sus celebensis and Sus barbatus) and on within-breed genetic variation using the 60K pig SNP chip [4] and by re-sequencing.

Conclusions

The pig genome sequencing project has been conducted in an open international collaborative manner in the spirit of the Bermuda and Fort Lauderdale agreements. In accordance with the more recent Toronto Statement, the sequence data have been released in advance of publication. In this marker paper, the Swine Genome Sequencing Consortium (SGSC) sets out its plans for the analysis of the pig genome sequence and for the application and publication of the results.

Methods

Sequencing strategy

The pig genome has been sequenced following a hybrid approach representing a refinement of the strategy announced earlier [1] (Figure 1). Briefly, BAC clones selected to represent a minimal tile path across the genome were identified from the high resolution physical (BAC contig) map [3] and were subjected to hierarchical shotgun sequencing. BAC clones from the CHORI-242 library prepared from DNA from a single Duroc sow (Duroc 2-14) were preferentially chosen for sequencing. The initial plan was to skim sequence the BAC clones to 3× coverage. In practice, both ends of 768 subclones for each BAC were sequenced (average read length of 707 bp) to provide ~4× coverage. Most BAC clones have subsequently been subjected to one round of automated pre-finishing by primer walking from the ends of the clone sequence contigs constructed from the initial 4× coverage skim sequencing. This hierarchical shotgun sequencing was primarily undertaken at the Wellcome Trust Sanger Institute, with additional clones sequenced by the National Institute of Agrobiological Sciences, Japan. In addition, whole genome shotgun (WGS) sequence data were generated from DNA isolated from the same animal (Duroc 2-14). These WGS data were generated using both Sanger capillary sequencing at the Korean Livestock Research Institute and Illumina/Solexa sequencing at the Beijing Genomics Institute and the Wellcome Trust Sanger Institute.
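The relationship between reads per clone and fold coverage can be sketched as follows. This is an illustrative calculation and not the consortium's method: the insert lengths passed to `fold_coverage` are hypothetical values chosen for the example (the text does not state the clone insert size), and the result is a raw upper bound on usable coverage because it ignores failed reads and quality trimming.

```python
def raw_read_bases(subclones: int, ends_per_subclone: int, mean_read_len_bp: float) -> float:
    """Total bases read for one BAC clone, before any filtering or trimming."""
    return subclones * ends_per_subclone * mean_read_len_bp


def fold_coverage(read_bases: float, insert_len_bp: float) -> float:
    """Raw fold coverage: bases read divided by the length of the clone insert."""
    if insert_len_bp <= 0:
        raise ValueError("insert_len_bp must be positive")
    return read_bases / insert_len_bp


# Figures from the text: 768 subclones, both ends sequenced, 707 bp mean read length.
bases_per_bac = raw_read_bases(subclones=768, ends_per_subclone=2, mean_read_len_bp=707)
print(f"Raw bases read per BAC: {bases_per_bac:,.0f} bp")

# Hypothetical insert lengths (assumptions for illustration only).
for insert_len_bp in (150_000, 200_000, 250_000):
    print(f"insert {insert_len_bp:>7,} bp -> {fold_coverage(bases_per_bac, insert_len_bp):.2f}x raw coverage")
```

The raw yield from the quoted figures is 1,085,952 bp per clone; how close the usable coverage comes to the raw value depends on read pass rates and trimming, which are not quantified here.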

References

1. Schook LB, Beever JE, Rogers J, Humphray S, Archibald A, Chardon P, Milan D, Rohrer G, Eversole K: Swine Genome Sequencing Consortium (SGSC): a strategic roadmap for sequencing the pig genome. Comp Funct Genom. 2005, 6: 251-255. 10.1002/cfg.479.
2. Toronto International Data Release Workshop Authors: Prepublication data release. Nature. 2009, 461: 168-70. 10.1038/461168a.
3. Humphray SJ, Scott CE, Clark R, Marron B, Plumb R, Bender C, Camm N, Davis J, Jenks A, Noon A, Patel M, Sehra H, Yang F, Rogatcheva MB, Milan D, Chardon P, Rohrer G, Nonneman D, de Jong P, Meyers SN, Archibald A, Beever JE, Schook LB, Rogers J: A high utility integrated map of the pig genome. Genome Biol. 2007, 8 (7): R139. 10.1186/gb-2007-8-7-r139.
4. Ramos AM, Crooijmans RPMA, Amaral AJ, Archibald AL, Beever JE, Bendixen C, Dehais P, Affara NA, Hansen MS, Hedegaard J, Hu Z-L, Kerstens HH, Law AS, Megens HJ, Milan D, Nonneman DJ, Rohrer GA, Rothschild MF, Smith TPL, Schnabel RD, Van Tassell CP, Clark R, Churcher C, Taylor JF, Wiedmann RT, Schook LB, Groenen MAM: Design of a high density SNP genotyping assay in the pig using SNPs identified and characterized by next generation sequencing technology. PLoS ONE. 2009, 4 (8): e6524. 10.1371/journal.pone.0006524.
5. Rohrer G, Beever JE, Rothschild MF, Schook L, Gibbs R, Weinstock G: Porcine genomic sequencing initiative. (NIH White Paper). 2002, [http://www.animalgenome.org/pigs/community/PigWhitePaper/]

Acknowledgements

The Swine Genome Sequencing Consortium is grateful to the following for funding support for the pig genome sequencing project: the USDA National Institute of Food and Agriculture, formerly the Cooperative State Research, Education and Extension Service; the Agence Nationale de la Recherche; European Union SABRE; the Institute for Pig Genetics, Netherlands; INRA Genescope, France; Iowa Pork Producers Association; Iowa State University; Korean National Livestock Research Institute; National Institute of Agrobiological Sciences, Japan; National Pork Board, U.S.; North Carolina Pork Council; North Carolina Agricultural Research Service; North Carolina State University; the University of Illinois; the "Pigs and Health" programme of the Danish Advanced Technology Foundation, Denmark; the Wellcome Trust Sanger Institute; The Roslin Institute, University of Edinburgh and Biotechnology and Biological Sciences Research Council, U.K.; the University of Illinois Livestock Genome Sequencing Initiative.

Author information

Authors and Affiliations

The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Roslin, UK
Alan L Archibald
BGI-Shenzhen, Shenzhen, 518083, China
Lars Bolund & Jun Wang
Institute of Human Genetics, Aarhus University, DK-8000, Aarhus, Denmark
Lars Bolund
The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
Carol Churcher
Copenhagen University, Copenhagen, Denmark
Merete Fredholm
Wageningen University, Animal Breeding and Genomics Centre, Wageningen, The Netherlands
Martien AM Groenen
Institute for Pig Genetics, Beuningen, The Netherlands
Barbara Harlizius
Korean National Institute of Animal Science, Suwon, Kyunggi-do, Korea
Kyung-Tai Lee
INRA Toulouse, France
Denis Milan
The Genome Analysis Centre, Norwich, UK
Jane Rogers
Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University, Ames, Iowa, 50011, USA
Max F Rothschild
National Institute of Agrobiological Sciences, Japan
Hirohide Uenishi
Department of Biology, University of Copenhagen, Copenhagen, Denmark
Jun Wang
Institute of Genomic Biology, University of Illinois, Urbana, Illinois, USA
Lawrence B Schook

Consortia

the Swine Genome Sequencing Consortium

Corresponding authors

Correspondence to Alan L Archibald or Lawrence B Schook.

Additional information

Authors' contributions

All authors are members of the Swine Genome Sequencing Consortium (SGSC) under whose auspices the pig genome is being sequenced. They are responsible for securing the funding for, and the management of, the pig genome sequencing project. ALA, DM, JR, MFR and LBS are members of the SGSC Steering Committee. ALA, MF, DM, JR, MFR, HU and LBS are members of the SGSC Technical Committee. LBS, CC, ALA, MAMG, DM, JR, MF and MFR comprise the SGSC Manuscript Steering Committee, which is directing the SGSC's analysis and publication strategy. JR and CC led the sequencing team at the Wellcome Trust Sanger Institute which generated the BAC clone derived sequence data, during the initial and later stages of the project, respectively. LBS and JR were co-directors of the USDA grant which provided ca. 50% of the project funding. MAMG was work package leader for the EC-funded project to sequence chromosomes 7 and 14. BH and MAMG were project leaders for the IPG-funded project to sequence chromosome 4. ALA was the PI for the BBSRC grant on annotation and analysis. MFR secured US pig industry funding for the project and led a pilot project to generate finished sequence for part of chromosome 17. JW led the Beijing Genomics Institute effort to generate WGS data using Illumina next-gen sequencing technology, partially funded by a grant of which LB was the PI. K-TL led the team at the Korean Livestock Research Institute that has contributed WGS data using Sanger capillary technology. HU leads the team at the Japanese National Institute of Agrobiological Sciences which contributed full length cDNA sequence and some BAC clone sequence data. DM leads the team which is validating the sequence assembly against a high resolution radiation hybrid map. Finally, some of the leadership roles of the authors in the analysis of the sequence data are highlighted in Table 1. All authors have read and approved the manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Archibald, A.L., Bolund, L., Churcher, C. et al. Pig genome sequence - analysis and publication strategy. BMC Genomics 11, 438 (2010). https://doi.org/10.1186/1471-2164-11-438

Received: 15 April 2010
Accepted: 19 July 2010
Published: 19 July 2010
DOI: https://doi.org/10.1186/1471-2164-11-438
