Natural history museums and botanic gardens are often viewed primarily as educational facilities, providing hands-on insight into the biological past or glimpses into the exotic far-away. Few will perhaps realise that, more importantly, these institutions represent vast storehouses of biological data – much of it catalogued but left untouched. The UK’s Natural History Museum alone contains more than 80 million biological specimens, whilst the Royal Botanic Gardens at Kew, UK, contains samples from around 90 percent of the world’s flowering plant species. Although these collections represent a potential treasure trove of biological data, the logistics of collating and analysing such a wealth of information present both a challenge and an opportunity to modern biologists. The recent surge of developments in genomics and sequencing technologies offers some potential solutions, but many key questions remain regarding how to archive and store this vast amount of data.
To discuss how best to exploit the opportunities presented by the world’s newest biological technologies, a meeting hosted by the world’s oldest biological society – The Linnean Society – recently brought together leading scientists from a diverse range of related disciplines. Executive Editor of BMC Genetics, BMC Plant Biology, and BMC Ecology Simon Harold (@sid_or_simon) asked the meeting’s organisers Bill Baker and Sven Buerki from the Royal Botanic Gardens, Kew, UK, and a selection of other participants, about the future potential of collections-based research in the genomic era.
How is the role of museums and collections changing in the era of big data?
Bill Baker and Sven Buerki: Biological collections housed at institutions like the Royal Botanic Gardens at Kew in London, UK, play a fundamental role in our understanding of the living world. Together, the worldwide network of collections-based organisations contains an encyclopaedic record of the diversity of life on Earth that can be tapped for countless scientific purposes, from surveys of biodiversity hotspots through to research on human pathogens and food security. However, the application of DNA methods to collections has always been limited by the state of preservation of each individual specimen, especially historical material, which rarely yields DNA of usable quality. The scientific community has attempted to address this by developing a network of tissue banks of material preserved for the purpose of high-grade DNA isolation. Next-generation sequencing, however, bypasses these problems, as the new technology actually requires fragmented DNA.
For the first time in history, a gold mine of genetic research opportunities is being opened up in herbarium and museum collections. For example, now that preservation is a much-reduced obstacle, we can analyse rare or even extinct species known only from such collections to find their place in the Tree of Life. The scale of genomic data that could be derived from biological collections is immense, and will provide an almost limitless range of research possibilities in support of biodiversity discovery, evolutionary biology, metagenomics, conservation and global environmental change, to name but a few. The genomic era only intensifies the importance of collections to science and society. It is all the more tragic that collections-based institutions, such as the Royal Botanic Gardens, Kew, have never been more threatened by cuts and closures, just as their potential and value are reaching even greater heights. It is up to the whole community to fight for these collections to prevent the squandering of such an extraordinary scientific opportunity.
To what extent is destructive sampling of museum specimens for genomic research justified by the benefits it may bring?
Tim Littlewood: A beguilingly simple question perhaps, but not an easy one to answer succinctly. At any given time that such a question is asked, one must weigh up the level of destruction against the perceived benefits with the technology currently available. Complete destruction is a heavy price to pay regardless of the potential benefit, but whether it can be justified also depends on the scarcity and nature of the specimen. Partial destruction needs to be assessed in terms of what and how much is destroyed; a small drill hole, a single insect leg, a fragment of hair, a micro punch through a herbarium sheet all may seem harmless enough subsampling practices, particularly if done discreetly and minutely. More importantly, none of these activities should take place without an assessment that the intended techniques will work and that those applying them are qualified to undertake the work; many discrete subsamples can still amount to the steady annihilation of a specimen. The impact of next-generation sequencing (NGS) suggests we should soon know how little we need and how much information we can extract from a variety of specimens, but there will always be factors that can lead to failure. Also, we are far behind in refining a process map that allows us (curators and researchers) to make an informed set of decisions on destructive sampling. It is high time we worked together to do this, so that we can prioritise our activities and, where necessary, enhance our collections appropriately for current and future genomic research.
What are the main challenges in extracting genomic data from museum specimens?
Freek Bakker: Herbarium DNA appears to be preserved surprisingly well and therefore offers great opportunities for future (herbarium) genomic studies. For instance, recent work by Staats and colleagues (PLoS One. 2011, 6, 12:e28448), including comparisons between fresh and 100-year-old herbarium DNA from the same individual trees, showed that it is not so much the specimen age per se that causes double-stranded breaks and polymerase inaccessibility, but rather the fact that specimens have been ‘baked’ in the herbarium protocols at some stage. In other words, roughly 90 percent of all plant DNA appears to be ‘locked up’ after baking the specimen. Results also showed that the level of miscoding lesions, potentially causing additional ‘post-mortem’ nucleotide substitutions in herbarium DNA sequences, is negligibly small.
In a follow-up study (PLoS One. 2013, Jul 29, 8, 7:e69189) the authors demonstrated using Illumina HiSeq technology that herbarium DNA is perfectly amenable to genome sequencing (in spite of the 90 percent DNA ‘lock-up’), especially chloroplast genomes, and in the case of an Arabidopsis specimen a full nuclear genome too.
Remaining challenges to extracting genomic data from museum specimens are probably i) excessive fragmentation of herbarium DNA, affecting the efficiency of paired-end sequencing and preventing the use of new generation technologies such as Pacific Biosciences whole molecule sequencing; and ii) the presence of contaminant DNA in herbarium samples, for instance from either endophytic or ‘post-mortem’ fungi. How the latter can interfere with the ‘genome skimming’ approach usually applied in larger comparative studies remains to be ascertained. Up to one percent contaminant fungal reads were found in all samples included in a herbarium ‘genome skimming’ study by the same authors, causing in some cases fungal instead of target ribosomal DNA sequences to be assembled.
All in all, the way is wide open for exploring our herbarium collections for genomic data, which will enable testing new hypotheses especially within a historic biological framework.
How well do we understand the relationship between phenotype and genotype in the era of genomics?
Serian Sumner: Our understanding of the relationship between genotype and phenotype is undergoing a revolution. We have known for a long time that phenotypic diversity arises from genetic diversity, i.e. through variation in the sequence of bases G, A, C and T. But it is now becoming clear that there is a lot more than just sequence variation going on in the route from genome to ‘phenome’. Genomic studies are now showing that there are many ways that the same gene can be used to produce different phenotypes. For example, a gene expressed early in development (or in one tissue) can produce a very different phenotype to the same gene expressed later in development (or in another tissue). We are still a long way from understanding how the timing, location and level of expression of a gene can contribute to a specific phenotypic attribute. We understand even less about how non-genetic material – the epigenetics, or chemical modifications – contributes to the phenotype, for example by regulating gene expression, especially in the natural environment. Until now, our understanding of the relationship between genotype and phenotype has been restricted to a few model organisms. The era of genomics is changing this by enabling us to dissect simultaneously the contributions of genetic variation (genotype), gene expression and gene regulatory processes to a single phenotypic trait in almost any organism. Our understanding of this relationship is being redefined by the modern genomics era.
Are all museum specimens precious?
Michael Bunce: While it is true that all museum specimens are unique, it is a stretch to say they are all precious. Many factors come into play when deciding if a museum bone/skin should be made available for destructive sampling – one of these factors is how valuable the sample is to the collection in which it is housed. To cite an example from my research – the extinct New Zealand moa – the fossils are abundant in New Zealand and are even legally sold within the country on auction websites. Over the past decade my research has sampled close to 1000 moa remains – where possible we have focused on bones/egg that are broken and not destined for display. On the flip-side are the sample fossils described by Richard Owen in the Natural History Museum in London, UK, which include some of the first moa to ever be collected and as such are part of paleontological history. In short, every museum specimen should be treated on a case-by-case basis and arguments considered for the risks and benefits of any destructive sampling request. There is a balance to be struck with regard to sampling as research on collections, more often than not, adds value to the collection and our understanding of species biology and evolution.
More about the organiser(s)
William (Bill) Baker (@BillJBaker) is an Assistant Keeper at the herbarium of the Royal Botanic Gardens at Kew, UK. He obtained his PhD at the University of Reading, UK, and went on to cultivate his research interests in the evolution, systematics and biogeography of palms, becoming Chair of the IUCN Species Survival Commission Palm Specialist Group. His current research interests focus on evolution and diversity in the palm family, addressing phylogenetic relationships and the classification of palms at all taxonomic levels using a range of data sources. Baker also investigates the origins of diversity at the species level through interdisciplinary studies, and explores the nature and origins of global patterns of palm diversity.
Sven Buerki (@BuerkiSven) is a plant evolutionary biologist and postdoctoral fellow at the Jodrell Laboratory of the Royal Botanic Gardens at Kew, UK. He received his PhD from the University of Neuchatel, Switzerland, and undertook his postdoctoral training at the Royal Botanical Garden of Madrid, Spain, before joining the Royal Botanic Gardens at Kew. Buerki is also a research associate at the Missouri Botanical Garden, USA. His research centres on the molecular systematics, evolutionary biology and biogeography of various groups of plants and insects – at both small and large spatiotemporal scales.