Chinese bayberry (Myrica rubra Sieb. and Zucc.) is a subtropical evergreen tree originating in China. It has been cultivated in southern China for several thousand years, and annual production has reached 1.1 million tons. The taste and high level of health promoting characters identified in the fruit in recent years has stimulated its extension in China and introduction to Australia. A limited number of co-dominant markers have been developed and applied in genetic diversity and identity studies. Here we report, for the first time, a survey of whole genome shotgun data to develop a large number of simple sequence repeat (SSR) markers to analyse the genetic diversity of the common cultivated Chinese bayberry and the relationship with three other Myrica species.
The whole genome shotgun survey of Chinese bayberry produced 9.01Gb of sequence data, about 26x coverage of the estimated genome size of 323 Mb. The genome sequences were highly heterozygous, but with little duplication. From the initial assembled scaffold covering 255 Mb sequence data, 28,602 SSRs (≥5 repeats) were identified. Dinucleotide was the most common repeat motif with a frequency of 84.73%, followed by 13.78% trinucleotide, 1.34% tetranucleotide, 0.12% pentanucleotide and 0.04% hexanucleotide. From 600 primer pairs, 186 polymorphic SSRs were developed. Of these, 158 were used to screen 29 Chinese bayberry accessions and three other Myrica species: 91.14%, 89.87% and 46.84% SSRs could be used in Myrica adenophora, Myrica nana and Myrica cerifera, respectively. The UPGMA dendrogram tree showed that cultivated Myrica rubra is closely related to Myrica adenophora and Myrica nana, originating in southwest China, and very distantly related to Myrica cerifera, originating in America. These markers can be used in the construction of a linkage map and for genetic diversity studies in Myrica species.
Myrica rubra has a small genome of about 323 Mb with a high level of heterozygosity. A large number of SSRs were identified, and 158 polymorphic SSR markers developed, 91% of which can be transferred to other Myrica species.
Chinese bayberry is an important commercial horticultural crop. It has been cultivated for more than 7,000 years in southern China, but is little known elsewhere. The production area is currently 340,000 ha, with an annual production of 1.1 million tons. The plant is diploid (2n = 16), generally dioecious, with the female plants cultivated for fruit , growing well on poor soils due to the association of nitrogen-fixing bacteria with the root system. It is rich in anthocyanins exhibiting a wide range of pharmacological properties, such as anti-inflammatory, antitumor and antioxidative effects .
There are four species within the genus Myrica in China, namely Myrica rubra Sieb. & Zucc., M. esculenta Buch.-Ham., M. nana Cheval., and M. adenophora Hance. M. rubra is widely distributed, with many local cultivars in the Zhejiang, Jiangsu, Fujian and Guangdong provinces and a few from Guizhou, Yunnan and Hunan provinces. The best known cultivars are Biqi and Dongkui, both from the Zhejiang province. Although there are abundant germplasm resources, studies on genetics and breeding of the species are still in their infancy. Molecular marker technology is a popular tool for breeding and genetic research, and with the construction of a genomic library, 13 polymorphic microsatellite loci have been developed in M. rubra and 11 from an expressed sequence tag library . Recently, 12 primer pairs have been temporarily developed by ISSR-suppression PCR  with GSG (GT)6 as the primer for enriching microsatellite sequences. Reports on the genetic diversity in Chinese bayberry using SSR markers have also recently been published [6,7], but the number of markers for Chinese bayberry is limited.
The reproducibility, multiallelicism, co-dominance, relative abundance and good genome coverage of SSR markers have made them one of the most useful tools for genetic diversity and linkage mapping. Genomic SSRs and EST-SSRs, considered complementary to plant genome mapping, have been reported in many fruit crops, such as walnut , cherry , apricot  and coconut . EST-SSRs are useful for genetic analysis, but their relatively low polymorphism and the high possibility of no gene-rich regions in the genome are limitations to their use. In contrast, genomic SSRs are highly polymorphic and tend to be widely distributed throughout the genome, resulting in better map coverage .
With genetic maps serving as the basis for future positional gene cloning, making map-based cloning and marker-assisted selection possible, the development of more SSRs is essential. As sequencing technologies advance, whole-genome shotgun (WGS) sequences are becoming increasingly available. These DNA sequences are valuable resources for SSR development in many plant species, such as rice  and papaya . In addition, they can be used to evaluate the frequency and distribution of different types of SSRs in the genome, and even help to estimate genome size and characters such as heterozygosis and repeats.
As a way of reducing the cost of genotyping research, Schuelke  proposed a method for fluorescent dye labelling of PCR fragments with a sequence-specific forward primer: the universal fluorescent-labelled M13(-21) primer, at the 5’ end, acts as the forward primer in a ‘one-tube’ reaction. As this method allows for high-throughput genetic analyses, with a high number of microsatellite markers widely used, we considered the possibility of using this approach for multiplex PCR, to improve the efficiency and save costs.
In this study, we mined and validated 158 SSR markers and describe their application for understanding the genetic relationship among 29 Chinese bayberry accessions and other Myrica species. These markers are useful for genotyping and genetic diversity analysis and linkage mapping of Myrica rubra and other Myrica species.
Genome survey using whole genome shotgun data in Chinese bayberry
WGS generated 273,161 (>100 bp) high quality sequence reads from two DNA libraries (250 bp and 500 bp) of the androphyte individual ‘C2010-55’. We used 9.01 G raw data for K-mer analysis and heterozygous simulation. For the 17-mer frequency distribution (Figure 1), the peak of the depth distribution was about 22. The estimated genome size was 323 Mb, using the formula: genome size = k-mer count/peak of the kmer distribution. The minor peak at 1/2 altitude of the main peak indicates the high level of heterozygosity in this genome (Figure 1). A total of 739,969 contigs were assembled with a total sequence length of 255.7 Mb. The length of N50 was 295 bp in our assembly, and the longest contig and scaffold 7,593 and 127,008 bp, respectively.
Figure 1. The distribution of 17-mer depth analysis based on whole genome shotgun data in Chinese bayberry.
Frequency distribution of different types of SSR markers
A total of 17,172 out of 273,161 scaffolds (6%) retrieved from the genome survey sequence contained 28,602 SSRs (Table 1), of which 5,401 contained more than one SSR, and 1,444 SSRs were present in compound format. Among the derived SSR repeats, the di-nucleotide was the most abundant repeat, accounting for 84.72% of total SSRs, followed by tri- (13.78%), tetra- (1.34%), penta- (0.12%), and hexa- (0.04%) nucleotides (Table 1). There was a large proportion of both dinucleotides and trinucleotides while the rest amounted to less 2%. The average frequency of occurrence was about 10.47% (Table 1).
Table 1. Occurrence of SSRs in the Genome Survey of Chinese bayberry
The SSR frequency of each motif is presented in Additional file 1. The SSR motif consists of 69 types. Among the repeat motifs of the dinucleotide, the AG/CT repeat was the most common, representing 53.72%, followed by 39.20% AT repeats (Figure 2), and the predominant motifs of trinucleotide (AAG/CTT and AAT/ATT) repeats accounted for 37.15% and 32.56%, respectively (Figure 3).
Format: DOC Size: 180KB Download file
This file can be viewed with: Microsoft Word Viewer
Figure 2. Percentage of different motifs in dinucleotide repeats in Chinese bayberry genome.
Figure 3. Percentage of different motifs in trinucleotide repeats in Chinese bayberry genome.
Polymorphism of SSR markers
We first designed and synthesised 600 SSR primer pairs from those scaffolds more than 2Kb long. The majority of SSR loci were dinucleotide repeats (597, 99.5%), and the remainder trinucleotide. Initially, the effectiveness of these primer pairs was detected in two cultivars (Biqi and Dongkui) and M. cerifera through denaturing PAGE (Polyacrylamide gel electrophoresis), and 581 (96.8%) of these were amplified successfully in Biqi and Dongkui, and 400 (66.7%) in M. cerifera. The SSR loci (186, 31%) were identified as heterozygous loci either in Biqi or in Dongkui. Subsequently, they were used to screen 32 accessions, and detected an average of 8.25 alleles and from 3 to 15 alleles per locus (Table 2).
Table 2. Characteristics of 158 SSR markers in this study
The PIC at each locus ranged from 0.256 to 0.883 with an average of 0.67 loci. The PCR product size ranged from 110 to 274 bp. All the primers produced amplicons in agreement with the expected sizes. Most of the SSR primers (139 primer pairs) showed significant deviation from HW equilibrium (P < 0.05). Partial correlation analysis showed that significant positive correlations existed between the repeat unit length and PIC (P < 0.01, r = 0.2747). This showed that these SSRs had high rates of transferability for M. adenophora (91.14%) and M. nana (89.87%) and low rates for M. cerifera (46.84%), indicating that these markers are suitable for genetic diversity analyses in other Myrica species.
One of the objectives of this study was to develop potential suitable SSR markers for genetic mapping using Biqi and Dongkui as crossing parents. We selected 99 heterozygous loci in Biqi and 105 in Dongkui (Table 3): 135 primer pairs can be used together in linkage mapping of the planned F1 populations between Biqi and Dongkui.
Table 3. Distribution of the segregation types expected for the mapping population (Biqi × Dongkui)
Genetic relationship analysis
The 32 accessions were divided into three groups (A, B and C, Figure 4), based on Nei’s genetic distance coefficient  using UPGMA cluster analysis. The similarity among all the accessions varied from 0.54 to 0.84. At the species level, the UPGMA dendrogram produced clusters separating M. nana and M. cerifera into two distinct groups. The genetic similarity between M. cerifera and M. rubra was 0.54, lower than the 0.74 previously reported by Xie .
Figure 4. Dendrogram for 32 Chinese bayberry accessions derived from UPGMA cluster analysis based on 158 SSR markers. The symbols before the accession codes indicate the sex: ○, androphyte plant, ●, common cultivars, and ◘, monoecious plant. The numbers are bootstrap values based on 1000 iterations. Only bootstrap values larger than 50 are indicated.
The main cluster ‘A’ included the subgroups A-1 and A-2. Subgroup A-1 includes 16 accessions, mainly from the cities of Ningbo (12) and Hangzhou (3), where the popular and dominant cultivar is Biqi. This demonstrates that these natural elite seedling selections are truly distinct from the local cultivars. For their genetic relationships (Figure 4), the rare monoecious individual (C2010-4) is closely related to Biqi, while the accessions ‘Shuijing’ and ‘Y2010-72’ (both white fruit type) are clearly separated in the cluster, with low genetic distance.
Subgroup A-2 includes 14 accessions, with four from Wenzhou, two from Taizhou, and one each from the cities of Hangzhou and Ningbo, and the Hunan, Guangxi, Guizhou and Jiangsu provinces. This group includes the popular cultivar Dongkui. The four accessions from Wenzhou distributed in this cluster have genetic similarity. The accession ‘Tongzimei’ from the Hunan province is on an independent branch, showing that it is genetically distinct. ‘Xiaolejiangchonghei’ and ‘M. adenophora’ grouped together, and are possibly in the same population. Six androphyte accessions, distributed in group A, are close to cultivars of the same geographic origin.
The accessions ‘Myrica nana’ from Yunnan and ‘Myrica cerifera’ from the USA were independently classified as the ‘B’ and ‘C’ group, indicating a distant relationship with cultivated Myrica rubra.
Our major aims were to find a large set of SSR markers for Myrica rubra and understand the genetic diversity and relationship among representative cultivars, androphyate and related species. This research paves the way for constructing an SSR-based linkage map in Myrica.
The genome characteristics of genusMyrica
Shotgun sequencing is suitable for simple genomes, with no or few repeat sequences, such as Chinese bayberry. For such genomes, the genome can largely be assembled simply by merging together reads containing overlapping sequence . We report the genome survey of Chinese bayberry using whole genome shotgun sequencing. The 17-nucleotide depth distribution suggests a genome size of 323 Mb, larger than peach (220 Mb, http://www.rosaceae.org/peach/genome webcite), but close to our estimate of 250 Mb from flow cytometry using rice as the reference (date not shown). Although the highly homozygous material was selected on a limited number of SSR loci assays, the actual heterozygous rate, as revealed by 185 new SSR markers, was very high (63%). To overcome the key issue of heterozygosity and allow us to generate a high-quality genome sequence, we can use a unique homozygous form such as monoploid, derived using tissue culture or from nature and worth further study.
Marker development for under-utilised fruit crops
SSRs have been widely used for high-throughput genotyping and map construction as they have the advantage of high abundance, random distribution within the genome, high polymorphism information content and stable co-dominance [18-20]. They can be developed from either genomic or expressed sequence tag (EST) libraries. Although EST-SSRs are useful for genetic analysis, their disadvantages of relatively low polymorphism and high concentration in gene-rich regions of the genome may limit their usage, especially for the construction of linkage maps . In this study, a total of 600 SSR primer pairs were designed from 28,602 SSR sites, and 581 (96.8%) primer pairs were effective. Due to the self-complementary nature to form dimers, AT/TA is not usually used to develop markers . Our findings are in agreement with that published for peach, where the dinucleotide repeat motifs were also found to be the most common, and (CT)n as the most common repeat unit .
In the present study, the mean value of PIC was higher than the previously reported 0.62 , but the consistent relationship was observed between SSR polymorphism and repeat unit length. There are some reports of a positive relationship between degree of polymorphism and repeat unit length [23,24]. However, those polymorphic SSRs that are homozygous in both parents cannot be mapped in F1 populations, although they are useful for mapping in F2 or backcross populations , while heterozygous SSRs can be used for mapping in F1 populations (Table 2). The estimated number of SSRs that can be mapped in the F1 populations between Biqi and Dongkui was about 85%.
Recently, based on mass sequence data of Chinese bayberry obtained by RNA-Seq, 41,239 UniGenes were identified and approximately 80% of the UniGenes (32,805) were annotated, which provides an excellent platform for future EST-SSR development and functional genomic research .
High efficient test methods
Normally, a universal M13 primer is labelled with one of a number of fluorescent dyes. The tailed primer provides a complementary sequence to the fluorescent labelled M13 primer, leading to the amplification of fluorescent PCR products, and then the PCR products of different sizes and/or labelled with different fluorescent dyes are mixed and tested . In this research, a multiplex PCR strategy was designed using the universal M13-tailed primer and three additional tail primers, designed arbitrarily, in presumed four-plex amplifications in single PCR, for a major reduction in cost and time. However, as only a few primer combinations were successful, most resulting in weak bands, we did the PCR individually and mixed the PCR products. Further optimisation of multiplex PCR is needed to evaluate its general applicability.
In this study, both cultivated species and wild species were analysed and their genetic diversity could easily be differentiated. M. nana and M. cerifera were clearly genetically distant to M. rubra. M. nana, also known as the dwarf or Yunnan arbutus, is indigenous to the Yunnan and Guizhou provinces, and has a plant height of < 2 m. The juvenile period of fruit tree can be shortened for breeding purposes. Studies on embryo culture in vitro of the F1 seeds of crosses between M. rubra and M. nana, , has shown good cross compatibility between M. rubra and M. nana, resulting in 70.5% normal seeds with intact embryo. M. adenophora and M. nana grow as wild trees, with the fruit of M. adenophora only suitable for medical purposes and not edible.
Our findings on the genetic similarity between M. adenophora and M. rubra, which are considered a progenitor-derivative species pair, are consistent with a previously published figure of 0.897 . An earlier study observed little change in allelic diversity along the chronosequence and no evidence for heterosis, although there was a moderate change in genotypic diversity . The markers developed in this study can be very useful in future population structure analysis.
In summary, the genome size of Myrica genus is small, about 320 Mb. A large set of SSRs were developed from a genome survey of Myrica rubra. The results suggest that they have high rates of transferability, making them suitable for use in other Myrica species.
Materials and methods
Plant materials and genome survey
We selected an androphyte ‘C2010-55’ for the genome survey because it was the most homozygous (10 out of 14 SSRs) individual among 230 accessions. Two DNA libraries of 250 and 500 bp insert size were constructed and sequenced by Illumina Hi-Seq 2000.
Twenty-nine accessions of the cultivated species (M. rubra) and 3 related species (M. adenophora, M. nana, M. cerifera), collected from different provinces in China (Table 4), were used to evaluate the suitability of the SSRs for genetic distance analysis. Young leaves were collected and frozen in liquid nitrogen prior to genomic DNA extraction using CTAB methods . DNA concentrations were measured spectrophotometrically at 260 nm, and the extracts electrophoresed on 1% agarose to confirm the quality. The purified DNAs were standardised at 40 ng/μl and stored at -40°C.
Table 4. The 32 bayberry accessions included in this study
SSR identification and primer design
We used MISA scripting language ( http://pgrc.ipk-gatersleben.de/misa/misa.html webcite) to identify microsatellite repeats in our sequence database. The SSR loci containing perfect repeat units of 2-6 nucleotides only were considered. The minimum SSR length criteria were defined as six reiterations for dinucleotide, and five reiterations for other repeat units. Mononucleotide repeats and complex SSR types were excluded from the study.
The SSR primers were designed using BatchPrimer3 interface modules ( http://pgrc.ipk-gatersleben.de/misa/primer3.html webcite). We selected 600 primers that met the following parameters: 110–230 final product length, primer size from 18 to 22 bp with an optimum size of 20 bp, and the annealing temperature was set at 60°C. The repeat units over eight were used.
Tail-1(M13 universal sequence-TGTAAAACGACGGCCAGT), Tail-2(CGAGTCAGTATAGGGCAC), Tail-3(ATCACGAAGCAGATGCAA) and Tail-4(GAGCATCTCGTACCAGTC) were added to the 5’ end of each 150 forward primers of pairs respectively. Tail-2, Tail-3 and Tail-4 were designed so that the primer size was 18 bp and the annealing temperature was 53°C. Primers were synthesised by Invitrogen Trading (Shanghai) Co., Ltd. We primarily tested two cultivars (Biqi and Dongkui) and M. cerifera for 600 SSR loci by PAGE (polyacrylamide denaturing gel) to confirm their suitability. Tail-1, Tail-2, Tail-3 and Tail-4 labelled with one of the following dyes: NED, PET, FAM, and HEX, respectively.
Polymerase chain reaction and gel electrophoresis
Each 20 μl reaction mixture contained 10 × PCR buffer (plus Mg2+), 0.2 mM of each dNTP, 5 pmol of each reverse, 4 pmol of the tail primer, 1 pmol of the forward primer, 0.5 units of rTaq polymerase (TaKaRa, China) and 40 ng genomic DNA template. Each primer pair had an interval of 20 bp according to the expected size of amplicons.
DNA amplification was in an Eppendorf Mastercycler (Germany) programmed at 94°C for 5 min for initial denaturation, then 32 cycles at 94°C (30 s)/58°C (30 s)/72°C (30 s), followed by 8 cycles of 94°C (30 s)/53°C (30 s)/72°C (30 s). The final extension step was 10 min at 72°C. Each PCR product was run on 1% agarose gel at 110 V for a quality check.
Subsequently, PCR products were electrophoresed on 8% denaturing PAGE, according to Myers et al. , at 60 W in a Sequi-Gen GT Nucleic Acid electrophoresis cell (BioRad) for 4 h, depending on the fragment sizes to be separated, and visualised by silver staining . Genotypes showing one and two bands were scored as homozygous and heterozygous, respectively, and the results recorded and photographed.
Multiplex PCR was designed and tested with products of different sizes and labelled with different fluorescent dyes. Each 20 μl reaction mixture contained 10 × PCR buffer (plus Mg2+), 0.8 mM of each dNTP, 1 unit of rTaq polymerase, 40 ng genomic DNA template and a total of four primer pairs with 5 pmol of each reverse primer, 4 pmol of each tail primer, and 1 pmol of each forward primer. The PCR products were diluted, mixed with the internal size standard LIZ500 (Applied Biosystems) and loaded on an ABI 3130 Genetic Analyzer. Alleles were scored using GeneMapper version 4.0 software (Applied Biosystems, Foster City, CA, USA).
The raw genome sequence data was first filtered to obtain high quality reads, then assembled using SOAP ( http://soap.genomics.org.cn/soapdenovo.html webcite) denovo software to contig, scaffold and fill in gaps. In addition, we used SSPACE software to build the scaffold. K-mer analysis was to help estimate the genome size and characters, such as heterozygosis and repeats.
The number of alleles (A), observed heterozygosity (Ho) and expected heterozygosity (He) were calculated using POPGENE version 1.32 ( http://www.ualberta.ca/~fyeh/popgene_download.html webcite). Chi-square test for Hardy-Weinberg equilibrium allele frequencies and polymorphism information content (PIC) was calculated using PowerMarker version 3.25  ( http://statgen.ncsu.edu/powermarker/downloads.htm webcite). Microsoft office Excel 2007 was used to analyse the correlation. Genetic similarity among all the accessions was calculated according to Dice’s coefficients using Nei's coefficient index  with the Free Tree 0.9.1.50  ( http://www.natur.cuni.cz/~flegr/programs/freetree.htm webcite) software, and the dendrogram constructed using the unweighted pair-group method with arithmetic averaging (UPGMA) option. The confidence of branch support was then evaluated by bootstrap analysis with 1,000 replicates. The dendrogram was printed using MEGA version 5.05 software  ( http://www.megasoftware.net/mega.php webcite).
ZSG: HJJ: EVW and MLC designed the experiments. YJ: CYC: GYW collected plant materials. YJ: HMJ and XWL performed the SSR experiments and analysed the data. The whole genome shotgun and sequencing assembly was performed by ZC. YJ: ZSG and EVW drafted this manuscript.
This work was supported by grants from the Zhejiang Province (2006 C14016) and Special Research Fund for International Cooperation with European Union (1114) and Public Welfare in Chinese Agriculture (contract no. 200903044) We thank Dr Rangjin Xie for technical assistance in the PAGE experiment, and Dr. Shirley Burgess for correcting the English.
1Department of Horticulture, The State Agriculture Ministry Laboratory of Horticultural Plant Growth, Development and Quality Improvement, Zhejiang University, Hangzhou 310058, China. 2BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China. 3Fruit Research Institute, Yuyao, Ningbo 315400, China. 4Forestry Technology Extension Center, Cixi Ningbo 315300, China. 5Plant Breeding-Wageningen University and Research Centre, P.O. Box 166700 AA, Wageningen, The Netherlands.
J Agr Food Chem 2011, 59(2):537-545. Publisher Full Text
Mol Ecol Notes 2006, 6(3):709-711. Publisher Full Text
Conserv Genet 2009, 10(5):1605-1607. Publisher Full Text
Plant Mol Biol Rep 2011, 29(3):554-562. Publisher Full Text
Plant Mol Biol Rep 2010, 28(4):646-653. Publisher Full Text
Plant Genet Resour-C 2010, 8(3):242-248. Publisher Full Text
Audergon JM, Bourguiba H, Khadari B, Krichen L, Trifi-Farah N, Santoni S: Grafting versus seed propagated apricot populations: two main gene pools in Tunisia evidenced by SSR markers and model-based Bayesian clustering.
Tree Genet Genomes 2010, 6(1):73-81. Publisher Full Text
Tropical Plant Biology 2008, 1(3):278-292. Publisher Full Text
Am Nat 1972, 106(949):283-292. Publisher Full Text
Plant Genet Resour-C 2011, 9(3):404-410. Publisher Full Text
McCouch SR, Cho YG, Ishii T, Temnykh S, Chen X, Lipovich L, Park WD, Ayres N, Cartinhour S: Diversity of microsatellites derived from genomic libraries and GenBank sequences in rice (Oryza sativa L.).
Theor Appl Genet 2000, 100(5):713-722. Publisher Full Text
Theor Appl Genet 2000, 101(3):421-428. Publisher Full Text
Cuc LM, Mace ES, Crouch JH, Quang VD, Long TD, Varshney RK: Isolation and characterization of novel microsatellite markers and their application for diversity assessment in cultivated groundnut (Arachis hypogaea).
Mol Breeding 2011, 28(2):241-254. Publisher Full Text
Int J Syst Evol Micr 2001, 51:731-735. Publisher Full Text