Insertion sequences (IS) are small DNA segments capable of transposing within and between prokaryotic genomes, often causing insertional mutations and chromosomal rearrangements. Although several methods are available for locating ISs in microbial genomes, they are either labor-intensive or inefficient. Here, we use vectorette PCR to identify and map the genomic positions of the eight insertion sequences (IS1, 2, 3, 4, 5, 30, 150, and 186) found in E. coli strain CGSC6300, a close relative of MG1655 whose genome has been sequenced.
Genomic DNA from strain CGSC6300 was digested with the four-base cutter Rsa I and the resulting restriction fragments were ligated to vectorette units. Using IS-specific primers directed outward from the extreme ends of each IS and a vectorette primer, flanking DNA fragments were amplified from all but one of the 37 IS elements identified in the genomic sequence of MG1655. Purification and sequencing of the PCR products confirmed that they are IS-associated flanking DNA fragments corresponding to the known IS locations in the MG1655 genome. Seven additional insertions were found in strain CGSC6300, indicating that very closely related isolates of the same laboratory strain (the K12 isolate) may differ in their IS complement. Two other E. coli K12 derivatives, TD2 and TD10, were also analyzed by vectorette PCR. They share 36 of the MG1655 IS sites and carry 16 and 18 additional insertions, respectively.
This study shows that vectorette PCR is a swift, efficient, reliable method for typing microbial strains and identifying and mapping IS insertion sites present in microbial genomes. Unlike Southern hybridization and inverse PCR, our approach involves only one genomic digest and one ligation step. Vectorette PCR is then used to simultaneously amplify all IS elements of a given type, making it a rapid and sensitive means to survey IS elements in genomes. The ability to rapidly identify the IS complements of microbial genomes should facilitate subtyping closely related pathogens during disease outbreaks.
Insertion sequences (IS) are small DNA segments capable of transposing within and between prokaryotic genomes and episomes, often causing insertional mutations and chromosomal rearrangements. Identifying and mapping IS elements in microbial genomes is essential to understanding their evolutionary significance [2-5]. IS elements can move so rapidly that even closely related laboratory strains commonly differ in the positions of their IS sequences [6,7]. A swift means to identify IS insertions might therefore allow isolates from specific disease outbreaks to be distinguished from other closely related strains.
Several methods have been used to identify the number and locations of IS elements in bacterial genomes, including Southern hybridization and the inverse polymerase chain reaction (iPCR) [4,8,9]. Southern hybridization is rather time-consuming and requires additional procedures for localizing ISs. Inverse PCR, a commonly used PCR method for recovering unknown flanking sequences of a known target sequence, uses a library of circularized chromosomal DNA fragments as template and two outward primers located at each end of the known fragment for amplification. However, when a target sequence has multiple genomic locations, the variously sized DNA circles formed are difficult to amplify simultaneously. Also, the length of each restriction DNA fragment containing a target sequence must be determined by Southern hybridization followed by sub-genomic fractionation before intramolecular ligation and PCR amplification [4,8,9]. These difficulties render Southern hybridization and iPCR impractical as techniques for quickly surveying repetitive elements in genomes.
Vectorette PCR (vPCR) [10,11] is another method used to amplify unknown sequences flanking a characterized DNA fragment. It involves cutting genomic DNA with a restriction enzyme, ligating vectorettes to the ends, and amplifying the flanking sequences of a known sequence using primers derived from the known sequence along with a vectorette primer (Fig. 1). This technique has found many applications, including sequencing cosmid insert termini, identifying telomeres and microsatellite sequences, mapping deletions, insertions, and translocations [14,15], and determining the 5' and 3' ends of mRNAs. Here, we explore the efficiency of vPCR with regard to identifying and mapping IS elements in microbial genomes. We show that multiple copies of an IS are readily amplified using an IS-specific primer in combination with a vectorette primer, and that their genomic locations are readily identified from the flanking DNA sequences.
Figure 1. Vectorette PCR for amplification of IS flanking sequences. The shadowed area represents the IS sequence. The solid lines indicate the flanking DNA sequences. ∇ indicates the restriction site. A and B are the outward IS-specific primers located at the ends of the IS. V is a vectorette primer.
Results and discussion
The IS insertions of CGSC6300
We used E. coli strain CGSC6300, a close relative of the sequenced strain MG1655, to test the efficiency and reliability of vPCR in detecting IS copies. IS insertion sites were identified by sequencing flanking DNA fragments amplified using outward IS-specific primers in combination with the vectorette primer. Based on the whole genome sequence of strain MG1655, there are 37 IS elements, including 7 copies of IS1, 6 copies of IS2, 5 copies of IS3, 1 copy of IS4, 11 copies of IS5, 3 copies of IS30, 1 copy of IS150, and 3 copies of IS186. Our results for each IS in CGSC6300 are summarized in Table 1 and described as follows:
Table 1. IS elements in closely related E. coli strains MG1655, CGSC6300, TD2 and TD10
Eight and 6 PCR bands, obtained with primers IS1-A and IS1-B respectively, were observed on ethidium bromide-stained agarose gels (Fig. 2). All 7 IS1 insertion sites in the sequenced genome of MG1655 were successfully identified by isolating and sequencing these fragments. Sequences obtained from both flanking sequences were used to locate 2 IS1 elements (IS1-5 and IS1-6). The remaining 5 IS1 locations were identified from single flanking sequences. Three additional IS1 elements (IS1-a in b0240, IS1-c in b1786, and IS1-f in b2635) were also found in CGSC6300.
Figure 2. PCR amplification of IS flanking DNA from E. coli strains CGSC6300, TD2 and TD10. Results for IS1, 2, 3, 5, and 186 are shown. Genomic DNA was digested with Rsa I, ligated with vectorette units and amplified by vPCR. Each panel shows the PCR products generated by two outward IS-specific primers (arrows) of an IS in combination with the vectorette primer. Flanking DNA fragments from both sides of each IS location were amplified. The PCR products were excised, purified, sequenced and identified from the genome sequence of E. coli strain MG1655. A PCR fragment flanking a known IS site in MG1655 is indicated by the element's name followed by an identifying numeral; for example, IS1-1 is one of 7 IS1 elements in the MG1655 genome. Additional flanking DNAs not found in MG1655 are labeled with the b# of the gene in which the IS is located. PCR products were separated in 1.4% agarose gels and stained with ethidium bromide. Intense bands in the 100 bp ladder correspond to 500 and 1000 bp.
Primers IS2-A and IS2-B produced 4 bands and 6 bands, respectively (Fig. 2). Three IS2 elements (IS2-1, IS2-4, and IS2-5) were located from both flanking sequences and the remaining 3 (IS2-2, IS2-3, and IS2-6) were located from one flanking sequence. Gene b1579, homologous to the IS2 transposase, was also amplified, sequenced and located in CGSC6300.
Three IS3 elements (IS3-2, -3, and -4) were each located by sequencing amplified flanking DNAs from both sides, and two (IS3-1 and IS3-5) were each located by sequencing a single flanking sequence (Fig. 2). Three additional IS3 elements were found at b0805 (IS3-b), b3604 (IS3-e), and b4242 (IS3-f).
One IS4 was located based on flanking sequences amplified from both sides. No additional IS4 insertions were found.
Primers IS5-A and IS5-B produced 7 and 9 bands, respectively (Fig. 2). Purification and sequencing of these DNA fragments showed that they correspond to flanking sequences of IS5-2, -3, -5, -6, -7, -9 for the A-side and IS5-1, -2, -5, -6, -7, -8, -9, -10, -11 for the B-side. Fragments flanking either side of IS5-4 were not identified.
The three known IS30 insertions in MG1655/CGSC6300 were identified based on flanking sequences amplified from both sides, and an additional insertion was identified in b2156.
The one known IS150 insertion was identified and no other.
The three known IS186 insertions were identified based on flanking sequences amplified from both sides (Fig. 2).
Additional IS copies in laboratory strains
Several IS elements located in CGSC6300 are not found in the genomic sequence of MG1655 (Table 1). Lyophilized CGSC6300 was obtained from the E. coli Genetic Stock Center, Yale University, and is stored at our laboratory in 15% glycerol at -80°C. It seems likely that the additional IS transpositions arose after separation from the sequenced MG1655, but prior to arrival in our laboratory, probably during storage on agar slants at room temperature, a condition known to promote IS mobilization [6,7]. These results emphasize that the IS complement of each strain should be characterized prior to experimentation.
Two other E. coli K12 derivatives, TD2 and TD10, contain 16 and 18 additional IS insertions (Table 1), respectively. The two insertions found in TD10 but not in TD2 are the IS3-a insert between b0314 and b0315 and the IS1-b insert associated with a deletion between b0319 and b0326. Originally, TD2 and TD10 were constructed by P1 transduction of different lac operons into the Δlac of K12 derivative strain DD320. The IS insertion differences between these two strains probably arose when sequences flanking the lac operon were cotransduced during strain construction.
Reliability of technique
Theoretically, the number of flanking DNA fragments amplified with each IS-specific primer should equal the number of copies of each IS element in the genome. Also, the location of each IS copy should be identifiable from the two flanking DNA sequences. However, some copies of IS elements 1, 2, 3 and 5 were initially located by a single flanking sequence only. DNA fragments not recovered may have been masked by fragments of similar size amplified from other genomic copies of the IS element. This is evidenced by bands in ethidium-stained agarose gels appearing broader and/or staining more intensely (see Fig. 2). While these bands produce clearly readable sequence in the ISs themselves, their flanking sequences are unreadable or show a high, noisy background, indicating the presence of multiple fragments of similar size (data not shown). In cases where flanking sequences were readable, we located one of the fragments, presumably the one that was amplified most efficiently.
Despite missing fragments, vectorette PCR provides a reliable estimate of the copy number of elements in a genome. Let the number of copies of the ith IS element be $n_i$, and the number of unidentified flanking sequences be $u_i$. Then the probability that an IS copy is not identified is simply the product of the probabilities of obtaining neither the A-side nor the B-side sequence, $q_i = (u_i/n_i)_{\text{A-side}} \cdot (u_i/n_i)_{\text{B-side}}$. The expected number, $x$, of missing copies is determined by summing over all $n_i$ copies of each of the $j = 8$ elements in MG1655, $x = \sum_{i=1}^{j} n_i q_i$. Our data provide an estimate of $x = 2.54$ expected missing copies. In fact, only 1 copy was missed entirely. Even when genomic DNA is digested by just a single four-cutter restriction enzyme, vectorette methodology is highly reliable with small error rates: 6.8% expected and 2.7% realized.
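The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the original analysis: the function names are hypothetical, and the per-side counts are those that can be read from the text for two elements only (IS5: 5 A-side and 2 B-side flanks unrecovered out of 11 copies; IS2: 2 A-side and 1 B-side unrecovered out of 6 copies, per Fig. 4). The totals of 2.54 expected and 1 realized missing copies are taken from the text as given.

```python
def prob_copy_missed(n, u_a, u_b):
    """q_i: probability that neither the A-side nor the B-side flank is recovered."""
    return (u_a / n) * (u_b / n)


def expected_missed(n, u_a, u_b):
    """n_i * q_i: expected number of copies of one IS type missed entirely."""
    return n * prob_copy_missed(n, u_a, u_b)


# (n_i, unrecovered A-side flanks, unrecovered B-side flanks), read from the text
counts = {"IS5": (11, 5, 2), "IS2": (6, 2, 1)}

for name, (n, u_a, u_b) in counts.items():
    q = prob_copy_missed(n, u_a, u_b)
    print(f"{name}: q = {q:.3f}, expected missed copies = {n * q:.2f}")

# Error rates over the 37 IS copies of MG1655, using the totals reported in the text
TOTAL_COPIES = 37
expected_rate = 2.54 / TOTAL_COPIES   # expected missing copies / total copies
realized_rate = 1 / TOTAL_COPIES      # only IS5-4 was missed on both sides
print(f"expected {expected_rate:.3f}, realized {realized_rate:.3f}")
```

Dividing the summed expectation by the 37 known copies gives the expected error rate, and dividing the single copy actually missed (IS5-4) by 37 gives the realized rate.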
The actual error rates are even smaller. Our analysis is restricted to the 37 ISs found in the genomic sequence of MG1655; the 7 additional ISs in CGSC6300 were not used in the calculations even though they may serve to mask fragments and thereby increase the expected and observed error rates.
Determining the reliability of the technique when there are many more than 11 copies of an IS element in a genome requires estimating m, the maximum number of amplified fragments likely to be resolved per lane by agarose minigel electrophoresis. Only a small portion of the resolving power of an agarose gel is actually used because approximately 98% (approximately because the calculation assumes equal base frequencies) of amplified fragments produced by a 4-base cutter restriction enzyme are less than 1 kb (excluding the IS and the vectorette). Hence, m is less than the maximum number of fragments physically capable of being resolved by agarose minigel electrophoresis.
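The 98% figure can be checked with a short calculation. The sketch below assumes, as the text does, equal base frequencies, and further assumes that restriction sites occur independently at each position, so that flank lengths are geometrically distributed; the function name is illustrative only.

```python
def fraction_shorter_than(length_bp, site_len=4):
    """Fraction of restriction fragments shorter than length_bp.

    Assumes equal base frequencies and independent positions, so a given
    position starts a recognition site with probability (1/4)**site_len.
    """
    p_site = 0.25 ** site_len
    # Probability that no site occurs in length_bp consecutive positions
    p_no_site = (1.0 - p_site) ** length_bp
    return 1.0 - p_no_site


print(f"{fraction_shorter_than(1000):.3f}")
```

Under these assumptions a four-base site is expected once every 256 bp, so a flank longer than 1 kb requires roughly four expected sites in a row to be absent.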
Consider m as the number of discrete positions that an amplified fragment might occupy. The probability that a particular position is not occupied given $n_i$ copies of an IS element i is $(1 - 1/m)^{n_i}$. The expected number of unoccupied positions is $m(1 - 1/m)^{n_i}$ and the expected number of occupied positions (i.e. bands visualized) is $f_i = m[1 - (1 - 1/m)^{n_i}]$. Use $f_i$ as an estimate of the number of amplified fragments identified by sequencing. Nonlinear regression of fragments identified against the number of known genomic copies, $n_i$, yields an estimate of m = 11.64 ± 1.79 (Fig. 3A). As a practical matter, no more than a dozen amplified fragments are ever likely to be resolved by agarose minigel electrophoresis when a four-cutter restriction enzyme is used to digest genomic DNA.
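The occupancy argument can be illustrated numerically. The following sketch assumes that each amplified fragment falls independently into one of m equally likely gel positions; it compares the closed-form expectation for the number of visible bands with a simple simulation. The function names, the trial count and the random seed are illustrative choices, and m = 12 is the rounded estimate used in the text.

```python
import random


def expected_bands(n_copies, m_positions):
    """Expected number of occupied gel positions (visible bands)."""
    p_empty = (1.0 - 1.0 / m_positions) ** n_copies
    return m_positions * (1.0 - p_empty)


def simulated_bands(n_copies, m_positions, trials=20000, seed=0):
    """Monte Carlo estimate of the mean number of occupied positions."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        occupied = {rng.randrange(m_positions) for _ in range(n_copies)}
        total += len(occupied)
    return total / trials


for n in (1, 3, 7, 11):
    print(n, round(expected_bands(n, 12), 2), round(simulated_bands(n, 12), 2))
```

With a single copy the expectation is exactly one band; as the copy number approaches m, the expected number of bands falls increasingly short of the copy number because fragments comigrate.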
Figure 3. Estimation of IS flanking DNA likely to be resolved and missed. A. The maximum number of fragments likely to be resolved, m, can be estimated by plotting the number of bands observed against the genomic copy number. Only a finite number of bands can be visualized on a gel. Consequently, the likelihood that two amplified fragments comigrate increases with the number of IS copies in the genome. B. The number of amplified flanking sequences likely to be missed rapidly increases when 10 or more bands are visualized. Genomic digests with a single restriction enzyme should be restricted to IS elements with fewer than 10 copies per genome. Genomes with more than 10 copies of an IS element should be screened using high resolution agarose gels and/or using a second restriction enzyme to allow all IS copies to be identified.
Summing the expectations for missing A-side and B-side fragments (i.e. amplified fragments not identified by sequencing) for the $j = 8$ species of IS elements in MG1655 yields a total that is slightly larger than the 17 known masked fragments from MG1655 (each marked with an asterisk in Table 1). The probability that an IS copy is not identified is $q'_i = (u'_i/n_i)^2$, where $u'_i = n_i - m[1 - (1 - 1/m)^{n_i}]$ is the expected number of missing fragments on each side and the prime designates that this expectation is based on an ability to resolve a maximum of m = 12 fragments per lane. The expected number of missing IS copies is $x' = \sum_{i=1}^{j} n_i q'_i$, which is only slightly larger than the direct estimate x = 2.54. We conclude that the model provides a robust fit.
A plot of the expected number of missing copies, $x'_i = n_i q'_i$, against $n_i$ (Fig. 3B) reveals that the number of missing fragments increases rapidly with the number of genomic copies. With $n_i = 20$, $x'_i = 5$ copies (25%) remain undetected, and even with $n_i = 10$, $x'_i = 1$ (10%) is expected to be overlooked. To avoid underestimating the number of copies of a highly repeated element, we recommend digesting genomic DNA with a different restriction enzyme and repeating vPCR and sequencing. By using another four-base cutter restriction enzyme, Bst UI, we identified all flanking sequences not recovered with the enzyme Rsa I for IS1, IS2, IS3, and IS5, as shown in Fig. 4 for IS2. Larger, temperature-controlled, high-resolution agarose gel electrophoresis apparatus available in some laboratories would also improve the resolution of the technique.
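The figures quoted for 10 and 20 genomic copies can be reproduced under one reading of the model, sketched below. It assumes that fragments fall independently into m = 12 equally likely gel positions, that the expected shortfall of bands is the same on the A-side and the B-side, and that a copy is missed only when both of its flanks are masked. The function name is hypothetical.

```python
def expected_missing_copies(n_copies, m_positions=12):
    """Expected number of IS copies missed on both sides."""
    # Expected visible bands per side under the occupancy model
    bands = m_positions * (1.0 - (1.0 - 1.0 / m_positions) ** n_copies)
    # Expected masked fragments per side
    masked = n_copies - bands
    # Probability that a given copy is masked on both the A-side and the B-side
    q = (masked / n_copies) ** 2
    return n_copies * q


for n in (5, 10, 20):
    missing = expected_missing_copies(n)
    print(n, round(missing, 2), f"{missing / n:.0%}")
```

The steep rise between 10 and 20 copies is what motivates the recommendation to repeat the digest with a second restriction enzyme for highly repeated elements.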
Figure 4. PCR amplification of IS2 flanking DNA from genomic DNA digested with Bst UI. Flanking DNA fragments IS2-3A and IS2-6A (left hand side) and IS2-2B (right hand side), masked by other amplified fragments when genomic DNA was digested with Rsa I (see Fig. 2), were recovered with Bst UI.
It is apparent that IS complements differ among the very closely related laboratory E. coli K12 derivatives MG1655, CGSC6300, TD2 and TD10. The rapidity with which these differences have evolved suggests that ISs may play important roles in experimental evolution. Indeed, adaptation by E. coli to novel laboratory environments is often characterized by IS element mobilization [4,19-22]. Using vPCR will provide researchers in experimental evolution with a comprehensive view of genomic reorganization during laboratory evolution. Using this method, we characterized IS elements in 40 isolates that evolved from TD2 and TD10 in chemostats and found a number of IS-mediated gene deletions, duplications and transpositions (unpublished data).
Surveys of natural isolates of E. coli reveal that the numbers and locations of IS elements differ widely among closely related strains, suggesting a brisk turnover of IS elements within and among host lineages [6,23-25]. Comparisons of E. coli genomic sequences confirm that IS elements are commonly associated with chromosomal rearrangements within lineages [17,26,27]. The ability to rapidly and accurately determine the IS complement of the genomes of natural isolates is not only desirable from a population genetic standpoint, but vPCR might also facilitate rapid typing of epidemiological outbreaks of pathogens otherwise indistinguishable from related strains. In this regard it is worth noting that IS sequences are highly conserved compared with most E. coli housekeeping genes. This conservation will greatly aid the use of vPCR to type strains because only 1 pair of primers is needed for each type of IS element.
This study shows that vPCR is a swift, efficient, reliable method for typing microbial strains and identifying and mapping IS insertion sites present in microbial genomes. Flanking DNA sequences from 36 of the 37 IS elements in the E. coli strain MG1655 were recovered by vPCR and confirmed by DNA sequencing. Unlike Southern hybridization and iPCR, our approach involves only one genomic digest and one ligation step. Vectorette PCR is then used to simultaneously amplify all IS elements of a given type, making vPCR a rapid and sensitive means to survey IS elements in genomes.
Three derivatives of the K12 isolate were used in this study. Strain CGSC6300, obtained from the E. coli Genetic Stock Center, Yale University, was used as a control because it is closely related to MG1655, whose entire genome has been sequenced. TD2 and TD10 (derivatives of DD320, itself a K12 derivative) are routinely used in our experiments in molecular evolution.
Genomic DNA was isolated from overnight cultures in LB medium using the DNeasy DNA isolation kit (Qiagen, Valencia, CA, USA).
The vectorette unit was made using the protocol of the Botstein lab http://genome-www.stanford.edu/group/botlab/protocols/vectorette.html. The two anchor bubble primers
were synthesized by the Advanced Genetic Analysis Center at The University of Minnesota, St. Paul. To anneal bubble primers, 4 μM of each primer (in ddH2O) were combined in a total volume of 100 μl. The mixture was incubated at 65°C for 5 minutes, and then MgCl2 was added to a final concentration of 1–2 mM before cooling down to room temperature.
DNA digestion and ligation of vectorette units
Genomic DNA from each strain was digested using the restriction enzyme Rsa I to produce small, blunt-ended fragments (Fig. 1). The enzyme is a four-base cutter and has 0 to 3 restriction sites within the open reading frames (ORFs) of the eight insertion sequences (IS1, IS2, IS3, IS4, IS5, IS30, IS150, IS186), but does not cut at the extreme ends of each ORF. This allows for the design of outward primers to amplify the IS flanking sequence on both sides (see below). Digestion was carried out at 37°C overnight in a 50 μl reaction containing 1 × NEBuffer 1, 0.5 μg DNA and 10 units of Rsa I. After digestion, 2 μl of anchor bubble unit, 1 μl of 10 mM ATP and 1 unit of T4 DNA ligase (New England Biolabs, Beverly, MA) were added and the reaction was incubated for 5 cycles of 20°C for one hour followed by 37°C for 30 min.
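The digestion step can be mimicked in silico to anticipate the size of the flanking DNA that vPCR should recover on each side of an IS. The sketch below is illustrative only: it relies on the Rsa I recognition sequence GTAC, cut in the middle to leave blunt ends (a property of the enzyme not stated in this section), and it uses a made-up toy sequence and hypothetical function names and coordinates rather than any real genomic data.

```python
import re

RSAI_SITE = "GTAC"   # Rsa I recognition sequence, cut as GT^AC (blunt ends)
CUT_OFFSET = 2       # cut falls two bases into the site


def rsai_cut_positions(seq):
    """0-based positions at which Rsa I would cut the top strand."""
    pattern = "(?=" + RSAI_SITE + ")"
    return [m.start() + CUT_OFFSET for m in re.finditer(pattern, seq.upper())]


def digest(seq):
    """Split a linear sequence into Rsa I restriction fragments."""
    cuts = [0] + rsai_cut_positions(seq) + [len(seq)]
    return [seq[a:b] for a, b in zip(cuts, cuts[1:])]


def flank_lengths(seq, is_start, is_end):
    """Flanking DNA (bp) between each IS end and the nearest outside cut."""
    cuts = rsai_cut_positions(seq)
    left_cut = max((c for c in cuts if c <= is_start), default=0)
    right_cut = min((c for c in cuts if c >= is_end), default=len(seq))
    return is_start - left_cut, right_cut - is_end


# Toy example: a hypothetical 20-bp "IS" (no internal site) between two flanks
left_flank = "TTGTACAAAAAA"
toy_is = "G" * 20
right_flank = "CCCCGTACTT"
toy_genome = left_flank + toy_is + right_flank
is_start, is_end = len(left_flank), len(left_flank) + len(toy_is)

print(rsai_cut_positions(toy_genome))
print(flank_lengths(toy_genome, is_start, is_end))
```

In a real product the amplified fragment would also include the stretch of IS between the outward primer and the IS end, plus the ligated vectorette, so the flank lengths computed here are lower bounds on the band sizes seen on a gel.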
Primers and PCR amplification
Outward primers (Table 2) from each end of the 8 IS sequences were designed and used for PCR amplification in combination with the vectorette primer (5' CGAATCGTAACCGTTCGTACGAGAATCGCT 3') (Fig. 1). The distance between an IS-specific primer position and the extreme end of the IS orf ranged from 16 to 184 bp, which facilitated identifying IS-associated PCR products from DNA sequences. PCR reactions were carried out using Qiagen Multiplex PCR kit (Qiagen, Valencia, CA, USA). Each reaction contained 1 × Qiagen Multiplex PCR Master Mix, 0.2 μM of outward IS primer and vectorette primer and 2 ng of DNA templates (Rsa I-digested DNA ligated with vectorettes). PCR cycling conditions were 95°C for 15 min, 35 cycles of 94°C for 30 s, 60°C for 90 s, 72°C for 2 min, and a final extension step at 72°C for 10 min. The amplified products were separated in 1.4% agarose gel, stained with ethidium bromide and visualized under UV light. DNA bands were excised and purified with Qiagen DNA Gel Extraction Kit (Qiagen, Valencia, CA, USA).
Table 2. Primers used for identification of ISs using vectorette PCR
DNA sequencing and analysis
DNA sequencing analysis was carried out on both DNA strands by the AGAC, University of Minnesota, using an IS-specific primer and the vectorette primer. DNA sequences were subjected to BLAST searches against the MG1655 genome sequence.
List of abbreviations
IS: insertion sequence; iPCR: inverse polymerase chain reaction; vPCR: vectorette polymerase chain reaction
SZ designed and performed the molecular experiments and prepared the manuscript. AMD provided scientific input and prepared the manuscript. All authors read and approved the final manuscript.
We gratefully acknowledge the thorough, constructive criticism of an anonymous reviewer, which greatly improved this manuscript. This study is supported by grants from the National Institutes of Health (to AMD).
Genetica 1994, 93:5-12.
Proc Natl Acad Sci USA 1984, 81:4500-4.
Riley J, Butler R, Ogilvie D, Finniear R, Jenner D, Powell S, Anand R, Smith JC, Markham AF: A novel, rapid method for the isolation of terminal sequences from yeast artificial chromosome (YAC) clones.
Nucleic Acids Res 1990, 18:2887-2890.
PCR Methods Appl 1991, 1:39-42.
Groenen PM, Garcia E, Thoelen R, Aly M, Schoenmakers EF, Devriendt K, Fryns JP, Van de Ven WJ: Isolation of cosmids corresponding to the chromosome breakpoints of a de novo autosomal translocation, t(6;19)(p21;q13.1), in a patient with multicystic renal dysplasia.
Cytogenet Cell Genet 1996, 75:210-5.
BioTechniques 1996, 21:526-34.
Blattner FR, Plunkett G, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF, Gregor J, Davis NW, Kirkpatrick HA, Goeden MA, Rose DJ, Mau B, Shao Y: The complete genome sequence of Escherichia coli K12.
Theor Popul Biol 1986, 30:1-16.
Hayashi T, Makino K, Ohnishi M, Kurokawa K, Ishii K, Yokoyama K, Han CG, Ohtsubo E, Nakayama K, Murata T, Tanaka M, Tobe T, Iida T, Takami H, Honda T, Sasakawa C, Ogasawara N, Yasunaga T, Kuhara S, Shiba T, Hattori M, Shinagawa H: Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K12.
DNA Res 2001, 8:11-22.
Jin Q, Yuan Z, Xu J, Wang Y, Shen Y, Lu W, Wang J, Liu H, Yang J, Yang F, Zhang X, Zhang J, Yang G, Wu H, Qu D, Dong J, Sun L, Xue Y, Zhao A, Gao Y, Zhu J, Kan B, Ding K, Chen S, Cheng H, Yao Z, He B, Chen R, Ma D, Qiang B, Wen Y, Hou Y, Yu J: Genome sequence of Shigella flexneri 2a: insights into pathogenicity through comparison with genomes of Escherichia coli K12 and O157.