Composition and organization of active centromere sequences in complex genomes
1 Genome Biology Group, Duke Institute for Genome Sciences & Policy, Duke University, Durham, NC, USA
2 Present address: Center for Biomolecular Science and Engineering, University of California, 501 Engineering 2 Building, Mailstop CBSE/ITI, UC Santa Cruz, 1156 High Street, Santa Cruz, CA, 95064, USA
BMC Genomics 2012, 13:324 doi:10.1186/1471-2164-13-324
Published: 20 July 2012
Additional file 1:
Figure S1. Characterization of canine pericentromeric satellite families. (a) Locations of the eleven largest satellite families in the assembly are highlighted relative to 39 canine chromosomes, using the color code indicated in the figure. Each tile represents 10 kb of satellite sequence. Pericentromeric regions (defined as 2 Mb proximal to each centromere gap) are shown in gray. Open arrowheads indicate sites of pericentromere satellite enrichment; closed arrowheads indicate sites of CarSat1 and/or Sat1CF enrichment. (b) Satellite families in pericentromeric regions of the assembly are extensively represented in unmapped contigs (chrUn). Each tile equals a 100 kb bin of satellite sequence. (c) CarSat1 (red signals) and Sat1CF (blue signals) sequence hybridization to canine (MDCK) chromosome spreads shows primary pericentromeric localization of both satellite families. Overlap of the two colors at some centromeres appears as a white signal. Two chromosomes (the X chromosomes, indicated by arrows) do not contain detectable CarSat1 or Sat1CF. (d) The physical sequence distance, or relative frequency of paired-read connections, between the eleven largest satellite families is indicated, using the color code indicated in the figure. The size of each ball corresponds to the relative representation of each family in the genome. Lines represent at least 10 paired reads; bold lines represent >1000 paired reads.
Figure S2. CENP-A antibody to MDCK cells. Canine CENP-A was detected using a mouse anti-centromere protein A (CENP-A) monoclonal antibody designed for human CENP-A (a.a. 3–19; Stressgen, KAM-CC006) by immunoblotting (a), with canine CENP-A (XP_532899.2; ~16 kD) shown relative to human CENP-A (NP_001800; ~17 kD) and compared to loading controls. The CENP-A antibody is shown by immunofluorescence (FITC/green) to localize to dog (MDCK) centromeres and to colocalize with the centromeric satellite family CarSat1 (RHOD/red) (b).
Figure S3. Identifying enrichment patterns in satellite-transposable element junctions in CarSat1 satellite families. Relative enrichment scores of satellite-transposable element junction sequences are shown in an xy plot from two comparisons with genomic background. Enrichment patterns that fall below a log-transformed enrichment value of 2 are shown in the shaded box. Remaining single-copy (shown as stars) and multi-copy (boxes) transposable element junctions for SINE (red), LINE (blue), and LTR (black) are provided.
Figure S4. Read subtype assignments by k-means clustering of 200 bp sliding windows. All CarSat1 reads were reformatted relative to the identified consensus sequence (737 bp, as determined from consensus bases of all assembled CarSat1 monomers in canFam2.0). Reads were further divided into six 200 bp windows with a 100 bp overlap/slide. Sequence windows were assigned to clusters using k-means (see Methods), and reads were relabeled as ordered clusters and sorted accordingly. Reads containing at least four windows are shown above, demonstrating the clustering subgroups defined in Figure 3 of the paper.
Figure S5. MNase digestion for the chromatin IP protocol, demonstrating that mono- and di-nucleosomes are enriched in this study. Lane 1 contains size markers, with appropriate bands (bp) and predicted sites of nucleosome-sized DNA indicated. Lane 2 contains MNase-digested input DNA used in this study.
Format: DOCX Size: 7.2MB
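The windowing described in the Figure S4 caption (six 200 bp windows with a 100 bp slide across the 737 bp CarSat1 consensus) can be illustrated with a minimal sketch. The function names `sliding_windows` and `covered_windows` are hypothetical, coordinates are assumed to be 0-based and half-open, and the k-means step itself is not reproduced here because its details are given only in the paper's Methods.

```python
def sliding_windows(length, window=200, step=100):
    """Return (start, end) half-open coordinates of full-length windows.

    Assumption: only complete windows are kept, so trailing bases that do
    not fill a whole window are left uncovered.
    """
    return [(s, s + window) for s in range(0, length - window + 1, step)]


def covered_windows(read_start, read_end, windows):
    """Return indices of windows lying entirely inside a read's aligned span.

    Assumption: read_start/read_end are consensus coordinates (half-open).
    """
    return [i for i, (s, e) in enumerate(windows)
            if read_start <= s and e <= read_end]


CONSENSUS_LEN = 737  # CarSat1 consensus length stated in the caption
windows = sliding_windows(CONSENSUS_LEN)

# Hypothetical read aligned to consensus positions 0..500.
hits = covered_windows(0, 500, windows)
keep_read = len(hits) >= 4  # caption: reads with at least four windows are shown
```

Under these assumptions a 737 bp consensus yields exactly six windows, which agrees with the count given in the caption; the final 37 bp fall outside any complete window.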
Additional file 2:
Table S1. Global satellite descriptions and relative abundance and location in the canFam2.0 assembly.
Format: XLSX Size: 50KB
Additional file 3:
Table S2. Satellite genomic distribution assignments in the canFam2.0 assembly. Column header information is defined as follows: chr, canFam2.0 chromosome; chrS, chromosome start position; chrE, chromosome end position; bp_span, the length of the repeat unit (chrE-chrS); satellite name, the canine satellite name assigned by RepBase, GenBank, or this study; tile_color, color assignments for each family as illustrated in the Circos image (Additional file 1: Figure S1a,b); type, either ‘pericentromeric’ if located within a 2 Mb window of a chromosome centromere gap, or ‘na’ if found within the chromosome arms or an unmapped assembled contig (chrUn).
Format: PDF Size: 2.9MB
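As a rough illustration of how the bp_span and type columns of Table S2 relate to the coordinate columns, the sketch below derives both from chrS and chrE. The function names, the `centromere_gaps` lookup, and all coordinates are hypothetical and are not taken from the table; the 2 Mb window is the one stated in the column definitions, and treating it as extending on both sides of the gap is an assumption.

```python
PERICENTROMERE_WINDOW_BP = 2_000_000  # 2 Mb window from the Table S2 definitions


def bp_span(chr_s, chr_e):
    """Length of the repeat unit, as defined for the bp_span column."""
    return chr_e - chr_s


def satellite_type(chrom, chr_s, chr_e, centromere_gaps,
                   window=PERICENTROMERE_WINDOW_BP):
    """Return 'pericentromeric' or 'na' for one satellite interval.

    Assumptions: centromere_gaps maps chromosome name -> (gap_start, gap_end);
    chromosomes without an entry (e.g. chrUn) are reported as 'na'; an
    interval counts as pericentromeric if it overlaps the gap extended by
    `window` on either side.
    """
    if chrom not in centromere_gaps:
        return "na"
    gap_s, gap_e = centromere_gaps[chrom]
    overlaps = chr_s < gap_e + window and chr_e > gap_s - window
    return "pericentromeric" if overlaps else "na"


# Hypothetical gap coordinates, for illustration only.
gaps = {"chr1": (0, 3_000_000)}

near = satellite_type("chr1", 3_500_000, 3_510_000, gaps)
arm = satellite_type("chr1", 60_000_000, 60_010_000, gaps)
unmapped = satellite_type("chrUn", 1_000, 2_000, gaps)
span = bp_span(3_500_000, 3_510_000)
```

With these made-up coordinates, the first interval lies within 2 Mb of the gap and is labelled pericentromeric, while the chromosome-arm and chrUn intervals are labelled 'na'.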
Additional file 4:
Table S3. Paired read data between abundant (estimated ≥100 kb) satellite families.
Format: XLSX Size: 54KB
Additional file 5:
Table S4. Annotation of centromeric associated unmapped contigs.
Format: XLSX Size: 149KB
Additional file 6:
Table S5. Distribution of centromeric transposable elements. Repeat element representation for each centromeric satellite family, describing relative proportions of each repeat family and overall contribution to array. (PDF 37 kb)
Format: PDF Size: 38KB
Additional file 7:
Table S6. Centromeric satellite family repeat class enrichment estimates.
Format: XLSX Size: 51KB