Open Access Highly Accessed Research article

Small RNA pyrosequencing in the protozoan parasite Entamoeba histolytica reveals strain-specific small RNAs that target virulence genes

Hanbang Zhang1, Gretchen M Ehrenkaufer1, Neil Hall2 and Upinder Singh1,3,4*

Author Affiliations

1 Division of Infectious Diseases, Department of Internal Medicine, Stanford University School of Medicine, Stanford, California, 94305-5107, USA

2 School of Biological Sciences, Biosciences Building, University of Liverpool, Crown Street, Liverpool, L69 7ZB, UK

3 Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, California, 94305-5107, USA

4 Department of Medicine, Division of Infectious Diseases, S-143 Grant Building, 300 Pasteur Drive, Stanford, CA, 94305, USA

BMC Genomics 2013, 14:53  doi:10.1186/1471-2164-14-53

Published: 25 January 2013

Additional files

Additional file 1: Figure S1:

Flow-chart for small RNA sequence analysis. The pipeline for processing of the small RNA sequences is listed.

Figure S2. The number of loci to which each small RNA maps. The genome mapping file for the E. histolytica HM-1:IMSS small RNA dataset was used to generate the mapping counts for each small RNA in R using base functions. The number of small RNA reads (y-axis) is plotted against counts of their mapped loci (x-axis).

Figure S3. Nucleotide frequency at each position for the 17nt, 26nt and 28nt small RNA sequences. A 5′-G sequence predominance is evident for the aligned 26nt and 28nt reads but not for 17nt reads when the nucleotide frequency at each position is plotted.

Figure S4. Representative supercontig view of the mapped small RNAs. Small RNAs were binned into windows of 500 bp along the supercontig. The counts of small RNA reads (y-axis) were plotted against a normalized supercontig length of one (x-axis). Three major patterns were seen for the graphs of the binned distributions. (A) Abundant small RNAs from clusters with several hot areas; these are mostly for the 19 supercontigs with ≥5000 small RNAs. (B) Small RNAs largely confined to isolated peaks in supercontigs. (C) Very low numbers of small RNAs in a given supercontig.

Figure S5. Expression of protein coding genes with mapped small RNAs, using different cutoffs (no cutoff, ≥10, ≥25 and ≥50 small RNAs mapping to the gene). We plotted the microarray expression value for three sets of protein coding genes: those with only antisense small RNAs (AS only); those with both antisense and sense small RNAs (AS + S); those with only sense small RNAs (S only). Using both the ≥25 and ≥50 small RNA cutoffs, we observed significantly lower expression values among genes with AS or AS + S small RNAs. The number of genes in each category is listed.

Figure S6. The density of small RNAs on paired or clustered genes and associated intergenic regions. Box-and-whisker plots showing small RNA density (small RNA/bp) on paired or clustered genes vs. intergenic regions between genes. The top and bottom ends of each box represent the 75th and 25th percentile, respectively; the middle line represents the median value 0.54 (paired genes) vs. 0.12 (intergenic regions), p-value < 2.2e-16.

Figure S7. Unusual mapping patterns for protein coding genes with only sense small RNAs. Genome browser view for EHI_189510 and EHI_070670, showing sense small RNAs as either having an abrupt boundary (EHI_189510) or crossing into the adjacent intergenic region (EHI_070670). Black arrow represents the predicted gene; red and blue bars represent mapped small RNAs; both are sense to genes.

Figure S8. Biochemical analysis of small RNAs for genes with both antisense and sense small RNAs. (A) Antisense and sense small RNAs mapped to a region containing two annotated genes (EHI_130480 and EHI_130490, arrows) and one potential unannotated gene (red: small RNA mapped to upper strand, blue: small RNA mapped to lower strand). Probes for Northern blot analysis are represented by bars and numbers (black for detecting sense and red for detecting antisense to EHI_130480). (B) Northern blot analysis for small RNAs. Northern blot analysis detects antisense (probe 7) and sense (probe 6) small RNAs. The sense small RNA is resistant to Terminator cleavage assay, indicating that it does not have a 5′-monoP structure. (C) Strand-specific RT-PCR detects both sense and antisense transcript for both EHI_130480 and EHI_130490. cDNA was generated using F and R primers (to detect antisense and sense transcript, respectively) as well as oligo dT primer. RT-PCR reveals both antisense and sense transcripts, with antisense transcript at lower abundance than sense transcript.

Figure S9. Examples of antisense small RNAs found at both exon-exon junctions and introns of the same gene. Genome browser view for EHI_197360 and EHI_135940, showing that antisense small RNAs can map both to introns and exon-exon junctions of the same gene. Black arrows represent the predicted genes, with their exons represented by blue bars. Red bars represent mapped small RNAs with direction from left to right (antisense to both genes). Green arrows point to introns, and exon-exon junction small RNA reads are broken red bars connected with lines.

Figure S10. Small RNA size distribution for small RNAs mapping to structural RNAs and repetitive elements. The small RNA length distributions for small RNAs that map to tRNAs (grey), rRNAs (red) and repetitive elements (blue) are shown. A 27nt peak is evident for repetitive element reads but not for the structural RNAs. The “tailed” 17nt peaks seen for all three plots are most likely non-specific degradation from highly expressed transcripts.

Figure S11. Nucleotide frequency at each position for the 17nt tRNAs and rRNAs, and the 27nt tRNAs. Nucleotide frequency at each position was plotted: no 5′-G sequence predominance was observed for 17nt rRNAs and 17nt tRNAs; a slight 5′-G enrichment was observed for 27nt tRNAs.

Figure S12. Nucleotide frequency at each position of LINEs/SINEs mapped 17nt and 27nt sequences. Nucleotide frequency at each position was plotted for the 17nt and 27nt LINE/SINE sequences. There is a clear 5′-G sequence predominance observed for the 27nt sequences, but not for the 17nt sequences.

Figure S13. Small RNAs mapped to EhLINE1 and Northern blot analysis. The EhRLE5 sequence, which belongs to the EhLINE1 family, is used as an example to show small RNAs that map to repetitive elements. Upper panel: red, small RNA mapped to upper strand; blue, small RNA mapped to lower strand; long arrow, the complete EhRLE unit with arrow showing the RT transcription direction. The black bar is the position of selected EhRLE5 probe. Lower panel: Northern blot analysis revealed distinct bands at ~30nt size using probes selected for EhRLE5 and one locus of EhLINE1 (DS571716:427–2922).

Figure S14. Small RNA mapping to genome duplication segment D1. Genome browser view for genome duplication segment D1 (red, small RNAs mapped to upper strand; blue, small RNAs mapped to lower strand). Annotated genes are shown below as dark blue blocks. All annotated genes are mapped with dense small RNAs in this scaffold, indicating that the whole segment D1 might be a target of the RNAi pathway.
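As an illustration of two summaries described in the legends above (the per-position nucleotide frequency of Figures S3, S11 and S12, and the 500 bp binning with normalized supercontig length of Figure S4), the following is a minimal Python sketch. It is not the authors' pipeline, which the legends describe only as base R functions. The function names, the 0-based read start coordinates, the DNA alphabet and the example data are all assumptions made for illustration.

```python
from collections import Counter


def position_nucleotide_frequency(reads, alphabet="ACGT"):
    """Return one {nucleotide: fraction} dict per position for equal-length reads.

    Assumes reads are given in a DNA alphabet and were already selected
    by length (for example, all 27nt reads).
    """
    if not reads:
        return []
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("all reads must have the same length")
    frequencies = []
    for pos in range(length):
        # Count the nucleotides observed at this position across all reads.
        column = Counter(read[pos].upper() for read in reads)
        frequencies.append({nt: column[nt] / len(reads) for nt in alphabet})
    return frequencies


def bin_read_counts(read_starts, contig_length, window=500):
    """Count reads per fixed-size window along one supercontig.

    Returns (normalized_window_start, count) pairs, where the window start
    is divided by the contig length so the x-axis runs from 0 towards 1.
    read_starts are assumed to be 0-based start coordinates.
    """
    if contig_length <= 0 or window <= 0:
        raise ValueError("contig_length and window must be positive")
    n_bins = (contig_length + window - 1) // window  # ceiling division
    counts = Counter(
        start // window for start in read_starts if 0 <= start < contig_length
    )
    return [(i * window / contig_length, counts[i]) for i in range(n_bins)]


# Made-up example data, not taken from the dataset.
example_reads = ["GAT", "GCT", "GAA", "TAT"]
example_starts = [0, 10, 499, 500, 1999]

first_position = position_nucleotide_frequency(example_reads)[0]
binned = bin_read_counts(example_starts, contig_length=2000)
```

A 5′-G predominance would show up as a high value of `first_position["G"]`, and plotting the pairs in `binned` gives a profile comparable in shape to the supercontig views.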

Format: PPTX Size: 6.6MB

Open Data

Additional file 2: Table S1:

List of top 19 supercontigs that are highly enriched for small RNAs. The genome mapping file for the E. histolytica HM-1:IMSS small RNA dataset was used to generate the mapping counts for each supercontig in R using base functions. The supercontig number, size and number of mapped small RNAs are listed.

Table S2. Analysis of paired or clustered protein coding genes that have small RNAs mapping to them. Listed features include gene name, sRNA orientation, contig number, whether genes are paired or clustered, genomic duplication segments, proximity to repeat regions, orientation (divergent/convergent/tandem), gene length, intergenic distance between paired/clustered genes, number of sRNAs mapped to each gene, small RNA density on the gene, number of sRNAs mapped to the intergenic region, and the sRNA density on the intergenic region.

Table S3. Small RNA reads mapping to exon-exon junctions and introns from the HM-1:IMSS EhAGO2-2 IP small RNA library. Listed are numbers of unique small RNA reads in each category.

Table S4. Small RNA density on EHI_135940 and EHI_197360. Listed are the number of small RNAs mapped to exons and introns and the calculated small RNA density on these regions for each gene.

Table S5. A global assessment of small RNA regulated genes in E. histolytica. Genes to which small RNAs mapped in either the antisense or sense and antisense orientation were analyzed for their expression data using previously published microarray data. The number of genes on the Affymetrix microarray and those with normalized array data <0.2 (not expressed) under wild type conditions for E. histolytica HM-1:IMSS are listed. The number of genes not expressed under all conditions tested, those expressed in other E. histolytica strains (200:NIH and Rahman), those expressed under specific culture conditions (under different drug treatment and serum starvation) and those expressed in developmental stages are listed. Microarray data are adapted from [41-45].

Table S6. Oligonucleotide probes used for Northern blot analysis. The probe name, targeting gene/LINEs, orientation/position of the probe, and sequence of the probe are shown. S: sense; AS: antisense.

Table S7. Primers used for strand specific RT-PCR analysis. Primer sequences, the tested genes, and orientation (F/R) are listed.
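The small RNA density reported in Tables S2 and S4 is described elsewhere on this page as small RNAs per bp. A minimal Python sketch of that ratio follows. The function name and the example numbers are hypothetical and are not values from the tables.

```python
def small_rna_density(n_small_rnas, region_length_bp):
    """Return the small RNA density of a region in small RNAs per bp."""
    if region_length_bp <= 0:
        raise ValueError("region_length_bp must be positive")
    if n_small_rnas < 0:
        raise ValueError("n_small_rnas must not be negative")
    return n_small_rnas / region_length_bp


# Made-up example: a gene and its neighbouring intergenic region.
gene_density = small_rna_density(n_small_rnas=600, region_length_bp=1200)
intergenic_density = small_rna_density(n_small_rnas=30, region_length_bp=300)

# A ratio above 1 means the gene carries more small RNAs per bp
# than the intergenic region next to it.
gene_to_intergenic_ratio = gene_density / intergenic_density
```

Dividing by length makes genes and intergenic regions of different sizes comparable, which is why the tables list densities alongside the raw counts.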

Format: XLSX Size: 35KB

Open Data

Additional file 3: Table S8:

Small RNAs mapping to protein coding genes containing introns or exon-exon junctions. We downloaded from AmoebaDB both genomic and mRNA gene sequences for all E. histolytica genes with at least one predicted intron. Small RNAs mapping to introns are reads that map to the genomic sequence but not the mRNA sequence. Small RNAs mapping to exon-exon junctions are reads that map to the mRNA sequence but not the genomic sequence. Protein coding genes are shown in three categories: genes with only antisense small RNAs; genes with both antisense and sense small RNAs; genes with only sense small RNAs. The number of small RNAs mapping to a gene, intron, or exon-exon junction is indicated. Whether the intron has an in-frame stop codon or frame disruption is listed (Yes/No). The HM-1:IMSS EhAGO2-2 IP small RNA library dataset was used for the analysis. Only genes with ≥50 small RNAs are listed. Highly identical genes are indicated with the same letter in the Paralog group column.
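The intron and exon-exon junction definitions above amount to a set difference between reads that map to the genomic sequence and reads that map to the mRNA sequence. The Python sketch below shows that logic under a strong simplifying assumption: exact substring matching on one strand stands in for the read mapper, which this page does not specify. The function name and the toy sequences are hypothetical.

```python
def classify_read(read, genomic_seq, mrna_seq):
    """Classify a read by where it matches, following the definitions above.

    Exact substring matching on the given strand is an assumption used
    here in place of a real alignment step.
    """
    in_genomic = read in genomic_seq
    in_mrna = read in mrna_seq
    if in_genomic and in_mrna:
        return "exon"                 # present in both sequences
    if in_genomic:
        return "intron"               # genomic sequence only
    if in_mrna:
        return "exon-exon junction"   # mRNA sequence only
    return "unmapped"


# Toy gene with two exons and one intron (made-up sequences).
exon1, intron, exon2 = "AAACCC", "GTTTTAG", "GGGTTT"
genomic = exon1 + intron + exon2
mrna = exon1 + exon2

labels = {
    read: classify_read(read, genomic, mrna)
    for read in ["AAACC", "GTTTTAG", "ACCCGG", "TTTTTTT"]
}
```

Under these definitions a read labelled "intron" may also overlap an exon-intron boundary, since any read absent from the spliced mRNA falls into that category.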

Format: XLSX Size: 16KB

Open Data