Open Access Highly Accessed Research article

Transcriptome analysis of the filamentous fungus Aspergillus nidulans directed to the global identification of promoters

Christopher Sibthorp1, Huihai Wu2, Gwendolyn Cowley1, Prudence W H Wong2, Paulius Palaima1, Igor Y Morozov1,3, Gareth D Weedall1* and Mark X Caddick1*

Author Affiliations

1 Institute of Integrative Biology, University of Liverpool, Biosciences Building, Crown Street, Liverpool L69 7ZB, UK

2 Department of Computer Science, University of Liverpool, Ashton Building, Ashton Street, Liverpool L69 3BX, UK

3 Department of Biomolecular and Sports Sciences, Faculty of Health and Life Sciences, Coventry University, James Starley Building, Coventry CV1 5FB, UK


BMC Genomics 2013, 14:847  doi:10.1186/1471-2164-14-847

Published: 3 December 2013

Additional files

Additional file 1:

Additional study information. Additional text, tables and figures describing whole transcriptome and 5’-end alignments and analyses that were not included in the main text.

Format: PDF; Size: 479KB

Open Data

Additional file 2:

Whole transcriptome read counts per locus. Raw and normalised (RPKM) read counts for all annotated loci (n = 10827). 6 loci (CADANIAG00010797, CADANIAG00010692, CADANIAG00010663, CADANIAG00010810, CADANIAG00010773, CADANIAG00010682) encoding spliceosomal RNAs and small nucleolar RNAs annotated as on the + strand were manually altered to the - strand based on depth of coverage across the loci.
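
For readers unfamiliar with the normalisation, RPKM (reads per kilobase of locus per million mapped reads) can be sketched as below. This is a minimal illustration of the standard definition only. The function and parameter names are hypothetical, and which read total was used as the denominator in this study is an assumption not stated here.

```python
def rpkm(read_count, locus_length_bp, total_mapped_reads):
    """Standard RPKM: reads per kilobase of locus per million mapped reads.

    All names are illustrative; the study's exact denominator (e.g. which
    reads count as 'mapped') is an assumption.
    """
    if locus_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("locus length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (locus_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Hypothetical locus: 500 reads, 2000 bp long, 10 million mapped reads
    print(rpkm(500, 2000, 10_000_000))
```

Dividing by both locus length and library size makes values comparable between loci of different lengths and between libraries of different depths.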

Format: XLSX; Size: 3MB

Additional file 3:

Transcripts predicted by whole transcriptome RNA-seq alignments. Text file (in ‘.gtf’ format) containing the locations of putative transcripts predicted from alignment of whole transcriptome RNA-seq libraries of cells grown under different conditions. Instructions to view the data using the Integrative Genomics Viewer (IGV) software are given in Additional file 1.

Format: ZIP; Size: 10MB

Additional file 4:

Analysis of novel ncRNA and protein coding genes. Results of searches of novel transcripts against the Pfam and Rfam databases. Each ID in the column ‘transcript_ID’ represents a transcribed region predicted by the Cufflinks software, based on read coverage; its sequence and length are in the columns ‘transcript_sequence’ and ‘transcript_length(nt)’. ‘ORF_ID’ indicates the ID of an open reading frame present in the transcript, ‘translated_ORF_sequence’ is its amino acid sequence, and ‘proportion_of_transcript_ORF’ indicates the proportion of the putative transcript that is ORF (a high value suggests a real protein coding gene). The following 15 columns describe the results of a Pfam search, using the translated ORF as the query (see http://pfam.sanger.ac.uk/ for description). The next 15 columns describe the results of an Rfam search, using the putative transcript as the query (see http://rfam.sanger.ac.uk/ for description). The column ‘Since annotated as..’ records cases where the transcript was annotated in a later version of the annotation than the one analysed (red indicates no subsequent annotation).

Format: XLSX; Size: 429KB

Additional file 5:

Introns. Text file (in ‘.bed’ format) containing the locations of introns identified by alignment of whole transcriptome RNA-seq libraries of cells grown under different conditions. Instructions to view the data using the Integrative Genomics Viewer (IGV) software are given in Additional file 1.

Format: BED; Size: 6.9MB

Additional file 6:

Antisense transcription of annotated loci. Distribution of antisense reads across annotated loci. Raw read counts are shown for all annotated loci longer than 90 bp (n = 10697). 5 loci (CADANIAG00010797, CADANIAG00010663, CADANIAG00010810, CADANIAG00010773, CADANIAG00010682) encoding spliceosomal RNAs and small nucleolar RNAs annotated as on the + strand were manually altered to the - strand. Loci were scored 5’-biased if >40% of reads mapped to the 5’ third and <10% to the 3’ third, 3’-biased if >40% of reads mapped to the 3’ third and <10% to the 5’ third, and middle-biased if >75% of reads mapped to the central third. Note that raw read counts were assigned to the left, middle and right thirds irrespective of the orientation of the locus. Therefore, for a locus on the positive strand the left third is the 5’ end, while for a locus on the negative strand the left third is the 3’ end.
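
The scoring rule for this file can be sketched in a few lines. The example below is only an illustration of the stated thresholds and of the strand-dependent mapping of left/right thirds to 5’/3’ thirds. The function name, the argument names and the labels "unbiased" and "no_reads" are assumptions introduced here, not column values from the spreadsheet.

```python
def classify_antisense_bias(left, middle, right, strand):
    """Score a locus from raw antisense read counts in its three thirds.

    left/middle/right are counts by genomic position, so they are mapped
    to 5'/3' thirds according to the strand of the locus.
    """
    total = left + middle + right
    if total == 0:
        return "no_reads"  # hypothetical label for loci without antisense reads
    if strand == "+":
        five_prime, three_prime = left, right
    elif strand == "-":
        five_prime, three_prime = right, left
    else:
        raise ValueError("strand must be '+' or '-'")

    frac_5 = five_prime / total
    frac_3 = three_prime / total
    frac_mid = middle / total

    if frac_5 > 0.40 and frac_3 < 0.10:
        return "5'-biased"
    if frac_3 > 0.40 and frac_5 < 0.10:
        return "3'-biased"
    if frac_mid > 0.75:
        return "middle-biased"
    return "unbiased"  # hypothetical label for loci meeting none of the rules


if __name__ == "__main__":
    # Same counts, opposite strands: the bias call flips with orientation
    print(classify_antisense_bias(50, 45, 5, "+"))
    print(classify_antisense_bias(50, 45, 5, "-"))
```

The three rules are mutually exclusive: a central fraction above 75% leaves less than 25% for the two outer thirds combined, so neither can exceed 40%.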

Format: XLSX; Size: 2.2MB

Additional file 7:

Analysis of uncapped transcripts. Peaks of read head depth >100 in the 'no treatment' 5’-end library. The peaks should represent the 5′ ends of uncapped transcripts. Many of the coverage peaks identify snoRNAs.

Format: XLSX; Size: 22KB

Additional file 8:

Putative promoter regions. Text file (in ‘.gtf’ format) containing the locations of putative promoter regions predicted from alignment of 5’-enriched RNA-seq libraries of cells grown under a single growth condition (nitrate as nitrogen source). Instructions to view the data using the Integrative Genomics Viewer (IGV) software are given in Additional file 1.

Format: GTF; Size: 1.7MB

Additional file 9:

Putative promoter regions of annotated genes. Genomic locations of genes and their associated transcription start sites (TSS) and putative promoters. TSS are classified as 'tight', 'intermediate' and 'diffuse' by their confidence interval length (CIL) value, the length of the confidence interval around the major peak of read coverage representing the transcription start site (CIL < 2 nt = 'tight'; 2 nt < CIL < 4 nt = 'intermediate'; CIL > 4 nt = 'diffuse').
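
As an illustration, the three TSS classes can be assigned from a CIL value as sketched below. The stated thresholds leave values of exactly 2 nt and 4 nt unassigned; this sketch treats both as 'intermediate', which is an assumption made here and not a rule given in the file description. The function name is hypothetical.

```python
def classify_tss(cil_nt):
    """Classify a transcription start site by confidence interval length (nt).

    Assumption: boundary values of exactly 2 nt and 4 nt are treated as
    'intermediate', because the stated thresholds do not cover them.
    """
    if cil_nt < 0:
        raise ValueError("CIL cannot be negative")
    if cil_nt < 2:
        return "tight"
    if cil_nt > 4:
        return "diffuse"
    return "intermediate"


if __name__ == "__main__":
    for cil in (0.5, 3.0, 12.0):
        print(cil, classify_tss(cil))
```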

Format: XLSX; Size: 730KB

Additional file 10:

Gene ontology analysis of genes with different promoter types. Gene ontologies (GO) for processes significantly enriched in gene sets associated with ‘tight’, ‘intermediate’ and/or ‘diffuse’ transcription start sites, according to our classification. For each GO category, the ‘cluster frequency’ (of the particular gene set) and ‘background frequency’ (of all annotated genes in the genome) associated with that GO category are reported. The ‘P-value’, Bonferroni-corrected for multiple testing, and ‘FDR’ (false discovery rate) are also reported, followed by colon-separated lists of the genes associated with the GO category and the GO IDs associated with the genes (which may be for the GO category itself, or a more specific sub-category of the broader GO category).
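
For orientation, a Bonferroni correction of the kind mentioned above multiplies each raw P-value by the number of tests and caps the result at 1. The sketch below shows only that general procedure. The function name is hypothetical, and the assumption that the number of tests equals the number of P-values supplied may not match how the GO tool used in the study counted tests.

```python
def bonferroni(p_values):
    """Bonferroni-adjust a list of raw P-values.

    Assumption: the number of tests is the number of P-values supplied.
    """
    n_tests = len(p_values)
    # Multiply each P-value by the number of tests, capping at 1.0
    return [min(1.0, p * n_tests) for p in p_values]


if __name__ == "__main__":
    raw = [0.001, 0.02, 0.5]  # hypothetical raw P-values for three GO categories
    print(bonferroni(raw))
```

A GO category would then be reported as significantly enriched only if its adjusted value falls below the chosen significance threshold.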

Format: XLSX; Size: 442KB

Additional file 11:

Motifs that are significantly enriched (calculated by the YMF software) in promoters upstream of 'tight', 'intermediate' and 'diffuse' TSS. The table shows the motif, the number of occurrences of the motif in the set of promoter regions and the Z-score, measuring how much more common the motif is than a random sequence drawn from DNA with similar base frequencies. These are shown for each set of promoters associated with a type of TSS: ‘tight’, ‘intermediate’ and ‘diffuse’. To distinguish real ‘diffuse’ TSS from stochastic noise (to enrich for real promoters), only those within 500 bp upstream of annotated genes were used.

Format: XLSX; Size: 55KB