Research article (Open Access)

The word landscape of the non-coding segments of the Arabidopsis thaliana genome

Jens Lichtenberg1*, Alper Yilmaz2, Joshua D Welch1, Kyle Kurz1, Xiaoyu Liang1, Frank Drews1, Klaus Ecker1, Stephen S Lee3, Matt Geisler4, Erich Grotewold2 and Lonnie R Welch1,5,6

Author Affiliations

1 Bioinformatics Laboratory, School of Electrical Engineering and Computer Science, Ohio University, Athens, Ohio, USA

2 Department of Plant Cellular and Molecular Biology, Plant Biotechnology Center, The Ohio State University, Columbus, Ohio, USA

3 Department of Statistics, University of Idaho, Moscow, Idaho, USA

4 Department of Plant Biology, Southern Illinois University, Carbondale, Illinois, USA

5 Biomedical Engineering Program, Ohio University, Athens, Ohio, USA

6 Molecular and Cellular Biology Program, Ohio University, Athens, Ohio, USA

BMC Genomics 2009, 10:463  doi:10.1186/1471-2164-10-463

Published: 8 October 2009

Additional files

Additional file 1:

Words discovered in 3'UTRs. Entire set of words discovered in the 3'UTRs with occurrences, expected occurrences, scores, reverse complement information and p-value.

Format: CSV; Size: 5.5MB

Additional file 2:

Words discovered in 5'UTRs. Entire set of words discovered in the 5'UTRs with occurrences, expected occurrences, scores, reverse complement information and p-value.

Format: CSV; Size: 5.4MB

Additional file 3:

Words discovered in introns. Entire set of words discovered in the introns with occurrences, expected occurrences, scores, reverse complement information and p-value.

Format: CSV; Size: 5.6MB

Additional file 4:

Words discovered in core promoters. Entire set of words discovered in the core promoters [-100;+1] with occurrences, expected occurrences, scores, reverse complement information and p-value.

Format: CSV; Size: 5.4MB

Additional file 5:

Words discovered in proximal promoters. Entire set of words discovered in the proximal promoters [-1,000;-101] with occurrences, expected occurrences, scores, reverse complement information and p-value.

Format: CSV; Size: 5.7MB

Additional file 6:

Words discovered in distal promoters. Entire set of words discovered in the distal promoters [-3,000;-1,001] with occurrences, expected occurrences, scores, reverse complement information and p-value.

Format: CSV; Size: 5.8MB

Additional file 7:

Words discovered in entire genome. Entire set of words discovered in the complete genome with occurrences, expected occurrences, scores, reverse complement information and p-value.

Format: CSV; Size: 4.1MB

Additional file 8:

Words missed in 3'UTRs. Entire set of words expected to occur but not discovered in the 3'UTRs with expected occurrences.

Format: CSV; Size: 10KB

Additional file 9:

Words missed in 5'UTRs. Entire set of words expected to occur but not discovered in the 5'UTRs with expected occurrences.

Format: CSV; Size: 13KB

Additional file 10:

Words missed in introns. Entire set of words expected to occur but not discovered in the introns with expected occurrences.

Format: CSV; Size: 2KB

Additional file 11:

Words missed in core promoters. Entire set of words expected to occur but not discovered in the core promoters with expected occurrences.

Format: CSV; Size: 10KB

Additional file 12:

Word-based clusters. Word-based clusters built around two overrepresented words from each non-coding segment of Arabidopsis thaliana, each represented by the word cluster and its associated sequence logo. Each word in a cluster is given by its nucleotide sequence, sequence count, overall count and SlnSES score.

Format: DOC; Size: 376KB

Additional file 13:

Word co-occurrences in 3'UTRs. Entire set of co-occurring words (taken from the top 25 words) discovered in the 3'UTRs with occurrence, expected occurrences and scores.

Format: CSV; Size: 38KB

Additional file 14:

Word co-occurrences in 5'UTRs. Entire set of co-occurring words (taken from the top 25 words) discovered in the 5'UTRs with occurrence, expected occurrences and scores.

Format: CSV; Size: 42KB

Additional file 15:

Word co-occurrences in introns. Entire set of co-occurring words (taken from the top 25 words) discovered in the introns with occurrence, expected occurrences and scores.

Format: CSV; Size: 31KB

Additional file 16:

Word co-occurrences in core promoters. Entire set of co-occurring words (taken from the top 25 words) discovered in the core promoters with occurrence, expected occurrences and scores.

Format: CSV; Size: 43KB

Additional file 17:

Word co-occurrences in proximal promoters. Entire set of co-occurring words (taken from the top 25 words) discovered in the proximal promoters with occurrence, expected occurrences and scores.

Format: CSV; Size: 64KB

Additional file 18:

Word co-occurrences in distal promoters. Entire set of co-occurring words (taken from the top 25 words) discovered in the distal promoters with occurrence, expected occurrences and scores.

Format: CSV; Size: 67KB

Additional file 19:

NASC Microarrays. Entire set of microarray experiments available in NASC that were used for the cellular functional analysis.

Format: XLS; Size: 561KB