Open Access Highly Accessed Methodology article

Consistent annotation of gene expression arrays

Benoît Ballester, Nathan Johnson, Glenn Proctor and Paul Flicek*

Author Affiliations

European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK

BMC Genomics 2010, 11:294  doi:10.1186/1471-2164-11-294

Published: 11 May 2010

Additional files

Additional file 1:

Distribution of probes mapped to genomes. Diamonds represent the distribution of the number of probes per number of hits, that is, the number of mapped probes per number of mappings once the mapping rules are applied, for all human (red), mouse (blue), and rat (black) arrays. The values on the y-axis are the log-scaled counts of probes for a given number of hits on the genome. The crosses represent the percentage of probes targeting repeat regions (y-axis on the left).

Format: PDF, 88 KB

Additional file 2:

Probes mapping to multiple exon boundaries. Example of a probe aligned across three exons, where the sequence in red corresponds to the probe and the blue and black sequences correspond to exons.

Format: PDF, 20 KB

Additional file 3:

Genomic distribution of unannotated probesets in EST genes. In contrast to our protein-based gene prediction methods, Ensembl predicts EST genes and transcripts using evidence from ESTs only [27]. As the figure shows, some of these unannotated probesets could also be linked to EST-predicted genes. However, EST gene predictions often overlap with protein-coding predictions, so a probeset being linked to an EST gene does not necessarily mean that a probeset annotation is missing. For example, the unannotated probeset 1439918_at on the Mouse430_2 array has probes mapping to six EST transcripts from the EST gene ENSMUSESTG00000015935, but it also has probes mapping to an almost identical protein-coding gene, ENSMUSG00000026790. On the ternary graph, the mappings of unannotated probesets are almost identical between the protein-coding genes and the EST genes, with a clear concentration of probesets mapped to non-coding regions. Probesets with probes located in the coding regions, on the other hand, have fewer than half of their probes matching the underlying transcript (blue dots) and thus do not satisfy our annotation rules.

Format: PDF, 380 KB
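
The annotation rule mentioned in the description of Additional file 3 (a probeset with fewer than half of its probes matching the underlying transcript is not annotated) can be illustrated with a minimal Python sketch. The function name, its parameters, and the decision to treat exactly half as passing are assumptions made for illustration; this is not the Ensembl pipeline code.

```python
def probeset_is_annotated(probes_matching: int,
                          probes_total: int,
                          min_fraction: float = 0.5) -> bool:
    """Return True if enough probes of a probeset match a transcript.

    Assumption: a probeset passes when the matching fraction is at least
    `min_fraction`. The source only states that probesets with fewer than
    half of their probes matching are rejected; how exactly half is
    handled is a guess made for this sketch.
    """
    if probes_total <= 0:
        raise ValueError("probes_total must be positive")
    if not 0 <= probes_matching <= probes_total:
        raise ValueError("probes_matching must be between 0 and probes_total")
    return probes_matching / probes_total >= min_fraction


# Hypothetical probeset of 11 probes: 6 matching passes, 5 matching fails.
print(probeset_is_annotated(6, 11))
print(probeset_is_annotated(5, 11))
```

Keeping the threshold as a parameter makes it easy to see how a stricter or looser rule would change which probesets receive an annotation.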

Additional file 4:

Comparison of Ensembl and Affymetrix annotations. Affymetrix annotations are compared with Ensembl annotations using external database identifiers cross-referenced to the Ensembl gene predictions. For each external database, the percentage of annotations common to Ensembl and Affymetrix is shown.

Format: XLS, 20 KB
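
As a rough illustration of the kind of comparison summarised in Additional file 4, the percentage of shared annotations for one external database could be computed as a set overlap. This is only a sketch: the function name is hypothetical, the identifiers are placeholders, and the choice of the Affymetrix set as the denominator is an assumption, since the source does not state how the percentage is normalised.

```python
def percent_common(ensembl_ids, affymetrix_ids) -> float:
    """Percentage of Affymetrix identifiers that also appear in Ensembl.

    Assumption: the denominator is the number of distinct Affymetrix
    identifiers; the original comparison may normalise differently.
    """
    ensembl_set = set(ensembl_ids)
    affymetrix_set = set(affymetrix_ids)
    if not affymetrix_set:
        return 0.0
    shared = ensembl_set & affymetrix_set
    return 100.0 * len(shared) / len(affymetrix_set)


# Placeholder identifiers, not real database accessions.
ensembl = {"ID_A", "ID_B", "ID_C"}
affymetrix = {"ID_B", "ID_C", "ID_D", "ID_E"}
print(percent_common(ensembl, affymetrix))
```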

Additional file 5:

Example Perl script using the Ensembl API. This Perl script extracts probe mappings and probeset annotations for the mouse Affymetrix array MOE430A on chromosome 2 between coordinates 26771920 and 26789448.

Format: PL, 4 KB