Schematic representation of the GCOD databases. Publicly available gene expression data are downloaded from ArrayExpress or GEO. The CEL and sample annotation files are reprocessed and saved as flat files; the MAS5-normalized, RMA-normalized, and scaled-RMA expression data, together with the curated sample annotation data, are loaded into an ETL database with a schema in third normal form. There the data are further curated and then transferred to a QA/QC database with a warehouse schema. In the QA/QC database, the data are viewed on our internal web site to assess completeness. The data are then transferred to our GCOD database schema, which is accessed by the GCOD web application. GenBank and probeset identifiers are translated by querying the TGI Resourcerer databases.
Liu et al. BMC Bioinformatics 2011 12:46 doi:10.1186/1471-2105-12-46