Schematic of the database architecture. The diagram describes the database schema behind GDP. For convenience, it is organized into four color-coded areas: (A) datasets, samples and raw data (green), (B) statistics and annotations (orange), (C) convenience or lookup tables (red) and (D) data imported from MGI (green). Tables are populated in four steps. First, raw normalized intensity values (table name: raw data), samples and probes are added. Second, the analyzed data (statistics, comparisons) and groupings (describing the datasets, such as molecular ONH and tissue ONH) are loaded. Third, a previously generated design file is used to establish the associations between samples and groups (sampletogroups). Finally, MGI annotations (all symbols, synonyms, markers, human_orthologs and probeset_to_mgid) are loaded and the convenience lookup tables (probecounts, representativeprobes and ave_qvalue) are established. Within each table, required columns are indicated. VARCHAR indicates that a string is required, and the number in brackets gives the maximum number of characters allowed in that string. Lines indicate connections between tables.
Howell et al. BMC Genomics 2011 12:429 doi:10.1186/1471-2164-12-429
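The load order described in the caption can be illustrated with a minimal sketch, assuming an in-memory SQLite database rather than the database actually used for GDP. Only the table name sampletogroups and the order of the steps come from the caption. The table names rawdata, samples, probes and groupings, every column name, the VARCHAR lengths and the inserted values are hypothetical placeholders. The sketch shows one reason the order matters: when the connections between tables are enforced as foreign keys, a sample cannot be linked to a group before both exist.

```python
import sqlite3

# Hypothetical, heavily simplified schema. Column names and VARCHAR lengths
# are assumptions for illustration; they are not taken from the GDP schema.
# Note: SQLite accepts VARCHAR(n) but does not enforce the length limit.
SCHEMA = """
CREATE TABLE samples (
    sample_id INTEGER PRIMARY KEY,
    name      VARCHAR(50) NOT NULL
);
CREATE TABLE probes (
    probe_id INTEGER PRIMARY KEY,
    probeset VARCHAR(50) NOT NULL
);
CREATE TABLE rawdata (
    sample_id INTEGER NOT NULL REFERENCES samples(sample_id),
    probe_id  INTEGER NOT NULL REFERENCES probes(probe_id),
    intensity REAL NOT NULL
);
CREATE TABLE groupings (
    group_id INTEGER PRIMARY KEY,
    name     VARCHAR(50) NOT NULL
);
CREATE TABLE sampletogroups (
    sample_id INTEGER NOT NULL REFERENCES samples(sample_id),
    group_id  INTEGER NOT NULL REFERENCES groupings(group_id)
);
"""


def build_demo():
    """Create an empty in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def load_in_order(conn):
    """Populate the demo tables in the order given in the caption."""
    # Step 1: samples, probes and raw normalized intensity values.
    conn.execute("INSERT INTO samples VALUES (1, 'sample_1')")
    conn.execute("INSERT INTO probes VALUES (1, 'probe_1')")
    conn.execute("INSERT INTO rawdata VALUES (1, 1, 7.5)")
    # Step 2: groupings (analyzed statistics are omitted from this sketch).
    conn.execute("INSERT INTO groupings VALUES (1, 'group_1')")
    # Step 3: associate samples with groups, as a design file would.
    conn.execute("INSERT INTO sampletogroups VALUES (1, 1)")
    # Step 4 (MGI annotations and lookup tables) is omitted here.
    conn.commit()


def link_before_load_fails(conn):
    """Return True if linking a missing sample to a missing group is rejected."""
    try:
        conn.execute("INSERT INTO sampletogroups VALUES (99, 99)")
    except sqlite3.IntegrityError:
        return True
    return False
```

Calling `load_in_order(build_demo())` succeeds, whereas `link_before_load_fails(build_demo())` reports that the association is rejected when step 3 is attempted on empty sample and grouping tables.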