Figure 5.

Principal component analysis of the dinucleotide frequencies of the S. espanaensis CDS. (A) Using EDGAR, all CDS from S. espanaensis were divided into three groups: "core" (conserved in all six completely sequenced Pseudonocardiaceae; blue "*"), "other" (shared between S. espanaensis and at least one other Pseudonocardiaceae species; green "x") and singletons ("unique" in S. espanaensis; red "+"). For all genes, the relative dinucleotide frequencies were calculated, a PCA was performed using the R package, and the results for the first two principal components are plotted. In addition, the median values for all three distributions were calculated and plotted. (B) Using the same calculation as in (A), the genes were divided according to their position in the genome relative to the origin of replication. Genes close to the oriC (corresponding to the "top half" of the genome) are shown as red "x"; genes closer to the terminus ("bottom half" of the genome) are shown as green "+". Median points are denoted as black "*" and "+"; green and black circles mark the 90% boundaries.
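
The calculation summarized in the caption can be illustrated with a minimal Python sketch. It is not the authors' code: the original analysis was done in R, and the sketch makes several assumptions. It takes "relative dinucleotide frequency" to mean each dinucleotide count divided by the total number of overlapping dinucleotides in the gene, it performs the PCA through a singular value decomposition of the centered frequency matrix, and it takes the group median separately on each component. The sequences, gene names, and the helper names `dinucleotide_frequencies` and `pca_scores` are hypothetical.

```python
from itertools import product

import numpy as np

# The 16 possible dinucleotides in a fixed order (AA, AC, ..., TT).
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_frequencies(seq):
    """Relative frequency of each dinucleotide, counted over overlapping windows.

    Assumption: "relative" means count / total number of valid dinucleotides.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip windows with ambiguous bases such as N
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return np.zeros(len(DINUCLEOTIDES))
    return np.array([counts[d] / total for d in DINUCLEOTIDES])


def pca_scores(X, n_components=2):
    """Project the rows of X onto the leading principal components (via SVD)."""
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Hypothetical toy data: gene name -> (sequence, group label).
genes = {
    "geneA": ("ATGGCCGCGGCCGACGTCGGCTGA", "core"),
    "geneB": ("ATGGCGGCCGTCGACCGCGCCTGA", "core"),
    "geneC": ("ATGACCGGCAAGCTGGTCGAGTGA", "other"),
    "geneD": ("ATGACGGGCAAGCTCGTGGAATGA", "other"),
    "geneE": ("ATGAAATTTAATATTAAAGATTGA", "unique"),
    "geneF": ("ATGAATTTAAAAGATATTAATTGA", "unique"),
}

X = np.array([dinucleotide_frequencies(seq) for seq, _ in genes.values()])
groups = np.array([group for _, group in genes.values()])

scores = pca_scores(X)  # one (PC1, PC2) pair per gene
medians = {g: np.median(scores[groups == g], axis=0) for g in np.unique(groups)}

for group, median in medians.items():
    print(group, median)
```

Plotting `scores` colored by group, with the entries of `medians` overlaid, would give a scatter plot of the same kind as panel A. Splitting the genes by genomic position instead of by conservation group would correspond to panel B.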

Strobel et al. BMC Genomics 2012 13:465   doi:10.1186/1471-2164-13-465
