Large synteny blocks revealed between Caenorhabditis elegans and Caenorhabditis briggsae genomes using OrthoCluster
* Corresponding author: Nansheng Chen chenn@sfu.ca
Department of Molecular Biology and Biochemistry, Simon Fraser University, 8888 University Drive, Burnaby, B.C., V5A 1S6, Canada
BMC Genomics 2010, 11:516 doi:10.1186/1471-2164-11-516
Published: 24 September 2010
Additional files
Additional file 1:
new gene models for C. elegans. gff3 file with the structure of all new genes in C. elegans.
Format: GFF Size: 16KB Download file
Additional file 2:
new genome annotation for C. briggsae. gff3 file with the structure of all genes in the new genome annotation for C. briggsae. New genes start with ID CBG5XXXX.
Format: GFF Size: 16.6MB Download file
Additional file 3:
Figure S1 genome view of the perfect synteny blocks between C. elegans and C. briggsae. Each chromosome in C. elegans has a distinctive color. The corresponding synteny blocks in C. briggsae can be mapped to the reference chromosome according to the color. This image was created using OrthoClusterDB (http://genome.sfu.ca/orthoclusterdb/).
Format: PDF Size: 686KB Download file
This file can be viewed with: Adobe Acrobat Reader
Additional file 4:
Figure S2 an example of syntenic tandem gene expansion/contraction. A GST tandem gene cluster in C. elegans has nine genes, while its orthologous region in C. briggsae has four genes.
Format: PDF Size: 693KB Download file
This file can be viewed with: Adobe Acrobat Reader
Additional file 5:
Figure S3 Cumulative distribution of perfect synteny blocks in C. elegans. Black bars represent perfect synteny blocks found using WS180 annotation, while empty bars represent perfect synteny blocks found using improved annotation.
Format: PDF Size: 602KB Download file
This file can be viewed with: Adobe Acrobat Reader
Additional file 6:
Perfect synteny blocks and their corresponding genomic coverage in C. elegans for the improved and the WS180 annotations.
Format: DOC Size: 24KB Download file
This file can be viewed with: Microsoft Word Viewer
Additional file 7:
Figure S4 Distribution of the number of synteny blocks in C. elegans as a function of both in-map and out-map mismatches.
Format: PDF Size: 116KB Download file
This file can be viewed with: Adobe Acrobat Reader
Additional file 8:
Figure S5 A new gene model in C. elegans. This new gene model, absent in WS180, was reported independently by WormBase curators in WS190 and found with our methodology.
Format: PDF Size: 702KB Download file
This file can be viewed with: Adobe Acrobat Reader
Additional file 9:
Figure S6 Conserved operon revealed by improved genome annotation. The improved annotation of C. briggsae identified two putative genes, CBG50308 and CBG50462, which are orthologs of the operonic genes C14A4.1 and C14A4.4. These two C. elegans genes had no annotated orthologs before the gene model improvement procedure was applied.
Format: PDF Size: 712KB Download file
This file can be viewed with: Adobe Acrobat Reader
Additional file 10:
Figure S7 Different types of order and strandedness handled by OrthoCluster. a) Consistent order and consistent strandedness. Blocks A1 in genome G1 and A2 in genome G2 are composed of four genes. The order of the genes within each block is the same, and each pair of genes has the same orientation. b) Consistent order, reversed strandedness. Blocks A1 in genome G1 and A2 in genome G2 are composed of four genes. The order of the genes within each block is the same, but each pair of genes has a different orientation. c) Inverted order, consistent strandedness. Blocks A1 in genome G1 and A2 in genome G2 are composed of four genes. The order of the genes within block A1 is inverted with respect to that within block A2, and each pair of genes has the same orientation. d) Inverted order, reversed strandedness. Blocks A1 in genome G1 and A2 in genome G2 are composed of four genes. The order of the genes in block A1 is inverted with respect to that in block A2, and each pair of genes has a different orientation. All four cases are found if the user sets -r -s when running OrthoCluster. Only cases a) and d) are found if the user sets -rs when running OrthoCluster. For the synteny blocks detected in this work, the parameter -rs was used.
Format: PDF Size: 760KB Download file
This file can be viewed with: Adobe Acrobat Reader
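The four cases in the Figure S7 caption can be illustrated with a minimal sketch. This is not OrthoCluster code: the function names, the tuple representation of a gene, and the returned labels are assumptions made for illustration, and each block is assumed to list its genes in chromosomal order using a shared ortholog identifier.

```python
from typing import List, Tuple

# Hypothetical representation: (ortholog_id, strand), with strand "+" or "-".
Gene = Tuple[str, str]


def classify_block_pair(block1: List[Gene], block2: List[Gene]) -> str:
    """Label a pair of blocks with one of the four Figure S7 cases."""
    ids1 = [gene_id for gene_id, _ in block1]
    ids2 = [gene_id for gene_id, _ in block2]
    if ids1 == ids2:
        order = "consistent order"
    elif ids1 == ids2[::-1]:
        order = "inverted order"
    else:
        return "not collinear"

    # Compare the orientation of each gene with that of its counterpart.
    strand_in_block2 = dict(block2)
    same = [strand == strand_in_block2[gene_id] for gene_id, strand in block1]
    if all(same):
        strandedness = "consistent strandedness"
    elif not any(same):
        strandedness = "reversed strandedness"
    else:
        return "mixed strandedness"
    return f"{order}, {strandedness}"


def reported_with_rs(label: str) -> bool:
    """Cases a) and d), the two the caption says are found with -rs."""
    return label in (
        "consistent order, consistent strandedness",
        "inverted order, reversed strandedness",
    )


a1 = [("g1", "+"), ("g2", "-"), ("g3", "+"), ("g4", "+")]
a2_inverted = [("g4", "-"), ("g3", "-"), ("g2", "+"), ("g1", "-")]
label = classify_block_pair(a1, a2_inverted)
```

Here a2_inverted is a1 read from the opposite strand, so the pair falls under case d).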
Additional file 11:
Figure S8 input and output data for OrthoCluster. The input of the program consists of the genome annotation for each species (gene name, Chromosome/Contig, Start position, End Position, and Strand) and a correspondence file with the orthologous relationships among genes. The output corresponds to the synteny blocks found. In this example, there are N genomes and a region of M genes is shown for each one.
Format: PDF Size: 772KB Download file
This file can be viewed with: Adobe Acrobat Reader
Additional file 12:
Figure S9 out-map and in-map mismatches. a) An out-map mismatch. Given the corresponding syntenic regions A1 and A2 in genomes G1 and G2 respectively, A1 contains a gene (shown in white) that has no correspondence in G2. b) An in-map mismatch. Given the corresponding syntenic regions A1 and A2 in genomes G1 and G2 respectively, A1 contains a gene, g5, which has a correspondence in G2, but that correspondence is too distant from the other genes that make up A2 to be included within the synteny block. Different numbers of in-map and out-map mismatches can be included in each block by varying the parameters -i and -ip for in-map mismatches, and -o and -op for out-map mismatches. c) A non-nested synteny block. Blocks A1 and B1 in genome G1 are located in different regions of the genome, and the corresponding regions A2 and B2 in genome G2 are also located in different regions. d) A nested synteny block. Block B1 in genome G1 is fully contained within block A1, but the corresponding syntenic regions B2 and A2 in genome G2 are located in different regions of that genome.
Format: PDF Size: 783KB Download file
This file can be viewed with: Adobe Acrobat Reader
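The two mismatch types defined in the Figure S9 caption can be counted as in the sketch below. It is an illustration only, not OrthoCluster's implementation: the function name and data layout are hypothetical, and "too distant" is simplified to "the ortholog is not a member of region A2".

```python
from typing import Dict, Iterable, Set, Tuple


def count_mismatches(
    region_a1: Iterable[str],
    orthologs: Dict[str, str],
    region_a2: Set[str],
) -> Tuple[int, int]:
    """Return (in_map, out_map) mismatch counts for the genes of region A1.

    orthologs maps a G1 gene name to its G2 ortholog; genes without a
    correspondence in G2 are simply absent from the mapping.
    """
    in_map = 0
    out_map = 0
    for gene in region_a1:
        partner = orthologs.get(gene)
        if partner is None:
            # No correspondence in G2 at all: out-map mismatch.
            out_map += 1
        elif partner not in region_a2:
            # Has a correspondence in G2, but outside region A2: in-map mismatch.
            in_map += 1
    return in_map, out_map


# Toy data: g5 has an ortholog outside A2, and "white" has no ortholog.
a1 = ["g1", "g2", "white", "g3", "g4", "g5"]
ortholog_map = {"g1": "h1", "g2": "h2", "g3": "h3", "g4": "h4", "g5": "h5"}
a2 = {"h1", "h2", "h3", "h4"}
counts = count_mismatches(a1, ortholog_map, a2)
```

In this toy example the region contains one in-map and one out-map mismatch.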
Additional file 13:
out-map mismatches used for gene model improvement. Numbers in parentheses represent the number of unique genes that are associated with each number of mismatches.
Format: DOC Size: 24KB Download file
This file can be viewed with: Microsoft Word Viewer
Additional file 14:
Figure S10 Gene model improvement procedure for the repair of genes. If the prediction hits a gene, the procedure depends on whether the gene is an in-map or an out-map gene. If the gene hit is an in-map gene, we measure the genomic coverage of the in-map gene. If the coverage is greater than or equal to the defined threshold, the prediction is discarded. If the coverage is less than the threshold, the peptide of the ortholog of g1', g1, is used as a query against the genomic span of g1'. If the predictions overlap, they are discarded. If the predictions do not overlap and g1' is in C. briggsae, g1' is replaced by o1' and g1''. If g1' is in C. elegans, its peptide is used as a query against the genomic span of g1 - o1 in C. briggsae to determine whether those genes can be merged (g1'''). If the prediction hits an out-map gene and the coverage is less than that of the original gene model, the prediction is discarded. If the coverage is greater than or equal to that of the original gene model, o1' is discarded if p1 is located in C. elegans. If p1 is located in C. briggsae, the prediction o1' replaces p1.
Format: PDF Size: 1.2MB Download file
This file can be viewed with: Adobe Acrobat Reader
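The first-level decisions in the Figure S10 caption can be summarized as a small decision function. This is a simplified sketch, not the authors' pipeline: the function name, argument names, and returned action strings are hypothetical, and the follow-up steps for an under-covered in-map gene (overlap check, replacement, or merging) are collapsed into a single "re-query" outcome.

```python
def decide_action(hit_kind: str, coverage: float, reference: float, species: str) -> str:
    """Return the action for a prediction that hits an existing gene.

    hit_kind  -- "in-map" or "out-map"
    coverage  -- coverage measured for this hit
    reference -- the threshold (in-map hit) or the coverage of the
                 original gene model (out-map hit)
    species   -- species of the gene that was hit
    """
    if hit_kind == "in-map":
        if coverage >= reference:
            return "discard prediction"
        # Below the threshold: the caption continues with a query using
        # the ortholog's peptide against the genomic span of the gene.
        return "re-query with ortholog peptide"
    if hit_kind == "out-map":
        if coverage < reference:
            return "discard prediction"
        if species == "C. briggsae":
            return "replace original gene model"
        return "discard prediction"
    raise ValueError(f"unknown hit kind: {hit_kind!r}")


action = decide_action("out-map", 0.9, 0.8, "C. briggsae")
```

With these illustrative numbers, the prediction covers more than the original C. briggsae gene model, so it replaces that model.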
