Methodology article (Open Access)

Rapid gene-based SNP and haplotype marker development in non-model eukaryotes using 3'UTR sequencing

Tyson Koepke¹, Scott Schaeffer¹, Vandhana Krishnan², Derick Jiwan¹, Artemus Harper¹, Matthew Whiting³, Nnadozie Oraguzie³ and Amit Dhingra¹*

Author Affiliations

1 Department of Horticulture, Washington State University, Pullman, WA, USA

2 Graduate Program in Bioinformatics and Computational Biology, University of Idaho, ID, USA

3 Horticulture and Landscape Architecture Department, Irrigated Agriculture Research and Extension Center, Washington State University, Prosser, WA, USA

BMC Genomics 2012, 13:18  doi:10.1186/1471-2164-13-18

Published: 12 January 2012

Additional files

Additional file 1:

Adaptor sequences for 3'UTR sequencing. Sequences of the adaptors used for 3'UTR sequencing of cDNA. AMID-B is an oligo-dT primer with a biotinylated 5' end. Adaptors AMID-1A to AMID12-B are complementary oligonucleotide pairs with embedded barcode sequences. Column A gives the primer name and column B the sequence.

Format: XLS Size: 24KB
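Because the barcoded adaptors are supplied as complementary oligonucleotide pairs, a quick sanity check when re-ordering them is that one oligo of each pair is the reverse complement of the other. The sketch below assumes fully complementary (blunt) pairs; real adaptor pairs may anneal with overhangs, in which case only the overlapping region should be compared. The function names and example sequence are hypothetical and are not the AMID sequences from the spreadsheet.

```python
# Minimal sketch (assumption): a complementary adaptor pair is treated as two
# oligos where one is exactly the reverse complement of the other.
# Sequences used here are hypothetical placeholders, not AMID adaptors.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_complementary_pair(top: str, bottom: str) -> bool:
    """True if `bottom` is exactly the reverse complement of `top`."""
    return reverse_complement(top).upper() == bottom.upper()


if __name__ == "__main__":
    top = "ACGTAGCTTGCA"              # hypothetical oligo A
    bottom = reverse_complement(top)  # hypothetical oligo B
    print(bottom, is_complementary_pair(top, bottom))
```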

Additional file 2:

sup2.pl. Custom script used to remove index sequences and rename each read header with the sample name assigned to its barcode.

Usage: sup2.pl {reads file, FASTA format} {primers/MIDs/barcodes with corresponding headers, CSV format} {number of bases from the start of the primer to the beginning of the barcode} {name of the new FASTA file to be written}

Example input (FASTA file):
>1300_8769_5430 length = 258 urnand = JHSK987KJSH2KJHJK8777
AGTCCCCCGGGGTTTAAAGGGGCCCCTTTTAAAAAAGTCGTCAATGCGGT
AGTCTGCAAAAAAATTTCCCCCCCCCCGGGGGGGGGGGTAGCCGTATGCA

Example input (MIDs CSV file):
Sample1,ATAGTGA
Sample2,ATGCATG

Output: a FASTA file of the sequence remaining after removal of the primer/barcode/MID, with the corresponding header attached as specified in the input MIDs CSV file.

Format: PL Size: 4KB
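The demultiplexing step performed by sup2.pl can be sketched in Python as follows. This is not a port of the original script, whose internals are not described here; it assumes each read begins with the primer, that the barcode starts exactly `offset` bases into the read, and that only exact barcode matches are accepted. The output header format (`Sample_readID`) and all function names are hypothetical choices.

```python
import csv
import sys
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_barcodes(path: str) -> Dict[str, str]:
    """Map barcode sequence -> sample name from 'Sample,BARCODE' rows."""
    with open(path, newline="") as handle:
        return {row[1].strip().upper(): row[0].strip()
                for row in csv.reader(handle) if len(row) >= 2}


def demultiplex(reads: Iterable[Tuple[str, str]], barcodes: Dict[str, str],
                offset: int) -> Iterator[Tuple[str, str]]:
    """Yield (new_header, trimmed_seq) for reads whose barcode starts at `offset`.

    Primer and barcode are both removed; unmatched reads are dropped.
    """
    for header, seq in reads:
        read_id = header.split()[0]
        for barcode, sample in barcodes.items():
            end = offset + len(barcode)
            if seq[offset:end].upper() == barcode:
                yield f"{sample}_{read_id}", seq[end:]
                break  # first matching barcode wins


def main(argv: List[str]) -> None:
    reads_path, csv_path, offset, out_path = argv[1], argv[2], int(argv[3]), argv[4]
    barcodes = read_barcodes(csv_path)
    with open(out_path, "w") as out:
        for header, seq in demultiplex(read_fasta(reads_path), barcodes, offset):
            out.write(f">{header}\n{seq}\n")


if __name__ == "__main__":
    if len(sys.argv) == 5:
        main(sys.argv)
    else:
        print("usage: demux_sketch.py READS.fasta MIDS.csv OFFSET OUT.fasta",
              file=sys.stderr)
```

The argument order mirrors the USAGE line of sup2.pl. Allowing mismatches in the barcode (for example, a Hamming-distance threshold) would recover more reads but is not something the original description mentions.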

Additional file 3:

Primers and HRM analysis. The table lists the contig number (column B), predicted amplicon length (column C), number of SNPs (column D), and the forward and reverse primers of each set (columns E and F) used for high-resolution melting (HRM) analysis. It also gives the number of curve profiles among the cultivars (column G), the number of Cowiche × Selah curve types (column H) and the number of curve profiles among the seedlings (column I).

Format: XLS Size: 68KB

Additional file 4:

Contig sequences. A FASTA file containing the 34,620 contigs generated with NGen v3.0.

Format: FAS Size: 5.5MB
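After downloading the contig file, a quick way to confirm that it is intact is to count the FASTA records and summarize their lengths. The sketch below is a hypothetical helper, not part of the published materials; N50 is included only as a common assembly summary statistic and is not reported in this file description.

```python
import sys
from typing import Iterable, List


def contig_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of each FASTA record; multi-line sequences are summed."""
    lengths: List[int] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)            # start a new record
        elif line and lengths:
            lengths[-1] += len(line)     # extend the current record
    return lengths


def n50(lengths: List[int]) -> int:
    """Length of the contig at which the cumulative length, taken from the
    longest contig down, first reaches half of the total; 0 if empty."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as handle:
            lengths = contig_lengths(handle)
        print(f"contigs: {len(lengths)}")
        print(f"total bases: {sum(lengths)}")
        print(f"N50: {n50(lengths)}")
    else:
        print("usage: contig_stats.py CONTIGS.fas", file=sys.stderr)
```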

Additional file 5:

Filtered SNP report. This table is modified output from NGen v3.0 and SeqMan. Columns B to L give the contig number and the details of each SNP, including the number of calls for each base at that position. Column M is the 5' flanking sequence, column N is the polymorphism and column O is the 3' flanking sequence. Columns M and O are provided to enable rapid analysis of other germplasm.

Format: XLS Size: 1.3MB
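One way the flanking sequences in columns M and O can be reused on other germplasm is to search a new read for both flanks and report the base between them, or to combine them into a bracketed `5'flank[alleles]3'flank` string, a layout commonly used when submitting SNPs for assay design. The sketch below assumes a single-nucleotide polymorphism, exact flank matches and an allele notation such as `A/G` for column N; all names are hypothetical.

```python
from typing import Optional


def assay_string(flank5: str, alleles: str, flank3: str) -> str:
    """Format a SNP as 5'flank[alleles]3'flank, e.g. 'AC[A/G]GT'."""
    return f"{flank5}[{alleles}]{flank3}"


def call_base(query: str, flank5: str, flank3: str, anchor: int = 20) -> Optional[str]:
    """Return the base found between exact matches of the two flanks in `query`.

    Only the last `anchor` bases of the 5' flank and the first `anchor` bases
    of the 3' flank are matched (anchor must be > 0), so only sequence close
    to the SNP has to be present. Returns None if no such match exists.
    """
    left = flank5[-anchor:].upper()
    right = flank3[:anchor].upper()
    seq = query.upper()
    start = seq.find(left)
    while start != -1:
        pos = start + len(left)                       # candidate SNP position
        if pos < len(seq) and seq[pos + 1:pos + 1 + len(right)] == right:
            return seq[pos]
        start = seq.find(left, start + 1)             # try the next occurrence
    return None


if __name__ == "__main__":
    five, three = "ACGTTGCA", "GGCCTTAA"             # hypothetical flanks
    read = "TT" + five + "G" + three + "CC"          # hypothetical new read
    print(assay_string(five, "A/G", three), call_base(read, five, three))
```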

Additional file 6:

Haplotypes identified in sweet cherry. The table presents the haplotypes identified in each contig. Some contigs have multiple positions, labeled A, B or C. The nucleotide at each position is given for every allele. Cells are merged where differences between alleles can no longer be traced. A question mark (?) indicates that read depth at that base was insufficient to confirm the call.

Format: XLS Size: 193KB
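When comparing haplotypes from this table, a base marked "?" should not count as a difference. The sketch below builds a haplotype string from per-position calls, treats "?" as compatible with any base, and collects the distinct fully resolved haplotypes. The representation (a dictionary of position to base) and the example positions are hypothetical, not the table's actual layout.

```python
from typing import Dict, Iterable, List, Set


def haplotype_string(calls: Dict[int, str], positions: List[int]) -> str:
    """Concatenate the base called at each position; missing calls become '?'."""
    return "".join(calls.get(pos, "?") for pos in positions)


def compatible(h1: str, h2: str) -> bool:
    """True if two haplotypes agree wherever both have a confirmed call."""
    return len(h1) == len(h2) and all(
        a == b or "?" in (a, b) for a, b in zip(h1, h2)
    )


def resolved_haplotypes(haplotypes: Iterable[str]) -> Set[str]:
    """Distinct haplotypes that contain no unconfirmed ('?') positions."""
    return {h for h in haplotypes if "?" not in h}


if __name__ == "__main__":
    positions = [112, 145, 190]                          # hypothetical SNP positions
    partial = haplotype_string({112: "G", 190: "T"}, positions)
    print(partial, compatible(partial, "GAT"), compatible(partial, "CAT"))
```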