Open Access Highly Accessed Research article

Efficient assembly and annotation of the transcriptome of catfish by RNA-Seq analysis of a doubled haploid homozygote

Shikai Liu1,2, Yu Zhang1, Zunchun Zhou1, Geoff Waldbieser3, Fanyue Sun1, Jianguo Lu1, Jiaren Zhang1, Yanliang Jiang1, Hao Zhang1, Xiuli Wang1, KV Rajendran1, Lester Khoo4, Huseyin Kucuktas1, Eric Peatman1 and Zhanjiang Liu1*

Author Affiliations

1 The Fish Molecular Genetics and Biotechnology Laboratory, Department of Fisheries and Allied Aquacultures and Program of Cell and Molecular Biosciences, Aquatic Genomics Unit, Auburn University, Auburn, AL, 36849, USA

2 The Shellfish Genetics and Breeding Laboratory, Fisheries College, Ocean University of China, Qingdao, 266003, P.R. China

3 USDA, ARS, Catfish Genetics Research Unit, 141 Experiment Station Road, Stoneville, Mississippi, 38776, USA

4 College of Veterinary Medicine, Mississippi State University, 127 Experiment Station Road, Stoneville, Mississippi, 38776, USA

BMC Genomics 2012, 13:595  doi:10.1186/1471-2164-13-595

Published: 5 November 2012

Additional files

Additional file 1:

Table. BLASTX annotation of three assemblies from different de novo assemblers. The three assemblies, generated with CLC Genomics Workbench, ABySS, and Velvet, respectively, were searched with BLASTX against the zebrafish RefSeq protein and UniProt/Swiss-Prot databases, with an E-value cutoff of 1e-10.

Format: PDF. Size: 20 KB.

This file can be viewed with: Adobe Acrobat Reader

Open Data
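
The E-value cutoff described for Additional file 1 can be illustrated with a minimal sketch. It assumes the BLASTX results were exported in the standard 12-column tabular format (E-value in the 11th column) and that the cutoff is inclusive; neither detail is stated in the source, and the function names `filter_hits` and `annotated_queries` are hypothetical.

```python
from typing import Iterable, Iterator, List, Set

# Cutoff stated in the file description; treating it as inclusive is an assumption.
EVALUE_CUTOFF = 1e-10


def filter_hits(lines: Iterable[str], cutoff: float = EVALUE_CUTOFF) -> Iterator[List[str]]:
    """Yield tabular BLAST rows whose E-value is at or below the cutoff.

    Assumes 12 tab-separated columns per row, with the E-value at index 10.
    """
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if float(fields[10]) <= cutoff:
            yield fields


def annotated_queries(lines: Iterable[str], cutoff: float = EVALUE_CUTOFF) -> Set[str]:
    """Return the set of query IDs (contigs) with at least one passing hit."""
    return {fields[0] for fields in filter_hits(lines, cutoff)}
```

Counting the contigs returned by `annotated_queries` for each assembly would give one comparable annotation figure per assembler.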

Additional file 2:

Figure. ORF length distribution for contigs without significant hits in public protein databases. The X-axis shows the predicted ORF length in amino acids, and the Y-axis shows the number of catfish contigs.

Format: PDF. Size: 9 KB.

Additional file 3:

Table. Conserved domain search results for contigs without protein hits in the homology search. The predicted ORFs from contigs without significant BLASTX hits were searched against the NCBI Conserved Domain Database using the CD-Search tool with default settings.

Format: PDF. Size: 83 KB.

Additional file 4:

Figure. Schematic of the principle for detecting putative catfish gene duplicates. Reconstructed transcripts from protein-coding genes that show signs of “SNPs” (PSVs/MSVs) can result from the assembly of short reads from duplicated genes.

Format: PDF. Size: 90 KB.

Additional file 5:

Table. Detection of putative gene duplicates in catfish and comparison with the preliminary catfish genome assembly. Putative gene duplicates were identified as transcripts showing signs of PSVs/MSVs and were evaluated by comparison with the preliminary catfish genome assembly.

Format: PDF. Size: 89 KB.

Additional file 6:

Figure. Sequence context around the stop codon of previously identified full-length cDNAs. The sequence context surrounding the stop codon of 1,087 previously identified full-length cDNAs is illustrated using WebLogo.

Format: PDF. Size: 121 KB.

Additional file 7:

Table. Summary of sub-assembly statistics at various sequencing read depths. The sequencing data were sub-sampled to several read depths, including 12 million, 24 million, 48 million, 124 million, 182 million, 258 million, and 308 million reads. These sub-datasets were assembled using CLC Genomics Workbench to evaluate the effect of sequencing read depth on catfish transcriptome assembly.

Format: PDF. Size: 84 KB.
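
The sub-sampling described for Additional file 7 can be sketched as follows. The source does not say how the subsets were drawn, so this example assumes simple random sampling without replacement from an in-memory list of reads, with a fixed seed for reproducibility; the names `subsample` and `subsample_series` are hypothetical, and read pairing is ignored.

```python
import random
from typing import Dict, List, Sequence

# Read depths listed in the file description, in millions of reads.
DEPTHS_MILLIONS = [12, 24, 48, 124, 182, 258, 308]


def subsample(reads: Sequence[str], depth: int, seed: int = 0) -> List[str]:
    """Draw `depth` reads at random, without replacement (assumed strategy)."""
    if depth > len(reads):
        raise ValueError("requested depth exceeds the number of available reads")
    rng = random.Random(seed)  # fixed seed so each subset is reproducible
    return rng.sample(reads, depth)


def subsample_series(reads: Sequence[str], depths: Sequence[int], seed: int = 0) -> Dict[int, List[str]]:
    """Build one independent random subset per requested depth."""
    return {depth: subsample(reads, depth, seed) for depth in depths}
```

Each subset would then be assembled separately so that assembly statistics can be compared across depths. For real data at these depths, a streaming approach over the FASTQ files would be more practical than holding all reads in memory.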

Additional file 8:

Table. Summary of sub-assemblies compared with zebrafish proteins using TBLASTN. The sub-assemblies generated at each sequencing read depth were assessed for the number of genes covered by comparison with NCBI zebrafish RefSeq proteins, with an E-value cutoff of 1e-10.

Format: PDF. Size: 148 KB.