Research article

Transcriptome deep-sequencing and clustering of expressed isoforms from Favia corals

Shaadi F Pooyaei Mehr1,2*, Rob DeSalle2, Hung-Teh Kao3, Apurva Narechania2, Zhou Han4, Dan Tchernov5, Vincent Pieribone4 and David F Gruber1,2,6

Author Affiliations

1 The Graduate Center, Molecular, Cellular and Developmental Biology, City University of New York, New York, NY 10065, USA

2 American Museum of Natural History, Sackler Institute of Comparative Genomics, New York, NY 10024, USA

3 Department of Psychiatry and Human Behavior, Division of Biology and Medicine, Warren Alpert Medical School, Brown University, Providence RI 02912, USA

4 John B. Pierce Laboratory, Cellular and Molecular Physiology, Yale University, New Haven, CT 06519, USA

5 Marine Biology Department, The Leon H. Charney School of Marine Sciences, University of Haifa, Mount Carmel, Haifa 31905, Israel

6 Department of Natural Sciences, City University of New York, Baruch College, Box A-0506, 17 Lexington Avenue, New York, NY 10010, USA

BMC Genomics 2013, 14:546  doi:10.1186/1471-2164-14-546

Published: 12 August 2013



Genomic and transcriptomic sequence data are essential tools for tackling ecological problems. Using an approach that combines next-generation sequencing, de novo transcriptome assembly, gene annotation and synthetic gene construction, we identify and cluster the protein families from Favia corals from the northern Red Sea.


Using the Illumina GA platform, we obtained 80 million 75 bp paired-end cDNA reads from two adult Favia samples (Fav1, Fav2) collected at a depth of 65 m, and generated two de novo assemblies using ABySS and CAP3. After removing redundancy and filtering out low-quality reads, our transcriptome datasets contained 58,268 (Fav1) and 62,469 (Fav2) contigs longer than 100 bp, with N50 values of 1,665 bp and 1,439 bp, respectively. Using the proteome of the sea anemone Nematostella vectensis as a reference, we were able to annotate almost 20% of each dataset through reciprocal homology searches. Homologous clustering of these annotated transcripts allowed us to divide them into 7,186 (Fav1) and 6,862 (Fav2) homologous transcript clusters (E-value ≤ 2e-30). Functional annotation categories were assigned to the homologous clusters using the functional annotation of Nematostella vectensis. General annotation of the assembled transcripts improved by 1-3% when the Acropora digitifera proteome was also used. In addition, we screened these transcript isoform clusters for fluorescent protein (FP) homologs and identified seven potential FP homologs in Fav1 and four in Fav2. These transcripts were validated as bona fide FP transcripts by the robust fluorescence they produced upon heterologous expression. Annotation of the assembled contigs revealed that 1.34% (Fav1) and 1.61% (Fav2) of the total assembled contigs likely originated from the corals’ algal symbiont, Symbiodinium spp.
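Two of the quantities reported above, the N50 of a contig set and annotation by reciprocal homology search, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names (`n50`, `best_hits`, `reciprocal_best_hits`) and the tuple layout for hits are hypothetical, "reciprocal homology" is interpreted here as a strict reciprocal-best-hit rule, and the default E-value cutoff borrows the 2e-30 threshold that the text reports for clustering. All of these are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical hit layout (an assumption): (query id, subject id, E-value).
Hit = Tuple[str, str, float]


def n50(lengths: Iterable[int], min_length: int = 100) -> int:
    """Return the N50 of contigs strictly longer than min_length.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    Returns 0 when no contig passes the length filter.
    """
    kept = sorted((n for n in lengths if n > min_length), reverse=True)
    half_total = sum(kept) / 2
    running = 0
    for length in kept:
        running += length
        if running >= half_total:
            return length
    return 0


def best_hits(hits: Iterable[Hit], max_evalue: float) -> Dict[str, str]:
    """Map each query to its lowest-E-value subject at or below the cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue in hits:
        if evalue > max_evalue:
            continue  # hit is too weak to count as a homolog
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    forward: Iterable[Hit],
    reverse: Iterable[Hit],
    max_evalue: float = 2e-30,  # assumed cutoff, taken from the clustering step
) -> List[Tuple[str, str]]:
    """Return (contig, protein) pairs that are each other's best hit.

    forward: contig-vs-reference-proteome hits.
    reverse: reference-protein-vs-contig hits.
    """
    fwd = best_hits(forward, max_evalue)
    rev = best_hits(reverse, max_evalue)
    return sorted((c, p) for c, p in fwd.items() if rev.get(p) == c)
```

For example, `n50([100, 200, 300, 400])` drops the 100 bp contig, leaving 900 bp in total, and returns 300 because the 400 bp and 300 bp contigs together are the first to cover at least half of that total. In practice the hit tuples would be parsed from the tabular output of a homology search tool; that parsing step is deliberately left out here.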


Here we present a study that identifies the homologous transcript isoform clusters in the transcriptome of Favia corals using a distantly related reference proteome. Furthermore, the symbiont-derived transcripts were isolated from the datasets and their contribution was quantified. This is the first annotated transcriptome of the genus Favia, and it represents a major increase in the genomic resources available for this important family of corals.

Keywords: K-mer; Contig; Open reading frame; Fluorescent protein; Blast; Clustering; High-throughput sequencing; Illumina paired-end; Coral