An integrated database of Eucalyptus spp. genome project

Nascimento, Leandro Costa; Neto, Jorge Lepikson; Salaza, Marcela Mendes; Camargo, Eduardo Leal Oliveira; Marques, Wesley Leoricy; Gonçalves, Danieli Cristina; Vidal, Ramon Oliveira; Pereira, Gonçalo Amarante Guimarães; Carazzolle, Marcelo Falsarella

doi:10.1186/1753-6561-5-S7-P170

Volume 5 Supplement 7

IUFRO Tree Biotechnology Conference 2011: From Genomes to Integration and Delivery

Poster presentation
Open access
Published: 13 September 2011

BMC Proceedings volume 5, Article number: P170 (2011)

Background

Species of the genus Eucalyptus are the most widely planted fiber crops in the world. They are used mainly for timber, pulp and paper production. Brazil, helped by favorable weather conditions, is a major producer and exporter of eucalyptus derivatives. In 2002, the Brazilian research network of the Eucalyptus Genome (Genolyptus) was established with the goal of integrating several academic and private institutions working on eucalyptus genomics in Brazil. This project generated around 200,000 ESTs from several tissues and conditions. Consequently, several individual projects have been implemented, generating other transcriptome databases, in particular using RNA-Seq technology. In 2010, a draft genome (http://eucalyptusdb.bi.up.ac.za) of the species E. grandis was produced by researchers of the Joint Genome Institute (DOE-JGI) and the Eucalyptus Genome Network (EUCAGEN). The main goal of this work is to develop a Eucalyptus database (http://www.lge.ibi.unicamp.br/genolyptus) integrating public and private data in a friendly and secure web interface with bioinformatics tools that allow users to perform complex searches.

Results and discussion

First, the public and private ESTs (130,290 from Genolyptus and 36,981 from NCBI) were assembled, producing 48,760 unigenes (17,795 contigs and 30,765 singlets). The bdtrimmer [1] and CAP3 [2] programs were used for sequence trimming (excluding vector, ribosomal, low-quality and too-short reads) and sequence assembly, respectively.

The AutoFACT pipeline [3] was used to annotate the assembled unigenes automatically, based on BLAST [4] searches (e-value cutoff of 1e-5) against several protein databases: the non-redundant (NR) database of NCBI; UniRef90 and UniRef100, databases containing only curated proteins [5]; Pfam, a database of protein families [6]; KEGG, a database of metabolic pathways [7]; and Gene Ontology (GO), a database of functional annotation [8].
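The e-value cutoff described above can be illustrated with a minimal sketch. It assumes the BLAST hits are available in the standard 12-column tabular format (e-value in column 11), which the abstract does not state; the function name `best_hits` and the sample rows are hypothetical, and AutoFACT itself applies a more elaborate, multi-database decision procedure.

```python
import csv
import io

EVALUE_CUTOFF = 1e-5  # cutoff stated in the text


def best_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return {query_id: (subject_id, evalue)} for the lowest e-value hit
    of each query, ignoring hits whose e-value exceeds the cutoff.

    Assumes 12-column BLAST tabular output:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment headers
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > cutoff:
            continue  # hit is not significant under the chosen cutoff
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Made-up demonstration rows (identifiers are hypothetical).
SAMPLE = "\n".join([
    "contig1\tprotA\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t120",
    "contig1\tprotB\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-8\t60",
    "contig2\tprotC\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t25",
])

if __name__ == "__main__":
    for query, (subject, evalue) in best_hits(SAMPLE).items():
        print(query, subject, evalue)
```

In this sketch a unigene with no hit at or below the cutoff, such as `contig2`, is simply absent from the result and would remain unannotated.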

The Genomics and Expression Laboratory at the State University of Campinas (http://www.lge.ibi.unicamp.br) sequenced ten RNA-Seq libraries from four species (E. urograndis, E. globulus, E. grandis and E. urophylla) using Illumina/Solexa technology. Additionally, three RNA-Seq libraries [9] were downloaded from the NCBI Sequence Read Archive (SRA). All RNA-Seq reads were aligned against the assembled unigenes and the genome assembly using the SOAP2 [10] and TopHat [11] aligners, configured to allow up to two mismatches, discard sequences containing "N"s and return all optimal alignments.
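The three alignment settings mentioned above (at most two mismatches, no reads containing "N", all optimal placements reported) can be made concrete with a toy example. This is only a brute-force, ungapped illustration of what the settings mean; it is not how SOAP2 or TopHat work internally, and the function names and sequences are hypothetical.

```python
MAX_MISMATCHES = 2  # limit stated in the text


def mismatches(read, window):
    """Count positions at which two equal-length sequences differ."""
    return sum(1 for a, b in zip(read, window) if a != b)


def optimal_alignments(read, reference, max_mismatches=MAX_MISMATCHES):
    """Return (positions, mismatch_count) for every ungapped placement of
    `read` on `reference` that has the lowest mismatch count, provided that
    count does not exceed `max_mismatches`.

    Reads containing 'N' are discarded; discarded or unplaceable reads
    yield ([], None).
    """
    read = read.upper()
    reference = reference.upper()
    if "N" in read:
        return [], None
    best, positions = None, []
    for start in range(len(reference) - len(read) + 1):
        mm = mismatches(read, reference[start:start + len(read)])
        if mm > max_mismatches:
            continue
        if best is None or mm < best:
            best, positions = mm, [start]  # new optimum, restart the list
        elif mm == best:
            positions.append(start)  # equally good placement
    return positions, best


# Made-up reference fragment for demonstration.
REFERENCE = "ACGTACGTTTACGAACGT"

if __name__ == "__main__":
    for read in ("ACGT", "ACGTTA", "ANGT", "GGGGGG"):
        print(read, optimal_alignments(read, REFERENCE))
```

Returning every optimal placement rather than a single one matters for expression estimates, because reads from repeated or paralogous regions would otherwise be assigned arbitrarily.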

To perform differential expression analysis between EST or RNA-Seq libraries, several normalization pipelines and statistical tests were implemented. For ESTs, differentially expressed genes between libraries were identified by applying the Audic-Claverie (AC) test [12] to the assembled unigenes. The results are available to users through a web interface (called Electronic Northern) that allows searches by gene or library name. It is also possible to compare gene expression between two or more libraries. For the RNA-Seq libraries, the DEGseq software [13] was used to perform normalization and statistical analysis at a 99% confidence level (cut-off of 0.01).
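For readers unfamiliar with the AC test, the sketch below evaluates the Audic-Claverie conditional probability p(y|x) = (N2/N1)^y (x+y)! / [x! y! (1 + N2/N1)^(x+y+1)], where x and y are the counts of a gene in two libraries of sizes N1 and N2. The one-sided tail used as a p-value is one common convention and is an assumption here, as are the function names and example counts; the abstract does not describe the exact implementation behind Electronic Northern.

```python
import math


def ac_probability(x, y, n1, n2):
    """Audic-Claverie probability of observing y counts in library 2
    (size n2) given x counts in library 1 (size n1)."""
    r = n2 / n1
    # Work in log space so that large counts do not overflow the factorials.
    log_p = (
        y * math.log(r)
        + math.lgamma(x + y + 1)
        - math.lgamma(x + 1)
        - math.lgamma(y + 1)
        - (x + y + 1) * math.log1p(r)
    )
    return math.exp(log_p)


def ac_tail_pvalue(x, y, n1, n2):
    """Smaller of P(Y <= y | x) and P(Y >= y | x) (one-sided convention,
    assumed here)."""
    lower = sum(ac_probability(x, k, n1, n2) for k in range(y + 1))
    upper = 1.0 - lower + ac_probability(x, y, n1, n2)
    # Clamp to [0, 1] to absorb floating-point rounding.
    return min(1.0, max(0.0, min(lower, upper)))


if __name__ == "__main__":
    # Hypothetical counts: 2 ESTs in one library versus 40 in another,
    # with both libraries containing 10,000 ESTs.
    print(ac_tail_pvalue(2, 40, 10_000, 10_000))
```

Because the formula depends on the library sizes only through the ratio N2/N1, it accounts for unequal sequencing depth without a separate normalization step.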

To integrate all the data described above, we developed a web site (Fig. 1) hosted on a Fedora Linux machine with a MySQL database server. The web interface is based on a combination of CGI scripts written in Perl (including the BioPerl module) and the Apache Web Server. The site contains many bioinformatics tools that allow the user to perform keyword or local BLAST searches against the assembled unigenes and to connect these results with the gene expression analysis. Moreover, the GBrowse software (Generic Genome Browser) (Fig. 2) was used to visualize the data in a genomic context, integrating the different types of information through clickable tracks. The top track is the reference genome assembly, and the other tracks correspond to the assembled unigenes and RNA-Seq data mapped to the reference.
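The link between keyword search and expression data can be pictured as a join between an annotation table and a per-library count table. The sketch below is purely illustrative: it uses Python's built-in SQLite module in place of the MySQL server and Perl CGI scripts described above, and the table names, column names and rows are all hypothetical, since the abstract does not publish the actual schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE unigene (id TEXT PRIMARY KEY, annotation TEXT)")
    conn.execute(
        "CREATE TABLE expression (unigene_id TEXT, library TEXT, read_count INTEGER)"
    )
    # Made-up rows for demonstration only.
    conn.executemany(
        "INSERT INTO unigene VALUES (?, ?)",
        [
            ("Contig1", "cellulose synthase"),
            ("Contig2", "cinnamoyl-CoA reductase"),
        ],
    )
    conn.executemany(
        "INSERT INTO expression VALUES (?, ?, ?)",
        [
            ("Contig1", "leaf", 3),
            ("Contig1", "xylem", 57),
            ("Contig2", "xylem", 12),
        ],
    )
    conn.commit()
    return conn


def keyword_search(conn, keyword):
    """Return (unigene, annotation, library, read_count) rows whose
    annotation contains the keyword. The query is parameterized so that
    user input is never interpolated into the SQL string."""
    cursor = conn.execute(
        "SELECT u.id, u.annotation, e.library, e.read_count "
        "FROM unigene AS u "
        "LEFT JOIN expression AS e ON e.unigene_id = u.id "
        "WHERE u.annotation LIKE ? "
        "ORDER BY u.id, e.library",
        (f"%{keyword}%",),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    for row in keyword_search(build_demo_db(), "cellulose"):
        print(row)
```

A parameterized query is used because the search term comes from a public web form; the same precaution applies to placeholder-based queries in any database interface.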

References

1. Baudet C, Dias Z: New EST Trimming Strategy. Brazilian Symposium on Bioinformatics. 2005, Lecture Notes in Bioinformatics, Berlin, Germany: Springer-Verlag, 3594: 206-209.
2. Huang X, Madan A: CAP3: A DNA Sequence Assembly Program. Genome Res. 1999, 9: 868-877. 10.1101/gr.9.9.868.
3. Koski LB, Gray LW, Lang BF, Burger G: AutoFACT: An Automatic Functional Annotation and Classification Tool. BMC Bioinformatics. 2005, 6: 151. 10.1186/1471-2105-6-151.
4. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl. Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.
5. Suzek BE, Huang H, McGarvey P, Mazumder R, Wu CH: UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics. 2007, 23 (10): 1282-1288. 10.1093/bioinformatics/btm098.
6. Bateman A, Birney E, Cerruti L, Durbin R, Etwiller L, Griffiths-Jones S, Howe KL, Marshall M, Sonnhammer ELL: The Pfam Protein Families Database. Nucl. Acids Res. 2002, 30 (1): 276-280. 10.1093/nar/30.1.276.
7. Kanehisa M, Goto S: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucl. Acids Res. 2000, 28 (1): 27-30. 10.1093/nar/28.1.27.
8. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene Ontology: tool for the unification of biology. Nature Genetics. 2000, 25: 25-29.
9. Mizrachi E, Hefer CA, Ranik M, Joubert F, Myburg AA: De novo assembled expressed gene catalogue of a fast-growing Eucalyptus tree produced by Illumina mRNA-Seq. BMC Genomics. 2010, 11: 681.
10. Li R, Yu C, Li Y, Lam T-W, Yiu S-M, Kristiansen K, Wang J: SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics. 2009, 25 (15): 1966-1967. 10.1093/bioinformatics/btp336.
11. Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009, 25 (9): 1105-1111. 10.1093/bioinformatics/btp120.
12. Audic S, Claverie JM: The significance of Digital Gene Expression Profiles. Genome Res. 1997, 7: 986-995.
13. Wang L, Feng Z, Wang X, Wang X, Zhang X: DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics. 2010, 26 (1): 136-138. 10.1093/bioinformatics/btp612.

Acknowledgments

The authors would like to acknowledge all researchers of the Joint Genome Institute (DOE-JGI) and the Eucalyptus Genome Network (EUCAGEN) responsible for producing the draft genome of E. grandis. Moreover, we thank CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico, Brazil) for the financial support of this work.

Author information

Authors and Affiliations

Laboratório de Genômica e Expressão - Instituto de Biologia, Universidade Estadual de Campinas – UNICAMP, Brazil
Leandro Costa Nascimento, Jorge Lepikson Neto, Marcela Mendes Salaza, Eduardo Leal Oliveira Camargo, Wesley Leoricy Marques, Danieli Cristina Gonçalves & Gonçalo Amarante Guimarães Pereira
Laboratório de Genômica e Expressão - Instituto de Biologia - Universidade Estadual de Campinas - UNICAMP/LNBio, Laboratório Nacional de Biociências – ABTLuS, Brazil
Ramon Oliveira Vidal
Laboratório de Genômica e Expressão - Instituto de Biologia - Universidade Estadual de Campinas - UNICAMP/Centro Nacional de Processamento de Alto Desempenho em São Paulo, Universidade Estadual de Campinas-UNICAMP, Brazil
Marcelo Falsarella Carazzolle

Corresponding author

Correspondence to Leandro Costa Nascimento.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Nascimento, L.C., Neto, J.L., Salaza, M.M. et al. An integrated database of Eucalyptus spp. genome project. BMC Proc 5 (Suppl 7), P170 (2011). https://doi.org/10.1186/1753-6561-5-S7-P170
