A transcriptional sketch of a primary human breast cancer by 454 deep sequencing

Alessandro Guffanti1,2*, Michele Iacono1, Paride Pelucchi1, Namshin Kim3,4, Giulia Soldà5, Larry J Croft6, Ryan J Taft6, Ermanno Rizzi1, Marjan Askarian-Amiri6, Raoul J Bonnal1, Maurizio Callari7, Flavio Mignone8, Graziano Pesole1,9, Giovanni Bertalot10,11, Luigi Rossi Bernardi12, Alberto Albertini1, Christopher Lee3, John S Mattick6, Ileana Zucchi1 and Gianluca De Bellis1

Author Affiliations

1 Institute of Biomedical Technologies, National Research Council, Milan, Italy

2 Current address: Genomnia srl, via Nerviano, 31 – 20020 Lainate, Milano, Italy

3 Department of Biochemistry and Molecular Biology, University of California Los Angeles, CA, USA

4 Current address: Korean Bioinformation Center, Korea Research Institute of Bioscience and Biotechnology, 52 Eoeun-dong, Yuseong-gu, Daejeon, 305-806, South Korea

5 Department of Biology and Genetics for Medical Sciences, University of Milan, Milan, Italy

6 ARC Special Research Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, QLD 4072, Australia

7 Translational Research Unit, Department of Experimental Oncology, Istituto Nazionale Tumori, Milan, Italy

8 Faculty of Pharmacological Sciences, University of Milan, Milan, Italy

9 Department of Biochemistry and Molecular Biology, University of Bari, Bari, Italy

10 Division of Pathology and Laboratory Medicine, European Institute of Oncology, Milan, Italy

11 Current address: Department of Pathology, Desenzano sul Garda Hospital, Leno, Italy

12 Science and Technology Pole, Istituto di Ricovero e Cura a Carattere Scientifico MultiMedica, Milan, Italy

BMC Genomics 2009, 10:163  doi:10.1186/1471-2164-10-163

Published: 20 April 2009



The cancer transcriptome is difficult to explore due to the heterogeneity of quantitative and qualitative changes in gene expression linked to the disease status. An increasing number of "unconventional" transcripts, such as novel isoforms, non-coding RNAs, somatic gene fusions and deletions, have been associated with the tumoral state. Massively parallel sequencing techniques provide a framework for exploring the transcriptional complexity inherent to cancer with limited laboratory and financial effort. We developed a deep sequencing and bioinformatics analysis protocol to investigate the molecular composition of a breast cancer poly(A)+ transcriptome. This method uses a cDNA library normalization step to reduce the representation of highly expressed transcripts, together with biology-oriented bioinformatic analyses, to facilitate detection of rare and novel transcripts.


We analyzed over 132,000 Roche 454 high-confidence deep sequencing reads from a primary human lobular breast cancer tissue specimen, and detected a range of unusual transcriptional events that were subsequently validated by RT-PCR in eight additional primary human breast cancer samples. We identified and validated one deletion, two novel ncRNAs (one intergenic and one intragenic), ten previously unknown or rare transcript isoforms and a novel gene fusion specific to a single primary tissue sample. We also explored the non-protein-coding portion of the breast cancer transcriptome, identifying thousands of novel non-coding transcripts and more than three hundred reads corresponding to the non-coding RNA MALAT1, which is highly expressed in many human carcinomas.


Our results demonstrate that combining 454 deep sequencing with a normalization step and careful bioinformatic analysis facilitates the discovery and quantification of rare transcripts or ncRNAs. The approach can be used as a qualitative tool to characterize transcriptome complexity, revealing many hitherto unknown transcripts, splice isoforms, gene fusion events and ncRNAs, even with relatively low sequence sampling.