Chronic lung diseases affect a significant portion of the population, and the incidence of both chronic obstructive pulmonary disease (COPD)/emphysema and idiopathic pulmonary fibrosis (IPF) is increasing. COPD is the fourth leading cause of death in the USA, and the incidence of IPF has doubled over the past decade. Identifying novel transcripts and transcript isoforms (alternative splicing patterns) associated with these diseases may improve our understanding of their molecular pathogenesis and reveal novel disease-specific biomarkers and therapeutic targets.
Materials and methods
Using lung tissue sections from the NHLBI Lung Tissue Research Consortium, we sequenced mRNA (75 or 99 nt paired-end reads; Illumina GAIIx or HiSeq) from 145 lung tissue samples, which were subsequently split into an initial training cohort of 89 samples and an independent filtering set of 56 samples. Genome-guided transcriptome reconstruction was performed with Cufflinks on the training set and on the independent filtering set. A final, conservatively filtered assembly was created by requiring complete overlap of all transcripts present for a gene in the two assemblies. Next, MISO was used to quantify isoform proportions for the known and novel transcripts found in each gene. To identify disease-associated differentially spliced genes, these isoform proportions were modeled as a function of disease state, isoform, and the interaction between disease state and isoform.
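The conservative filtering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: it assumes that transcripts from the two assemblies have already been grouped under a shared locus key, that a transcript is identified by its exon coordinates, and that "complete overlap" means both assemblies report exactly the same set of transcripts for a locus. The names `conservative_overlap`, `training`, and `filtering`, and all coordinates, are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

# A transcript is reduced to its exon coordinates: ((start, end), ...).
Transcript = Tuple[Tuple[int, int], ...]
# An assembly maps a locus key to the set of transcript structures found there.
Assembly = Dict[str, FrozenSet[Transcript]]


def conservative_overlap(training: Assembly, filtering: Assembly) -> Assembly:
    """Keep a locus only if both assemblies contain exactly the same transcripts."""
    kept = {}
    for locus, transcripts in training.items():
        if locus in filtering and filtering[locus] == transcripts:
            kept[locus] = transcripts
    return kept


# Made-up toy data, not taken from the study.
training = {
    "locusA": frozenset({((100, 200), (300, 400))}),
    "locusB": frozenset({((10, 50), (80, 120)), ((10, 50), (90, 120))}),
    "locusC": frozenset({((500, 600),)}),
}
filtering = {
    "locusA": frozenset({((100, 200), (300, 400))}),
    "locusB": frozenset({((10, 50), (80, 120))}),  # one isoform missing here
}

overlap_set = conservative_overlap(training, filtering)
print(sorted(overlap_set))
```

In this toy input only `locusA` survives: `locusB` differs by one isoform between the assemblies, and `locusC` is absent from the filtering set. A real comparison would also need a rule for matching loci across assemblies and for tolerating small differences at transcript ends, neither of which is specified here.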
The filtered transcriptome assembly (overlap set) is more similar to known genes, based on comparisons with Ensembl, than either the initial training assembly or the independent filtering assembly. A set of 38 novel gene candidates was selected based on gene structure parameters computed from the Ensembl annotation. In a differential expression (DE) analysis against controls, five of the candidate genes were DE in emphysema and eight in IPF (P < 0.01). Three of these candidate genes were DE in both diseases. Several examples of disease-associated differential splicing were also identified. These new disease-associated isoforms are being investigated further to determine their biological function and relevance to COPD and IPF.
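The differential splicing calls rest on the disease-by-isoform interaction described in the methods. As a hedged illustration of what that term captures, the sketch below computes the interaction contrast for a single gene with two isoforms: the change in mean proportion of one isoform between disease and control, minus the same change for the other isoform. In a two-by-two design this difference of differences equals the interaction effect of a linear model fitted to the cell means. The function name `interaction_contrast`, the labels, and all proportions are made up, and the sketch omits the significance testing a real analysis would need.

```python
from statistics import mean


def interaction_contrast(psi):
    """Difference of differences in mean isoform proportion.

    psi maps (state, isoform) to a list of per-sample isoform proportions.
    """
    cell_mean = {key: mean(values) for key, values in psi.items()}
    shift_iso1 = cell_mean[("disease", "iso1")] - cell_mean[("control", "iso1")]
    shift_iso2 = cell_mean[("disease", "iso2")] - cell_mean[("control", "iso2")]
    return shift_iso1 - shift_iso2


# Made-up proportions for one gene; each sample's two isoforms sum to 1.
psi = {
    ("control", "iso1"): [0.70, 0.80],
    ("control", "iso2"): [0.30, 0.20],
    ("disease", "iso1"): [0.40, 0.50],
    ("disease", "iso2"): [0.60, 0.50],
}

contrast = interaction_contrast(psi)
print(round(contrast, 2))
```

A contrast near zero means the isoform mix is the same in both groups; a large absolute value, as in this toy gene, indicates that isoform usage shifts with disease state.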
RNA-Seq of a large number of lung tissue samples has allowed us to identify novel disease-associated genes and alternative splicing patterns that may contribute to our understanding of the pathogenesis of IPF and COPD.