<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>1471-2164-14-14</ui>
	<ji>1471-2164</ji>
	<fm>
		<dochead>Methodology article</dochead>
		<bibl>
			<title>
				<p>Quality assessment and data handling methods for Affymetrix Gene 1.0 ST arrays with variable RNA integrity</p>
			</title>
			<aug>
				<au id="A1"><snm>Viljoen</snm><mi>S</mi><fnm>Katie</fnm><insr iid="I1"/><email>katie.viljoen@uct.ac.za</email></au>
				<au id="A2" ca="yes"><snm>Blackburn</snm><mi>M</mi><fnm>Jonathan</fnm><insr iid="I1"/><email>jonathan.blackburn@uct.ac.za</email></au>
			</aug>
			<insg>
				<ins id="I1"><p>Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Anzio Road, Observatory, Cape Town, 7925, South Africa</p></ins>
			</insg>
			<source>BMC Genomics</source>
			<section><title><p>Transcriptomics</p></title></section><issn>1471-2164</issn>
			<pubdate>2013</pubdate>
			<volume>14</volume>
			<issue>1</issue>
			<fpage>14</fpage>
			<url>http://www.biomedcentral.com/1471-2164/14/14</url>
			<xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-14-14</pubid><pubid idtype="pmpid">23324084</pubid></pubidlist></xrefbib>
		</bibl>
		<history><rec><date><day>13</day><month>9</month><year>2012</year></date></rec><acc><date><day>2</day><month>1</month><year>2013</year></date></acc><pub><date><day>16</day><month>1</month><year>2013</year></date></pub></history>
		<cpyrt><year>2013</year><collab>Viljoen and Blackburn; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
		<kwdg>
			<kwd>Gene expression profiling</kwd>
			<kwd>Microarray</kwd>
			<kwd>RNA quality</kwd>
			<kwd>RNA integrity number</kwd>
			<kwd>Quality control</kwd>
			<kwd>ComBat</kwd>
			<kwd>Surrogate variable analysis</kwd>
			<kwd>Non-biological experimental variance</kwd>
		</kwdg>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st>
					<p>RNA and microarray quality assessment form an integral part of gene expression analysis. Although methods such as the RNA integrity number (RIN) algorithm reliably assess RNA integrity, the relevance of RNA integrity in gene expression analysis, as well as analysis methods to accommodate the possible effects of degradation, requires further investigation. We investigated the relationship between RNA integrity and array quality on the commonly used Affymetrix Gene 1.0 ST array platform using reliable within-array and between-array quality assessment measures. The possibility of a transcript-specific bias in the apparent effect of RNA degradation on the measured gene expression signal was evaluated after either excluding quality-flagged arrays or compensating for RNA degradation at different steps in the analysis.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st>
					<p>Using probe-level and inter-array quality metrics to assess 34 Gene 1.0 ST array datasets derived from historical, paired tumour and normal primary colorectal cancer samples, 7 arrays (20.6%), with a mean sample RIN of 3.2 (SD = 0.42), were flagged during array quality assessment, while 10 arrays from samples with RINs &lt; 7 passed quality assessment, including one sample with a RIN &lt; 3. We detected a transcript length bias in RNA degradation in only 5.8% of annotated transcript clusters (p-value 0.05, |FC| &#8805; 2), with longer and shorter than average transcripts under- and overrepresented, respectively, in quality-flagged samples. Applying compensatory measures for RNA degradation performed at least as well as excluding quality-flagged arrays, as judged by hierarchical clustering, gene expression analysis and Ingenuity Pathway Analysis; importantly, use of these compensatory measures had the significant benefit of enabling lower quality array data from irreplaceable clinical samples to be retained in downstream analyses.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusions</p>
					</st>
					<p>Here, we demonstrate an effective array-quality assessment strategy that allows the user to recognize lower quality arrays, which can be included in the analysis once appropriate measures are applied to account for known or unknown sources of variation, such as array-quality and batch effects, by implementing ComBat or Surrogate Variable Analysis. This approach to quality control and analysis will be especially useful for clinical samples with variable and low RNA qualities, with RIN scores &#8805; 2.</p>
				</sec>
			</sec>
		</abs>
	</fm>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>RNA degradation is a common concern in gene expression analysis, especially for clinical samples where RNA degradation may occur before sample collection 
				<abbrgrp>
					<abbr bid="B1">1</abbr>
				</abbrgrp>. A wealth of archival material, either snap frozen or formalin fixed and paraffin embedded (FFPE), could potentially be used for gene expression analysis, given an appropriate method to evaluate and account for the effect of RNA degradation on the quality of downstream gene expression data. Methods such as the RNA integrity number (RIN) algorithm reliably assess RNA integrity by extracting features from the RNA electropherogram. The RIN algorithm was developed using learning tools to identify regions (features) indicative of RNA integrity in the electropherogram, which are then used to compile the RNA integrity number on a scale of 1 to 10. However, the relevance of RNA integrity in gene expression analysis, especially when there is large variability between samples, requires further investigation and validation on a platform-specific basis. The impact of RNA integrity on gene expression analysis has been investigated on both qRT-PCR and certain microarray platforms 
				<abbrgrp>
					<abbr bid="B2">2</abbr>
					<abbr bid="B3">3</abbr>
					<abbr bid="B4">4</abbr>
					<abbr bid="B5">5</abbr>
					<abbr bid="B6">6</abbr>
					<abbr bid="B7">7</abbr>
				</abbrgrp>. Opitz et al investigated the impact of RNA degradation on Agilent 44 k gene expression profiling by subjecting RNA from clinical biopsies to temperature-induced RNA degradation and comparing gene expression to the original, intact samples. Notably, less than 1% of genes were affected, even after substantial RNA degradation, where control and test samples had RINs of 9 and 5 respectively. The affected transcripts were relatively shorter, had lower GC content, or had probes relatively closer to the 5' region of the gene compared to more robust genes 
				<abbrgrp>
					<abbr bid="B6">6</abbr>
				</abbrgrp>. Although the process of RNA degradation is not fully understood, both exonuclease and endonuclease activity are likely to play an important role 
				<abbrgrp>
					<abbr bid="B6">6</abbr>
				</abbrgrp>. Classical oligo-dT based cDNA synthesis, which starts at the poly-A tail, will most certainly be compromised by exonuclease activity. In contrast, random priming does not rely on full length mRNA and is therefore, in theory, at least partially relieved from the effects of RNA degradation 
				<abbrgrp>
					<abbr bid="B6">6</abbr>
					<abbr bid="B7">7</abbr>
					<abbr bid="B8">8</abbr>
					<abbr bid="B9">9</abbr>
				</abbrgrp>.</p>
			<p>When using semi-degraded RNA for gene expression studies, reliable measures of array quality provide valuable information that can be used to guide downstream analysis. Microarray data quality may be defined in terms of accuracy (systematic bias between the true and measured value), precision (the uncertainty in replicated measures), specificity (the selective power of the measurement to respond only to the specific targets) and sensitivity (the expression range potentially covered by the measurement) 
				<abbrgrp>
					<abbr bid="B10">10</abbr>
				</abbrgrp>. Any attempt to utilise array quality results to guide downstream analysis should ideally take into account the possible effects of RNA degradation on sensitivity, specificity and accuracy. In previous work, Binder et al proposed a single-array preprocessing method that allows correction for systematic biases such as RNA degradation by utilising information on the 3'/5'-amplification bias and the sample-specific calling rate 
				<abbrgrp>
					<abbr bid="B10">10</abbr>
				</abbrgrp>. Lassmann et al proposed using a data adjustment method to allow comparative analysis of microarray datasets derived from fresh frozen vs. FFPE samples by centering the log intensities of each probe set independently to a mean of zero in both groups 
				<abbrgrp>
					<abbr bid="B8">8</abbr>
				</abbrgrp>. Chow et al evaluated the suitability of different quality control and preprocessing strategies for use with partially degraded RNA samples on the Illumina DASL-based gene expression assay using mean inter-array correlation and multivariate distance matrix regression (MDMR) as a measure of success 
				<abbrgrp>
					<abbr bid="B11">11</abbr>
				</abbrgrp>. Unfortunately, none of these studies is directly applicable to one of the most commonly used human transcriptomic microarray platforms, namely Affymetrix Gene 1.0 ST arrays, either because they do not use a random priming approach or because the design of the microarray platform differs substantially from Gene 1.0 ST arrays. We therefore identified two alternative approaches that might be used as compensatory methods. Firstly, Johnson et al developed an empirical Bayes algorithm, ComBat, to directly adjust for non-biological experimental variation. As the name implies, this method is most often used to adjust for batch effects, i.e. when microarrays are processed on different dates 
				<abbrgrp>
					<abbr bid="B12">12</abbr>
				</abbrgrp>. Secondly, Leek et al developed a method called Surrogate Variable Analysis (SVA), which examines the contribution of sources of signal due to unknown (surrogate) variables in high-dimensional data sets, which may confound the biological signal of interest 
				<abbrgrp>
					<abbr bid="B13">13</abbr>
				</abbrgrp>. The surrogate variables are constructed directly from the gene expression data: groups of genes that are affected by each source of variation are identified, and factors are then estimated for each array. These factors can be included in a linear model to adjust for unknown sources of noise, e.g. RNA or array quality.</p>
			<p>Here, we investigate the relationship between RNA integrity and array quality on Affymetrix Gene 1.0 ST arrays for 34 paired colorectal tumour and adjacent normal biopsies of highly variable RNA integrity. We assume that at a certain point on the RIN scale, RNA will be degraded to the extent that fragments are too small to analyse reliably, and for the purpose of this analysis we arbitrarily select a RIN cutoff of 2. We describe the within- and between-array quality control measures and analysis methods that we found most relevant for gene expression analysis of samples with highly variable RINs on Affymetrix Gene 1.0 ST arrays. We then investigate the possibility of a transcript-length dependency in RNA degradation. Finally, we apply array quality information to either exclude quality-flagged arrays, to directly adjust the data using the ComBat algorithm, or to account for unknown sources of variation (such as RNA integrity or array quality) in the model fitting process using SVA. The data discussed have been submitted to ArrayExpress under accession number E-MEXP-3715.</p>
		</sec>
		<sec>
			<st>
				<p>Results</p>
			</st>
			<sec>
				<st>
					<p>Array quality</p>
				</st>
				<p>We assessed array quality using within- and between-array measures &#8211; the former to assess raw data quality (Figure 
					<figr fid="F1">1a</figr> &amp; 
					<figr fid="F1">1b</figr>), and the latter to assess the quality of an array relative to a large publicly available collection of high quality Gene 1.0 ST arrays (Figure 
					<figr fid="F1">1c</figr>). Raw array quality was investigated at the probe level by calculating, for each array, the difference between the mean of the perfect match probes and the mean of the background probes, as well as the coefficient of variation (CV) across all probes. Preprocessed data quality was assessed using the global normalised, unscaled standard error (GNUSE) 
					<abbrgrp>
						<abbr bid="B14">14</abbr>
					</abbrgrp>. See Methods section for details.</p>
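				<p>As a rough illustration of the two raw, probe-level measures described above, the following sketch computes a per-array CV and a perfect match minus background difference. The function names, the toy intensities, the pooling of perfect match and background probes for the CV, and the intensity scale are all assumptions made for illustration; the Methods section remains the reference for the definitions actually used.</p>

```python
from statistics import mean, pstdev


def probe_level_cv(intensities):
    """Coefficient of variation (SD divided by mean) across one array's probes."""
    return pstdev(intensities) / mean(intensities)


def pm_minus_bg(pm_intensities, bg_intensities):
    """Mean perfect match intensity minus mean background intensity."""
    return mean(pm_intensities) - mean(bg_intensities)


# Hypothetical toy intensities for two arrays (not data from this study).
toy_arrays = {
    "array_A": {"pm": [9.0, 11.0, 12.0, 8.0], "bg": [4.0, 5.0, 6.0, 5.0]},
    "array_B": {"pm": [6.0, 7.0, 6.5, 6.5], "bg": [5.0, 5.5, 6.0, 5.5]},
}

metrics = {}
for name, probes in toy_arrays.items():
    # Assumption: "all probes" is taken as PM and background probes pooled.
    all_probes = probes["pm"] + probes["bg"]
    metrics[name] = {
        "cv": probe_level_cv(all_probes),
        "pm_minus_bg": pm_minus_bg(probes["pm"], probes["bg"]),
    }
    print(name, round(metrics[name]["cv"], 3), metrics[name]["pm_minus_bg"])
```

				<p>In this toy input, array_B shows both a smaller spread across probes and a smaller separation between perfect match and background signal than array_A.</p>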
				<fig id="F1"><title><p>Figure 1</p></title><caption><p>Array quality metrics</p></caption><text>
   <p><b>Array quality metrics. a</b>) Raw coefficient of variation across all probes by sample; the red line represents our chosen threshold, calculated as 2 SD from the mean of CVs for arrays with RINs > 6. <b>b</b>) Raw perfect match mean minus background mean. <b>c</b>) Global normalised unscaled standard errors (GNUSE) across probes for each array. Samples that were flagged during quality assessment are highlighted in red.</p>
</text><graphic file="1471-2164-14-14-1"/></fig>
				<p>The 34 RNA samples used in this study had a mean RIN of 6.3 and a standard deviation of 2.0. Samples that failed all three measures of quality had RINs between 2 and 3.3 as summarised in Table 
					<tblr tid="T1">1</tblr>. Samples were ranked by GNUSE median and we found a good concordance in terms of ranking between the different quality control metrics. Samples that failed at least two out of the three quality measures were flagged for downstream analysis, resulting in 7 out of 34 samples being flagged (mean RIN = 3.2; SD = 0.42). Interestingly, for one sample with a RIN of 2.6, array quality was not compromised, as judged by our quality measures. The possibility of a RIN-independent RNA quality factor, such as chemical purity, was investigated by performing a two-tailed Student&#8217;s t-test comparing A260/230 ratios between quality-flagged and quality-passed sample groups, but no significant association was found (p-value = 0.14).</p>
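				<p>The flagging rule can be sketched as follows, using the pass/fail calls transcribed from the first ten rows of Table 1 (every remaining row passes all three measures). The function name and data layout are hypothetical, and the choice of population rather than sample standard deviation is an assumption.</p>

```python
from statistics import mean, pstdev

# (sample, RIN, GNUSE ok, probe-level CV ok, PM-BG ok), transcribed from Table 1.
rows = [
    ("44N", 3.0, False, False, False),
    ("33T", 2.8, False, False, False),
    ("60N", 3.2, False, False, False),
    ("63T", 3.0, False, False, True),
    ("10T", 3.2, False, False, False),
    ("56T", 3.3, False, False, False),
    ("41T", 4.2, False, False, True),
    ("13N", 4.6, True, False, True),
    ("15T", 4.8, True, False, True),
    ("4N", 2.6, True, True, True),
]


def is_flagged(gnuse_ok, cv_ok, pm_bg_ok, min_failures=2):
    """Flag an array that fails at least min_failures of the three measures."""
    failures = [gnuse_ok, cv_ok, pm_bg_ok].count(False)
    return failures >= min_failures


flagged = [(name, rin) for name, rin, g, c, p in rows if is_flagged(g, c, p)]
flagged_rins = [rin for _, rin in flagged]

print("flagged samples:", [name for name, _ in flagged])
print("mean RIN:", round(mean(flagged_rins), 1))
# Assumption: population SD; the sample SD (statistics.stdev) differs slightly.
print("SD of RIN:", round(pstdev(flagged_rins), 2))
```

				<p>Applied to these rows, the rule flags seven arrays; the mean RIN of the flagged samples rounds to 3.2 and the population standard deviation to 0.42, while sample 4N (RIN 2.6) passes all three measures.</p>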
				<table id="T1">
					<title>
						<p>Table 1</p>
					</title>
					<caption>
						<p>
							<b>Array quality assessment summary</b>
						</p>
					</caption>
					<tgroup align="left" cols="7">
						<colspec align="left" colname="c1" colnum="1" colwidth="1*"/>
						<colspec align="center" colname="c2" colnum="2" colwidth="1*"/>
						<colspec align="center" colname="c3" colnum="3" colwidth="1*"/>
						<colspec align="center" colname="c4" colnum="4" colwidth="1*"/>
						<colspec align="center" colname="c5" colnum="5" colwidth="1*"/>
						<colspec align="center" colname="c6" colnum="6" colwidth="1*"/>
						<colspec align="center" colname="c7" colnum="7" colwidth="1*"/>
						<thead valign="top">
							<row rowsep="1">
								<entry colname="c1">
									<p>
										<b>Sample ID</b>
									</p>
								</entry>
								<entry align="center" colname="c2">
									<p>
										<b>RIN</b>
									</p>
								</entry>
								<entry align="center" colname="c3">
									<p>
										<b>RNA 260/230 ratio</b>
									</p>
								</entry>
								<entry align="center" colname="c4">
									<p>
										<b>GNUSE</b>
									</p>
								</entry>
								<entry align="center" colname="c5">
									<p>
										<b>probe-level CV</b>
									</p>
								</entry>
								<entry align="center" colname="c6">
									<p>
										<b>PM-BG</b>
									</p>
								</entry>
								<entry align="center" colname="c7">
									<p>
										<b>Array weight</b>
									</p>
								</entry>
							</row>
						</thead>
						<tfoot>
							<p>Array performance is ranked for each measure, with rank 1 indicating the worst quality. Samples highlighted in bold were flagged for downstream analysis.</p>
						</tfoot>
						<tbody valign="top">
							<row rowsep="1">
								<entry colname="c1">
									<p>
										<b>44N</b>
									</p>
								</entry>
								<entry align="center" colname="c2">
									<p>3</p>
								</entry>
								<entry align="center" colname="c3">
									<p>2.41</p>
								</entry>
								<entry align="center" colname="c4">
									<p>
										<b>fail (1)</b>
									</p>
								</entry>
								<entry align="center" colname="c5">
									<p>
										<b>fail (3)</b>
									</p>
								</entry>
								<entry align="center" colname="c6">
									<p>
										<b>fail (1)</b>
									</p>
								</entry>
								<entry align="center" colname="c7">
									<p>
										<b>0.22 (1)</b>
									</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>
										<b>33T</b>
									</p>
								</entry>
								<entry align="center" colname="c2">
									<p>2.8</p>
								</entry>
								<entry align="center" colname="c3">
									<p>2.08</p>
								</entry>
								<entry align="center" colname="c4">
									<p>
										<b>fail (2)</b>
									</p>
								</entry>
								<entry align="center" colname="c5">
									<p>
										<b>fail (5)</b>
									</p>
								</entry>
								<entry align="center" colname="c6">
									<p>
										<b>fail (2)</b>
									</p>
								</entry>
								<entry align="center" colname="c7">
									<p>
										<b>0.28 (2)</b>
									</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>
										<b>60N</b>
									</p>
								</entry>
								<entry align="center" colname="c2">
									<p>3.2</p>
								</entry>
								<entry align="center" colname="c3">
									<p>2.03</p>
								</entry>
								<entry align="center" colname="c4">
									<p>
										<b>fail (3)</b>
									</p>
								</entry>
								<entry align="center" colname="c5">
									<p>
										<b>fail (1)</b>
									</p>
								</entry>
								<entry align="center" colname="c6">
									<p>
										<b>fail (3)</b>
									</p>
								</entry>
								<entry align="center" colname="c7">
									<p>
										<b>0.42 (3)</b>
									</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>
										<b>63T</b>
									</p>
								</entry>
								<entry align="center" colname="c2">
									<p>3</p>
								</entry>
								<entry align="center" colname="c3">
									<p>2.2</p>
								</entry>
								<entry align="center" colname="c4">
									<p>
										<b>fail (4)</b>
									</p>
								</entry>
								<entry align="center" colname="c5">
									<p>
										<b>fail (4)</b>
									</p>
								</entry>
								<entry align="center" colname="c6">
									<p>
										<b>pass</b>
									</p>
								</entry>
								<entry align="center" colname="c7">
									<p>
										<b>0.59 (6)</b>
									</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>
										<b>10T</b>
									</p>
								</entry>
								<entry align="center" colname="c2">
									<p>3.2</p>
								</entry>
								<entry align="center" colname="c3">
									<p>2.18</p>
								</entry>
								<entry align="center" colname="c4">
									<p>
										<b>fail (5)</b>
									</p>
								</entry>
								<entry align="center" colname="c5">
									<p>
										<b>fail (2)</b>
									</p>
								</entry>
								<entry align="center" colname="c6">
									<p>
										<b>fail (4)</b>
									</p>
								</entry>
								<entry align="center" colname="c7">
									<p>
										<b>0.60 (7)</b>
									</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>
										<b>56T</b>
									</p>
								</entry>
								<entry align="center" colname="c2">
									<p>3.3</p>
								</entry>
								<entry align="center" colname="c3">
									<p>1.87</p>
								</entry>
								<entry align="center" colname="c4">
									<p>
										<b>fail (6)</b>
									</p>
								</entry>
								<entry align="center" colname="c5">
									<p>
										<b>fail (10)</b>
									</p>
								</entry>
								<entry align="center" colname="c6">
									<p>
										<b>fail (5)</b>
									</p>
								</entry>
								<entry align="center" colname="c7">
									<p>
										<b>0.42 (4)</b>
									</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>
										<b>41T</b>
									</p>
								</entry>
								<entry align="center" colname="c2">
									<p>4.2</p>
								</entry>
								<entry align="center" colname="c3">
									<p>2.21</p>
								</entry>
								<entry align="center" colname="c4">
									<p>
										<b>fail (7)</b>
									</p>
								</entry>
								<entry align="center" colname="c5">
									<p>
										<b>fail (9)</b>
									</p>
								</entry>
								<entry align="center" colname="c6">
									<p>
										<b>pass</b>
									</p>
								</entry>
								<entry align="center" colname="c7">
									<p>
										<b>0.78 (8)</b>
									</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>13N</p>
								</entry>
								<entry align="center" colname="c2">
									<p>4.6</p>
								</entry>
								<entry align="center" colname="c3">
									<p>2.24</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>fail (7)</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>0.82 (9)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>15T</p>
								</entry>
								<entry align="center" colname="c2">
									<p>4.8</p>
								</entry>
								<entry align="center" colname="c3">
									<p>2.15</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>fail (8)</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>1.07 (15)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>4N</p>
								</entry>
								<entry align="center" colname="c2">
									<p>2.6</p>
								</entry>
								<entry align="center" colname="c3">
									<p>1.62</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>0.44 (5)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>18N</p>
								</entry>
								<entry align="center" colname="c2">
									<p>7.1</p>
								</entry>
								<entry align="center" colname="c3">
									<p>1.66</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>0.83 (10)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>8T</p>
								</entry>
								<entry align="center" colname="c2">
									<p>8.5</p>
								</entry>
								<entry align="center" colname="c3">
									<p>2.16</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>0.85 (11)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>56N</p>
								</entry>
								<entry align="center" colname="c2">
									<p>6.5</p>
								</entry>
								<entry align="center" colname="c3">
									<p>1.94</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>0.95 (12)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>20T</p>
								</entry>
								<entry align="center" colname="c2">
									<p>7.4</p>
								</entry>
								<entry align="center" colname="c3">
									<p>1.6</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>1.02 (13)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>44T</p>
								</entry>
								<entry align="center" colname="c2">
									<p>6.9</p>
								</entry>
								<entry align="center" colname="c3">
									<p>1.72</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>1.03 (14)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>11T</p>
								</entry>
								<entry align="center" colname="c2">
									<p>8.6</p>
								</entry>
								<entry align="center" colname="c3">
									<p>2.16</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>1.07 (16)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>60T</p>
								</entry>
								<entry align="center" colname="c2">
									<p>6.4</p>
								</entry>
								<entry align="center" colname="c3">
									<p>1.64</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>1.09 (17)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>14T</p>
								</entry>
								<entry align="center" colname="c2">
									<p>6.4</p>
								</entry>
								<entry align="center" colname="c3">
									<p>1.76</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>1.09 (18)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>13T</p>
								</entry>
								<entry align="center" colname="c2">
									<p>8.3</p>
								</entry>
								<entry align="center" colname="c3">
									<p>2</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>1.11 (19)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>23T</p>
								</entry>
								<entry align="center" colname="c2">
									<p>7</p>
								</entry>
								<entry align="center" colname="c3">
									<p>2.17</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>1.18 (20)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>8N</p>
								</entry>
								<entry align="center" colname="c2">
									<p>7.1</p>
								</entry>
								<entry align="center" colname="c3">
									<p>2.22</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>1.25 (21)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>18T</p>
								</entry>
								<entry align="center" colname="c2">
									<p>7.4</p>
								</entry>
								<entry align="center" colname="c3">
									<p>1.85</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>1.26 (22)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>33N</p>
								</entry>
								<entry align="center" colname="c2">
									<p>8.1</p>
								</entry>
								<entry align="center" colname="c3">
									<p>1.82</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>1.45 (23)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>34T</p>
								</entry>
								<entry align="center" colname="c2">
									<p>8</p>
								</entry>
								<entry align="center" colname="c3">
									<p>2.25</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>1.49 (24)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>11N</p>
								</entry>
								<entry align="center" colname="c2">
									<p>6.8</p>
								</entry>
								<entry align="center" colname="c3">
									<p>1.94</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>1.50 (25)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>20N</p>
								</entry>
								<entry align="center" colname="c2">
									<p>7.3</p>
								</entry>
								<entry align="center" colname="c3">
									<p>2.11</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>1.50 (26)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>63N</p>
								</entry>
								<entry align="center" colname="c2">
									<p>7.5</p>
								</entry>
								<entry align="center" colname="c3">
									<p>2.13</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>1.61 (27)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>23N</p>
								</entry>
								<entry align="center" colname="c2">
									<p>8.4</p>
								</entry>
								<entry align="center" colname="c3">
									<p>2.02</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>1.61 (28)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>34N</p>
								</entry>
								<entry align="center" colname="c2">
									<p>8.3</p>
								</entry>
								<entry align="center" colname="c3">
									<p>2.21</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>1.61 (29)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>14N</p>
								</entry>
								<entry align="center" colname="c2">
									<p>8.1</p>
								</entry>
								<entry align="center" colname="c3">
									<p>2.36</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>1.74 (30)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>41N</p>
								</entry>
								<entry align="center" colname="c2">
									<p>5.4</p>
								</entry>
								<entry align="center" colname="c3">
									<p>2.07</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>1.76 (31)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>10N</p>
								</entry>
								<entry align="center" colname="c2">
									<p>7.3</p>
								</entry>
								<entry align="center" colname="c3">
									<p>1.78</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>1.78 (32)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>15N</p>
								</entry>
								<entry align="center" colname="c2">
									<p>6.9</p>
								</entry>
								<entry align="center" colname="c3">
									<p>2.16</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>1.90 (33)</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>4T</p>
								</entry>
								<entry align="center" colname="c2">
									<p>8.4</p>
								</entry>
								<entry align="center" colname="c3">
									<p>2.25</p>
								</entry>
								<entry align="center" colname="c4">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c5">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c6">
									<p>pass</p>
								</entry>
								<entry align="center" colname="c7">
									<p>2.14 (34)</p>
								</entry>
							</row>
						</tbody>
					</tgroup>
				</table>
			</sec>
			<sec>
				<st>
					<p>Transcript-dependent effects of RNA degradation on accuracy</p>
				</st>
				<p>To investigate a possible probe-positional intensity bias related to RNA integrity, we plotted the mean probe intensity from the 5' to the 3' end of the sequence using 4644/32321 (14.4%) of transcript clusters for Gene 1.0 ST arrays and 54130/54675 (99%) of probesets for HGU133-plus2 arrays. The number of probes per set varies for Gene 1.0 ST arrays, so we selected the largest group (N = 4644), which had exactly 25 probes/set. Interestingly, for the 4644 transcript clusters displayed in Figure 
					<figr fid="F2">2</figr>, Gene 1.0 ST arrays do not display the same probe-positional intensity bias typically seen in oligo-dT based arrays such as the HGU133-plus2 arrays.</p>
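				<p>The positional averaging described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code used in this study: the function name mean_intensity_by_position and the toy intensities are hypothetical, and each probe set is assumed to be ordered from the 5' to the 3' end.</p>

```python
from statistics import mean


def mean_intensity_by_position(probesets):
    """Average log-intensity at each probe position across probe sets.

    probesets: list of equal-length lists for one array, each ordered
    from the 5' end to the 3' end of the transcript.
    """
    n_probes = len(probesets[0])
    if any(len(ps) != n_probes for ps in probesets):
        raise ValueError("all probe sets must have the same number of probes")
    # zip(*probesets) groups the values found at the same probe position
    return [mean(column) for column in zip(*probesets)]


# Hypothetical toy data: three probe sets of four probes each (one array)
toy_array = [
    [8.0, 8.5, 9.0, 9.5],
    [6.0, 6.5, 7.0, 7.5],
    [7.0, 7.5, 8.0, 8.5],
]
profile = mean_intensity_by_position(toy_array)
print(profile)
```

				<p>Plotting one such profile per array gives one line per array, as in Figure 2; a profile that rises steadily towards the 3' end would indicate a positional bias.</p>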
				<fig id="F2"><title><p>Figure 2</p></title><caption><p>Mean probe intensity by probe position</p></caption><text>
   <p><b>Mean probe intensity by probe position. </b>Each line represents an array for <b>a</b>) Gene 1.0 ST arrays: transcript clusters with exactly 25 probes (N = 4644) and <b>b</b>) HGU133-plus2 arrays previously analysed with a subset of the cohort: probesets with exactly 11 probes per probeset.</p>
</text><graphic file="1471-2164-14-14-2"/></fig>
				<p>We next investigated which genes were most affected in our quality-flagged category and identified 1994 out of 21943 annotated transcript clusters (with 1172 uniquely identified genes) that were significantly different (|fold change| &#8805; 2, adjusted p-value &#8804; 0.05) between the two quality categories previously discussed. Of the 1172 uniquely identified genes, 1032 and 140 showed decreased and increased intensity in the quality-flagged category, respectively (Figure 
					<figr fid="F3">3a</figr>). To investigate transcript characteristics in the genes most affected, we compared transcript lengths (taken as the median cDNA length for each gene) between the different groups. Compared to the unaffected genes, median cDNA lengths of genes that showed increased intensity in the quality-flagged category were significantly shorter (p-value &lt; 2.2 <it>e</it> &#8722; 16), while those of genes with decreased intensity were significantly longer (p-value = 2.9 <it>e</it> &#8722; 9), as judged using the Mann-Whitney test (Figure 
					<figr fid="F3">3b</figr>).</p>
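				<p>For readers unfamiliar with the Mann-Whitney test, the length comparison can be illustrated with the sketch below. It is a hedged, self-contained Python example rather than the analysis code used here: it uses the normal approximation with tie-corrected variance and no continuity correction, and the function name mann_whitney_u and the cDNA lengths are made up for illustration.</p>

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U statistic for x, p-value). Tied values receive average
    ranks and the variance is tie-corrected; no continuity correction.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while n > i:
        j = i
        # extend j to the end of the run of values tied with pooled[i]
        while n > j + 1 and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        t = j - i + 1
        tie_term += t ** 3 - t
        n_from_x = sum(1 for k in range(i, j + 1) if pooled[k][1] == 0)
        rank_sum_x += avg_rank * n_from_x
        i = j + 1
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_x - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u_x, p


# Hypothetical median cDNA lengths (bp); values are made up for illustration
up_lengths = [300, 450, 500, 620, 700, 800]
unaffected_lengths = [900, 1100, 1300, 1500, 1800, 2100, 2500, 3000]
u_stat, p_value = mann_whitney_u(up_lengths, unaffected_lengths)
print(u_stat, p_value)
```

				<p>For real data, an established routine such as scipy.stats.mannwhitneyu would normally be preferred over a hand-written version; the sketch only makes the ranking step explicit.</p>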
				<fig id="F3"><title><p>Figure 3</p></title><caption><p>Characteristics of genes most affected by RNA degradation</p></caption><text>
   <p><b>Characteristics of genes most affected by RNA degradation. </b>Comparison of samples that either passed or were flagged during QC. <b>a</b>) Fold change distribution of annotated transcript clusters comparing samples that were flagged vs. samples that passed QC <b>b</b>) Gene lengths of uniquely identified genes. Expression signal significantly increased (Up) or decreased (Down) with respect to the &#8216;Unaffected&#8217; group, judged using a Mann-Whitney test. Adjusted p-value &#8804; 0.01, |fold change| > 2.</p>
</text><graphic file="1471-2164-14-14-3"/></fig>
			</sec>
			<sec>
				<st>
					<p>Quality dependent methods of data adjustment and analysis</p>
				</st>
				<p>After assigning samples to two categories according to array quality measures, we next assessed the performance of the five preprocessing and analysis methods. Broadly speaking, the data were directly adjusted for quality effects using ComBat, quality-flagged samples were excluded from the analysis, or possible quality effects were addressed by including known or unknown sources of non-biological variance in the linear model fit to assess differential expression.</p>
				<p>The five methods of data preprocessing and analysis, further detailed in the Methods section, were: 1) Estimating array quality weights which were then included in the linear model fit; 2) Excluding quality-flagged arrays from the analysis; 3) Applying a batch correction algorithm, ComBat, 
					<abbrgrp>
						<abbr bid="B12">12</abbr>
					</abbrgrp> to directly adjust the data according to quality, where arrays were divided into two categories according to the array quality assessment; 4) Including &#8220;quality&#8221; and &#8220;batch&#8221; as factors in the linear model together with disease status; 5) Estimating possible unknown sources of non-biological variance, such as quality, by SVA, with the output incorporated into the linear model fit 
					<abbrgrp>
						<abbr bid="B13">13</abbr>
					</abbrgrp>.</p>
				<p>To assess the effect of using ComBat for direct data adjustment, hierarchical clustering using Euclidean distance was performed before and after direct adjustment (Figure 
					<figr fid="F4">4</figr>). We chose Euclidean distance based on research by Gibbons et al., who demonstrated that, for log-transformed expression data, Euclidean distance is more appropriate than Pearson&#8217;s correlation coefficient 
					<abbrgrp>
						<abbr bid="B15">15</abbr>
					</abbrgrp>. Before adjustment, samples that were flagged during quality assessment cluster closely together, irrespective of the disease status of the samples. After adjustment, the maximum distance between samples is greatly reduced, and quality-flagged samples no longer cluster together. Also, samples segregate more clearly by disease status after adjustment. Furthermore, applying ComBat clearly has a stabilising effect on the transcript clusters most affected by RNA quality (Figure 
					<figr fid="F5">5b</figr> &amp; 
					<figr fid="F5">5c</figr>).</p>
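				<p>As an illustration of the clustering step, the sketch below performs average linkage hierarchical clustering on Euclidean distances in plain Python. It is a simplified example under stated assumptions, not the implementation used for Figure 4: the function names, sample labels and expression values are hypothetical.</p>

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def average_linkage(profiles):
    """Agglomerative clustering; returns the merge history.

    profiles: dict mapping sample name to a list of log2 expression values.
    Each merge is (members_a, members_b, height), where height is the mean
    pairwise Euclidean distance between the two clusters being joined.
    """
    def height(pair):
        a, b = pair
        dists = [euclidean(profiles[s], profiles[t]) for s in a for t in b]
        return sum(dists) / len(dists)

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # join the two clusters with the smallest average distance
        a, b = min(combinations(clusters, 2), key=height)
        merges.append((a, b, height((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical log2 expression values for two genes in four samples
toy_profiles = {
    "1T": [1.0, 1.0],
    "2T": [1.0, 2.0],
    "1N": [5.0, 5.0],
    "2N": [5.0, 6.5],
}
merges = average_linkage(toy_profiles)
for members_a, members_b, h in merges:
    print(members_a, members_b, round(h, 3))
```

				<p>The merge heights correspond to the vertical axis of a dendrogram. For full-size data sets, a library routine such as scipy.cluster.hierarchy.linkage with method="average" is a more practical choice.</p>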
				<fig id="F4"><title><p>Figure 4</p></title><caption><p>Expression profiles of samples clustered using average linkage hierarchical clustering</p></caption><text>
   <p><b>Expression profiles of samples clustered using average linkage hierarchical clustering. a</b>) Sample clustering after preprocessing. <b>b</b>) Sample clustering after preprocessing and correction for batch and quality using ComBat. Samples that were flagged during quality assessment are highlighted in red. The dissimilarity measure (height) used was 1- Pearson correlation of the log<sub>2</sub>-transformed expression values.</p>
</text><graphic file="1471-2164-14-14-4"/></fig>
				<fig id="F5"><title><p>Figure 5</p></title><caption><p>Boxplots of frma expression</p></caption><text>
   <p><b>Boxplots of frma expression. a</b>) All transcript clusters. <b>b</b>) Genes most affected by quality (adjusted p-value &#8804; 0.01, |fold change| > 2). <b>c</b>) Samples that were flagged during quality assessment are highlighted in red.</p>
</text><graphic file="1471-2164-14-14-5"/></fig>
				<p>SVA identified two surrogate variables that were subsequently used in downstream analysis. Plotting the estimates of these surrogate variables for each sample revealed a pattern whereby samples were clearly grouped by batch and quality (Figure 
					<figr fid="F6">6</figr>). Importantly, SVA identified these two variables without supervision.</p>
				<fig id="F6"><title><p>Figure 6</p></title><caption><p>Surrogate variable analysis results</p></caption><text>
   <p><b>Surrogate variable analysis results. </b>Samples that were flagged during quality assessment are highlighted in red. Two latent variables were identified by SVA. Circles and triangles represent samples from two different batches.</p>
</text><graphic file="1471-2164-14-14-6"/></fig>
				<p>To evaluate the performance of each method, we first compared the number of differentially expressed genes detected between tumour and normal samples at a stringent p-value of 0.01. For our analysis, we did not use a fold change cutoff since we feel that artificial fold change cutoffs, which exclude subtle changes in the expression of many genes, may result in the loss of valuable biological information, or worse, affect the interpretation of the data &#8211; this is particularly true for applications such as network/pathway analysis 
					<abbrgrp>
						<abbr bid="B16">16</abbr>
					</abbrgrp>.</p>
				<p>SVA and ComBat detected 2137 and 1945 genes (p-value &#8804; 0.01), respectively. The top four methods had 1117 differentially expressed genes in common (Figure 
					<figr fid="F7">7</figr>). At the commonly used p-value and fold change cutoffs of 0.05 and 2, respectively, SVA, ComBat, ArrayWeights and excluding arrays produced 447, 475, 461 and 521 differentially expressed genes, respectively, suggesting similar performance under these criteria. We next assessed the relevance of these differentially expressed genes in colorectal cancer using Ingenuity Pathway Analysis (IPA), where statistically significant over-representation of our listed genes in a given process, such as &#8220;colorectal tumour&#8221; or &#8220;infection of embryonic cell lines&#8221;, is scored by p-value.</p>
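				<p>The comparison of gene lists across methods amounts to thresholding adjusted p-values and intersecting the resulting sets. The following Python sketch shows the idea under stated assumptions; the function name significant_genes, the method labels, the gene identifiers and the p-values are all hypothetical and are not results from this study.</p>

```python
def significant_genes(results, p_cutoff=0.01):
    """Return the genes whose adjusted p-value is at or below the cutoff."""
    return {gene for gene, adj_p in results.items() if p_cutoff >= adj_p}


# Hypothetical adjusted p-values per method (tumour vs. normal)
results_by_method = {
    "exclude_flagged": {"GENE_A": 0.002, "GENE_B": 0.009, "GENE_C": 0.030, "GENE_D": 0.400},
    "SVA": {"GENE_A": 0.001, "GENE_B": 0.004, "GENE_C": 0.008, "GENE_D": 0.200},
    "ComBat": {"GENE_A": 0.001, "GENE_B": 0.006, "GENE_C": 0.009, "GENE_D": 0.050},
}

hits = {method: significant_genes(res) for method, res in results_by_method.items()}

# Genes called by every method (the centre of a Venn diagram)
common = set.intersection(*hits.values())

# Genes gained by SVA relative to excluding quality-flagged arrays
extra_sva = hits["SVA"] - hits["exclude_flagged"]

print(sorted(common), sorted(extra_sva))
```

				<p>No fold change filter is applied in this sketch, mirroring the p-value-only criterion used for the main comparison.</p>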
				<fig id="F7"><title><p>Figure 7</p></title><caption><p>Venn diagram of unique differentially expressed genes (tumour vs. normal) with adjusted p-values &#8804; 0.01 for the four best-performing methods</p></caption><text>
   <p><b>Venn diagram of unique differentially expressed genes (tumour vs. normal) with adjusted p-values &#8804; 0.01 for the four best-performing methods. </b><b>A</b> - removing quality-flagged arrays before analysis. <b>B</b> - applying SVA to batch corrected data. <b>C</b> - ComBat used to correct for batch and quality. <b>D</b> - Array weights included in the linear model.</p>
</text><graphic file="1471-2164-14-14-7"/></fig>
				<p>We considered the top 10 functions for each method (Table 
					<tblr tid="T2">2</tblr>), from which it was clear that the 615 and 423 additional genes identified as differentially expressed by SVA and ComBat, respectively, compared with those obtained when excluding quality-flagged arrays, were relevant to colorectal cancer. Using IPA, we considered the top 10 upstream regulators (highest absolute activation z-scores) when comparing tumour vs. normal samples, to further investigate the utility of SVA or ComBat as suitable analysis methods when including low-RIN samples (Table 
					<tblr tid="T3">3</tblr>). We found considerable overlap in the identity and direction of activation of these upstream regulators between the methods compared.</p>
				<table id="T2">
					<title>
						<p>Table 2</p>
					</title>
					<caption>
						<p>
							<b>P-values for evidence for overrepresentation in the functions listed for each method</b>
						</p>
					</caption>
					<tgroup align="left" cols="6">
						<colspec align="left" colname="c1" colnum="1" colwidth="1*"/>
						<colspec align="center" colname="c2" colnum="2" colwidth="1*"/>
						<colspec align="center" colname="c3" colnum="3" colwidth="1*"/>
						<colspec align="center" colname="c4" colnum="4" colwidth="1*"/>
						<colspec align="center" colname="c5" colnum="5" colwidth="1*"/>
						<colspec align="center" colname="c6" colnum="6" colwidth="1*"/>
						<thead valign="top">
							<row rowsep="1">
								<entry colname="c1">
									<p>
										<b>Functions</b>
									</p>
								</entry>
								<entry align="center" colname="c2">
									<p>
										<b>A</b>
									</p>
								</entry>
								<entry align="center" colname="c3">
									<p>
										<b>B</b>
									</p>
								</entry>
								<entry align="center" colname="c4">
									<p>
										<b>C</b>
									</p>
								</entry>
								<entry align="center" colname="c5">
									<p>
										<b>D</b>
									</p>
								</entry>
								<entry align="center" colname="c6">
									<p>
										<b>E</b>
									</p>
								</entry>
							</row>
						</thead>
						<tfoot>
							<p>A - excluding quality-flagged arrays from the analysis. B - applying SVA to batch corrected data. C - ComBat used to correct for batch and quality. D - Array weights included in the linear model. E - including batch and quality as factors in the linear model.</p>
						</tfoot>
						<tbody valign="top">
							<row rowsep="1">
								<entry colname="c1">
									<p>Cancer</p>
								</entry>
								<entry align="center" colname="c2">
									<p>7.72E-29</p>
								</entry>
								<entry align="center" colname="c3">
									<p>NA</p>
								</entry>
								<entry align="center" colname="c4">
									<p>8.15E-24</p>
								</entry>
								<entry align="center" colname="c5">
									<p>NA</p>
								</entry>
								<entry align="center" colname="c6">
									<p>3.38E-23</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>cancer</p>
								</entry>
								<entry align="center" colname="c2">
									<p>NA</p>
								</entry>
								<entry align="center" colname="c3">
									<p>2.58E-25</p>
								</entry>
								<entry align="center" colname="c4">
									<p>NA</p>
								</entry>
								<entry align="center" colname="c5">
									<p>2.70E-26</p>
								</entry>
								<entry align="center" colname="c6">
									<p>NA</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>carcinoma</p>
								</entry>
								<entry align="center" colname="c2">
									<p>8.64E-37</p>
								</entry>
								<entry align="center" colname="c3">
									<p>2.52E-33</p>
								</entry>
								<entry align="center" colname="c4">
									<p>1.87E-34</p>
								</entry>
								<entry align="center" colname="c5">
									<p>2.87E-32</p>
								</entry>
								<entry align="center" colname="c6">
									<p>5.56E-30</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>colon cancer</p>
								</entry>
								<entry align="center" colname="c2">
									<p>1.30E-26</p>
								</entry>
								<entry align="center" colname="c3">
									<p>1.19E-36</p>
								</entry>
								<entry align="center" colname="c4">
									<p>1.10E-26</p>
								</entry>
								<entry align="center" colname="c5">
									<p>1.99E-21</p>
								</entry>
								<entry align="center" colname="c6">
									<p>3.29E-21</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>colon tumor</p>
								</entry>
								<entry align="center" colname="c2">
									<p>1.10E-26</p>
								</entry>
								<entry align="center" colname="c3">
									<p>4.31E-37</p>
								</entry>
								<entry align="center" colname="c4">
									<p>3.65E-27</p>
								</entry>
								<entry align="center" colname="c5">
									<p>1.80E-21</p>
								</entry>
								<entry align="center" colname="c6">
									<p>7.28E-22</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>colorectal cancer</p>
								</entry>
								<entry align="center" colname="c2">
									<p>2.27E-26</p>
								</entry>
								<entry align="center" colname="c3">
									<p>4.74E-29</p>
								</entry>
								<entry align="center" colname="c4">
									<p>2.43E-26</p>
								</entry>
								<entry align="center" colname="c5">
									<p>1.11E-21</p>
								</entry>
								<entry align="center" colname="c6">
									<p>1.98E-23</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>colorectal tumor</p>
								</entry>
								<entry align="center" colname="c2">
									<p>2.28E-26</p>
								</entry>
								<entry align="center" colname="c3">
									<p>6.80E-29</p>
								</entry>
								<entry align="center" colname="c4">
									<p>2.97E-26</p>
								</entry>
								<entry align="center" colname="c5">
									<p>4.67E-22</p>
								</entry>
								<entry align="center" colname="c6">
									<p>1.72E-23</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>digestive organ tumor</p>
								</entry>
								<entry align="center" colname="c2">
									<p>2.68E-31</p>
								</entry>
								<entry align="center" colname="c3">
									<p>6.82E-32</p>
								</entry>
								<entry align="center" colname="c4">
									<p>1.24E-28</p>
								</entry>
								<entry align="center" colname="c5">
									<p>7.27E-27</p>
								</entry>
								<entry align="center" colname="c6">
									<p>2.72E-29</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>epithelial tumor</p>
								</entry>
								<entry align="center" colname="c2">
									<p>2.16E-38</p>
								</entry>
								<entry align="center" colname="c3">
									<p>NA</p>
								</entry>
								<entry align="center" colname="c4">
									<p>2.27E-35</p>
								</entry>
								<entry align="center" colname="c5">
									<p>NA</p>
								</entry>
								<entry align="center" colname="c6">
									<p>1.11E-30</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>gastrointestinal tract cancer</p>
								</entry>
								<entry align="center" colname="c2">
									<p>2.35E-25</p>
								</entry>
								<entry align="center" colname="c3">
									<p>2.42E-28</p>
								</entry>
								<entry align="center" colname="c4">
									<p>4.00E-24</p>
								</entry>
								<entry align="center" colname="c5">
									<p>3.19E-21</p>
								</entry>
								<entry align="center" colname="c6">
									<p>5.31E-22</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>intestinal cancer</p>
								</entry>
								<entry align="center" colname="c2">
									<p>2.02E-26</p>
								</entry>
								<entry align="center" colname="c3">
									<p>5.77E-29</p>
								</entry>
								<entry align="center" colname="c4">
									<p>2.58E-26</p>
								</entry>
								<entry align="center" colname="c5">
									<p>1.03E-21</p>
								</entry>
								<entry align="center" colname="c6">
									<p>1.55E-23</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>neoplasia</p>
								</entry>
								<entry align="center" colname="c2">
									<p>NA</p>
								</entry>
								<entry align="center" colname="c3">
									<p>1.63E-24</p>
								</entry>
								<entry align="center" colname="c4">
									<p>NA</p>
								</entry>
								<entry align="center" colname="c5">
									<p>1.10E-25</p>
								</entry>
								<entry align="center" colname="c6">
									<p>NA</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>solid tumor</p>
								</entry>
								<entry align="center" colname="c2">
									<p>3.31E-35</p>
								</entry>
								<entry align="center" colname="c3">
									<p>8.07E-32</p>
								</entry>
								<entry align="center" colname="c4">
									<p>6.80E-33</p>
								</entry>
								<entry align="center" colname="c5">
									<p>4.65E-31</p>
								</entry>
								<entry align="center" colname="c6">
									<p>3.88E-29</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>tumorigenesis</p>
								</entry>
								<entry align="center" colname="c2">
									<p>NA</p>
								</entry>
								<entry align="center" colname="c3">
									<p>1.55E-26</p>
								</entry>
								<entry align="center" colname="c4">
									<p>NA</p>
								</entry>
								<entry align="center" colname="c5">
									<p>3.31E-28</p>
								</entry>
								<entry align="center" colname="c6">
									<p>NA</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>uterine serous papillary cancer</p>
								</entry>
								<entry align="center" colname="c2">
									<p>3.46E-21</p>
								</entry>
								<entry align="center" colname="c3">
									<p>1.71E-20</p>
								</entry>
								<entry align="center" colname="c4">
									<p>8.61E-25</p>
								</entry>
								<entry align="center" colname="c5">
									<p>1.26E-22</p>
								</entry>
								<entry align="center" colname="c6">
									<p>1.14E-15</p>
								</entry>
							</row>
						</tbody>
					</tgroup>
				</table>
				<table id="T3">
					<title>
						<p>Table 3</p>
					</title>
					<caption>
						<p>
							<b>Top 10 IPA-derived upstream regulators, by absolute activation z-score</b>
						</p>
					</caption>
					<tgroup align="left" cols="6">
						<colspec align="left" colname="c1" colnum="1" colwidth="1*"/>
						<colspec align="center" colname="c2" colnum="2" colwidth="1*"/>
						<colspec align="center" colname="c3" colnum="3" colwidth="1*"/>
						<colspec align="center" colname="c4" colnum="4" colwidth="1*"/>
						<colspec align="center" colname="c5" colnum="5" colwidth="1*"/>
						<colspec align="center" colname="c6" colnum="6" colwidth="1*"/>
						<thead valign="top">
							<row rowsep="1">
								<entry colname="c1">
									<p>
										<b>A</b>
									</p>
								</entry>
								<entry colname="c2"/>
								<entry colname="c3"/>
								<entry colname="c4"/>
								<entry colname="c5"/>
								<entry colname="c6"/>
							</row>
						</thead>
						<tfoot>
							<p>A - excluding quality-flagged arrays from the analysis. B - applying SVA to batch corrected data. C - ComBat used to correct for batch and quality.</p>
						</tfoot>
						<tbody valign="top">
							<row rowsep="1">
								<entry colname="c1">
									<p>
										<b>Upstream Regulator</b>
									</p>
								</entry>
								<entry align="center" colname="c2">
									<p>
										<b>Log Ratio</b>
									</p>
								</entry>
								<entry align="center" colname="c3">
									<p>
										<b>Molecule Type</b>
									</p>
								</entry>
								<entry align="center" colname="c4">
									<p>
										<b>Predicted Activation State</b>
									</p>
								</entry>
								<entry align="center" colname="c5">
									<p>
										<b>Activation z-score</b>
									</p>
								</entry>
								<entry align="center" colname="c6">
									<p>
										<b>p-value of overlap</b>
									</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>TP53</p>
								</entry>
								<entry colname="c2"/>
								<entry align="center" colname="c3">
									<p>transcription regulator</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Inhibited</p>
								</entry>
								<entry align="center" colname="c5">
									<p>&#8722;4.88</p>
								</entry>
								<entry align="center" colname="c6">
									<p>1.05E-16</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>CDKN1A</p>
								</entry>
								<entry align="center" colname="c2">
									<p>&#8722;0.469</p>
								</entry>
								<entry align="center" colname="c3">
									<p>kinase</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Inhibited</p>
								</entry>
								<entry align="center" colname="c5">
									<p>&#8722;3.274</p>
								</entry>
								<entry align="center" colname="c6">
									<p>4.20E-10</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>TRAF2</p>
								</entry>
								<entry colname="c2"/>
								<entry align="center" colname="c3">
									<p>enzyme</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Activated</p>
								</entry>
								<entry align="center" colname="c5">
									<p>2.804</p>
								</entry>
								<entry align="center" colname="c6">
									<p>3.06E-06</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>CCNK</p>
								</entry>
								<entry colname="c2"/>
								<entry align="center" colname="c3">
									<p>other</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Activated</p>
								</entry>
								<entry align="center" colname="c5">
									<p>2.905</p>
								</entry>
								<entry align="center" colname="c6">
									<p>3.83E-04</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>TNF</p>
								</entry>
								<entry colname="c2"/>
								<entry align="center" colname="c3">
									<p>cytokine</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Activated</p>
								</entry>
								<entry align="center" colname="c5">
									<p>2.935</p>
								</entry>
								<entry align="center" colname="c6">
									<p>7.69E-04</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>IL1B</p>
								</entry>
								<entry colname="c2"/>
								<entry align="center" colname="c3">
									<p>cytokine</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Activated</p>
								</entry>
								<entry align="center" colname="c5">
									<p>2.952</p>
								</entry>
								<entry align="center" colname="c6">
									<p>1.76E-01</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>TP63</p>
								</entry>
								<entry colname="c2"/>
								<entry align="center" colname="c3">
									<p>transcription regulator</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Activated</p>
								</entry>
								<entry align="center" colname="c5">
									<p>3.181</p>
								</entry>
								<entry align="center" colname="c6">
									<p>8.37E-10</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>TREM1</p>
								</entry>
								<entry colname="c2"/>
								<entry align="center" colname="c3">
									<p>other</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Activated</p>
								</entry>
								<entry align="center" colname="c5">
									<p>3.352</p>
								</entry>
								<entry align="center" colname="c6">
									<p>3.69E-05</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>FOXM1</p>
								</entry>
								<entry align="center" colname="c2">
									<p>1.37</p>
								</entry>
								<entry align="center" colname="c3">
									<p>transcription regulator</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Activated</p>
								</entry>
								<entry align="center" colname="c5">
									<p>4.28</p>
								</entry>
								<entry align="center" colname="c6">
									<p>3.71E-17</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>Mek</p>
								</entry>
								<entry colname="c2"/>
								<entry align="center" colname="c3">
									<p>group</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Activated</p>
								</entry>
								<entry align="center" colname="c5">
									<p>4.336</p>
								</entry>
								<entry align="center" colname="c6">
									<p>2.38E-07</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>
										<b>B</b>
									</p>
								</entry>
								<entry colname="c2"/>
								<entry colname="c3"/>
								<entry colname="c4"/>
								<entry colname="c5"/>
								<entry colname="c6"/>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>
										<b>Upstream Regulator</b>
									</p>
								</entry>
								<entry align="center" colname="c2">
									<p>
										<b>Log Ratio</b>
									</p>
								</entry>
								<entry align="center" colname="c3">
									<p>
										<b>Molecule Type</b>
									</p>
								</entry>
								<entry align="center" colname="c4">
									<p>
										<b>Predicted Activation State</b>
									</p>
								</entry>
								<entry align="center" colname="c5">
									<p>
										<b>Activation z-score</b>
									</p>
								</entry>
								<entry align="center" colname="c6">
									<p>
										<b>p-value of overlap</b>
									</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>TP53</p>
								</entry>
								<entry align="center" colname="c2">
									<p>0.622</p>
								</entry>
								<entry align="center" colname="c3">
									<p>transcription regulator</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Inhibited</p>
								</entry>
								<entry align="center" colname="c5">
									<p>&#8722;5.749</p>
								</entry>
								<entry align="center" colname="c6">
									<p>6.48E-12</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>TGM2</p>
								</entry>
								<entry colname="c2"/>
								<entry align="center" colname="c3">
									<p>enzyme</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Inhibited</p>
								</entry>
								<entry align="center" colname="c5">
									<p>&#8722;4.243</p>
								</entry>
								<entry align="center" colname="c6">
									<p>3.64E-02</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>CDKN1A</p>
								</entry>
								<entry align="center" colname="c2">
									<p>&#8722;0.485</p>
								</entry>
								<entry align="center" colname="c3">
									<p>kinase</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Inhibited</p>
								</entry>
								<entry align="center" colname="c5">
									<p>&#8722;3.548</p>
								</entry>
								<entry align="center" colname="c6">
									<p>1.85E-10</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>KDM5B</p>
								</entry>
								<entry colname="c2"/>
								<entry align="center" colname="c3">
									<p>transcription regulator</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Inhibited</p>
								</entry>
								<entry align="center" colname="c5">
									<p>&#8722;3.126</p>
								</entry>
								<entry align="center" colname="c6">
									<p>3.31E-08</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>NFkB (complex)</p>
								</entry>
								<entry colname="c2"/>
								<entry align="center" colname="c3">
									<p>complex</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Activated</p>
								</entry>
								<entry align="center" colname="c5">
									<p>3.034</p>
								</entry>
								<entry align="center" colname="c6">
									<p>3.59E-03</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>TREM1</p>
								</entry>
								<entry colname="c2"/>
								<entry align="center" colname="c3">
									<p>other</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Activated</p>
								</entry>
								<entry align="center" colname="c5">
									<p>3.073</p>
								</entry>
								<entry align="center" colname="c6">
									<p>2.18E-05</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>TP63</p>
								</entry>
								<entry colname="c2"/>
								<entry align="center" colname="c3">
									<p>transcription regulator</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Activated</p>
								</entry>
								<entry align="center" colname="c5">
									<p>3.63</p>
								</entry>
								<entry align="center" colname="c6">
									<p>6.25E-06</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>IL1B</p>
								</entry>
								<entry colname="c2"/>
								<entry align="center" colname="c3">
									<p>cytokine</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Activated</p>
								</entry>
								<entry align="center" colname="c5">
									<p>3.686</p>
								</entry>
								<entry align="center" colname="c6">
									<p>4.13E-01</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>FOXM1</p>
								</entry>
								<entry align="center" colname="c2">
									<p>1.29</p>
								</entry>
								<entry align="center" colname="c3">
									<p>transcription regulator</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Activated</p>
								</entry>
								<entry align="center" colname="c5">
									<p>3.925</p>
								</entry>
								<entry align="center" colname="c6">
									<p>5.82E-11</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>Mek</p>
								</entry>
								<entry colname="c2"/>
								<entry align="center" colname="c3">
									<p>group</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Activated</p>
								</entry>
								<entry align="center" colname="c5">
									<p>4.771</p>
								</entry>
								<entry align="center" colname="c6">
									<p>7.08E-08</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>
										<b>C</b>
									</p>
								</entry>
								<entry colname="c2"/>
								<entry colname="c3"/>
								<entry colname="c4"/>
								<entry colname="c5"/>
								<entry colname="c6"/>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>
										<b>Upstream Regulator</b>
									</p>
								</entry>
								<entry align="center" colname="c2">
									<p>
										<b>Log Ratio</b>
									</p>
								</entry>
								<entry align="center" colname="c3">
									<p>
										<b>Molecule Type</b>
									</p>
								</entry>
								<entry align="center" colname="c4">
									<p>
										<b>Predicted Activation State</b>
									</p>
								</entry>
								<entry align="center" colname="c5">
									<p>
										<b>Activation z-score</b>
									</p>
								</entry>
								<entry align="center" colname="c6">
									<p>
										<b>p-value of overlap</b>
									</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>TP53</p>
								</entry>
								<entry colname="c2"/>
								<entry align="center" colname="c3">
									<p>transcription regulator</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Inhibited</p>
								</entry>
								<entry align="center" colname="c5">
									<p>&#8722;5.126</p>
								</entry>
								<entry align="center" colname="c6">
									<p>1.30E-13</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>CDKN1A</p>
								</entry>
								<entry align="center" colname="c2">
									<p>&#8722;0.496</p>
								</entry>
								<entry align="center" colname="c3">
									<p>kinase</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Inhibited</p>
								</entry>
								<entry align="center" colname="c5">
									<p>&#8722;3.534</p>
								</entry>
								<entry align="center" colname="c6">
									<p>5.99E-10</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>TGM2</p>
								</entry>
								<entry colname="c2"/>
								<entry align="center" colname="c3">
									<p>enzyme</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Inhibited</p>
								</entry>
								<entry align="center" colname="c5">
									<p>&#8722;3.402</p>
								</entry>
								<entry align="center" colname="c6">
									<p>4.25E-02</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>miR-483-3p</p>
								</entry>
								<entry colname="c2"/>
								<entry align="center" colname="c3">
									<p>mature microRNA</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Inhibited</p>
								</entry>
								<entry align="center" colname="c5">
									<p>&#8722;3.153</p>
								</entry>
								<entry align="center" colname="c6">
									<p>6.49E-03</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>EGFR</p>
								</entry>
								<entry colname="c2"/>
								<entry align="center" colname="c3">
									<p>kinase</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Activated</p>
								</entry>
								<entry align="center" colname="c5">
									<p>3.104</p>
								</entry>
								<entry align="center" colname="c6">
									<p>4.43E-03</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>IL1B</p>
								</entry>
								<entry colname="c2"/>
								<entry align="center" colname="c3">
									<p>cytokine</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Activated</p>
								</entry>
								<entry align="center" colname="c5">
									<p>3.281</p>
								</entry>
								<entry align="center" colname="c6">
									<p>1.73E-01</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>TP63</p>
								</entry>
								<entry colname="c2"/>
								<entry align="center" colname="c3">
									<p>transcription regulator</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Activated</p>
								</entry>
								<entry align="center" colname="c5">
									<p>3.524</p>
								</entry>
								<entry align="center" colname="c6">
									<p>1.48E-09</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>TREM1</p>
								</entry>
								<entry colname="c2"/>
								<entry align="center" colname="c3">
									<p>other</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Activated</p>
								</entry>
								<entry align="center" colname="c5">
									<p>3.845</p>
								</entry>
								<entry align="center" colname="c6">
									<p>5.74E-06</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>FOXM1</p>
								</entry>
								<entry align="center" colname="c2">
									<p>1.398</p>
								</entry>
								<entry align="center" colname="c3">
									<p>transcription regulator</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Activated</p>
								</entry>
								<entry align="center" colname="c5">
									<p>4.386</p>
								</entry>
								<entry align="center" colname="c6">
									<p>4.18E-16</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>Mek</p>
								</entry>
								<entry colname="c2"/>
								<entry align="center" colname="c3">
									<p>group</p>
								</entry>
								<entry align="center" colname="c4">
									<p>Activated</p>
								</entry>
								<entry align="center" colname="c5">
									<p>4.654</p>
								</entry>
								<entry align="center" colname="c6">
									<p>9.72E-08</p>
								</entry>
							</row>
						</tbody>
					</tgroup>
				</table>
			</sec>
			<sec>
				<st>
					<p>qRT-PCR validation of select genes</p>
				</st>
				<p>To ascertain whether data obtained by microarray analysis of low-RIN samples were comparable to results obtained using the method designed by Antonov et al. for qPCR analysis of low-RIN samples, we selected two genes, dipeptidase 1 (DPEP1) and claudin 1 (CLDN1), for qRT-PCR validation. Given that our microarray data analysis suggests that ~95% of genes are unaffected by RNA integrity, we wished to compare microarray and qPCR data for genes that were apparently unaffected by RNA integrity; DPEP1 and CLDN1 were found to be significantly differentially expressed in our microarray data by all five methods used and, in addition, there is strong literature evidence for their differential expression between tumour and normal samples. From reference genes previously cited as suitable for colorectal cancer studies, we used the Normfinder algorithm to select those most stably expressed in our cohort (UBC, B2M, ATP5E) 
					<abbrgrp>
						<abbr bid="B17">17</abbr>
						<abbr bid="B18">18</abbr>
						<abbr bid="B19">19</abbr>
						<abbr bid="B20">20</abbr>
						<abbr bid="B21">21</abbr>
					</abbrgrp>. We found good correlations, for both CLDN1 (Adjusted R<sup>2</sup> = 0.81) and DPEP1 (Adjusted R<sup>2</sup> = 0.83), between qRT-PCR- and microarray-based fold change values (Figure 
					<figr fid="F8">8</figr>), irrespective of RIN score.</p>
				<fig id="F8"><title><p>Figure 8</p></title><caption><p>DPEP1 and CLDN1 tumour vs. normal fold change (FC) results for qRT-PCR and microarray results</p></caption><text>
   <p><b>DPEP1 and CLDN1 tumour vs. normal fold change (FC) results for qRT-PCR and microarray results. </b>Samples that were flagged during quality assessment are highlighted in red.</p>
</text><graphic file="1471-2164-14-14-8"/></fig>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Discussion</p>
			</st>
			<p>RNA is extremely vulnerable to degradation, which has the potential to introduce a systematic bias into gene expression measures. Reliable measures of sample and data quality are therefore essential to evaluate the effects of RNA integrity on the accuracy, sensitivity and specificity of gene expression results. From previous studies as well as our own, it is now clear that the level of acceptable RNA degradation within an experiment depends largely on the experimental design, platform and application. Multiple studies have demonstrated an improvement in microarray and qRT-PCR performance by using random priming when RNA integrity is in doubt. Here we observed a direct association between RINs and array quality in the majority of cases. To gauge the consequences of using these arrays in downstream analysis, we compared quality-flagged to quality-passed arrays and found a relatively small subset of genes, 1172/20019 (~6%), to be significantly affected (p-value threshold of 0.05, |FC| &#8805; 2) in our samples on the Gene 1.0 ST platform. It is of course possible that the exact identity and proportion of the affected genes may differ between studies on Gene 1.0 ST arrays but, based on our data, we suggest that the overall proportion of affected genes is unlikely to be significantly different from that observed here. Depending on the application, this may or may not have an effect on the study outcome. However, the most common microarray applications, such as finding differentially expressed genes between two conditions, pathway analysis and clustering, do not rely on interrogating specific genes and appear to be largely robust to the effects of RNA degradation on this platform (Table 
				<tblr tid="T2">2</tblr>).</p>
			<p>Using within- and between-array quality measures, we investigated the relationship between RNA integrity and array quality on Affymetrix Gene 1.0 ST arrays. We found a combination of within- and between-array quality measures useful to rank samples by array quality. However, the single most useful array quality measure appears to be GNUSE, since it provides a more general measure of array quality relative to a large set of publicly available arrays. We found that 86% of samples with RINs &#8804; 3.3 were flagged by at least two of our quality control measures. One sample with a RIN score &lt; 3 passed all three quality measures, although it did have a relatively low array quality weight. Furthermore, 10 out of 17 samples with RIN scores &#8804; 7 passed at least 2 out of 3 quality measures, suggesting that the widely used RIN cutoff of 7 is too stringent for Gene 1.0 ST arrays.</p>
			<p>We then examined the genes most affected by RNA degradation and demonstrated a relationship between accuracy and length of the original transcript: longer-than-average transcripts were underrepresented, and very short transcripts overrepresented, in quality-flagged samples. This is in contrast to the findings of Opitz et al., who found that short transcripts were more vulnerable to the perceived effects of degradation, whereas long transcripts were more stable relative to the average-length transcript 
				<abbrgrp>
					<abbr bid="B6">6</abbr>
				</abbrgrp>. Interestingly, of the genes that were overrepresented in quality-flagged samples, 70% were small non-protein coding RNAs, including 94 small nucleolar RNAs, and 4 microRNAs, consistent with reports that microRNAs are more robust to RNA degradation compared to mRNA 
				<abbrgrp>
					<abbr bid="B22">22</abbr>
				</abbrgrp>, perhaps because they are more thermodynamically stable than mRNAs.</p>
			<p>Without excluding any genes, we then compared the orthogonal approaches of either excluding quality-flagged arrays or compensating for RNA degradation at different steps in the analysis. Sample clustering showed that when using ComBat adjustment, quality-flagged samples no longer clustered together. Furthermore, samples tend to segregate more clearly by disease status following adjustment, which suggests that the algorithm is not introducing artifacts. It is worth noting that patients 13, 4 and 18 were diagnosed with a hereditary form of CRC (HNPCC) &#8211; it is therefore not surprising that the &#8216;normal&#8217; samples from these patients form a separate cluster.</p>
			<p>Irrespective of sample/array quality, applying compensatory measures for RNA degradation performed at least as well as excluding arrays that were flagged during quality assessment, as judged by gene expression analysis and IPA. At a p-value of 0.01, SVA and ComBat detected the highest number of differentially expressed genes between tumour and normal samples, and the top four methods applied here had 1117 differentially expressed genes in common. To evaluate the biological plausibility of the genes deemed significantly differentially expressed between tumour and normal samples, we harnessed the results from IPA to show that, in terms of the top scoring biological functions and upstream regulators, there is considerable overlap in the identity and direction of biological activation when comparing analysis methods that either excluded or included quality-flagged arrays. These results suggest that our analysis strategies are biologically sound and not biased by non-biological variance.</p>
			<p>The relevance of each method will depend on the downstream application and the proportion of quality-flagged arrays. If a small percentage of arrays are flagged, there might not be much benefit in including them for downstream analysis. However, if a large proportion of the arrays are affected by RNA quality &#8211; which is often likely to be the case where the RNA is derived from irreplaceable historical clinical samples &#8211; the ability to retain all arrays and to account for these effects in the analysis will be valuable. Here, ComBat may be useful if direct data adjustment is required, e.g. for sample/gene clustering. On the other hand, for analysis of differential expression, especially when the source of non-biological variance is not immediately apparent, SVA may be most useful since it does not require supervision; notably, in our hands SVA was able to identify two surrogate variables which closely corresponded to &#8220;batch&#8221; and &#8220;quality&#8221; factors, judged by the grouping of samples. Establishing whether the measures used here to compensate for quality effects are superior to excluding these arrays from the analysis will require a controlled study with known true and false positives, in which the discriminatory power of each method can be objectively investigated. However, the significant overlap observed between the differentially expressed genes identified by the different approaches used here, combined with the considerable overlaps in both biological function and upstream regulators identified by pathway analysis of the resultant data, argues against a simple expansion of false positives when lower quality array data are included in the analyses. The quality assessment and data analysis methods discussed here should in principle be equally useful for Affymetrix Exon ST array analysis.</p>
		</sec>
		<sec>
			<st>
				<p>Conclusions</p>
			</st>
			<p>In conclusion, array quality measures can be used to set quality thresholds, to provide valuable information that can be used to improve the linear model of differential expression, or to correct expression signal prior to assessing differential expression. We suggest that accounting for known or unknown sources of variation, such as variable RNA integrity and batch, by implementing ComBat or Surrogate Variable Analysis for analysis of differential gene expression enables robust analysis of microarray datasets derived from variable and low quality RNA, thereby extending the range of clinical samples that are suitable for microarray analysis.</p>
		</sec>
		<sec>
			<st>
				<p>Methods</p>
			</st>
			<sec>
				<st>
					<p>Sample collection and storage</p>
				</st>
				<p>Paired colorectal patient samples (diseased tumour tissue and adjacent healthy gut epithelial tissue) were collected during surgical resection of previously untreated patients at the Groote Schuur Hospital, Cape Town, South Africa. The samples were frozen immediately in liquid nitrogen and stored at -80&#176;C. Ethical consent was obtained (UCT HREC REF 416/2005) and each patient provided written informed consent to donate samples from the tissues left over after surgical resection to subsequent molecular studies.</p>
			</sec>
			<sec>
				<st>
					<p>Sample preparation and quality control</p>
				</st>
				<p>Frozen samples were transitioned to RNAlater&#174;-ICE (Ambion), an RNA stabilisation solution, using dry ice to prevent thawing of the tissue at any stage. RNA was extracted using a Dounce homogenizer and the AllPrep DNA/RNA/Protein kit (Qiagen), including DNase treatment. RNaseZap (Ambion) was used to eliminate RNase from the work surface, pipettes and glassware. RNA integrity assessment was conducted on an Agilent Bioanalyser 2100.</p>
			</sec>
			<sec>
				<st>
					<p>Quantitative real-time PCR</p>
				</st>
				<p>From a biological perspective, we used the stability of expression of housekeeping genes to investigate the effect of RNA integrity on array and qRT-PCR performance. Gene candidates were selected from those previously identified as good reference genes for colorectal cancer 
					<abbrgrp>
						<abbr bid="B17">17</abbr>
						<abbr bid="B18">18</abbr>
						<abbr bid="B19">19</abbr>
						<abbr bid="B20">20</abbr>
						<abbr bid="B21">21</abbr>
					</abbrgrp>. Expression stabilities were ranked using the Normfinder algorithm 
					<abbrgrp>
						<abbr bid="B23">23</abbr>
					</abbrgrp> and three genes were selected for use as reference genes. All primers except those for <it>b2m</it> 
					<abbrgrp>
						<abbr bid="B24">24</abbr>
					</abbrgrp> were designed using Primer-BLAST - sequences are shown in Table 
					<tblr tid="T4">4</tblr>. Experiments were performed in triplicate on a Roche LightCycler&#174; 480 Real-Time PCR System in 96-well format. Efficiency was determined for each primer pair using a two-fold dilution series across five points for five patient samples of varying RNA integrity. For each patient, tumour vs. normal fold change was determined based on the method of Antonov et al., whereby the Ct of the test gene is normalised by the geometric mean of multiple control genes 
					<abbrgrp>
						<abbr bid="B9">9</abbr>
					</abbrgrp>. Since our primer efficiencies were quite low in some cases, we adapted the method of Antonov et al. to include primer efficiency, as shown in the equation below:</p>
				<p>
					<display-formula>
						<m:math name="1471-2164-14-14-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mfrac>
   <m:mrow>
      <m:msup>
         <m:mi>e</m:mi>
         <m:mrow>
            <m:mi mathvariant="normal">&#916;</m:mi>
            <m:mi mathvariant="italic">Ct</m:mi>
            <m:mfenced open="(" close=")">
               <m:mi>t</m:mi>
            </m:mfenced>
         </m:mrow>
      </m:msup>
   </m:mrow>
   <m:mrow>
      <m:mroot>
         <m:mrow>
            <m:mspace width="0.25em"/>
            <m:msubsup>
               <m:mi>e</m:mi>
               <m:mi>i</m:mi>
               <m:mrow>
                  <m:mi mathvariant="normal">&#916;</m:mi>
                  <m:mi mathvariant="italic">Ct</m:mi>
                  <m:mfenced open="(" close=")">
                     <m:mi>i</m:mi>
                  </m:mfenced>
               </m:mrow>
            </m:msubsup>
            <m:mo>&#215;</m:mo>
            <m:msubsup>
               <m:mi>e</m:mi>
               <m:mrow>
                  <m:mi>i</m:mi>
                  <m:mo>+</m:mo>
                  <m:mn>1</m:mn>
               </m:mrow>
               <m:mrow>
                  <m:mi mathvariant="normal">&#916;</m:mi>
                  <m:mi mathvariant="italic">Ct</m:mi>
                  <m:mfenced open="(" close=")">
                     <m:mrow>
                        <m:mi>i</m:mi>
                        <m:mo>+</m:mo>
                        <m:mn>1</m:mn>
                     </m:mrow>
                  </m:mfenced>
               </m:mrow>
            </m:msubsup>
            <m:mo>&#8230;</m:mo>
            <m:mo>&#215;</m:mo>
            <m:msubsup>
               <m:mi>e</m:mi>
               <m:mrow>
                  <m:mi>i</m:mi>
                  <m:mo>+</m:mo>
                  <m:mi>n</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi mathvariant="normal">&#916;</m:mi>
                  <m:mi mathvariant="italic">Ct</m:mi>
                  <m:mfenced open="(" close=")">
                     <m:mrow>
                        <m:mi>i</m:mi>
                        <m:mo>+</m:mo>
                        <m:mi>n</m:mi>
                     </m:mrow>
                  </m:mfenced>
               </m:mrow>
            </m:msubsup>
         </m:mrow>
         <m:mrow>
            <m:mi>n</m:mi>
            <m:mo>+</m:mo>
            <m:mn>1</m:mn>
         </m:mrow>
      </m:mroot>
   </m:mrow>
</m:mfrac>
</m:math>
					</display-formula>
				</p>
				<p>where <it>t</it> represents the test gene, <it>e</it> represents primer efficiency, <it>i</it> indexes the control gene(s) and <it>n</it> + 1 is the number of control genes.</p>
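				<p>As an illustration only, the efficiency-adjusted ratio can be sketched in Python. The function name, argument names and example numbers are hypothetical and are not taken from the study. The sketch assumes that each efficiency <it>e</it> is expressed as an amplification factor per cycle (2.0 for a perfectly efficient primer pair) and that each &#916;Ct is the normal-minus-tumour Ct difference for that gene; neither convention is stated explicitly in the text.</p>
				<p>
```python
import math


def efficiency_adjusted_fold_change(test_eff, test_delta_ct, ref_effs, ref_delta_cts):
    """Tumour vs. normal fold change for one patient (illustrative sketch).

    test_eff       -- amplification factor of the test-gene primers (assumed: 2.0 = 100%)
    test_delta_ct  -- delta-Ct of the test gene (assumed: Ct normal minus Ct tumour)
    ref_effs       -- amplification factors of the control-gene primers
    ref_delta_cts  -- delta-Ct values of the control genes, in the same order
    """
    if not ref_effs or len(ref_effs) != len(ref_delta_cts):
        raise ValueError("need one delta-Ct per control gene, and at least one control gene")
    if any(not eff > 0 for eff in [test_eff, *ref_effs]):
        raise ValueError("efficiencies must be positive amplification factors")

    # Numerator: efficiency of the test gene raised to its delta-Ct.
    numerator = test_eff ** test_delta_ct

    # Denominator: geometric mean of the control-gene terms, i.e. the
    # k-th root of their product, where k is the number of control genes.
    control_terms = [eff ** d_ct for eff, d_ct in zip(ref_effs, ref_delta_cts)]
    geometric_mean = math.prod(control_terms) ** (1.0 / len(control_terms))

    return numerator / geometric_mean


if __name__ == "__main__":
    # Hypothetical values for one patient and three control genes.
    fc = efficiency_adjusted_fold_change(1.9, 3.2, [1.95, 1.85, 2.0], [0.4, 0.1, -0.2])
    print(f"efficiency-adjusted fold change: {fc:.2f}")
```
				</p>
				<p>When every primer pair is assumed to be perfectly efficient (all factors equal to 2.0), the expression reduces to the familiar 2 to the power of &#916;&#916;Ct form, with the control-gene &#916;Ct replaced by the arithmetic mean of the control-gene &#916;Ct values.</p>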
				<table id="T4">
					<title>
						<p>Table 4</p>
					</title>
					<caption>
						<p>
							<b>Primers used for qRT-PCR analysis</b>
						</p>
					</caption>
					<tgroup align="left" cols="4">
						<colspec align="left" colname="c1" colnum="1" colwidth="1*"/>
						<colspec align="left" colname="c2" colnum="2" colwidth="1*"/>
						<colspec align="left" colname="c3" colnum="3" colwidth="1*"/>
						<colspec align="left" colname="c4" colnum="4" colwidth="1*"/>
						<thead valign="top">
							<row rowsep="1">
								<entry colname="c1">
									<p>
										<b>Test genes</b>
									</p>
								</entry>
								<entry colname="c2">
									<p>
										<b>Forward primers (5' - 3')</b>
									</p>
								</entry>
								<entry colname="c3">
									<p>
										<b>Reverse primers (5' - 3')</b>
									</p>
								</entry>
								<entry colname="c4">
									<p>
										<b>Product (bp)</b>
									</p>
								</entry>
							</row>
						</thead>
						<tbody valign="top">
							<row rowsep="1">
								<entry colname="c1">
									<p>dpep1</p>
								</entry>
								<entry colname="c2">
									<p>
										<monospace>GACAACTGGCTGGTGGACA</monospace>
									</p>
								</entry>
								<entry colname="c3">
									<p>
										<monospace>ACCACACGCTGCCCAAA</monospace>
									</p>
								</entry>
								<entry colname="c4">
									<p>74</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>cldn1</p>
								</entry>
								<entry colname="c2">
									<p>
										<monospace>GCTGTCATTGGGGGTGCGAT</monospace>
									</p>
								</entry>
								<entry colname="c3">
									<p>
										<monospace>GGCAACTAAAATAGCCAGACCTGC</monospace>
									</p>
								</entry>
								<entry colname="c4">
									<p>54</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>
										<b>Reference genes</b>
									</p>
								</entry>
								<entry colname="c2"/>
								<entry colname="c3"/>
								<entry colname="c4"/>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>ubc</p>
								</entry>
								<entry colname="c2">
									<p>
										<monospace>GGTCGCAGTTCTTGTTTGTGG</monospace>
									</p>
								</entry>
								<entry colname="c3">
									<p>
										<monospace>CACGAAGATCTGCATTGTCAAG</monospace>
									</p>
								</entry>
								<entry colname="c4">
									<p>59</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>b2m</p>
								</entry>
								<entry colname="c2">
									<p>
										<monospace>TGCTGTCTCCATGTTTGATGTATCT</monospace>
									</p>
								</entry>
								<entry colname="c3">
									<p>
										<monospace>TCTCTGCTCCCCACCTCTAAGT</monospace>
									</p>
								</entry>
								<entry colname="c4">
									<p>86</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>atp5e</p>
								</entry>
								<entry colname="c2">
									<p>
										<monospace>CTGGACTCAGCTACATCCGA</monospace>
									</p>
								</entry>
								<entry colname="c3">
									<p>
										<monospace>GCATCTCTCACTGCTTTTGCAC</monospace>
									</p>
								</entry>
								<entry colname="c4">
									<p>55</p>
								</entry>
							</row>
						</tbody>
					</tgroup>
				</table>
			</sec>
			<sec>
				<st>
					<p>Microarray analysis: Affymetrix HuGene 1.0 ST expression arrays</p>
				</st>
				<p>Thirty-four samples with A260/230 ratios of at least 1.6, RINs of at least 2 and no sign of genomic DNA contamination were selected for microarray analysis. The samples were amplified from 200 ng of total RNA in accordance with the Ambion&#174; WT Expression assay kit, then fragmented and end-labeled in accordance with the Affymetrix&#174; GeneChip&#174; WT Terminal Labeling protocol. The prepared targets were hybridized overnight to Affymetrix Human Gene 1.0 ST arrays. Following hybridization, the arrays were washed and stained using the GeneChip Fluidics Station 450 and scanned using the GeneChip&#174; Scanner 3000 7G. Arrays were processed in two batches: batch one comprised 10 arrays and batch two 24. Individual patient pairs were not split across batches.</p>
			</sec>
			<sec>
				<st>
					<p>Microarray quality assessment and data analysis</p>
				</st>
				<p>Standard Affymetrix quality control was conducted using Expression Console&#174; Software: the quality of cDNA preparation and array hybridisation was assessed using the appropriate spike-in controls at each stage.</p>
				<p>Raw array quality was investigated at the probe level using 1) the difference between the mean of the perfect match probes and the mean of the background probes for each array and 2) the coefficient of variation (CV) across all probes for each array. A threshold for the CV across probes was set at two standard deviations from the mean CV, where the mean was calculated from arrays with RINs &gt; 6. The data were preprocessed in R using the Bioconductor packages frma 
					<abbrgrp>
						<abbr bid="B25">25</abbr>
					</abbrgrp>, oligo 
					<abbrgrp>
						<abbr bid="B26">26</abbr>
					</abbrgrp>, and the ComBat algorithm for batch correction 
					<abbrgrp>
						<abbr bid="B12">12</abbr>
					</abbrgrp>. Preprocessed data quality was assessed using the global normalised, unscaled standard error (GNUSE) 
					<abbrgrp>
						<abbr bid="B14">14</abbr>
					</abbrgrp>. The standard error (SE) estimates are normalized such that, for each probe set, the median SE across all arrays is equal to 1. Since most genes are not expected to be differentially expressed, boxplots for each array should be centered around 1. Samples with a median GNUSE greater than 1.25 were flagged for downstream analysis. This threshold is fairly arbitrary and has not been validated for the Gene 1.0 ST platform, but it roughly equates to a precision that is on average 25% worse than that of the average Gene 1.0 ST array 
					<abbrgrp>
						<abbr bid="B14">14</abbr>
					</abbrgrp>.</p>
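				<p>As a rough illustration of the two flagging rules described above, the following Python sketch computes a CV threshold from the high-RIN arrays and flags arrays by median GNUSE. It is not the R code used in this study: the function names and toy values are hypothetical, and it assumes that "two standard deviations from the mean" denotes the upper bound (mean plus two sample standard deviations), a detail the text does not specify.</p>
				<p>
```python
from statistics import mean, median, stdev


def cv_threshold(cv_by_array, rin_by_array, rin_cutoff=6.0, n_sd=2.0):
    """Upper CV threshold: mean + n_sd * SD over arrays with RIN > rin_cutoff.

    Assumption: the threshold is an upper bound and the sample standard
    deviation is used.
    """
    reference = [cv_by_array[a] for a in cv_by_array if rin_by_array[a] > rin_cutoff]
    return mean(reference) + n_sd * stdev(reference)


def median_gnuse(se_by_array):
    """Median GNUSE per array.

    se_by_array maps array name -> list of probe-set standard errors, with
    probe sets in the same order for every array. Each SE is divided by the
    median SE of that probe set across all arrays, so that a typical probe
    set has a normalised SE of 1.
    """
    arrays = list(se_by_array)
    n_probesets = len(se_by_array[arrays[0]])
    probeset_medians = [
        median(se_by_array[a][i] for a in arrays) for i in range(n_probesets)
    ]
    return {
        a: median(se_by_array[a][i] / probeset_medians[i] for i in range(n_probesets))
        for a in arrays
    }


def flag_arrays(scores, threshold):
    """Return the sorted names of arrays whose score exceeds the threshold."""
    return sorted(a for a, s in scores.items() if s > threshold)


# Toy data (hypothetical values, for illustration only).
cv = {"A": 0.50, "B": 0.52, "C": 0.48, "D": 0.90}
rin = {"A": 8.1, "B": 7.4, "C": 6.5, "D": 2.3}
se = {
    "A": [0.10, 0.20, 0.30],
    "B": [0.10, 0.20, 0.30],
    "C": [0.10, 0.20, 0.30],
    "D": [0.20, 0.40, 0.60],
}
cv_flagged = flag_arrays(cv, cv_threshold(cv, rin))
gnuse_flagged = flag_arrays(median_gnuse(se), 1.25)
```
				</p>
				<p>In this toy example only array D, which has a low RIN and standard errors twice those of the other arrays, exceeds both thresholds.</p>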
				<p>Five comparative methods for analysis of differential expression were individually applied to the preprocessed data: 1) The arrayWeights function in the Bioconductor package limma 
					<abbrgrp>
						<abbr bid="B27">27</abbr>
					</abbrgrp> was used to estimate array quality weights which were then included in the linear model fit; 2) Arrays that were flagged in array quality assessment were excluded from the analysis; 3) The ComBat algorithm 
					<abbrgrp>
						<abbr bid="B12">12</abbr>
					</abbrgrp> for batch correction was applied to directly adjust the data according to quality, where arrays were divided into two categories according to the array quality assessment; 4) &#8220;Quality&#8221; and &#8220;batch&#8221; were included as factors in the linear model together with disease status; 5) Surrogate variable analysis (SVA) was applied to frma-processed data without any direct adjustment, the output from SVA being incorporated into the linear model fit 
					<abbrgrp>
						<abbr bid="B13">13</abbr>
					</abbrgrp>.</p>
				<p>To rank genes by evidence for differential expression, the eBayes function in limma was applied to compute moderated t-statistics, moderated F-statistics, and log-odds of differential expression by empirical Bayes shrinkage of the standard errors towards a common value 
					<abbrgrp>
						<abbr bid="B27">27</abbr>
					</abbrgrp>. Next, using the topTable function in limma, p-values were adjusted for multiple hypothesis testing using the Benjamini and Hochberg method 
					<abbrgrp>
						<abbr bid="B28">28</abbr>
					</abbrgrp>. Transcript clusters were annotated in R using the Bioconductor package hugene10sttranscriptcluster.db (Affymetrix Human Gene 1.0-ST Array Transcriptcluster Revision 8 annotation data, assembled using data from public repositories).</p>
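				<p>For readers unfamiliar with the Benjamini and Hochberg adjustment, a minimal Python sketch of the step-up procedure is given here. It is a generic illustration rather than the limma implementation used in this study; the function name bh_adjust and the toy p-values are hypothetical.</p>
				<p>
```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from the largest to the smallest value.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # rank m for the largest p-value, 1 for the smallest
        # Scale by m / rank, then enforce monotonicity with a running minimum.
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values (hypothetical, for illustration only).
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = bh_adjust(raw)
```
				</p>
				<p>Each raw p-value is multiplied by the number of tests divided by its rank, and the running minimum ensures that the adjusted values never decrease as the raw p-values decrease.</p>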
				<p>The subset of genes differentially affected by RNA quality was obtained in the same way, using array quality rather than disease status for grouping. Genes with adjusted p-values &#8804; 0.05 and absolute fold changes (|FC|) &#8805; 2 were included in the analysis. Transcript length was obtained for all annotated transcript clusters using the Bioconductor package goseq 
					<abbrgrp>
						<abbr bid="B29">29</abbr>
					</abbrgrp>. Hierarchical clustering with average linkage and Euclidean distance as the distance measure was performed in R using the hclust function.</p>
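				<p>The gene filter and the clustering step can be sketched in Python as follows. This is a naive illustration, not the R hclust routine used in this study; the function names and toy expression profiles are hypothetical, and the fold-change criterion is read here as an absolute fold change of at least 2.</p>
				<p>
```python
from itertools import combinations
from math import dist


def passes_filter(adj_p, fold_change, p_cutoff=0.05, fc_cutoff=2.0):
    """True if adjusted p is at most p_cutoff and |FC| is at least fc_cutoff."""
    return p_cutoff >= adj_p and abs(fold_change) >= fc_cutoff


def average_linkage(points):
    """Agglomerative clustering with average linkage and Euclidean distance.

    points maps a sample label to a tuple of expression values. Returns the
    merges in order as (members_a, members_b, distance) tuples.
    """
    clusters = [(label,) for label in sorted(points)]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Average linkage: mean of all pairwise distances between members.
            d = sum(dist(points[x], points[y]) for x in a for y in b) / (len(a) * len(b))
            if best is None or best[0] > d:
                best = (d, a, b)
        d, a, b = best
        merges.append((a, b, d))
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
    return merges


# Toy expression profiles for four samples (hypothetical values).
profiles = {"s1": (0.0, 0.0), "s2": (0.0, 1.0), "s3": (5.0, 0.0), "s4": (5.0, 2.0)}
merges = average_linkage(profiles)
```
				</p>
				<p>In this toy example, samples s1 and s2 merge first, followed by s3 and s4, and the two pairs are joined last.</p>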
				<p>For Ingenuity Pathway Analysis (IPA), genes that were found to be significantly differentially expressed for each method (adjusted p-value &#8804; 0.01) were used as input for IPA&#8217;s &#8220;Core Analysis&#8221;. Here, statistically significant over-representation of our listed genes in a given process such as &#8220;colorectal tumour&#8221; or &#8220;infection of embryonic cell lines&#8221; is scored by p-value, using the right-tailed Fisher&#8217;s Exact Test. In the case of upstream regulators, the predicted activation state and activation z-score are based on the direction of fold change values for genes in the input dataset for which an experimentally observed causal relationship has been established. Performance was assessed using the top 10 functions in terms of p-values for each method while taking into account the relevance of the function to colorectal cancer.</p>
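				<p>To make the over-representation score concrete, the following Python sketch computes a generic right-tailed Fisher's exact test p-value from the hypergeometric distribution. It does not reproduce IPA's internal implementation or reference gene set; the function name, argument names and toy counts are hypothetical.</p>
				<p>
```python
from math import comb


def fisher_right_tail(k, n, K, N):
    """Right-tailed Fisher's exact test p-value for over-representation.

    N: genes in the reference set
    K: reference genes annotated to the function of interest
    n: genes in the input list
    k: input genes annotated to the function
    Returns P(X >= k), where X is hypergeometrically distributed.
    """
    upper = min(n, K)
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return tail / comb(N, n)


# Toy counts (hypothetical): 3 of 4 input genes fall in a function that
# covers 5 of the 20 reference genes.
p_value = fisher_right_tail(k=3, n=4, K=5, N=20)
```
				</p>
				<p>A small p-value indicates that the overlap between the input gene list and the annotated function is larger than would be expected by chance under random sampling from the reference set.</p>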
			</sec>
		</sec>
		<sec>
			<st>
				<p>Competing interests</p>
			</st>
			<p>The authors declare that they have no competing interests.</p>
		</sec>
		<sec>
			<st>
				<p>Authors&#8217; contributions</p>
			</st>
			<p>KV carried out the sample preparation and RNA extraction and performed the data analysis. KV and JB conceived and designed the study and wrote the manuscript. Both authors read and approved the final manuscript.</p>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>We wish to thank the Cancer Association of South Africa (CANSA) for their financial support of this project. KV wishes to thank the University of Cape Town, the Harry Crossley Foundation, the National Research Foundation (NRF) and the German Academic Exchange Service (DAAD) for their financial support through bursaries. JB thanks the NRF for a South African Research Chair grant. We thank Mr. Ryan Goosen and Ms. Jo McBride for their technical assistance.</p>
			</sec>
		</ack>
		<refgrp><bibl id="B1"><title><p>Effect of agonal and postmortem factors on gene expression profile: quality control in microarray analyses of postmortem human brain</p></title><aug><au><snm>Tomita</snm><fnm>H</fnm></au><au><snm>Vawter</snm><fnm>MP</fnm></au><au><snm>Walsh</snm><fnm>DM</fnm></au><au><snm>Evans</snm><fnm>SJ</fnm></au><au><snm>Choudary</snm><fnm>PV</fnm></au><au><snm>Li</snm><fnm>J</fnm></au><au><snm>Overman</snm><fnm>KM</fnm></au><au><snm>Atz</snm><fnm>ME</fnm></au><au><snm>Myers</snm><fnm>RM</fnm></au><au><snm>Jones</snm><fnm>EG</fnm></au><au><snm>Watson</snm><fnm>SJ</fnm></au><au><snm>Akil</snm><fnm>H</fnm></au><au><snm>Bunney</snm><fnm>WE</fnm></au></aug><source>Biological Psychiatry</source><pubdate>2004</pubdate><volume>55</volume><issue>4</issue><fpage>346</fpage><lpage>352</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.biopsych.2003.10.013</pubid><pubid idtype="pmcid">3098566</pubid><pubid idtype="pmpid" link="fulltext">14960286</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>Partially Degraded RNA from Bladder Washing is a Suitable Sample for Studying Gene Expression Profiles in Bladder Cancer</p></title><aug><au><snm>Mengual</snm><fnm>L</fnm></au><au><snm>Burset</snm><fnm>M</fnm></au><au><snm>Ars</snm><fnm>E</fnm></au><au><snm>Ribal</snm><fnm>MJ</fnm></au><au><snm>Lozano</snm><fnm>JJ</fnm></au><au><snm>Minana</snm><fnm>B</fnm></au><au><snm>Sumoy</snm><fnm>L</fnm></au><au><snm>Alcaraz</snm><fnm>A</fnm></au></aug><source>European Urology</source><pubdate>2006</pubdate><volume>50</volume><fpage>1347</fpage><lpage>1356</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.eururo.2006.05.039</pubid><pubid idtype="pmpid" link="fulltext">16815626</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>Acquisition of biologically relevant gene expression data by Affymetrix microarray analysis of archival formalin-fixed paraffin-embedded 
tumours</p></title><aug><au><snm>Linton</snm><fnm>KM</fnm></au><au><snm>Hey</snm><fnm>Y</fnm></au><au><snm>Saunders</snm><fnm>E</fnm></au><au><snm>Jeziorska</snm><fnm>M</fnm></au><au><snm>Denton</snm><fnm>J</fnm></au><au><snm>Wilson</snm><fnm>CL</fnm></au><au><snm>Swindell</snm><fnm>R</fnm></au><au><snm>Dibben</snm><fnm>S</fnm></au><au><snm>Miller</snm><fnm>CJ</fnm></au><au><snm>Pepper</snm><fnm>SD</fnm></au><au><snm>Radford</snm><fnm>JA</fnm></au><au><snm>Freemont</snm><fnm>AJ</fnm></au></aug><source>British Journal of Cancer</source><pubdate>2008</pubdate><volume>98</volume><fpage>1403</fpage><lpage>1414</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/sj.bjc.6604316</pubid><pubid idtype="pmcid">2361698</pubid><pubid idtype="pmpid" link="fulltext">18382428</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>Methods comparison for high-resolution transcriptional analysis of archival material on Affymetrix Plus 2.0 and Exon 1.0 microarrays</p></title><aug><au><snm>Linton</snm><fnm>K</fnm></au><au><snm>Hey</snm><fnm>Y</fnm></au><au><snm>Dibben</snm><fnm>S</fnm></au><au><snm>Miller</snm><fnm>C</fnm></au><au><snm>Freemont</snm><fnm>A</fnm></au><au><snm>Radford</snm><fnm>J</fnm></au><au><snm>Pepper</snm><fnm>S</fnm></au></aug><source>BioTechniques</source><pubdate>2009</pubdate><volume>47</volume><fpage>587</fpage><lpage>596</lpage><xrefbib><pubidlist><pubid idtype="doi">10.2144/000113169</pubid><pubid idtype="pmpid" link="fulltext">19594443</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>Whole-Genome Gene Expression Profiling of Formalin-Fixed, Paraffin-Embedded Tissue 
Samples</p></title><aug><au><snm>April</snm><fnm>C</fnm></au><au><snm>Klotzle</snm><fnm>B</fnm></au><au><snm>Royce</snm><fnm>T</fnm></au><au><snm>Wickham-Garcia</snm><fnm>E</fnm></au><au><snm>Boyaniwsky</snm><fnm>T</fnm></au><au><snm>Izzo</snm><fnm>J</fnm></au><au><snm>Cox</snm><fnm>D</fnm></au><au><snm>Jones</snm><fnm>W</fnm></au><au><snm>Rubio</snm><fnm>R</fnm></au><au><snm>Holton</snm><fnm>K</fnm></au><au><snm>Matulonis</snm><fnm>U</fnm></au><au><snm>Quackenbush</snm><fnm>J</fnm></au><au><snm>Fan</snm><fnm>J</fnm></au></aug><source>PLoS ONE</source><pubdate>2009</pubdate><volume>4</volume><issue>12</issue><fpage>1</fpage><lpage>10</lpage></bibl><bibl id="B6"><title><p>Impact of RNA degradation on gene expression profiling</p></title><aug><au><snm>Opitz</snm><fnm>L</fnm></au><au><snm>Salinas-Riester</snm><fnm>G</fnm></au><au><snm>Grade</snm><fnm>M</fnm></au><au><snm>Jung</snm><fnm>K</fnm></au><au><snm>Jo</snm><fnm>P</fnm></au><au><snm>Emons</snm><fnm>G</fnm></au><au><snm>Ghadimi</snm><fnm>BM</fnm></au><au><snm>Bei&#223;barth</snm><fnm>T</fnm></au><au><snm>Gaedcke</snm><fnm>J</fnm></au></aug><source>BMC Medical Genomics</source><pubdate>2010</pubdate><volume>3</volume><issue>36</issue><fpage>1</fpage><lpage>14</lpage><xrefbib><pubidlist><pubid idtype="pmcid">2822734</pubid><pubid idtype="pmpid" link="fulltext">20092628</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>RNA integrity and the effect on the real-time qRT-PCR performance</p></title><aug><au><snm>Fleige</snm><fnm>S</fnm></au><au><snm>Pfaffl</snm><fnm>MW</fnm></au></aug><source>Molecular aspects of medicine</source><pubdate>2006</pubdate><volume>27</volume><fpage>126</fpage><lpage>139</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.mam.2005.12.003</pubid><pubid idtype="pmpid" link="fulltext">16469371</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>A novel approach for reliable microarray analysis of microdissected tumor cells from formalin-fixed and paraffin-embedded 
colorectal cancer resection specimens</p></title><aug><au><snm>Lassmann</snm><fnm>S</fnm></au><au><snm>Kreutz</snm><fnm>C</fnm></au><au><snm>Schoepflin</snm><fnm>A</fnm></au><au><snm>Hopt</snm><fnm>U</fnm></au><au><snm>Timmer</snm><fnm>J</fnm></au><au><snm>Werner</snm><fnm>M</fnm></au></aug><source>Journal of molecular medicine</source><pubdate>2009</pubdate><volume>87</volume><fpage>211</fpage><lpage>224</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/s00109-008-0419-y</pubid><pubid idtype="pmpid" link="fulltext">19066834</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>Reliable gene expression measurements from degraded RNA by quantitative real-time PCR depend on short amplicons and a proper normalization</p></title><aug><au><snm>Antonov</snm><fnm>J</fnm></au><au><snm>Goldstein</snm><fnm>DR</fnm></au><au><snm>Oberli</snm><fnm>A</fnm></au><au><snm>Baltzer</snm><fnm>A</fnm></au><au><snm>Pirotta</snm><fnm>M</fnm></au><au><snm>Fleischmann</snm><fnm>A</fnm></au><au><snm>Altermatt</snm><fnm>HJ</fnm></au><au><snm>Jaggi</snm><fnm>R</fnm></au></aug><source>Laboratory Investigation</source><pubdate>2005</pubdate><volume>85</volume><fpage>1040</fpage><lpage>1050</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/labinvest.3700303</pubid><pubid idtype="pmpid" link="fulltext">15951835</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>"Hook"-calibration of GeneChip-microarrays: chip characteristics and expression measures</p></title><aug><au><snm>Binder</snm><fnm>H</fnm></au><au><snm>Krohn</snm><fnm>K</fnm></au><au><snm>Preibisch</snm><fnm>S</fnm></au></aug><source>Algorithms for molecular biology</source><pubdate>2008</pubdate><note><b>3:</b>11.</note></bibl><bibl id="B11"><title><p>Preprocessing and quality control strategies for Illumina DASL assay-based brain gene expression studies with semi- degraded 
samples</p></title><aug><au><snm>Chow</snm><fnm>ML</fnm></au><au><snm>Winn</snm><fnm>ME</fnm></au><au><snm>Li</snm><fnm>HR</fnm></au><au><snm>April</snm><fnm>C</fnm></au><au><snm>Wynshaw-Boris</snm><fnm>A</fnm></au><au><snm>Fan</snm><fnm>JB</fnm></au><au><snm>Fu</snm><fnm>X</fnm></au><au><snm>Courchesne</snm><fnm>E</fnm></au><au><snm>Schork</snm><fnm>NJ</fnm></au></aug><source>Frontiers in Genetics</source><pubdate>2008</pubdate><volume>3</volume><fpage>11</fpage></bibl><bibl id="B12"><title><p>Adjusting batch effects in microarray expression data using empirical Bayes methods</p></title><aug><au><snm>Johnson</snm><fnm>WE</fnm></au><au><snm>Li</snm><fnm>C</fnm></au></aug><source>Biostatistics</source><pubdate>2007</pubdate><volume>8</volume><fpage>118</fpage><lpage>127</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/biostatistics/kxj037</pubid><pubid idtype="pmpid" link="fulltext">16632515</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis</p></title><aug><au><snm>Leek</snm><fnm>JT</fnm></au><au><snm>Storey</snm><fnm>JD</fnm></au></aug><source>PLoS Genetics</source><pubdate>2007</pubdate><volume>3</volume><issue>9</issue><fpage>1724</fpage><lpage>1735</lpage><xrefbib><pubidlist><pubid idtype="pmcid">1994707</pubid><pubid idtype="pmpid" link="fulltext">17907809</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>Assessing affymetrix GeneChip microarray quality</p></title><aug><au><snm>McCall</snm><fnm>MN</fnm></au><au><snm>Murakami</snm><fnm>PN</fnm></au><au><snm>Lukk</snm><fnm>M</fnm></au><au><snm>Huber</snm><fnm>W</fnm></au><au><snm>Irizarry</snm><fnm>R</fnm></au></aug><source>BMC bioinformatics</source><pubdate>2011</pubdate><note><b>12:</b>137.</note></bibl><bibl id="B15"><title><p>Judging the Quality of Gene Expression-Based Clustering Methods Using Gene 
Annotation</p></title><aug><au><snm>Gibbons</snm><fnm>FD</fnm></au><au><snm>Roth</snm><fnm>FP</fnm></au></aug><source>Genome Research</source><pubdate>2002</pubdate><volume>12</volume><fpage>1574</fpage><lpage>1581</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.397002</pubid><pubid idtype="pmcid">187526</pubid><pubid idtype="pmpid" link="fulltext">12368250</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>Fold change and p-value cutoffs significantly alter microarray interpretations</p></title><aug><au><snm>Dalman</snm><fnm>MR</fnm></au><au><snm>Anthony</snm><fnm>D</fnm></au><au><snm>Gayathri</snm><fnm>N</fnm></au><au><snm>Zhong-Hui</snm><fnm>D</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2012</pubdate><volume>13</volume><issue>Suppl 2</issue><fpage>S11</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-13-S2-S11</pubid><pubid idtype="pmcid">3426808</pubid><pubid idtype="pmpid" link="fulltext">23320832</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>Gene expression signature to improve prognosis prediction of stage II and III colorectal cancer</p></title><aug><au><snm>Salazar</snm><fnm>R</fnm></au><au><snm>Roepman</snm><fnm>P</fnm></au><au><snm>Capella</snm><fnm>G</fnm></au><au><snm>Moreno</snm><fnm>V</fnm></au><au><snm>Simon</snm><fnm>I</fnm></au><au><snm>Dreezen</snm><fnm>C</fnm></au><au><snm>Lopez-Doriga</snm><fnm>A</fnm></au><au><snm>Santos</snm><fnm>C</fnm></au><au><snm>Marijnen</snm><fnm>C</fnm></au><au><snm>Westerga</snm><fnm>J</fnm></au><au><snm>Bruin</snm><fnm>S</fnm></au><au><snm>Kerr</snm><fnm>D</fnm></au><au><snm>Kuppen</snm><fnm>P</fnm></au><au><snm>van de Velde</snm><fnm>C</fnm></au><au><snm>Morreau</snm><fnm>H</fnm></au><au><snm>Van Velthuysen</snm><fnm>L</fnm></au><au><snm>Glas</snm><fnm>AM</fnm></au><au><snm>Van&apos;t Veer</snm><fnm>LJ</fnm></au><au><snm>Tollenaar</snm><fnm>R</fnm></au></aug><source>Journal of clinical 
oncology</source><pubdate>2011</pubdate><volume>29</volume><fpage>17</fpage><lpage>24</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1200/JCO.2010.30.1077</pubid><pubid idtype="pmpid" link="fulltext">21098318</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>Relationship between tumor gene expression and recurrence in four independent studies of patients with stage II/III colon cancer treated with surgery alone or surgery plus adjuvant fluorouracil plus leucovorin</p></title><aug><au><snm>O&apos;Connell</snm><fnm>MJ</fnm></au><au><snm>Lavery</snm><fnm>I</fnm></au><au><snm>Yothers</snm><fnm>G</fnm></au><au><snm>Paik</snm><fnm>S</fnm></au><au><snm>Clark-Langone</snm><fnm>KM</fnm></au><au><snm>Lopatin</snm><fnm>M</fnm></au><au><snm>Watson</snm><fnm>D</fnm></au><au><snm>Baehner</snm><fnm>FL</fnm></au><au><snm>Shak</snm><fnm>S</fnm></au><au><snm>Baker</snm><fnm>J</fnm></au><au><snm>Cowens</snm><fnm>JW</fnm></au><au><snm>Wolmark</snm><fnm>N</fnm></au></aug><source>Journal of clinical oncology</source><pubdate>2010</pubdate><volume>28</volume><issue>25</issue><fpage>3937</fpage><lpage>3944</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1200/JCO.2010.28.9538</pubid><pubid idtype="pmcid">2940392</pubid><pubid idtype="pmpid" link="fulltext">20679606</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>Normalizing genes for quantitative RT-PCR in differentiating human intestinal epithelial cells and adenocarcinomas of the colon</p></title><aug><au><snm>Dydensborg</snm><fnm>AB</fnm></au><au><snm>Herring</snm><fnm>E</fnm></au><au><snm>Auclair</snm><fnm>J</fnm></au><au><snm>Tremblay</snm><fnm>E</fnm></au><au><snm>Beaulieu</snm><fnm>JF</fnm></au></aug><source>American Journal of Gastrointestinal and Liver Physiology</source><pubdate>2006</pubdate><volume>290</volume><fpage>G1067</fpage><lpage>G1074</lpage></bibl><bibl id="B20"><title><p>Housekeeping gene variability in normal and cancerous colorectal, pancreatic, esophageal, gastric and hepatic 
tissues</p></title><aug><au><snm>Rubie</snm><fnm>C</fnm></au><au><snm>Kempf</snm><fnm>K</fnm></au><au><snm>Hans</snm><fnm>J</fnm></au><au><snm>Su</snm><fnm>T</fnm></au><au><snm>Tilton</snm><fnm>B</fnm></au><au><snm>Georg</snm><fnm>T</fnm></au><au><snm>Brittner</snm><fnm>B</fnm></au><au><snm>Ludwig</snm><fnm>B</fnm></au><au><snm>Schilling</snm><fnm>M</fnm></au></aug><source>Molecular and cellular probes</source><pubdate>2005</pubdate><volume>19</volume><fpage>101</fpage><lpage>109</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">15680211</pubid></xrefbib></bibl><bibl id="B21"><title><p>Identification of endogenous control genes for normalisation of real-time quantitative PCR data in colorectal cancer</p></title><aug><au><snm>Kheirelseid</snm><fnm>EH</fnm></au><au><snm>Chang</snm><fnm>KH</fnm></au><au><snm>Newell</snm><fnm>J</fnm></au><au><snm>Kerin</snm><fnm>MJ</fnm></au><au><snm>Miller</snm><fnm>N</fnm></au></aug><source>BMC molecular biology</source><pubdate>2010</pubdate><note><b>11:</b>12.</note></bibl><bibl id="B22"><title><p>Robust microRNA stability in degraded RNA preparations from human tissue and cell samples</p></title><aug><au><snm>Jung</snm><fnm>M</fnm></au><au><snm>Schaefer</snm><fnm>A</fnm></au><au><snm>Steiner</snm><fnm>I</fnm></au><au><snm>Kempkensteffen</snm><fnm>C</fnm></au><au><snm>Stephan</snm><fnm>C</fnm></au><au><snm>Erbersdobler</snm><fnm>A</fnm></au><au><snm>Jung</snm><fnm>K</fnm></au></aug><source>Clinical chemistry</source><pubdate>2010</pubdate><volume>56</volume><issue>6</issue><fpage>998</fpage><lpage>1006</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1373/clinchem.2009.141580</pubid><pubid idtype="pmpid" link="fulltext">20378769</pubid></pubidlist></xrefbib></bibl><bibl id="B23"><title><p>Normalization of real-time quantitative reverse transcription-PCR data: a model-based variance estimation approach to identify genes suited for normalization, applied to bladder and colon cancer data 
sets</p></title><aug><au><snm>Andersen</snm><fnm>CL</fnm></au><au><snm>Jensen</snm><fnm>JL</fnm></au><au><snm>&#216;rntoft</snm><fnm>TF</fnm></au></aug><source>Cancer research</source><pubdate>2004</pubdate><volume>64</volume><issue>15</issue><fpage>5245</fpage><lpage>5250</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1158/0008-5472.CAN-04-0496</pubid><pubid idtype="pmpid" link="fulltext">15289330</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes</p></title><aug><au><snm>Vandesompele</snm><fnm>J</fnm></au><au><snm>Preter</snm><fnm>KD</fnm></au><au><snm>Poppe</snm><fnm>B</fnm></au><au><snm>Roy</snm><fnm>NV</fnm></au><au><snm>Paepe</snm><fnm>AD</fnm></au></aug><source>Genome biology</source><pubdate>2002</pubdate><volume>3</volume><issue>7</issue><fpage>1</fpage><lpage>12</lpage></bibl><bibl id="B25"><title><p>Frozen robust multiarray analysis (fRMA)</p></title><aug><au><snm>McCall</snm><fnm>MN</fnm></au><au><snm>Bolstad</snm><fnm>BM</fnm></au><au><snm>Irizarry</snm><fnm>RA</fnm></au></aug><source>Biostatistics</source><pubdate>2010</pubdate><volume>11</volume><issue>2</issue><fpage>242</fpage><lpage>253</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/biostatistics/kxp059</pubid><pubid idtype="pmcid">2830579</pubid><pubid idtype="pmpid" link="fulltext">20097884</pubid></pubidlist></xrefbib></bibl><bibl id="B26"><title><p>Processing and Analyzing Affymetrix SNP Chips with Bioconductor</p></title><aug><au><snm>Carvalho</snm><fnm>B</fnm></au><au><snm>Irizarry</snm><fnm>RA</fnm></au><au><snm>Scharpf</snm><fnm>RB</fnm></au><au><snm>Carey</snm><fnm>VJ</fnm></au></aug><source>Stat Biosci</source><pubdate>2009</pubdate><volume>1</volume><fpage>160</fpage><lpage>180</lpage><xrefbib><pubid idtype="doi">10.1007/s12561-009-9015-0</pubid></xrefbib></bibl><bibl id="B27"><title><p>Linear Models and Empirical Bayes Methods for Assessing 
Differential Expression in Microarray Experiments</p></title><aug><au><snm>Smyth</snm><fnm>GK</fnm></au></aug><source>Statistical Applications in Genetics and Molecular Biology</source><pubdate>2004</pubdate><note><b>3:</b>3.</note></bibl><bibl id="B28"><title><p>Controlling the False Discovery Rate: a Practical and Powerful Approach to Multiple Testing</p></title><aug><au><snm>Benjamini</snm><fnm>Y</fnm></au><au><snm>Hochberg</snm><fnm>Y</fnm></au></aug><source>Journal of the Royal Statistical Society</source><pubdate>1995</pubdate><volume>57</volume><fpage>289</fpage><lpage>300</lpage></bibl><bibl id="B29"><title><p>Gene ontology analysis for RNA-seq: accounting for selection bias</p></title><aug><au><snm>Young</snm><fnm>MD</fnm></au><au><snm>Wakefield</snm><fnm>MJ</fnm></au><au><snm>Smyth</snm><fnm>GK</fnm></au><au><snm>Oshlack</snm><fnm>A</fnm></au></aug><source>Genome Biology</source><pubdate>2010</pubdate><volume>11</volume><fpage>R14</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2010-11-2-r14</pubid><pubid idtype="pmcid">2872874</pubid><pubid idtype="pmpid" link="fulltext">20132535</pubid></pubidlist></xrefbib></bibl></refgrp>
	</bm>
</art>