<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>1471-2105-8-S2-S6</ui>
	<ji>1471-2105</ji>
	<fm>
		<dochead>Research</dochead>
		<bibl>
			<title>
				<p>Robust imputation method for missing values in microarray data</p>
			</title>
			<aug>
				<au id="A1">
					<snm>Yoon</snm>
					<fnm>Dankyu</fnm>
					<insr iid="I1"/>
					<email>avanti@chol.com</email>
				</au>
				<au id="A2">
					<snm>Lee</snm>
					<fnm>Eun-Kyung</fnm>
					<insr iid="I2"/>
					<email>lee.eunk@gmail.com</email>
				</au>
				<au id="A3" ca="yes">
					<snm>Park</snm>
					<fnm>Taesung</fnm>
					<insr iid="I2"/>
					<email>tspark@stats.snu.ac.kr</email>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea</p>
				</ins>
				<ins id="I2">
					<p>Department of Statistics, College of Natural Science, Seoul National University, San 56-1, Shin Lim-Dong, Kwanak-ku, Seoul, 151-742, Korea</p>
				</ins>
			</insg>
			<source>BMC Bioinformatics</source>
			<supplement>
				<title>
					<p>Probabilistic Modeling and Machine Learning in Structural and Systems Biology</p>
				</title>
				<editor>Samuel Kaski, Juho Rousu, Esko Ukkonen</editor>
				<note>Research</note>
			</supplement>
			<conference>
				<title>
					<p>Probabilistic Modeling and Machine Learning in Structural and Systems Biology</p>
				</title>
				<location>Tuusula, Finland</location>
				<date-range>17&#8211;18 June 2006</date-range>
				<url>http://www.cs.helsinki.fi/group/bioinfo/events/pmsb06/</url>
			</conference>
			<issn>1471-2105</issn>
			<pubdate>2007</pubdate>
			<volume>8</volume>
			<issue>Suppl 2</issue>
			<fpage>S6</fpage>
			<url>http://www.biomedcentral.com/1471-2105/8/S2/S6</url>
			<xrefbib>
				<pubidlist><pubid idtype="pmpid">17493255</pubid><pubid idtype="doi">10.1186/1471-2105-8-S2-S6</pubid>
				</pubidlist></xrefbib>
		</bibl>
		<history>
			<pub>
				<date>
					<day>03</day>
					<month>5</month>
					<year>2007</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2007</year>
			<collab>Yoon et al; licensee BioMed Central Ltd.</collab>
			<note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
		</cpyrt>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st>
					<p>When analyzing microarray gene expression data, missing values are often encountered. Most multivariate statistical methods proposed for microarray data analysis cannot be applied when the data have missing values. Numerous imputation algorithms have been proposed to estimate the missing values. In this study, we develop a robust least squares estimation with principal components (RLSP) method by extending the local least squares imputation (LLSimpute) method. The basic idea of our method is to employ quantile regression to estimate the missing values, using the estimated principal components of a selected set of similar genes.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st>
					<p>Using the normalized root mean squared error, the performance of the proposed method was evaluated and compared with that of previously proposed imputation methods. The proposed RLSP method clearly outperformed the weighted <it>k</it>-nearest neighbors imputation (kNNimpute) method and the LLSimpute method, and was competitive with the Bayesian principal component analysis (BPCA) method.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusion</p>
					</st>
					<p>Adapting the principal components of the selected genes and employing the quantile regression model improved the robustness and accuracy of missing value imputation. Thus, the proposed RLSP method is, according to our empirical studies, more robust and accurate than the widely used kNNimpute and LLSimpute methods.</p>
				</sec>
			</sec>
		</abs>
	</fm>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>Microarray technology has been successfully applied to a variety of biological studies, including cancer classification, discovery of unknown gene functions, and identification of the effects of a specific therapy. When analyzing microarray data, we often face missing values caused by factors such as scratches on the slide, spotting problems, dust, and experimental errors. In practice, every experiment contains missing entries, and sometimes more than 90% of the genes in a microarray experiment are affected <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Moreover, most classic multivariate analysis methods for microarray data cannot be used when the data have missing values. Therefore, missing values must be treated appropriately.</p>
			<p>An easy way to handle missing data is to repeat the whole experiment. However, this is often not a realistic option because of economic limitations or the scarcity of available biological material <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Accordingly, many missing value estimation methods have been developed. The weighted <it>k</it>-nearest neighbors imputation method (kNNimpute) selects genes with expression profiles similar to the gene of interest to impute missing values <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. The singular value decomposition method (SVDimpute) employs a singular value decomposition to obtain a set of mutually orthogonal patterns that can be linearly combined to approximate the expression of all genes in the data set <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. In a comparative study presented by Troyanskaya et al. <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, kNNimpute was more robust and accurate than SVDimpute.</p>
			<p>Least squares imputation (LSimpute) is a regression-based method that uses the correlations between both genes and arrays <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. LSimpute showed the best performance when the data have a strong local correlation structure. Local least squares imputation (LLSimpute) is an extension of the LSimpute method that selects <it>k </it>similar genes by the <it>L</it><sub>2</sub>-norm or Pearson correlation and applies multiple regression to impute missing values <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>.</p>
			<p>Bayesian principal component analysis (BPCA) uses a Bayesian estimation algorithm to predict missing values <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. BPCA suggests using the number of samples minus 1 as the number of principal axes. Since BPCA uses an EM-like iterative algorithm to estimate missing values, it requires intensive computation. Gaussian mixture imputation (GMCimpute) estimates missing values using Gaussian mixtures and model averaging <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Collateral missing value imputation (CMVE) <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> predicts missing values based on multiple covariance-based imputation matrices and performs imputation using least squares regression and linear programming methods.</p>
			<p>Recently, several imputation methods that use a priori information to impute missing values have been proposed, such as a set theoretic framework based on projection onto convex sets (POCS) <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> and an approach based on the functional similarities of gene ontology <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. While most traditional imputation methods treat each spot as binary, either missing or present, the weighted nearest neighbours method (WeNNI) adopts a continuous spot quality weight for missing value estimation <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>.</p>
			<p>Among these methods, kNNimpute, LSimpute and LLSimpute are the most commonly used because they are easy to apply and have a low computational burden. Note that LSimpute and LLSimpute are regression based methods, and kNNimpute can also be regarded as a regression based method for the simple intercept model. In this paper, we focus on these regression based methods and present their improvements.</p>
			<p>kNNimpute and LLSimpute both use the <it>k </it>selected genes to estimate missing values. Kim <it>et al</it>. <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> showed that LLSimpute performs well for a large value of <it>k</it>, say over 200. However, from a practical point of view it is inefficient to use such a large number of genes to estimate one missing value. Furthermore, there is no guarantee that the selected <it>k </it>is large enough for LLSimpute to perform well. Surprisingly, the performance of LLSimpute becomes very poor when <it>k </it>is close to the number of samples.</p>
			<p>On the other hand, kNNimpute performs well with relatively small values of <it>k</it>. For example, kNNimpute suggests using 10 or 15 similar genes. However, kNNimpute performs poorly when <it>k </it>is too small or too large. Its performance depends on the sample size and the correlations between genes. Therefore, kNNimpute is negatively affected by a badly chosen <it>k</it>.</p>
			<p>To overcome the limitations of these regression based imputation methods, we propose the robust least squares estimation with principal components (RLSP) method. RLSP is an improved version of LLSimpute. We compute the principal components of the selected genes and apply quantile regression on these estimated principal components to estimate missing values. Note that most imputation methods are not robust to outliers. RLSP performs well even when <it>k </it>is small. Moreover, RLSP shows performance similar to that of LLSimpute when <it>k </it>is large. Therefore, RLSP is more robust to the choice of <it>k</it>.</p>
			<p>The normalized root mean squared error (NRMSE) is used to evaluate the differences in performance between the proposed RLSP method and the other imputation methods at various missing rates <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. The RLSP method clearly outperforms the LLSimpute method.</p>
		</sec>
		<sec>
			<st>
				<p>Methods</p>
			</st>
			<p>A whole gene expression profile is represented by a <it>G </it>&#215; <it>N </it>matrix, <b>Y</b>, where the rows correspond to the genes, the columns correspond to the experiments (samples), and the entry <it>Y</it><sub><it>i</it>,<it>j </it></sub>is the expression level of gene <it>i </it>in experiment <it>j</it>. For simplicity, we assume that the target gene vector <b>g* </b>has a missing value at the first sample, denoted by <it>&#945;</it>. For the <it>k </it>selected similar genes, let <m:math name="1471-2105-8-S2-S6-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mstyle mathvariant="bold" mathsize="normal"><m:mi>g</m:mi></m:mstyle><m:mrow><m:msub><m:mi>s</m:mi><m:mi>j</m:mi></m:msub></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWHNbWzdaWgaaWcbaGaem4Cam3aaSbaaWqaaiabdQgaQbqabaaaleqaaaaa@3137@</m:annotation></m:semantics></m:math> be an <it>N </it>&#215; 1 vector containing the expression values of the <it>j</it>th selected gene, with first element <m:math name="1471-2105-8-S2-S6-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>w</m:mi><m:mrow><m:msub><m:mi>s</m:mi><m:mi>j</m:mi></m:msub></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWG3bWDdaWgaaWcbaGaem4Cam3aaSbaaWqaaiabdQgaQbqabaaaleqaaaaa@3153@</m:annotation></m:semantics></m:math>, where <it>s</it><sub><it>j </it></sub>denotes the index of the <it>j</it>th selected gene for <it>j </it>= 1,...,<it>k</it>. The selected similar genes are complete, with no missing observations. Then,</p>
			<p>
				<m:math name="1471-2105-8-S2-S6-i3" xmlns:m="http://www.w3.org/1998/Math/MathML">
					<m:semantics>
						<m:mrow>
							<m:mrow>
								<m:mo>(</m:mo>
								<m:mrow>
									<m:mtable>
										<m:mtr>
											<m:mtd>
												<m:mrow>
													<m:msup>
														<m:mstyle mathvariant="bold" mathsize="normal">
															<m:mi>g</m:mi>
														</m:mstyle>
														<m:mrow>
															<m:mstyle mathvariant="bold" mathsize="normal">
																<m:mo>*</m:mo>
															</m:mstyle>
															<m:mi>T</m:mi>
														</m:mrow>
													</m:msup>
												</m:mrow>
											</m:mtd>
										</m:mtr>
										<m:mtr>
											<m:mtd>
												<m:mrow>
													<m:msubsup>
														<m:mstyle mathvariant="bold" mathsize="normal">
															<m:mi>g</m:mi>
														</m:mstyle>
														<m:mrow>
															<m:msub>
																<m:mstyle mathvariant="bold" mathsize="normal">
																	<m:mi>s</m:mi>
																</m:mstyle>
																<m:mstyle mathvariant="bold" mathsize="normal">
																	<m:mn>1</m:mn>
																</m:mstyle>
															</m:msub>
														</m:mrow>
														<m:mi>T</m:mi>
													</m:msubsup>
												</m:mrow>
											</m:mtd>
										</m:mtr>
										<m:mtr>
											<m:mtd>
												<m:mrow>
													<m:msubsup>
														<m:mstyle mathvariant="bold" mathsize="normal">
															<m:mi>g</m:mi>
														</m:mstyle>
														<m:mrow>
															<m:msub>
																<m:mstyle mathvariant="bold" mathsize="normal">
																	<m:mi>s</m:mi>
																</m:mstyle>
																<m:mstyle mathvariant="bold" mathsize="normal">
																	<m:mn>2</m:mn>
																</m:mstyle>
															</m:msub>
														</m:mrow>
														<m:mi>T</m:mi>
													</m:msubsup>
												</m:mrow>
											</m:mtd>
										</m:mtr>
										<m:mtr>
											<m:mtd>
												<m:mo>&#8942;</m:mo>
											</m:mtd>
										</m:mtr>
										<m:mtr>
											<m:mtd>
												<m:mrow>
													<m:msubsup>
														<m:mstyle mathvariant="bold" mathsize="normal">
															<m:mi>g</m:mi>
														</m:mstyle>
														<m:mrow>
															<m:msub>
																<m:mstyle mathvariant="bold" mathsize="normal">
																	<m:mi>s</m:mi>
																</m:mstyle>
																<m:mstyle mathvariant="bold" mathsize="normal">
																	<m:mi>k</m:mi>
																</m:mstyle>
															</m:msub>
														</m:mrow>
														<m:mi>T</m:mi>
													</m:msubsup>
												</m:mrow>
											</m:mtd>
										</m:mtr>
									</m:mtable>
								</m:mrow>
								<m:mo>)</m:mo>
							</m:mrow>
							<m:mo>=</m:mo>
							<m:mrow>
								<m:mo>(</m:mo>
								<m:mrow>
									<m:mtable>
										<m:mtr>
											<m:mtd>
												<m:mi>&#945;</m:mi>
											</m:mtd>
											<m:mtd>
												<m:mrow>
													<m:msub>
														<m:mi>y</m:mi>
														<m:mn>1</m:mn>
													</m:msub>
												</m:mrow>
											</m:mtd>
											<m:mtd>
												<m:mrow>
													<m:msub>
														<m:mi>y</m:mi>
														<m:mn>2</m:mn>
													</m:msub>
												</m:mrow>
											</m:mtd>
											<m:mtd>
												<m:mo>&#8943;</m:mo>
											</m:mtd>
											<m:mtd>
												<m:mrow>
													<m:msub>
														<m:mi>y</m:mi>
														<m:mrow>
															<m:mi>N</m:mi>
															<m:mo>&#8722;</m:mo>
															<m:mn>1</m:mn>
														</m:mrow>
													</m:msub>
												</m:mrow>
											</m:mtd>
										</m:mtr>
										<m:mtr>
											<m:mtd>
												<m:mrow>
													<m:msub>
														<m:mi>w</m:mi>
														<m:mrow>
															<m:msub>
																<m:mi>s</m:mi>
																<m:mn>1</m:mn>
															</m:msub>
														</m:mrow>
													</m:msub>
												</m:mrow>
											</m:mtd>
											<m:mtd>
												<m:mrow>
													<m:msub>
														<m:mi>x</m:mi>
														<m:mrow>
															<m:msub>
																<m:mi>s</m:mi>
																<m:mn>1</m:mn>
															</m:msub>
															<m:mo>,</m:mo>
															<m:mn>1</m:mn>
														</m:mrow>
													</m:msub>
												</m:mrow>
											</m:mtd>
											<m:mtd>
												<m:mrow>
													<m:msub>
														<m:mi>x</m:mi>
														<m:mrow>
															<m:msub>
																<m:mi>s</m:mi>
																<m:mn>1</m:mn>
															</m:msub>
															<m:mo>,</m:mo>
															<m:mn>2</m:mn>
														</m:mrow>
													</m:msub>
												</m:mrow>
											</m:mtd>
											<m:mtd>
												<m:mo>&#8943;</m:mo>
											</m:mtd>
											<m:mtd>
												<m:mrow>
													<m:msub>
														<m:mi>x</m:mi>
														<m:mrow>
															<m:msub>
																<m:mi>s</m:mi>
																<m:mn>1</m:mn>
															</m:msub>
															<m:mo>,</m:mo>
															<m:mi>N</m:mi>
															<m:mo>&#8722;</m:mo>
															<m:mn>1</m:mn>
														</m:mrow>
													</m:msub>
												</m:mrow>
											</m:mtd>
										</m:mtr>
										<m:mtr>
											<m:mtd>
												<m:mo>&#8942;</m:mo>
											</m:mtd>
											<m:mtd>
												<m:mo>&#8942;</m:mo>
											</m:mtd>
											<m:mtd>
												<m:mo>&#8942;</m:mo>
											</m:mtd>
											<m:mtd>
												<m:mo>&#8942;</m:mo>
											</m:mtd>
											<m:mtd>
												<m:mo>&#8942;</m:mo>
											</m:mtd>
										</m:mtr>
										<m:mtr>
											<m:mtd>
												<m:mrow>
													<m:msub>
														<m:mi>w</m:mi>
														<m:mrow>
															<m:msub>
																<m:mi>s</m:mi>
																<m:mi>k</m:mi>
															</m:msub>
														</m:mrow>
													</m:msub>
												</m:mrow>
											</m:mtd>
											<m:mtd>
												<m:mrow>
													<m:msub>
														<m:mi>x</m:mi>
														<m:mrow>
															<m:msub>
																<m:mi>s</m:mi>
																<m:mi>k</m:mi>
															</m:msub>
															<m:mo>,</m:mo>
															<m:mn>1</m:mn>
														</m:mrow>
													</m:msub>
												</m:mrow>
											</m:mtd>
											<m:mtd>
												<m:mrow>
													<m:msub>
														<m:mi>x</m:mi>
														<m:mrow>
															<m:msub>
																<m:mi>s</m:mi>
																<m:mi>k</m:mi>
															</m:msub>
															<m:mo>,</m:mo>
															<m:mn>2</m:mn>
														</m:mrow>
													</m:msub>
												</m:mrow>
											</m:mtd>
											<m:mtd>
												<m:mo>&#8943;</m:mo>
											</m:mtd>
											<m:mtd>
												<m:mrow>
													<m:msub>
														<m:mi>x</m:mi>
														<m:mrow>
															<m:msub>
																<m:mi>s</m:mi>
																<m:mi>k</m:mi>
															</m:msub>
															<m:mo>,</m:mo>
															<m:mi>N</m:mi>
															<m:mo>&#8722;</m:mo>
															<m:mn>1</m:mn>
														</m:mrow>
													</m:msub>
												</m:mrow>
											</m:mtd>
										</m:mtr>
									</m:mtable>
								</m:mrow>
								<m:mo>)</m:mo>
							</m:mrow>
							<m:mo>=</m:mo>
							<m:mrow>
								<m:mo>(</m:mo>
								<m:mrow>
									<m:mtable>
										<m:mtr>
											<m:mtd>
												<m:mi>&#945;</m:mi>
											</m:mtd>
											<m:mtd>
												<m:mrow>
													<m:msup>
														<m:mstyle mathvariant="bold" mathsize="normal">
															<m:mi>y</m:mi>
														</m:mstyle>
														<m:mi>T</m:mi>
													</m:msup>
												</m:mrow>
											</m:mtd>
										</m:mtr>
										<m:mtr>
											<m:mtd>
												<m:mstyle mathvariant="bold" mathsize="normal">
													<m:mi>w</m:mi>
												</m:mstyle>
											</m:mtd>
											<m:mtd>
												<m:mstyle mathvariant="bold" mathsize="normal">
													<m:mi>X</m:mi>
												</m:mstyle>
											</m:mtd>
										</m:mtr>
									</m:mtable>
								</m:mrow>
								<m:mo>)</m:mo>
							</m:mrow>
							<m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
							<m:mrow>
								<m:mo>(</m:mo>
								<m:mn>1</m:mn>
								<m:mo>)</m:mo>
							</m:mrow>
						</m:mrow>
						<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqadaqaauaabeqafeaaaaqaaiabhEgaNnaaCaaaleqabaGaeCOkaOIaemivaqfaaaGcbaGaeC4zaC2aa0baaSqaaiabhohaZnaaBaaameaacqWHXaqmaeqaaaWcbaGaemivaqfaaaGcbaGaeC4zaC2aa0baaSqaaiabhohaZnaaBaaameaacqWHYaGmaeqaaaWcbaGaemivaqfaaaGcbaGaeSO7I0eabaGaeC4zaC2aa0baaSqaaiabhohaZnaaBaaameaacqWHRbWAaeqaaaWcbaGaemivaqfaaaaaaOGaayjkaiaawMcaaiabg2da9maabmaabaqbaeqabqqbaaaaaeaaiiGacqWFXoqyaeaacqWG5bqEdaWgaaWcbaGaeGymaedabeaaaOqaaiabdMha5naaBaaaleaacqaIYaGmaeqaaaGcbaGaeS47IWeabaGaemyEaK3aaSbaaSqaaiabd6eaojabgkHiTiabigdaXaqabaaakeaacqWG3bWDdaWgaaWcbaGaem4Cam3aaSbaaWqaaiabigdaXaqabaaaleqaaaGcbaGaemiEaG3aaSbaaSqaaiabdohaZnaaBaaameaacqaIXaqmaeqaaSGaeiilaWIaeGymaedabeaaaOqaaiabdIha4naaBaaaleaacqWGZbWCdaWgaaadbaGaeGymaedabeaaliabcYcaSiabikdaYaqabaaakeaacqWIVlctaeaacqWG4baEdaWgaaWcbaGaem4Cam3aaSbaaWqaaiabigdaXaqabaWccqGGSaalcqWGobGtcqGHsislcqaIXaqmaeqaaaGcbaGaeSO7I0eabaGaeSO7I0eabaGaeSO7I0eabaGaeSO7I0eabaGaeSO7I0eabaGaem4DaC3aaSbaaSqaaiabdohaZnaaBaaameaacqWGRbWAaeqaaaWcbeaaaOqaaiabdIha4naaBaaaleaacqWGZbWCdaWgaaadbaGaem4AaSgabeaaliabcYcaSiabigdaXaqabaaakeaacqWG4baEdaWgaaWcbaGaem4Cam3aaSbaaWqaaiabdUgaRbqabaWccqGGSaalcqaIYaGmaeqaaaGcbaGaeS47IWeabaGaemiEaG3aaSbaaSqaaiabdohaZnaaBaaameaacqWGRbWAaeqaaSGaeiilaWIaemOta4KaeyOeI0IaeGymaedabeaaaaaakiaawIcacaGLPaaacqGH9aqpdaqadaqaauaabeqaciaaaeaacqWFXoqyaeaacqWH5bqEdaahaaWcbeqaaiabdsfaubaaaOqaaiabhEha3bqaaiabhIfaybaaaiaawIcacaGLPaaacaWLjaGaaCzcamaabmaabaGaeGymaedacaGLOaGaayzkaaaaaa@A2AD@</m:annotation>
					</m:semantics>
				</m:math>
			</p>
			<p>where <m:math name="1471-2105-8-S2-S6-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mstyle mathvariant="bold" mathsize="normal"><m:mi>w</m:mi></m:mstyle><m:mo>=</m:mo><m:msup><m:mrow><m:mo stretchy="false">[</m:mo><m:msub><m:mi>w</m:mi><m:mrow><m:msub><m:mi>s</m:mi><m:mn>1</m:mn></m:msub></m:mrow></m:msub><m:mo>,</m:mo><m:msub><m:mi>w</m:mi><m:mrow><m:msub><m:mi>s</m:mi><m:mn>2</m:mn></m:msub></m:mrow></m:msub><m:mo>,</m:mo><m:mo>&#8943;</m:mo><m:mo>,</m:mo><m:msub><m:mi>w</m:mi><m:mrow><m:msub><m:mi>s</m:mi><m:mi>k</m:mi></m:msub></m:mrow></m:msub><m:mo stretchy="false">]</m:mo></m:mrow><m:mi>T</m:mi></m:msup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWH3bWDcqGH9aqpcqGGBbWwcqWG3bWDdaWgaaWcbaGaem4Cam3aaSbaaWqaaiabigdaXaqabaaaleqaaOGaeiilaWIaem4DaC3aaSbaaSqaaiabdohaZnaaBaaameaacqaIYaGmaeqaaaWcbeaakiabcYcaSiabl+UimjabcYcaSiabdEha3naaBaaaleaacqWGZbWCdaWgaaadbaGaem4AaSgabeaaaSqabaGccqGGDbqxdaahaaWcbeqaaiabdsfaubaaaaa@44D6@</m:annotation></m:semantics></m:math>, and <b>y </b>= [<it>y</it><sub>1</sub>, <it>y</it><sub>2</sub>,...,<it>y</it><sub><it>N</it>-1</sub>]<sup><it>T </it></sup>is a subvector of <b>g* </b>excluding the missing value <it>&#945;</it>.</p>
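			<p>As an illustration of the partition in equation (1), the following minimal Python sketch extracts <b>w</b>, <b>y </b>and <b>X </b>from a small expression matrix. The function name partition_for_imputation, the toy values and the use of NumPy are assumptions made for this example only and are not part of the method description.</p>

```python
import numpy as np

def partition_for_imputation(Y, target, selected):
    """Split an expression matrix Y (genes x samples) as in Eq. (1).

    Assumes the single missing value of the target gene sits in column 0
    and that every gene listed in `selected` is complete.
    """
    y = Y[target, 1:]       # observed part of the target gene, length N - 1
    w = Y[selected, 0]      # first-sample values of the k selected genes
    X = Y[selected, 1:]     # remaining values of the selected genes, k x (N - 1)
    return w, y, X

# Hypothetical toy data: G = 4 genes, N = 5 samples; gene 0 is the target.
Y = np.array([
    [np.nan, 1.0, 2.0, 3.0, 4.0],
    [0.5,    1.1, 2.1, 2.9, 4.2],
    [0.4,    0.9, 1.8, 3.1, 3.9],
    [9.0,    7.0, 5.0, 3.0, 1.0],
])
w, y, X = partition_for_imputation(Y, target=0, selected=[1, 2])
print(w.shape, y.shape, X.shape)   # (2,) (4,) (2, 4)
```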
			<p>LLSimpute selects the <it>k </it>most similar genes using <it>L</it><sub>2</sub>-norm or Pearson correlation and applies multiple regression to impute missing values with a linear combination of the <it>k </it>selected genes. LLSimpute applies multiple regression in two ways.</p>
			<p>The first model is</p>
			<p><b>y </b>= <b>X</b><sup><it>T</it></sup><it>&#946; </it>+ <it>&#949;</it></p>
			<p>
				<m:math name="1471-2105-8-S2-S6-i5" xmlns:m="http://www.w3.org/1998/Math/MathML">
					<m:semantics>
						<m:mrow>
							<m:mover accent="true">
								<m:mi>&#945;</m:mi>
								<m:mo>^</m:mo>
							</m:mover>
							<m:mo>=</m:mo>
							<m:msup>
								<m:mstyle mathvariant="bold" mathsize="normal">
									<m:mi>w</m:mi>
								</m:mstyle>
								<m:mi>T</m:mi>
							</m:msup>
							<m:mover accent="true">
								<m:mi>&#946;</m:mi>
								<m:mo>^</m:mo>
							</m:mover>
							<m:mo>=</m:mo>
							<m:msup>
								<m:mstyle mathvariant="bold" mathsize="normal">
									<m:mi>w</m:mi>
								</m:mstyle>
								<m:mi>T</m:mi>
							</m:msup>
							<m:msup>
								<m:mrow>
									<m:mo stretchy="false">(</m:mo>
									<m:mstyle mathvariant="bold" mathsize="normal">
										<m:mi>X</m:mi>
									</m:mstyle>
									<m:msup>
										<m:mstyle mathvariant="bold" mathsize="normal">
											<m:mi>X</m:mi>
										</m:mstyle>
										<m:mi>T</m:mi>
									</m:msup>
									<m:mo stretchy="false">)</m:mo>
								</m:mrow>
								<m:mo>&#8722;</m:mo>
							</m:msup>
							<m:mstyle mathvariant="bold" mathsize="normal">
								<m:mi>X</m:mi>
								<m:mi>y</m:mi>
							</m:mstyle>
							<m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
							<m:mrow>
								<m:mo>(</m:mo>
								<m:mn>2</m:mn>
								<m:mo>)</m:mo>
							</m:mrow>
						</m:mrow>
						<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacuWFXoqygaqcaiabg2da9iabhEha3naaCaaaleqabaGaemivaqfaaOGaf8NSdiMbaKaacqGH9aqpcqWH3bWDdaahaaWcbeqaaiabdsfaubaakiabcIcaOiabhIfayjabhIfaynaaCaaaleqabaGaemivaqfaaOGaeiykaKYaaWbaaSqabeaacqGHsislaaGccqWHybawcqWH5bqEcaWLjaGaaCzcamaabmaabaGaeGOmaidacaGLOaGaayzkaaaaaa@4513@</m:annotation>
					</m:semantics>
				</m:math>
			</p>
			<p>and the second model is</p>
			<p><b>w </b>= <b>X</b><it>&#946;</it>* + <it>&#949;</it>*</p>
			<p>
				<m:math name="1471-2105-8-S2-S6-i6" xmlns:m="http://www.w3.org/1998/Math/MathML">
					<m:semantics>
						<m:mrow>
							<m:msup>
								<m:mover accent="true">
									<m:mi>&#945;</m:mi>
									<m:mo>^</m:mo>
								</m:mover>
								<m:mo>*</m:mo>
							</m:msup>
							<m:mo>=</m:mo>
							<m:msup>
								<m:mstyle mathvariant="bold" mathsize="normal">
									<m:mi>y</m:mi>
								</m:mstyle>
								<m:mi>T</m:mi>
							</m:msup>
							<m:msup>
								<m:mover accent="true">
									<m:mi>&#946;</m:mi>
									<m:mo>^</m:mo>
								</m:mover>
								<m:mo>*</m:mo>
							</m:msup>
							<m:mo>=</m:mo>
							<m:msup>
								<m:mstyle mathvariant="bold" mathsize="normal">
									<m:mi>y</m:mi>
								</m:mstyle>
								<m:mi>T</m:mi>
							</m:msup>
							<m:msup>
								<m:mrow>
									<m:mo stretchy="false">(</m:mo>
									<m:msup>
										<m:mstyle mathvariant="bold" mathsize="normal">
											<m:mi>X</m:mi>
										</m:mstyle>
										<m:mi>T</m:mi>
									</m:msup>
									<m:mstyle mathvariant="bold" mathsize="normal">
										<m:mi>X</m:mi>
									</m:mstyle>
									<m:mo stretchy="false">)</m:mo>
								</m:mrow>
								<m:mo>&#8722;</m:mo>
							</m:msup>
							<m:msup>
								<m:mstyle mathvariant="bold" mathsize="normal">
									<m:mi>X</m:mi>
								</m:mstyle>
								<m:mi>T</m:mi>
							</m:msup>
							<m:mstyle mathvariant="bold" mathsize="normal">
								<m:mi>w</m:mi>
							</m:mstyle>
							<m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
							<m:mrow>
								<m:mo>(</m:mo>
								<m:mn>3</m:mn>
								<m:mo>)</m:mo>
							</m:mrow>
						</m:mrow>
						<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacuWFXoqygaqcamaaCaaaleqabaGaeiOkaOcaaOGaeyypa0JaeCyEaK3aaWbaaSqabeaacqWGubavaaGccuWFYoGygaqcamaaCaaaleqabaGaeiOkaOcaaOGaeyypa0JaeCyEaK3aaWbaaSqabeaacqWGubavaaGccqGGOaakcqWHybawdaahaaWcbeqaaiabdsfaubaakiabhIfayjabcMcaPmaaCaaaleqabaGaeyOeI0caaOGaeCiwaG1aaWbaaSqabeaacqWGubavaaGccqWH3bWDcaWLjaGaaCzcamaabmaabaGaeG4mamdacaGLOaGaayzkaaaaaa@48A7@</m:annotation>
					</m:semantics>
				</m:math>
			</p>
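			<p>where (<b>XX</b><sup><it>T</it></sup>)<sup>- </sup>is the generalized inverse of (<b>XX</b><sup><it>T</it></sup>). Because <b>XX</b><sup><it>T </it></sup>is <it>k </it>&#215; <it>k </it>whereas <b>X</b><sup><it>T</it></sup><b>X </b>is (<it>N </it>&#8722; 1) &#215; (<it>N </it>&#8722; 1), if <it>N </it>is larger than <it>k</it>, (<b>XX</b><sup><it>T</it></sup>)<sup>- </sup>in the first model is easier to calculate than (<b>X</b><sup><it>T</it></sup><b>X</b>)<sup>- </sup>in the second model, and vice versa.</p>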
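			<p>The two estimators in equations (2) and (3) can be sketched numerically as follows. This is only an illustration under stated assumptions: the data are random rather than real expression values, and the Moore-Penrose pseudo-inverse (NumPy's pinv) stands in for the generalized inverse, which the text does not specify further.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
k, n_obs = 5, 9                      # k selected genes, N - 1 observed samples
X = rng.normal(size=(k, n_obs))      # k x (N - 1) block of the selected genes
w = rng.normal(size=k)               # their values at the missing sample
y = rng.normal(size=n_obs)           # observed part of the target gene

# Model 1: y = X^T beta + eps, so beta_hat = (X X^T)^- X y  (k x k inverse).
beta_hat = np.linalg.pinv(X @ X.T, rcond=1e-10) @ X @ y
alpha_hat_1 = w @ beta_hat

# Model 2: w = X beta* + eps*, so beta*_hat = (X^T X)^- X^T w
# (an (N - 1) x (N - 1) inverse, rank deficient here because k is smaller).
beta_star_hat = np.linalg.pinv(X.T @ X, rcond=1e-10) @ X.T @ w
alpha_hat_2 = y @ beta_star_hat

print(alpha_hat_1, alpha_hat_2)
```

			<p>With the Moore-Penrose choice made in this sketch, the two estimates agree up to rounding error, so the choice between the models then comes down to which of the two matrices is cheaper to invert.</p>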
			<p>When a gene has multiple missing values, all of its missing components are excluded when searching for similar genes. The vectors <b>w </b>and <b>y</b><sup><it>T</it></sup>, and the matrix <b>X</b>, are then formed in the same way as in the case of one missing entry, only with different dimensions.</p>
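			<p>A minimal sketch of this multiple-missing case, assuming the <it>L</it><sub>2</sub>-norm variant of the LLSimpute gene selection, is given below. The function name select_and_partition and the toy matrix are hypothetical, and restricting the candidates to complete genes follows the assumption stated with equation (1).</p>

```python
import numpy as np

def select_and_partition(Y, target, k):
    """Pick the k complete genes closest to the target gene and build w, y, X.

    Distances use the L2-norm over the samples where the target is observed;
    only genes with no missing values are candidates.
    """
    missing = np.isnan(Y[target])                 # missing samples of the target
    complete = ~np.isnan(Y).any(axis=1)           # genes without missing values
    candidates = np.flatnonzero(complete)
    diffs = Y[candidates][:, ~missing] - Y[target, ~missing]
    dist = np.linalg.norm(diffs, axis=1)
    selected = candidates[np.argsort(dist)[:k]]
    w = Y[selected][:, missing]                   # k x m, m = number of missing values
    X = Y[selected][:, ~missing]                  # k x (N - m)
    y = Y[target, ~missing]                       # length N - m
    return selected, w, y, X

# Hypothetical toy data: gene 0 has two missing values, gene 4 is incomplete.
Y = np.array([
    [np.nan, 1.0, 2.0, np.nan, 4.0, 5.0],
    [0.9,    1.1, 2.1, 3.2,    4.1, 5.1],
    [1.2,    0.8, 1.9, 2.8,    3.8, 4.9],
    [5.0,    4.0, 3.0, 2.0,    1.0, 0.0],
    [1.0,    1.0, np.nan, 3.0, 4.0, 5.0],
])
selected, w, y, X = select_and_partition(Y, target=0, k=2)
print(selected, w.shape, y.shape, X.shape)
```

			<p>Here <b>w </b>becomes a <it>k </it>&#215; 2 matrix because the target gene has two missing entries, while <b>X </b>and <b>y </b>shrink to the four observed samples.</p>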
			<p>LLSimpute performs well for relatively large values of <it>k</it>. However, when <it>k </it>is close to the number of samples, LLSimpute performs poorly compared with other imputation methods. The poor performance of LLSimpute when <it>k </it>is small is probably due to multicollinearity: the gene expression patterns are highly correlated, which degrades the multiple regression.</p>
			<p>To overcome this limitation, we perform the regression on the principal components rather than on the original data. Our technique selects between the two models according to <it>k </it>and applies principal component analysis to the <it>k </it>selected genes. We also reduce the effect of outliers by fitting a robust regression.</p>
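			<p>The effect of highly correlated expression profiles on the regression can be illustrated with the condition number of <b>XX</b><sup><it>T</it></sup>. The sketch below uses simulated data with an arbitrary noise level of 0.01; it is an illustration of multicollinearity in general and not an analysis taken from the experiments reported here.</p>

```python
import numpy as np

rng = np.random.default_rng(1)
k, n_obs = 5, 20

# Independent rows: a well-conditioned k x (N - 1) block.
X_indep = rng.normal(size=(k, n_obs))

# Highly correlated rows: one shared profile plus small gene-specific noise.
profile = rng.normal(size=n_obs)
X_corr = profile + 0.01 * rng.normal(size=(k, n_obs))

# A large condition number means the least squares solution is unstable.
cond_indep = np.linalg.cond(X_indep @ X_indep.T)
cond_corr = np.linalg.cond(X_corr @ X_corr.T)
print(cond_indep, cond_corr)
```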
			<p>The RLSP method consists of three parts: (1) selection of <it>k </it>similar genes, (2) principal component analysis with the <it>k </it>selected genes, and (3) robust regression analysis using these principal components. We describe these processes step by step.</p>
			<sec>
				<st>
					<p>STEP 1 : Selection of <it>k </it>similar genes</p>
				</st>
				<p>To impute a missing value <it>&#945;</it>, RLSP uses <it>k </it>similar genes, where <it>k </it>is a pre-determined number. In LLSimpute, the <it>L</it><sub>2</sub>-norm or the Pearson correlation coefficient is used to select the <it>k </it>similar genes. However, it is well known that the <it>L</it><sub>2</sub>-norm and the Pearson correlation coefficient are sensitive to outliers. In RLSP, we use the <it>L</it><sub>1</sub>-norm as the distance measure to select the <it>k </it>similar genes for imputing the missing values of the gene <b>g*</b>,</p>
				<p>
					<m:math name="1471-2105-8-S2-S6-i7" xmlns:m="http://www.w3.org/1998/Math/MathML">
						<m:semantics>
							<m:mrow>
								<m:mi>d</m:mi>
								<m:mo stretchy="false">(</m:mo>
								<m:msup>
									<m:mstyle mathvariant="bold" mathsize="normal">
										<m:mi>g</m:mi>
									</m:mstyle>
									<m:mo>*</m:mo>
								</m:msup>
								<m:mo>,</m:mo>
								<m:msub>
									<m:mstyle mathvariant="bold" mathsize="normal">
										<m:mi>g</m:mi>
									</m:mstyle>
									<m:mi>i</m:mi>
								</m:msub>
								<m:mo stretchy="false">)</m:mo>
								<m:mo>=</m:mo>
								<m:mstyle displaystyle="true">
									<m:munderover>
										<m:mo>&#8721;</m:mo>
										<m:mrow>
											<m:mi>j</m:mi>
											<m:mo>=</m:mo>
											<m:mn>1</m:mn>
										</m:mrow>
										<m:mrow>
											<m:mi>N</m:mi>
											<m:mo>&#8722;</m:mo>
											<m:mn>1</m:mn>
										</m:mrow>
									</m:munderover>
									<m:mrow>
										<m:mrow>
											<m:mo>|</m:mo>
											<m:mrow>
												<m:msub>
													<m:mi>x</m:mi>
													<m:mrow>
														<m:mi>i</m:mi>
														<m:mo>,</m:mo>
														<m:mi>j</m:mi>
													</m:mrow>
												</m:msub>
												<m:mo>&#8722;</m:mo>
												<m:msub>
													<m:mi>y</m:mi>
													<m:mi>j</m:mi>
												</m:msub>
											</m:mrow>
											<m:mo>|</m:mo>
										</m:mrow>
									</m:mrow>
								</m:mstyle>
								<m:mo>.</m:mo>
							</m:mrow>
							<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGKbazcqGGOaakcqWHNbWzdaahaaWcbeqaaiabcQcaQaaakiabcYcaSiabhEgaNnaaBaaaleaacqWGPbqAaeqaaOGaeiykaKIaeyypa0ZaaabCaeaadaabdaqaaiabdIha4naaBaaaleaacqWGPbqAcqGGSaalcqWGQbGAaeqaaOGaeyOeI0IaemyEaK3aaSbaaSqaaiabdQgaQbqabaaakiaawEa7caGLiWoaaSqaaiabdQgaQjabg2da9iabigdaXaqaaiabd6eaojabgkHiTiabigdaXaqdcqGHris5aOGaeiOla4caaa@4CD8@</m:annotation>
						</m:semantics>
					</m:math>
				</p>
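				<p>The selection step above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name <it>select_similar_genes</it> and the toy data are hypothetical, the candidate genes are assumed to be complete over the samples where the target gene is observed, and the target gene itself is assumed to be excluded from the candidates.</p>

```python
import numpy as np

def select_similar_genes(X, y, k):
    """Return the indices and L1 distances of the k rows of X closest to y.

    X : (n_genes, N-1) array of candidate genes over the observed samples
    y : (N-1,) array holding the observed part of the target gene g*
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # d(g*, g_i) = sum_j |x_ij - y_j|
    dist = np.abs(X - y).sum(axis=1)
    # Stable sort so that ties are resolved by the original row order
    order = np.argsort(dist, kind="stable")[:k]
    return order, dist[order]

# Hypothetical toy data: four candidate genes, three observed samples
X = np.array([[1.0, 2.0, 3.0],
              [1.1, 2.1, 2.9],
              [9.0, 9.0, 9.0],
              [0.0, 2.0, 3.5]])
y = np.array([1.0, 2.0, 3.0])
idx, d = select_similar_genes(X, y, k=2)
```

				<p>In this toy example the two nearest candidates are rows 0 and 1. The third row differs strongly from the target in every sample and is never selected for small <it>k</it>.</p>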
			</sec>
			<sec>
				<st>
					<p>STEP 2 : Principal component</p>
				</st>
				<p>After selecting <it>k </it>similar genes, we perform principal component analysis. We define two types of variance-covariance matrices. The first is a <it>k </it>&#215; <it>k </it>matrix</p>
				<p>
					<m:math name="1471-2105-8-S2-S6-i8" xmlns:m="http://www.w3.org/1998/Math/MathML">
						<m:semantics>
							<m:mrow>
								<m:mi>V</m:mi>
								<m:mo>=</m:mo>
								<m:mstyle displaystyle="true">
									<m:munderover>
										<m:mo>&#8721;</m:mo>
										<m:mrow>
											<m:mi>i</m:mi>
											<m:mo>=</m:mo>
											<m:mn>1</m:mn>
										</m:mrow>
										<m:mrow>
											<m:mi>N</m:mi>
											<m:mo>&#8722;</m:mo>
											<m:mn>1</m:mn>
										</m:mrow>
									</m:munderover>
									<m:mrow>
										<m:mstyle displaystyle="true">
											<m:munderover>
												<m:mo>&#8721;</m:mo>
												<m:mrow>
													<m:mi>j</m:mi>
													<m:mo>=</m:mo>
													<m:mn>1</m:mn>
												</m:mrow>
												<m:mrow>
													<m:mi>N</m:mi>
													<m:mo>&#8722;</m:mo>
													<m:mn>1</m:mn>
												</m:mrow>
											</m:munderover>
											<m:mrow>
												<m:mo stretchy="false">(</m:mo>
												<m:msub>
													<m:mstyle mathvariant="bold" mathsize="normal">
														<m:mi>x</m:mi>
													</m:mstyle>
													<m:mi>i</m:mi>
												</m:msub>
												<m:mo>&#8722;</m:mo>
												<m:mstyle mathvariant="bold" mathsize="normal">
													<m:mover accent="true">
														<m:mi>x</m:mi>
														<m:mo>&#175;</m:mo>
													</m:mover>
												</m:mstyle>
												<m:mo>.</m:mo>
											</m:mrow>
										</m:mstyle>
										<m:mo stretchy="false">)</m:mo>
										<m:msup>
											<m:mrow>
												<m:mo stretchy="false">(</m:mo>
												<m:msub>
													<m:mstyle mathvariant="bold" mathsize="normal">
														<m:mi>x</m:mi>
													</m:mstyle>
													<m:mi>j</m:mi>
												</m:msub>
												<m:mo>&#8722;</m:mo>
												<m:mstyle mathvariant="bold" mathsize="normal">
													<m:mover accent="true">
														<m:mi>x</m:mi>
														<m:mo>&#175;</m:mo>
													</m:mover>
												</m:mstyle>
												<m:mo>.</m:mo>
												<m:mo stretchy="false">)</m:mo>
											</m:mrow>
											<m:mi>T</m:mi>
										</m:msup>
										<m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
										<m:mrow>
											<m:mo>(</m:mo>
											<m:mn>4</m:mn>
											<m:mo>)</m:mo>
										</m:mrow>
									</m:mrow>
								</m:mstyle>
							</m:mrow>
							<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGwbGvcqGH9aqpdaaeWbqaamaaqahabaGaeiikaGIaeCiEaG3aaSbaaSqaaiabdMgaPbqabaGccqGHsislcuWH4baEgaqeaiabc6caUaWcbaGaemOAaOMaeyypa0JaeGymaedabaGaemOta4KaeyOeI0IaeGymaedaniabggHiLdGccqGGPaqkcqGGOaakcqWH4baEdaWgaaWcbaGaemOAaOgabeaakiabgkHiTiqbhIha4zaaraGaeiOla4IaeiykaKYaaWbaaSqabeaacqWGubavaaGccaWLjaGaaCzcamaabmaabaGaeGinaqdacaGLOaGaayzkaaaaleaacqWGPbqAcqGH9aqpcqaIXaqmaeaacqWGobGtcqGHsislcqaIXaqma0GaeyyeIuoaaaa@5596@</m:annotation>
						</m:semantics>
					</m:math>
				</p>
				<p>where <m:math name="1471-2105-8-S2-S6-i9" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mstyle mathvariant="bold" mathsize="normal"><m:mi>x</m:mi></m:mstyle><m:mi>i</m:mi></m:msub><m:mo>=</m:mo><m:msup><m:mrow><m:mo stretchy="false">[</m:mo><m:msub><m:mi>x</m:mi><m:mrow><m:msub><m:mi>s</m:mi><m:mn>1</m:mn></m:msub><m:mo>,</m:mo><m:mi>i</m:mi></m:mrow></m:msub><m:mo>,</m:mo><m:msub><m:mi>x</m:mi><m:mrow><m:msub><m:mi>s</m:mi><m:mn>2</m:mn></m:msub><m:mo>,</m:mo><m:mi>i</m:mi></m:mrow></m:msub><m:mo>,</m:mo><m:mo>&#8943;</m:mo><m:mo>,</m:mo><m:msub><m:mi>x</m:mi><m:mrow><m:msub><m:mi>s</m:mi><m:mi>k</m:mi></m:msub><m:mo>,</m:mo><m:mi>i</m:mi></m:mrow></m:msub><m:mo stretchy="false">]</m:mo></m:mrow><m:mi>T</m:mi></m:msup><m:mo>,</m:mo><m:mtext>&#160;</m:mtext><m:mstyle mathvariant="bold" mathsize="normal"><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover></m:mstyle><m:mo>.</m:mo><m:mo>=</m:mo><m:msup><m:mrow><m:mo stretchy="false">[</m:mo><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mrow><m:msub><m:mi>s</m:mi><m:mn>1</m:mn></m:msub><m:mo>,</m:mo><m:mo>.</m:mo></m:mrow></m:msub><m:mo>,</m:mo><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mrow><m:msub><m:mi>s</m:mi><m:mn>2</m:mn></m:msub><m:mo>,</m:mo><m:mo>.</m:mo></m:mrow></m:msub><m:mo>,</m:mo><m:mo>&#8943;</m:mo><m:mo>,</m:mo><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mrow><m:msub><m:mi>s</m:mi><m:mi>k</m:mi></m:msub><m:mo>,</m:mo><m:mo>.</m:mo></m:mrow></m:msub><m:mo stretchy="false">]</m:mo></m:mrow><m:mi>T</m:mi></m:msup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWH4baEdaWgaaWcbaGaemyAaKgabeaakiabg2da9iabcUfaBjabdIha4naaBaaaleaacqWGZbWCdaWgaaadbaGaeGymaedabeaaliabcYcaSiabdMgaPbqabaGccqGGSaalcqWG4baEdaWgaaWcbaGaem4Cam3aaSbaaWqaaiabikdaYaqabaWccqGGSaalcqWGPbqAaeqaaOGaeiilaWIaeS47IWKaeiilaWIaemiEaG3aaSbaaSqaaiabdohaZnaaBaaameaacqWGRbWAaeqaaSGaeiilaWIaemyAaKgabeaakiabc2faDnaaCaaaleqabaGaemivaqfaaOGaeiilaWIaeeiiaaIafCiEaGNbaebacqGGUaGlcqGH9aqpcqGGBbWwcuWG4baEgaqeamaaBaaaleaacqWGZbWCdaWgaaadbaGaeGymaedabeaaliabcYcaSiabc6caUaqabaGccqGGSaalcuWG4baEgaqeamaaBaaaleaacqWGZbWCdaWgaaadbaGaeGOmaidabeaaliabcYcaSiabc6caUaqabaGccqGGSaalcqWIVlctcqGGSaalcuWG4baEgaqeamaaBaaaleaacqWGZbWCdaWgaaadbaGaem4AaSgabeaaliabcYcaSiabc6caUaqabaGccqGGDbqxdaahaaWcbeqaaiabdsfaubaaaaa@6D93@</m:annotation></m:semantics></m:math>, and <m:math name="1471-2105-8-S2-S6-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mtext>&#160;</m:mtext><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mrow><m:msub><m:mi>s</m:mi><m:mi>l</m:mi></m:msub><m:mo>,</m:mo><m:mo>.</m:mo></m:mrow></m:msub><m:mo>=</m:mo><m:mfrac><m:mn>1</m:mn><m:mrow><m:mi>N</m:mi><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow></m:mfrac><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8721;</m:mo><m:mrow><m:mi>i</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mrow><m:mi>N</m:mi><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup><m:mrow><m:msub><m:mi>x</m:mi><m:mrow><m:msub><m:mi>s</m:mi><m:mi>l</m:mi></m:msub><m:mo>,</m:mo><m:mi>i</m:mi></m:mrow></m:msub></m:mrow></m:mstyle></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqqGGaaicuWG4baEgaqeamaaBaaaleaacqWGZbWCdaWgaaadbaGaemiBaWgabeaaliabcYcaSiabc6caUaqabaGccqGH9aqpdaWcaaqaaiabigdaXaqaaiabd6eaojabgkHiTiabigdaXaaadaaeWaqaaiabdIha4naaBaaaleaacqWGZbWCdaWgaaadbaGaemiBaWgabeaaliabcYcaSiabdMgaPbqabaaabaGaemyAaKMaeyypa0JaeGymaedabaGaemOta4KaeyOeI0IaeGymaedaniabggHiLdaaaa@4840@</m:annotation></m:semantics></m:math>. The second type is a (<it>N</it>-1) &#215; (<it>N</it>-1) matrix</p>
				<p>
					<m:math name="1471-2105-8-S2-S6-i11" xmlns:m="http://www.w3.org/1998/Math/MathML">
						<m:semantics>
							<m:mrow>
								<m:msup>
									<m:mi>V</m:mi>
									<m:mo>*</m:mo>
								</m:msup>
								<m:mo>=</m:mo>
								<m:mstyle displaystyle="true">
									<m:munderover>
										<m:mo>&#8721;</m:mo>
										<m:mrow>
											<m:mi>i</m:mi>
											<m:mo>=</m:mo>
											<m:mn>1</m:mn>
										</m:mrow>
										<m:mi>k</m:mi>
									</m:munderover>
									<m:mrow>
										<m:mstyle displaystyle="true">
											<m:munderover>
												<m:mo>&#8721;</m:mo>
												<m:mrow>
													<m:mi>j</m:mi>
													<m:mo>=</m:mo>
													<m:mn>1</m:mn>
												</m:mrow>
												<m:mi>k</m:mi>
											</m:munderover>
											<m:mrow>
												<m:mo stretchy="false">(</m:mo>
												<m:msubsup>
													<m:mstyle mathvariant="bold" mathsize="normal">
														<m:mi>x</m:mi>
													</m:mstyle>
													<m:mi>i</m:mi>
													<m:mo>*</m:mo>
												</m:msubsup>
												<m:mo>&#8722;</m:mo>
												<m:msubsup>
													<m:mstyle mathvariant="bold" mathsize="normal">
														<m:mover accent="true">
															<m:mi>x</m:mi>
															<m:mo>&#175;</m:mo>
														</m:mover>
													</m:mstyle>
													<m:mo>.</m:mo>
													<m:mo>*</m:mo>
												</m:msubsup>
											</m:mrow>
										</m:mstyle>
										<m:mo stretchy="false">)</m:mo>
										<m:msup>
											<m:mrow>
												<m:mo stretchy="false">(</m:mo>
												<m:msubsup>
													<m:mstyle mathvariant="bold" mathsize="normal">
														<m:mi>x</m:mi>
													</m:mstyle>
													<m:mi>j</m:mi>
													<m:mo>*</m:mo>
												</m:msubsup>
												<m:mo>&#8722;</m:mo>
												<m:msubsup>
													<m:mstyle mathvariant="bold" mathsize="normal">
														<m:mover accent="true">
															<m:mi>x</m:mi>
															<m:mo>&#175;</m:mo>
														</m:mover>
													</m:mstyle>
													<m:mo>.</m:mo>
													<m:mo>*</m:mo>
												</m:msubsup>
												<m:mo stretchy="false">)</m:mo>
											</m:mrow>
											<m:mi>T</m:mi>
										</m:msup>
										<m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
										<m:mrow>
											<m:mo>(</m:mo>
											<m:mn>5</m:mn>
											<m:mo>)</m:mo>
										</m:mrow>
									</m:mrow>
								</m:mstyle>
							</m:mrow>
							<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGwbGvdaahaaWcbeqaaiabcQcaQaaakiabg2da9maaqahabaWaaabCaeaacqGGOaakcqWH4baEdaqhaaWcbaGaemyAaKgabaGaeiOkaOcaaOGaeyOeI0IafCiEaGNbaebadaqhaaWcbaGaeiOla4cabaGaeiOkaOcaaaqaaiabdQgaQjabg2da9iabigdaXaqaaiabdUgaRbqdcqGHris5aOGaeiykaKIaeiikaGIaeCiEaG3aa0baaSqaaiabdQgaQbqaaiabcQcaQaaakiabgkHiTiqbhIha4zaaraWaa0baaSqaaiabc6caUaqaaiabcQcaQaaakiabcMcaPmaaCaaaleqabaGaemivaqfaaOGaaCzcaiaaxMaadaqadaqaaiabiwda1aGaayjkaiaawMcaaaWcbaGaemyAaKMaeyypa0JaeGymaedabaGaem4AaSganiabggHiLdaaaa@5730@</m:annotation>
						</m:semantics>
					</m:math>
				</p>
				<p>where <m:math name="1471-2105-8-S2-S6-i12" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mstyle mathvariant="bold" mathsize="normal"><m:mi>x</m:mi></m:mstyle><m:mi>i</m:mi><m:mo>*</m:mo></m:msubsup><m:mo>=</m:mo><m:msup><m:mrow><m:mo stretchy="false">[</m:mo><m:msub><m:mi>x</m:mi><m:mrow><m:msub><m:mi>s</m:mi><m:mi>i</m:mi></m:msub><m:mo>,</m:mo><m:mn>1</m:mn></m:mrow></m:msub><m:mo>,</m:mo><m:msub><m:mi>x</m:mi><m:mrow><m:msub><m:mi>s</m:mi><m:mi>i</m:mi></m:msub><m:mo>,</m:mo><m:mn>2</m:mn></m:mrow></m:msub><m:mo>,</m:mo><m:mo>&#8943;</m:mo><m:mo>,</m:mo><m:msub><m:mi>x</m:mi><m:mrow><m:msub><m:mi>s</m:mi><m:mi>i</m:mi></m:msub><m:mo>,</m:mo><m:mi>N</m:mi><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow></m:msub><m:mo stretchy="false">]</m:mo></m:mrow><m:mi>T</m:mi></m:msup><m:mo>,</m:mo><m:mtext>&#160;</m:mtext><m:msubsup><m:mstyle mathvariant="bold" mathsize="normal"><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover></m:mstyle><m:mo>.</m:mo><m:mo>*</m:mo></m:msubsup><m:mo>=</m:mo><m:msup><m:mrow><m:mo stretchy="false">[</m:mo><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mrow><m:mo>.</m:mo><m:mo>,</m:mo><m:mn>1</m:mn></m:mrow></m:msub><m:mo>,</m:mo><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mrow><m:mo>.</m:mo><m:mo>,</m:mo><m:mn>2</m:mn></m:mrow></m:msub><m:mo>,</m:mo><m:mo>&#8943;</m:mo><m:mo>,</m:mo><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mrow><m:mo>.</m:mo><m:mo>,</m:mo><m:mi>N</m:mi><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow></m:msub><m:mo stretchy="false">]</m:mo></m:mrow><m:mi>T</m:mi></m:msup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegyvzYrwyUfgarqqtubsr4rNCHbGeaGqiA8vkIkVAFgIELiFeLkFeLk=iY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfeaY=biLkVcLq=JHqVepeea0=as0db9vqpepesP0xe9Fve9Fve9GapdbaqaaeGacaGaaiaabeqaamqadiabaaGcbaGaeCiEaG3aa0baaSqaaiabdMgaPbqaaiabcQcaQaaakiabg2da9iabcUfaBjabdIha4naaBaaaleaacqWGZbWCdaWgaaadbaGaemyAaKgabeaaliabcYcaSiabigdaXaqabaGccqGGSaalcqWG4baEdaWgaaWcbaGaem4Cam3aaSbaaWqaaiabdMgaPbqabaWccqGGSaalcqaIYaGmaeqaaOGaeiilaWIaeS47IWKaeiilaWIaemiEaG3aaSbaaSqaaiabdohaZnaaBaaameaacqWGPbqAaeqaaSGaeiilaWIaemOta4KaeyOeI0IaeGymaedabeaakiabc2faDnaaCaaaleqabaGaemivaqfaaOGaeiilaWIaeeiiaaIafCiEaGNbaebadaqhaaWcbaGaeiOla4cabaGaeiOkaOcaaOGaeyypa0Jaei4waSLafmiEaGNbaebadaWgaaWcbaGaeiOla4IaeiilaWIaeGymaedabeaakiabcYcaSiqbdIha4zaaraWaaSbaaSqaaiabc6caUiabcYcaSiabikdaYaqabaGccqGGSaalcqWIVlctcqGGSaalcuWG4baEgaqeamaaBaaaleaacqGGUaGlcqGGSaalcqWGobGtcqGHsislcqaIXaqmaeqaaOGaeiyxa01aaWbaaSqabeaacqWGubavaaaaaa@7E13@</m:annotation></m:semantics></m:math>, and <m:math name="1471-2105-8-S2-S6-i13" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mrow><m:mo>.</m:mo><m:mo>,</m:mo><m:mi>l</m:mi></m:mrow></m:msub><m:mo>=</m:mo><m:mfrac><m:mn>1</m:mn><m:mi>k</m:mi></m:mfrac><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8721;</m:mo><m:mrow><m:mi>i</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mi>k</m:mi></m:msubsup><m:mrow><m:msub><m:mi>x</m:mi><m:mrow><m:msub><m:mi>s</m:mi><m:mi>i</m:mi></m:msub><m:mo>,</m:mo><m:mi>l</m:mi></m:mrow></m:msub></m:mrow></m:mstyle></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWG4baEgaqeamaaBaaaleaacqGGUaGlcqGGSaalcqWGSbaBaeqaaOGaeyypa0ZaaSaaaeaacqaIXaqmaeaacqWGRbWAaaWaaabmaeaacqWG4baEdaWgaaWcbaGaem4Cam3aaSbaaWqaaiabdMgaPbqabaWccqGGSaalcqWGSbaBaeqaaaqaaiabdMgaPjabg2da9iabigdaXaqaaiabdUgaRbqdcqGHris5aaaa@428C@</m:annotation></m:semantics></m:math>.</p>
				<p>When <it>k </it>is larger than <it>N</it>, the <it>V </it>matrix becomes too large to handle, and deriving the principal components from it is not computationally efficient. In that case, we use <it>V</it>* instead of <it>V </it>together with a different type of regression. <it>V </it>corresponds to the first model and <it>V</it>* to the second model in LLSimpute. Kim et al. <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> showed that the solutions based on <it>V </it>and <it>V</it>* are in fact the same. Let <b>PC</b><sub><it>x </it></sub>= {<it>PC</it><sub>1</sub>, <it>PC</it><sub>2</sub>,...,<it>PC</it><sub><it>k</it></sub>} be the principal components using <it>V </it>and <m:math name="1471-2105-8-S2-S6-i14" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mstyle mathvariant="bold" mathsize="normal"><m:mi>P</m:mi></m:mstyle><m:msubsup><m:mstyle mathvariant="bold" mathsize="normal"><m:mi>C</m:mi></m:mstyle><m:mi>x</m:mi><m:mo>*</m:mo></m:msubsup><m:mo>=</m:mo><m:mo>{</m:mo><m:mi>P</m:mi><m:msubsup><m:mi>C</m:mi><m:mn>1</m:mn><m:mo>*</m:mo></m:msubsup><m:mo>,</m:mo><m:mi>P</m:mi><m:msubsup><m:mi>C</m:mi><m:mn>2</m:mn><m:mo>*</m:mo></m:msubsup><m:mo>,</m:mo><m:mo>&#8943;</m:mo><m:mo>,</m:mo><m:mi>P</m:mi><m:msubsup><m:mi>C</m:mi><m:mrow><m:mi>N</m:mi><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow><m:mo>*</m:mo></m:msubsup><m:mo>}</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWHqbaucqWHdbWqdaqhaaWcbaGaemiEaGhabaGaeiOkaOcaaOGaeyypa0Jaei4EaSNaemiuaaLaem4qam0aa0baaSqaaiabigdaXaqaaiabcQcaQaaakiabcYcaSiabdcfaqjabdoeadnaaDaaaleaacqaIYaGmaeaacqGGQaGkaaGccqGGSaalcqWIVlctcqGGSaalcqWGqbaucqWGdbWqdaqhaaWcbaGaemOta4KaeyOeI0IaeGymaedabaGaeiOkaOcaaOGaeiyFa0haaa@48D1@</m:annotation></m:semantics></m:math> be the principal components using <it>V</it>*. Then, these principal components <b>PC</b><sub><it>x </it></sub>and <m:math name="1471-2105-8-S2-S6-i15" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mstyle mathvariant="bold" mathsize="normal"><m:mi>P</m:mi></m:mstyle><m:msubsup><m:mstyle mathvariant="bold" mathsize="normal"><m:mi>C</m:mi></m:mstyle><m:mi>x</m:mi><m:mo>*</m:mo></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWHqbaucqWHdbWqdaqhaaWcbaGaemiEaGhabaGaeiOkaOcaaaaa@316E@</m:annotation></m:semantics></m:math> are used for the first type of regression (equation (2)) and the second type of regression model (equation (3)), respectively.</p>
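				<p>The two choices of matrix can be illustrated with a short NumPy sketch. It is only an illustration under stated assumptions: it uses the usual centered scatter matrix (a single sum of outer products) and its eigendecomposition, and the function name <it>principal_components</it> and the random toy data are hypothetical. The point is the shape of the result: the <it>V</it>-type analysis gives one score per sample, matching the length of <b>y</b>, while the <it>V</it>*-type analysis gives one score per selected gene, matching the length of <b>w</b>.</p>

```python
import numpy as np

def principal_components(obs, p):
    """Scores and directions of the first p principal components.

    obs : (n_obs, n_vars) array whose rows are observations
    Returns scores of shape (n_obs, p) and directions of shape (n_vars, p).
    """
    centered = obs - obs.mean(axis=0)
    scatter = centered.T @ centered              # (n_vars, n_vars), symmetric
    eigvals, eigvecs = np.linalg.eigh(scatter)   # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:p]          # indices of the p largest
    directions = eigvecs[:, top]
    return centered @ directions, directions

rng = np.random.default_rng(0)
k, n_obs_samples = 6, 4                  # k selected genes, N-1 observed samples
A = rng.normal(size=(k, n_obs_samples))  # hypothetical expression block

# V-type: samples are observations, genes are variables (k x k scatter)
pc, _ = principal_components(A.T, p=2)       # shape (N-1, p)
# V*-type: genes are observations, samples are variables ((N-1) x (N-1) scatter)
pc_star, _ = principal_components(A, p=2)    # shape (k, p)
```

				<p>The score vectors of different principal components are mutually orthogonal, so using them as regressors avoids the multicollinearity that arises among the original, highly correlated expression profiles.</p>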
			</sec>
			<sec>
				<st>
					<p>STEP 3 : Robust regression</p>
				</st>
				<p>We use <it>PC</it><sub>1</sub>, <it>PC</it><sub>2</sub>,...,<it>PC</it><sub><it>p </it></sub>or <m:math name="1471-2105-8-S2-S6-i16" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>P</m:mi><m:msubsup><m:mi>C</m:mi><m:mn>1</m:mn><m:mo>*</m:mo></m:msubsup><m:mo>,</m:mo><m:mi>P</m:mi><m:msubsup><m:mi>C</m:mi><m:mn>2</m:mn><m:mo>*</m:mo></m:msubsup><m:mo>,</m:mo><m:mo>&#8943;</m:mo><m:mo>,</m:mo><m:mi>P</m:mi><m:msubsup><m:mi>C</m:mi><m:mi>p</m:mi><m:mo>*</m:mo></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGqbaucqWGdbWqdaqhaaWcbaGaeGymaedabaGaeiOkaOcaaOGaeiilaWIaemiuaaLaem4qam0aa0baaSqaaiabikdaYaqaaiabcQcaQaaakiabcYcaSiabl+UimjabcYcaSiabdcfaqjabdoeadnaaDaaaleaacqWGWbaCaeaacqGGQaGkaaaaaa@3E5C@</m:annotation></m:semantics></m:math> as new explanatory variables and fit the regression model in a robust manner, where <it>p </it>is the predetermined number of principal components. The corresponding regression models are</p>
				<p>
					<m:math name="1471-2105-8-S2-S6-i17" xmlns:m="http://www.w3.org/1998/Math/MathML">
						<m:semantics>
							<m:mrow>
								<m:mstyle mathvariant="bold" mathsize="normal">
									<m:mi>y</m:mi>
								</m:mstyle>
								<m:mo>=</m:mo>
								<m:mstyle displaystyle="true">
									<m:munderover>
										<m:mo>&#8721;</m:mo>
										<m:mrow>
											<m:mi>i</m:mi>
											<m:mo>=</m:mo>
											<m:mn>1</m:mn>
										</m:mrow>
										<m:mi>p</m:mi>
									</m:munderover>
									<m:mrow>
										<m:mi>P</m:mi>
										<m:msub>
											<m:mi>C</m:mi>
											<m:mi>i</m:mi>
										</m:msub>
										<m:msub>
											<m:mi>&#946;</m:mi>
											<m:mi>i</m:mi>
										</m:msub>
									</m:mrow>
								</m:mstyle>
								<m:mo>+</m:mo>
								<m:mi>&#949;</m:mi>
								<m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
								<m:mrow>
									<m:mo>(</m:mo>
									<m:mn>6</m:mn>
									<m:mo>)</m:mo>
								</m:mrow>
							</m:mrow>
							<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWH5bqEcqGH9aqpdaaeWbqaaiabdcfaqjabdoeadnaaBaaaleaacqWGPbqAaeqaaGGacOGae8NSdi2aaSbaaSqaaiabdMgaPbqabaaabaGaemyAaKMaeyypa0JaeGymaedabaGaemiCaahaniabggHiLdGccqGHRaWkcqWF1oqzcaWLjaGaaCzcamaabmaabaGaeGOnaydacaGLOaGaayzkaaaaaa@436F@</m:annotation>
						</m:semantics>
					</m:math>
				</p>
				<p>and</p>
				<p>
					<m:math name="1471-2105-8-S2-S6-i18" xmlns:m="http://www.w3.org/1998/Math/MathML">
						<m:semantics>
							<m:mrow>
								<m:mstyle mathvariant="bold" mathsize="normal">
									<m:mi>w</m:mi>
								</m:mstyle>
								<m:mo>=</m:mo>
								<m:mstyle displaystyle="true">
									<m:munderover>
										<m:mo>&#8721;</m:mo>
										<m:mrow>
											<m:mi>i</m:mi>
											<m:mo>=</m:mo>
											<m:mn>1</m:mn>
										</m:mrow>
										<m:mi>p</m:mi>
									</m:munderover>
									<m:mrow>
										<m:mi>P</m:mi>
										<m:msubsup>
											<m:mi>C</m:mi>
											<m:mi>i</m:mi>
											<m:mo>*</m:mo>
										</m:msubsup>
										<m:msubsup>
											<m:mi>&#946;</m:mi>
											<m:mi>i</m:mi>
											<m:mo>*</m:mo>
										</m:msubsup>
									</m:mrow>
								</m:mstyle>
								<m:mo>+</m:mo>
								<m:msup>
									<m:mi>&#949;</m:mi>
									<m:mo>*</m:mo>
								</m:msup>
								<m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
								<m:mrow>
									<m:mo>(</m:mo>
									<m:mn>7</m:mn>
									<m:mo>)</m:mo>
								</m:mrow>
							</m:mrow>
							<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamXvP5wqSXMqHnxAJn0BKvguHDwzZbqegyvzYrwyUfgarqqtubsr4rNCHbGeaGqiA8vkIkVAFgIELiFeLkFeLk=iY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfeaY=biLkVcLq=JHqVepeea0=as0db9vqpepesP0xe9Fve9Fve9GapdbaqaaeGacaGaaiaabeqaamqadiabaaGcbaGaeC4DaCNaeyypa0ZaaabCaeaacqWGqbaucqWGdbWqdaqhaaWcbaGaemyAaKgabaGaeiOkaOcaaGGacOGae8NSdi2aa0baaSqaaiabdMgaPbqaaiabcQcaQaaaaeaacqWGPbqAcqGH9aqpcqaIXaqmaeaacqWGWbaCa0GaeyyeIuoakiabgUcaRiab=v7aLnaaCaaaleqabaGaeiOkaOcaaOGaaCzcaiaaxMaadaqadaqaaiabiEda3aGaayjkaiaawMcaaaaa@5679@</m:annotation>
						</m:semantics>
					</m:math>
				</p>
				<p>In our method, we use quantile regression to fit the regression model in a robust manner. Robust regression provides an alternative to least squares regression when fundamental assumptions such as normality or variance homogeneity are violated. Quantile regression at the 50th percentile estimates the model parameters by minimizing the sum of the absolute values of the residuals <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, that is, by minimizing <m:math name="1471-2105-8-S2-S6-i19" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8721;</m:mo><m:mrow><m:mi>j</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mrow><m:mi>N</m:mi><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup><m:mrow><m:mrow><m:mo>|</m:mo><m:mrow><m:msub><m:mi>y</m:mi><m:mi>j</m:mi></m:msub><m:mo>&#8722;</m:mo><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8721;</m:mo><m:mrow><m:mi>i</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mi>p</m:mi></m:msubsup><m:mrow><m:mi>P</m:mi><m:msub><m:mi>C</m:mi><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow></m:msub><m:msub><m:mi>&#946;</m:mi><m:mi>i</m:mi></m:msub></m:mrow></m:mstyle></m:mrow><m:mo>|</m:mo></m:mrow></m:mrow></m:mstyle></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaaeWaqaamaaemaabaGaemyEaK3aaSbaaSqaaiabdQgaQbqabaGccqGHsisldaaeWaqaaiabdcfaqjabdoeadnaaBaaaleaacqWGPbqAcqWGQbGAaeqaaGGacOGae8NSdi2aaSbaaSqaaiabdMgaPbqabaaabaGaemyAaKMaeyypa0JaeGymaedabaGaemiCaahaniabggHiLdaakiaawEa7caGLiWoaaSqaaiabdQgaQjabg2da9iabigdaXaqaaiabd6eaojabgkHiTiabigdaXaqdcqGHris5aaaa@4B30@</m:annotation></m:semantics></m:math> or <m:math name="1471-2105-8-S2-S6-i20" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8721;</m:mo><m:mrow><m:mi>j</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mi>k</m:mi></m:msubsup><m:mrow><m:mrow><m:mo>|</m:mo><m:mrow><m:msub><m:mi>w</m:mi><m:mrow><m:msub><m:mi>s</m:mi><m:mi>j</m:mi></m:msub></m:mrow></m:msub><m:mo>&#8722;</m:mo><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8721;</m:mo><m:mrow><m:mi>i</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mi>p</m:mi></m:msubsup><m:mrow><m:mi>P</m:mi><m:msubsup><m:mi>C</m:mi><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow><m:mo>*</m:mo></m:msubsup><m:msubsup><m:mi>&#946;</m:mi><m:mi>i</m:mi><m:mo>*</m:mo></m:msubsup></m:mrow></m:mstyle></m:mrow><m:mo>|</m:mo></m:mrow></m:mrow></m:mstyle></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaaeWaqaamaaemaabaGaem4DaC3aaSbaaSqaaiabdohaZnaaBaaameaacqWGQbGAaeqaaaWcbeaakiabgkHiTmaaqadabaGaemiuaaLaem4qam0aa0baaSqaaiabdMgaPjabdQgaQbqaaiabcQcaQaaaiiGakiab=j7aInaaDaaaleaacqWGPbqAaeaacqGGQaGkaaaabaGaemyAaKMaeyypa0JaeGymaedabaGaemiCaahaniabggHiLdaakiaawEa7caGLiWoaaSqaaiabdQgaQjabg2da9iabigdaXaqaaiabdUgaRbqdcqGHris5aaaa@4CEA@</m:annotation></m:semantics></m:math>. In this way, our method can reject outliers and maintain robustness.</p>
				<p>If <it>p </it>= <it>k </it>and the regression model is estimated by the least squares method, RLSP is the same as LLSimpute. When <it>k </it>is small, we recommend using <it>p </it>= 1 and fitting the regression model <it>y</it><sub><it>i </it></sub>= <it>&#946;</it><sub>1</sub><it>PC</it><sub>1<it>i </it></sub>+ <it>&#949;</it><sub><it>i </it></sub>by least absolute deviations. The imputed value of <it>&#945; </it>is defined by <m:math name="1471-2105-8-S2-S6-i21" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mi>&#945;</m:mi><m:mo>^</m:mo></m:mover><m:mo>=</m:mo><m:msub><m:mover accent="true"><m:mi>&#946;</m:mi><m:mo>^</m:mo></m:mover><m:mn>1</m:mn></m:msub><m:mi>P</m:mi><m:msub><m:mi>C</m:mi><m:mi>w</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacuWFXoqygaqcaiabg2da9iqb=j7aIzaajaWaaSbaaSqaaiabigdaXaqabaGccqWGqbaucqWGdbWqdaWgaaWcbaGaem4DaChabeaaaaa@3615@</m:annotation></m:semantics></m:math> where <it>PC</it><sub><it>w </it></sub>is the projection of <b>w </b>onto the direction of <it>PC</it><sub>1</sub>. For <it>k </it>much larger than <it>N</it>, we recommend using <it>p </it>close to the sample size.</p>
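				<p>For the single-component case (<it>p </it>= 1, no intercept), the least absolute deviations fit has a closed form: the slope that minimizes the sum of absolute residuals is a weighted median of the ratios <it>y</it><sub><it>j</it></sub>/<it>PC</it><sub>1<it>j</it></sub> with weights |<it>PC</it><sub>1<it>j</it></sub>|. The sketch below uses that shortcut in NumPy. It is not a general quantile regression solver and not the authors' code; the function name <it>lad_slope</it>, the toy scores, and the value of <it>pc_w</it> are hypothetical.</p>

```python
import numpy as np

def lad_slope(pc, y):
    """Slope beta minimizing sum_j |y_j - beta * pc_j| (no intercept).

    Computed as the weighted median of y_j / pc_j with weights |pc_j|.
    """
    pc = np.asarray(pc, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = pc != 0                 # terms with a zero score do not depend on beta
    ratios = y[keep] / pc[keep]
    weights = np.abs(pc[keep])
    order = np.argsort(ratios)
    cum = np.cumsum(weights[order])
    # First sorted ratio at which the cumulative weight reaches half the total
    pos = np.searchsorted(cum, 0.5 * cum[-1])
    return float(ratios[order][pos])

# Hypothetical scores on PC_1 and target values with one gross outlier
pc1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 6.0, 8.0, 100.0])

beta_lad = lad_slope(pc1, y)                  # robust slope
beta_ols = float(pc1 @ y / (pc1 @ pc1))       # least squares slope, for contrast

pc_w = 3.0                                    # hypothetical projection of w onto PC_1
alpha_hat = beta_lad * pc_w                   # imputed value
```

				<p>In this toy example four of the five points lie exactly on the line with slope 2, and the least absolute deviations fit recovers that slope, whereas the least squares slope is pulled above 10 by the single outlier. For <it>p </it>greater than 1, the same criterion would need a linear programming or dedicated quantile regression routine.</p>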
			</sec>
		</sec>
		<sec>
			<st>
				<p>Results and discussion</p>
			</st>
			<sec>
				<st>
					<p>Datasets</p>
				</st>
				<p>Four data sets are used for the comparative study: three Spellman data sets (ALPHA, ELU, and ALPHA+ELU, <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B11">11</abbr></abbrgrp>) and the Gasch data set <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. These data sets were also used in the comparative study of LLSimpute <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. The ALPHA data set was obtained from the <it>&#945;</it>-factor block release experiment, which was studied to identify cell-cycle regulated genes in the yeast <it>Saccharomyces cerevisiae </it><abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. The ELU data set is the elutriation data set from the same study. After removing all genes with missing values from the ALPHA and ELU data sets, we obtained complete data matrices of 4,304 genes by 18 experiments and 4,304 genes by 14 experiments, respectively. The ALPHA+ELU data set was used to examine the effect of additional samples, as studied by Oba et al. <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. The Gasch data set was obtained from a study of genomic expression responses to DNA damage <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. After removing all genes with missing values, a complete data matrix of 2,641 genes by 44 experiments was prepared for this study. For the simulation study, 1% and 5% of the observations in these data sets were randomly set to missing.</p>
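				<p>The random removal of observations can be sketched as follows. This is only an illustration: the text does not state how the missing positions were drawn, so the sketch assumes they are sampled uniformly without replacement over all entries of the complete matrix. The function name <it>mask_at_random</it>, the seed, and the toy matrix are hypothetical.</p>

```python
import numpy as np

def mask_at_random(data, rate, seed=0):
    """Return a copy of data with a fraction `rate` of entries set to NaN.

    Also returns the boolean mask of the removed positions, so that the
    true values can be compared with the imputed ones afterwards.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_missing = int(round(rate * data.size))
    # Draw distinct flat positions uniformly over the whole matrix (assumption)
    positions = rng.choice(data.size, size=n_missing, replace=False)
    masked = data.copy()
    masked.flat[positions] = np.nan
    return masked, np.isnan(masked)

# Hypothetical complete matrix: 100 genes by 20 experiments
complete = np.arange(2000.0).reshape(100, 20)
masked, mask = mask_at_random(complete, rate=0.05)
```

				<p>Keeping the mask alongside the masked matrix makes it straightforward to evaluate any imputation method only on the entries that were artificially removed.</p>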
				<p>We use the normalized root mean squared error (NRMSE) to evaluate the performance of the missing value imputation approaches, computed by</p>
				<p>
					<m:math name="1471-2105-8-S2-S6-i22" xmlns:m="http://www.w3.org/1998/Math/MathML">
						<m:semantics>
							<m:mrow>
								<m:mtext>NRMSE&#160;=</m:mtext>
								<m:mrow>
									<m:mrow>
										<m:msqrt>
											<m:mrow>
												<m:mi>m</m:mi>
												<m:mi>e</m:mi>
												<m:mi>a</m:mi>
												<m:mi>n</m:mi>
												<m:mo stretchy="false">[</m:mo>
												<m:msup>
													<m:mrow>
														<m:mo stretchy="false">(</m:mo>
														<m:msub>
															<m:mi>y</m:mi>
															<m:mrow>
																<m:mtext>guess</m:mtext>
															</m:mrow>
														</m:msub>
														<m:mo>&#8722;</m:mo>
														<m:msub>
															<m:mi>y</m:mi>
															<m:mrow>
																<m:mtext>answer</m:mtext>
															</m:mrow>
														</m:msub>
														<m:mo stretchy="false">)</m:mo>
													</m:mrow>
													<m:mn>2</m:mn>
												</m:msup>
												<m:mo stretchy="false">]</m:mo>
											</m:mrow>
										</m:msqrt>
									</m:mrow>
									<m:mo>/</m:mo>
									<m:mrow>
										<m:mi>s</m:mi>
										<m:mi>d</m:mi>
										<m:mo stretchy="false">(</m:mo>
										<m:msub>
											<m:mi>y</m:mi>
											<m:mrow>
												<m:mtext>answer</m:mtext>
											</m:mrow>
										</m:msub>
										<m:mo stretchy="false">)</m:mo>
									</m:mrow>
								</m:mrow>
								<m:mo>,</m:mo>
							</m:mrow>
							<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqqGobGtcqqGsbGucqqGnbqtcqqGtbWucqqGfbqrcqqGGaaicqqG9aqpdaWcgaqaamaakaaabaGaemyBa0MaemyzauMaemyyaeMaemOBa4Maei4waSLaeiikaGIaemyEaK3aaSbaaSqaaiabbEgaNjabbwha1jabbwgaLjabbohaZjabbohaZbqabaGccqGHsislcqWG5bqEdaWgaaWcbaGaeeyyaeMaeeOBa4Maee4CamNaee4DaCNaeeyzauMaeeOCaihabeaakiabcMcaPmaaCaaaleqabaGaeGOmaidaaOGaeiyxa0faleqaaaGcbaGaem4CamNaemizaqMaeiikaGIaemyEaK3aaSbaaSqaaiabbggaHjabb6gaUjabbohaZjabbEha3jabbwgaLjabbkhaYbqabaGccqGGPaqkaaGaeiilaWcaaa@61F4@</m:annotation>
						</m:semantics>
					</m:math>
				</p>
				<p>where <it>y</it><sub><it>guess </it></sub>is the imputed value and <it>y</it><sub><it>answer </it></sub>the true value.</p>
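				<p>As an illustration only, the NRMSE defined above and the random generation of missing entries can be sketched in Python with NumPy. The function names <it>nrmse</it> and <it>random_missing_mask</it>, the synthetic matrix, and the gene-wise mean imputation are assumptions introduced for this sketch and are not part of the study; the choice of the population standard deviation is also an assumption, since the formula does not specify it.</p>
				<p><![CDATA[
```python
import numpy as np


def nrmse(y_guess, y_answer):
    """NRMSE = sqrt(mean((y_guess - y_answer)^2)) / sd(y_answer)."""
    y_guess = np.asarray(y_guess, dtype=float)
    y_answer = np.asarray(y_answer, dtype=float)
    rmse = np.sqrt(np.mean((y_guess - y_answer) ** 2))
    # Assumption: population standard deviation (ddof=0); the text does not say which.
    return rmse / np.std(y_answer)


def random_missing_mask(shape, rate, rng):
    """Boolean mask marking round(rate * size) entries, chosen uniformly at random."""
    n_total = int(np.prod(shape))
    n_missing = int(round(rate * n_total))
    flat = np.zeros(n_total, dtype=bool)
    flat[rng.choice(n_total, size=n_missing, replace=False)] = True
    return flat.reshape(shape)


rng = np.random.default_rng(0)
complete = rng.normal(size=(200, 14))  # synthetic stand-in, not the ELU data
missing = random_missing_mask(complete.shape, 0.05, rng)
observed = np.where(missing, np.nan, complete)

# Placeholder imputation (gene-wise mean), used only to exercise the metric.
gene_means = np.nanmean(observed, axis=1, keepdims=True)
imputed = np.where(missing, gene_means, observed)

score = nrmse(imputed[missing], complete[missing])
print(f"NRMSE of the placeholder imputation: {score:.3f}")
```
]]></p>
				<p>Under the population standard deviation assumed here, replacing every missing entry by the mean of the true values gives an NRMSE of exactly 1, so values below 1 indicate that a method recovers more than the overall level of the hidden entries.</p>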
			</sec>
			<sec>
				<st>
					<p>Experimental results</p>
				</st>
				<p>Figures <figr fid="F1">1</figr> and <figr fid="F2">2</figr> show the plots of NRMSE vs. <it>k </it>for the ELU Spellman data set, when the missing rates are 1% and 5%, respectively. We compare our RLSP with kNNimpute, LLSimpute and BPCA. For a large <it>k</it>, LLSimpute and RLSP give very similar results, and both outperform kNNimpute and BPCA. For a smaller <it>k</it>, however, RLSP performs much better than LLSimpute. LLSimpute shows a high peak in NRMSE when <it>k </it>is close to the number of samples. Because LLSimpute usually selects <it>k </it>highly correlated genes, its poor performance in this range is probably due to multi-collinearity among the selected <it>k </it>genes.</p>
				<fig id="F1">
					<title>
						<p>Figure 1</p>
					</title>
					<caption>
						<p>Comparison of the NRMSEs of various methods</p>
					</caption>
					<text>
						<p><b>Comparison of the NRMSEs of various methods</b>. Comparison of the NRMSEs of LLSimpute, kNN, BPCA, and RLSP imputation methods on the ELU data set with the 1% missing rate. BPCA results are shown on the y-axis. The x-axis represents the value of <it>k </it>(the number of selected similar genes). When <it>k </it>is close to the sample size (<it>N</it>), LLSimpute has a high peak, which we truncated in the graph. Overall, the RLSP method showed improved performance compared to the other methods.</p>
					</text>
					<graphic file="1471-2105-8-S2-S6-1"/>
				</fig>
				<fig id="F2">
					<title>
						<p>Figure 2</p>
					</title>
					<caption>
						<p>Comparison of the NRMSEs of various methods</p>
					</caption>
					<text>
						<p><b>Comparison of the NRMSEs of various methods</b>. Comparison of the NRMSEs of LLSimpute, kNN, BPCA, and RLSP imputation methods on the ELU data set with the 5% missing rate. BPCA results are shown on the y-axis. The x-axis represents the value of <it>k </it>(the number of selected similar genes). When <it>k </it>is close to the sample size (<it>N</it>), LLSimpute has a high peak, which we truncated in the graph. The RLSP method demonstrated a better result than the other methods.</p>
					</text>
					<graphic file="1471-2105-8-S2-S6-2"/>
				</fig>
				<p>Figures <figr fid="F3">3</figr> and <figr fid="F4">4</figr> show the smallest NRMSE values for the four data sets. Figure <figr fid="F3">3</figr> shows the results for the 1% missing rate. LLSimpute and RLSP gave similar results and the best performance on the ALPHA and ELU data sets. For the ALPHA+ELU and Gasch data sets, BPCA performed slightly better than RLSP and LLSimpute, although the three methods remained competitive with one another. Figure <figr fid="F4">4</figr> shows the results for the 5% missing rate. RLSP, LLSimpute and BPCA show competitive performance on the ALPHA and ELU data sets. For large data sets such as ALPHA+ELU and Gasch, BPCA performed slightly better. kNNimpute showed the worst performance on all data sets.</p>
				<fig id="F3">
					<title>
						<p>Figure 3</p>
					</title>
					<caption>
						<p>Comparison of the NRMSEs for 4 different data sets</p>
					</caption>
					<text>
						<p><b>Comparison of the NRMSEs for 4 different data sets</b>. Comparison of the NRMSEs for 4 different data sets (ALPHA, ELU, ALPHA+ELU, and Gasch) with 1% missing rate.</p>
					</text>
					<graphic file="1471-2105-8-S2-S6-3"/>
				</fig>
				<fig id="F4">
					<title>
						<p>Figure 4</p>
					</title>
					<caption>
						<p>Comparison of the NRMSEs for 4 different data sets</p>
					</caption>
					<text>
						<p><b>Comparison of the NRMSEs for 4 different data sets</b>. Comparison of the NRMSEs for 4 different data sets (ALPHA, ELU, ALPHA+ELU, and Gasch) with 5% missing rate.</p>
					</text>
					<graphic file="1471-2105-8-S2-S6-4"/>
				</fig>
				<p>The relative performance of the missing value imputation methods appears to depend strongly on the data set as well as on the value of <it>k</it>. When a moderate value of <it>k </it>is selected, say when <it>k </it>is close to the number of samples, the proposed RLSP method outperforms the LLSimpute method on all data sets (data not shown). As the missing rate increases, the NRMSE increases rapidly for all four methods.</p>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Conclusion</p>
			</st>
			<p>The proposed RLSP method was motivated by an idea similar to that of the LLSimpute method. Both methods use the information from the selected <it>k </it>genes to estimate missing observations. LLSimpute applies the least squares method to the selected <it>k </it>genes. RLSP, on the other hand, uses the principal components of the selected genes instead of the original <it>k </it>genes, and employs a quantile regression model for a robust analysis. The use of the principal components leads to a large difference between the two methods. The performance of LLSimpute is poor when <it>k </it>is small (near the number of samples). Assuming that multi-collinearity is probably the main cause of the poor performance of LLSimpute, RLSP addresses this problem by using the principal components and then applying the robust regression approach to reduce the effect of outliers. In summary, RLSP showed more stable performance than LLSimpute on all data sets in our comparative studies.</p>
			<p>The performance of RLSP may depend on the value of <it>p</it>, the number of principal components. By varying the value of <it>p</it>, we examined its effect on the parameter estimators. The results showed that the performance of RLSP is optimal when the value of <it>p </it>is close to the sample size (data not shown). A similar result was obtained in a previous study of the BPCA method <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. However, since the imputation procedure is executed for each missing value of a gene, the optimal value of <it>p </it>may differ from gene to gene. Selecting an optimal value for each missing value requires intensive computation. Thus, we recommend using the sample size as the value of <it>p </it>in practical applications, although we expect that an appropriate choice of <it>p </it>would improve the performance of the RLSP method.</p>
			<p>In terms of computational efficiency, although RLSP and BPCA showed competitive results, BPCA was more computationally demanding because of its EM-like iterative algorithm. In addition, RLSP seemed less computationally intensive than CMVE. However, a further study on a common platform would be desirable for a systematic comparison.</p>
			<p>The presented method consists of three separate steps: the first step applies the L1 metric to select similar genes, the second step performs PCA on the selected set, which is an L2 method, and the third step applies the L1 metric again to perform robust regression. Among the several combinations of metrics, the proposed combination provided the minimum NRMSE and was the most computationally efficient.</p>
			<p>The main motivation for the robust regression was to reduce the effect of outliers on the estimation of missing observations. Our empirical studies demonstrated that the effect of outliers was not large enough to cause large differences between robust regression and ordinary regression. Among several robust regression methods, including Tukey's bi-weight M-estimator, the proposed quantile regression using the 50th percentile provided the best result. However, a further study on selecting a better robust method is desirable.</p>
			<p>Finally, most missing value imputation methods for microarray data assume the simple missing data mechanism known as 'missing completely at random' <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. However, this assumption may be too strong to hold in real applications. Therefore, more elaborate methods are required for handling other possible missing data mechanisms. Incorporating the missing data mechanism or the missing patterns in the microarray data could improve the performance of missing value imputation methods.</p>
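			<p>The three steps described above can be illustrated with a minimal Python sketch for a single missing entry. This is not the authors' implementation. It assumes that all other genes are complete, that only one entry of the target gene is missing, and it approximates the median (50th percentile) regression by iteratively reweighted least squares rather than by an exact quantile regression solver. The names <it>lad_fit</it> and <it>rlsp_like_impute</it>, the parameter values, and the synthetic data are hypothetical.</p>
			<p><![CDATA[
```python
import numpy as np


def lad_fit(X, y, n_iter=50, eps=1e-8):
    """Approximate least-absolute-deviation (median) regression by IRLS."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least squares start
    for _ in range(n_iter):
        resid = np.abs(y - X @ beta)
        # Weights 1/|residual| turn the squared loss into an absolute loss.
        sw = np.sqrt(1.0 / np.maximum(resid, eps))
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta


def rlsp_like_impute(data, gene, col, k=10, p=3):
    """Estimate data[gene, col] from the other genes (assumed complete)."""
    obs = np.arange(data.shape[1]) != col  # experiments observed for the target
    others = np.delete(np.arange(data.shape[0]), gene)

    # Step 1: select the k genes closest to the target in L1 distance.
    dist = np.abs(data[others][:, obs] - data[gene, obs]).sum(axis=1)
    nearest = others[np.argsort(dist)[:k]]

    # Step 2: PCA of the selected genes (experiments are the observations).
    Z = data[nearest].T  # shape: (n_experiments, k)
    center = Z[obs].mean(axis=0)
    _, _, vt = np.linalg.svd(Z[obs] - center, full_matrices=False)
    scores = (Z - center) @ vt[:p].T  # component scores for every experiment

    # Step 3: median regression of the target gene on the first p components.
    X = np.column_stack([np.ones(obs.sum()), scores[obs]])
    beta = lad_fit(X, data[gene, obs])
    return float(np.concatenate(([1.0], scores[col])) @ beta)


# Synthetic low-rank data with noise, used only to run the sketch.
rng = np.random.default_rng(0)
factors = rng.normal(size=(2, 14))
loadings = rng.normal(size=(100, 2))
data = loadings @ factors + 0.05 * rng.normal(size=(100, 14))

estimate = rlsp_like_impute(data, gene=0, col=3, k=10, p=2)
print(f"true value {data[0, 3]:.3f}, imputed value {estimate:.3f}")
```
]]></p>
			<p>In this sketch <it>p </it>must not exceed either <it>k </it>or the number of observed experiments, because the components are taken from a singular value decomposition of the selected genes.</p>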
		</sec>
		<sec>
			<st>
				<p>Authors' contributions</p>
			</st>
			<p>All authors contributed to the development of the RLSP method and to the comparative studies with previously proposed imputation methods.</p>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>The authors would like to thank the editors and two anonymous referees whose comments were extremely helpful. The authors also would like to thank Dr. Sehgal for many constructive discussions on the CMVE program. The work was supported by the National Research Laboratory Program of Korea Science and Engineering Foundation (M10500000126) and the Brain Korea 21 Project of the Ministry of Education.</p>
				<p>This article has been published as part of <it>BMC Bioinformatics </it>Volume 8, Supplement 2, 2007: Probabilistic Modeling and Machine Learning in Structural and Systems Biology. The full contents of the supplement are available online at <url>http://www.biomedcentral.com/1471-2105/8?issue=S2</url>.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>Gaussian mixture clustering and imputation of microarray data</p>
				</title>
				<aug>
					<au>
						<snm>Ouyang</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Welsh</snm>
						<fnm>WJ</fnm>
					</au>
					<au>
						<snm>Georgopoulos</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2004</pubdate>
				<volume>20</volume>
				<issue>6</issue>
				<fpage>917</fpage>
				<lpage>923</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/bth007</pubid>
						<pubid idtype="pmpid" link="fulltext">14751970</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B2">
				<title>
					<p>LSimpute: accurate estimation of missing values in microarray data with least squares methods</p>
				</title>
				<aug>
					<au>
						<snm>Bo</snm>
						<fnm>TH</fnm>
					</au>
					<au>
						<snm>Dysvik</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Jonassen</snm>
						<fnm>I</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2004</pubdate>
				<volume>32</volume>
				<issue>3</issue>
				<fpage>e34</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">374359</pubid>
						<pubid idtype="pmpid" link="fulltext">14978222</pubid>
						<pubid idtype="doi">10.1093/nar/gnh026</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B3">
				<title>
					<p>Missing value estimation methods for DNA microarrays</p>
				</title>
				<aug>
					<au>
						<snm>Troyanskaya</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Cantor</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Sherlock</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Brown</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Hastie</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Tibshirani</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Botstein</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Altman</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2001</pubdate>
				<volume>17</volume>
				<issue>6</issue>
				<fpage>520</fpage>
				<lpage>525</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/17.6.520</pubid>
						<pubid idtype="pmpid" link="fulltext">11395428</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>Missing Value Estimation for DNA microarray gene expression data: local least squares imputation</p>
				</title>
				<aug>
					<au>
						<snm>Kim</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Golub</snm>
						<fnm>GH</fnm>
					</au>
					<au>
						<snm>Park</snm>
						<fnm>H</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2005</pubdate>
				<volume>21</volume>
				<issue>2</issue>
				<fpage>187</fpage>
				<lpage>198</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/bth499</pubid>
						<pubid idtype="pmpid" link="fulltext">15333461</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>A Bayesian missing value estimation method for gene expression profile data</p>
				</title>
				<aug>
					<au>
						<snm>Oba</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Sato</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Takemasa</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Monden</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Matsubara</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Ishii</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2003</pubdate>
				<volume>19</volume>
				<issue>16</issue>
				<fpage>2088</fpage>
				<lpage>2096</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/btg287</pubid>
						<pubid idtype="pmpid" link="fulltext">14594714</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data</p>
				</title>
				<aug>
					<au>
						<snm>Sehgal</snm>
						<fnm>MS</fnm>
					</au>
					<au>
						<snm>Gondal</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Dooley</snm>
						<fnm>LS</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2005</pubdate>
				<volume>21</volume>
				<issue>10</issue>
				<fpage>2417</fpage>
				<lpage>2423</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/bti345</pubid>
						<pubid idtype="pmpid" link="fulltext">15731210</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B7">
				<title>
					<p>Microarray missing data imputation based on a set theoretic framework and biological knowledge</p>
				</title>
				<aug>
					<au>
						<snm>Gan</snm>
						<fnm>X</fnm>
					</au>
					<au>
						<snm>Liew</snm>
						<fnm>AW</fnm>
					</au>
					<au>
						<snm>Yan</snm>
						<fnm>H</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2006</pubdate>
				<volume>34</volume>
				<issue>5</issue>
				<fpage>1608</fpage>
				<lpage>1619</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1409680</pubid>
						<pubid idtype="pmpid" link="fulltext">16549873</pubid>
						<pubid idtype="doi">10.1093/nar/gkl047</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B8">
				<title>
					<p>Improving missing value estimation in microarray data with gene ontology</p>
				</title>
				<aug>
					<au>
						<snm>Tuikkala</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Elo</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Nevalainen</snm>
						<fnm>OS</fnm>
					</au>
					<au>
						<snm>Aittokallio</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2006</pubdate>
				<volume>22</volume>
				<issue>5</issue>
				<fpage>566</fpage>
				<lpage>572</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/btk019</pubid>
						<pubid idtype="pmpid" link="fulltext">16377613</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p>Improving missing value imputation of microarray data by using spot quality weights</p>
				</title>
				<aug>
					<au>
						<snm>Johansson</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Hakkinen</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2006</pubdate>
				<volume>7</volume>
				<issue/>
				<fpage>306</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1186/1471-2105-7-306</pubid>
						<pubid idtype="pmpid" link="fulltext">16780582</pubid>
						<pubid idtype="pmcid">1533869</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B10">
				<title>
					<p>Quantile Regression</p>
				</title>
				<aug>
					<au>
						<snm>Koenker</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Hallock</snm>
						<fnm>K</fnm>
					</au>
				</aug>
				<source>Journal of Economic Perspectives</source>
				<pubdate>2001</pubdate>
				<volume>15</volume>
				<fpage>143</fpage>
				<lpage>156</lpage>
			</bibl>
			<bibl id="B11">
				<title>
					<p>Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization</p>
				</title>
				<aug>
					<au>
						<snm>Spellman</snm>
						<fnm>PT</fnm>
					</au>
					<au>
						<snm>Sherlock</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>MQ</fnm>
					</au>
					<au>
						<snm>Iyer</snm>
						<fnm>VR</fnm>
					</au>
					<au>
						<snm>Anders</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Eisen</snm>
						<fnm>MB</fnm>
					</au>
					<au>
						<snm>Brown</snm>
						<fnm>PO</fnm>
					</au>
					<au>
						<snm>Botstein</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Futcher</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>Mol Biol Cell</source>
				<pubdate>1998</pubdate>
				<volume>9</volume>
				<fpage>3273</fpage>
				<lpage>3297</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">25624</pubid>
						<pubid idtype="pmpid" link="fulltext">9843569</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B12">
				<title>
					<p>Genomic expression responses to DNA damaging agents and the regulatory role of the yeast ATR homolog Mec1p</p>
				</title>
				<aug>
					<au>
						<snm>Gasch</snm>
						<fnm>AP</fnm>
					</au>
					<au>
						<snm>Huang</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Metzner</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Botstein</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Elledge</snm>
						<fnm>SJ</fnm>
					</au>
					<au>
						<snm>Brown</snm>
						<fnm>PO</fnm>
					</au>
				</aug>
				<source>Mol Biol Cell</source>
				<pubdate>2001</pubdate>
				<volume>12</volume>
				<fpage>2987</fpage>
				<lpage>3003</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">60150</pubid>
						<pubid idtype="pmpid" link="fulltext">11598186</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B13">
				<title>
					<p>Yeast Cell Cycle Analysis Project</p>
				</title>
				<url>http://cellcycle-www.stanford.edu</url>
			</bibl>
			<bibl id="B14">
				<title>
					<p>The web supplement to Gasch et al</p>
				</title>
				<url>http://www-genome.stanford.edu/Mec1</url>
			</bibl>
			<bibl id="B15">
				<title>
					<p>Statistical analysis with missing data</p>
				</title>
				<aug>
					<au>
						<snm>Little</snm>
						<fnm>RJA</fnm>
					</au>
					<au>
						<snm>Rubin</snm>
						<fnm>DB</fnm>
					</au>
				</aug>
				<publisher>Wiley, Hoboken, New Jersey</publisher>
				<edition>2</edition>
				<pubdate>2002</pubdate>
			</bibl>
		</refgrp>
	</bm>
</art>
