<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-6-244</ui>
   <ji>1471-2105</ji>
   <fm>
		<dochead>Software</dochead>
		<bibl>
			<title>
				<p>ASPIC: a novel method to predict the exon-intron structure of a gene that is optimally compatible to a set of transcript sequences</p>
			</title>
			<aug>
				<au id="A1">
					<snm>Bonizzoni</snm>
					<fnm>Paola</fnm>
					<insr iid="I1"/>
					<email>bonizzoni@disco.unimib.it</email>
				</au>
				<au id="A2">
					<snm>Rizzi</snm>
					<fnm>Raffaella</fnm>
					<insr iid="I1"/>
					<email>rizzi@disco.unimib.it</email>
				</au>
				<au id="A3" ca="yes">
					<snm>Pesole</snm>
					<fnm>Graziano</fnm>
					<insr iid="I2"/>
					<email>graziano.pesole@unimi.it</email>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>DISCo, University of Milan Bicocca, via Bicocca degli Arcimboldi, 8, Milan, 20135, Italy.</p>
				</ins>
				<ins id="I2">
					<p>Dipartimento di Scienze Biomolecolari e Biotecnologie, University of Milan, via Celoria, 26, Milan, 20133, Italy.</p>
				</ins>
			</insg>
			<source>BMC Bioinformatics</source>
			<issn>1471-2105</issn>
			<pubdate>2005</pubdate>
			<volume>6</volume>
			<issue>1</issue>
			<fpage>244</fpage>
			<url>http://www.biomedcentral.com/1471-2105/6/244</url>
			<xrefbib>
				<pubidlist>
					<pubid idtype="pmpid">16207377</pubid>
					<pubid idtype="doi">10.1186/1471-2105-6-244</pubid>
				</pubidlist>
			</xrefbib>
		</bibl>
		<history>
			<rec>
				<date>
					<day>26</day>
					<month>5</month>
					<year>2005</year>
				</date>
			</rec>
			<acc>
				<date>
					<day>05</day>
					<month>10</month>
					<year>2005</year>
				</date>
			</acc>
			<pub>
				<date>
					<day>05</day>
					<month>10</month>
					<year>2005</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2005</year>
			<collab>Bonizzoni et al; licensee BioMed Central Ltd.</collab>
			<note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
		</cpyrt>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background:</p>
					</st>
					<p>Currently available methods to predict splice sites are mainly based on the independent and progressive alignment of transcript data (mostly ESTs) to the genomic sequence. Apart from often being computationally expensive, this approach is vulnerable to several problems &#8211; hence the need to develop novel strategies.</p>
				</sec>
				<sec>
					<st>
						<p>Results:</p>
					</st>
					<p>We propose a method for the detection of splice sites, based on a novel multiple genome-EST alignment algorithm. To avoid the limitations of splice site prediction (mainly over-prediction) that arise from aligning each EST independently to the genomic sequence, our approach performs a multiple alignment of transcript data to the genomic sequence based on the combined analysis of all available data. We recast the problem of predicting constitutive and alternative splicing as an optimization problem, where the optimal multiple transcript alignment minimizes the number of exons and hence of splice site observations.</p>
					<p>We have implemented a splice site predictor based on this algorithm in the software tool ASPIC (Alternative Splicing PredICtion). It differs from other methods based on BLAST-like tools in that it incorporates entirely new ad hoc procedures for accurate and computationally efficient transcript alignment and adopts dynamic programming for the refinement of intron boundaries. ASPIC also provides the minimal set of non-mergeable transcript isoforms compatible with the detected splicing events. The ASPIC web resource is dynamically interconnected with the Ensembl and Unigene databases and also implements an upload facility.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusion:</p>
					</st>
					<p>Extensive benchmarking shows that ASPIC outperforms other existing methods in the detection of novel splicing isoforms and in the minimization of over-predictions. ASPIC also requires less computation time to process a single gene and an EST cluster. The ASPIC web resource is available at <url>http://aspic.algo.disco.unimib.it/aspic-devel/</url>.</p>
				</sec>
			</sec>
		</abs>
	</fm>
   <bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>The completion of several genome projects has, rather surprisingly, revealed that, despite remarkable heterogeneity in organism complexity and genome size, the variation in total gene number is much less pronounced, with a less than 10-fold increase in gene number between prokaryotes (e.g. <it>E. coli</it>) and vertebrates (e.g. human) <abbrgrp>
					<abbr bid="B1">1</abbr>
				</abbrgrp>.</p>
			<p>However, the level of protein complexity in humans and other vertebrates is much higher than expected from the estimated gene number. Alternative splicing, leading to the generation of multiple transcripts from single genes, is believed to be the major mechanism expanding protein diversity in higher organisms <abbrgrp>
					<abbr bid="B2">2</abbr>
				</abbrgrp>. These transcripts can differ both in the untranslated regions (UTRs) and in the coding regions. Thus, by using different combinations of donor and acceptor splice sites, a gene can produce transcripts that encode different proteins and carry alternative UTRs regulating their fate in the cell. Indeed, recent large-scale genomic studies have shown that alternative splicing occurs in 40&#8211;60% of human genes <abbrgrp>
					<abbr bid="B3">3</abbr>
				</abbrgrp> and that it is a likely determinant of species-specificity since an unexpectedly low level of alternative splicing pattern conservation has been observed in pairs of orthologous genes <abbrgrp>
					<abbr bid="B4">4</abbr>
				</abbrgrp>. Recent studies have also shown that alternative splicing is important for determining developmental- and tissue-specific gene expression <abbrgrp>
					<abbr bid="B5">5</abbr>
					<abbr bid="B6">6</abbr>
				</abbrgrp>. Aberrant splicing forms are also associated with human diseases <abbrgrp>
					<abbr bid="B7">7</abbr>
				</abbrgrp>. For these reasons, there is a growing interest in the high-throughput identification of alternative splicing forms in humans and other organisms <abbrgrp>
					<abbr bid="B8">8</abbr>
				</abbrgrp>.</p>
			<p>This interest has driven the design of computational methods to predict alternative splicing. Published methods may be classified into three groups: methods based on the comparison of expressed sequences to each other (e.g. <abbrgrp>
					<abbr bid="B9">9</abbr>
				</abbrgrp>, <abbrgrp>
					<abbr bid="B10">10</abbr>
				</abbrgrp>, <abbrgrp>
					<abbr bid="B11">11</abbr>
				</abbrgrp>), methods based on the alignment of ESTs to the genomic sequence <abbrgrp>
					<abbr bid="B12">12</abbr>
					<abbr bid="B13">13</abbr>
					<abbr bid="B14">14</abbr>
				</abbrgrp> and, more recently, methods that combine the previous two approaches, i.e. EST-to-EST comparison and EST-to-genome alignment, as proposed in <abbrgrp>
					<abbr bid="B15">15</abbr>
				</abbrgrp> and <abbrgrp>
					<abbr bid="B16">16</abbr>
				</abbrgrp>: we call such methods <it>multiple EST alignment methods</it>. A wide-ranging discussion of the limitations of the first two groups of methods has been presented, and it has been shown that combining the two approaches leads to clear improvements in alternative splicing identification <abbrgrp>
					<abbr bid="B16">16</abbr>
				</abbrgrp>. Computational methods may also be classified according to the computational approach used to produce EST alignments. Indeed, it must be pointed out that the majority of tools use BLAST, sim4 or, most recently, BLAT to map ESTs to the genome (see Table <tblr tid="T1">1</tblr> in <abbrgrp>
					<abbr bid="B11">11</abbr>
				</abbrgrp>). These tools are often error-prone when aligning ESTs because they were not designed to account for either the relationship between ESTs and their corresponding genomic sequences or sequencing errors in ESTs &#8211; for example, the presence of large gaps, short exons or specific constraints on the alignment near intron boundaries.</p>
			<tbl id="T1">
				<title>
					<p>Table 1</p>
				</title>
				<caption>
					<p>Benchmark comparison of ASPIC with other similar tools</p>
				</caption>
				<tblbdy cols="10">
					<r>
						<c>
							<p/>
						</c>
						<c cspan="3" ca="center">
							<p>ASPIC</p>
						</c>
						<c cspan="2" ca="center">
							<p>ASAP</p>
						</c>
						<c cspan="2" ca="center">
							<p>ASD</p>
						</c>
						<c cspan="2" ca="center">
							<p>ACEVIEW</p>
						</c>
					</r>
					<r>
						<c>
							<p/>
						</c>
						<c cspan="3">
							<hr/>
						</c>
						<c cspan="2">
							<hr/>
						</c>
						<c cspan="2">
							<hr/>
						</c>
						<c cspan="2">
							<hr/>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>GENE</it></p>
						</c>
						<c ca="center">
							<p><it>#introns (#novel)</it></p>
						</c>
						<c ca="center">
							<p><it>#TS</it></p>
						</c>
						<c ca="center">
							<p><it>#EST/splice</it></p>
						</c>
						<c ca="center">
							<p><it>#introns (#ASPIC)</it></p>
						</c>
						<c ca="center">
							<p><it>#TS</it></p>
						</c>
						<c ca="center">
							<p><it>#introns (#ASPIC)</it></p>
						</c>
						<c ca="center">
							<p><it>#TS</it></p>
						</c>
						<c ca="center">
							<p><it>#introns (#ASPIC)</it></p>
						</c>
						<c ca="center">
							<p><it>#TS</it></p>
						</c>
					</r>
					<r>
						<c cspan="10">
							<hr/>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>ABCB10</p>
						</c>
						<c ca="center">
							<p>12(0)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
						<c ca="center">
							<p>12.42</p>
						</c>
						<c ca="center">
							<p>12(12)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>13(12)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>ACADM</p>
						</c>
						<c ca="center">
							<p>21(1)</p>
						</c>
						<c ca="center">
							<p>15</p>
						</c>
						<c ca="center">
							<p>31.52</p>
						</c>
						<c ca="center">
							<p>15(14)</p>
						</c>
						<c ca="center">
							<p>6</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>22(20)</p>
						</c>
						<c ca="center">
							<p>14</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>ACTN2</p>
						</c>
						<c ca="center">
							<p>23(0)</p>
						</c>
						<c ca="center">
							<p>28</p>
						</c>
						<c ca="center">
							<p>19.09</p>
						</c>
						<c ca="center">
							<p>20(20)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>22(22)</p>
						</c>
						<c ca="center">
							<p>4</p>
						</c>
						<c ca="center">
							<p>23(23)</p>
						</c>
						<c ca="center">
							<p>8</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>ADAM15</p>
						</c>
						<c ca="center">
							<p>41(4)</p>
						</c>
						<c ca="center">
							<p>67</p>
						</c>
						<c ca="center">
							<p>40.07</p>
						</c>
						<c ca="center">
							<p>13(13)</p>
						</c>
						<c ca="center">
							<p>4</p>
						</c>
						<c ca="center">
							<p>29(29)</p>
						</c>
						<c ca="center">
							<p>11</p>
						</c>
						<c ca="center">
							<p>56(37)</p>
						</c>
						<c ca="center">
							<p>25</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>ADAMTS4</p>
						</c>
						<c ca="center">
							<p>8(0)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
						<c ca="center">
							<p>8.63</p>
						</c>
						<c ca="center">
							<p>7(7)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>8(8)</p>
						</c>
						<c ca="center">
							<p>4</p>
						</c>
						<c ca="center">
							<p>8(8)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>ADORA1</p>
						</c>
						<c ca="center">
							<p>13(1)</p>
						</c>
						<c ca="center">
							<p>10</p>
						</c>
						<c ca="center">
							<p>4.69</p>
						</c>
						<c ca="center">
							<p>8(8)</p>
						</c>
						<c ca="center">
							<p>4</p>
						</c>
						<c ca="center">
							<p>12(12)</p>
						</c>
						<c ca="center">
							<p>9</p>
						</c>
						<c ca="center">
							<p>10(9)</p>
						</c>
						<c ca="center">
							<p>7</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>ADORA3</p>
						</c>
						<c ca="center">
							<p>15(13)</p>
						</c>
						<c ca="center">
							<p>5</p>
						</c>
						<c ca="center">
							<p>6.13</p>
						</c>
						<c ca="center">
							<p>2(2)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>3(2)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>AGL</p>
						</c>
						<c ca="center">
							<p>40(1)</p>
						</c>
						<c ca="center">
							<p>12</p>
						</c>
						<c ca="center">
							<p>14.48</p>
						</c>
						<c ca="center">
							<p>38(38)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>39(39)</p>
						</c>
						<c ca="center">
							<p>10</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>AGRN</p>
						</c>
						<c ca="center">
							<p>41(4)</p>
						</c>
						<c ca="center">
							<p>21</p>
						</c>
						<c ca="center">
							<p>16.98</p>
						</c>
						<c ca="center">
							<p>35(32)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>45(37)</p>
						</c>
						<c ca="center">
							<p>11</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>AGT</p>
						</c>
						<c ca="center">
							<p>7(2)</p>
						</c>
						<c ca="center">
							<p>17</p>
						</c>
						<c ca="center">
							<p>52.86</p>
						</c>
						<c ca="center">
							<p>4(3)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>9(5)</p>
						</c>
						<c ca="center">
							<p>8</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>AHCYL1</p>
						</c>
						<c ca="center">
							<p>26(4)</p>
						</c>
						<c ca="center">
							<p>35</p>
						</c>
						<c ca="center">
							<p>48.50</p>
						</c>
						<c ca="center">
							<p>19(19)</p>
						</c>
						<c ca="center">
							<p>5</p>
						</c>
						<c ca="center">
							<p>19(19)</p>
						</c>
						<c ca="center">
							<p>4</p>
						</c>
						<c ca="center">
							<p>26(22)</p>
						</c>
						<c ca="center">
							<p>13</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>AKR7A2</p>
						</c>
						<c ca="center">
							<p>9(1)</p>
						</c>
						<c ca="center">
							<p>6</p>
						</c>
						<c ca="center">
							<p>73.33</p>
						</c>
						<c ca="center">
							<p>6(6)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>7(7)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
						<c ca="center">
							<p>31(8)</p>
						</c>
						<c ca="center">
							<p>15</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>ALDH9A1</p>
						</c>
						<c ca="center">
							<p>17(4)</p>
						</c>
						<c ca="center">
							<p>10</p>
						</c>
						<c ca="center">
							<p>39.29</p>
						</c>
						<c ca="center">
							<p>11(11)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>16(13)</p>
						</c>
						<c ca="center">
							<p>7</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>ALPL</p>
						</c>
						<c ca="center">
							<p>15(0)</p>
						</c>
						<c ca="center">
							<p>8</p>
						</c>
						<c ca="center">
							<p>19.47</p>
						</c>
						<c ca="center">
							<p>14(13)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
						<c ca="center">
							<p>13(13)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
						<c ca="center">
							<p>17(15)</p>
						</c>
						<c ca="center">
							<p>9</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>AMPD1</p>
						</c>
						<c ca="center">
							<p>14(0)</p>
						</c>
						<c ca="center">
							<p>4</p>
						</c>
						<c ca="center">
							<p>7.64</p>
						</c>
						<c ca="center">
							<p>13(13)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>45(14)</p>
						</c>
						<c ca="center">
							<p>12</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>ANGPTL1</p>
						</c>
						<c ca="center">
							<p>6(0)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
						<c ca="center">
							<p>10.50</p>
						</c>
						<c ca="center">
							<p>5(5)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>6(6)</p>
						</c>
						<c ca="center">
							<p>4</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>ANGPTL3</p>
						</c>
						<c ca="center">
							<p>6(0)</p>
						</c>
						<c ca="center">
							<p>5</p>
						</c>
						<c ca="center">
							<p>19.17</p>
						</c>
						<c ca="center">
							<p>6(6)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>8(6)</p>
						</c>
						<c ca="center">
							<p>7</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>ANXA9</p>
						</c>
						<c ca="center">
							<p>15(1)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
						<c ca="center">
							<p>14.40</p>
						</c>
						<c ca="center">
							<p>13(13)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>14(13)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
						<c ca="center">
							<p>16(14)</p>
						</c>
						<c ca="center">
							<p>6</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>AP4B1</p>
						</c>
						<c ca="center">
							<p>18(0)</p>
						</c>
						<c ca="center">
							<p>22</p>
						</c>
						<c ca="center">
							<p>14.61</p>
						</c>
						<c ca="center">
							<p>12(12)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>17(16)</p>
						</c>
						<c ca="center">
							<p>12</p>
						</c>
						<c ca="center">
							<p>16(16)</p>
						</c>
						<c ca="center">
							<p>14</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>APCS</p>
						</c>
						<c ca="center">
							<p>2(1)</p>
						</c>
						<c ca="center">
							<p>5</p>
						</c>
						<c ca="center">
							<p>62.50</p>
						</c>
						<c ca="center">
							<p>1(1)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>1(1)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>ARHGEF2</p>
						</c>
						<c ca="center">
							<p>32(1)</p>
						</c>
						<c ca="center">
							<p>37</p>
						</c>
						<c ca="center">
							<p>15.19</p>
						</c>
						<c ca="center">
							<p>22(22)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
						<c ca="center">
							<p>26(25)</p>
						</c>
						<c ca="center">
							<p>6</p>
						</c>
						<c ca="center">
							<p>35(31)</p>
						</c>
						<c ca="center">
							<p>17</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>ARHGEF11</p>
						</c>
						<c ca="center">
							<p>47(1)</p>
						</c>
						<c ca="center">
							<p>9</p>
						</c>
						<c ca="center">
							<p>7.70</p>
						</c>
						<c ca="center">
							<p>42(42)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
						<c ca="center">
							<p>41(40)</p>
						</c>
						<c ca="center">
							<p>6</p>
						</c>
						<c ca="center">
							<p>46(45)</p>
						</c>
						<c ca="center">
							<p>17</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>ARHGEF16</p>
						</c>
						<c ca="center">
							<p>14(2)</p>
						</c>
						<c ca="center">
							<p>12</p>
						</c>
						<c ca="center">
							<p>18.64</p>
						</c>
						<c ca="center">
							<p>10(10)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>15(12)</p>
						</c>
						<c ca="center">
							<p>5</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>ARNT</p>
						</c>
						<c ca="center">
							<p>26(1)</p>
						</c>
						<c ca="center">
							<p>14</p>
						</c>
						<c ca="center">
							<p>11.73</p>
						</c>
						<c ca="center">
							<p>20(18)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>22(21)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
						<c ca="center">
							<p>38(26)</p>
						</c>
						<c ca="center">
							<p>14</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>ARPC5</p>
						</c>
						<c ca="center">
							<p>4(1)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
						<c ca="center">
							<p>120.75</p>
						</c>
						<c ca="center">
							<p>3(3)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>4(2)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
						<c ca="center">
							<p>6(3)</p>
						</c>
						<c ca="center">
							<p>4</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>ARTN</p>
						</c>
						<c ca="center">
							<p>8(0)</p>
						</c>
						<c ca="center">
							<p>10</p>
						</c>
						<c ca="center">
							<p>5.25</p>
						</c>
						<c ca="center">
							<p>7(7)</p>
						</c>
						<c ca="center">
							<p>4</p>
						</c>
						<c ca="center">
							<p>6(6)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
						<c ca="center">
							<p>7(7)</p>
						</c>
						<c ca="center">
							<p>10</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>ATAD3A</p>
						</c>
						<c ca="center">
							<p>22(2)</p>
						</c>
						<c ca="center">
							<p>10</p>
						</c>
						<c ca="center">
							<p>36.41</p>
						</c>
						<c ca="center">
							<p>16(16)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>58(21)</p>
						</c>
						<c ca="center">
							<p>27</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>ATP1B1</p>
						</c>
						<c ca="center">
							<p>11(2)</p>
						</c>
						<c ca="center">
							<p>12</p>
						</c>
						<c ca="center">
							<p>59.27</p>
						</c>
						<c ca="center">
							<p>7(7)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>10(9)</p>
						</c>
						<c ca="center">
							<p>4</p>
						</c>
						<c ca="center">
							<p>11(8)</p>
						</c>
						<c ca="center">
							<p>10</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>ATP2B4</p>
						</c>
						<c ca="center">
							<p>29(3)</p>
						</c>
						<c ca="center">
							<p>11</p>
						</c>
						<c ca="center">
							<p>8.07</p>
						</c>
						<c ca="center">
							<p>22(22)</p>
						</c>
						<c ca="center">
							<p>5</p>
						</c>
						<c ca="center">
							<p>23(23)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
						<c ca="center">
							<p>26(26)</p>
						</c>
						<c ca="center">
							<p>14</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>C1orf10</p>
						</c>
						<c ca="center">
							<p>2(0)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>7.00</p>
						</c>
						<c ca="center">
							<p>2(2)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>2(2)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>C1orf26</p>
						</c>
						<c ca="center">
							<p>22(0)</p>
						</c>
						<c ca="center">
							<p>7</p>
						</c>
						<c ca="center">
							<p>7.91</p>
						</c>
						<c ca="center">
							<p>17(17)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>23(22)</p>
						</c>
						<c ca="center">
							<p>6</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>C1QB</p>
						</c>
						<c ca="center">
							<p>3(1)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
						<c ca="center">
							<p>25.67</p>
						</c>
						<c ca="center">
							<p>5(2)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
						<c ca="center">
							<p>5(2)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
						<c ca="center">
							<p>6(2)</p>
						</c>
						<c ca="center">
							<p>5</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>CAPZA1</p>
						</c>
						<c ca="center">
							<p>15(4)</p>
						</c>
						<c ca="center">
							<p>15</p>
						</c>
						<c ca="center">
							<p>68.40</p>
						</c>
						<c ca="center">
							<p>11(11)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
						<c ca="center">
							<p>10(9)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
						<c ca="center">
							<p>12(11)</p>
						</c>
						<c ca="center">
							<p>7</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>CTRC</p>
						</c>
						<c ca="center">
							<p>9(1)</p>
						</c>
						<c ca="center">
							<p>4</p>
						</c>
						<c ca="center">
							<p>36.22</p>
						</c>
						<c ca="center">
							<p>8(8)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>8(7)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>DMRTA2</p>
						</c>
						<c ca="center">
							<p>2(0)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
						<c ca="center">
							<p>1.00</p>
						</c>
						<c ca="center">
							<p>1(1)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>2(2)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>DPH2L2</p>
						</c>
						<c ca="center">
							<p>12(1)</p>
						</c>
						<c ca="center">
							<p>11</p>
						</c>
						<c ca="center">
							<p>31.25</p>
						</c>
						<c ca="center">
							<p>10(10)</p>
						</c>
						<c ca="center">
							<p>7</p>
						</c>
						<c ca="center">
							<p>12(11)</p>
						</c>
						<c ca="center">
							<p>12</p>
						</c>
						<c ca="center">
							<p>12(11)</p>
						</c>
						<c ca="center">
							<p>14</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>EPHA2</p>
						</c>
						<c ca="center">
							<p>20(1)</p>
						</c>
						<c ca="center">
							<p>8</p>
						</c>
						<c ca="center">
							<p>13.45</p>
						</c>
						<c ca="center">
							<p>16(16)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>17(17)</p>
						</c>
						<c ca="center">
							<p>7</p>
						</c>
						<c ca="center">
							<p>20(19)</p>
						</c>
						<c ca="center">
							<p>8</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>EYA3</p>
						</c>
						<c ca="center">
							<p>20(0)</p>
						</c>
						<c ca="center">
							<p>9</p>
						</c>
						<c ca="center">
							<p>11.40</p>
						</c>
						<c ca="center">
							<p>15(15)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>21(20)</p>
						</c>
						<c ca="center">
							<p>10</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>FBXO2</p>
						</c>
						<c ca="center">
							<p>9(0)</p>
						</c>
						<c ca="center">
							<p>6</p>
						</c>
						<c ca="center">
							<p>13.67</p>
						</c>
						<c ca="center">
							<p>9(8)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
						<c ca="center">
							<p>6(5)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
						<c ca="center">
							<p>9(8)</p>
						</c>
						<c ca="center">
							<p>5</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>FCGR3B</p>
						</c>
						<c ca="center">
							<p>7(0)</p>
						</c>
						<c ca="center">
							<p>5</p>
						</c>
						<c ca="center">
							<p>22.57</p>
						</c>
						<c ca="center">
							<p>4(4)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>7(7)</p>
						</c>
						<c ca="center">
							<p>6</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>FUCA1</p>
						</c>
						<c ca="center">
							<p>11(3)</p>
						</c>
						<c ca="center">
							<p>8</p>
						</c>
						<c ca="center">
							<p>18.00</p>
						</c>
						<c ca="center">
							<p>8(8)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>7(7)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>GBP2</p>
						</c>
						<c ca="center">
							<p>17(3)</p>
						</c>
						<c ca="center">
							<p>8</p>
						</c>
						<c ca="center">
							<p>27.82</p>
						</c>
						<c ca="center">
							<p>12(12)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>26(14)</p>
						</c>
						<c ca="center">
							<p>10</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>GMEB1</p>
						</c>
						<c ca="center">
							<p>12(1)</p>
						</c>
						<c ca="center">
							<p>5</p>
						</c>
						<c ca="center">
							<p>16.67</p>
						</c>
						<c ca="center">
							<p>9(9)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
						<c ca="center">
							<p>11(11)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
						<c ca="center">
							<p>11(11)</p>
						</c>
						<c ca="center">
							<p>6</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>HNRPR</p>
						</c>
						<c ca="center">
							<p>20(2)</p>
						</c>
						<c ca="center">
							<p>38</p>
						</c>
						<c ca="center">
							<p>45.70</p>
						</c>
						<c ca="center">
							<p>16(15)</p>
						</c>
						<c ca="center">
							<p>7</p>
						</c>
						<c ca="center">
							<p>12(12)</p>
						</c>
						<c ca="center">
							<p>7</p>
						</c>
						<c ca="center">
							<p>21(18)</p>
						</c>
						<c ca="center">
							<p>17</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>LGALS8</p>
						</c>
						<c ca="center">
							<p>22(2)</p>
						</c>
						<c ca="center">
							<p>25</p>
						</c>
						<c ca="center">
							<p>16.00</p>
						</c>
						<c ca="center">
							<p>12(11)</p>
						</c>
						<c ca="center">
							<p>4</p>
						</c>
						<c ca="center">
							<p>13(13)</p>
						</c>
						<c ca="center">
							<p>4</p>
						</c>
						<c ca="center">
							<p>27(19)</p>
						</c>
						<c ca="center">
							<p>21</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>LRRN5</p>
						</c>
						<c ca="center">
							<p>3(0)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
						<c ca="center">
							<p>3.00</p>
						</c>
						<c ca="center">
							<p>5(3)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>5(3)</p>
						</c>
						<c ca="center">
							<p>4</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>LYPLA2</p>
						</c>
						<c ca="center">
							<p>15(0)</p>
						</c>
						<c ca="center">
							<p>14</p>
						</c>
						<c ca="center">
							<p>96.07</p>
						</c>
						<c ca="center">
							<p>14(14)</p>
						</c>
						<c ca="center">
							<p>6</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>15(14)</p>
						</c>
						<c ca="center">
							<p>14</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>MASP2</p>
						</c>
						<c ca="center">
							<p>11(0)</p>
						</c>
						<c ca="center">
							<p>5</p>
						</c>
						<c ca="center">
							<p>7.00</p>
						</c>
						<c ca="center">
							<p>11(11)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
						<c ca="center">
							<p>11(11)</p>
						</c>
						<c ca="center">
							<p>5</p>
						</c>
						<c ca="center">
							<p>11(11)</p>
						</c>
						<c ca="center">
							<p>6</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>MOV10</p>
						</c>
						<c ca="center">
							<p>35(4)</p>
						</c>
						<c ca="center">
							<p>42</p>
						</c>
						<c ca="center">
							<p>25.29</p>
						</c>
						<c ca="center">
							<p>29(29)</p>
						</c>
						<c ca="center">
							<p>7</p>
						</c>
						<c ca="center">
							<p>24(23)</p>
						</c>
						<c ca="center">
							<p>8</p>
						</c>
						<c ca="center">
							<p>33(31)</p>
						</c>
						<c ca="center">
							<p>21</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>NPPB</p>
						</c>
						<c ca="center">
							<p>2(0)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>30.50</p>
						</c>
						<c ca="center">
							<p>2(2)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>3(2)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>PAFAH2</p>
						</c>
						<c ca="center">
							<p>18(3)</p>
						</c>
						<c ca="center">
							<p>11</p>
						</c>
						<c ca="center">
							<p>15.06</p>
						</c>
						<c ca="center">
							<p>12(11)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
						<c ca="center">
							<p>13(13)</p>
						</c>
						<c ca="center">
							<p>5</p>
						</c>
						<c ca="center">
							<p>16(14)</p>
						</c>
						<c ca="center">
							<p>8</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>PALMD</p>
						</c>
						<c ca="center">
							<p>8(0)</p>
						</c>
						<c ca="center">
							<p>9</p>
						</c>
						<c ca="center">
							<p>35.63</p>
						</c>
						<c ca="center">
							<p>7(7)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>9(8)</p>
						</c>
						<c ca="center">
							<p>8</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>PEX10</p>
						</c>
						<c ca="center">
							<p>12(1)</p>
						</c>
						<c ca="center">
							<p>13</p>
						</c>
						<c ca="center">
							<p>21.58</p>
						</c>
						<c ca="center">
							<p>6(6)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>10(10)</p>
						</c>
						<c ca="center">
							<p>7</p>
						</c>
						<c ca="center">
							<p>12(10)</p>
						</c>
						<c ca="center">
							<p>11</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>PINK1</p>
						</c>
						<c ca="center">
							<p>10(2)</p>
						</c>
						<c ca="center">
							<p>15</p>
						</c>
						<c ca="center">
							<p>40.20</p>
						</c>
						<c ca="center">
							<p>7(7)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>9(8)</p>
						</c>
						<c ca="center">
							<p>10</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>PTPRU</p>
						</c>
						<c ca="center">
							<p>38(3)</p>
						</c>
						<c ca="center">
							<p>15</p>
						</c>
						<c ca="center">
							<p>12.89</p>
						</c>
						<c ca="center">
							<p>20(20)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>35(35)</p>
						</c>
						<c ca="center">
							<p>9</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>RHOC</p>
						</c>
						<c ca="center">
							<p>17(3)</p>
						</c>
						<c ca="center">
							<p>5</p>
						</c>
						<c ca="center">
							<p>8.35</p>
						</c>
						<c ca="center">
							<p>13(2)</p>
						</c>
						<c ca="center">
							<p>7</p>
						</c>
						<c ca="center">
							<p>15(1)</p>
						</c>
						<c ca="center">
							<p>9</p>
						</c>
						<c ca="center">
							<p>39(14)</p>
						</c>
						<c ca="center">
							<p>31</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>SDC3</p>
						</c>
						<c ca="center">
							<p>6(2)</p>
						</c>
						<c ca="center">
							<p>5</p>
						</c>
						<c ca="center">
							<p>5.33</p>
						</c>
						<c ca="center">
							<p>4(3)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
						<c ca="center">
							<p>8(4)</p>
						</c>
						<c ca="center">
							<p>5</p>
						</c>
						<c ca="center">
							<p>9(5)</p>
						</c>
						<c ca="center">
							<p>6</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>SDHB</p>
						</c>
						<c ca="center">
							<p>11(0)</p>
						</c>
						<c ca="center">
							<p>12</p>
						</c>
						<c ca="center">
							<p>97.27</p>
						</c>
						<c ca="center">
							<p>9(9)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
						<c ca="center">
							<p>Not Found</p>
						</c>
						<c>
							<p/>
						</c>
						<c ca="center">
							<p>13(11)</p>
						</c>
						<c ca="center">
							<p>11</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>SERPINC1</p>
						</c>
						<c ca="center">
							<p>12(2)</p>
						</c>
						<c ca="center">
							<p>7</p>
						</c>
						<c ca="center">
							<p>18.75</p>
						</c>
						<c ca="center">
							<p>8(8)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
						<c ca="center">
							<p>8(8)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
						<c ca="center">
							<p>16(10)</p>
						</c>
						<c ca="center">
							<p>11</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>SFPQ</p>
						</c>
						<c ca="center">
							<p>12(3)</p>
						</c>
						<c ca="center">
							<p>11</p>
						</c>
						<c ca="center">
							<p>74.75</p>
						</c>
						<c ca="center">
							<p>9(9)</p>
						</c>
						<c ca="center">
							<p>1</p>
						</c>
						<c ca="center">
							<p>9(9)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
						<c ca="center">
							<p>17(9)</p>
						</c>
						<c ca="center">
							<p>25</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>TARDBP</p>
						</c>
						<c ca="center">
							<p>21(3)</p>
						</c>
						<c ca="center">
							<p>20</p>
						</c>
						<c ca="center">
							<p>29.38</p>
						</c>
						<c ca="center">
							<p>15(13)</p>
						</c>
						<c ca="center">
							<p>4</p>
						</c>
						<c ca="center">
							<p>9(9)</p>
						</c>
						<c ca="center">
							<p>4</p>
						</c>
						<c ca="center">
							<p>18(16)</p>
						</c>
						<c ca="center">
							<p>15</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>TCN2</p>
						</c>
						<c ca="center">
							<p>13(1)</p>
						</c>
						<c ca="center">
							<p>10</p>
						</c>
						<c ca="center">
							<p>26.15</p>
						</c>
						<c ca="center">
							<p>9(9)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
						<c ca="center">
							<p>12(12)</p>
						</c>
						<c ca="center">
							<p>4</p>
						</c>
						<c ca="center">
							<p>13(10)</p>
						</c>
						<c ca="center">
							<p>11</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>TOR3A</p>
						</c>
						<c ca="center">
							<p>15(2)</p>
						</c>
						<c ca="center">
							<p>9</p>
						</c>
						<c ca="center">
							<p>19.20</p>
						</c>
						<c ca="center">
							<p>9(9)</p>
						</c>
						<c ca="center">
							<p>4</p>
						</c>
						<c ca="center">
							<p>11(11)</p>
						</c>
						<c ca="center">
							<p>8</p>
						</c>
						<c ca="center">
							<p>15(13)</p>
						</c>
						<c ca="center">
							<p>12</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>VAMP3</p>
						</c>
						<c ca="center">
							<p>5(0)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
						<c ca="center">
							<p>80.60</p>
						</c>
						<c ca="center">
							<p>6(5)</p>
						</c>
						<c ca="center">
							<p>2</p>
						</c>
						<c ca="center">
							<p>6(5)</p>
						</c>
						<c ca="center">
							<p>3</p>
						</c>
						<c ca="center">
							<p>7(5)</p>
						</c>
						<c ca="center">
							<p>10</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>Total</p>
						</c>
						<c ca="center">
							<p>1009(94)</p>
						</c>
						<c ca="center">
							<p>11.9</p>
						</c>
						<c ca="center">
							<p>28.3</p>
						</c>
						<c ca="center">
							<p>753(721)</p>
						</c>
						<c ca="center">
							<p>2.3</p>
						</c>
						<c ca="center">
							<p>495(461)</p>
						</c>
						<c ca="center">
							<p>5.1</p>
						</c>
						<c ca="center">
							<p>1194(905)</p>
						</c>
						<c ca="center">
							<p>9.7</p>
						</c>
					</r>
				</tblbdy>
				<tblfn>
					<p>ASPIC results from a random sample of 64 human genes from Chromosome 1 compared to those from the ASAP, ASD and AceView resources. The first column reports the HUGO name of the examined gene. The ASPIC data include the total number of predicted introns (novel introns in brackets), the minimum number of compatible transcripts and the average number of ESTs supporting gene splices. Introns with two non-canonical splice sites are accepted by ASPIC only if confirmed by at least two ESTs. For the other resources, the number of predicted introns (in brackets, those also predicted by ASPIC) and the minimum number of compatible transcripts are reported. Other resources: AceView (July 2003 and August 2004 releases), ASD (July 2004) and ASAP (July 2004).</p>
				</tblfn>
			</tbl>
			<p>In this paper we propose a method that is not based on traditional BLAST-like (or BLAT-like as in <abbrgrp>
					<abbr bid="B17">17</abbr>
				</abbrgrp>) alignment tools for spliced alignment, but which relies on a new heuristic for multiple EST alignments that allows &#8211; as in <abbrgrp>
					<abbr bid="B12">12</abbr>
				</abbrgrp> &#8211; the use of a high number of insertions/deletions and specific scoring criteria for the spliced alignment in order to generate more accurate splice site predictions (see <abbrgrp>
					<abbr bid="B18">18</abbr>
				</abbrgrp>). Indeed, even recent tools such as BLAT <abbrgrp>
					<abbr bid="B19">19</abbr>
				</abbrgrp> produce erroneous alignments when used for EST-genome comparison as observed in <abbrgrp>
					<abbr bid="B17">17</abbr>
				</abbrgrp> and require further corrections to the alignments produced. For example, BLAT tends to create many small gaps in the alignment when sequence quality is low.</p>
			<p>Through a combined analysis of all EST data and their genomic alignments, our heuristic method aims to reduce the overprediction of splice sites caused by EST sequence errors or erroneous single EST alignments. This goal is achieved by minimizing the set of splice sites that is compatible with a multiple alignment of all transcript data. This approach overcomes the limitations of methods that (incorrectly) assume that single transcript-genome alignments are independent. Indeed, tools based on independent single EST alignments (for example, Spidey <abbrgrp>
					<abbr bid="B14">14</abbr>
				</abbrgrp> and Squall <abbrgrp>
					<abbr bid="B20">20</abbr>
				</abbrgrp>) may produce false splice forms that would not be supported by a combined multiple alignment of all ESTs against the genomic sequence.</p>
		</sec>
		<sec>
			<st>
				<p>Implementation</p>
			</st>
			<sec>
				<st>
					<p>Methods</p>
				</st>
				<p>Our method is based on the formalization of the problem of detecting splice sites as an optimization problem (Multiple EST Factorization Compatibility, MEFC) as proposed in <abbrgrp>
						<abbr bid="B15">15</abbr>
					</abbrgrp>: it implements a heuristic that extends &#8211; and greatly improves &#8211; a basic algorithmic approach proposed by the same authors in <abbrgrp>
						<abbr bid="B15">15</abbr>
					</abbrgrp>. An evident shortcoming of computational methods for predicting splice sites is the large number of false positive predictions they produce. To overcome this limitation, we propose that an optimization criterion is needed to construct a multiple transcript alignment: the objective of such a criterion is to minimize the number of predicted exons and hence of alignment-inferred splice sites. There is theoretical evidence for this assumption, which is also supported by several real cases encountered while analyzing EST alignments. Indeed, such an optimization criterion is required when there are multiple possible adequate alignments of an EST region (or candidate exon) to the genomic sequence, even when restrictive rules (i.e. <it>GT </it>&#8211; <it>AG </it>splice sites) are used to restrict the alignment to biologically plausible solutions. The use of the optimization criterion, the combined EST analysis and the fact that our method is entirely based on a novel alignment procedure all differentiate our approach from those previously presented. The method we propose here also differs from the ones suggested in <abbrgrp>
						<abbr bid="B21">21</abbr>
					</abbrgrp> and <abbrgrp>
						<abbr bid="B11">11</abbr>
					</abbrgrp> where a combined analysis of EST alignments is done after all EST alignments have been generated. The method we propose also aims to reduce the computational time as in <abbrgrp>
						<abbr bid="B20">20</abbr>
					</abbrgrp>, while retaining a high accuracy of predictions. It is specifically designed to process a whole gene and a large number of ESTs &#8211; the databases currently contain about 6 million human ESTs and the number is growing rapidly. As shown in <abbrgrp>
						<abbr bid="B20">20</abbr>
					</abbrgrp>, computational times for a single EST alignment may range from a fraction of a second to the several seconds required by programs such as sim4 <abbrgrp>
						<abbr bid="B22">22</abbr>
					</abbrgrp>.</p>
				<p>The software tool ASPIC (Alternative Splicing PredICtion) has been designed and implemented as a user-friendly web server that accepts as input a gene sequence and transcript data, typically a Unigene cluster related to the gene. Major features of ASPIC include its applicability to the analysis of splice variants in several organisms, and the fact that it brings together several sources of information on splice sites in a single web-based tool.</p>
				<p>ASPIC also provides a minimal set of transcript isoforms explaining all alternative splice events occurring among the set of transcripts considered. Furthermore, it includes a module for detecting and scoring splice junctions (canonical and non-canonical) by using quality measures based on <abbrgrp>
						<abbr bid="B18">18</abbr>
					</abbrgrp> and <abbrgrp>
						<abbr bid="B23">23</abbr>
					</abbrgrp>. An extensive benchmark comparison of ASPIC with respect to other similar tools <abbrgrp>
						<abbr bid="B24">24</abbr>
						<abbr bid="B25">25</abbr>
					</abbrgrp> shows that our method locates splice sites with high sensitivity and accuracy while retaining a high computational efficiency, comparable to that of <abbrgrp>
						<abbr bid="B20">20</abbr>
					</abbrgrp>. Notably, unlike <abbrgrp>
						<abbr bid="B20">20</abbr>
					</abbrgrp>, ASPIC combines EST alignment with splice site prediction.</p>
			</sec>
			<sec>
				<st>
					<p>Algorithm overview</p>
				</st>
				<p>In the following, we will use the term <it>EST </it>to denote a transcript and <it>genomic sequence </it>to refer to a gene related to a set of transcripts. We will use <it>G </it>to denote a genomic sequence, that is, a sequence over alphabet &#931; = {A, C, G, T} &#8746; {N}, with N denoting any nucleotide. Genomic sequences containing sequence repeats or short exons may be alignable to the same EST sequence in a number of equally probable ways. This fact further complicates the problem of identifying the correct exon-intron structure. However, it is reasonable to assume that a correct exon-intron structure can be obtained by aligning all EST sequences so that regions that are common to different ESTs are aligned to the same region of the gene. This assumption leads to the framing of the problem of predicting gene structure from a set of ESTs as an optimization problem as introduced in <abbrgrp>
						<abbr bid="B15">15</abbr>
					</abbrgrp> with the MEFC problem (Minimum EST Factorization Compatible with a genomic sequence). In this context, the gene structure prediction problem has an instance consisting of a set of EST sequences and a genomic sequence: the question is to compute the constitutive exons of the genomic sequence and the factorization of each EST into such genomic exons with the objective of minimizing the number of predicted exons.</p>
				<p>In fact, as illustrated in the examples below, a minimum length exon-factorization of a genomic sequence would rule out multiple unsupported EST alignments. However, with real data, situations such as the following frequently occur in which multiple EST alignments are generated and additional criteria are required to find an exon-factorization, thus justifying (as discussed in the following sections) the use of the optimization criterion in our method:</p>
				<p>1. Terminal EST factors may be short (10&#8211;30 bp in length) and may have multiple plausible alignments to the genomic sequence, particularly when the EST sequence contains errors.</p>
				<p>2. Part of a factor may be repeated along the genomic sequence. A theoretical example of this situation, and how optimization may be used to find correct predictions, is reported in Fig <figr fid="F1">1</figr>. <supplr sid="S1">Additional file 1</supplr> illustrates a specific example of this situation, occurring in the Unigene cluster related to the human AMY2A gene.</p>
				<fig id="F1">
					<title>
						<p>Figure 1</p>
					</title>
					<caption>
						<p>The figure illustrates two gene-factorizations into 7 and 4 pseudo-exons of the genomic sequence <it>G</it></p>
					</caption>
					<text>
						<p>The figure illustrates two gene-factorizations into 7 and 4 pseudo-exons of the genomic sequence <it>G</it>. Let <it>S</it><sub>1</sub>, <it>S</it><sub>2 </sub>and <it>S</it><sub>3 </sub>be EST sequences in S agreeing with the genomic sequence <it>G</it>, where <it>S</it><sub>1 </sub>= <it>ABDEF</it>, <it>S</it><sub>2 </sub>= <it>ABCDE </it>and <it>S</it><sub>3 </sub>= <it>BDEFG</it>, and each letter in {<it>A</it>, <it>B</it>, <it>C</it>, <it>D</it>, <it>E</it>, <it>F</it>, <it>G</it>} denotes a sequence (A). Two alternative EST-genome alignments of sequences <it>S</it><sub>1</sub>, <it>S</it><sub>2 </sub>and <it>S</it><sub>3 </sub>are represented in (B) and (C): each EST factorization of <it>S</it><sub><it>i </it></sub>associated with the EST-genome alignment is shaded. Pseudo-exons in the gene-factorization are colored white, while introns are in grey. Segments labelled by letters represent regions of the genomic sequence that align to a substring of the input sequence of the corresponding letter. Note that an approach that aligns each sequence <it>S</it><sub>1</sub>, <it>S</it><sub>2 </sub>and <it>S</it><sub>3 </sub>to <it>G </it>independently, one after the other, may produce the gene-factorization &lt;<it>A</it>, <it>B</it>, <it>C</it>, <it>D</it>, <it>F</it>, <it>E</it>, <it>G</it>> consisting of 7 pseudo-exons (B), while the one minimizing the number of pseudo-exons provides only 4 pseudo-exons (C). Indeed, there are EST factorizations of each <it>S</it><sub><it>i </it></sub>that are compatible or variant compatible with the gene-factorization <it>G</it><sub><it>E </it></sub>= &lt;<it>AB</it>, <it>C</it>, <it>DE</it>, <it>FG</it>>. More precisely, &lt;<it>AB</it>, <it>DE</it>, <it>F</it>> is an EST-factorization of <it>S</it><sub>1 </sub>that is compatible with <it>G</it><sub><it>E</it></sub>. Then &lt;<it>AB</it>, <it>C</it>, <it>DE</it>> is an EST-factorization of <it>S</it><sub>2 </sub>compatible with <it>G</it><sub><it>E</it></sub>. Finally, &lt;<it>B</it>, <it>DE</it>, <it>FG</it>> is an EST-factorization of <it>S</it><sub>3 </sub>compatible with <it>G</it><sub><it>E </it></sub>(C).</p>
					</text>
					<graphic file="1471-2105-6-244-1"/>
				</fig>
				<suppl id="S1">
					<title>
						<p>Additional File 1</p>
					</title>
					<text>
						<p>Splicing site prediction with and without the optimization strategy.</p>
					</text>
					<file name="1471-2105-6-244-S1.pdf">
						<p>Click here for file</p>
					</file>
				</suppl>
				<p>3. Short repeats may occur in the genomic sequence and EST sequences may contain errors near splice junctions.</p>
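				<p>The effect of the optimization criterion in situations like these can be sketched with a toy example. The Python fragment below is only an illustrative brute-force search, not the ASPIC heuristic: the function names, the half-open genomic intervals and the coordinates are all assumptions made for this sketch. Each EST comes with one or more alternative alignments, and the combination that yields the fewest merged genomic regions (pseudo-exons) is kept.</p>

```python
from itertools import product


def merge_intervals(intervals):
    """Merge overlapping half-open (start, end) intervals into regions."""
    merged = []
    for start, end in sorted(intervals):
        if merged and merged[-1][1] > start:
            # Overlaps the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def fewest_pseudo_exons(candidates):
    """Pick one alignment per EST so that the merged regions are fewest.

    candidates: one list per EST, holding its alternative alignments;
    each alignment is a list of (start, end) genomic intervals.
    Exhaustive search, so only suitable for tiny illustrative inputs.
    """
    best_choice, best_regions = None, None
    for choice in product(*candidates):
        intervals = [iv for alignment in choice for iv in alignment]
        regions = merge_intervals(intervals)
        if best_regions is None or len(best_regions) > len(regions):
            best_choice, best_regions = choice, regions
    return best_choice, best_regions


# Hypothetical data: the short terminal factor of the second EST
# aligns equally well at 400-420 and at 900-920.
candidates = [
    [[(0, 100), (200, 300)]],
    [[(200, 300), (400, 420)], [(200, 300), (900, 920)]],
    [[(200, 300), (400, 500)]],
]
choice, regions = fewest_pseudo_exons(candidates)
```

				<p>In this hypothetical input, placing the ambiguous terminal factor at 400-420 lets it share a region with the third EST, so the combined solution needs three regions instead of four. An exhaustive search grows exponentially with the number of ambiguous ESTs, which is one reason a heuristic is needed in practice.</p>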
			</sec>
			<sec>
				<st>
					<p>The MEFC problem: definition</p>
				</st>
				<p>In the following we introduce some basic notions that allow us to define the MEFC problem and describe the method we propose to solve it.</p>
				<p>We recall that there are four main patterns of alternative splicing that potentially may occur in nature <abbrgrp>
						<abbr bid="B2">2</abbr>
					</abbrgrp>:</p>
				<p>1) exon-skipping; 2) mutually exclusive exons; 3) competing 5'/3' ends; and 4) intron retention. While the first two splicing modes simply determine whether an exon is used or not during splicing, in the third mode the transcript <it>splicing variants </it>derive from competing partially overlapping exons. Finally, intron retention occurs when an exon is present in a transcript, while in another it appears with a missing internal region.</p>
				<p>A <it>gene factorization G</it><sub><it>E </it></sub>of <it>G </it>is a sequence &lt;<it>f</it><sub>1</sub>, ..., <it>f</it><sub><it>n</it></sub>> of <it>n </it>substrings <it>f</it><sub><it>i </it></sub>of <it>G</it>, which we call <it>pseudo-exons</it>, such that <it>G </it>is given by the concatenation of the pseudo-exons <it>f</it><sub><it>i </it></sub>interspersed with other substrings called <it>introns</it>. In particular, a pseudo-exon defines a contiguous genome region corresponding to and/or containing one or more exon splice variants.</p>
				<p>An <it>EST factorization </it>of an EST sequence <it>S </it>is an ordered sequence &lt;<it>s</it><sub>1</sub>, <it>s</it><sub>2</sub>, ..., <it>s</it><sub><it>k</it></sub>> such that <it>S </it>= <it>s</it><sub>1</sub><it>s</it><sub>2 </sub>... <it>s</it><sub><it>k</it></sub>, where each substring <it>s</it><sub><it>i </it></sub>is called a <it>factor </it>of the EST <it>S</it>. The <it>edit distance ed</it>(<it>x</it>, <it>y</it>) between two sequences <it>x </it>and <it>y </it>measures the minimum number of mismatches (substitutions, insertions and deletions) in an alignment of <it>x </it>and <it>y</it>.</p>
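				<p>These two notions can be made concrete with a short Python sketch. It assumes the common unit-cost (Levenshtein) form of the edit distance, in which each substitution, insertion or deletion counts as one mismatch; the text does not fix a particular scoring, so this choice and the function names are assumptions of the sketch.</p>

```python
def edit_distance(x, y):
    """Unit-cost edit distance between sequences x and y (row-by-row DP)."""
    previous = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        current = [i]
        for j, cy in enumerate(y, start=1):
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            substitution = previous[j - 1] + (cx != cy)
            current.append(min(deletion, insertion, substitution))
        previous = current
    return previous[-1]


def is_est_factorization(est, factors):
    """True if the ordered factors concatenate exactly to the EST."""
    return "".join(factors) == est
```

				<p>For instance, with the sequences of Fig <figr fid="F1">1</figr> treated as strings of letters, the factors <it>AB</it>, <it>DE </it>and <it>F </it>form an EST factorization of <it>ABDEF</it>.</p>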
				<p>We define an EST factorization &lt;<it>s</it><sub>1</sub>, <it>s</it><sub>2</sub>, ..., <it>s</it><sub><it>k</it></sub>> <it>compatible with </it>a gene-factorization <it>G</it><sub><it>E </it></sub>of a genomic sequence <it>G </it>if there exists a sequence of genomic pseudo-exons <m:math name="1471-2105-6-244-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
						<m:semantics>
							<m:mrow>
								<m:msub>
									<m:mi>f</m:mi>
									<m:mrow>
										<m:msub>
											<m:mi>i</m:mi>
											<m:mn>1</m:mn>
										</m:msub>
									</m:mrow>
								</m:msub>
								<m:mo>,</m:mo>
								<m:msub>
									<m:mi>f</m:mi>
									<m:mrow>
										<m:msub>
											<m:mi>i</m:mi>
											<m:mn>2</m:mn>
										</m:msub>
									</m:mrow>
								</m:msub>
								<m:mo>,</m:mo>
								<m:msub>
									<m:mi>f</m:mi>
									<m:mrow>
										<m:msub>
											<m:mi>i</m:mi>
											<m:mn>3</m:mn>
										</m:msub>
									</m:mrow>
								</m:msub>
								<m:mo>,</m:mo>
								<m:mo>&#8230;</m:mo>
								<m:mo>,</m:mo>
								<m:msub>
									<m:mi>f</m:mi>
									<m:mrow>
										<m:msub>
											<m:mi>i</m:mi>
											<m:mi>k</m:mi>
										</m:msub>
									</m:mrow>
								</m:msub>
							</m:mrow>
							<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafeart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGMbGzdaWgaaWcbaGaemyAaK2aaSbaaWqaaiabigdaXaqabaaaleqaaOGaeiilaWIaemOzay2aaSbaaSqaaiabdMgaPnaaBaaameaacqaIYaGmaeqaaaWcbeaakiabcYcaSiabdAgaMnaaBaaaleaacqWGPbqAdaWgaaadbaGaeG4mamdabeaaaSqabaGccqGGSaalcqWIMaYscqGGSaalcqWGMbGzdaWgaaWcbaGaemyAaK2aaSbaaWqaaiabdUgaRbqabaaaleqaaaaa@41F0@</m:annotation>
						</m:semantics>
					</m:math> of <it>G </it>such that for each factor <it>s</it><sub><it>j</it></sub>, with 2 &#8804; <it>j </it>&#8804; <it>k </it>- 1, <it>ed</it>(<it>s</it><sub><it>j</it></sub>, <m:math name="1471-2105-6-244-i2" xmlns:m="http://www.w3.org/1998/Math/MathML">
						<m:semantics>
							<m:mrow>
								<m:msub>
									<m:mi>f</m:mi>
									<m:mrow>
										<m:msub>
											<m:mi>i</m:mi>
											<m:mi>j</m:mi>
										</m:msub>
									</m:mrow>
								</m:msub>
							</m:mrow>
							<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGMbGzdaWgaaWcbaGaemyAaK2aaSbaaWqaaiabdQgaQbqabaaaleqaaaaa@311D@</m:annotation>
						</m:semantics>
					</m:math>
) is bounded by a given parameter <it>bound</it>, and factors <it>s</it><sub>1 </sub>and <it>s</it><sub><it>k </it></sub>differ from a suffix of pseudo-exon <m:math name="1471-2105-6-244-i3" xmlns:m="http://www.w3.org/1998/Math/MathML">
						<m:semantics>
							<m:mrow>
								<m:msub>
									<m:mi>f</m:mi>
									<m:mrow>
										<m:msub>
											<m:mi>i</m:mi>
											<m:mn>1</m:mn>
										</m:msub>
									</m:mrow>
								</m:msub>
							</m:mrow>
							<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGMbGzdaWgaaWcbaGaemyAaK2aaSbaaWqaaiabigdaXaqabaaaleqaaaaa@30B0@</m:annotation>
						</m:semantics>
					</m:math>
 and a prefix of <m:math name="1471-2105-6-244-i4" xmlns:m="http://www.w3.org/1998/Math/MathML">
						<m:semantics>
							<m:mrow>
								<m:msub>
									<m:mi>f</m:mi>
									<m:mrow>
										<m:msub>
											<m:mi>i</m:mi>
											<m:mi>k</m:mi>
										</m:msub>
									</m:mrow>
								</m:msub>
							</m:mrow>
							<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGMbGzdaWgaaWcbaGaemyAaK2aaSbaaWqaaiabdUgaRbqabaaaleqaaaaa@311F@</m:annotation>
						</m:semantics>
					</m:math>
, respectively, by a number of alignment mismatches bounded by <it>bound</it>.</p>
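				<p>The compatibility test just defined can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ASPIC implementation: it uses a unit-cost edit distance, assumes that the chosen pseudo-exons must appear in increasing genomic order, requires at least two EST factors, and tries every suffix or prefix explicitly for the two external factors. All function names are hypothetical.</p>

```python
def edit_distance(x, y):
    """Unit-cost edit distance between sequences x and y."""
    previous = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        current = [i]
        for j, cy in enumerate(y, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (cx != cy)))
        previous = current
    return previous[-1]


def matches(factor, pseudo_exon, position, bound):
    """Check one EST factor against one pseudo-exon.

    position is "first" (compare with suffixes), "last" (compare with
    prefixes) or "internal" (compare with the whole pseudo-exon).
    """
    if position == "first":
        targets = [pseudo_exon[i:] for i in range(len(pseudo_exon))]
    elif position == "last":
        targets = [pseudo_exon[:i] for i in range(1, len(pseudo_exon) + 1)]
    else:
        targets = [pseudo_exon]
    return any(bound >= edit_distance(factor, t) for t in targets)


def compatible_indices(est_factors, gene_factorization, bound=0):
    """Return increasing pseudo-exon indices matching the EST factors,
    or None if the EST factorization is not compatible.

    Taking the earliest admissible pseudo-exon for each factor is enough
    to decide whether an increasing assignment exists.
    """
    k = len(est_factors)
    if 2 > k:
        raise ValueError("this sketch expects at least two EST factors")
    indices, next_index = [], 0
    for j, factor in enumerate(est_factors):
        position = "first" if j == 0 else "last" if j == k - 1 else "internal"
        while (len(gene_factorization) > next_index
               and not matches(factor, gene_factorization[next_index],
                               position, bound)):
            next_index += 1
        if next_index == len(gene_factorization):
            return None
        indices.append(next_index)
        next_index += 1
    return indices
```

				<p>With the gene-factorization &lt;<it>AB</it>, <it>C</it>, <it>DE</it>, <it>FG</it>> of Fig <figr fid="F1">1</figr> written as a list of strings and <it>bound </it>= 0, the three EST factorizations given in the figure legend are all accepted by this check.</p>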
				<p>To account for alternative splicing, we further introduce the notion of an EST factorization <it>variant compatible </it>with a gene-factorization <it>G</it><sub><it>E</it></sub>. This is obtained from the previous definition by requiring that <it>ed</it>(<it>s</it><sub><it>j</it></sub>, <it>factor</it>(<m:math name="1471-2105-6-244-i2" xmlns:m="http://www.w3.org/1998/Math/MathML">
						<m:semantics>
							<m:mrow>
								<m:msub>
									<m:mi>f</m:mi>
									<m:mrow>
										<m:msub>
											<m:mi>i</m:mi>
											<m:mi>j</m:mi>
										</m:msub>
									</m:mrow>
								</m:msub>
							</m:mrow>
							<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGMbGzdaWgaaWcbaGaemyAaK2aaSbaaWqaaiabdQgaQbqabaaaleqaaaaa@311D@</m:annotation>
						</m:semantics>
					</m:math>
)) is bounded by a given parameter <it>bound</it>, where <it>factor </it>(<m:math name="1471-2105-6-244-i2" xmlns:m="http://www.w3.org/1998/Math/MathML">
						<m:semantics>
							<m:mrow>
								<m:msub>
									<m:mi>f</m:mi>
									<m:mrow>
										<m:msub>
											<m:mi>i</m:mi>
											<m:mi>j</m:mi>
										</m:msub>
									</m:mrow>
								</m:msub>
							</m:mrow>
							<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGMbGzdaWgaaWcbaGaemyAaK2aaSbaaWqaaiabdQgaQbqabaaaleqaaaaa@311D@</m:annotation>
						</m:semantics>
					</m:math>
) is a prefix, suffix or even a proper factor of the pseudo-exon <m:math name="1471-2105-6-244-i2" xmlns:m="http://www.w3.org/1998/Math/MathML">
						<m:semantics>
							<m:mrow>
								<m:msub>
									<m:mi>f</m:mi>
									<m:mrow>
										<m:msub>
											<m:mi>i</m:mi>
											<m:mi>j</m:mi>
										</m:msub>
									</m:mrow>
								</m:msub>
							</m:mrow>
							<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGMbGzdaWgaaWcbaGaemyAaK2aaSbaaWqaaiabdQgaQbqabaaaleqaaaaa@311D@</m:annotation>
						</m:semantics>
					</m:math>.</p>
				<p>An EST factor <it>s</it><sub><it>j</it></sub>, corresponding to a gene exon <it>factor</it>(<m:math name="1471-2105-6-244-i2" xmlns:m="http://www.w3.org/1998/Math/MathML">
						<m:semantics>
							<m:mrow>
								<m:msub>
									<m:mi>f</m:mi>
									<m:mrow>
										<m:msub>
											<m:mi>i</m:mi>
											<m:mi>j</m:mi>
										</m:msub>
									</m:mrow>
								</m:msub>
							</m:mrow>
							<m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGMbGzdaWgaaWcbaGaemyAaK2aaSbaaWqaaiabdQgaQbqabaaaleqaaaaa@311D@</m:annotation>
						</m:semantics>
					</m:math>) is defined as <it>internal </it>if both the donor and the acceptor splice sites are present at its genomic boundaries after alignment, and as <it>external </it>otherwise. Thus, factors <it>s</it><sub>1 </sub>and <it>s</it><sub><it>k </it></sub>of the EST factorization &lt;<it>s</it><sub>1</sub>, <it>s</it><sub>2</sub>, ..., <it>s</it><sub><it>k</it></sub>> are called <it>external factors </it>while <it>s</it><sub>2</sub>, ..., <it>s</it><sub><it>k</it>-1 </sub>are called <it>internal factors</it>.</p>
				<p>In other words, an EST factorization is induced by an alignment of the EST to exons of the genomic sequence. Each EST factor must align to an exon. An external EST factor may correspond to only a fragment (a prefix or a suffix) of its exon.</p>
				<p>Using the notions stated above, the MEFC problem is defined as follows. An instance of the problem consists of a genomic sequence <it>G </it>and a set of EST sequences (transcripts), while a solution consists of one gene-factorization <it>G</it><sub><it>E </it></sub>of <it>G </it>and EST factorizations that are compatible or variant compatible with <it>G</it><sub><it>E</it></sub>. An <it>optimal </it>solution of the MEFC problem (that is, an optimal gene-factorization together with optimal compatible EST factorizations) is one that minimizes the number of distinct pseudo-exons in the gene-factorization of the genomic sequence.</p>
			</sec>
			<sec>
				<st>
					<p>Generation of nearly optimal compatible genome-EST alignments</p>
				</st>
				<p>The ASPIC software implements a heuristic method for the MEFC problem stated above.</p>
				<p>The general structure of the method consists of:</p>
				<p>(a) an initial pre-processing of the genomic sequence,</p>
				<p>(b) two main procedural phases applying criteria to minimize splice sites.</p>
				<p>In the following we provide a detailed description of the method by first describing the initial pre-processing phase and then the main algorithmic steps of the two phases.</p>
				<sec>
					<st>
						<p>Pre-processing of the genomic sequence</p>
					</st>
					<p>The alignment of a single EST factor to the genomic sequence is based on the notion of a <it>component</it>: a substring of the genomic sequence that perfectly matches a portion of an EST factor. The length of a component is a critical parameter, used both to accelerate the alignment of EST factors and to find error-free matching regions between ESTs and the genomic sequence. Indeed, components of a given length (for example 15 bp) may have very few occurrences in a genomic sequence, which makes locating EST factors very fast. For this reason, the length of a component is computed automatically as a function of the gene sequence length, but it can also be set by the user as an input parameter. The algorithm starts with a pre-processing of the genomic sequence <it>G </it>that consists of building a hash table containing all occurrences of each component in <it>G</it>. The components (i.e. substrings of the genome) are thus the keys of a hash table used to speed up the alignment of an EST factor to the genomic sequence. Since the algorithm locates intron regions by first validating splice sites with the <it>GT-AG </it>rule, a second hash table of all <it>GT </it>and <it>AG </it>occurrences in the genomic sequence is also computed and stored at this stage.</p>
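					<p>The two look-up tables described above can be sketched as follows. This is a minimal illustration, not the ASPIC implementation: the function names, the toy sequence and the component length of 4 are assumptions chosen for readability (the text suggests lengths around 15 bp for real genes).</p>
					<p><![CDATA[
```python
from collections import defaultdict


def build_component_index(genome, k):
    """Map every length-k substring of genome to its 0-based start positions."""
    index = defaultdict(list)
    for start in range(len(genome) - k + 1):
        index[genome[start:start + k]].append(start)
    return index


def build_dinucleotide_index(genome):
    """Record the start positions of every GT and AG dinucleotide."""
    sites = {"GT": [], "AG": []}
    for start in range(len(genome) - 1):
        pair = genome[start:start + 2]
        if pair in sites:
            sites[pair].append(start)
    return sites


# Toy genomic sequence (hypothetical, for illustration only).
genome = "ACGTAGGTACGTAG"
components = build_component_index(genome, 4)
splice_candidates = build_dinucleotide_index(genome)

print(components["ACGT"])       # [0, 8]
print(splice_candidates["GT"])  # [2, 6, 10]
print(splice_candidates["AG"])  # [4, 12]
```
]]></p>
					<p>Looking up a string of the EST in such an index returns its candidate genomic locations in constant expected time, which is the reason a longer component (with fewer occurrences) makes the subsequent alignment step faster.</p>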
				</sec>
				<sec>
					<st>
						<p>Phase 1: iterative computation of all EST internal factors</p>
					</st>
					<p>The first phase is an iterative processing of each EST in the set <it>S </it>= {<it>S</it><sub>1</sub>, ..., <it>S</it><sub><it>m</it></sub>} such that the <it>i</it>-th iteration produces an alignment of each EST in the set {<it>S</it><sub>1</sub>, ..., <it>S</it><sub><it>i</it></sub>} compatible with a partial gene-factorization of <it>G </it>(generating an EST alignment against the genomic sequence implies an EST factorization). The generic step of the iteration in our algorithm consists of finding the next factor <it>s</it><sub><it>j </it></sub>of a partial EST factorization &lt;<it>s</it><sub>2</sub>, ..., <it>s</it><sub><it>j</it>-1</sub>> and the corresponding exon along the genomic sequence. In this phase the EST factorization is produced using a criterion, called <it>concatenating exons</it>, to minimize the number of exons. This criterion consists of concatenating two or more consecutive EST factors into a single exon whenever a true exon may have been over-factorized because of repeated regions in the genomic sequence (see Figure <figr fid="F1">1</figr> for an example).</p>
					<p>More precisely, given the alignment of the internal factors &lt;<it>s</it><sub>2</sub>, ..., <it>s</it><sub><it>i</it></sub>> of an EST, then the genomic alignment of a new EST factor <it>s</it><sub><it>i</it>+1 </sub>is computed in four main steps.</p>
					<p>In step (1) the EST suffix to be aligned after factor <it>s</it><sub><it>i </it></sub>is divided into consecutive strings <it>x</it><sub>1</sub>, <it>x</it><sub>2</sub>, ..., <it>x</it><sub><it>n </it></sub>of the predefined component length. The first possible genomic location of EST factor <it>s</it><sub><it>i</it>+1 </sub>is determined by finding the leftmost string <it>x</it><sub><it>j </it></sub>of the EST suffix that is a component and allows the optimal alignment of the entire EST factor <it>s</it><sub><it>i</it>+1 </sub>(see Fig. <figr fid="F2">2(a), (b)</figr>). In step (2), for each occurrence of a component <it>x</it><sub><it>j </it></sub>along the genomic sequence, a genomic region of maximal length containing <it>x</it><sub><it>j </it></sub>is optimally aligned in linear time and space (using the edit distance within a K-band <abbrgrp>
							<abbr bid="B26">26</abbr>
						</abbrgrp>) to the new EST factor <it>s</it><sub><it>i</it>+1</sub>, until a compatible alignment is found (i.e. one with few errors and, where possible, canonical splice sites). Note that step (2) may fail to compute the new EST factor <it>s</it><sub><it>i</it>+1 </sub>whenever the previous EST internal factors &lt;<it>s</it><sub>2</sub>, ..., <it>s</it><sub><it>i</it></sub>> do not allow the generation of an EST factorization compatible with the partially computed gene-factorization. Indeed, some EST factors may have been incorrectly computed because of a wrong alignment of the EST sequence. <it>Backtracking </it>allows the relocation of exons: it consists of trying alternative genomic occurrences of the components of previous factors, starting from <it>s</it><sub><it>i </it></sub>and going back to <it>s</it><sub>2</sub>.</p>
					<fig id="F2">
						<title>
							<p>Figure 2</p>
						</title>
						<caption>
							<p>Location of a new EST internal factor <it>s</it><sub><it>i</it>+1 </sub>given previously computed factors <it>s</it><sub>2</sub>, ..., <it>s</it><sub><it>i</it></sub></p>
						</caption>
						<text>
							<p>Location of a new EST internal factor <it>s</it><sub><it>i</it>+1 </sub>given previously computed factors <it>s</it><sub>2</sub>, ..., <it>s</it><sub><it>i</it></sub>. (a) Consecutive sequence components <it>c</it><sub>1 </sub>... <it>c</it><sub><it>j </it></sub>are tested to find the first one that allows the identification of a genomic region that optimally aligns factor <it>s</it><sub><it>i</it>+1 </sub>(i.e. alignment extension on one or both sides of the component): such a region is determined in (b) by the component <it>c</it><sub><it>j</it></sub>. Panel (b) shows that some intervening positions (sequence x) may occur between factors <it>s</it><sub><it>i </it></sub>and <it>s</it><sub><it>i</it>+1</sub>. In this case the placement of <it>s</it><sub><it>i</it>+1 </sub>gives the correct right end of the previous factor <it>s</it><sub><it>i</it></sub>, since the larger factor <graphic file="1471-2105-6-244-i5.gif"/> inducing canonical splice sites on the genomic sequence can be optimally aligned before <it>s</it><sub><it>i</it>+1</sub>, thus leading to an optimal location of both <it>s</it><sub><it>i </it></sub>and <it>s</it><sub><it>i</it>+1</sub>.</p>
						</text>
						<graphic file="1471-2105-6-244-2"/>
					</fig>
					<p>Once the location of factor <it>s</it><sub><it>i</it>+1 </sub>is determined, the <it>concatenating exons </it>criterion is applied in step (3), which consists of testing whether one or more consecutive EST factors preceding factor <it>s</it><sub><it>i</it>+1 </sub>can be concatenated to <it>s</it><sub><it>i</it>+1 </sub>to obtain a single factor <it>s </it>that optimally aligns to the genomic sequence. In this case, <it>s </it>replaces a list of consecutive EST factors, thus minimizing the number of exonic regions in the gene-factorization (see for example exons <it>AB </it>and <it>DE </it>in Figure 1(C), produced by applying the concatenating exons criterion to <it>A </it>and <it>B </it>first, and then to <it>D </it>and <it>E</it>). After the minimization, the new EST factor <it>s</it><sub><it>i</it>+1 </sub>as well as the previous factor <it>s</it><sub><it>i </it></sub>are redefined so that the EST alignments define a smaller number of exons.</p>
					<p>Finally, in step (4), a dynamic programming (DP) algorithm is used to refine the intron boundaries between the defined EST factors <it>s</it><sub><it>i </it></sub>and <it>s</it><sub><it>i</it>+1</sub>. This crucial step of the algorithm is detailed below in the section <it>Refining exon-intron boundaries</it>.</p>
					<p>Observe that the location of a new EST factor <it>s</it><sub><it>i</it>+1 </sub>is based on the use of a single component (that is, a perfectly matching region) and that such a component is located on the factor by testing consecutive positions in the EST suffix after factor <it>s</it><sub><it>i</it></sub>. This approach may imply that several positions after the right end of EST factor <it>s</it><sub><it>i </it></sub>are skipped before placing the left end of the new factor <it>s</it><sub><it>i</it>+1</sub>. In such cases the placement of factor <it>s</it><sub><it>i</it>+1 </sub>may imply an extension (or a reduction) of the right end of the previous factor <it>s</it><sub><it>i</it></sub>, thus optimizing exon definition (see Fig. <figr fid="F2">2(c)</figr>). This strategy makes the alignment process more flexible and faster than other approaches (such as BLAT <abbrgrp>
							<abbr bid="B19">19</abbr>
						</abbrgrp>) that apply strict matching criteria.</p>
					<p>Indeed, a feature of the ASPIC alignment algorithm is that it allows a fast, exact location of the alignment regions of EST factors without necessarily comparing all EST sequences against large portions of the genomic sequence. Consequently, ASPIC also allows EST alignment in the presence of a relatively high number of errors that are located in specific regions. Moreover, even though the alignment process relies on dynamic programming (DP), it turns out to be very fast in most cases, since DP is applied only to short portions of the EST and genome sequences.</p>
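					<p>The banded alignment used in step (2) can be illustrated with a generic K-band edit distance. The sketch below is a textbook formulation written for this purpose, not the routine of the cited reference or of ASPIC; the function name and the convention of returning no value when more than <it>k </it>errors are needed are assumptions.</p>
					<p><![CDATA[
```python
def banded_edit_distance(a, b, k):
    """Edit distance between a and b using only DP cells with |i - j| <= k.

    Returns None when the strings cannot be aligned with at most k errors,
    i.e. when the alignment would not be accepted as compatible.
    """
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return None
    inf = float("inf")
    # Row 0 of the DP table, restricted to the band.
    prev = {j: j for j in range(0, min(m, k) + 1)}
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - k), min(m, i + k) + 1):
            if j == 0:
                cur[j] = i
                continue
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(
                prev.get(j - 1, inf) + cost,  # match or substitution
                prev.get(j, inf) + 1,         # character of a left unmatched
                cur.get(j - 1, inf) + 1,      # character of b left unmatched
            )
        prev = cur
    dist = prev[m]
    return dist if dist <= k else None


print(banded_edit_distance("GATTACA", "GATCACA", 2))  # 1
print(banded_edit_distance("GATTACA", "GATACA", 2))   # 1
print(banded_edit_distance("AAAA", "TTTT", 2))        # None
```
]]></p>
					<p>Only about 2<it>k </it>+ 1 cells are filled per row and two rows are kept in memory, so for a fixed error bound the cost grows linearly with the length of the EST factor.</p>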
				</sec>
				<sec>
					<st>
						<p>Phase 2: refining internal factors and placing external factors</p>
					</st>
					<p>This phase of the algorithm completes the computation of all EST factorizations (i.e. EST alignments) by first correcting all internal EST factors pre-computed in the first phase, in order to make all factorizations compatible with the same gene-factorization <it>G</it><sub><it>E </it></sub>of <it>G </it>while minimizing the number of splice sites. More precisely, the minimization relies on a criterion called <it>merging splice sites</it>. Merging splice sites consists of comparing computed exons <it>x </it>and <it>y </it>supported by EST factors in order to reduce the intron boundary of <it>x </it>to that of <it>y </it>or vice versa, whenever they differ at only a few positions, most likely because of sequencing errors in the EST factors (see an example in Fig. <figr fid="F3">3</figr>). This step avoids the over-prediction of splice sites caused by the erroneous location of intron boundaries in the presence of sequencing errors. The criterion is implemented so that it still allows the detection of possibly genuine splice variants arising from competing 3' or 5' junctions that lie only a few bases apart (two bases or more).</p>
					<fig id="F3">
						<title>
							<p>Figure 3</p>
						</title>
						<caption>
							<p>Example of intron detection in the human ATP1B1 (UG:Hs.291196) gene without (A) or with (B) the refinement of exon-intron boundaries</p>
						</caption>
						<text>
							<p>Example of intron detection in the human ATP1B1 (UG:Hs.291196) gene without (A) or with (B) the refinement of exon-intron boundaries. The first row shows the genomic sequence aligned to the EST sequences (below). In (A) four different introns are detected (A, B, C, D), which can be merged into only two (A, D) in (B). Absolute coordinates (NCBI 35 assembly) are shown for each intron, and acceptor/donor splice sites are shown on a black background.</p>
						</text>
						<graphic file="1471-2105-6-244-3"/>
					</fig>
					<p>Finally, after the localization of EST internal factors, all EST external factors are computed. The <it>concatenating exons </it>and <it>merging splice sites </it>criteria are used again, since errors in EST sequences are more prevalent in terminal regions, which may be as short as a few bases &#8211; thus permitting several alternative alignments. The procedure that finds external EST factors tries to align the leftmost (or rightmost) EST factor as a suffix (or a prefix) of some previously computed exon. If that is not possible, the factor is placed in a new location in correspondence with a <it>GT </it>(or <it>AG</it>) pattern, and the DP algorithm is then used again to refine intron boundaries.</p>
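					<p>The effect of the merging splice sites criterion can be illustrated with a deliberately simplified sketch. ASPIC decides whether two boundaries coincide by checking that the EST factors can be realigned with few errors; the code below replaces that check with a plain rule on EST support counts, so the function name, both thresholds and the coordinates are assumptions made for illustration only.</p>
					<p><![CDATA[
```python
def merge_splice_sites(support, max_shift=2, min_independent=2):
    """Collapse weakly supported boundaries onto a nearby, better supported one.

    support maps a genomic boundary position to the number of ESTs whose
    alignment ends there. A boundary backed by fewer than min_independent
    ESTs is merged into the closest already accepted boundary lying within
    max_shift positions; otherwise it is kept as a possible true variant.
    Returns a dict mapping every input position to the position it ends up at.
    """
    merged = {}
    kept = []
    # Visit boundaries from best to worst supported (ties: leftmost first).
    for pos in sorted(support, key=lambda p: (-support[p], p)):
        target = pos
        if support[pos] < min_independent:
            near = [q for q in kept if abs(q - pos) <= max_shift]
            if near:
                target = min(near, key=lambda q: (abs(q - pos), -support[q]))
        if target == pos:
            kept.append(pos)
        merged[pos] = target
    return merged


# Hypothetical donor positions and EST counts (not taken from Figure 3).
support = {1000: 25, 1001: 1, 998: 1, 1010: 3}
print(merge_splice_sites(support))
# {1000: 1000, 1010: 1010, 998: 1000, 1001: 1000}
```
]]></p>
					<p>In this toy input the two singleton boundaries next to the well-supported site are absorbed by it, while the boundary ten bases away, which has independent support, is retained as a candidate alternative junction.</p>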
				</sec>
			</sec>
			<sec>
				<st>
					<p>Refining exon-intron boundaries</p>
				</st>
				<p>Because of sequence repeats and sequencing errors in ESTs, the exact location of splice junctions is a critical issue <abbrgrp>
						<abbr bid="B27">27</abbr>
					</abbrgrp>. Our method combines different strategies to evaluate and hence improve the quality of splice data produced. These are listed below:</p>
				<p>1. <it>Finding intron boundaries via dynamic programming</it>. A first criterion used to find the exact location of intron boundaries is the evaluation of alignment quality. We have designed an algorithm, based on dynamic programming (DP), to produce optimal alignments of regions close to splice sites. It computes the genomic alignment of a suffix <it>w </it>and a prefix <it>y </it>of two consecutive EST factors, <it>s</it><sub><it>i </it></sub>and <it>s</it><sub><it>i</it>+1</sub>, in order to locate in the genomic sequence the optimal position for a <it>single large gap </it>corresponding to the intron region. This gap may not be delimited by canonical splice sites, i.e. sites following the <it>GT </it>&#8211; <it>AG </it>rule, which is recognized as a basic rule for the validation of splice sites since more than 98.7% of the annotated splice sites in GenBank are canonical in this respect <abbrgrp>
						<abbr bid="B18">18</abbr>
					</abbrgrp>. Moreover, there may be different optimal alignments leaving a gap with the same error rate. Thus a second algorithmic step is applied by ASPIC to locate splice sites.</p>
				<p>2. <it>Canonical patterns and weight matrices</it>. Whenever the optimal alignment computed via DP does not lead to canonical splice junctions, the algorithm looks for alternative alignments with the same error rate, giving preference to the pair of splice boundaries more frequently represented in the weight matrix provided in <abbrgrp>
						<abbr bid="B18">18</abbr>
					</abbrgrp> (see Table <tblr tid="T2">2</tblr> in <abbrgrp>
						<abbr bid="B18">18</abbr>
					</abbrgrp>). If different alignments of the same quality (i.e. number of errors) are possible near intron boundaries, the choice of the alignment is made using the weight matrix. For example, the pair GC-AG is selected before the pair AT-AG if both are compatible with an alignment of splice sites leaving the same number of errors, as GC-AG is more frequent than AT-AG in the weight matrix. A high-quality alignment may, however, also lead to the acceptance of splice sites with null frequency in the matrices of <abbrgrp>
						<abbr bid="B18">18</abbr>
					</abbrgrp>.</p>
				<tbl id="T2">
					<title>
						<p>Table 2</p>
					</title>
					<caption>
						<p>Splice sites in known and novel ASPIC-predicted introns</p>
					</caption>
					<tblbdy cols="5">
						<r>
							<c ca="left">
								<p>Splice Site</p>
							</c>
							<c cspan="2" ca="left">
								<p>Known introns</p>
							</c>
							<c cspan="2" ca="left">
								<p>Novel introns</p>
							</c>
						</r>
						<r>
							<c cspan="5">
								<hr/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><it>N</it></p>
							</c>
							<c ca="left">
								<p><it>%</it></p>
							</c>
							<c ca="left">
								<p><it>N</it></p>
							</c>
							<c ca="left">
								<p><it>%</it></p>
							</c>
						</r>
						<r>
							<c cspan="5">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>GT-AG</p>
							</c>
							<c ca="left">
								<p>897</p>
							</c>
							<c ca="left">
								<p>98.14</p>
							</c>
							<c ca="left">
								<p>57</p>
							</c>
							<c ca="left">
								<p>60.64</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>GC-AG</p>
							</c>
							<c ca="left">
								<p>8</p>
							</c>
							<c ca="left">
								<p>0.77</p>
							</c>
							<c ca="left">
								<p>15</p>
							</c>
							<c ca="left">
								<p>15.96</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>GT-other</p>
							</c>
							<c ca="left">
								<p>3</p>
							</c>
							<c ca="left">
								<p>0.33</p>
							</c>
							<c ca="left">
								<p>5</p>
							</c>
							<c ca="left">
								<p>5.10</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>other-AG</p>
							</c>
							<c ca="left">
								<p>4</p>
							</c>
							<c ca="left">
								<p>0.44</p>
							</c>
							<c ca="left">
								<p>13</p>
							</c>
							<c ca="left">
								<p>13.27</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>other-other</p>
							</c>
							<c ca="left">
								<p>3</p>
							</c>
							<c ca="left">
								<p>0.33</p>
							</c>
							<c ca="left">
								<p>4</p>
							</c>
							<c ca="left">
								<p>4.08</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><it>Total</it></p>
							</c>
							<c ca="left">
								<p>915</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>94</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>Number (N) and percentage (%) of splice site types in known and novel ASPIC-predicted introns.</p>
					</tblfn>
				</tbl>
				<p>The presence of sequencing errors often complicates the location of the correct splice junctions. For this reason, the use of agreement criteria among EST alignments turns out to be crucial in many practical cases for detecting highly confirmed splice junctions and thus for correcting ambiguous alignments.</p>
				<p>Moreover, in order to evaluate the quality of splice sites we annotate each detected splice site, either donor or acceptor, with a consensus sequence and a score: the score derives from the formula and tabular nucleotide frequencies reported in <abbrgrp>
						<abbr bid="B23">23</abbr>
					</abbrgrp>. Indeed, conserved splice sequences provide further evidence for splice junctions.</p>
				<p>3. <it>Congruence of ESTs on the location of splice sites</it>. Since the merging splice sites criterion discussed in the previous section is based on a combined analysis of all EST factorizations, it is also crucial for validating intron boundaries. Indeed, comparing EST factors makes it possible to discover sequencing errors in ESTs, which show that some intron boundaries must be considered coincident once a few errors are tolerated (typically at most one error per splice site) or once the location of canonical splice sites is shifted. For example, in many cases the GT-AG rule may be applied to locate an EST factor boundary at two very close locations of the genomic sequence, which makes the choice of the alignment near intron boundaries difficult for a single EST. In these cases, an independent EST alignment does not allow the determination of the EST splice sites, while the presence of other EST factorizations with a better-quality alignment to the genomic sequence may resolve the ambiguity, because all factorizations must be compatible with a common exon-intron structure. This situation is detailed in the example shown in Fig. <figr fid="F3">3</figr>.</p>
				<p>4. <it>Filtering artifacts and locating gene strand</it>. Our implementation has automatic procedures to locate the strand from which each EST originates (independently of the cluster annotation) and to filter out possible artifacts and polyA ends. Moreover, EST alignments of poor quality are filtered out based on several criteria, including a percentage of sequence identity below a fixed cutoff.</p>
				<p>As an example, Figure <figr fid="F3">3</figr> reports the optimal alignments of ESTs close to intron boundaries, illustrating the need for specific criteria to locate all plausible intron boundaries. The basic criterion is the congruence of ESTs near splice sites, combined with the use of known frequencies of splice patterns (see <abbrgrp>
						<abbr bid="B18">18</abbr>
					</abbrgrp>). ATP1B1 introns B and C (Fig. <figr fid="F3">3A</figr>) disappear when they are merged into intron A (confirmed by a large number of ESTs) after the introduction of an A-insertion or of a C-deletion in the corresponding alignments. For these two introns it is likely that the relevant EST sequences are incorrect because of a typical base miscalling in single-read automatic sequencing, i.e. AAA instead of AA for BG705986 and C instead of CC for BG699442. Intron D, on the other hand, is likely to represent a genuine variant.</p>
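				<p>Criteria 1 and 2 can be combined in a small sketch that places a single gap between two consecutive EST factors and breaks ties between equally good placements by splice-pair preference. It is a simplification under stated assumptions: only substitutions are counted (ASPIC uses an edit-distance DP), the two outer ends of the alignment are taken as fixed, the gap is assumed to be at least four bases long, and the <it>PREFERENCE </it>list is an illustrative ranking rather than the weight matrix of the cited reference. The sequences are invented.</p>
				<p><![CDATA[
```python
# Illustrative ranking only; any pair not listed ranks last.
PREFERENCE = ["GT-AG", "GC-AG"]


def place_intron(genome, est, start, end):
    """Split est at one point so that est[:cut] aligns from genome[start]
    and est[cut:] aligns so as to end at genome[end], leaving one gap.

    Returns (mismatches, cut, intron_start, intron_end, splice_pair).
    """
    n = len(est)
    # left[c]: mismatches of est[:c] against genome[start:start + c]
    left = [0] * (n + 1)
    for c in range(1, n + 1):
        left[c] = left[c - 1] + (est[c - 1] != genome[start + c - 1])
    # right[c]: mismatches of est[c:] against genome[end - (n - c):end]
    right = [0] * (n + 1)
    for c in range(n - 1, -1, -1):
        right[c] = right[c + 1] + (est[c] != genome[end - (n - c)])

    def rank(pair):
        return PREFERENCE.index(pair) if pair in PREFERENCE else len(PREFERENCE)

    best_key, best = None, None
    for cut in range(n + 1):
        i_start, i_end = start + cut, end - (n - cut)
        pair = genome[i_start:i_start + 2] + "-" + genome[i_end - 2:i_end]
        errors = left[cut] + right[cut]
        key = (errors, rank(pair), cut)  # fewest errors, then best pair
        if best_key is None or key < best_key:
            best_key, best = key, (errors, cut, i_start, i_end, pair)
    return best


# Invented example: two exons separated by a 16-base intron.
genome = "ACCTGAAG" + "GTAAGTCCCTTTTCAG" + "GTCATTGC"
est = "ACCTGAAG" + "GTCATTGC"
print(place_intron(genome, est, 0, len(genome)))
# (0, 8, 8, 24, 'GT-AG')
```
]]></p>
				<p>In this example five different cut points give an error-free alignment, because the bases flanking the junction are repeated on both sides of the gap; only the preference for a canonical pair singles out the placement at positions 8 to 24.</p>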
			</sec>
			<sec>
				<st>
					<p>Clustering ESTs by common splice sites</p>
				</st>
				<p>For each splice site predicted, ASPIC provides the list of ESTs supporting such splice sites, thus allowing the evaluation of the quality of the prediction in terms of number of ESTs confirming it. Moreover, this step allows the grouping of ESTs that strongly support a common transcript (by sharing the same sequence of splice sites).</p>
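				<p>A minimal sketch of this bookkeeping is given below. It assumes that each EST alignment has already been reduced to an ordered list of introns, each written as a pair of donor and acceptor positions; the function name, the EST identifiers and the coordinates are hypothetical.</p>
				<p><![CDATA[
```python
from collections import defaultdict


def cluster_by_splice_sites(est_introns):
    """Index ESTs by the introns they confirm and by their full intron chain.

    est_introns maps an EST identifier to its ordered list of introns,
    each given as a (donor_position, acceptor_position) tuple.
    """
    support = defaultdict(list)  # intron -> ESTs confirming it
    groups = defaultdict(list)   # whole intron chain -> ESTs sharing it
    for est_id in sorted(est_introns):
        introns = est_introns[est_id]
        for intron in introns:
            support[intron].append(est_id)
        groups[tuple(introns)].append(est_id)
    return dict(support), dict(groups)


# Hypothetical input: three ESTs, two of which share every splice site.
est_introns = {
    "est_a": [(100, 250), (300, 420)],
    "est_b": [(100, 250), (300, 420)],
    "est_c": [(100, 250), (310, 420)],
}
support, groups = cluster_by_splice_sites(est_introns)
print(support[(100, 250)])                # ['est_a', 'est_b', 'est_c']
print(groups[((100, 250), (300, 420))])   # ['est_a', 'est_b']
```
]]></p>
				<p>The length of each list in the first mapping is the number of ESTs confirming a junction, while each key of the second mapping identifies a group of ESTs that may derive from the same transcript.</p>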
			</sec>
			<sec>
				<st>
					<p>Minimal set of full-length transcript isoforms</p>
				</st>
				<p>Since a feature of ASPIC is to report, for each EST, the splice sites and the corresponding factorization into genomic exons (<it>EST-exon-factorization </it>in our terminology), we have designed and implemented in the <it>Transview </it>module of ASPIC an efficient algorithm that combines EST-exon-factorization data into a minimal set of full-length transcripts supported by the evidence, i.e. by the set of available ESTs. Our algorithm is based on the use of directed acyclic graphs (DAG): nodes of the graph are EST-exon-factorizations, while edges connect nodes (sequences) that are related by a binary relation among EST-exon-factorizations (<it>extension</it>). Paths in the graph represent possible full-length transcripts. Various graph-based methods have been reported to predict transcripts from ESTs, such as those in <abbrgrp>
						<abbr bid="B28">28</abbr>
						<abbr bid="B10">10</abbr>
					</abbrgrp> and <abbrgrp>
						<abbr bid="B17">17</abbr>
					</abbrgrp>: our method is different from those approaches in the construction of the graph as well as in the way the graph is visited to report full-length transcripts. In contrast to graph based approaches proposed in <abbrgrp>
						<abbr bid="B17">17</abbr>
					</abbrgrp> or <abbrgrp>
						<abbr bid="B11">11</abbr>
					</abbrgrp> where nodes are exons or nucleotide sequences, our approach uses a reduced graph and an efficient visiting process that allows the reporting of all plausible paths, without requiring a trimming phase as in <abbrgrp>
						<abbr bid="B17">17</abbr>
					</abbrgrp> to remove redundant models. Indeed, our algorithm aims to reduce over-predictions (false positives) as well as the execution time required by the construction of a potentially exponential number of paths (putative full-length transcripts) in the graph. Moreover, the construction of the graph in our model is guided by input parameters that allow the user to specify the quality of the predicted full-length transcripts with respect to the set of transcripts supporting them.</p>
				<p><it>Transview </it>provides a visualization of the full-length isoforms and, for each predicted full-length transcript, its composition in terms of the ESTs that support it. Details on the algorithm will be discussed elsewhere.</p>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Results</p>
			</st>
			<p>The capability of the ASPIC method to computationally produce high-quality gene predictions has been tested by performing two types of experiments. The first experiment consisted of comparing ASPIC data with data available from other database sources that collect intron-exon data obtained through computational as well as experimental methods. This experiment shows the ability of ASPIC to predict novel splice variants as well as to detect good-quality splice sites confirmed by other sources. In order to assess the quality and reliability of novel predictions, a second experiment was carried out, which consisted of comparing ASPIC data with those produced within the ENCODE project <abbrgrp>
					<abbr bid="B29">29</abbr>
				</abbrgrp>, aimed at providing a reliable annotation of 1% of the human genome. In particular, we investigated the occurrence of false positives in ASPIC-predicted introns as determined by RT-PCR analysis for 22 genes located in 13 ENCODE regions.</p>
			<sec>
				<st>
					<p>Comparing ASPIC with other similar tools</p>
				</st>
				<p>The ASPIC method has been tested on a sample of 64 genes randomly chosen from human chromosome 1. Results are summarized in Table <tblr tid="T1">1</tblr>, where they are also compared with those obtained from other publicly available resources. A total of 1009 introns were predicted by ASPIC as compared to 753 by ASAP, 495 by ASD and 1194 by AceView. ASPIC predicted 95.7%, 93.1% and 75.8% of the introns predicted by ASAP, ASD and AceView, respectively. In general, predicted introns were well supported by genome-transcript alignments, with 28.3 ESTs supporting each splice site on average. Missing introns may derive from additional ESTs not present in the UNIGENE cluster used by ASPIC or from the stringent parameter thresholds adopted in ASPIC to consider an intron prediction reliable. The large number of additional introns detected by AceView, but not by other resources, is partly due to the wrong selection &#8211; in some cases &#8211; of the genomic region to be considered for the analysis. For example, AceView predicts 45 introns in the gene AMPD1, compared with the 14 introns predicted by ASPIC (13 in ASAP). In this case the genome region selected by AceView encompasses 113 kb covering AMPD1 and two additional genes. A similar problem can be observed with several other genes where the number of AceView introns is remarkably higher than that detected by other resources (e.g. ADAM15, AKR7A2, ARNT, ARPC5, ATAD3A, etc.). AceView intron over-prediction is also likely due to the use of less stringent parameters in genome-transcript alignments, as in the example shown in Fig. <figr fid="F4">4</figr>.</p>
				<fig id="F4">
					<title>
						<p>Figure 4</p>
					</title>
					<caption>
						<p>Example of intron boundaries detected for the human AHCYL1 gene by AceView and ASPIC</p>
					</caption>
					<text>
						<p>Example of intron boundaries detected for the human AHCYL1 gene by AceView and ASPIC. The hypothetical novel intron predicted by AceView (July 2003 release) with non-canonical splices can be reduced to a known intron by a single A-insertion. Intron coordinates are referred to Ensembl release 26.35.1.</p>
					</text>
					<graphic file="1471-2105-6-244-4"/>
				</fig>
				<p>However, ASPIC detected a total of 94 novel introns, each confirmed by 2.18 ESTs on average. It is interesting to note that our data show a higher occurrence of non-canonical splice sites with respect to previous estimates <abbrgrp>
						<abbr bid="B30">30</abbr>
					</abbrgrp>. Table <tblr tid="T2">2</tblr> shows splice sites for known and novel ASPIC-predicted introns. These data are not unexpected, as previous estimates did not consider most of the splicing variants of annotated genes. While some of the predicted introns may simply be artifactual, it is likely that rarer splicing isoforms involve a higher proportion of non-canonical splice sites. Another striking observation from our analysis is that 62/64 genes (97%) show alternative splicing, with an average of 11.9 transcripts/gene, a value similar to that from AceView data (see Table <tblr tid="T1">1</tblr>) but significantly higher than the 2.3 and 5.1 estimated by ASAP and ASD, respectively. It is worth mentioning that the data reported by ASAP are not updated with respect to the latest Unigene/genome data and that several genes (28/64) were not annotated in ASD. It should be considered that Unigene clusters are growing at a great rate and that genomic sequences are also continuously updated. To address this problem, ASPIC data are stored in a dynamic database. The relevant data for each gene query are stored in the ASPIC database, so that if another user submits a similar query the results are immediately available without carrying out a new analysis. However, the user can choose to overwrite stored data with updated genome and transcript data directly extracted from the Ensembl and Unigene databases. The new data remain stored in the ASPIC database until a new overwrite request for the same gene query is made.</p>
			</sec>
			<sec>
				<st>
					<p>False positive incidence of ASPIC introns</p>
				</st>
				<p>To compare the false positive rate of introns predicted by ASPIC and by other methods, we analyzed the GENCODE experimental verification of computationally predicted introns for a set of 22 genes in 13 Encode regions (see the GENCODE annotations in <supplr sid="S2">Additional file 2</supplr>). Of the 44 introns not supported by RT-PCR experiments (labeled RT_negative), ASPIC supported only 12/44, whereas AceView supported 41/44 (Table <tblr tid="T3">3</tblr>). Interestingly, 7/12 ASPIC introns were supported by at least 2 ESTs and also showed high-scoring splice patterns (see <supplr sid="S3">Additional file 3</supplr>). This finding suggests that the experimental validations carried out within the Encode project may have missed some genuine introns.</p>
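				<p>The comparison above can be restated as proportions. The following is a minimal sketch that uses only the counts quoted in the text (12/44 for ASPIC and 41/44 for AceView); the function name <it>supported_fraction</it> is hypothetical, and treating every RT-negative intron as a false positive is a simplifying assumption that this section itself calls into question.</p>
				<p><![CDATA[
```python
def supported_fraction(supported, total):
    """Fraction of RT-negative introns that a method nevertheless predicts."""
    if total <= 0:
        raise ValueError("total must be positive")
    return supported / total


# Counts quoted in the text (illustrative restatement, not new data).
RT_NEGATIVE_TOTAL = 44
COUNTS = {"ASPIC": 12, "AceView": 41}

for method, n in COUNTS.items():
    pct = 100.0 * supported_fraction(n, RT_NEGATIVE_TOTAL)
    print(f"{method}: {n}/{RT_NEGATIVE_TOTAL} = {pct:.1f}%")
```
]]></p>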
				<suppl id="S2">
					<title>
						<p>Additional File 2</p>
					</title>
					<text>
						<p>Gencode annotation of 13 Encode regions.</p>
					</text>
					<file name="1471-2105-6-244-S2.xls">
						<p>Click here for file</p>
					</file>
				</suppl>
				<tbl id="T3">
					<title>
						<p>Table 3</p>
					</title>
					<caption>
						<p>RT-negative introns supported by ASPIC</p>
					</caption>
					<tblbdy cols="6">
						<r>
							<c ca="left">
								<p>Encode Region</p>
							</c>
							<c ca="left">
								<p>Gene</p>
							</c>
							<c cspan="3" ca="left">
								<p>Intron position</p>
							</c>
							<c ca="left">
								<p>Prediction Method</p>
							</c>
						</r>
						<r>
							<c cspan="6">
								<hr/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><it>Chr</it></p>
							</c>
							<c ca="left">
								<p><it>Start</it></p>
							</c>
							<c ca="left">
								<p><it>End</it></p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c cspan="6">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENm004</p>
							</c>
							<c ca="left">
								<p>SLC5A1</p>
							</c>
							<c ca="left">
								<p>22</p>
							</c>
							<c ca="left">
								<p>30779886</p>
							</c>
							<c ca="left">
								<p>30787475</p>
							</c>
							<c ca="center">
								<p>ASPIC (3), ECgene, acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENm004</p>
							</c>
							<c ca="left">
								<p>PISD</p>
							</c>
							<c ca="left">
								<p>22</p>
							</c>
							<c ca="left">
								<p>30350425</p>
							</c>
							<c ca="left">
								<p>30351061</p>
							</c>
							<c ca="center">
								<p>ASPIC (1), ensEstGene</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENm004</p>
							</c>
							<c ca="left">
								<p>PISD</p>
							</c>
							<c ca="left">
								<p>22</p>
							</c>
							<c ca="left">
								<p>30337622</p>
							</c>
							<c ca="left">
								<p>30338657</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENm004</p>
							</c>
							<c ca="left">
								<p>PISD</p>
							</c>
							<c ca="left">
								<p>22</p>
							</c>
							<c ca="left">
								<p>30346557</p>
							</c>
							<c ca="left">
								<p>30351299</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENm004</p>
							</c>
							<c ca="left">
								<p>PISD</p>
							</c>
							<c ca="left">
								<p>22</p>
							</c>
							<c ca="left">
								<p>30365972</p>
							</c>
							<c ca="left">
								<p>30366216</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENm004</p>
							</c>
							<c ca="left">
								<p>RFPL3</p>
							</c>
							<c ca="left">
								<p>22</p>
							</c>
							<c ca="left">
								<p>31075439</p>
							</c>
							<c ca="left">
								<p>31078694</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENm004</p>
							</c>
							<c ca="left">
								<p>SYN3</p>
							</c>
							<c ca="left">
								<p>22</p>
							</c>
							<c ca="left">
								<p>31727364</p>
							</c>
							<c ca="left">
								<p>31734939</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENm004</p>
							</c>
							<c ca="left">
								<p>TIMP3</p>
							</c>
							<c ca="left">
								<p>22</p>
							</c>
							<c ca="left">
								<p>31521971</p>
							</c>
							<c ca="left">
								<p>31522263</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENm004</p>
							</c>
							<c ca="left">
								<p>TIMP3</p>
							</c>
							<c ca="left">
								<p>22</p>
							</c>
							<c ca="left">
								<p>31521971</p>
							</c>
							<c ca="left">
								<p>31522271</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr223</p>
							</c>
							<c ca="left">
								<p>MTO1</p>
							</c>
							<c ca="left">
								<p>6</p>
							</c>
							<c ca="left">
								<p>74249065</p>
							</c>
							<c ca="left">
								<p>74253041</p>
							</c>
							<c ca="center">
								<p>ASPIC (6), ECgene, acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr223</p>
							</c>
							<c ca="left">
								<p>MTO1</p>
							</c>
							<c ca="left">
								<p>6</p>
							</c>
							<c ca="left">
								<p>74253206</p>
							</c>
							<c ca="left">
								<p>74258677</p>
							</c>
							<c ca="center">
								<p>ASPIC (5), ECgene, acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr231</p>
							</c>
							<c ca="left">
								<p>PSMD4</p>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
							<c ca="left">
								<p>148044771</p>
							</c>
							<c ca="left">
								<p>148047709</p>
							</c>
							<c ca="center">
								<p>ASPIC (2), ECgene, acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr231</p>
							</c>
							<c ca="left">
								<p>PIP5K1A</p>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
							<c ca="left">
								<p>148035078</p>
							</c>
							<c ca="left">
								<p>148039516</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr231</p>
							</c>
							<c ca="left">
								<p>PIP5K1A</p>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
							<c ca="left">
								<p>148035192</p>
							</c>
							<c ca="left">
								<p>148035350</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr231</p>
							</c>
							<c ca="left">
								<p>PSMB4</p>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
							<c ca="left">
								<p>148187431</p>
							</c>
							<c ca="left">
								<p>148194586</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr231</p>
							</c>
							<c ca="left">
								<p>PSMD4</p>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
							<c ca="left">
								<p>148040228</p>
							</c>
							<c ca="left">
								<p>148047741</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr231</p>
							</c>
							<c ca="left">
								<p>PSMD4</p>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
							<c ca="left">
								<p>148040358</p>
							</c>
							<c ca="left">
								<p>148044611</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr231</p>
							</c>
							<c ca="left">
								<p>PSMD4</p>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
							<c ca="left">
								<p>148044684</p>
							</c>
							<c ca="left">
								<p>148047709</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr231</p>
							</c>
							<c ca="left">
								<p>PSMD4</p>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
							<c ca="left">
								<p>148046796</p>
							</c>
							<c ca="left">
								<p>148047709</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr231</p>
							</c>
							<c ca="left">
								<p>SNX27</p>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
							<c ca="left">
								<p>148423496</p>
							</c>
							<c ca="left">
								<p>148424527</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr231</p>
							</c>
							<c ca="left">
								<p>TUFT1</p>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
							<c ca="left">
								<p>148350164</p>
							</c>
							<c ca="left">
								<p>148356015</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr231</p>
							</c>
							<c ca="left">
								<p>TUFT1</p>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
							<c ca="left">
								<p>148356492</p>
							</c>
							<c ca="left">
								<p>148372163</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr232</p>
							</c>
							<c ca="left">
								<p>CRAT</p>
							</c>
							<c ca="left">
								<p>9</p>
							</c>
							<c ca="left">
								<p>128949911</p>
							</c>
							<c ca="left">
								<p>128950731</p>
							</c>
							<c ca="center">
								<p>ASPIC (1), acembly, softberryGene</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr232</p>
							</c>
							<c ca="left">
								<p>PPP2R4</p>
							</c>
							<c ca="left">
								<p>9</p>
							</c>
							<c ca="left">
								<p>128953625</p>
							</c>
							<c ca="left">
								<p>128962345</p>
							</c>
							<c ca="center">
								<p>ASPIC (1), ECgene</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr232</p>
							</c>
							<c ca="left">
								<p>PPP2R4</p>
							</c>
							<c ca="left">
								<p>9</p>
							</c>
							<c ca="left">
								<p>128952305</p>
							</c>
							<c ca="left">
								<p>128953168</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr232</p>
							</c>
							<c ca="left">
								<p>PPP2R4</p>
							</c>
							<c ca="left">
								<p>9</p>
							</c>
							<c ca="left">
								<p>128952336</p>
							</c>
							<c ca="left">
								<p>128953268</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr232</p>
							</c>
							<c ca="left">
								<p>PPP2R4</p>
							</c>
							<c ca="left">
								<p>9</p>
							</c>
							<c ca="left">
								<p>128952981</p>
							</c>
							<c ca="left">
								<p>128953304</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr232</p>
							</c>
							<c ca="left">
								<p>PPP2R4</p>
							</c>
							<c ca="left">
								<p>9</p>
							</c>
							<c ca="left">
								<p>128953105</p>
							</c>
							<c ca="left">
								<p>128953150</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr232</p>
							</c>
							<c ca="left">
								<p>SH3GLB2</p>
							</c>
							<c ca="left">
								<p>9</p>
							</c>
							<c ca="left">
								<p>128835746</p>
							</c>
							<c ca="left">
								<p>128849868</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr232</p>
							</c>
							<c ca="left">
								<p>SH3GLB2</p>
							</c>
							<c ca="left">
								<p>9</p>
							</c>
							<c ca="left">
								<p>128860722</p>
							</c>
							<c ca="left">
								<p>128862923</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr323</p>
							</c>
							<c ca="left">
								<p>LACE1</p>
							</c>
							<c ca="left">
								<p>6</p>
							</c>
							<c ca="left">
								<p>108794230</p>
							</c>
							<c ca="left">
								<p>108829892</p>
							</c>
							<c ca="center">
								<p>ASPIC (5), sgpGene</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr323</p>
							</c>
							<c ca="left">
								<p>LACE1</p>
							</c>
							<c ca="left">
								<p>6</p>
							</c>
							<c ca="left">
								<p>108747689</p>
							</c>
							<c ca="left">
								<p>108751721</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr323</p>
							</c>
							<c ca="left">
								<p>SNX3</p>
							</c>
							<c ca="left">
								<p>6</p>
							</c>
							<c ca="left">
								<p>108688727</p>
							</c>
							<c ca="left">
								<p>108690771</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr333</p>
							</c>
							<c ca="left">
								<p>RNPC2</p>
							</c>
							<c ca="left">
								<p>20</p>
							</c>
							<c ca="left">
								<p>33764744</p>
							</c>
							<c ca="left">
								<p>33765167</p>
							</c>
							<c ca="center">
								<p>ASPIC (1), ECgene, acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr333</p>
							</c>
							<c ca="left">
								<p>RNPC2</p>
							</c>
							<c ca="left">
								<p>20</p>
							</c>
							<c ca="left">
								<p>33786418</p>
							</c>
							<c ca="left">
								<p>33787848</p>
							</c>
							<c ca="center">
								<p>ASPIC (1), ECgene</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr333</p>
							</c>
							<c ca="left">
								<p>CEP2</p>
							</c>
							<c ca="left">
								<p>20</p>
							</c>
							<c ca="left">
								<p>33527835</p>
							</c>
							<c ca="left">
								<p>33529106</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr333</p>
							</c>
							<c ca="left">
								<p>CEP2</p>
							</c>
							<c ca="left">
								<p>20</p>
							</c>
							<c ca="left">
								<p>33554537</p>
							</c>
							<c ca="left">
								<p>33568378</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr333</p>
							</c>
							<c ca="left">
								<p>CEP2</p>
							</c>
							<c ca="left">
								<p>20</p>
							</c>
							<c ca="left">
								<p>33561298</p>
							</c>
							<c ca="left">
								<p>33568224</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr333</p>
							</c>
							<c ca="left">
								<p>ITGB4BP</p>
							</c>
							<c ca="left">
								<p>20</p>
							</c>
							<c ca="left">
								<p>33335958</p>
							</c>
							<c ca="left">
								<p>33343927</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr333</p>
							</c>
							<c ca="left">
								<p>RNPC2</p>
							</c>
							<c ca="left">
								<p>20</p>
							</c>
							<c ca="left">
								<p>33776436</p>
							</c>
							<c ca="left">
								<p>33777255</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr333</p>
							</c>
							<c ca="left">
								<p>RNPC2</p>
							</c>
							<c ca="left">
								<p>20</p>
							</c>
							<c ca="left">
								<p>33780699</p>
							</c>
							<c ca="left">
								<p>33780701</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr333</p>
							</c>
							<c ca="left">
								<p>SDBCAG84</p>
							</c>
							<c ca="left">
								<p>20</p>
							</c>
							<c ca="left">
								<p>33585738</p>
							</c>
							<c ca="left">
								<p>33593682</p>
							</c>
							<c ca="center">
								<p>acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr334</p>
							</c>
							<c ca="left">
								<p>TFEB</p>
							</c>
							<c ca="left">
								<p>6</p>
							</c>
							<c ca="left">
								<p>41766952</p>
							</c>
							<c ca="left">
								<p>41811861</p>
							</c>
							<c ca="center">
								<p>ASPIC (5), ECgene, acembly</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENr334</p>
							</c>
							<c ca="left">
								<p>TFEB</p>
							</c>
							<c ca="left">
								<p>6</p>
							</c>
							<c ca="left">
								<p>41766952</p>
							</c>
							<c ca="left">
								<p>41799176</p>
							</c>
							<c ca="center">
								<p>ASPIC (3), ECgene, acembly</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>List of computationally predicted introns of 22 genes contained in 13 Encode regions (see the GENCODE annotations in <supplr sid="S2">Additional file 2</supplr>) that were not validated by RT-PCR analysis. For each intron, the Encode region, the gene ID, the location (NCBI 35 assembly) and the prediction methods are shown (Acembly/AceView, <url>http://www.aceview.org</url>; ECgene [17]; ensEstGene [28]; softberryGene [35]). For ASPIC predictions, the number of supporting ESTs is shown in brackets.</p>
					</tblfn>
				</tbl>
			</sec>
			<sec>
				<st>
					<p>The ASPIC Web Resource</p>
				</st>
				<p>The ASPIC program can be accessed online at: <url>http://aspic.algo.disco.unimib.it/aspic-devel/</url>. ASPIC standard input data consist of a genomic sequence and a set of transcripts. Such data are acquired either automatically or from files uploaded by the user. In the first case, a basic form permits the input of an official HUGO gene name for the genomic sequence (e.g. ABCB10; HUGO names are permitted only for human genes) and/or a Unigene cluster identifier (e.g. Hs.1710). EST clusters are automatically retrieved from Unigene, while genome sequences are retrieved by using the API provided by Ensembl. All results presented here are based on one of the latest releases (September 2004 Ensembl API release .25 and 2004 Unigene database release).</p>
				<p>The automatic acquisition of clusters is allowed for human and for every other organism whose data can be acquired from the Ensembl database. A specific upload function allows the user to submit arbitrary genomic sequences and transcript data in FASTA format for ASPIC processing.</p>
				<p>An advanced search form allows the user to run the ASPIC program by specifying basic parameters used to produce compatible EST alignments.</p>
				<p>We have tested our method using standard parameters suggested by experimental analysis of real data. For example, we chose a minimum exon length of 15 nt. The component length used to build the hash tables is computed with a formula that relates the minimum exon length to the component length in such a way that the existence of an error-free substring in an EST factor is guaranteed.</p>
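				<p>The formula itself is not reproduced in this section. As a hedged illustration of the kind of guarantee involved, the sketch below uses a standard pigeonhole argument, which is an assumption and not necessarily the formula used by ASPIC: if a factor of length L contains at most k substitution errors, the L - k correct positions are split into at most k + 1 runs, so at least one run has ceil((L - k) / (k + 1)) characters or more. All names and the error model (substitutions only) are hypothetical; the brute-force helper simply checks the bound on small cases.</p>
				<p><![CDATA[
```python
from itertools import combinations


def guaranteed_error_free_length(factor_len, max_errors):
    """Pigeonhole lower bound on the longest error-free run in a factor
    of length factor_len with at most max_errors substitution errors."""
    if factor_len < 0 or max_errors < 0:
        raise ValueError("arguments must be non-negative")
    correct = max(factor_len - max_errors, 0)
    # Ceiling division without floating point.
    return -(-correct // (max_errors + 1))


def longest_clean_run(factor_len, error_positions):
    """Length of the longest stretch that contains no error position."""
    errors = set(error_positions)
    best = run = 0
    for i in range(factor_len):
        run = 0 if i in errors else run + 1
        best = max(best, run)
    return best


def worst_case_clean_run(factor_len, max_errors):
    """Brute force over every placement of up to max_errors errors."""
    return min(
        longest_clean_run(factor_len, positions)
        for k in range(max_errors + 1)
        for positions in combinations(range(factor_len), k)
    )


# With the 15 nt minimum exon length and one assumed error,
# a component of 7 nt is guaranteed to occur error-free.
print(guaranteed_error_free_length(15, 1))  # 7
```
]]></p>
				<p>Under these assumptions, choosing a component length no larger than this bound ensures that at least one hash-table component of every factor matches the genomic sequence exactly.</p>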
				<p>ASPIC outputs a complete description of each EST exon-factorization, with a view of the alignment to the genomic sequence, as well as a tabulated view of splice sites. The program provides an output file that contains detailed information about all EST exon-factorizations. This file is also processed by Perl scripts in order to produce and make available to the user from the ASPIC web site: i) a table view listing all detected introns; ii) a graphical view showing the general exon-intron arrangement of the queried gene; and iii) a transcript view showing all non-mergeable transcript models compatible with detected introns. In particular, the table reports the relative and absolute coordinates of each detected intron derived from the genomic sequence and genome build considered, respectively, as well as the number of confirming ESTs. Absolute coordinates, not provided by other resources, are particularly useful for the comparison of intron coordinates for a gene to those annotated in genome browsers. The main graphical view is a visualization of the intron structure of the genomic sequence derived from the tabulated data. Such a graphical view also provides links to a visualization of the alignment of the 15 base pairs of EST sequences closest to intron boundaries. Figure <figr fid="F5">5</figr> shows an example of the table, the graphical and the transcript view.</p>
				<fig id="F5">
					<title>
						<p>Figure 5</p>
					</title>
					<caption>
						<p>Snapshot of the ASPIC output for the gene HNRPR (human chromosome 1)</p>
					</caption>
					<text>
						<p>Snapshot of the ASPIC output for the gene HNRPR (human chromosome 1). The Table View (A) lists all detected introns, their coordinates and the number of supporting ESTs. The Alignment View (B) shows the alignment between genomic and EST sequences around splice sites. The Graphical View (C) provides a general scheme of the splicing pattern. The Transcript View (D) shows the minimum set of different transcripts compatible with the detected splicing patterns.</p>
					</text>
					<graphic file="1471-2105-6-244-5"/>
				</fig>
				<sec>
					<st>
						<p>ASPIC Execution time</p>
					</st>
					<p>The performance of ASPIC has been evaluated on a Pentium IV class PC with 256 MB of main memory, running the Linux operating system.</p>
					<p>The processing time for a single EST varied from 0.007 s to a maximum of 2.5 s of CPU time, with gene lengths ranging from 5014 bp to 287011 bp, and required on average around 71 s of CPU time per gene. Thus ASPIC can process about 5000 ESTs in about half an hour of CPU time (against the four hours required in <abbrgrp>
							<abbr bid="B16">16</abbr>
						</abbrgrp>).</p>
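					<p>The throughput figures above can be restated per EST. This is a back-of-the-envelope sketch that assumes the four hours quoted for the earlier method refer to the same workload of about 5000 ESTs; the constant and function names are hypothetical.</p>
					<p><![CDATA[
```python
def seconds_per_est(total_seconds, n_ests):
    """Average CPU seconds spent on each EST."""
    if n_ests <= 0:
        raise ValueError("n_ests must be positive")
    return total_seconds / n_ests


N_ESTS = 5000                    # "about 5000 ESTs"
ASPIC_SECONDS = 0.5 * 3600       # "about half an hour" of CPU time
REFERENCE_SECONDS = 4 * 3600     # "four hours" quoted for the earlier method

aspic_rate = seconds_per_est(ASPIC_SECONDS, N_ESTS)
speedup = REFERENCE_SECONDS / ASPIC_SECONDS
print(f"{aspic_rate:.2f} s per EST, roughly {speedup:.0f}x faster")
```
]]></p>
					<p>The resulting average of about 0.36 s per EST lies within the reported per-EST range of 0.007 s to 2.5 s.</p>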
				</sec>
				<sec>
					<st>
						<p>Experimental results: Web sources</p>
					</st>
					<p>The comparison of ASPIC data with other sources of splice sites has been carried out by accessing available databases from the web at the following sites: ASD <abbrgrp>
							<abbr bid="B31">31</abbr>
						</abbrgrp>, ASAP <abbrgrp>
							<abbr bid="B32">32</abbr>
						</abbrgrp>, Acembly <abbrgrp>
							<abbr bid="B33">33</abbr>
						</abbrgrp>.</p>
				</sec>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Conclusion</p>
			</st>
			<p>The ASPIC algorithm implements a novel methodology that optimizes the overall compatibility between genomic and transcript sequences to detect splice sites &#8211; thus minimizing mispredictions due to repetitive sequences or sequence errors in the ESTs. It does not impose constraints on the splice boundaries (i.e. strict observance of the GT-AG rule) but in case of equally likely alternative alignments adjusts splice boundaries to those observed to occur more frequently in known genes <abbrgrp>
					<abbr bid="B18">18</abbr>
				</abbrgrp>. Hence, it is able to detect non-canonical splice boundaries such as those of U12-dependent introns <abbrgrp>
					<abbr bid="B34">34</abbr>
				</abbrgrp> in the presence of suitable supporting transcripts (see <supplr sid="S3">Additional file 3</supplr>). Finally, ASPIC allows the user to carry out splicing predictions on a wide range of species as well as on user-submitted genome and transcript sequences.</p>
			<suppl id="S3">
				<title>
					<p>Additional File 3</p>
				</title>
				<text>
					<p>RT-negative introns detected by ASPIC.</p>
				</text>
				<file name="1471-2105-6-244-S3.pdf">
					<p>Click here for file</p>
				</file>
			</suppl>
		</sec>
		<sec>
			<st>
				<p>Availability and requirements</p>
			</st>
			<p>The ASPIC web tool is available at <url>http://aspic.algo.disco.unimib.it/aspic-devel/</url>. To submit a query to ASPIC, the user fills in a form specifying the organism, the gene ID (Ensembl or HUGO) and, optionally, the Unigene cluster ID, and provides an email address. The request is processed by the ASPIC software and, when the results are available, an email with a link to the processed data is automatically sent to the address specified by the user.</p>
			<p>ASPIC collects all the results of submitted queries in a dynamic database.</p>
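			<p>The behaviour of the dynamic database described for the web resource (stored results are reused for repeated queries, and an explicit overwrite request triggers a new analysis) can be sketched as follows. This is only an illustration: the class <it>ResultCache</it>, its method names and the in-memory dictionary are hypothetical stand-ins, and the real service stores its data in MySQL.</p>
			<p><![CDATA[
```python
class ResultCache:
    """Hypothetical in-memory stand-in for the dynamic results database."""

    def __init__(self, analyse):
        self._analyse = analyse   # callable that runs a fresh analysis
        self._store = {}          # (organism, gene_id) -> stored result

    def query(self, organism, gene_id, overwrite=False):
        """Return the stored result, re-running the analysis only when the
        query is new or the caller explicitly asks for an overwrite."""
        key = (organism, gene_id)
        if overwrite or key not in self._store:
            self._store[key] = self._analyse(organism, gene_id)
        return self._store[key]


calls = []


def fake_analysis(organism, gene_id):
    """Placeholder for a full ASPIC run; it only records that it was called."""
    calls.append((organism, gene_id))
    return {"gene": gene_id, "run": len(calls)}


cache = ResultCache(fake_analysis)
first = cache.query("human", "HNRPR")
second = cache.query("human", "HNRPR")                   # served from the store
third = cache.query("human", "HNRPR", overwrite=True)    # forces a new analysis
print(first["run"], second["run"], third["run"])  # 1 1 2
```
]]></p>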
			<p>Project name: ASPic Alternative Splicing Prediction</p>
			<p>Project home page: <url>http://aspic.algo.disco.unimib.it</url></p>
			<p>Programming language: C</p>
			<p>Operating system: Debian GNU/Linux 3.1, kernel 2.6.8</p>
			<p>Other requirements: Apache 1.3, Perl 5.8.4, PHP 4.3.10, MySQL 4.1, gcc 3.3.5</p>
		</sec>
		<sec>
			<st>
				<p>Authors' contributions</p>
			</st>
			<p>GP conceived the study. PB and RR designed the algorithms and the general ASPIC method. RR implemented the method, built the web resources and performed the experimental analysis. All authors participated in the design of the ASPIC tool and of the experimental study. All authors contributed to drafting the article.</p>
			<suppl id="S4">
				<title>
					<p>Additional File 4</p>
				</title>
				<text>
					<p>U12 dependent introns detected by ASPIC.</p>
				</text>
				<file name="1471-2105-6-244-S4.pdf">
					<p>Click here for file</p>
				</file>
			</suppl>
		</sec>
	</bdy>
   <bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>This work was supported by FIRB projects "Bioinformatica per la Genomica e la Proteomica" and "Laboratorio Italiano di Bioinformatica &#8211; L.I.BI." (Ministero dell'Istruzione e Ricerca Scientifica, Italy), Associazione Italiana Ricerca sul Cancro and Telethon. We thank Gianluca Della Vedova for his helpful suggestions on the preliminary design of the ASPIC software, David Horner and Giulio Pavesi for helpful comments on the manuscript, and Gabriele Ravanelli for providing a Perl library to visualize ASPIC data.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>Initial sequencing and analysis of the human genome</p>
				</title>
				<aug>
					<au>
						<cnm>International Human Genome Sequencing Consortium IHGSC</cnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>2001</pubdate>
				<volume>409</volume>
				<issue>6822</issue>
				<fpage>860</fpage>
				<lpage>921</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/35057062</pubid>
						<pubid idtype="pmpid" link="fulltext">11237011</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B2">
				<title>
					<p>Alternative splicing: increasing diversity in the proteomic world</p>
				</title>
				<aug>
					<au>
						<snm>Graveley</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>Trends Genet</source>
				<pubdate>2001</pubdate>
				<volume>17</volume>
				<issue>2</issue>
				<fpage>100</fpage>
				<lpage>107</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0168-9525(00)02176-4</pubid>
						<pubid idtype="pmpid" link="fulltext">11173120</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B3">
				<title>
					<p>A genomic view of alternative splicing</p>
				</title>
				<aug>
					<au>
						<snm>Modrek</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Lee</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Nat Genet</source>
				<pubdate>2002</pubdate>
				<volume>30</volume>
				<issue>1</issue>
				<fpage>13</fpage>
				<lpage>19</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/ng0102-13</pubid>
						<pubid idtype="pmpid" link="fulltext">11753382</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>Low conservation of alternative splicing patterns in the human and mouse genomes</p>
				</title>
				<aug>
					<au>
						<snm>Nurtdinov</snm>
						<fnm>RN</fnm>
					</au>
					<au>
						<snm>Artamonova</snm>
						<fnm>II</fnm>
					</au>
					<au>
						<snm>Mironov</snm>
						<fnm>AA</fnm>
					</au>
					<au>
						<snm>Gelfand</snm>
						<fnm>MS</fnm>
					</au>
				</aug>
				<source>Hum Mol Genet</source>
				<pubdate>2003</pubdate>
				<volume>12</volume>
				<issue>11</issue>
				<fpage>1313</fpage>
				<lpage>1320</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/hmg/ddg137</pubid>
						<pubid idtype="pmpid" link="fulltext">12761046</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>Genome-wide detection of tissue-specific alternative splicing in the human transcriptome</p>
				</title>
				<aug>
					<au>
						<snm>Xu</snm>
						<fnm>Q</fnm>
					</au>
					<au>
						<snm>Modrek</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Lee</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2002</pubdate>
				<volume>30</volume>
				<issue>17</issue>
				<fpage>3754</fpage>
				<lpage>3766</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">137414</pubid>
						<pubid idtype="pmpid" link="fulltext">12202761</pubid>
						<pubid idtype="doi">10.1093/nar/gkf492</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>Computational analysis of alternative splicing using EST tissue information</p>
				</title>
				<aug>
					<au>
						<snm>Xie</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Zhu</snm>
						<fnm>WY</fnm>
					</au>
					<au>
						<snm>Wasserman</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Grebinskiy</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Olson</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Mintz</snm>
						<fnm>L</fnm>
					</au>
				</aug>
				<source>Genomics</source>
				<pubdate>2002</pubdate>
				<volume>80</volume>
				<issue>3</issue>
				<fpage>326</fpage>
				<lpage>330</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/geno.2002.6841</pubid>
						<pubid idtype="pmpid" link="fulltext">12213203</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B7">
				<title>
					<p>Alternative splicing: multiple control mechanisms and involvement in human disease</p>
				</title>
				<aug>
					<au>
						<snm>Caceres</snm>
						<fnm>JF</fnm>
					</au>
					<au>
						<snm>Kornblihtt</snm>
						<fnm>AR</fnm>
					</au>
				</aug>
				<source>Trends Genet</source>
				<pubdate>2002</pubdate>
				<volume>18</volume>
				<issue>4</issue>
				<fpage>186</fpage>
				<lpage>193</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0168-9525(01)02626-9</pubid>
						<pubid idtype="pmpid" link="fulltext">11932019</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B8">
				<title>
					<p>Theoretical analysis of alternative splice forms using computational methods</p>
				</title>
				<aug>
					<au>
						<snm>Boue</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Vingron</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Kriventseva</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Koch</snm>
						<fnm>I</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2002</pubdate>
				<volume>18</volume>
				<issue>Suppl 2</issue>
				<fpage>S65</fpage>
				<lpage>S73</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">12385985</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p>EST comparison indicates 38% of human mRNAs contain possible alternative splice forms</p>
				</title>
				<aug>
					<au>
						<snm>Brett</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Hanke</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Lehmann</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Haase</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Delbruck</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Krueger</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Reich</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Bork</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>FEBS Lett</source>
				<pubdate>2000</pubdate>
				<volume>474</volume>
				<issue>1</issue>
				<fpage>83</fpage>
				<lpage>86</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0014-5793(00)01581-7</pubid>
						<pubid idtype="pmpid" link="fulltext">10828456</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B10">
				<title>
					<p>Splicing graphs and EST assembly problem</p>
				</title>
				<aug>
					<au>
						<snm>Heber</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Alekseyev</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Sze</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Tang</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Pevzner</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2002</pubdate>
				<volume>18</volume>
				<issue>Suppl 1</issue>
				<fpage>S181</fpage>
				<lpage>S188</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">12169546</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>The Alternative Splicing Gallery (ASG): bridging the gap between genome and transcriptome</p>
				</title>
				<aug>
					<au>
						<snm>Leipzig</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Pevzner</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Heber</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2004</pubdate>
				<volum