<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>gb-2005-6-7-224</ui>
	<ji>GBJ</ji>
	<fm>
		<dochead>Review</dochead>
		<bibl>
			<title>
				<p>Text-mining and information-retrieval services for molecular biology</p>
			</title>
			<aug>
				<au id="A1" ca="yes">
					<snm>Krallinger</snm>
					<fnm>Martin</fnm>
					<insr iid="I1"/>
					<email>martink@cnb.uam.es</email>
				</au>
				<au id="A2" ca="yes">
					<snm>Valencia</snm>
					<fnm>Alfonso</fnm>
					<insr iid="I1"/>
					<email>valencia@cnb.uam.es</email>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>Protein Design Group, National Center of Biotechnology, CNB-CSIC, Cantoblanco, E-28049 Madrid, Spain</p>
				</ins>
			</insg>
			<source>Genome Biology</source>
			<issn>1465-6906</issn>
			<pubdate>2005</pubdate>
			<volume>6</volume>
			<issue>7</issue>
			<fpage>224</fpage>
			<url>http://genomebiology.com/2005/6/7/224</url>
			<xrefbib>
				<pubidlist><pubid idtype="pmpid">15998455</pubid><pubid idtype="doi">10.1186/gb-2005-6-7-224</pubid>
				</pubidlist></xrefbib>
		</bibl>
		<history>
			<pub>
				<date>
					<day>28</day>
					<month>6</month>
					<year>2005</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2005</year>
			<collab>BioMed Central Ltd</collab>
		</cpyrt>
		<shorttitle>
			<p>Text-mining and information-retrieval services for molecular biology</p>
		</shorttitle>
		<shortabs>
			<p>A range of text-mining applications have been developed recently that will improve access to knowledge for biologists and database annotators.</p>
		</shortabs>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<p>Text-mining in molecular biology - defined as the automatic extraction of information about genes, proteins and their functional relationships from text documents - has emerged as a hybrid discipline on the edges of the fields of information science, bioinformatics and computational linguistics. A range of text-mining applications have been developed recently that will improve access to knowledge for biologists and database annotators.</p>
			</sec>
		</abs>
	</fm>
	<meta>
		<classifications>
			<classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010013">Methods</classification>
		</classifications>
	</meta>
	<bdy>
		<sec>
			<st>
				<p/>
			</st>
			<p>The use of large-scale experimental techniques and bioinformatic tools has increased the pace at which biologists produce relevant information. This in turn drives the growth of the scientific literature, which reports those experimental results as free text: a form that is straightforward for humans to read but difficult for computers to interpret automatically. As a consequence, there is increasing interest in methods that can handle collections of biological texts. Such methods include systems that efficiently retrieve and classify documents in response to complex user queries and, beyond this, systems that carry out a deeper analysis of the literature to extract specific associations, such as protein-protein interactions and protein functions. This deeper analysis is called text-mining. The complex and concise nature of the scientific literature means that the use of text-mining tools developed for generic texts is often impractical; a set of freely available text-mining applications adapted to the needs of biology has been developed, however, and some of them are now ready for practical use. In parallel, a number of strategies for evaluating text-mining applications have appeared, with the goal of assessing and improving the field by providing datasets that can be used for training and testing applications.</p>
		</sec>
		<sec>
			<st>
				<p>Finding relevant articles</p>
			</st>
			<p>Throughout the last decade, the amount of electronically accessible textual material has been growing exponentially. Internet-based technologies exploit the availability of these large collections of documents for the development of information-retrieval systems. Currently, biologists and bioinformaticians take advantage of those tools, not only when searching generic documents such as news articles using search engines such as Alta Vista <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> and Google <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, but especially when querying publications specific to biomedicine, for example those stored in PubMed <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. The range of community-wide genome projects, for which Internet-based information exchange is crucial, together with the heavy use of biology databases through web-based tools, means that natural language processing (NLP) techniques could be useful. NLP is based on the use of computers to process language, and it includes techniques developed to provide the basic methodology required for automatically extracting relevant functional information from unstructured data, such as scientific publications. Information retrieval and NLP systems are soon likely to become important not only for extracting information but also for assisting in various aspects of research such as the discovery of new facts, the interpretation of findings, and the design of experiments.</p>
			<p>One of the first steps when handling textual data is the extraction of relevant documents from a large collection. This process is commonly known as information retrieval. In the case of indexed web pages, powerful search engines such as Google <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> return a ranked list of documents relevant to a given user search. There are two basic search strategies: query-based and document-based searches. In query-based searches, documents are returned that contain certain user-specified combinations of keywords. As some words - 'stop words' such as 'and', 'if' and 'the' - are found at a high frequency within most documents and thus display a low information content, they are often excluded during the retrieval process. Keywords may be combined by Boolean operators, such as AND, OR and NOT. The second type of retrieval, document-based searching, aims to return a ranked list of documents similar to a given query document as a whole, rather than to a combination of a few keywords. The most widely used retrieval tool in molecular biology is Entrez <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>, the PubMed information retrieval system provided at the US National Center for Biotechnology Information (NCBI) <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. It supports basic keyword and Boolean query-based searches, as well as document-based searches to return all abstracts that are similar to a given document. The popular search engine Google <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> has recently incorporated a search tool specific to the academic literature, Google Scholar <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>, for the retrieval of scientific articles, reports and books. The ranking of the returned hits is mainly based on the extent to which documents are connected by citations and web links. 
Other scientific literature databases and search engines include CrossRef Search <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, which enables searches of the full content provided by a set of publishers, and the Nature Publishing Group search engine <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, which allows advanced search strategies.</p>
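			<p>The query-based strategy described above can be illustrated with a minimal sketch. It assumes a tiny in-memory document collection, a short illustrative stop-word list and hypothetical helper names (tokenize, build_index, search); it does not reflect how Entrez or any other service named here is implemented.</p>
			<p><![CDATA[
```python
"""Toy query-based retrieval: stop-word removal plus Boolean AND/OR/NOT."""
import re

# Illustrative stop-word list (an assumption; real systems use longer lists).
STOP_WORDS = {"and", "if", "the", "of", "a", "in", "to", "is"}


def tokenize(text):
    """Lower-case the text, split on non-alphanumerics, drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}


def build_index(documents):
    """Map each keyword to the set of document ids that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(doc_id)
    return index


def search(index, all_ids, required=(), optional=(), excluded=()):
    """Return ids matching every required term (AND), at least one
    optional term if any are given (OR), and no excluded term (NOT)."""
    hits = set(all_ids)
    for term in required:
        hits &= index.get(term.lower(), set())
    if optional:
        any_hit = set()
        for term in optional:
            any_hit |= index.get(term.lower(), set())
        hits &= any_hit
    for term in excluded:
        hits -= index.get(term.lower(), set())
    return sorted(hits)


# Made-up example texts, used only to exercise the functions.
DOCS = {
    1: "The porin POR1 is a protein of the mitochondrial outer membrane.",
    2: "Wnt-1 signalling and the regulation of LEF-1 in the mouse.",
    3: "Porin expression in yeast if glucose is limiting.",
}
INDEX = build_index(DOCS)

# Equivalent to the Boolean query: porin NOT yeast
print(search(INDEX, DOCS, required=["porin"], excluded=["yeast"]))
```
]]></p>
			<p>In this sketch the hits are returned unranked; a production retrieval system would additionally score and order them, for instance by term frequency or by citation links.</p>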
			<p>Although these tools are useful for many tasks, it is time-consuming to use them for efficient searches and article selection, and such functions must be repeated periodically to keep knowledge up-to-date. As PubMed already contains over 15 million citations of biomedical articles <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> and is steadily growing (more than 450,000 articles are added every year <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>), services that periodically retrieve relevant articles and automatically alert the user have been implemented. Among those systems, known as selective dissemination of information (SDI) services, are My NCBI (formerly PubMed Cubby) <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B11">11</abbr></abbrgrp>, BioMail <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> and PubCrawler <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp> (these and other services described in this article are listed in Table <tblr tid="T1">1</tblr>). These, together with some commercial tools, have been evaluated independently <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, showing that the combined use of different SDI systems results in useful automated searching.</p>
			<tbl id="T1">
				<title>
					<p>Table 1</p>
				</title>
				<caption>
					<p>Biomedical text-mining resources, servers and programs</p>
				</caption>
				<tblbdy cols="4">
					<r>
						<c ca="left">
							<p>Name</p>
						</c>
						<c ca="left">
							<p>Description</p>
						</c>
						<c ca="left">
							<p>URL</p>
						</c>
						<c ca="center">
							<p>Published reference or URL*</p>
						</c>
					</r>
					<r>
						<c cspan="4">
							<hr/>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>Abbreviation Server</p>
						</c>
						<c ca="left">
							<p>Biomedical abbreviation server</p>
						</c>
						<c ca="left">
							<p>
								<url>http://bionlp.stanford.edu/abbreviation/</url>
							</p>
						</c>
						<c ca="center">
							<p>[35]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>AbGene</p>
						</c>
						<c ca="left">
							<p>Protein name tagger</p>
						</c>
						<c ca="left">
							<p>
								<url>ftp://ftp.ncbi.nlm.nih.gov/pub/tanabe</url>
							</p>
						</c>
						<c ca="center">
							<p>[29]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>ABNER</p>
						</c>
						<c ca="left">
							<p>Protein/Gene/DNA/RNA/cell tagger</p>
						</c>
						<c ca="left">
							<p>
								<url>http://www.cs.wisc.edu/~bsettles/abner/</url>
							</p>
						</c>
						<c ca="center">
							<p>[31]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>AliasServer</p>
						</c>
						<c ca="left">
							<p>Protein alias handler</p>
						</c>
						<c ca="left">
							<p>
								<url>http://cbi.labri.fr/outils/alias/index.php</url>
							</p>
						</c>
						<c ca="center">
							<p>[37]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>ARGH</p>
						</c>
						<c ca="left">
							<p>Biomedical acronym resolver</p>
						</c>
						<c ca="left">
							<p>
								<url>http://invention.swmed.edu/argh/</url>
							</p>
						</c>
						<c ca="center">
							<p>[88]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>ARROWSMITH</p>
						</c>
						<c ca="left">
							<p>Extended MEDLINE search tool</p>
						</c>
						<c ca="left">
							<p>
								<url>http://kiwi.uchicago.edu/</url>
							</p>
						</c>
						<c ca="center">
							<p>[84]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>BioMail</p>
						</c>
						<c ca="left">
							<p>PubMed updating and alerting service</p>
						</c>
						<c ca="left">
							<p>
								<url>http://biomail.sourceforge.net/biomail/</url>
							</p>
						</c>
						<c ca="center">
							<p>[12]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>BioRAT</p>
						</c>
						<c ca="left">
							<p>Biology information extraction tool</p>
						</c>
						<c ca="left">
							<p>
								<url>http://bioinf.cs.ucl.ac.uk/biorat/</url>
							</p>
						</c>
						<c ca="center">
							<p>[81]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>BITOLA</p>
						</c>
						<c ca="left">
							<p>Literature-based biomedical discovery system</p>
						</c>
						<c ca="left">
							<p>
								<url>http://www.mf.uni-lj.si/bitola/</url>
							</p>
						</c>
						<c ca="center">
							<p>[86]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>Chilibot</p>
						</c>
						<c ca="left">
							<p>Relationship extraction</p>
						</c>
						<c ca="left">
							<p>
								<url>http://www.chilibot.net</url>
							</p>
						</c>
						<c ca="center">
							<p>[57]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>CrossRef Search</p>
						</c>
						<c ca="left">
							<p>Full content search engine</p>
						</c>
						<c ca="left">
							<p>
								<url>http://www.crossref.org/crossrefsearch.html</url>
							</p>
						</c>
						<c ca="center">
							<p>[8]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>GAPSCORE</p>
						</c>
						<c ca="left">
							<p>Protein name tagger</p>
						</c>
						<c ca="left">
							<p>
								<url>http://bionlp.stanford.edu/gapscore</url>
							</p>
						</c>
						<c ca="center">
							<p>[23]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>Geisha</p>
						</c>
						<c ca="left">
							<p>Text-mining tool to assist microarray analysis</p>
						</c>
						<c ca="left">
							<p>
								<url>http://www.pdg.cnb.uam.es/blaschke/cgi-bin/geisha</url>
							</p>
						</c>
						<c ca="center">
							<p>[67]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>GeneScene</p>
						</c>
						<c ca="left">
							<p>Information extraction for regulatory pathways</p>
						</c>
						<c ca="left">
							<p>
								<url>http://genescene.arizona.edu/index.html</url>
							</p>
						</c>
						<c ca="center">
							<p>[59]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>GOAnnotator</p>
						</c>
						<c ca="left">
							<p>Annotation extraction from literature</p>
						</c>
						<c ca="left">
							<p>
								<url>http://xldb.fc.ul.pt/rebil/tools/goa/</url>
							</p>
						</c>
						<c ca="center">
							<p>[51]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>Google Scholar</p>
						</c>
						<c ca="left">
							<p>Scholarly literature search engine</p>
						</c>
						<c ca="left">
							<p>
								<url>http://scholar.google.com/</url>
							</p>
						</c>
						<c ca="center">
							<p>[6]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>iHOP</p>
						</c>
						<c ca="left">
							<p>Information on hyperlinked proteins</p>
						</c>
						<c ca="left">
							<p>
								<url>http://www.pdg.cnb.uam.es/UniPub/iHOP/</url>
							</p>
						</c>
						<c ca="center">
							<p>[40]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>iProLINK</p>
						</c>
						<c ca="left">
							<p>Protein annotation and tagging</p>
						</c>
						<c ca="left">
							<p>
								<url>http://pir.georgetown.edu/iprolink</url>
							</p>
						</c>
						<c ca="center">
							<p>[55]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>KAT</p>
						</c>
						<c ca="left">
							<p>Annotate proteins from scientific references</p>
						</c>
						<c ca="left">
							<p>
								<url>http://www.bork.embl-heidelberg.de/kat/</url>
							</p>
						</c>
						<c ca="center">
							<p>[52]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>KeX</p>
						</c>
						<c ca="left">
							<p>Protein name tagger</p>
						</c>
						<c ca="left">
							<p>
								<url>http://www.hgc.jp/service/tooldoc/KeX</url>
							</p>
						</c>
						<c ca="center">
							<p>[33]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>KinasePathway database</p>
						</c>
						<c ca="left">
							<p>Tool for extraction of protein, gene and compound interactions from text</p>
						</c>
						<c ca="left">
							<p>
								<url>http://kinasedb.ontology.ims.u-tokyo.ac.jp</url>
							</p>
						</c>
						<c ca="center">
							<p>[46]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>MedBlast</p>
						</c>
						<c ca="left">
							<p>Document retrieval for sequences</p>
						</c>
						<c ca="left">
							<p>
								<url>http://medblast.sibsnet.org/</url>
							</p>
						</c>
						<c ca="center">
							<p>[63]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>MedMiner</p>
						</c>
						<c ca="left">
							<p>Extraction of sentences relevant to genes</p>
						</c>
						<c ca="left">
							<p>
								<url>http://discover.nci.nih.gov/textmining/main.jsp</url>
							</p>
						</c>
						<c ca="center">
							<p>[69]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>microGENIE</p>
						</c>
						<c ca="left">
							<p>Text-mining for microarrays</p>
						</c>
						<c ca="left">
							<p>
								<url>http://www.cs.vu.nl/microgenie</url>
							</p>
						</c>
						<c ca="center">
							<p>[76]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>My NCBI</p>
						</c>
						<c ca="left">
							<p>PubMed updating and alerting service</p>
						</c>
						<c ca="left">
							<p>
								<url>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed</url>
							</p>
						</c>
						<c ca="center">
							<p>[11]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>NDPG</p>
						</c>
						<c ca="left">
							<p>Scores the literature-based coherence of gene clusters</p>
						</c>
						<c ca="left">
							<p>None</p>
						</c>
						<c ca="center">
							<p>[66]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>NLProt</p>
						</c>
						<c ca="left">
							<p>Protein name tagger</p>
						</c>
						<c ca="left">
							<p>
								<url>http://cubic.bioc.columbia.edu/services/nlprot/</url>
							</p>
						</c>
						<c ca="center">
							<p>[25]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>NPG search engine</p>
						</c>
						<c ca="left">
							<p>Nature Publishing Group search engine</p>
						</c>
						<c ca="left">
							<p>
								<url>http://search.nature.com/search/?sp_a=sp1001702d&amp;sp_t=advanced&amp;sp_x_1=ujournal&amp;sp-p=all&amp;sp</url>
							</p>
						</c>
						<c ca="center">
							<p>[9]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>PreBIND</p>
						</c>
						<c ca="left">
							<p>Classifier of protein interaction documents</p>
						</c>
						<c ca="left">
							<p>
								<url>http://bind.ca/</url>
							</p>
						</c>
						<c ca="center">
							<p>[44]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>PubCrawler</p>
						</c>
						<c ca="left">
							<p>PubMed updating and alerting service</p>
						</c>
						<c ca="left">
							<p>
								<url>http://pubcrawler.gen.tcd.ie/</url>
							</p>
						</c>
						<c ca="center">
							<p>[13]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>PubGene</p>
						</c>
						<c ca="left">
							<p>Text-mining tool for microarrays</p>
						</c>
						<c ca="left">
							<p>
								<url>http://www.pubgene.org/</url>
							</p>
						</c>
						<c ca="center">
							<p>[72]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>PubMatrix</p>
						</c>
						<c ca="left">
							<p>Multiplex literature mining tool</p>
						</c>
						<c ca="left">
							<p>
								<url>http://pubmatrix.grc.nia.nih.gov/secure-bin/index.pl</url>
							</p>
						</c>
						<c ca="center">
							<p>[74]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>PubMed Entrez</p>
						</c>
						<c ca="left">
							<p>Biomedical citation retrieval system</p>
						</c>
						<c ca="left">
							<p>
								<url>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed</url>
							</p>
						</c>
						<c ca="center">
							<p>[3]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>Relationship Extractor</p>
						</c>
						<c ca="left">
							<p>Biomedical relationship extractor</p>
						</c>
						<c ca="left">
							<p>
								<url>http://www-personal.engin.umich.edu/~murthyr/Relationship_Extractor.html</url>
							</p>
						</c>
						<c ca="center">
							<p>[90]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>SAWTED</p>
						</c>
						<c ca="left">
							<p>Text-enhanced remote homolog detector</p>
						</c>
						<c ca="left">
							<p>
								<url>http://www.sbg.bio.ic.ac.uk/~sawted/</url>
							</p>
						</c>
						<c ca="center">
							<p>[61]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>Scopus</p>
						</c>
						<c ca="left">
							<p>Scientific literature database and search</p>
						</c>
						<c ca="left">
							<p>
								<url>http://www.scopus.com/scopus/home</url>
							</p>
						</c>
						<c ca="center">
							<p>[93]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>Textpresso</p>
						</c>
						<c ca="left">
							<p><it>C. elegans </it>literature information retrieval and extraction tool</p>
						</c>
						<c ca="left">
							<p>
								<url>http://www.textpresso.org/</url>
							</p>
						</c>
						<c ca="center">
							<p>[48]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>XplorMed</p>
						</c>
						<c ca="left">
							<p>Explores bibliographic MEDLINE searches</p>
						</c>
						<c ca="left">
							<p>
								<url>http://www.bork.embl-heidelberg.de/xplormed</url>
							</p>
						</c>
						<c ca="center">
							<p>[91]</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>Yapex</p>
						</c>
						<c ca="left">
							<p>Protein name tagger</p>
						</c>
						<c ca="left">
							<p>
								<url>http://ellis.sics.se:8080/cgi-bin/Yapex/yapex.cgi</url>
							</p>
						</c>
						<c ca="center">
							<p>[27]</p>
						</c>
					</r>
				</tblbdy>
				<tblfn>
					<p>An overview of some of the text-mining, information-extraction, information-retrieval and selective dissemination of information services currently available. *References to articles describing each tool are given; where no article has been published, the reference is to the URL.</p>
				</tblfn>
			</tbl>
		</sec>
		<sec>
			<st>
				<p>The first step in text mining: identification of biological entities</p>
			</st>
			<p>Biological research is name-centered: proteins are referred to in free text by their names or symbols rather than using the unambiguous identifiers provided by annotation databases (such as SwissProt accession numbers <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>). Identifying mentions of proteins and genes unambiguously within free text is a fundamental step for the later extraction of functional attributes of these entities. Unfortunately this is a difficult process, partly because of the complex nature and usage of gene and protein names. Genes and proteins may be referred to in free text in a range of different ways: as full names (for example, porin), as symbols (the <it>Saccharomyces cerevisiae </it>gene <it>POR1</it>), and also through typographical variants (<it>POR-1</it>). Many genes also have several synonyms (such as <it>OMP2 </it>for <it>POR1</it>), or the gene name may be ambiguous <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> and refer to words that have different meanings depending on the context (for example, <it>big brain</it>, the full name for the <it>Drosophila melanogaster </it>gene <it>bib</it>, could also be an anatomical description). Furthermore, it has been suggested that errors in gene names might be introduced automatically by certain applications in bioinformatics <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>.</p>
			<p>In the NLP field, the identification of entities in free text is known as named-entity recognition (NER). To identify biological entities such as genes, proteins and drugs automatically and unambiguously within free text, over 50 information-extraction and text-mining tools have recently been implemented, and two community-wide evaluations have been carried out <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>. The top left of Figure <figr fid="F1">1</figr> shows nine existing NER applications for biology that are provided via an online server or are directly downloadable. Note that the average recovery of biological entities from free text by 15 NER tools was 80%, and the results had an accuracy of 80% <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>; these figures are significantly lower than in the case of entities found in documents from fields such as economics, which demonstrates the complex nature of protein names.</p>
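			<p>Evaluation figures of this kind are usually computed by comparing the entity mentions a tagger returns with a manually annotated reference set. The sketch below reads 'recovery' as recall and 'accuracy' as precision, which is an assumption about the terminology used above; the character offsets and the function name precision_recall are hypothetical and serve only to reproduce an 80%/80% outcome on toy data.</p>
			<p><![CDATA[
```python
def precision_recall(predicted, gold):
    """Compare predicted entity mentions with a reference annotation.

    precision = correct predictions / all predictions
    recall    = correct predictions / all reference mentions
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical mentions as (start offset, end offset, text) triples.
GOLD = {
    (0, 4, "POR1"),
    (20, 24, "OMP2"),
    (40, 43, "bib"),
    (60, 65, "Wnt-1"),
    (80, 85, "LEF-1"),
}
PREDICTED = {
    (0, 4, "POR1"),
    (20, 24, "OMP2"),
    (40, 43, "bib"),
    (60, 65, "Wnt-1"),
    (90, 95, "brain"),  # a false positive; LEF-1 is missed
}

precision, recall = precision_recall(PREDICTED, GOLD)
print(precision, recall)
```
]]></p>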
			<fig id="F1">
				<title>
					<p>Figure 1</p>
				</title>
				<caption>
					<p>An overview of biological natural language processing (BioNLP) and text-mining applications for biology</p>
				</caption>
				<text>
					<p>An overview of biological natural language processing (BioNLP) and text-mining applications for biology. The major topics are represented by the inner circle of seven approaches, and the corresponding applications are given in the outer layers of boxes. Most of the tools are available online or for download. Some applications could be classified into multiple topics; they are shown here associated with one of their most significant topics. For instance, most of the text-mining applications (that is, the applications that are not simply for article retrieval) have integrated modules for named entity recognition (NER), and selective dissemination of information (SDI) services often use automated Boolean queries for article retrieval. References and URLs for each application, where available, are given in Table 1.</p>
				</text>
				<graphic file="gb-2005-6-7-224-1"/>
			</fig>
			<p>Proteins and genes are characterized within biological databases through unique identifiers; each identifier is associated with its corresponding protein or nucleotide sequence and functional descriptions. The automatic recognition of entities such as genes and proteins in free text is insufficient if it is not linked to the corresponding database identifiers. Distinguishing between the use of protein names and protein-family names constitutes a serious obstacle in the task of highlighting protein entities in free text, as text passages sometimes refer to the general properties of protein families and at other times to the properties of individual proteins.</p>
			<p>Different research communities have addressed the issue of named-entity recognition in biology in different ways. The NLP community has typically tried to identify names by analyzing the syntactic structure of sentences, making use of information about parts of speech in a sentence and the syntactic roles of words, whereas bioinformaticians have instead explored the identification of variants of the names contained in databases, even adapting standard bioinformatics algorithms such as BLAST to the problem of protein-name identification <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. Neither of these two strategies seems to be efficient by itself, and many intermediate combinations are therefore appearing, including the following examples. GAPSCORE <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp> is an easy-to-use online tool for detecting protein and gene names within free text (a 'protein tagger'). The text to be analyzed can be pasted into an online form and submitted to the server, which returns a list of the words observed in the document and a statistical quality score that indicates how probable it is that each word represents a gene or protein name. Another online protein tagger is NLProt, developed at Columbia University <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr></abbrgrp>. NLProt is based on a machine learning technique called support vector machines (SVMs) and allows protein identification either in a submitted text or in the text corresponding to a list of submitted PubMed article identifiers. Additional protein taggers include Yapex <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>, also available online, and three downloadable tools, AbGene <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp>, ABNER <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp> and KeX <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>.
Abbreviations or acronyms are often used as a shorter form to refer to gene names in articles; the Abbreviation Server <abbrgrp><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr></abbrgrp> developed at Stanford University allows a similar search strategy to that used by GAPSCORE to be applied to biomedical abbreviations such as gene symbols. Finally, the AliasServer <abbrgrp><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr></abbrgrp> helps in linking the various aliases of a given gene through different biological databases for various species.</p>
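			<p>The database-driven strategy, in which variants of known names are matched in text and linked back to an identifier, can be sketched as follows. The lexicon, the placeholder identifiers (ID_0001, ID_0002) and the function names are all hypothetical, and the sketch does not describe the method of any tool mentioned above.</p>
			<p><![CDATA[
```python
import re

# Hypothetical lexicon; the identifiers are placeholders, not real accessions.
LEXICON = {
    "ID_0001": ["POR1", "OMP2", "porin"],
    "ID_0002": ["bib", "big brain"],
}


def normalize(name):
    """Collapse case, hyphens and spaces so POR-1, por1 and Por 1 agree."""
    return re.sub(r"[\s\-]+", "", name.lower())


def build_lookup(lexicon):
    """Map each normalized name or synonym to the identifiers that use it."""
    lookup = {}
    for identifier, names in lexicon.items():
        for name in names:
            lookup.setdefault(normalize(name), set()).add(identifier)
    return lookup


def tag(text, lookup, max_words=2):
    """Return (surface form, identifiers) for each matching word window,
    trying the longest window first at every start position."""
    words = re.findall(r"[A-Za-z0-9\-]+", text)
    found = []
    for start in range(len(words)):
        for size in range(max_words, 0, -1):
            window = words[start:start + size]
            if len(window) != size:
                continue
            surface = " ".join(window)
            ids = lookup.get(normalize(surface))
            if ids:
                found.append((surface, sorted(ids)))
                break
    return found


LOOKUP = build_lookup(LEXICON)
TEXT = ("Deletion of POR-1 (also called Omp2) affects porin levels; "
        "big brain is a Drosophila gene.")
for surface, ids in tag(TEXT, LOOKUP):
    print(surface, ids)
```
]]></p>
			<p>A lookup of this kind has no notion of context: it would also tag 'big brain' in an anatomical description, which is one reason why dictionary matching is typically combined with syntactic or statistical evidence.</p>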
			<p>One of the main challenges when linking protein names to database entries is distinguishing between proteins that have the same names but belong to different genomes - a process called inter-species gene disambiguation. This is especially cumbersome in the case of mouse and human genes; the same gene symbol is often used in both species and both names are often mentioned in the same textual passage. The complex nature of protein- and gene-name identification is reinforced further by the dynamic nature of gene-name usage and name creation, with official gene names being changed and new synonyms being created <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>; it is clear that static approaches and dictionaries will not be sufficient for solving the problem.</p>
		</sec>
		<sec>
			<st>
				<p>One step further: mining interactions and relations</p>
			</st>
			<p>Although the identification of biological entities is a crucial step, in practice it is the extraction of associations between proteins and their functional features that poses an interesting biological problem. Several systems have been constructed for extracting annotations of genes and proteins automatically and for detecting protein-protein interactions and regulatory pathways. Protein-protein interactions have attracted particular interest in the light of recent developments in high-throughput proteomics. One system that extracts annotations and detects interactions is the iHOP system that we have implemented at the Spanish National Biotechnology Center <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. This facilitates the direct linking of information in the INTACT <abbrgrp><abbr bid="B41">41</abbr></abbrgrp> protein-interaction database with corresponding bibliographic references (Figure <figr fid="F2">2</figr>). As well as highlighting direct associations between genes and functional descriptions, iHOP also includes advanced search modes for discovery and visualization of literature-based protein-interaction networks for a range of organisms, including human, mouse and yeast <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>. The basic approach followed by iHOP is protein-centric: it arranges relevant sentences from the literature around protein names, and the use of co-citation of protein names in each sentence facilitates navigation through the dispersed literature relevant to a particular protein. As a result, users can successively explore the functions of related proteins by building virtual protein-relation networks (Figure <figr fid="F2">2c</figr>). The iHOP system is based on the ideas previously developed for the SUISEKI knowledge-discovery system <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>.</p>
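			<p>The idea of linking proteins that are co-cited in the same sentence can be sketched in a few lines. The example below is a simplified illustration and not iHOP's actual implementation: the sentences are made up for the example, names are matched by plain substring search, and the function name cooccurrence_network is hypothetical.</p>
			<p><![CDATA[
```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(sentences, protein_names):
    """Count how often each pair of known names shares a sentence.

    Returns a Counter keyed by alphabetically sorted name pairs; each
    pair is an edge of the network and the count is its weight.
    """
    edges = Counter()
    for sentence in sentences:
        lowered = sentence.lower()
        present = sorted(n for n in protein_names if n.lower() in lowered)
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges


# Toy sentences written for this example, not quoted from the literature.
SENTENCES = [
    "Wnt-1 signalling stabilizes beta-catenin.",
    "Beta-catenin forms complexes with LEF-1.",
    "LEF-1 target genes are activated by Wnt-1 and beta-catenin.",
]
NAMES = ["Wnt-1", "LEF-1", "beta-catenin"]

NETWORK = cooccurrence_network(SENTENCES, NAMES)
for (first, second), weight in sorted(NETWORK.items()):
    print(first, second, weight)
```
]]></p>
			<p>Starting from one query protein, a user could follow the heaviest edges to neighbouring proteins and repeat the query there, which mirrors the stepwise navigation described above.</p>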
			<fig id="F2">
				<title>
					<p>Figure 2</p>
				</title>
				<caption>
					<p>Basic steps in the use of the iHOP text-mining tool [40], illustrated with screenshots [42]</p>
				</caption>
				<text>
					<p>Basic steps in the use of the iHOP text-mining tool [40], illustrated with screenshots [42]. For a given query (for example, the protein symbols <b>(a) </b>Wnt-1 or <b>(b) </b>LEF-1), all the sentences mentioning the name are retrieved from PubMed. These sentences also contain mentions of other proteins, which are highlighted and which might show associations with the query protein (see the magnified area in (b)). Functional terms (such as 'target' and 'complexes') and interaction verbs (such as 'activated' and 'stabilizes') are in bold. <b>(c) </b>By clicking on the 'Gene model' link in the left panel in (a,b), interaction networks of proteins that co-occur in sentences with the query proteins can be displayed.</p>
				</text>
				<graphic file="gb-2005-6-7-224-2"/>
			</fig>
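			<p>The protein-centric arrangement of sentences described for iHOP can be illustrated with a minimal Python sketch. It is not the iHOP implementation: the sentences and the small dictionary of protein names are invented for illustration, and names are matched by a simple case-insensitive whole-token lookup rather than by the synonym handling that a real system needs. The sketch indexes sentences by the protein names they mention and derives network edges from names that are co-cited in one sentence.</p>
			<p><![CDATA[
```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical dictionary of protein names (illustrative only).
PROTEIN_NAMES = {"Wnt-1", "LEF-1", "beta-catenin"}


def proteins_in(sentence, names=PROTEIN_NAMES):
    """Return the known protein names mentioned in a sentence."""
    found = set()
    for name in names:
        # Match the name as a whole token, ignoring case.
        pattern = r"(?<![\w-])" + re.escape(name) + r"(?![\w-])"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            found.add(name)
    return found


def build_index(sentences):
    """Map each protein to its sentences and count co-cited protein pairs."""
    by_protein = defaultdict(list)
    edges = defaultdict(int)
    for sentence in sentences:
        mentioned = sorted(proteins_in(sentence))
        for name in mentioned:
            by_protein[name].append(sentence)
        # Every pair of names in one sentence becomes a network edge.
        for a, b in combinations(mentioned, 2):
            edges[(a, b)] += 1
    return by_protein, edges


# Invented example sentences, not taken from PubMed.
SENTENCES = [
    "Wnt-1 signalling stabilizes beta-catenin.",
    "beta-catenin forms complexes with LEF-1.",
    "LEF-1 target genes are activated by Wnt-1 signalling.",
]

by_protein, edges = build_index(SENTENCES)
for name, hits in sorted(by_protein.items()):
    print(name, len(hits))
for pair, count in sorted(edges.items()):
    print(pair, count)
```
]]></p>
			<p>Starting from one query protein, the sentences stored under its name lead to co-cited proteins, whose own sentence lists can then be followed in turn; this is the navigation idea, under the stated simplifications.</p>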
			<p>Some other text-mining applications include PreBIND <abbrgrp><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr></abbrgrp>, developed to assist in the extraction of protein-protein interactions; the KinasePathway database text-mining system, which extracts interactions between proteins, genes and compounds <abbrgrp><abbr bid="B46">46</abbr><abbr bid="B47">47</abbr></abbrgrp>; and Textpresso <abbrgrp><abbr bid="B48">48</abbr><abbr bid="B49">49</abbr></abbrgrp>, an information-retrieval and extraction tool developed for the <it>Caenorhabditis elegans </it>literature in the context of the model-organism database WormBase <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>. Textpresso defines 33 categories of word describing entities or relationships - such as genes, pathways, or regulation - and integrates this 'Textpresso Ontology' with a text-mining system for searching the <it>C. elegans </it>literature. Among the text-mining services available online that focus on automatic annotation extraction are GOAnnotator, which provides associations between protein names and Gene Ontology terms <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>; KAT <abbrgrp><abbr bid="B52">52</abbr><abbr bid="B53">53</abbr></abbrgrp>, a system for deriving terms relevant to annotations such as SwissProt keywords and Gene Ontology terms <abbrgrp><abbr bid="B54">54</abbr></abbrgrp> from PubMed abstracts for a given query protein; and the iProLINK tool <abbrgrp><abbr bid="B55">55</abbr><abbr bid="B56">56</abbr></abbrgrp>, which performs automated extraction of annotations for given protein names and provides information related to the organisms in which proteins are found and the protein families of which they are members. Figure <figr fid="F1">1</figr> and Table <tblr tid="T1">1</tblr> provide an overview of the different systems currently available.</p>
			<p>A system with a special focus on the extraction of relationships between genes, proteins and other information is Chilibot (<abbrgrp><abbr bid="B57">57</abbr><abbr bid="B58">58</abbr></abbrgrp>; user registration is required before running queries); it allows searches using gene symbols and keywords, and the color-coded output provides information about gene-expression levels when available. The extraction of complex relationships can be handled by GeneScene <abbrgrp><abbr bid="B59">59</abbr><abbr bid="B60">60</abbr></abbrgrp>, a toolkit that provides visualization and navigation facilities for exploring regulatory networks; the tool currently provides information only on the literature on yeast and on the p53 tumor suppressor and the AP1 transcription factor.</p>
			<p>Some attempts have been made to merge text-mining methods and bioinformatic methods involving sequence analysis into a single system. The integration of functional information extracted by NLP algorithms with standard bioinformatic methods such as sequence-comparison techniques has been exploited by the Structure Assignment With Text Description (SAWTED) system <abbrgrp><abbr bid="B61">61</abbr><abbr bid="B62">62</abbr></abbrgrp>, which can be tested online. It combines a document-comparison algorithm called a 'Vector-cosine model' with the PSI-BLAST sequence retrieval method, which is especially useful for detecting sequences that are distantly related. Another strategy that makes use of sequence information and free text is MedBlast <abbrgrp><abbr bid="B63">63</abbr><abbr bid="B64">64</abbr></abbrgrp>; using the web-based interface of MedBlast, for a given query sequence and optional additional keywords the system returns articles related to the protein corresponding to the query sequence.</p>
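			<p>A vector-cosine comparison of documents scores two texts by the angle between their term vectors. The Python sketch below shows the general idea with raw term counts; the tokenizer, the absence of term weighting and the example texts are simplifying assumptions, and the code is not the scheme used by SAWTED.</p>
			<p><![CDATA[
```python
import math
import re
from collections import Counter


def term_vector(text):
    """Count lower-cased alphanumeric tokens in a text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse term-count vectors."""
    # A Counter returns 0 for terms it does not contain.
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented text descriptions used only to exercise the function.
query = term_vector("serine protease involved in blood coagulation")
hit_1 = term_vector("a serine protease of the coagulation cascade")
hit_2 = term_vector("transcription factor binding DNA")

print(cosine_similarity(query, hit_1))
print(cosine_similarity(query, hit_2))
```
]]></p>
			<p>In a combined system, a text score of this kind could be used to re-rank or filter sequence-search hits; how the two sources of evidence are weighted is a design decision that this sketch leaves open.</p>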
		</sec>
		<sec>
			<st>
				<p>Text mining and large gene collections</p>
			</st>
			<p>Technical advances in molecular biology mean that large collections of genes are nowadays often studied simultaneously using genomic approaches. Using conventional information retrieval to link these genes with the associated literature is not efficient, and a large list of irrelevant documents can be returned. For example, microarray experiments result in groups of genes with particular expression patterns; to interpret these groups in terms of the underlying biological meaning, information is needed not only on each individual gene but also on commonalities among the whole group. The functional information is commonly extracted from databases such as SwissProt <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> or GO <abbrgrp><abbr bid="B65">65</abbr></abbrgrp>, which in turn are populated by extracting relevant functional features from the literature.</p>
			<p>A number of text-mining methods have been developed for linking groups of genes found in microarrays and other experiments directly and automatically with information contained in biomedical article databases. The neighbor divergence per gene (NDPG) approach <abbrgrp><abbr bid="B66">66</abbr></abbrgrp> uses the literature to score the functional coherence of gene clusters. GEISHA <abbrgrp><abbr bid="B67">67</abbr><abbr bid="B68">68</abbr></abbrgrp> automatically mines the literature for functional terms associated with gene groups and carries out a statistical analysis of the significance of those terms. Among the available online tools for assisting in interpreting microarray data are MedMiner <abbrgrp><abbr bid="B69">69</abbr><abbr bid="B70">70</abbr></abbrgrp>, which can be used to filter and organize information from free text obtained from automatic PubMed <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> and GeneCard <abbrgrp><abbr bid="B71">71</abbr></abbrgrp> searches, and PubGene <abbrgrp><abbr bid="B72">72</abbr><abbr bid="B73">73</abbr></abbrgrp>, which has additional visualization capabilities for displaying network information and pathway mapping. The analysis of frequency matrices of term co-occurrences between two lists of keywords is the basis of the PubMatrix system <abbrgrp><abbr bid="B74">74</abbr><abbr bid="B75">75</abbr></abbrgrp>, which can be used online after registering. Finally, microGENIE <abbrgrp><abbr bid="B76">76</abbr></abbrgrp> enables semi-automatic queries of very large collections of genes (UniGene and SwissProt gene names and GenBank accession numbers) in PubMed to speed up the retrieval of relevant articles. It is important to realize that existing text-mining technologies in biology focus on identifying and linking functional information about proteins in free text; they do not currently provide automatically generated summaries of biologically relevant information.</p>
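			<p>A frequency matrix of term co-occurrences for two keyword lists, the kind of table on which PubMatrix is based, can be sketched in Python as follows. The abstracts are invented, and the case-insensitive substring matching is an assumption made for brevity; the sketch does not query PubMed or reproduce PubMatrix itself.</p>
			<p><![CDATA[
```python
def cooccurrence_matrix(documents, row_terms, column_terms):
    """Count documents that mention both a row term and a column term."""
    lowered = [doc.lower() for doc in documents]
    matrix = {}
    for row in row_terms:
        matrix[row] = {}
        for col in column_terms:
            matrix[row][col] = sum(
                1 for doc in lowered
                if row.lower() in doc and col.lower() in doc
            )
    return matrix


# Invented abstracts standing in for literature search results.
ABSTRACTS = [
    "TP53 mutations are frequent in breast cancer.",
    "BRCA1 is linked to hereditary breast cancer.",
    "TP53 regulates apoptosis after DNA damage.",
]
GENES = ["TP53", "BRCA1"]
KEYWORDS = ["breast cancer", "apoptosis"]

matrix = cooccurrence_matrix(ABSTRACTS, GENES, KEYWORDS)
for gene in GENES:
    print(gene, [matrix[gene][keyword] for keyword in KEYWORDS])
```
]]></p>
			<p>Each cell holds the number of documents in which a gene from the first list and a keyword from the second list appear together, so rows with similar profiles point to genes discussed in similar contexts.</p>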
		</sec>
		<sec>
			<st>
				<p>Towards knowledge discovery</p>
			</st>
			<p>The field of 'BioNLP' - text mining and information extraction for molecular biology - is very recent, but the existing applications are improving steadily. This is partly because of newly available resources, such as collections of annotated documents suitable for training new systems (for example, the GENIA <abbrgrp><abbr bid="B77">77</abbr></abbrgrp> corpus and the BioCreative <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> corpus). The improvement also reflects the effect of community-wide assessments such as the BioCreative contest <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> and the KDD challenge cup <abbrgrp><abbr bid="B78">78</abbr></abbrgrp>, which enable evaluation of the efficiency of different methodologies, and the genomics track of the Text Retrieval Conference (TREC) workshops <abbrgrp><abbr bid="B79">79</abbr><abbr bid="B80">80</abbr></abbrgrp>, a forum for developing solutions to information-retrieval and document-classification tasks in biology. The development of controlled, computer-readable vocabularies (ontologies), dictionaries, and functional keywords (Gene Ontology concepts <abbrgrp><abbr bid="B54">54</abbr></abbrgrp> and SwissProt keywords <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>) defining relevant biological aspects of proteins has also been valuable for text-mining tools. Because of the restricted availability of full-text articles, most of the existing text-mining systems for biology are centered on the analysis of abstracts, but changes in publishing policy and increasing access to repositories of whole articles make mining of full text a likely development in the near future. Some initiatives in this direction have been started already, for example the BioRAT system <abbrgrp><abbr bid="B81">81</abbr><abbr bid="B82">82</abbr></abbrgrp>, which processes full-text articles so as to identify target facts.</p>
			<p>Perhaps the most likely future developments will be the construction of networks and interactions for discovering new relationships through intermediate entities, followed by the proposal of new functions - this process is referred to as 'knowledge discovery'. Several exploratory attempts have been made to develop knowledge-discovery systems, but they are not yet of general practical use. Our SUISEKI system <abbrgrp><abbr bid="B83">83</abbr></abbrgrp>, for instance, extracts indirect relationships between proteins through associations with intermediate proteins in text. Two online tools that directly address the difficulty of making knowledge discovery practical are ARROWSMITH <abbrgrp><abbr bid="B84">84</abbr><abbr bid="B85">85</abbr></abbrgrp> and BITOLA <abbrgrp><abbr bid="B86">86</abbr><abbr bid="B87">87</abbr></abbrgrp>. ARROWSMITH <abbrgrp><abbr bid="B84">84</abbr><abbr bid="B85">85</abbr></abbrgrp> aims to discover indirect relations between two entities that are not directly connected in the literature; the intermediate through which they are related can be a substance or disease condition. BITOLA <abbrgrp><abbr bid="B86">86</abbr><abbr bid="B87">87</abbr></abbrgrp> is a biomedical discovery-support system with a focus on the discovery of disease candidate genes, taking advantage of Medical Subject Heading (MeSH) terms.</p>
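			<p>The search for indirect relations through intermediate entities can be sketched as a lookup for terms shared by two concepts that are not themselves directly connected. The co-occurrence table in the Python example below is a toy with placeholder names, and the function is a deliberately simple illustration; it does not reproduce the algorithms of SUISEKI, ARROWSMITH or BITOLA.</p>
			<p><![CDATA[
```python
def indirect_links(cooccurs, start, target):
    """Return intermediates linking start and target when no direct link exists."""
    neighbours_of_start = cooccurs.get(start, set())
    if target in neighbours_of_start:
        return set()  # already directly connected in the literature
    return neighbours_of_start.intersection(cooccurs.get(target, set()))


# Hypothetical table: each term maps to the terms it co-occurs with.
COOCCURS = {
    "substance A": {"process B1", "process B2"},
    "disease C": {"process B1", "process B3"},
    "process B1": {"substance A", "disease C"},
    "process B2": {"substance A"},
    "process B3": {"disease C"},
}

print(indirect_links(COOCCURS, "substance A", "disease C"))
```
]]></p>
			<p>Any intermediate returned is only a candidate hypothesis: a practical system would still need to rank the candidates and a biologist would need to assess them.</p>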
			<p>Undoubtedly, the development of text-mining applications specific to biology is the only way to cope with the increasing amount of free textual data produced in this field. The increasing interest of users in efficiently retrieving and extracting relevant information, the need to keep up with new discoveries described in the literature or in biological databases, and the demands posed by the analysis of high-throughput experiments are the underlying forces motivating the development of text-mining applications in molecular biology. These technologies should provide the foundation for future knowledge-discovery tools able to identify previously undiscovered associations, something that will assist in the formulation of models of biological systems.</p>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>The work of our group was supported by grants from the European Commission (ORIEL IST-2001-32688, TEMBLOR QLRT-2001-00015, Biosapiens LSHC-CT-2003-505265). We thank Robert Hoffmann for providing Figure <figr fid="F2">2</figr> and Christian Blaschke, as well as all the members of the group, for interesting discussions.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>Altavista</p>
				</title>
				<url>http://www.altavista.com</url>
			</bibl>
			<bibl id="B2">
				<title>
					<p>Google</p>
				</title>
				<url>http://www.google.com</url>
			</bibl>
			<bibl id="B3">
				<title>
					<p>Entrez: molecular biology database and retrieval system.</p>
				</title>
				<aug>
					<au>
						<snm>Schuler</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Epstein</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Ohkawa</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Kans</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Methods Enzymol</source>
				<pubdate>1996</pubdate>
				<volume>266</volume>
				<fpage>141</fpage>
				<lpage>162</lpage>
				<xrefbib>
					<pubid idtype="pmpid">8743683</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>Entrez PubMed</p>
				</title>
				<url>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed</url>
			</bibl>
			<bibl id="B5">
				<title>
					<p>Database resources of the National Center for Biotechnology.</p>
				</title>
				<aug>
					<au>
						<snm>Wheeler</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Church</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Federhen</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Lash</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Madden</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Pontius</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Schuler</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Schriml</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Sequeira</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Tatusova</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Wagner</snm>
						<fnm>L</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2003</pubdate>
				<volume>31</volume>
				<fpage>28</fpage>
				<lpage>33</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">165480</pubid>
						<pubid idtype="pmpid" link="fulltext">12519941</pubid>
						<pubid idtype="doi">10.1093/nar/gkg033</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>The ultimate search engine?</p>
				</title>
				<aug>
					<au>
						<cnm>Editorial</cnm>
					</au>
				</aug>
				<source>Nat Cell Biol</source>
				<pubdate>2005</pubdate>
				<volume>7</volume>
				<fpage>1</fpage>
			</bibl>
			<bibl id="B7">
				<title>
					<p>Google Scholar</p>
				</title>
				<url>http://scholar.google.com</url>
			</bibl>
			<bibl id="B8">
				<title>
					<p>CrossRef Search, publisher pilot for full-text scholarly research</p>
				</title>
				<url>http://www.crossref.org/crossrefsearch.html</url>
			</bibl>
			<bibl id="B9">
				<title>
					<p>Nature Publishing Group search engine</p>
				</title>
				<url>http://search.nature.com/search/?sp_a=sp1001702d&amp;sp_t=advanced&amp;sp_x_1=ujournal&amp;sp-p=all&amp;sp</url>
			</bibl>
			<bibl id="B10">
				<title>
					<p>Mining information for functional genomics.</p>
				</title>
				<aug>
					<au>
						<snm>Staab</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Blaschke</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Nedellec</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Park</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Schatz</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Valencia</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Bernardi</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Ratsch</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Kania</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Saric</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Rojas</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Staab</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>IEEE Intelligent Systems</source>
				<pubdate>2002</pubdate>
				<volume>17</volume>
				<fpage>66</fpage>
				<lpage>80</lpage>
				<xrefbib>
					<pubid idtype="doi">10.1109/5254.988491</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>Internet Grateful Med to be retired; reminder of NLM Gateway availability.</p>
				</title>
				<aug>
					<au>
						<snm>Knecht</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Shooshan</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>NLM Tech Bull</source>
				<pubdate>2001</pubdate>
				<volume>318</volume>
				<fpage>e3</fpage>
			</bibl>
			<bibl id="B12">
				<title>
					<p>Biomail</p>
				</title>
				<url>http://biomail.sourceforge.net/biomail</url>
			</bibl>
			<bibl id="B13">
				<title>
					<p>PubCrawler: keeping up comfortably with PubMed and GenBank.</p>
				</title>
				<aug>
					<au>
						<snm>Hokamp</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Wolfe</snm>
						<fnm>K</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2004</pubdate>
				<volume>32</volume>
				<fpage>W16</fpage>
				<lpage>W19</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">441591</pubid>
						<pubid idtype="pmpid" link="fulltext">15215341</pubid>
						<pubid idtype="doi">10.1093/nar/gnh017</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B14">
				<title>
					<p>PubCrawler</p>
				</title>
				<url>http://pubcrawler.gen.tcd.ie/</url>
			</bibl>
			<bibl id="B15">
				<title>
					<p>MEDLINE SDI services: how do they compare?</p>
				</title>
				<aug>
					<au>
						<snm>Shultz</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>DeGroote</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>J Med Libr Assoc</source>
				<pubdate>2003</pubdate>
				<volume>91</volume>
				<fpage>460</fpage>
				<lpage>467</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">209512</pubid>
						<pubid idtype="pmpid">14566377</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B16">
				<title>
					<p>Expasy - SwissProt and TrEMBL</p>
				</title>
				<url>http://us.expasy.org/sprot</url>
			</bibl>
			<bibl id="B17">
				<title>
					<p>Gene name ambiguity of eukaryotic nomenclatures.</p>
				</title>
				<aug>
					<au>
						<snm>Chen</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Liu</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Friedman</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2005</pubdate>
				<volume>21</volume>
				<fpage>248</fpage>
				<lpage>256</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/bth496</pubid>
						<pubid idtype="pmpid" link="fulltext">15333458</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B18">
				<title>
					<p>Mistaken identifiers: gene name errors can be introduced inadvertently when using Excel in bioinformatics.</p>
				</title>
				<aug>
					<au>
						<snm>Zeeberg</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Riss</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Kane</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Bussey</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Uchio</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Linehan</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Barrett</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Weinstein</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2004</pubdate>
				<volume>5</volume>
				<fpage>80</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">459209</pubid>
						<pubid idtype="pmpid" link="fulltext">15214961</pubid>
						<pubid idtype="doi">10.1186/1471-2105-5-80</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B19">
				<title>
					<p>Overview of BioCreAtIvE: critical assessment of information extraction for biology.</p>
				</title>
				<aug>
					<au>
						<snm>Hirschman</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Yeh</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Blaschke</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Valencia</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2005</pubdate>
				<volume>6</volume>
				<issue>Suppl 1</issue>
				<fpage>S1</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1186/1471-2105-6-S1-S1</pubid>
						<pubid idtype="pmpid" link="fulltext">15960821</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B20">
				<title>
					<p>Introduction to the bioentity recognition task at JNLPBA.</p>
				</title>
				<aug>
					<au>
						<snm>Kim</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Ohta</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Tsuruoka</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Tateisi</snm>
						<fnm>Y</fnm>
					</au>
				</aug>
				<source>Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications Geneva</source>
				<fpage>70</fpage>
				<lpage>76</lpage>
				<url>http://www.genisis.ch/~natlang/JNLPBA04/JNLPBA.final.pdf</url>
				<note>28-29 August 2004</note>
			</bibl>
			<bibl id="B21">
				<title>
					<p>BioCreAtIvE task 1A: gene mention finding evaluation.</p>
				</title>
				<aug>
					<au>
						<snm>Yeh</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Morgan</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Colosimo</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Hirschman</snm>
						<fnm>L</fnm>
					</au>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2005</pubdate>
				<volume>6</volume>
				<issue>Suppl 1</issue>
				<fpage>S2</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1186/1471-2105-6-S1-S2</pubid>
						<pubid idtype="pmpid" link="fulltext">15960832</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B22">
				<title>
					<p>Using BLAST for identifying gene and protein names in journal articles.</p>
				</title>
				<aug>
					<au>
						<snm>Krauthammer</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Rzhetsky</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Morozov</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Friedman</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Gene</source>
				<pubdate>2000</pubdate>
				<volume>259</volume>
				<fpage>245</fpage>
				<lpage>252</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0378-1119(00)00431-5</pubid>
						<pubid idtype="pmpid" link="fulltext">11163982</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B23">
				<title>
					<p>GAPSCORE: finding gene and protein names one word at a time.</p>
				</title>
				<aug>
					<au>
						<snm>Chang</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Schutze</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Altman</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2004</pubdate>
				<volume>20</volume>
				<fpage>216</fpage>
				<lpage>225</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/btg393</pubid>
						<pubid idtype="pmpid" link="fulltext">14734313</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B24">
				<title>
					<p>Gene and Protein Name Server</p>
				</title>
				<url>http://bionlp.stanford.edu/gapscore</url>
			</bibl>
			<bibl id="B25">
				<title>
					<p>NLProt: extracting protein names and sequences from papers.</p>
				</title>
				<aug>
					<au>
						<snm>Mika</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Rost</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2004</pubdate>
				<volume>32</volume>
				<fpage>W634</fpage>
				<lpage>W637</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">441565</pubid>
						<pubid idtype="pmpid" link="fulltext">15215466</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B26">
				<title>
					<p>CUBIC: NLProt/Index</p>
				</title>
				<url>http://cubic.bioc.columbia.edu/services/nlprot</url>
			</bibl>
			<bibl id="B27">
				<title>
					<p>Protein names and how to find them.</p>
				</title>
				<aug>
					<au>
						<snm>Franzen</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Eriksson</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Olsson</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Asker</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Liden</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Coster</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Int J Med Inform</source>
				<pubdate>2002</pubdate>
				<volume>67</volume>
				<fpage>49</fpage>
				<lpage>61</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S1386-5056(02)00052-7</pubid>
						<pubid idtype="pmpid" link="fulltext">12460631</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B28">
				<title>
					<p>Yapex</p>
				</title>
				<url>http://ellis.sics.se:8080/cgi-bin/Yapex/yapex.cgi</url>
			</bibl>
			<bibl id="B29">
				<title>
					<p>Tagging gene and protein names in biomedical text.</p>
				</title>
				<aug>
					<au>
						<snm>Tanabe</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Wilbur</snm>
						<fnm>W</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2002</pubdate>
				<volume>18</volume>
				<fpage>1124</fpage>
				<lpage>1132</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/18.8.1124</pubid>
						<pubid idtype="pmpid" link="fulltext">12176836</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B30">
				<title>
					<p>AbGene</p>
				</title>
				<url>ftp://ftp.ncbi.nlm.nih.gov/pub/tanabe</url>
			</bibl>
			<bibl id="B31">
				<title>
					<p>Biomedical named entity recognition using conditional random fields and rich feature sets.</p>
				</title>
				<aug>
					<au>
						<snm>Settles</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>Proc. NLPBA/COLING 2004</source>
				<pubdate>2004</pubdate>
			</bibl>
			<bibl id="B32">
				<title>
					<p>ABNER: a biomedical named entity recognizer</p>
				</title>
				<url>http://www.cs.wisc.edu/~bsettles/abner</url>
			</bibl>
			<bibl id="B33">
				<title>
					<p>Toward information extraction: identifying protein names from biological papers.</p>
				</title>
				<aug>
					<au>
						<snm>Fukuda</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Tsunoda</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Tamura</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Takagi</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>Pac Symp Biocomput</source>
				<pubdate>1998</pubdate>
				<volume>3</volume>
				<fpage>707</fpage>
				<lpage>718</lpage>
			</bibl>
			<bibl id="B34">
				<title>
					<p>KeX</p>
				</title>
				<url>http://www.hgc.jp/service/tooldoc/KeX</url>
			</bibl>
			<bibl id="B35">
				<title>
					<p>Creating an online dictionary of abbreviations from MEDLINE.</p>
				</title>
				<aug>
					<au>
						<snm>Chang</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Schuetze</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Altman</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>J Am Med Inform Assoc</source>
				<pubdate>2002</pubdate>
				<volume>9</volume>
				<fpage>612</fpage>
				<lpage>620</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">349378</pubid>
						<pubid idtype="pmpid" link="fulltext">12386112</pubid>
						<pubid idtype="doi">10.1197/jamia.M1139</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B36">
				<title>
					<p>Biomedical Abbreviation Server</p>
				</title>
				<url>http://bionlp.stanford.edu/abbreviation</url>
			</bibl>
			<bibl id="B37">
				<title>
					<p>AliasServer: a web server to handle multiple aliases used to refer to proteins.</p>
				</title>
				<aug>
					<au>
						<snm>Iragne</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Barre</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Goffard</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>DeDaruvar</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2004</pubdate>
				<volume>20</volume>
				<fpage>2331</fpage>
				<lpage>2332</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/bth241</pubid>
						<pubid idtype="pmpid" link="fulltext">15059813</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B38">
				<title>
					<p>AliasServer</p>
				</title>
				<url>http://cbi.labri.fr/outils/alias/index.php</url>
			</bibl>
			<bibl id="B39">
				<title>
					<p>Life cycles of successful genes.</p>
				</title>
				<aug>
					<au>
						<snm>Hoffmann</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Valencia</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Trends Genet</source>
				<pubdate>2003</pubdate>
				<volume>19</volume>
				<fpage>79</fpage>
				<lpage>81</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0168-9525(02)00014-8</pubid>
						<pubid idtype="pmpid" link="fulltext">12547515</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B40">
				<title>
					<p>A gene network for navigating the literature.</p>
				</title>
				<aug>
					<au>
						<snm>Hoffmann</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Valencia</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Nat Genet</source>
				<pubdate>2004</pubdate>
				<volume>36</volume>
				<fpage>664</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/ng0704-664</pubid>
						<pubid idtype="pmpid" link="fulltext">15226743</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B41">
				<title>
					<p>IntAct: an open source molecular interaction database.</p>
				</title>
				<aug>
					<au>
						<snm>Hermjakob</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Montecchi-Palazzi</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Lewington</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Mudali</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Kerrien</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Orchard</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Vingron</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Roechert</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Roepstorff</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Valencia</snm>
						<fnm>A</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2004</pubdate>
				<volume>32</volume>
				<fpage>D452</fpage>
				<lpage>D455</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">308786</pubid>
						<pubid idtype="pmpid" link="fulltext">14681455</pubid>
						<pubid idtype="doi">10.1093/nar/gkh052</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B42">
				<title>
					<p>Information hyperlinked over proteins (iHOP)</p>
				</title>
				<url>http://www.pdg.cnb.uam.es/UniPub/iHOP</url>
			</bibl>
			<bibl id="B43">
				<title>
					<p>The frame-based module of the Suiseki information extraction system.</p>
				</title>
				<aug>
					<au>
						<snm>Blaschke</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Valencia</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>IEEE Intelligent Systems</source>
				<pubdate>2002</pubdate>
				<volume>17</volume>
				<fpage>14</fpage>
				<lpage>20</lpage>
			</bibl>
			<bibl id="B44">
				<title>
					<p>PreBIND and Textomy - mining the biomedical literature for protein-protein interactions using a support vector machine.</p>
				</title>
				<aug>
					<au>
						<snm>Donaldson</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Martin</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>deBruijn</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Wolting</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Lay</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Tuekam</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Baskin</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Bader</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Michalickova</snm>
						<fnm>K</fnm>
					</au>
					<etal/>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2003</pubdate>
				<volume>4</volume>
				<fpage>11</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">153503</pubid>
						<pubid idtype="pmpid" link="fulltext">12689350</pubid>
						<pubid idtype="doi">10.1186/1471-2105-4-11</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B45">
				<title>
					<p>BIND - The Biomolecular Interaction Network</p>
				</title>
				<url>http://bind.ca</url>
			</bibl>
			<bibl id="B46">
				<title>
					<p>Kinase pathway database: an integrated protein-kinase and NLP-based protein-interaction resource.</p>
				</title>
				<aug>
					<au>
						<snm>Koike</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Kobayashi</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Takagi</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2003</pubdate>
				<volume>13</volume>
				<fpage>1231</fpage>
				<lpage>1243</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">403651</pubid>
						<pubid idtype="pmpid" link="fulltext">12799355</pubid>
						<pubid idtype="doi">10.1101/gr.835903</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B47">
				<title>
					<p>Kinase Pathway database</p>
				</title>
				<url>http://kinasedb.ontology.ims.u-tokyo.ac.jp</url>
			</bibl>
			<bibl id="B48">
				<title>
					<p>Textpresso: an ontology-based information retrieval and extraction system for biological literature.</p>
				</title>
				<aug>
					<au>
						<snm>Muller</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Kenny</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Sternberg</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>PLoS Biol</source>
				<pubdate>2004</pubdate>
				<volume>2</volume>
				<fpage>e309</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">517822</pubid>
						<pubid idtype="pmpid" link="fulltext">15383839</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B49">
				<title>
					<p>Textpresso</p>
				</title>
				<url>http://www.textpresso.org</url>
			</bibl>
			<bibl id="B50">
				<title>
					<p>Wormbase</p>
				</title>
				<url>http://www.wormbase.org</url>
			</bibl>
			<bibl id="B51">
				<title>
					<p>GOAnnotator</p>
				</title>
				<url>http://xldb.fc.ul.pt/rebil/tools/goa</url>
			</bibl>
			<bibl id="B52">
				<title>
					<p>Gene annotation from scientific literature using mappings between keyword systems.</p>
				</title>
				<aug>
					<au>
						<snm>Perez</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Perez-Iratxeta</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Bork</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Thode</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Andrade</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2004</pubdate>
				<volume>20</volume>
				<fpage>2084</fpage>
				<lpage>2091</lpage>
				<xrefbib>
					<pubid idtype="doi">10.1093/bioinformatics/bth207</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B53">
				<title>
					<p>KAT</p>
				</title>
				<url>http://www.bork.embl-heidelberg.de/kat</url>
			</bibl>
			<bibl id="B54">
				<title>
					<p>An Introduction to the Gene Ontology</p>
				</title>
				<url>http://www.geneontology.org/GO.doc.shtml</url>
			</bibl>
			<bibl id="B55">
				<title>
					<p>iProLINK: an integrated protein resource for literature mining.</p>
				</title>
				<aug>
					<au>
						<snm>Hu</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Mani</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Hermoso</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Liu</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Wu</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Comput Biol Chem</source>
				<pubdate>2004</pubdate>
				<volume>28</volume>
				<fpage>409</fpage>
				<lpage>416</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/j.compbiolchem.2004.09.010</pubid>
						<pubid idtype="pmpid" link="fulltext">15556482</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B56">
				<title>
					<p>iProLINK</p>
				</title>
				<url>http://pir.georgetown.edu/iprolink</url>
			</bibl>
			<bibl id="B57">
				<title>
					<p>Content-rich biological network constructed by mining PubMed abstracts.</p>
				</title>
				<aug>
					<au>
						<snm>Che</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Sharp</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2004</pubdate>
				<volume>5</volume>
				<fpage>147</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">528731</pubid>
						<pubid idtype="pmpid" link="fulltext">15473905</pubid>
						<pubid idtype="doi">10.1186/1471-2105-5-147</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B58">
				<title>
					<p>Chilibot</p>
				</title>
				<url>http://www.chilibot.net</url>
			</bibl>
			<bibl id="B59">
				<title>
					<p>Filling preposition-based templates to capture information from medical abstracts.</p>
				</title>
				<aug>
					<au>
						<snm>Leroy</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Chen</snm>
						<fnm>H</fnm>
					</au>
				</aug>
				<source>Pac Symp Biocomput</source>
				<pubdate>2002</pubdate>
				<fpage>350</fpage>
				<lpage>361</lpage>
				<xrefbib>
					<pubid idtype="pmpid">11928489</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B60">
				<title>
					<p>GeneScene</p>
				</title>
				<url>http://genescene.arizona.edu/index.html</url>
			</bibl>
			<bibl id="B61">
				<title>
					<p>SAWTED: structure assignment with text description-enhanced detection of remote homologues with automated SWISS-PROT annotation comparisons.</p>
				</title>
				<aug>
					<au>
						<snm>MacCallum</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Kelley</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Sternberg</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2000</pubdate>
				<volume>16</volume>
				<fpage>125</fpage>
				<lpage>129</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/16.2.125</pubid>
						<pubid idtype="pmpid" link="fulltext">10842733</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B62">
				<title>
					<p>SAWTED</p>
				</title>
				<url>http://www.sbg.bio.ic.ac.uk/~sawted</url>
			</bibl>
			<bibl id="B63">
				<title>
					<p>MedBlast: searching articles related to a biological sequence.</p>
				</title>
				<aug>
					<au>
						<snm>Tu</snm>
						<fnm>Q</fnm>
					</au>
					<au>
						<snm>Tang</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Ding</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2004</pubdate>
				<volume>20</volume>
				<fpage>75</fpage>
				<lpage>77</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/btg375</pubid>
						<pubid idtype="pmpid" link="fulltext">14693811</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B64">
				<title>
					<p>MedBlast</p>
				</title>
				<url>http://medblast.sibsnet.org</url>
			</bibl>
			<bibl id="B65">
				<title>
					<p>FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes.</p>
				</title>
				<aug>
					<au>
						<snm>Al-Shahrour</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Diaz-Uriarte</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Dopazo</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2004</pubdate>
				<volume>20</volume>
				<fpage>578</fpage>
				<lpage>580</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/btg455</pubid>
						<pubid idtype="pmpid" link="fulltext">14990455</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B66">
				<title>
					<p>A literature-based method for assessing the functional coherence of a gene group.</p>
				</title>
				<aug>
					<au>
						<snm>Raychaudhuri</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Altman</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2003</pubdate>
				<volume>19</volume>
				<fpage>396</fpage>
				<lpage>401</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/btg002</pubid>
						<pubid idtype="pmpid" link="fulltext">12584126</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B67">
				<title>
					<p>Expression profiles and biological function.</p>
				</title>
				<aug>
					<au>
						<snm>Oliveros</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Blaschke</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Herrero</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Dopazo</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Valencia</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Genome Inform Ser Workshop Genome Inform</source>
				<pubdate>2000</pubdate>
				<volume>11</volume>
				<fpage>106</fpage>
				<lpage>117</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11700592</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B68">
				<title>
					<p>DNA Array Analysis with Geisha</p>
				</title>
				<url>http://www.pdg.cnb.uam.es/blaschke/cgi-bin/geisha</url>
			</bibl>
			<bibl id="B69">
				<title>
					<p>MedMiner: an Internet text-mining tool for biomedical information, with application to gene expression profiling.</p>
				</title>
				<aug>
					<au>
						<snm>Tanabe</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Scherf</snm>
						<fnm>U</fnm>
					</au>
					<au>
						<snm>Smith</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Lee</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Hunter</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Weinstein</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Biotechniques</source>
				<pubdate>1999</pubdate>
				<volume>27</volume>
				<fpage>1210</fpage>
				<lpage>1217</lpage>
				<xrefbib>
					<pubid idtype="pmpid">10631500</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B70">
				<title>
					<p>MedMiner</p>
				</title>
				<url>http://discover.nci.nih.gov/textmining/main.jsp</url>
			</bibl>
			<bibl id="B71">
				<title>
					<p>GeneCards</p>
				</title>
				<url>http://bioinformatics.weizmann.ac.il/cards</url>
			</bibl>
			<bibl id="B72">
				<title>
					<p>A literature network of human genes for high-throughput analysis of gene expression.</p>
				</title>
				<aug>
					<au>
						<snm>Jenssen</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Laegreid</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Komorowski</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Hovig</snm>
						<fnm>E</fnm>
					</au>
				</aug>
				<source>Nat Genet</source>
				<pubdate>2001</pubdate>
				<volume>28</volume>
				<fpage>21</fpage>
				<lpage>28</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/88213</pubid>
						<pubid idtype="pmpid" link="fulltext">11326270</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B73">
				<title>
					<p>PubGene</p>
				</title>
				<url>http://www.pubgene.org</url>
			</bibl>
			<bibl id="B74">
				<title>
					<p>PubMatrix: a tool for multiplex literature mining.</p>
				</title>
				<aug>
					<au>
						<snm>Becker</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Hosack</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Dennis</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Lempicki</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Bright</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Cheadle</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Engel</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2003</pubdate>
				<volume>4</volume>
				<fpage>61</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">317283</pubid>
						<pubid idtype="pmpid" link="fulltext">14667255</pubid>
						<pubid idtype="doi">10.1186/1471-2105-4-61</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B75">
				<title>
					<p>PubMatrix</p>
				</title>
				<url>http://pubmatrix.grc.nia.nih.gov/secure-bin/index.pl</url>
			</bibl>
			<bibl id="B76">
				<title>
					<p>MicroGENIE</p>
				</title>
				<url>http://www.cs.vu.nl/microgenie</url>
			</bibl>
			<bibl id="B77">
				<title>
					<p>GENIA corpus - semantically annotated corpus for biotextmining.</p>
				</title>
				<aug>
					<au>
						<snm>Kim</snm>
						<fnm>JD</fnm>
					</au>
					<au>
						<snm>Ohta</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Tateisi</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Tsujii</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2003</pubdate>
				<volume>19</volume>
				<fpage>i180</fpage>
				<lpage>i182</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/btg1023</pubid>
						<pubid idtype="pmpid" link="fulltext">12855455</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B78">
				<title>
					<p>Evaluation of text data mining for database curation: lessons learned from the KDD Challenge Cup.</p>
				</title>
				<aug>
					<au>
						<snm>Yeh</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Hirschman</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Morgan</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2003</pubdate>
				<volume>19 (Suppl 1)</volume>
				<fpage>i331</fpage>
				<lpage>i339</lpage>
				<xrefbib>
					<pubid idtype="doi">10.1093/bioinformatics/btg1046</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B79">
				<title>
					<p>TREC GENOMICS track overview.</p>
				</title>
				<aug>
					<au>
						<snm>Hersh</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Bhupatiraju</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>Proceedings of the Twelfth Text Retrieval Conference 18-21 November Gaithersburg</source>
				<publisher>Gaithersburg: National Institute of Standards and Technology</publisher>
				<editor>Voorhees EM, Buckland LP</editor>
				<pubdate>2003</pubdate>
				<fpage>14</fpage>
				<lpage>24</lpage>
			</bibl>
			<bibl id="B80">
				<title>
					<p>TREC Genomics Track</p>
				</title>
				<url>http://ir.ohsu.edu/genomics</url>
			</bibl>
			<bibl id="B81">
				<title>
					<p>BioRAT: extracting biological information from full-length papers.</p>
				</title>
				<aug>
					<au>
						<snm>Corney</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Buxton</snm>
						<fnm>BF</fnm>
					</au>
					<au>
						<snm>Langdon</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Jones</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2004</pubdate>
				<volume>20</volume>
				<fpage>3206</fpage>
				<lpage>3213</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/bth386</pubid>
						<pubid idtype="pmpid" link="fulltext">15231534</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B82">
				<title>
					<p>BioRAT</p>
				</title>
				<url>http://bioinf.cs.ucl.ac.uk/biorat</url>
			</bibl>
			<bibl id="B83">
				<title>
					<p>The potential use of SUISEKI as a protein interaction discovery tool.</p>
				</title>
				<aug>
					<au>
						<snm>Blaschke</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Valencia</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Genome Inform Ser Workshop Genome Inform</source>
				<pubdate>2001</pubdate>
				<volume>12</volume>
				<fpage>123</fpage>
				<lpage>134</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11791231</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B84">
				<title>
					<p>Using ARROWSMITH: a computer-assisted approach to formulating and assessing scientific hypotheses.</p>
				</title>
				<aug>
					<au>
						<snm>Smalheiser</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Swanson</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Comput Methods Programs Biomed</source>
				<pubdate>1998</pubdate>
				<volume>57</volume>
				<fpage>149</fpage>
				<lpage>153</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0169-2607(98)00033-9</pubid>
						<pubid idtype="pmpid" link="fulltext">9822851</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B85">
				<title>
					<p>ARROWSMITH</p>
				</title>
				<url>http://kiwi.uchicago.edu/</url>
			</bibl>
			<bibl id="B86">
				<title>
					<p>Literature-based disease candidate gene discovery.</p>
				</title>
				<aug>
					<au>
						<snm>Hristovski</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Peterlin</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>Proceedings of Medinfo 2004</source>
				<publisher>Bethesda: American Medical Informatics Association</publisher>
				<editor>Fieschi M</editor>
				<pubdate>2004</pubdate>
				<fpage>1649</fpage>
			</bibl>
			<bibl id="B87">
				<title>
					<p>BITOLA - Biomedical Discovery Support System</p>
				</title>
				<url>http://www.mf.uni-lj.si/bitola</url>
			</bibl>
			<bibl id="B88">
				<title>
					<p>Heuristics for identification of acronym-definition patterns within text: towards an automated construction of comprehensive acronym-definition dictionaries.</p>
				</title>
				<aug>
					<au>
						<snm>Wren</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Garner</snm>
						<fnm>H</fnm>
					</au>
				</aug>
				<source>Methods Inf Med</source>
				<pubdate>2002</pubdate>
				<volume>41</volume>
				<fpage>426</fpage>
				<lpage>434</lpage>
				<xrefbib>
					<pubid idtype="pmpid">12501816</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B89">
				<title>
					<p>ARGH - Biomedical Acronym Resolver</p>
				</title>
				<url>http://invention.swmed.edu/argh</url>
			</bibl>
			<bibl id="B90">
				<title>
					<p>Relationship Extractor</p>
				</title>
				<url>http://www-personal.engin.umich.edu/~murthyr/Relationship_Extractor.html</url>
			</bibl>
			<bibl id="B91">
				<title>
					<p>XplorMed: a tool for exploring MEDLINE abstracts.</p>
				</title>
				<aug>
					<au>
						<snm>Perez-Iratxeta</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Bork</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Andrade</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Trends Biochem Sci</source>
				<pubdate>2001</pubdate>
				<volume>26</volume>
				<fpage>573</fpage>
				<lpage>575</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0968-0004(01)01926-0</pubid>
						<pubid idtype="pmpid" link="fulltext">11551795</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B92">
				<title>
					<p>XplorMed</p>
				</title>
				<url>http://www.bork.embl-heidelberg.de/xplormed</url>
			</bibl>
			<bibl id="B93">
				<title>
					<p>Scopus</p>
				</title>
				<url>http://www.scopus.com/scopus/home.url</url>
			</bibl>
		</refgrp>
	</bm>
</art>
