<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art><ui>1471-2105-12-188</ui><ji>1471-2105</ji><fm>
<dochead>Research article</dochead>
<bibl>
<title>
<p>The biomedical discourse relation bank</p>
</title>
<aug>
<au id="A1"><snm>Prasad</snm><fnm>Rashmi</fnm><insr iid="I1"/><email>rjprasad@seas.upenn.edu</email></au>
<au id="A2"><snm>McRoy</snm><fnm>Susan</fnm><insr iid="I4"/><email>mcroy@tigger.cs.uwm.edu</email></au>
<au id="A3"><snm>Frid</snm><fnm>Nadya</fnm><insr iid="I3"/><email>nadyafrid@yahoo.com</email></au>
<au id="A4"><snm>Joshi</snm><fnm>Aravind</fnm><insr iid="I1"/><insr iid="I2"/><email>joshi@seas.upenn.edu</email></au>
<au ca="yes" id="A5"><snm>Yu</snm><fnm>Hong</fnm><insr iid="I3"/><insr iid="I4"/><email>hongyu@uwm.edu</email></au>
</aug>
<insg>
<ins id="I1"><p>Institute for Research in Cognitive Science, University of Pennsylvania, 3401 Walnut Street, Philadelphia, PA 19104, USA</p></ins>
<ins id="I2"><p>Department of Computer and Information Science, University of Pennsylvania, 3330 Walnut Street, Philadelphia, PA 19104, USA</p></ins>
<ins id="I3"><p>Department of Health Sciences, University of Wisconsin-Milwaukee, P.O. Box 413, Milwaukee, WI 53201, USA</p></ins>
<ins id="I4"><p>Department of Electrical Engineering and Computer Science, University of Wisconsin-Milwaukee, P.O. Box 784, Milwaukee, WI 53201, USA</p></ins>
</insg>
<source>BMC Bioinformatics</source>
<issn>1471-2105</issn>
<pubdate>2011</pubdate>
<volume>12</volume>
<issue>1</issue>
<fpage>188</fpage>
<url>http://www.biomedcentral.com/1471-2105/12/188</url>
<xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-12-188</pubid><pubid idtype="pmpid">21605399</pubid></pubidlist></xrefbib>
</bibl>
<history><rec><date><day>14</day><month>10</month><year>2010</year></date></rec><acc><date><day>23</day><month>5</month><year>2011</year></date></acc><pub><date><day>23</day><month>5</month><year>2011</year></date></pub></history>
<cpyrt><year>2011</year><collab>Prasad et al; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<abs>
<sec>
<st>
<p>Abstract</p>
</st>
<sec>
<st>
<p>Background</p>
</st>
<p>Identification of discourse relations, such as causal and contrastive relations, between situations mentioned in text is an important task for biomedical text-mining. A biomedical text corpus annotated with discourse relations would be very useful for developing and evaluating methods for biomedical discourse processing. However, little effort has been made to develop such an annotated resource.</p>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<p>We have developed the Biomedical Discourse Relation Bank (BioDRB), in which we have annotated explicit and implicit discourse relations in 24 open-access full-text biomedical articles from the GENIA corpus. Guidelines for the annotation were adapted from the Penn Discourse TreeBank (PDTB), which has discourse relations annotated over open-domain news articles. We introduced new conventions and modifications to the sense classification. We report reliable inter-annotator agreement of over 80% for all sub-tasks. Experiments for identifying the sense of explicit discourse connectives show that the connective itself is a highly reliable indicator for coarse sense classification (accuracy 90.9% and F1 score 0.89). These results are comparable to results obtained with the same classifier on the PDTB data. With more refined sense classification, performance degrades (accuracy 69.2% and F1 score 0.28), mainly due to sparsity in the data. The size of the corpus was found to be sufficient for identifying the sense of explicit connectives, with classifier performance stabilizing at about 1900 training instances. Finally, the classifier performs poorly when trained on PDTB and tested on BioDRB (accuracy 54.5% and F1 score 0.57).</p>
</sec>
<sec>
<st>
<p>Conclusion</p>
</st>
<p>Our work shows that discourse relations can be reliably annotated in biomedical text. Coarse sense disambiguation of explicit connectives can be done with high reliability by using just the connective as a feature, but more refined sense classification requires either richer features or more annotated data. The poor performance of a classifier trained in the open domain and tested in the biomedical domain suggests significant differences in the semantic usage of connectives across these domains, and provides robust evidence for a biomedical sublanguage for discourse and the need to develop a specialized biomedical discourse annotated corpus. The results of our cross-domain experiments are consistent with related work on identifying connectives in BioDRB.</p>
</sec>
</sec>
</abs>
</fm><bdy>
<sec>
<st>
<p>Background</p>
</st>
<p>Biomedical literature is a rich resource of biomedical knowledge. The desire to retrieve, organize, and extract biomedical knowledge from literature and then analyze the knowledge has boosted research in biomedical text mining. As described in recent reviews <abbrgrp>
<abbr bid="B1">1</abbr>
<abbr bid="B2">2</abbr>
<abbr bid="B3">3</abbr>
<abbr bid="B4">4</abbr>
</abbrgrp>, the past 10 years have shown significant research developments in named entity recognition <abbrgrp>
<abbr bid="B5">5</abbr>
<abbr bid="B6">6</abbr>
<abbr bid="B7">7</abbr>
</abbrgrp>, relation extraction <abbrgrp>
<abbr bid="B8">8</abbr>
<abbr bid="B9">9</abbr>
</abbrgrp>, information retrieval <abbrgrp>
<abbr bid="B10">10</abbr>
<abbr bid="B11">11</abbr>
</abbrgrp>, hypothesis generation <abbrgrp>
<abbr bid="B12">12</abbr>
</abbrgrp>, summarization <abbrgrp>
<abbr bid="B13">13</abbr>
<abbr bid="B14">14</abbr>
<abbr bid="B15">15</abbr>
<abbr bid="B16">16</abbr>
</abbrgrp>, multimedia <abbrgrp>
<abbr bid="B17">17</abbr>
<abbr bid="B18">18</abbr>
<abbr bid="B19">19</abbr>
<abbr bid="B20">20</abbr>
<abbr bid="B21">21</abbr>
</abbrgrp>, and question answering <abbrgrp>
<abbr bid="B22">22</abbr>
<abbr bid="B23">23</abbr>
</abbrgrp>. Garzone and Mercer <abbrgrp>
<abbr bid="B24">24</abbr>
<abbr bid="B25">25</abbr>
</abbrgrp> and Mercer and DiMarco <abbrgrp>
<abbr bid="B26">26</abbr>
</abbrgrp> have explored how to connect a citing paper to the work it cites. Light et al. <abbrgrp>
<abbr bid="B27">27</abbr>
</abbrgrp> have identified the use of speculative language in biomedical text. Wilbur et al. <abbrgrp>
<abbr bid="B28">28</abbr>
<abbr bid="B29">29</abbr>
</abbrgrp> defined five qualitative dimensions (i.e., <it>focus, polarity, certainty, evidence </it>and <it>directionality</it>) for categorizing the intention of a sentence.</p>
<p>Looking at larger units of text, Mullen et al. <abbrgrp>
<abbr bid="B30">30</abbr>
</abbrgrp> and Yu et al. <abbrgrp>
<abbr bid="B20">20</abbr>
<abbr bid="B31">31</abbr>
</abbrgrp> defined discourse zones of biomedical text, including <it>introduction, method, result</it>, and <it>conclusion</it>, and developed supervised machine-learning approaches to automatically classify each sentence into a rhetorical zone category. Biber and Jones <abbrgrp>
<abbr bid="B32">32</abbr>
</abbrgrp> adapted unsupervised TextTiling methods <abbrgrp>
<abbr bid="B33">33</abbr>
</abbrgrp> to segment biomedical text into different discourse units on the basis of lexical similarities among the units. "BioContrasts" <abbrgrp>
<abbr bid="B34">34</abbr>
</abbrgrp> is an information extraction system that extracts contrastive information between proteins from texts on the basis of manually curated rules and regular expressions that focus on <it>negation </it>as an expression of contrast. Castano et al. <abbrgrp>
<abbr bid="B35">35</abbr>
</abbrgrp> built a system for anaphora resolution in biomedical literature. Szarvas et al. <abbrgrp>
<abbr bid="B36">36</abbr>
</abbrgrp> annotated negation, speculation and scope in biomedical text. Agarwal and Yu <abbrgrp>
<abbr bid="B37">37</abbr>
<abbr bid="B38">38</abbr>
</abbrgrp> have investigated the detection of hedges, negation, and their scopes in biomedical literature.</p>
<p>One important output of this research on biomedical text has been the creation of new annotated resources specific to the biomedical domain. For example, the GENIA corpus is a collection of biomedical literature, annotated with various levels of linguistic and semantic information, including coreference <abbrgrp>
<abbr bid="B39">39</abbr>
</abbrgrp>. The ART corpus <abbrgrp>
<abbr bid="B40">40</abbr>
<abbr bid="B41">41</abbr>
</abbrgrp> contains sentence-wise annotations of scientific papers (covering topics in physical chemistry and biochemistry) with core scientific concepts (e.g. <it>goal, hypothesis, experiment, method, result, conclusion, motivation, observation</it>). These resources are valuable because they can be used to evaluate the effectiveness of text-mining methods developed for the biomedical domain. They can also be used to evaluate whether methods developed for the open domain can generalize to biomedical literature, which then determines whether new biomedical-specific training data needs to be created.</p>
<p>To date, there has been little work on processing or annotating <it>discourse relations </it>in biomedical text. A <it>discourse </it>is considered to be a coherent sequence of clauses, sentences or propositions. <it>Discourse relations</it>, such as causal, temporal, and contrastive relations, are relations between eventualities and propositions mentioned in a text, from which we can draw deep or complex inferences about the text. Often, discourse relations are realized in text by explicit words and phrases, called <it>discourse connectives</it>, but they can also be implicit.</p>
<p>Many tasks, including question answering and information extraction, require one to retrieve and process information that spans more than a single sentence while also recognizing discourse relations that exist between sentences. For instance, in Example (1), queries related to the "conflicting interactions of MRL631 with &#947;-secretase" can only be answered accurately once the contrastive discourse relation, expressed with the connective <it>however</it>, between the two sentences is identified.</p>
<p indent="1">(1) Our studies suggest that MRL631 is not able to access intracellular <it>&#947;</it>-secretase for APP processing and APP trafficking. <ul>However</ul>, it interacts with <it>&#947;</it>-secretase residing at the cell surface for Notch processing. From <abbrgrp>
<abbr bid="B42">42</abbr>
</abbrgrp>.</p>
<p>Causal and justification relations also constitute a very important part of the knowledge dealt with in information extraction, and they are often expressed across sentences. For instance, the connective <it>therefore </it>in Example (2) signals a justification relation between the first two sentences, i.e., the fact that "there is the presence of a major 90-to-100-kDa protein of unknown sequence in both the rat otoconia and the Xenopus utricular (calcitic) otoconia" is the reason for believing that "calcitic otoconia contain a similar 90-to-100-kDa protein."</p>
<p indent="1">(2) In both the rat otoconia and the Xenopus utricular (calcitic) otoconia, the presence of a major 90- to 100-kDa protein of unknown sequence has been reported <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp>. <ul>Therefore</ul>, calcitic otoconia probably contain a similar 90- to 100-kDa major protein, regardless of the species. <ul>In contrast</ul>, the Xenopus saccular (aragonitic) otoconia contain a major 22-kDa protein (otoconin-22) <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp>, which is a sPLA<sub>2</sub>-related 127-aa glycoprotein with two N-glycosylation sites. From <abbrgrp>
<abbr bid="B43">43</abbr>
</abbrgrp>.</p>
<p>Discourse relations can also be useful for categorizing citations and the relations between citations to enhance information retrieval: the connective <it>in contrast </it>in Example (2) signals a contrast relation between two cited articles, "3" and "5", mentioned in two different sentences.</p>
<p>Although the discourse relations in the examples above are explicitly expressed in the text by a discourse connective, this is not always the case. Discourse relations can also hold implicitly between sentences. In Example (3), for instance, a causal relation is inferred between the two sentences, i.e., "the overproduction of numerous cytokines in the synovial membrane" is inferred as being the result of "the membrane having an infiltrate of a variety of inflammatory cells." However, there is no explicit connective (e.g., <it>as a result </it>or <it>so</it>) to express this relation.</p>
<p indent="1">(3) The synovial membrane of rheumatoid arthritis (RA) is characterized by an infiltrate of a variety of inflammatory cells, such as lymphocytes, macrophages, and dendritic cells, together with proliferation of synovial fibroblast-like cells. Numerous cytokines are overproduced in the inflamed joint.</p>
<p>The challenge of processing discourse relations involves several subtasks, which have been tackled in the open (non-specialized) domain.</p>
<p indent="1">
<it>&#8226; Identifying discourse connectives</it>. Many of the lexical items that can function as explicit connectives also have other non-connective functions <abbrgrp>
<abbr bid="B44">44</abbr>
<abbr bid="B45">45</abbr>
</abbrgrp>. Thus, connectives need to be functionally disambiguated.</p>
<p indent="1">
<it>&#8226; Identifying the arguments of discourse connectives</it>. In addition to identifying the connectives themselves, it is also important to accurately identify the two situations (called <it>arguments</it>) that the connectives relate, since they are not necessarily adjacent to each other <abbrgrp>
<abbr bid="B46">46</abbr>
<abbr bid="B47">47</abbr>
<abbr bid="B48">48</abbr>
<abbr bid="B49">49</abbr>
<abbr bid="B50">50</abbr>
</abbrgrp>.</p>
<p indent="1">
<it>&#8226; Identifying the senses (i.e., semantics) of the relation</it>. While detecting the senses of explicit connectives has met with a good degree of success <abbrgrp>
<abbr bid="B44">44</abbr>
<abbr bid="B51">51</abbr>
<abbr bid="B52">52</abbr>
</abbrgrp>, largely because explicit connectives are not very ambiguous, identifying the senses of implicit relations has proved to be much more challenging <abbrgrp>
<abbr bid="B53">53</abbr>
<abbr bid="B54">54</abbr>
<abbr bid="B55">55</abbr>
<abbr bid="B56">56</abbr>
<abbr bid="B57">57</abbr>
<abbr bid="B58">58</abbr>
</abbrgrp>.</p>
<p indent="1">
<it>&#8226; Deriving composite discourse structures</it>. Once the elementary relation structures (i.e., a relation and its two arguments) have been identified, they must be combined into more complex structures, a task with important ramifications for applications such as summarization <abbrgrp>
<abbr bid="B59">59</abbr>
</abbrgrp>.</p>
<p>The largest effort at annotating discourse relations is the Penn Discourse TreeBank, or the PDTB <abbrgrp>
<abbr bid="B49">49</abbr>
</abbrgrp>, which contains annotations of discourse relations on the open-domain Wall Street Journal corpus <abbrgrp>
<abbr bid="B60">60</abbr>
</abbrgrp>. To facilitate discourse processing research in the biomedical domain, we have adopted the PDTB framework to annotate discourse relations, their arguments, and their senses in biomedical literature. The corpus we have selected is a 24-article subset of the GENIA corpus <abbrgrp>
<abbr bid="B39">39</abbr>
</abbrgrp>, a collection of articles from the biomedical literature compiled and annotated within the scope of the GENIA project. The 24 articles (approximately 112,000 word tokens and 5,000 sentences) that form our <b>Biomedical Discourse Relation Bank (BioDRB) </b>have also been annotated for coreference relations and citation relations <abbrgrp>
<abbr bid="B61">61</abbr>
</abbrgrp>.</p>
<p>In this article, we describe our work towards the creation of the BioDRB. We show that the PDTB framework can be successfully adapted to the biomedical domain, and that discourse relations can be reliably annotated. We present classification experiments for sense disambiguation of explicit connectives, showing that a classifier trained and tested on the BioDRB performs comparably to the same classifier trained and tested on the PDTB. We also present experiments showing that the current size of the BioDRB corpus may be sufficient for this task. Finally, we explore whether NLP methods developed using the PDTB generalize to the biomedical domain. For the same task of explicit connective sense detection, we show that a classifier trained on the PDTB performs poorly on the BioDRB. These results highlight the discourse-level differences between the open domain and the biomedical domain, and support the need for developing a specialized corpus of biomedical texts annotated with discourse relations. The results of our cross-domain experiments are consistent with our related work on identifying connectives in the BioDRB <abbrgrp>
<abbr bid="B45">45</abbr>
</abbrgrp>.</p>
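<p>The abstract reports that the connective string alone is a highly reliable feature for coarse sense classification, and that finer senses suffer from data sparsity. As a rough illustration of why a connective-only feature can work, the sketch below implements a most-frequent-sense lookup keyed on the lowercased connective. This is an assumed baseline, not the classifier used in our experiments. The toy training pairs, the functions <it>train</it> and <it>predict</it>, and the rule that the label before the first colon is the coarse level are all hypothetical.</p>
<p>
```python
from collections import Counter, defaultdict


def coarse(sense):
    """Keep only the label before the first colon (assumed coarse level)."""
    return sense.split(":", 1)[0]


def train(pairs, level=coarse):
    """Map each lowercased connective to its most frequent sense label."""
    counts = defaultdict(Counter)
    for connective, sense in pairs:
        counts[connective.lower()][level(sense)] += 1
    return {conn: c.most_common(1)[0][0] for conn, c in counts.items()}


def predict(model, connective):
    """Return the stored sense, or None for a connective never seen in training."""
    return model.get(connective.lower())


# Toy, hypothetical (connective, sense) pairs; these are not BioDRB counts.
training = [
    ("because", "Cause:Reason"),
    ("Because", "Cause:Reason"),
    ("but", "Concession:Contra-expectation"),
    ("but", "Contrast"),
    ("but", "Contrast"),
    ("However", "Contrast"),
    ("also", "Conjunction"),
]
model = train(training)
print(predict(model, "But"))        # Contrast
print(predict(model, "because"))    # Cause
print(predict(model, "therefore"))  # None (unseen connective)
```
</p>
<p>Passing <it>level=str</it> to <it>train</it> keeps the full labels (e.g., Cause:Reason), which spreads the same counts over more classes; this is one way to see how sparsity affects refined sense classification.</p>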
</sec>
<sec>
<st>
<p>Methods</p>
</st>
<p>For annotating discourse relations in biomedical literature, we adapted the annotation framework of the Penn Discourse TreeBank (PDTB) <abbrgrp>
<abbr bid="B49">49</abbr>
</abbrgrp>. The PDTB (<url>http://www.seas.upenn.edu/~pdtb</url>) annotates the argument structure, semantics, and attribution of discourse relations and their arguments over the 1-million-word Wall Street Journal portion of the Penn Treebank <abbrgrp>
<abbr bid="B60">60</abbr>
</abbrgrp>. It follows a lexically-grounded approach to discourse structure <abbrgrp>
<abbr bid="B62">62</abbr>
<abbr bid="B63">63</abbr>
</abbrgrp>. A discourse relation is defined as a strictly binary, informational relation between abstract objects (AOs) mentioned in a text, such as events, states, and propositions <abbrgrp>
<abbr bid="B64">64</abbr>
</abbrgrp>. By convention, the two AO arguments are called Arg1 and Arg2, with Arg2 as the argument syntactically bound to the connective, and Arg1 as the other argument. Discourse connectives are words or phrases used to express discourse relations in text, and in the PDTB, they are drawn from three well-defined syntactic classes: subordinating conjunctions (e.g., <it>because, when, since, although</it>), coordinating conjunctions (e.g., <it>but, or, nor</it>) and adverbials (e.g., <it>however, otherwise, then, as a result, for example</it>). Example (4) shows the temporal connective <it>since </it>and its two arguments. (Throughout this paper, phrases expressing discourse relations are underlined, Arg1 appears in italics, Arg2 appears in boldface, and the sense is provided in parentheses at the end of the example.) The PDTB also annotates implicit discourse relations between adjacent sentences, for which annotators insert a connective that best expresses the relation, as well as explicit expressions of relations that do not belong to the pre-defined syntactic classes (called <it>alternative lexicalizations</it>). For sense classification, a three-tier hierarchical scheme was developed for the PDTB, from which one or more labels are selected for each relation. Attribution, which is also annotated in the PDTB, is not currently annotated in the BioDRB.</p>
<p indent="1">(4) <it>She hasn't played any music </it>
<ul>since</ul>
<b>the earthquake hit</b>. (Temporal:Succession)</p>
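<p>To make the Arg1/Arg2 convention concrete, the sketch below models one annotated relation as a small Python record and fills it in for Example (4). The class name <it>Relation</it> and its fields are hypothetical and do not describe the PDTB or BioDRB file format; the colon-separated sense label is split into its levels only to show that labels are hierarchical.</p>
<p>
```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relation:
    """Hypothetical record for one binary discourse relation."""
    rel_type: str              # e.g. "Explicit" or "Implicit"
    connective: Optional[str]  # explicit or inserted connective text, if any
    arg1: str                  # the argument NOT syntactically bound to the connective
    arg2: str                  # the argument syntactically bound to the connective
    sense: str                 # hierarchical sense label, levels separated by ":"

    def sense_levels(self):
        """Split a label such as "Temporal:Succession" into its levels."""
        return tuple(self.sense.split(":"))


# Example (4): "since" introduces the clause bound to it, so that clause is Arg2.
ex4 = Relation(
    rel_type="Explicit",
    connective="since",
    arg1="She hasn't played any music",
    arg2="the earthquake hit",
    sense="Temporal:Succession",
)
print(ex4.sense_levels())  # ('Temporal', 'Succession')
```
</p>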
<p>The PDTB contains 100 distinct types of explicit discourse connectives. Of the 40,600 annotated relation tokens in the corpus, 19,053 are realized by explicit expressions, either connectives or alternative lexicalizations. Over the years, the PDTB research group has developed an effective set of discourse annotation tools, guidelines, work flows, and validation methodologies, which we have used as the basis for our work.</p>
<p>The PDTB annotation framework has several important advantages over alternative approaches. First, the framework focuses on identifying individual relations and their arguments, which are important for text mining, while remaining neutral on the higher-level discourse organization. This is important because there is little agreement among researchers on the most descriptively adequate data structure for representing discourse <abbrgrp>
<abbr bid="B65">65</abbr>
</abbrgrp>. The structures proposed so far range from tree structures (e.g., Rhetorical Structure Theory (RST) <abbrgrp>
<abbr bid="B66">66</abbr>
</abbrgrp>, Linguistic Discourse Model (LDM) <abbrgrp>
<abbr bid="B67">67</abbr>
</abbrgrp>, and RST-based binary trees <abbrgrp>
<abbr bid="B68">68</abbr>
</abbrgrp>) to more complex forms that incorporate multiple inheritance (D-LTAG <abbrgrp>
<abbr bid="B63">63</abbr>
</abbrgrp> and Segmented Discourse Representation Theory (SDRT) <abbrgrp>
<abbr bid="B69">69</abbr>
</abbrgrp>), to full-fledged graphs (Discourse Graphbank <abbrgrp>
<abbr bid="B70">70</abbr>
</abbrgrp>). The PDTB is, therefore, a particularly attractive framework since it aims to remain neutral with respect to higher-level discourse organization, and instead focuses on annotating the more local discourse relations. Higher-level structures in this approach are left to "emerge" from the annotations of low-level relations. Some recent investigations of the combinatorial possibilities of discourse relations in the PDTB suggest that directed acyclic graphs (DAGs), and not trees, may be the most appropriate structural representation for discourse <abbrgrp>
<abbr bid="B71">71</abbr>
<abbr bid="B72">72</abbr>
</abbrgrp>.</p>
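<p>One way to see why shared arguments push the representation beyond trees is to treat each relation as a node whose children are its two argument spans. If one span serves as an argument of two relations, that span has two parents, which a tree forbids but a DAG allows. The sketch below checks this property on hypothetical relation and span identifiers; it illustrates the structural point only and is not the analysis method of the cited studies.</p>
<p>
```python
from collections import defaultdict


def parents_of_spans(relations):
    """Map each argument span id to the set of relations that use it."""
    parents = defaultdict(set)
    for rel_id, (arg1, arg2) in relations.items():
        parents[arg1].add(rel_id)
        parents[arg2].add(rel_id)
    return parents


def shared_arguments(relations):
    """Span ids with two or more parent relations (multiple inheritance)."""
    parents = parents_of_spans(relations)
    return sorted(span for span, rels in parents.items() if len(rels) != 1)


# Hypothetical: span s2 is Arg2 of R1 and also Arg1 of R2.
chained = {"R1": ("s1", "s2"), "R2": ("s2", "s3")}
disjoint = {"R1": ("s1", "s2"), "R2": ("s3", "s4")}
print(shared_arguments(chained))   # ['s2']: not a tree, but still acyclic
print(shared_arguments(disjoint))  # []
```
</p>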
<p>Second, discourse relations in the PDTB are lexically anchored, for both explicit and implicit connectives. In the latter case, annotators "insert" a connective expression to express the implicit relation, and then proceed to annotate the sense of the inserted connective. Such a lexically-grounded approach substantially increases the inter-annotator agreement <abbrgrp>
<abbr bid="B73">73</abbr>
</abbrgrp>, as confirmed in our pilot annotation study <abbrgrp>
<abbr bid="B74">74</abbr>
<abbr bid="B75">75</abbr>
</abbrgrp>.</p>
<p>Finally, since its release, the PDTB has been successfully used by many researchers for both linguistic and computational studies <abbrgrp>
<abbr bid="B44">44</abbr>
<abbr bid="B46">46</abbr>
<abbr bid="B47">47</abbr>
<abbr bid="B48">48</abbr>
<abbr bid="B50">50</abbr>
<abbr bid="B51">51</abbr>
<abbr bid="B52">52</abbr>
<abbr bid="B54">54</abbr>
<abbr bid="B55">55</abbr>
<abbr bid="B56">56</abbr>
<abbr bid="B57">57</abbr>
<abbr bid="B71">71</abbr>
<abbr bid="B72">72</abbr>
<abbr bid="B76">76</abbr>
<abbr bid="B77">77</abbr>
<abbr bid="B78">78</abbr>
<abbr bid="B79">79</abbr>
<abbr bid="B80">80</abbr>
<abbr bid="B81">81</abbr>
<abbr bid="B82">82</abbr>
<abbr bid="B83">83</abbr>
<abbr bid="B84">84</abbr>
</abbrgrp>, which shows that there is much to be gained from adopting this approach. The PDTB framework has also been adopted for discourse annotation in other languages (e.g., Turkish <abbrgrp>
<abbr bid="B85">85</abbr>
</abbrgrp>, Hindi <abbrgrp>
<abbr bid="B86">86</abbr>
<abbr bid="B87">87</abbr>
</abbrgrp>, Chinese <abbrgrp>
<abbr bid="B88">88</abbr>
</abbrgrp>, Czech <abbrgrp>
<abbr bid="B89">89</abbr>
</abbrgrp> and Italian <abbrgrp>
<abbr bid="B90">90</abbr>
</abbrgrp>) as well as other domains such as conversational dialogues <abbrgrp>
<abbr bid="B90">90</abbr>
</abbrgrp>.</p>
</sec>
<sec>
<st>
<p>Results and Discussion</p>
</st>
<sec>
<st>
<p>Biomedical Discourse Relation Bank: BioDRB</p>
</st>
<p>In the BioDRB, we have annotated all explicit and implicit discourse relations, the arguments of discourse relations, and the senses of discourse relations. In keeping with the theory-neutral approach of the PDTB, we annotate only individual relations and do not attempt to show dependencies across relations. We have adapted the PDTB guidelines to better incorporate discourse-level features specific to biomedical texts. Here we present some salient aspects of the BioDRB annotation guidelines. Further details are provided in the complete documentation of the guidelines <abbrgrp>
<abbr bid="B91">91</abbr>
</abbrgrp>, available from <url>http://spring.ims.uwm.edu/uploads/biodrb_guidelines.pdf</url>.
</p>
<sec>
<st>
<p>Discourse Relations and their Realization</p>
</st>
<p>Discourse relations in the BioDRB are first broadly classified in terms of their manner of realization. There are four types of relations:</p>
<p indent="1">(a) Relations realized by <it>Explicit discourse connectives</it>,</p>
<p indent="1">(b) <it>Implicit </it>relations,</p>
<p indent="1">(c) Relations realized by <it>alternatively lexicalized </it>expressions (AltLex),</p>
<p indent="1">(d) Absence of a discourse relation, or <it>No Relation </it>(NoRel).</p>
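<p>The subsections below give the criteria that separate these four types. Read as a decision procedure, they can be sketched as follows. The three boolean inputs stand for annotator judgements (whether an explicit connective relates the arguments, whether a relation is inferred, and whether inserting a connective would be redundant); their names, and the function <it>relation_type</it>, are hypothetical rather than fields of the BioDRB release. The sketch also simplifies matters by not modelling the restriction that NoRel is used only in the two specific cases described under No Relation.</p>
<p>
```python
from enum import Enum


class RelType(Enum):
    """The four BioDRB relation types, distinguished by manner of realization."""
    EXPLICIT = "Explicit"
    IMPLICIT = "Implicit"
    ALTLEX = "AltLex"
    NOREL = "NoRel"


def relation_type(has_connective, relation_inferred, insertion_redundant):
    """Pick a relation type from three hypothetical annotator judgements.

    has_connective: an explicit connective relates the two arguments.
    relation_inferred: a discourse relation is perceived between the sentences.
    insertion_redundant: inserting a connective would sound redundant.
    """
    if has_connective:
        return RelType.EXPLICIT
    if not relation_inferred:
        return RelType.NOREL
    if insertion_redundant:
        # The relation is already lexicalized by a non-connective expression.
        return RelType.ALTLEX
    return RelType.IMPLICIT


# Example (11): a contrast is inferred and "on the other hand" inserts cleanly.
print(relation_type(False, True, False).value)  # Implicit
# Example (12): "as a result" would be redundant next to "These results suggest".
print(relation_type(False, True, True).value)   # AltLex
```
</p>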
<p>
<b>Explicit Discourse Connectives </b>are closed-class lexical items drawn from four well-defined syntactic classes: subordinating conjunctions (Example 5), coordinating conjunctions (Example 6), discourse adverbials (Example 7), and subordinators (Example 8). The syntactic classes themselves are not provided as part of the annotation; they were used only to train the annotators to identify connectives. The arguments of an explicit connective can be identified within the same sentence as the connective, i.e., <it>intra-sententially </it>(Examples 5, 6, and 8), or in different sentences, i.e., <it>inter-sententially </it>(Example 7).</p>
<p indent="1">(5) <ul>Because</ul>
<b>RA PBMC include several cell types in addition to T cells</b>, <it>some inflammatory cytokines released from macrophages and other lymphocytes might have affected the production of IL-17 from T cells</it>. (Cause:Reason)</p>
<p indent="1">(6) <it>IL-17 was also detected in the PBMC of patients with osteoarthritis</it>, <ul>but</ul>
<b> their expression levels were much lower than those of RA PBMC</b>. (Concession:Contra-expectation)</p>
<p indent="1">(7) <it>IL-17 production by activated RA PBMC is completely or partly blocked in the presence of the NF-&#954;B inhibitor pyrrolidine dithiocarbamate and the PI3K/Akt inhibitor wortmannin and LY294002, respectively</it>. <ul>However</ul>, <b>inhibition of activator protein-1 and extracellular signal-regulated kinase 1/2 did not affect IL-17 production</b>. (Contrast)</p>
<p indent="1">(8) Recent observations demonstrated that <it>IL-17 can also activate osteoclastic bone resorption </it>
<ul>by</ul>
<b> the induction of RANKL (receptor activator of nuclear factor </b>
<it>&#954;</it>
<b>B [NF-</b>
<it>&#954;</it>
<b>B] ligand), which is involved in bony erosion in RA </b>
<abbrgrp>
<abbr bid="B7">7</abbr>
</abbrgrp>. (Purpose:Enablement)</p>
<p>Annotation of an explicit connective proceeds by first identifying and marking the connective text span, then identifying and annotating the text spans associated with its two arguments, and finally, labeling the sense of the relation. Thus, for Example (5), the following information is annotated:</p>
<p indent="1">&#8226; <it>Relation type: </it>Explicit</p>
<p indent="1">&#8226; <it>Connective span: </it>"Because"</p>
<p indent="1">&#8226; <it>Arg1 span: </it>"some inflammatory cytokines released from macrophages and other lymphocytes might have affected the production of IL-17 from T cells"</p>
<p indent="1">&#8226; <it>Arg2 span: </it>"RA PBMC include several cell types in addition to T cells"</p>
<p indent="1">&#8226; <it>Sense: </it>Cause:Reason</p>
<p>An important task in annotating explicit connectives involves determining whether or not the lexical item in question expresses a discourse relation, i.e., a relation between two abstract objects. Several lexical items that function as discourse connectives have other non-connective functions as well. For instance, <it>also </it>as a discourse connective is used to express the presence of two AO items in a list, as in Example (9). However, <it>also </it>can sometimes be used in a non-list sense, when it is used to imply that something has been "presupposed" <abbrgrp>
<abbr bid="B92">92</abbr>
</abbrgrp>, as in Example (10).</p>
<p indent="1">(9) <it>These data show that ITK is required for IL-2 production induced by SEB in vivo, and may regulate signals leading IL-2 production, in part by regulating phosphorylation of c-jun</it>. <b>The data </b>
<ul>also</ul>
<b> suggest that perturbing T cell activation pathways leading to IL-2 does not necessarily lead to improved responses to SEB toxicity</b>. (Conjunction)</p>
<p indent="1">(10) To determine whether CD123+ cells in synovial tissue were also nuclear RelB+, formalin-fixed tissue was double-stained for RelB and CD123 without hematoxylin counterstaining.</p>
<p>
<b>Implicit Relations </b>are annotated inter-sententially between sentences not related by an explicit connective, and only within paragraphs. If a discourse relation is inferred between the sentences, the annotator must <it>insert </it>a connective that best expresses the inferred relation, then mark the arguments, and finally, assign a sense to the relation. Example (11) shows that the annotator perceived Arg2 as standing in contrast with Arg1, that there is no explicit connective to relate these two arguments, and that the annotator inserted <it>on the other hand </it>as the connective to express the inferred relation.</p>
<p indent="1">(11) <it>Expression of the brca1 mutant in a p21-null background caused little rescue of the cells in the thymus, but provided a recovery in the lymph nodes that was equivalent to that produced in the p53-null background</it>. <ul>Implicit = On the other hand</ul>
</p>
<p indent="1">
<b>Introduction of the brca1 gene in cells carrying an antiapoptotic Bcl2 transgene induced significant rescue of cells in the thymus, but produced little recovery of cells in peripheral (lymph node) compartments</b>. (Contrast)</p>
<p>For implicit relations, it is crucial that the annotator does not perceive any "redundancy" in the expression of the relation after inserting the connective. A redundancy effect would instead lead to the annotation of the AltLex relation type, discussed next.</p>
<p>
<b>Alternative Lexicalizations (AltLex) </b>of relations are also annotated inter-sententially. They are identified when a discourse relation is inferred between sentences not related by an explicit connective, but insertion of a connective to express the implicit relation leads to "redundancy" in the expression of the relation. What such redundancy means is that the relation has in fact been lexicalized, but with an expression that cannot be syntactically classified as an explicit connective. For instance, in Example (12), the situation described by Arg2 is implicitly perceived to be a result of the situation in Arg1, but insertion of an implicit connective such as <it>as a result </it>clearly creates a redundancy. In such cases, the annotator must look for and annotate the "AltLex" expression. In this example, the AltLex is identified with the subject-verb sequence <it>These results suggest</it>. In the annotation, AltLex spans are always fully contained within Arg2 spans. In Example (12), for instance, the underlined AltLex span is also in boldface, showing that it is contained in the Arg2 span.</p>
<p indent="1">(12) <it>As shown in </it>Figure 3a,3b, <it>the intensity of IL-10R1 expression on CD4+ T cells was significantly increased in RA patients compared with in healthy controls</it>.</p>
<p indent="1">
<b>
<ul>These results suggest</ul> that the intracellular signal transduction pathway of IL-10 may be impaired in CD4+ T cells of active RA</b>. (Cause:Claim)</p>
<p>Syntactically, AltLex expressions are open class lexical items that cannot be defined as explicit connectives <abbrgrp>
<abbr bid="B81">81</abbr>
</abbrgrp>. In particular, while explicit connective expressions are fixed, or lexically invariant, AltLex expressions result from a more productive and compositional process. They often appear as subject-verb sequences (Example 12), although other syntactic patterns are found as well, such as prepositional phrases and verb phrases. Semantically, they are typically composed of two elements: one that denotes the relation, and another that refers anaphorically to Arg1. In Example (12), the verb <it>suggest </it>denotes the relation, whereas the subject <it>These results </it>refers anaphorically to Arg1.</p>
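<p>The requirement that AltLex spans lie inside Arg2 spans can be checked mechanically when spans are stored as character offsets. The Python sketch below is illustrative only: the <it>Span</it> class, the half-open offset representation, and the helper <it>span_of</it> are hypothetical and do not describe the BioDRB distribution format. It uses the Arg2 sentence of Example (12).</p>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical half-open character span [start, end) over an article's text."""
    start: int
    end: int

    def contains(self, other: "Span") -> bool:
        # other lies inside self if it starts no earlier and ends no later than self
        return other.start >= self.start and self.end >= other.end


def span_of(text: str, phrase: str) -> Span:
    """Return the span of the first occurrence of phrase in text (ValueError if absent)."""
    start = text.index(phrase)
    return Span(start, start + len(phrase))


arg2_text = ("These results suggest that the intracellular signal transduction "
             "pathway of IL-10 may be impaired in CD4+ T cells of active RA")
arg2 = span_of(arg2_text, arg2_text)
altlex = span_of(arg2_text, "These results suggest")

# A well-formed AltLex annotation must lie entirely within Arg2.
print(arg2.contains(altlex))  # True
print(altlex)                 # Span(start=0, end=21)
```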
<p>
<b>No Relation (NoRel) </b>is the type assigned when a sentence does not appear to relate to any other sentence in the prior text. NoRel is annotated in only two specific cases. The first arises within the abstracts of articles, some of which are partitioned into subsections such as "Background", "Case Presentation", "Results", and "Conclusion". These subsections are not separated by paragraph boundaries, but we treat them as if they were and mark each such boundary with the NoRel label. Example (13) illustrates one such NoRel annotation from the abstract of an article.</p>
<p indent="1">(13) Background: CC Chemokine Receptor 3 (CCR3), the major chemokine receptor expressed on eosinophils, binds promiscuously to several ligands including eotaxins 1, 2, and 3. (...) <it>It is therefore important to elucidate the molecular mechanisms regulating receptor expression</it>. <ul>Implicit = NoRel</ul>
<b> Results: In order to define regions responsible for CCR3 transcription, a DNAse hypersensitive site was identified in the vicinity of exon 1</b>.</p>
<p>The second kind of NoRel was annotated for typographical errors that led, for example, to some sentences being duplicated in the article. Since we did not want to admit a non-semantic repetition relation, these cases were annotated as NoRel. Such cases are rare in the corpus.</p>
<p>For NoRel, Arg1 and Arg2 are, by convention, the immediately adjacent and complete sentences.</p>
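<p>The placement of the first kind of NoRel, together with the convention that its arguments are the two complete sentences on either side of the boundary, can be sketched as follows. The input format (pairs of subsection heading and sentence) and the function name <it>norel_at_boundaries</it> are hypothetical; the sketch only shows where such labels fall and does not describe how the BioDRB annotations were produced. The sample reproduces the structure of Example (13), omitting the elided sentences.</p>
```python
from typing import List, Tuple


def norel_at_boundaries(sentences: List[Tuple[str, str]]) -> List[Tuple[int, int]]:
    """Return (arg1_index, arg2_index) pairs for NoRel relations.

    sentences holds (subsection_heading, sentence_text) pairs in document order.
    By convention, a NoRel links the complete sentences immediately on either
    side of an abstract subsection boundary.
    """
    pairs = []
    for i in range(1, len(sentences)):
        if sentences[i][0] != sentences[i - 1][0]:
            pairs.append((i - 1, i))
    return pairs


abstract = [
    ("Background", "CC Chemokine Receptor 3 (CCR3), the major chemokine receptor "
                   "expressed on eosinophils, binds promiscuously to several ligands "
                   "including eotaxins 1, 2, and 3."),
    ("Background", "It is therefore important to elucidate the molecular mechanisms "
                   "regulating receptor expression."),
    ("Results", "In order to define regions responsible for CCR3 transcription, a DNAse "
                "hypersensitive site was identified in the vicinity of exon 1."),
]
print(norel_at_boundaries(abstract))  # [(1, 2)]
```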
</sec>
<sec>
<st>
<p>Arguments of Discourse Relations</p>
</st>
<p>The smallest syntactic unit for the realization of an AO argument of a discourse relation is a clause, tensed or non-tensed. Verb phrases can also be legal arguments when the connectives are not verb phrase conjunctions themselves. In addition, because we take discourse relations to hold between AOs, nominalizations are allowed as arguments (Example 14), since they can denote events.</p>
<p indent="1">(14) <it>She was originally considered to be at high risk </it>
<ul>due to</ul>
<b> the familial occurrence of breast and other types of cancer</b>, (Cause:Reason)</p>
<p>There are no syntactic constraints on how many clauses or sentences an argument can contain. Semantically, however, arguments are required to be <it>minimal</it>, in that "only as much should be selected as an argument as is necessary for interpreting the relation". Example (15) shows both Arg1 and Arg2 spanning multiple sentences for the AltLex "Generalization" relation. Nevertheless, all the included sentences are necessary and sufficient for both arguments, because the specific details and their generalization are distributed across exactly these sentences.</p>
<p indent="1">(15) <it>We show here that mice lacking ITK have much reduced IL-2 production and T cell expansion in response to SEB in vitro and in vivo. We also show that SEB induced the activation of the JNK MAPK pathway in responding T cells in vivo, and that ITK null T cells were defective in the activation of this pathway in vivo. However, toxicity analysis indicated that both WT and ITK null animals were similarly affected by SEB exposure</it>. <b>
<ul>Our data suggest that</ul> ITK is required for full IL-2 secretion following SEB exposure, and that this may be due to the regulation of the JNK pathway by ITK in vivo. However, reducing T cell signals does not necessarily lead to better physiological responses to SEB exposure</b>. (Restatement:Generalization)</p>
<p>Finally, except for NoRel, there are no constraints on how far apart the Arg1 and Arg2 of a relation can be; they need not be adjacent. Example (16) shows Arg1 and Arg2 in non-adjacent sentences for the explicit connective <it>However</it>. Unlike in the PDTB, where the arguments of implicit relations are required to be adjacent, implicit relations in BioDRB can have non-adjacent arguments.</p>
<p indent="1">(16) The studies concerning the functional interaction between the NF-<it>&#954;</it>B pathway and members of the steroid hormone receptor family, and their role in synovial inflammation, have advanced significantly, <it>although with controversial results </it>
<abbrgrp>
<abbr bid="B10">10</abbr>
<abbr bid="B11">11</abbr>
</abbrgrp>. In particular, after binding with E2, oestrogen receptors have been shown to interact with NF-<it>&#954;</it>B factors, via transcriptional co-factors, resulting in mutual or non-mutual antagonism. Other studies hypothesize that, since oestrogen receptors may repress both constitutive and inducible NF-<it>&#954;</it>B, the overexpression of NF-<it>&#954;</it>B-inducible genes in oestrogen receptor-negative cells might contribute to malignant cell growth and chemotherapeutic resistance <abbrgrp>
<abbr bid="B12">12</abbr>
<abbr bid="B13">13</abbr>
</abbrgrp>. On the contrary, further studies report that E2 blocks the transcriptional activity of p65 in macrophages <abbrgrp>
<abbr bid="B14">14</abbr>
</abbrgrp>. <ul>However</ul>, <b>these opposite observations arise using different cell lines (human/animals) and culture conditions as well as different hormone concentrations </b>
<abbrgrp>
<abbr bid="B15">15</abbr>
</abbrgrp>. ...</p>
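<p>The adjacency conventions in this section can be summarized as a check over sentence indices. In this Python sketch, each argument is a hypothetical set of sentence indices and the scheme is passed as a plain string. Only adjacency is modeled: for BioDRB, only NoRel requires adjacent arguments; for the PDTB, the sketch assumes that NoRel, like implicit relations, holds between adjacent sentences, and all other PDTB rules are ignored. In Example (16), Arg1 lies in sentence 0 and Arg2 in sentence 4.</p>
```python
from typing import Set


def adjacent(arg1: Set[int], arg2: Set[int]) -> bool:
    """True if the last Arg1 sentence immediately precedes the first Arg2 sentence."""
    return min(arg2) - max(arg1) == 1


def distance_ok(rel_type: str, arg1: Set[int], arg2: Set[int], scheme: str = "BioDRB") -> bool:
    """Check only the argument-adjacency constraint for one relation (illustrative)."""
    if scheme == "BioDRB":
        must_be_adjacent = {"NoRel"}
    else:  # "PDTB": implicit relations must also have adjacent arguments
        must_be_adjacent = {"NoRel", "Implicit"}
    if rel_type in must_be_adjacent:
        return adjacent(arg1, arg2)
    return True  # no distance constraint for the other relation types


# Example (16): Arg1 in sentence 0, Arg2 in sentence 4.
print(distance_ok("Explicit", {0}, {4}))                 # True
print(distance_ok("Implicit", {0}, {4}))                 # True under BioDRB
print(distance_ok("Implicit", {0}, {4}, scheme="PDTB"))  # False under PDTB
```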
</sec>
<sec>
<st>
<p>Senses of Discourse Relations</p>
</st>
<p>All explicit, implicit and AltLex relations are annotated with sense labels that indicate their semantics. Senses are organized in two tiers, with the second <it>subtype </it>tier specifying further refinements to the sense <it>type </it>in the top tier. The complete BioDRB sense classification is shown in Table <tblr tid="T1">1</tblr>.</p>
<tbl id="T1"><title><p>Table 1</p></title><caption><p>BioDRB sense classification for discourse relations</p></caption><tblbdy cols="4">
      <r>
         <c ca="left">
            <p>
               <b>Type</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Subtype</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Type</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Subtype</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CAUSE</p>
         </c>
         <c ca="left">
            <p>Reason</p>
         </c>
         <c ca="left">
            <p>CONDITION</p>
         </c>
         <c ca="left">
            <p>Hypothetical</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Result</p>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Factual</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Claim</p>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Non-Factual</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Justification</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>PURPOSE</p>
         </c>
         <c ca="left">
            <p>Goal</p>
         </c>
         <c ca="left">
            <p>TEMPORAL</p>
         </c>
         <c ca="left">
            <p>Synchronous</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Enablement</p>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Precedence</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Succession</p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CONCESSION</p>
         </c>
         <c ca="left">
            <p>Contra-Expectation</p>
         </c>
         <c ca="left">
            <p>ALTERNATIVE</p>
         </c>
         <c ca="left">
            <p>Chosen-Alternative</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Expectation</p>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Conjunctive</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Disjunctive</p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CONTRAST</p>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>INSTANTIATION</p>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CONJUNCTION</p>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>EXCEPTION</p>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>SIMILARITY</p>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>CONTINUATION</p>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CIRCUMSTANCE</p>
         </c>
         <c ca="left">
            <p>Forward-Circumstance</p>
         </c>
         <c ca="left">
            <p>BACKGROUND</p>
         </c>
         <c ca="left">
            <p>Forward-Background</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Backward-Circumstance</p>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Backward-Background</p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>RESTATEMENT</p>
         </c>
         <c ca="left">
            <p>Equivalence</p>
         </c>
         <c ca="left">
            <p>REINFORCEMENT</p>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Generalization</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Specification</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
   </tblbdy></tbl>
<p>For any relation, the sense annotation consists of selecting a sense subtype label whenever subtypes are available for a type. Thus, for the "Cause" sense, the annotator is required to select one of its four subtypes; the type-level label cannot be chosen. Type-level labels can only be selected when the sense has no subtypes, as with "Contrast". Refinements at the subtype level are of two kinds: one kind specifies refinements of the semantics, while the other specifies the directionality of the arguments. For example, the three subtypes of the "Condition" sense type specify in more detail the nature of the conditional dependence between Arg2 (antecedent) and Arg1 (consequent), by indicating whether the antecedent describes a hypothetical situation ("Hypothetical"), an assumed fact ("Factual"), or a non-fact ("Non-Factual"). On the other hand, the two subtypes of the "Concession" sense (in which one argument creates an expectation denied by the other argument) indicate the directionality of the concession: in the "Contra-Expectation" subtype, Arg1 raises the expectation that Arg2 denies, while in the "Expectation" subtype, Arg2 raises the expectation that Arg1 denies.</p>
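<p>Because directional subtypes come in mirrored pairs, exchanging the roles of Arg1 and Arg2 maps each such subtype onto its partner, whereas purely semantic subtypes (such as the three "Condition" subtypes) have no mirror. The Python sketch below encodes this observation as a lookup table; the pairings are read off the sense definitions given in this section, and the names <it>MIRROR</it> and <it>mirror</it> are hypothetical. In actual annotation the argument labels are fixed by the text, so the swap is only a way of making the directional pairs explicit.</p>
```python
from typing import Dict, Optional, Tuple

# Subtype pairs that differ only in which argument plays which role.
_DIRECTIONAL_PAIRS = [
    ("Cause", "Reason", "Result"),
    ("Cause", "Claim", "Justification"),
    ("Purpose", "Goal", "Enablement"),
    ("Temporal", "Precedence", "Succession"),
    ("Concession", "Contra-Expectation", "Expectation"),
    ("Circumstance", "Forward-Circumstance", "Backward-Circumstance"),
    ("Background", "Forward-Background", "Backward-Background"),
]

MIRROR: Dict[Tuple[str, str], Tuple[str, str]] = {}
for sense_type, a, b in _DIRECTIONAL_PAIRS:
    MIRROR[(sense_type, a)] = (sense_type, b)
    MIRROR[(sense_type, b)] = (sense_type, a)
# Overlap is symmetric, so swapping the arguments leaves "Synchronous" unchanged.
MIRROR[("Temporal", "Synchronous")] = ("Temporal", "Synchronous")


def mirror(sense_type: str, subtype: str) -> Optional[str]:
    """Subtype describing the same relation with Arg1 and Arg2 swapped,
    or None if the subtype encodes semantics rather than direction."""
    pair = MIRROR.get((sense_type, subtype))
    return pair[1] if pair else None


print(mirror("Cause", "Reason"))            # Result
print(mirror("Concession", "Expectation"))  # Contra-Expectation
print(mirror("Condition", "Hypothetical"))  # None
```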
<p>With some connectives, more than one sense can be inferred. Annotators are allowed to assign up to two senses to a connective. In Example (17), for instance, two senses are annotated for the connective <it>as</it>: "Temporal:Synchronous" and "Cause:Justification".</p>
<p indent="1">(17) Tumors detected by this new technology could have unique etiologies and/or presentations, <it>and may represent an increasing proportion of clinical practice </it>
<ul>as</ul>
<b>new screening methods are validated and applied</b>. (Temporal:Synchronous/Cause:Justification)</p>
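<p>The two labeling rules just described (a subtype must be chosen whenever the type has subtypes, and at most two senses may be assigned) can be checked against the hierarchy in Table 1. In this Python sketch, the "Type:Subtype" string format mirrors the labels used in the examples, while the dictionary <it>SENSES</it> and the validator names are hypothetical.</p>
```python
from typing import Dict, List

# Sense hierarchy from Table 1: type -> subtypes (empty list if the type has none).
SENSES: Dict[str, List[str]] = {
    "Cause": ["Reason", "Result", "Claim", "Justification"],
    "Condition": ["Hypothetical", "Factual", "Non-Factual"],
    "Purpose": ["Goal", "Enablement"],
    "Temporal": ["Synchronous", "Precedence", "Succession"],
    "Concession": ["Contra-Expectation", "Expectation"],
    "Alternative": ["Chosen-Alternative", "Conjunctive", "Disjunctive"],
    "Circumstance": ["Forward-Circumstance", "Backward-Circumstance"],
    "Background": ["Forward-Background", "Backward-Background"],
    "Restatement": ["Equivalence", "Generalization", "Specification"],
    "Contrast": [], "Instantiation": [], "Conjunction": [], "Exception": [],
    "Similarity": [], "Continuation": [], "Reinforcement": [],
}
MAX_SENSES = 2


def valid_label(label: str) -> bool:
    """Accept "Type" for types without subtypes, otherwise require "Type:Subtype"."""
    sense_type, _, subtype = label.partition(":")
    if sense_type not in SENSES:
        return False
    if SENSES[sense_type]:
        return subtype in SENSES[sense_type]  # the subtype is mandatory
    return subtype == ""                       # no subtype may be given


def valid_annotation(labels: List[str]) -> bool:
    """One or two senses, each a valid label."""
    return len(labels) in range(1, MAX_SENSES + 1) and all(valid_label(lab) for lab in labels)


print(valid_annotation(["Temporal:Synchronous", "Cause:Justification"]))  # True
print(valid_annotation(["Cause"]))     # False: a Cause subtype is required
print(valid_annotation(["Contrast"]))  # True
```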
<p>The BioDRB sense classification was adapted from the PDTB sense classification <abbrgrp>
<abbr bid="B93">93</abbr>
</abbrgrp>. Below, we first define the BioDRB senses, and then discuss the major differences with PDTB.</p>
<sec>
<st>
<p>Cause</p>
</st>
<p>The sense type "Cause" is used when the two arguments of the relation are related causally and are not in a conditional relation. There are four subtypes for this sense. "Reason" and "Result" hold when the situation described in one of the arguments is the cause of the situation described in the other argument. They differ from each other only in the directionality of the causality: "Reason" is used when Arg2 is the cause and Arg1 the effect, while "Result" is used when Arg1 is the cause and Arg2 the effect. The other two subtypes, "Claim" and "Justification", hold when one argument provides the cause not for the situation described by the other argument, but rather for the truth or validity of the proposition that the other argument expresses. The difference between the two is again in directionality: "Claim" is used when Arg1 presents the evidence for the truth of Arg2, and "Justification" when Arg2 presents the evidence for the truth of Arg1.</p>
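<p>The four "Cause" subtypes therefore vary along two independent dimensions: which argument supplies the cause (or the evidence), and whether the causal link holds between situations or concerns the truth of a proposition. The hypothetical helper below makes that two-way decision explicit; it is a sketch of the definitions above, not part of any BioDRB tool.</p>
```python
def cause_subtype(cause_arg: int, epistemic: bool) -> str:
    """Pick a BioDRB "Cause" subtype (illustrative helper).

    cause_arg: 1 or 2, the argument that states the cause or the evidence.
    epistemic: True when that argument supports the truth or validity of the
               other argument's proposition rather than causing its situation.
    """
    if cause_arg not in (1, 2):
        raise ValueError("cause_arg must be 1 or 2")
    if epistemic:
        return "Claim" if cause_arg == 1 else "Justification"
    return "Result" if cause_arg == 1 else "Reason"


# Example (12): the Arg1 findings are evidence for the Arg2 conclusion.
print(cause_subtype(1, epistemic=True))   # Claim
# Example (14): Arg2, the familial occurrence of cancer, is the cause of Arg1.
print(cause_subtype(2, epistemic=False))  # Reason
```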
</sec>
<sec>
<st>
<p>Condition</p>
</st>
<p>The sense type "Condition" covers all conditional relations. There are three subtypes. The subtype "Hypothetical" holds when, if Arg2 holds true, Arg1 is caused to hold at some instant in all possible futures. However, Arg1 may also become true independently of Arg2. The subtype "Factual" is a special case of "Hypothetical" and applies when Arg2 is a situation that has either been presented as a fact in the prior discourse or is believed by somebody other than the speaker/writer. The subtype "Non-Factual" applies when Arg2 describes a condition that either does not hold at present or did not hold in the past; Arg1 then describes what would also hold if Arg2 were true. (There are no occurrences of Non-Factual conditionals in the corpus.)</p>
</sec>
<sec>
<st>
<p>Purpose</p>
</st>
<p>The sense type "Purpose" is used when one argument presents a situation and the other argument presents an action, and carrying out the action enables the situation to occur. The two subtypes "Goal" and "Enablement" capture a difference in directionality: "Goal" applies when Arg1 presents an action that enables the situation in Arg2 to obtain, whereas "Enablement" applies when Arg2 presents an action that enables the situation in Arg1 to obtain.</p>
</sec>
<sec>
<st>
<p>Temporal</p>
</st>
<p>The sense type "Temporal" is used when the events described in the arguments are related temporally. There are three subtypes, which reflect the ordering of the arguments. "Precedence" is used when the Arg1 event precedes the Arg2 event; "Succession" applies when the Arg1 event follows the Arg2 event; and "Synchronous" applies when the Arg1 and Arg2 events overlap.</p>
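<p>If each argument's event is approximated by a time interval, the three subtypes reduce to comparisons of interval endpoints. The interval representation, the closed-interval convention (sharing even one instant counts as overlap), and the function name in this Python sketch are assumptions for illustration; annotators judge temporal order from the text, not from timestamps.</p>
```python
from typing import Tuple

Interval = Tuple[float, float]  # (start, end), with end >= start; hypothetical representation


def temporal_subtype(arg1: Interval, arg2: Interval) -> str:
    """Map the Arg1 and Arg2 event intervals to a BioDRB "Temporal" subtype."""
    (s1, e1), (s2, e2) = arg1, arg2
    if s2 > e1:           # Arg1 ends before Arg2 starts
        return "Precedence"
    if s1 > e2:           # Arg1 starts after Arg2 ends
        return "Succession"
    return "Synchronous"  # otherwise the closed intervals overlap


print(temporal_subtype((0, 1), (2, 3)))  # Precedence
print(temporal_subtype((5, 6), (2, 3)))  # Succession
print(temporal_subtype((0, 4), (2, 3)))  # Synchronous
```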
</sec>
<sec>
<st>
<p>Concession</p>
</st>
<p>The sense type "Concession" applies when one of the arguments describes a situation A that creates an expectation for a situation C, while the other asserts (or implies) &#172;C. The two "Concession" subtypes capture a difference in the roles of the arguments: "Expectation" is used when Arg2 creates an expectation that Arg1 denies, while "Contra-Expectation" is used when Arg1 creates an expectation that Arg2 denies.</p>
</sec>
<sec>
<st>
<p>Contrast</p>
</st>
<p>The sense type "Contrast" is used when the values for some shared property in Arg1 and Arg2 are in opposition to each other. These oppositions need not be at opposite ends of a graded scale and can be context-dependent. There are no subtypes for this sense.</p>
</sec>
<sec>
<st>
<p>Similarity</p>
</st>
<p>The sense type "Similarity" is like "Contrast" in that it involves the comparison of the values for some shared property of Arg1 and Arg2. The compared values in this case are similar to each other (and not in opposition).</p>
</sec>
<sec>
<st>
<p>Alternative</p>
</st>
<p>The sense type "Alternative" is used when the two arguments denote alternative situations. There are three subtypes. The "Conjunctive" subtype is used when both alternatives hold or are possible. The "Disjunctive" subtype is used when two situations are evoked in the discourse but only one of them holds. The "Chosen-Alternative" subtype is used when multiple alternatives are evoked in the discourse and one argument asserts that one of them was chosen.</p>
</sec>
<sec>
<st>
<p>Instantiation</p>
</st>
<p>The sense type "Instantiation" is used when Arg1 evokes a set and Arg2 instantiates one or more elements of the set. What is evoked may be a set of events, a set of reasons, or a generic set of events, behaviors, attitudes, etc. There are no subtypes for this sense.</p>
</sec>
<sec>
<st>
<p>Restatement</p>
</st>
<p>The sense type "Restatement" is used when the situation described by Arg2 restates the situation described by Arg1. The three subtypes "Specification", "Generalization", and "Equivalence" further specify the ways in which Arg2 restates Arg1. "Specification" applies when Arg2 describes the situation described in Arg1 in more detail. "Generalization" applies when Arg2 summarizes Arg1, or in some cases expresses a conclusion based on Arg1. "Equivalence" applies when Arg1 and Arg2 describe the same situation from different perspectives. (There are no occurrences of the "Equivalence" sense in the corpus.)</p>
</sec>
<sec>
<st>
<p>Conjunction</p>
</st>
<p>The sense type "Conjunction" is used when Arg1 and Arg2 are members of a list, defined in the prior discourse, explicitly or implicitly. No subtypes are defined for this sense.</p>
</sec>
<sec>
<st>
<p>Exception</p>
</st>
<p>The sense type "Exception" applies when Arg2 specifies an exception to the generalization specified by Arg1. In other words, Arg1 is false because Arg2 is true, but if Arg2 were false, Arg1 would be true. No subtypes are defined for this sense.</p>
</sec>
<sec>
<st>
<p>Reinforcement</p>
</st>
<p>The sense type "Reinforcement" is used when Arg2 is provided as fact to support claims or effects associated with Arg1. No subtypes are defined for this sense.</p>
</sec>
<sec>
<st>
<p>Continuation</p>
</st>
<p>The sense type "Continuation" applies when Arg2 expands the discourse by taking up an entity (concrete or abstract) mentioned in Arg1 and saying something about it. Crucially, this sense is assigned only when no other discourse relation holds. "Continuation" occurs frequently as an implicit relation, but it can also be associated with the explicit connective <it>and</it>.</p>
</sec>
<sec>
<st>
<p>Circumstance</p>
</st>
<p>The sense type "Circumstance" is used when one argument provides the circumstances under which the situation in the other argument obtains. No causal relation is implied. In BioDRB, this relation was introduced specifically to capture the circumstantial relation between an experimental set-up and the observations and results obtained from the experiments. Two subtypes capture a difference in directionality: in "Backward-Circumstance", Arg1 describes the circumstance and Arg2 describes the resulting situation; in "Forward-Circumstance", Arg2 describes the circumstance and Arg1 describes the resulting situation.</p>
</sec>
<sec>
<st>
<p>Background</p>
</st>
<p>The sense type "Background" is used when one argument provides information that is deemed necessary or desirable for interpreting the other argument. Two subtypes capture a difference in directionality: in "Backward-Background", Arg1 provides the background information for Arg2, while in "Forward-Background", Arg2 provides the background information for Arg1. No further subtypes are specified for this sense.</p>
<p>The BioDRB sense classification reflects the following changes from the PDTB classification:</p>
<p indent="1">&#8226; First, in the PDTB, the sense classification consists of three tiers, with four sense classes at the top tier. Three of the four class-level senses in the PDTB (namely, "Contingency", "Temporal", "Comparison", and "Expansion") are eliminated as we felt they were too broadly-defined to be useful. The only class-level sense we retained is "Temporal", but this has been reassigned as a type-level sense in the two-level BioDRB hierarchy.</p>
<p indent="1">&#8226; Second, we have collapsed some of the subtype-level senses. For the "Condition" sense type, for example, we do not maintain the PDTB distinction between the subtypes "Present-Factual" and "Past-Factual", and label both as "Factual". A similar reduction is done for "Non-Factual".</p>
<p indent="1">&#8226; Third, we have introduced some new senses, namely "Purpose", "Similarity", "Continuation", "Background", "Reinforcement". "Continuation" and "Background" are reformulations of the PDTB EntRel (Entity Relation) relation type, whereas "Purpose", "Similarity", and "Reinforcement" are senses that we believe were confounded with other senses in PDTB. For example, "Purpose" relations were annotated as "Result", "Similarity" relations were annotated as "Conjunction", and "Reinforcement" relations were annotated as either "Conjunction" or "Restatement".</p>
<p indent="1">&#8226; Finally, we have eliminated the separate type-level representation of pragmatic senses and have instead listed them as subtypes. These apply to the current subtypes for "Cause", namely "Claim" and "Justification". We did not find instances of the other pragmatic senses listed in PDTB.</p>
<p>Even though the PDTB class-level senses are not used in BioDRB, the PDTB sense classes can still be reconstructed from the BioDRB sense types. This matters when comparing the performance of NLP methods across the two domains, as we needed to do for the sense disambiguation experiments reported below. Table <tblr tid="T2">2</tblr> shows how the BioDRB sense types group into the four PDTB sense classes.</p>
<tbl id="T2"><title><p>Table 2</p></title><caption><p>Grouping of BioDRB sense types into PDTB generalized classes</p></caption><tblbdy cols="2">
      <r>
         <c ca="left">
            <p>
               <b>BioDRB Type-level Senses</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>PDTB Class-level Sense</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Concession, Contrast</p>
         </c>
         <c ca="left">
            <p>Comparison</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Cause, Condition, Purpose</p>
         </c>
         <c ca="left">
            <p>Contingency</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Temporal</p>
         </c>
         <c ca="left">
            <p>Temporal</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Alternative, Background, Circumstance, Conjunction, Continuation, Exception, Instantiation, Reinforcement, Restatement, Similarity</p>
         </c>
         <c ca="left">
            <p>Expansion</p>
         </c>
      </r>
   </tblbdy></tbl>
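<p>The grouping in Table 2 amounts to a lookup from BioDRB sense types to PDTB classes, so BioDRB labels can be collapsed before comparing results with PDTB-based systems. In this Python sketch, the mapping is copied from Table 2; the names <it>PDTB_CLASS</it> and <it>to_pdtb_class</it> and the "Type:Subtype" label format are assumptions.</p>
```python
# BioDRB type-level sense -> PDTB class-level sense, as listed in Table 2.
PDTB_CLASS = {
    **dict.fromkeys(["Concession", "Contrast"], "Comparison"),
    **dict.fromkeys(["Cause", "Condition", "Purpose"], "Contingency"),
    "Temporal": "Temporal",
    **dict.fromkeys(
        ["Alternative", "Background", "Circumstance", "Conjunction", "Continuation",
         "Exception", "Instantiation", "Reinforcement", "Restatement", "Similarity"],
        "Expansion",
    ),
}


def to_pdtb_class(label: str) -> str:
    """Collapse a BioDRB label such as "Cause:Claim" to its PDTB class."""
    sense_type = label.split(":", 1)[0]
    return PDTB_CLASS[sense_type]


print(to_pdtb_class("Cause:Claim"))                 # Contingency
print(to_pdtb_class("Restatement:Generalization"))  # Expansion
print(to_pdtb_class("Contrast"))                    # Comparison
```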
</sec>
</sec>
</sec>
<sec>
<st>
<p>Summary of BioDRB Annotations</p>
</st>
<p>The BioDRB corpus is available through the GENIA corpus release site <url>http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/</url>. BioDRB contains a total of 5859 relation tokens for the four different relation types: Explicit, Implicit, AltLex and NoRel. Table <tblr tid="T3">3</tblr> shows the relation type distribution in the corpus. Token counts are given in the second column, and the unique (expression) types are shown in the third column. In counting the unique types for explicit connectives, we did not treat modified and unmodified connective expressions as the same type. Thus, for example, the connectives <it>after </it>and <it>one day after </it>were treated as distinct types. For implicit relations, we counted the connectives that were inserted by the annotators.</p>
<tbl id="T3"><title><p>Table 3</p></title><caption><p>BioDRB distribution of relation types</p></caption><tblbdy cols="3">
      <r>
         <c ca="left">
            <p>
               <b>Relation Type</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>No. of Tokens (%)</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Types</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Explicit</p>
         </c>
         <c ca="right">
            <p>2636 (45%)</p>
         </c>
         <c ca="right">
            <p>179</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Implicit</p>
         </c>
         <c ca="right">
            <p>3001 (51.2%)</p>
         </c>
         <c ca="right">
            <p>57</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>AltLex</p>
         </c>
         <c ca="right">
            <p>193 (3.3%)</p>
         </c>
         <c ca="right">
            <p>165</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>NoRel</p>
         </c>
         <c ca="right">
            <p>29 (0.5%)</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>TOTAL</p>
         </c>
         <c ca="right">
            <p>5859</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
      </r>
   </tblbdy></tbl>
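<p>The percentages in Table 3 follow directly from the token counts. The short Python check below recomputes them to one decimal place, using only the counts reported in the table; the Explicit share rounds to 45.0%, which the table reports as 45%.</p>
```python
counts = {"Explicit": 2636, "Implicit": 3001, "AltLex": 193, "NoRel": 29}
total = sum(counts.values())
assert total == 5859  # matches the TOTAL row of Table 3

for rel_type, n in counts.items():
    # relation type, token count, and share of all relation tokens
    print(f"{rel_type:9s}{n:5d}  {100 * n / total:5.1f}%")
```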
<p>Table <tblr tid="T4">4</tblr> shows the sense distributions across the different relation types. Since explicit connectives and AltLex expressions can have multiple senses, we have listed multiple sense occurrences separately, to illustrate the extent of this kind of ambiguity. Note that for implicit relations, multiple senses are not permitted.</p>
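<p>Counting each sense of a multi-sense relation separately means that sense tallies can exceed the number of relations. The Python sketch below illustrates this tallying scheme on a few toy annotations; the tuple format, the function name, and the data are hypothetical and are not corpus counts.</p>
```python
from collections import Counter
from typing import List, Tuple

# (relation_type, [sense labels]) pairs; toy data for illustration only.
annotations: List[Tuple[str, List[str]]] = [
    ("Explicit", ["Temporal:Synchronous", "Cause:Justification"]),  # like Example (17)
    ("Explicit", ["Contrast"]),
    ("Implicit", ["Continuation"]),
]


def sense_type_counts(anns: List[Tuple[str, List[str]]]) -> Counter:
    """Tally (sense type, relation type) pairs, counting every sense of a
    multi-sense relation once."""
    tally: Counter = Counter()
    for rel_type, labels in anns:
        if rel_type == "Implicit" and len(labels) > 1:
            raise ValueError("implicit relations carry a single sense in BioDRB")
        for label in labels:
            tally[(label.split(":", 1)[0], rel_type)] += 1
    return tally


tally = sense_type_counts(annotations)
print(sum(tally.values()), "sense tokens for", len(annotations), "relations")  # 4 sense tokens for 3 relations
```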
<tbl id="T4"><title><p>Table 4</p></title><caption><p>Distribution of senses in BioDRB.</p></caption><tblbdy cols="5">
      <r>
         <c ca="left">
            <p>
               <b>Sense</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Explicit</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Implicit</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>AltLex</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>TOTAL</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Alternative</p>
         </c>
         <c ca="right">
            <p>31</p>
         </c>
         <c ca="right">
            <p>3</p>
         </c>
         <c ca="right">
            <p>3</p>
         </c>
         <c ca="right">
            <p>37</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Background</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>132</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>133</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Cause</p>
         </c>
         <c ca="right">
            <p>339</p>
         </c>
         <c ca="right">
            <p>98</p>
         </c>
         <c ca="right">
            <p>105</p>
         </c>
         <c ca="right">
            <p>542</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Circumstance</p>
         </c>
         <c ca="right">
            <p>8</p>
         </c>
         <c ca="right">
            <p>221</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>230</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Concession</p>
         </c>
         <c ca="right">
            <p>257</p>
         </c>
         <c ca="right">
            <p>70</p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
         <c ca="right">
            <p>329</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Condition</p>
         </c>
         <c ca="right">
            <p>22</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>22</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Conjunction</p>
         </c>
         <c ca="right">
            <p>421</p>
         </c>
         <c ca="right">
            <p>641</p>
         </c>
         <c ca="right">
            <p>3</p>
         </c>
         <c ca="right">
            <p>1065</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Continuation</p>
         </c>
         <c ca="right">
            <p>24</p>
         </c>
         <c ca="right">
            <p>831</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>855</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Contrast</p>
         </c>
         <c ca="right">
            <p>205</p>
         </c>
         <c ca="right">
            <p>75</p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
         <c ca="right">
            <p>282</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Exception</p>
         </c>
         <c ca="right">
            <p>7</p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>9</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Instantiation</p>
         </c>
         <c ca="right">
            <p>21</p>
         </c>
         <c ca="right">
            <p>53</p>
         </c>
         <c ca="right">
            <p>14</p>
         </c>
         <c ca="right">
            <p>88</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Purpose</p>
         </c>
         <c ca="right">
            <p>616</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>617</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Reinforcement</p>
         </c>
         <c ca="right">
            <p>22</p>
         </c>
         <c ca="right">
            <p>60</p>
         </c>
         <c ca="right">
            <p>19</p>
         </c>
         <c ca="right">
            <p>101</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Restatement</p>
         </c>
         <c ca="right">
            <p>69</p>
         </c>
         <c ca="right">
            <p>445</p>
         </c>
         <c ca="right">
            <p>19</p>
         </c>
         <c ca="right">
            <p>533</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Similarity</p>
         </c>
         <c ca="right">
            <p>5</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>5</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Temporal</p>
         </c>
         <c ca="right">
            <p>394</p>
         </c>
         <c ca="right">
            <p>370</p>
         </c>
         <c ca="right">
            <p>16</p>
         </c>
         <c ca="right">
            <p>780</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Cause/Background</p>
         </c>
         <c ca="right">
            <p>8</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>8</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Cause/Conjunction</p>
         </c>
         <c ca="right">
            <p>5</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>5</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Cause/Reinforcement</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Cause/Temporal</p>
         </c>
         <c ca="right">
            <p>6</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>3</p>
         </c>
         <c ca="right">
            <p>9</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Concession/Background</p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Concession/Circumstance</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Condition/Circumstance</p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Condition/Temporal</p>
         </c>
         <c ca="right">
            <p>5</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>5</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Conjunction/Temporal</p>
         </c>
         <c ca="right">
            <p>70</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>71</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Continuation/Reinforcement</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Contrast/Background</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Contrast/Concession</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Purpose/Conjunction</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Reinforcement/Conjunction</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Temporal/Circumstance</p>
         </c>
         <c ca="right">
            <p>92</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>92</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Temporal/Continuation</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>-</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>TOTAL</b>
            </p>
         </c>
         <c ca="right">
            <p>2636</p>
         </c>
         <c ca="right">
            <p>3001</p>
         </c>
         <c ca="right">
            <p>193</p>
         </c>
         <c ca="right">
            <p>5830</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Multiple senses provided for connectives are shown separately.</p>
   </tblfn></tbl>
<p>In a given context, an explicit connective can have multiple sense interpretations, as shown in Table <tblr tid="T4">4</tblr>. A given connective can also receive different sense interpretations in different contexts. The extent of this contextual ambiguity is shown in Table <tblr tid="T5">5</tblr>. For connectives annotated with multiple senses, only the first sense provided in the annotation is used here. In total, 27 connective types (column 1) exhibit sense ambiguity to varying degrees.</p>
<tbl id="T5"><title><p>Table 5</p></title><caption><p>Contextual ambiguity of explicit connectives</p></caption><tblbdy cols="3">
      <r>
         <c ca="left">
            <p>
               <b>Connective Type</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Senses</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Tokens</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>accordingly</p>
         </c>
         <c ca="left">
            <p>2: Cause, Conjunction</p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>although</p>
         </c>
         <c ca="left">
            <p>2: Concession, Contrast</p>
         </c>
         <c ca="right">
            <p>76</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>and</p>
         </c>
         <c ca="left">
            <p>6: Cause, Concession, Conjunction, Continuation, Purpose, Temporal</p>
         </c>
         <c ca="right">
            <p>274</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>as</p>
         </c>
         <c ca="left">
            <p>3: Cause, Purpose, Temporal</p>
         </c>
         <c ca="right">
            <p>23</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>both upon</p>
         </c>
         <c ca="left">
            <p>2: Circumstance, Temporal</p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>but</p>
         </c>
         <c ca="left">
            <p>2: Concession, Contrast</p>
         </c>
         <c ca="right">
            <p>42</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>by</p>
         </c>
         <c ca="left">
            <p>3: Cause, Purpose, Temporal</p>
         </c>
         <c ca="right">
            <p>262</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>finally</p>
         </c>
         <c ca="left">
            <p>2: Conjunction, Temporal</p>
         </c>
         <c ca="right">
            <p>21</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>however</p>
         </c>
         <c ca="left">
            <p>2: Concession, Contrast</p>
         </c>
         <c ca="right">
            <p>117</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>in part by</p>
         </c>
         <c ca="left">
            <p>2: Cause, Purpose</p>
         </c>
         <c ca="right">
            <p>3</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>in particular</p>
         </c>
         <c ca="left">
            <p>2: Instantiation, Restatement</p>
         </c>
         <c ca="right">
            <p>4</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>in response to</p>
         </c>
         <c ca="left">
            <p>3: Cause, Circumstance, Temporal</p>
         </c>
         <c ca="right">
            <p>12</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>in turn</p>
         </c>
         <c ca="left">
            <p>3: Cause, Conjunction, Temporal</p>
         </c>
         <c ca="right">
            <p>6</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>in</p>
         </c>
         <c ca="left">
            <p>2: Circumstance, Purpose</p>
         </c>
         <c ca="right">
            <p>3</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>indeed</p>
         </c>
         <c ca="left">
            <p>2: Circumstance, Reinforcement</p>
         </c>
         <c ca="right">
            <p>15</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>on the other hand</p>
         </c>
         <c ca="left">
            <p>2: Concession, Contrast</p>
         </c>
         <c ca="right">
            <p>6</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>once</p>
         </c>
         <c ca="left">
            <p>2: Circumstance, Temporal</p>
         </c>
         <c ca="right">
            <p>7</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>second</p>
         </c>
         <c ca="left">
            <p>2: Conjunction, Temporal</p>
         </c>
         <c ca="right">
            <p>3</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>since</p>
         </c>
         <c ca="left">
            <p>2: Cause, Temporal</p>
         </c>
         <c ca="right">
            <p>52</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>so</p>
         </c>
         <c ca="left">
            <p>2: Cause, Restatement</p>
         </c>
         <c ca="right">
            <p>7</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>then</p>
         </c>
         <c ca="left">
            <p>2: Restatement, Temporal</p>
         </c>
         <c ca="right">
            <p>91</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>therefore</p>
         </c>
         <c ca="left">
            <p>2: Cause, Restatement</p>
         </c>
         <c ca="right">
            <p>75</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>thus</p>
         </c>
         <c ca="left">
            <p>2: Cause, Restatement</p>
         </c>
         <c ca="right">
            <p>77</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>upon</p>
         </c>
         <c ca="left">
            <p>2: Circumstance, Temporal</p>
         </c>
         <c ca="right">
            <p>15</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>when</p>
         </c>
         <c ca="left">
            <p>3: Circumstance, Condition, Temporal</p>
         </c>
         <c ca="right">
            <p>65</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>while</p>
         </c>
         <c ca="left">
            <p>4: Concession, Conjunction, Contrast, Temporal</p>
         </c>
         <c ca="right">
            <p>64</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>whilst</p>
         </c>
         <c ca="left">
            <p>2: Concession, Contrast</p>
         </c>
         <c ca="right">
            <p>4</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Total</p>
         </c>
         <c ca="left">
            <p>-</p>
         </c>
         <c ca="right">
            <p>1328</p>
         </c>
      </r>
   </tblbdy></tbl>
<p>Column 2 gives the number and names of the senses associated with each connective, and column 3 gives its total number of tokens. Together, these ambiguous connectives account for 1328 tokens, which constitutes 50.4% (1328/2636) of all explicit connective tokens.</p>
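<p>The 50.4% figure can be reproduced directly from Table 5. The minimal Python sketch below copies the token counts from Table 5 and takes the total of 2636 explicit connective tokens from the text; the names AMBIGUOUS_TOKENS and ambiguity_share are illustrative only and are not part of the BioDRB distribution.</p>

```python
# Token counts per ambiguous connective type, copied from Table 5.
AMBIGUOUS_TOKENS = {
    "accordingly": 2, "although": 76, "and": 274, "as": 23,
    "both upon": 2, "but": 42, "by": 262, "finally": 21,
    "however": 117, "in part by": 3, "in particular": 4,
    "in response to": 12, "in turn": 6, "in": 3, "indeed": 15,
    "on the other hand": 6, "once": 7, "second": 3, "since": 52,
    "so": 7, "then": 91, "therefore": 75, "thus": 77, "upon": 15,
    "when": 65, "while": 64, "whilst": 4,
}

# Total number of explicit connective tokens in the corpus (from the text).
TOTAL_EXPLICIT = 2636


def ambiguity_share(counts: dict[str, int], total: int) -> tuple[int, float]:
    """Return (number of ambiguous tokens, percentage of all explicit tokens)."""
    ambiguous = sum(counts.values())
    return ambiguous, round(100 * ambiguous / total, 1)


if __name__ == "__main__":
    n, pct = ambiguity_share(AMBIGUOUS_TOKENS, TOTAL_EXPLICIT)
    print(len(AMBIGUOUS_TOKENS), n, pct)  # 27 1328 50.4
```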
</sec>
<sec>
<st>
<p>Annotation Task Procedure</p>
</st>
<p>For the task of annotating discourse relations, each annotator was given an article and instructed to read it from beginning to end while marking up relations. No predefined list of connectives was provided to the annotators, although the PDTB connective list was provided as an example of what to look for. Annotators were strongly encouraged to identify additional connectives whenever they observed them. At a high level, the annotation procedure can be summarized as follows:</p>
<p>For every new sentence encountered while reading the text:</p>
<p indent="1">1. First determine if there is an explicit connective that relates the sentence to the prior context via a discourse relation. If so, mark this explicit connective, its arguments, and its sense(s). Label the relation type as <it>Explicit</it>.</p>
<p indent="1">2. If there is no explicit connective relating the sentence to the prior context, try to insert an implicit connective to express the inferred implicit relation, annotate its sense, and mark its arguments. If the inferred relation has one of the senses "Continuation", "Background", or "Circumstance", no connective can be inserted; in that case, use the dummy label "NONE" in place of an implicit connective. Label the relation type as <it>Implicit</it>.</p>
<p indent="1">3. If insertion of an implicit connective leads to redundancy in the expression of the relation, identify and mark the AltLex expression that expresses the relation, annotate its sense, and mark its arguments. Label the relation type as <it>AltLex</it>.</p>
<p indent="1">4. If the sentence does not seem to relate coherently to any sentence in the prior text, label the relation type as <it>NoRel</it>, mark the current sentence as Arg2 and the previous sentence as Arg1.</p>
<p indent="1">5. After annotating the relation of the sentence with the previous context, identify and annotate any sentence-internal explicit connectives that have both their arguments in the same sentence.</p>
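<p>The decision order of steps 1 to 4 can be sketched as a small function. This is an illustrative Python sketch only: the annotator's judgments (whether an explicit connective is present, whether an AltLex expression makes insertion redundant, which relation is inferred) are supplied as inputs rather than computed automatically, the class and field names are hypothetical, and the sentence-internal pass of step 5 is not modeled.</p>

```python
from dataclasses import dataclass
from typing import Optional

# Step 2: senses for which no implicit connective can be inserted.
NO_CONNECTIVE_SENSES = {"Continuation", "Background", "Circumstance"}


@dataclass
class SentenceJudgment:
    """Hypothetical record of an annotator's decisions for one new sentence."""
    explicit_connective: Optional[str] = None  # connective linking to prior context
    altlex: Optional[str] = None               # AltLex found when insertion is redundant
    inferred_sense: Optional[str] = None       # sense of an inferred implicit relation
    inserted_connective: Optional[str] = None  # connective the annotator would insert


def relation_for_sentence(j: SentenceJudgment) -> tuple[str, Optional[str]]:
    """Return (relation type, connective or AltLex label) following steps 1-4."""
    if j.explicit_connective:                  # step 1: explicit connective present
        return "Explicit", j.explicit_connective
    if j.altlex:                               # step 3: insertion would be redundant
        return "AltLex", j.altlex
    if j.inferred_sense:                       # step 2: inferred implicit relation
        if j.inferred_sense in NO_CONNECTIVE_SENSES:
            return "Implicit", "NONE"          # dummy label, no connective insertable
        return "Implicit", j.inserted_connective
    return "NoRel", None                       # step 4: Arg2 = current, Arg1 = previous


if __name__ == "__main__":
    print(relation_for_sentence(SentenceJudgment(inferred_sense="Background")))
```

Checking for an AltLex expression before the plain implicit case mirrors step 3, which replaces the insertion of an implicit connective whenever that insertion would be redundant.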
</sec>
<sec>
<st>
<p>Limitations</p>
</st>
<p>While we believe that the scope of discourse relations captured in BioDRB is larger than that of the framework from which it was adapted, two types of relations are currently not handled; we describe them below. They were excluded mainly because of the challenges involved in annotating them, which we plan to address in future extensions of the corpus.</p>
<p>First, we have not annotated implicit or AltLex relations between events and situations mentioned within a single sentence. For example, in the sentence "In particular, after binding with E2, oestrogen receptors have been shown to interact with NF-<it>&#954;</it>B factors, via transcriptional co-factors, resulting in mutual or non-mutual antagonism.", an AltLex "Result" relation can be inferred between the "interaction of oestrogen receptors with NF-<it>&#954;</it>B factors" and "mutual or non-mutual antagonism", anchored in the verb <it>resulting</it>. Such relations were excluded because it is challenging to identify the clausal boundary "sites" at which they are inferred. Although the syntactic parse of a sentence could be used for this purpose, we did not have a sufficiently accurate parser for our texts.</p>
<p>Second, coordinating conjunctions (e.g., <it>and, or</it>) that conjoin verb phrases in a sentence can potentially indicate discourse relations between two situations. Moreover, the conjunction <it>and</it> can often express more than the "Conjunction" sense, including at least the "Temporal" and "Result" senses. For example, the conjunction <it>and</it> in the sentence "Thus SEB can interact directly with MHC class II molecules on APCs and activate T cells bearing the proper TcR V<it>&#946;</it> chains." can be taken to express a conjunction of two independent situations, namely "SEB interacting with MHC class II molecules on APCs" and "SEB activating T cells bearing the proper TcR V<it>&#946;</it> chains". In addition, a causal, temporal, or enablement relation might be inferred here. Although such conjunctions appear often in the BioDRB, we excluded them because it is difficult to distinguish them from conjunctions that do not have a discourse function.</p>
</sec>
<sec>
<st>
<p>Evaluation of Annotation Reliability</p>
</st>
<p>Each article was annotated by two annotators who were premed students at the University of Pennsylvania. The annotators' domain expertise is crucial for identifying the correct sense of discourse connectives and for recognizing the presence of implicit relations. The annotators were extensively trained by the first author in linguistic syntax, semantics, and discourse, after which they were given a tutorial on the biomedical discourse annotation guidelines. The annotation was carried out over a period of three years, with annotators spending an average of 7 minutes per relation.</p>
<p>We computed agreement for connective identification, argument identification and sense labeling. Explicit and AltLex relations were treated separately from implicit relations.</p>
<p>For agreement on the identification of explicit connectives and AltLex expressions, we calculated the percentage of overlapping tokens identified by the annotators, since one annotator could have selected some connectives or AltLex expressions that the other did not. For example, if one annotator identified 20 connectives and the other identified 30, it could be that 15 tokens were common to both, while the remaining 20 tokens (5 from the first annotator and 15 from the second) were identified by only one annotator, giving 35 distinct tokens in all. Agreement is then reported as the percentage of common tokens over common and uncommon tokens, i.e., 43% (15/35) in this artificial case. We achieved 82% agreement. The major sources of mismatch were subordinators (which are harder to identify than conjunctions and adverbials) and AltLex expressions.</p>
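<p>The overlap measure described above is the number of tokens identified by both annotators divided by the number identified by either of them (equivalently, the Jaccard index of the two sets). A minimal Python sketch, assuming each identified token is represented by a hashable identifier such as its character offsets, reproduces the 43% of the artificial example; the function name is illustrative.</p>

```python
def overlap_agreement(tokens_a: set, tokens_b: set) -> float:
    """Percentage of common tokens over common plus uncommon tokens."""
    common = tokens_a.intersection(tokens_b)
    uncommon = tokens_a.symmetric_difference(tokens_b)  # found by one annotator only
    total = len(common) + len(uncommon)
    if total == 0:
        return 100.0  # assumption: two empty selections count as full agreement
    return 100 * len(common) / total


if __name__ == "__main__":
    annotator_a = set(range(0, 20))  # 20 tokens
    annotator_b = set(range(5, 35))  # 30 tokens, 15 of them shared with annotator_a
    print(round(overlap_agreement(annotator_a, annotator_b)))  # 43
```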
<p>For agreement on argument spans, we used both the exact match criterion as well as the more relaxed partial match criterion <abbrgrp>
<abbr bid="B73">73</abbr>
</abbrgrp>. Under the exact match criterion, annotators are taken to agree on an argument only when their selections are identical or fully overlapping, whereas the partial match criterion also counts partial overlap as agreement. Argument agreement was computed only on the connectives on which the annotators agreed. For Explicit and AltLex relations, exact match agreement was 88% on Arg2 and 81% on Arg1. This difference is expected, since Arg1s are generally harder to identify than Arg2s. With partial match, agreement was 93% and 86% for Arg2 and Arg1, respectively. Agreement on implicit relations was lower, at 88% and 75% for Arg2 and Arg1, respectively. The most likely reason for the lower agreement on implicit relations is that non-adjacent arguments were allowed in the BioDRB, which makes the arguments harder to identify.</p>
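<p>One simple way to operationalize the two criteria is sketched below in Python. It assumes that each argument is given as a list of (start, end) character offsets, with end exclusive and one pair per part of a discontinuous span, and it compares the sets of covered characters. The actual BioDRB evaluation may have normalized spans differently (for example, with respect to punctuation or whitespace), so this is an approximation rather than the authors' scoring procedure; all names are hypothetical.</p>

```python
Span = list[tuple[int, int]]  # (start, end) character offsets per part; end exclusive


def covered(span: Span) -> set[int]:
    """Character positions covered by a possibly discontinuous span."""
    positions: set[int] = set()
    for start, end in span:
        positions.update(range(start, end))
    return positions


def exact_match(a: Span, b: Span) -> bool:
    """Agree only if both selections cover exactly the same characters."""
    return covered(a) == covered(b)


def partial_match(a: Span, b: Span) -> bool:
    """Agree if the two selections share at least one character."""
    return bool(covered(a).intersection(covered(b)))


def agreement(pairs: list[tuple[Span, Span]], match) -> float:
    """Percentage of argument pairs (one per agreed connective) that match."""
    if not pairs:
        raise ValueError("no argument pairs to compare")
    hits = sum(1 for a, b in pairs if match(a, b))
    return 100 * hits / len(pairs)


if __name__ == "__main__":
    arg1_pairs = [
        ([(0, 10)], [(0, 10)]),                  # identical
        ([(0, 10)], [(5, 20)]),                  # partial overlap
        ([(0, 10)], [(30, 40)]),                 # disjoint
        ([(0, 5), (8, 12)], [(0, 5), (8, 12)]),  # identical, discontinuous
    ]
    print(agreement(arg1_pairs, exact_match), agreement(arg1_pairs, partial_match))
```

Because partial match accepts every pair that exact match accepts, partial-match agreement can never be lower than exact-match agreement on the same pairs, which is consistent with the figures reported above.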
<p>Since the sense guidelines allow an annotator to select multiple senses for a given connective, we considered the annotators to agree on sense labeling if they assigned at least one common sense to the connective. Furthermore, since the sense labeling task involved classifying connectives into multiple nominal categories, namely 31 sense categories in total (see Table <tblr tid="T1">1</tblr>), we report agreement using the kappa score. For explicit and AltLex relations, the kappa score was 0.71, with observed agreement of 0.85 and expected agreement of 0.48. For implicit relations, the kappa score was 0.63, with observed agreement of 0.82 and expected agreement of 0.52. The kappa scores for both explicit and implicit relations therefore fall within the range generally accepted as substantial agreement.</p>
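<p>The kappa values follow from the reported observed and expected agreement via Cohen's formula, kappa = (P_o - P_e)/(1 - P_e): (0.85 - 0.48)/(1 - 0.48) is about 0.712, and (0.82 - 0.52)/(1 - 0.52) = 0.625. The Python sketch below reproduces these values, implements the "at least one shared sense" agreement rule, and shows how chance agreement is usually estimated for single-label items from each annotator's label distribution. How expected agreement was estimated for multi-sense tokens is not stated here, so the chance_agreement helper is an assumption; the interpretation bands are the widely used Landis and Koch bands, and all function names are illustrative.</p>

```python
from collections import Counter


def senses_agree(senses_a: set[str], senses_b: set[str]) -> bool:
    """Annotators agree if they assign at least one common sense to the connective."""
    return bool(senses_a.intersection(senses_b))


def cohen_kappa(p_observed: float, p_expected: float) -> float:
    """Agreement beyond chance, scaled by the maximum achievable beyond chance."""
    return (p_observed - p_expected) / (1 - p_expected)


def chance_agreement(labels_a: list[str], labels_b: list[str]) -> float:
    """Expected agreement for single-label items, from each annotator's marginals."""
    n = len(labels_a)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    return sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)


def landis_koch(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch descriptive bands."""
    if kappa > 0.80:
        return "almost perfect"
    if kappa > 0.60:
        return "substantial"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    if kappa >= 0.0:
        return "slight"
    return "poor"


if __name__ == "__main__":
    for name, p_o, p_e in [("Explicit/AltLex", 0.85, 0.48), ("Implicit", 0.82, 0.52)]:
        k = cohen_kappa(p_o, p_e)
        print(f"{name}: kappa={k:.3f} ({landis_koch(k)})")
```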
<p>Following the double-blind annotation and agreement calculations, the disagreements were adjudicated by an expert. We then reviewed the corpus further to correct any remaining guideline-related errors.</p>
</sec>
<sec>
<st>
<p>BioDRB Data, Tools and Representation</p>
</st>
<sec>
<st>
<p>Data</p>
</st>
<p>The source corpus over which the BioDRB has been annotated consists of 24 full-text articles from the GENIA corpus <abbrgrp>
<abbr bid="B39">39</abbr>
</abbrgrp>. The GENIA corpus is a collection of articles from the biomedical literature. It has been compiled and annotated within the scope of the GENIA project <url>http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/</url>.</p>
<p>The 24 GENIA articles were selected by the GENIA group in 2006 from the results of a PubMed search with two MeSH terms, "Blood cells" and "Transcription factors"; they are open-access articles considered representative of the scientific text style of this domain <abbrgrp>
<abbr bid="B94">94</abbr>
</abbrgrp>. This full-text data collection has been annotated with coreference (by the GENIA group) and citation relations <abbrgrp>
<abbr bid="B61">61</abbr>
</abbrgrp>, and therefore represents one of the most comprehensively annotated full-text biomedical corpora. Our annotation of discourse relations on this corpus will further enrich the data resource, and will assist future text mining applications.</p>
<p>Altogether, the articles have a total of 112483 words and 4911 sentences. Sentence counts were obtained with the UIUC sentence segmentation tool <url>http://cogcomp.cs.illinois.edu/page/tools_view/2</url>.</p>
</sec>
<sec>
<st>
<p>Annotation Tools and Representation</p>
</st>
<p>We used a recently released version of the discourse annotation tool, called "Annotator", distributed by the PDTB group. It is freely available from <url>http://www.seas.upenn.edu/~pdtb/PDTBAPI</url>, and differs from earlier versions primarily with respect to its simpler data representation. The tool allows for the annotation of relations, their arguments, as well as senses, all within the same interface.</p>
<p>Following PDTB, BioDRB annotations are represented in a "stand-off" style, in that the annotation files are physically distinct from the source files. Text span annotations are represented in terms of their character offsets in the source files, and can be easily retrieved programmatically. When text spans are discontinuous, which is possible for both connective spans and argument spans, they are represented as sets of offsets. Each element of the set corresponds to one part of the discontinuous span, and the order of the elements in the set reflects the linear order of those parts in the text. Annotation files are flat text files, with each line representing a single relation token and all its annotated features, separated with the "|" delimiter. Since the tool was initially developed for the PDTB, which also annotates attribution features, only some of the "|"-separated fields are relevant for BioDRB. These are shown in Table <tblr tid="T6">6</tblr>. The first column provides the field number (counting from 0) and the second column describes the annotation that the field contains. All other fields are left blank. For implicit relations, no connective span offsets are provided since there is no lexical item associated with the relation. To identify the location of an implicit relation, the start offset of its Arg2 span is used as the identifier.</p>
<tbl id="T6"><title><p>Table 6</p></title><caption><p>Annotation fields in the BioDRB data representation</p></caption><tblbdy cols="2">
      <r>
         <c ca="center">
            <p><b>Field No.</b></p>
         </c>
         <c ca="left">
            <p>
               <b>Description</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="left">
            <p>Relation type (Explicit, Implicit, AltLex, NoRel)</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="left">
            <p>(Sets of) Span offsets for the connective (Explicit) or AltLex expression</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>7</p>
         </c>
         <c ca="left">
            <p>Connective string "inserted" for Implicit relation</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>8</p>
         </c>
         <c ca="left">
            <p>Sense1 of Explicit Connective (or Implicit Connective)</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>9</p>
         </c>
         <c ca="left">
            <p>Sense2 of Explicit Connective (or Implicit Connective)</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>14</p>
         </c>
         <c ca="left">
            <p>(Sets of) Span offsets for Arg1</p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>20</p>
         </c>
         <c ca="left">
            <p>(Sets of) Span offsets for Arg2</p>
         </c>
      </r>
   </tblbdy></tbl>
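<p>A minimal sketch of reading one relation line, assuming only the field positions listed in Table 6: the line is split on "|" and the relevant fields are picked out by index, with blank fields mapped to None. The dictionary keys and the parse_relation name are hypothetical, chosen for illustration.</p>

```python
# Field positions (counting from 0) as listed in Table 6; the key names are
# hypothetical, and every other field is blank in BioDRB files.
BIODRB_FIELDS = {
    "relation_type": 0,
    "connective_spans": 1,
    "implicit_connective": 7,
    "sense1": 8,
    "sense2": 9,
    "arg1_spans": 14,
    "arg2_spans": 20,
}


def parse_relation(line):
    """Split one pipe-delimited relation line into the fields BioDRB uses."""
    parts = line.rstrip("\n").split("|")
    record = {}
    for name, index in BIODRB_FIELDS.items():
        value = parts[index] if len(parts) > index else ""
        record[name] = value or None  # blank fields become None
    return record


# Row 2 of Table 7: a discontinuous explicit connective with one sense.
example = ("Explicit|21670..21678;21729..21737|||||||Conjunction"
           "||||||21679..21727||||||21738..21829||||||")
relation = parse_relation(example)
print(relation["relation_type"], relation["sense1"], relation["arg1_spans"])
```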
<p>Table <tblr tid="T7">7</tblr> shows several examples of the annotation representation. Row 1 shows the entry of multiple senses ("Temporal.Precedence" and "Conjunction") for an explicit connective. Row 2 shows a set of span offsets ("21670..21678;21729..21737") for a discontinuous explicit connective, with the elements separated by a semicolon. A discontinuous Arg2 span ("10090..10100;10106..10209") is shown in Row 3. Rows 4 and 5 show the annotation for an implicit relation and an AltLex relation, respectively.</p>
<tbl id="T7"><title><p>Table 7</p></title><caption><p>Annotation representation</p></caption><tblbdy cols="1">
      <r>
         <c ca="left">
            <p>Explicit|9171..9174|||||||Temporal.Precedence|Conjunction|||||9137..9170||||||9175..9244||||||</p>
         </c>
      </r>
      <r>
         <c cspan="1">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Explicit|21670..21678;21729..21737|||||||Conjunction||||||21679..21727||||||21738..21829||||||</p>
         </c>
      </r>
      <r>
         <c cspan="1">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Explicit|10101..10105|||||||Temporal.Precedence||||||9932..10088||||||10090..10100;10106..10209||||||</p>
         </c>
      </r>
      <r>
         <c cspan="1">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Implicit|||||||as a result|Cause.Result||||||3418..3655||||||3657..3714||||||</p>
         </c>
      </r>
      <r>
         <c cspan="1">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>AltLex|25183..25199|||||||Reinforcement|Cause.Claim|||||24621..25181||||||25183..25444||||||</p>
         </c>
      </r>
   </tblbdy></tbl>
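<p>Span sets such as those in Table 7 can be resolved against the source text with a few lines of code. This sketch assumes that the source file is read as a single string and that offsets are end-exclusive character positions. The function names and the " ... " joiner used between discontinuous parts are illustrative choices, not part of the BioDRB distribution.</p>

```python
from typing import List, Tuple


def parse_span_set(spans: str) -> List[Tuple[int, int]]:
    """Turn '21670..21678;21729..21737' into [(21670, 21678), (21729, 21737)]."""
    if not spans:
        return []
    pairs = []
    for part in spans.split(";"):
        start, end = part.split("..")
        pairs.append((int(start), int(end)))
    return pairs


def span_text(source: str, spans: str, joiner: str = " ... ") -> str:
    """Return the text of a (possibly discontinuous) span set, parts in order."""
    # Assumption: offsets are end-exclusive, so each pair is a Python slice.
    return joiner.join(source[start:end] for start, end in parse_span_set(spans))


# Toy source text; a real BioDRB source file would be read in as one string.
source = "We observed not only binding but also activation."
first, second = source.index("not only"), source.index("but also")
spans = f"{first}..{first + 8};{second}..{second + 8}"
print(spans, "->", span_text(source, spans))
```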
</sec>
</sec>
<sec>
<st>
<p>Sense Detection of Explicit Connectives</p>
</st>
<p>Predicting the sense of discourse relations is an important subtask of discourse parsing. Prior work on discourse relation sense detection has tackled the task of identifying the senses of explicit connectives separately from implicit relations. Sense prediction for explicit connectives in the open-domain PDTB has been shown to be an easy task, with most connectives being unambiguous <abbrgrp>
<abbr bid="B44">44</abbr>
<abbr bid="B52">52</abbr>
</abbrgrp>. As a result, the connectives themselves serve as highly reliable predictors of their sense.</p>
<p>In this section, we describe our preliminary experiments for classifying the senses of explicit connectives in BioDRB. Similar to prior work with the PDTB, one of our goals here is to establish a baseline for this task by using just the (case-insensitive) connective text string as the predictive feature. We also carried out the same experiments with the PDTB data, in order to compare the results across the two domains, as well as to explore how well a classifier trained on the open-domain PDTB data generalizes to the domain-specific data of the BioDRB (described in the next section). For all experiments, we used SLIPPER <abbrgrp>
<abbr bid="B95">95</abbr>
</abbrgrp>, a learning system that generates rulesets based on confidence-rated boosting.</p>
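<p>SLIPPER itself is not reproduced here. As a rough illustration of a classifier that uses only the case-insensitive connective string as its feature, the sketch below assigns each connective its most frequent sense in the training data and falls back to the overall majority sense for unseen connectives. The class name and the toy data are hypothetical. Since most explicit connectives are unambiguous, such a lookup is a natural reference point for any connective-only model.</p>

```python
from collections import Counter, defaultdict


class ConnectiveSenseBaseline:
    """Predict the most frequent training sense of each (lower-cased) connective."""

    def fit(self, connectives, senses):
        counts = defaultdict(Counter)
        for connective, sense in zip(connectives, senses):
            counts[connective.lower()][sense] += 1
        self.table = {conn: c.most_common(1)[0][0] for conn, c in counts.items()}
        # Connectives unseen in training fall back to the overall majority sense.
        self.default = Counter(senses).most_common(1)[0][0]
        return self

    def predict(self, connectives):
        return [self.table.get(conn.lower(), self.default) for conn in connectives]


train_conns = ["because", "Because", "since", "since", "since", "however", "and"]
train_senses = ["Contingency", "Contingency", "Temporal", "Contingency",
                "Contingency", "Comparison", "Expansion"]
model = ConnectiveSenseBaseline().fit(train_conns, train_senses)
print(model.predict(["Since", "however", "whereas"]))
```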
<p>To compare BioDRB and PDTB directly, we grouped the BioDRB sense types into the 4 general classes of the PDTB (Table <tblr tid="T2">2</tblr>) and performed 4-way classification over these classes. We designed the comparison at the class level rather than the type level because sense annotation in the PDTB follows a "flexible" approach, in which annotators may back off to the most general, class-level sense in the hierarchy. As a result, many connectives in the PDTB are labeled with only class-level senses, which makes them difficult to compare with the type-level senses in BioDRB.</p>
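<p>The grouping step amounts to mapping each type-level (or subtype-level) label to its class. The mapping in this sketch is a partial, illustrative assumption that follows the PDTB 2.0 sense hierarchy. Table 2 remains the authoritative BioDRB grouping, and BioDRB-specific senses such as "Purpose" are deliberately left out rather than guessed.</p>

```python
# Illustrative subset only: these placements follow the PDTB 2.0 sense
# hierarchy and are an assumption here; Table 2 gives the BioDRB grouping.
TYPE_TO_CLASS = {
    "Temporal": "Temporal",
    "Cause": "Contingency",
    "Condition": "Contingency",
    "Conjunction": "Expansion",
    "Instantiation": "Expansion",
    "Restatement": "Expansion",
    "Contrast": "Comparison",
    "Concession": "Comparison",
}


def to_class(sense_label):
    """Map a type- or subtype-level label such as 'Cause.Result' to its class."""
    sense_type = sense_label.split(".")[0]  # drop any subtype suffix
    if sense_type not in TYPE_TO_CLASS:
        raise KeyError(f"no class mapping for sense type {sense_type!r}")
    return TYPE_TO_CLASS[sense_type]


print([to_class(s) for s in ["Temporal.Precedence", "Cause.Result", "Conjunction"]])
```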
<p>Since explicit connectives can have up to two senses (see Table <tblr tid="T4">4</tblr>), we considered three scenarios. In the first scenario, only the <it>first sense </it>of each connective was considered, yielding a total of 2636 sense instances. In the second scenario, the <it>second sense </it>was considered for the 195 connectives (7.4%) in BioDRB that have one; this scenario also yielded 2636 sense instances. Finally, in the third scenario, <it>both senses </it>were selected, so that the data set contains an additional sense instance for each of the 195 multiple-sense connectives, for a total of 2831 (2636 + 195) sense instances. Our hypothesis was that the third scenario increases sense ambiguity in the data, and that classifier performance should therefore decrease.</p>
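<p>The three scenarios can be sketched as a single data-preparation function over (connective, senses) pairs. The function name and input layout are hypothetical. The second-sense scenario assumes that connectives with a single sense keep that sense, which is what the identical instance count of 2636 implies.</p>

```python
def build_scenario(relations, scenario):
    """relations: list of (connective, senses) with one or two senses each."""
    instances = []
    for connective, senses in relations:
        if scenario == "first":
            instances.append((connective, senses[0]))
        elif scenario == "second":
            # Assumption: single-sense connectives keep their only sense.
            instances.append((connective, senses[-1]))
        elif scenario == "both":
            # One instance per sense, so two-sense connectives are counted twice.
            instances.extend((connective, sense) for sense in senses)
        else:
            raise ValueError(f"unknown scenario: {scenario!r}")
    return instances


rels = [("and", ["Conjunction"]), ("while", ["Temporal", "Contrast"]), ("because", ["Cause"])]
for name in ("first", "second", "both"):
    print(name, len(build_scenario(rels, name)))
```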
<p>For the PDTB experiments, we used the same data set as in previous work, and considered the same three scenarios for connectives with two senses. Of the 18459 explicit connectives in the PDTB, 999 (5.4%) have two senses.</p>
<p>In all cases, we carried out ten-fold cross-validation. For BioDRB, the majority class was the "Contingency" sense, giving a baseline accuracy of 35% averaged across the three scenarios. The average baseline for the PDTB was 33%, with "Expansion" as the majority class. Results are reported in Table <tblr tid="T8">8</tblr>, and show that overall classification performance is very similar across the two corpora. (Note that previous work with the PDTB has addressed the third, <it>both senses</it>, scenario <abbrgrp>
<abbr bid="B44">44</abbr>
<abbr bid="B52">52</abbr>
</abbrgrp>, reporting a higher accuracy of 93%. However, Pitler et al. used a Naive Bayes classifier in their experiments, and we would expect such a classifier to perform at similar levels on the BioDRB data.) Thus, explicit sense prediction can be done very reliably in the biomedical domain as well, using the connective as the only predictive variable. Also, the fact that performance degrades when both senses of a multi-sense connective are considered supports our hypothesis that this scenario increases ambiguity in the data. However, it is interesting that in both corpora, performance is lowest when only the second sense is considered. One possibility is that the second senses provided by annotators are often weaker interpretations of the discourse relation, and that the first sense is the stronger, preferred interpretation.</p>
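<p>The evaluation protocol can be sketched as follows, again with a most-frequent-sense lookup standing in for SLIPPER (an assumption made only for illustration). The majority-class baseline is the accuracy of always predicting the most frequent sense. Ten-fold cross-validation trains on nine folds, tests on the held-out fold, and pools correct predictions over all ten folds. A fixed random seed makes the fold assignment reproducible; the helper names and toy data are hypothetical.</p>

```python
import random
from collections import Counter, defaultdict


def majority_baseline(labels):
    """Return the majority label and the accuracy of always predicting it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def k_fold_indices(n, k=10, seed=0):
    """Shuffle item indices once and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(connectives, senses, k=10, seed=0):
    """k-fold accuracy of a most-frequent-sense-per-connective classifier."""
    correct = 0
    for test_fold in k_fold_indices(len(senses), k, seed):
        held_out = set(test_fold)
        train = [i for i in range(len(senses)) if i not in held_out]
        counts = defaultdict(Counter)
        for i in train:
            counts[connectives[i].lower()][senses[i]] += 1
        table = {conn: c.most_common(1)[0][0] for conn, c in counts.items()}
        default = Counter(senses[i] for i in train).most_common(1)[0][0]
        for i in test_fold:
            correct += table.get(connectives[i].lower(), default) == senses[i]
    return correct / len(senses)


conns = ["and"] * 25 + ["because"] * 20 + ["but"] * 15
senses = ["Expansion"] * 25 + ["Contingency"] * 20 + ["Comparison"] * 15
print(majority_baseline(senses), cross_validate(conns, senses))
```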
<tbl id="T8"><title><p>Table 8</p></title><caption><p>Ten-fold cross validation accuracies for explicit connective sense classification in BioDRB and PDTB.</p></caption><tblbdy cols="4">
      <r>
         <c>
            <p/>
         </c>
         <c ca="right">
            <p>
               <b>First Sense</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Second Sense</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Both Senses</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>BioDRB</b>
            </p>
         </c>
         <c ca="right">
            <p>90.9%</p>
         </c>
         <c ca="right">
            <p>83.6%</p>
         </c>
         <c ca="right">
            <p>85.6%</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>PDTB</b>
            </p>
         </c>
         <c ca="right">
            <p>90.1%</p>
         </c>
         <c ca="right">
            <p>84.1%</p>
         </c>
         <c ca="right">
            <p>85.6%</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Columns represent three scenarios for selecting from multiple senses provided for connectives.</p>
   </tblfn></tbl>
<p>In all remaining experiments, we use the data from the <it>first sense </it>scenario, for which the classifier performs best. The macro-average F1 score was 0.91 for both corpora.</p>
<p>To examine how the classifier performs on each class, we computed class-wise precision, recall and F1 score. The results in Table <tblr tid="T9">9</tblr> show that the worst scores are precision for "Contingency" (0.82) and recall for "Temporal" (0.75). Interestingly, the same experiment on the PDTB (Table <tblr tid="T10">10</tblr>) yields its worst scores for the same two senses, but with the measures reversed: recall for "Contingency" (0.71) and precision for "Temporal" (0.88). This suggests that connectives may differ in their semantic usage across the two domains.</p>
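<p>Class-wise scores of this kind can be computed from the gold and predicted labels as in the sketch below; the function names are hypothetical. F1 is the harmonic mean of precision and recall; for example, the "Comparison" row of Table 9 gives 2 × 0.983 × 0.868 / (0.983 + 0.868) ≈ 0.922. The macro average here is the unweighted mean of the per-class F1 scores. Some work instead takes the F1 of macro-averaged precision and recall, and the sketch does not claim to reproduce the exact averaging used for the tables.</p>

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def per_class_scores(gold, predicted):
    """Map each class to (precision, recall, F1) from parallel label lists."""
    pairs = list(zip(gold, predicted))
    scores = {}
    for cls in sorted(set(gold) | set(predicted)):
        tp = sum(g == cls and p == cls for g, p in pairs)
        fp = sum(g != cls and p == cls for g, p in pairs)
        fn = sum(g == cls and p != cls for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[cls] = (precision, recall, f1(precision, recall))
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f for _, _, f in scores.values()) / len(scores)


print(round(f1(0.983, 0.868), 3))  # "Comparison" row of Table 9
```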
<tbl id="T9"><title><p>Table 9</p></title><caption><p>Explicit sense classification in BioDRB: Class-wise Precision, Recall and F1.</p></caption><tblbdy cols="4">
      <r>
         <c ca="left">
            <p>
               <b>Class</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Precision</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Recall</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>F1</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Comparison</p>
         </c>
         <c ca="left">
            <p>0.983</p>
         </c>
         <c ca="left">
            <p>0.868</p>
         </c>
         <c ca="left">
            <p>0.922</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Contingency</p>
         </c>
         <c ca="left">
            <p>0.819</p>
         </c>
         <c ca="left">
            <p>0.992</p>
         </c>
         <c ca="left">
            <p>0.897</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Expansion</p>
         </c>
         <c ca="left">
            <p>0.923</p>
         </c>
         <c ca="left">
            <p>0.9</p>
         </c>
         <c ca="left">
            <p>0.911</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Temporal</p>
         </c>
         <c ca="left">
            <p>1.0</p>
         </c>
         <c ca="left">
            <p>0.754</p>
         </c>
         <c ca="left">
            <p>0.860</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Macro average F1 score is 0.91.</p>
   </tblfn></tbl>
<tbl id="T10"><title><p>Table 10</p></title><caption><p>Explicit sense classification in PDTB: Class-wise Precision, Recall and F1.</p></caption><tblbdy cols="4">
      <r>
         <c ca="left">
            <p>
               <b>Class</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Precision</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Recall</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>F1</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Comparison</p>
         </c>
         <c ca="left">
            <p>0.948</p>
         </c>
         <c ca="left">
            <p>0.993</p>
         </c>
         <c ca="left">
            <p>0.970</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Contingency</p>
         </c>
         <c ca="left">
            <p>1.0</p>
         </c>
         <c ca="left">
            <p>0.706</p>
         </c>
         <c ca="left">
            <p>0.828</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Expansion</p>
         </c>
         <c ca="left">
            <p>0.907</p>
         </c>
         <c ca="left">
            <p>0.978</p>
         </c>
         <c ca="left">
            <p>0.941</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Temporal</p>
         </c>
         <c ca="left">
            <p>0.883</p>
         </c>
         <c ca="left">
            <p>0.889</p>
         </c>
         <c ca="left">
            <p>0.886</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Macro average F1 score is 0.91.</p>
   </tblfn></tbl>
<p>Next, we considered whether the BioDRB corpus is large enough for sense detection. Since the BioDRB classifier reaches the same level of accuracy as the classifier trained on the PDTB, which is more than 8 times larger, the BioDRB corpus size may be sufficient for this task. We tested this conjecture by partitioning the data into a training set (2360 instances) and a test set (276 instances), and incrementally increasing the size of the training set to see whether classifier performance stabilizes as the training size approaches the maximum, <it>n </it>= 2360. We used 8 increments (236 examples in each increment), evaluating each incremented training set on the same test set of 276 examples. The results show that the performance of the classifier improves up to <it>n </it>= 1888, reaching an accuracy of 90.6%, but further increments up to <it>n </it>= 2360 do not significantly improve performance. We therefore conclude that the size of the BioDRB corpus is sufficient for the task of explicit connective sense identification. These results are also consistent with our related work on connective identification in BioDRB <abbrgrp>
<abbr bid="B45">45</abbr>
</abbrgrp>, where we show that the performance of the classifier becomes stable when the training size reaches over 5000 words.</p>
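<p>The learning-curve procedure can be sketched as retraining on growing prefixes of a fixed training set while scoring every model on the same held-out test set. As in the other sketches, a most-frequent-sense lookup stands in for SLIPPER, and the step size is a parameter (236 in the experiment above). The helper names and toy data are hypothetical.</p>

```python
from collections import Counter, defaultdict


def fit_lookup(pairs):
    """Most frequent sense per lower-cased connective, plus a majority fallback."""
    counts = defaultdict(Counter)
    for connective, sense in pairs:
        counts[connective.lower()][sense] += 1
    table = {conn: c.most_common(1)[0][0] for conn, c in counts.items()}
    default = Counter(sense for _, sense in pairs).most_common(1)[0][0]
    return table, default


def accuracy(model, test_pairs):
    """Fraction of test pairs whose sense the lookup model predicts correctly."""
    table, default = model
    hits = sum(table.get(conn.lower(), default) == sense for conn, sense in test_pairs)
    return hits / len(test_pairs)


def learning_curve(train_pairs, test_pairs, step):
    """Retrain on growing prefixes of the training data; score on a fixed test set."""
    return [
        (n, accuracy(fit_lookup(train_pairs[:n]), test_pairs))
        for n in range(step, len(train_pairs) + 1, step)
    ]


train = [("and", "Expansion"), ("because", "Contingency"),
         ("but", "Comparison"), ("then", "Temporal")] * 3
test = train[:4]
print(learning_curve(train, test, step=4))
```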
<p>Finally, since the BioDRB sense classification was designed to provide more refined and, therefore, more informative sense distinctions, we performed classification with the 15 type-level senses for explicit connectives. (Note that the 16th sense, "Background", does not appear for explicit connectives.)</p>
<p>The majority-class baseline accuracy (the "Purpose" sense) for the type-level senses was 23.5%. Again, we performed ten-fold cross-validation on the full data set of 2636 connectives, considering only the first sense of the connective where multiple senses were provided. Not surprisingly, the accuracy of the classifier for this more refined classification is lower, at 69.2%, although still well above the baseline. The macro-average F1 score was 0.28, mainly because many senses are too sparse for rules to be learned reliably. Examination of class-wise scores shows that rules were reliably learned for three senses, "Temporal" (F1 score 0.94), "Conjunction" (F1 score 0.97) and "Cause" (F1 score 0.81), all of which have more than 300 instances each in the corpus (see Table <tblr tid="T4">4</tblr>). While these results suggest that we may need more annotated training data for reliable refined sense classification, our immediate goal is to first explore the use of richer features for the classifier. We conjecture that for more refined sense classification, the connective alone is not sufficient as the predictive variable.</p>
</sec>
<sec>
<st>
<p>Lessons to be Learned from a New Domain</p>
</st>
<p>A natural question that arises in the context of our work is whether it is necessary to develop an independently annotated biomedical corpus of discourse relations, instead of using tools that have already been developed for the open domain. In this section, we present two studies showing that developing an independent domain-specific corpus is indeed beneficial. Our conclusions are consistent with <it>sublanguage theories </it>
<abbrgrp>
<abbr bid="B96">96</abbr>
<abbr bid="B97">97</abbr>
<abbr bid="B98">98</abbr>
</abbrgrp> for technical domains such as the biomedical domain.</p>
<p>First, as demonstrated in the previous section, although the BioDRB and PDTB sense classifiers perform at very similar levels of accuracy, there are class-wise differences in performance that suggest differences in the semantic usage of connectives across the two domains. To explore this further, we trained the classifier on the PDTB data and tested it on BioDRB. The accuracy of this cross-domain classifier was 54.5% and the macro-average F1 score was 0.57. The class-wise precision, recall and F1 scores reported in Table <tblr tid="T11">11</tblr> show that "Comparison" is the only sense with scores comparable to those of the within-domain classifier (see Table <tblr tid="T9">9</tblr>), with all other senses performing much worse. These results indicate that a sense classifier trained on the open-domain PDTB data does not generalize well to the biomedical domain, and that there is a clear advantage to developing an independent annotated biomedical corpus of discourse relations. Our findings here are consistent with our related work on identifying connectives in BioDRB <abbrgrp>
<abbr bid="B45">45</abbr>
</abbrgrp>, which shows that a connective identification classifier trained on PDTB does not perform well on BioDRB even with domain adaptation techniques (instance weighting, instance pruning, and feature augmentation), compared to a classifier trained on the BioDRB alone.</p>
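<p>The cross-domain setting differs from the within-domain one only in where the training and test data come from. The sketch below trains the same kind of connective lookup on one corpus and reports per-class recall on another. The helper names are hypothetical and the data are invented, solely to show how a connective whose dominant sense differs between domains lowers recall for the affected class.</p>

```python
from collections import Counter, defaultdict


def train_lookup(pairs):
    """Most frequent sense per lower-cased connective, plus a majority fallback."""
    counts = defaultdict(Counter)
    for connective, sense in pairs:
        counts[connective.lower()][sense] += 1
    table = {conn: c.most_common(1)[0][0] for conn, c in counts.items()}
    default = Counter(sense for _, sense in pairs).most_common(1)[0][0]
    return table, default


def cross_domain_recall(source_pairs, target_pairs):
    """Train on the source corpus only; report per-class recall on the target."""
    table, default = train_lookup(source_pairs)
    hits, totals = Counter(), Counter()
    for connective, sense in target_pairs:
        totals[sense] += 1
        hits[sense] += table.get(connective.lower(), default) == sense
    return {sense: hits[sense] / totals[sense] for sense in totals}


# Invented toy data: "while" is used contrastively in the source domain
# but temporally in the target domain.
source = [("while", "Comparison")] * 3 + [("and", "Expansion")] * 4
target = [("while", "Temporal"), ("and", "Expansion"), ("While", "Temporal")]
print(cross_domain_recall(source, target))
```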
<tbl id="T11"><title><p>Table 11</p></title><caption><p>Cross-domain sense classification: Class-wise Precision, Recall and F1.</p></caption><tblbdy cols="4">
      <r>
         <c ca="left">
            <p>
               <b>Class</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Precision</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Recall</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>F1</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Comparison</p>
         </c>
         <c ca="left">
            <p>0.983</p>
         </c>
         <c ca="left">
            <p>0.897</p>
         </c>
         <c ca="left">
            <p>0.938</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Contingency</p>
         </c>
         <c ca="left">
            <p>0.643</p>
         </c>
         <c ca="left">
            <p>0.732</p>
         </c>
         <c ca="left">
            <p>0.131</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Expansion</p>
         </c>
         <c ca="left">
            <p>0.347</p>
         </c>
         <c ca="left">
            <p>0.938</p>
         </c>
         <c ca="left">
            <p>0.507</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Temporal</p>
         </c>
         <c ca="left">
            <p>0.863</p>
         </c>
         <c ca="left">
            <p>0.585</p>
         </c>
         <c ca="left">
            <p>0.697</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Macro average F1 score is 0.57.</p>
   </tblfn></tbl>
<p>Second, given that texts from the biomedical literature are typically segmented into the rhetorical categories of <it>Introduction, Methods, Results and Discussion </it>(IMRAD) <abbrgrp>
<abbr bid="B99">99</abbr>
<abbr bid="B100">100</abbr>
<abbr bid="B101">101</abbr>
<abbr bid="B102">102</abbr>
</abbrgrp>, we explored whether discourse relations within each of these segments exhibit regular patterns.</p>
<p>We examined all relation types (i.e., explicit, implicit, and AltLex) that appeared in clearly indicated IMRAD segments; relations in other sections were ignored. Articles without the conventional IMRAD structure were therefore excluded entirely from our calculations, as were sections such as <it>Conclusions, Authors' Contributions</it>, and <it>Figures and Table Captions</it>. Finally, in some cases, differently named sections were treated as the same: <it>Background </it>sections were counted together with <it>Introduction</it>, and <it>Materials and Methods </it>sections were counted together with sections named <it>Methods</it>. In this way, we extracted the sense distribution for a total of 3953 explicit, implicit and AltLex relations in IMRAD segments, shown in Table <tblr tid="T12">12</tblr>.</p>
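<p>The counting behind Table 12 can be sketched as normalizing section titles, skipping non-IMRAD sections, tallying senses per section, and expressing each sense's counts as row percentages. The alias table is an assumption covering only the renamings described above plus the Abstract column of Table 12, and the function names are hypothetical. Applied to the "Alternative" row (4, 3, 7 and 15 relations, with 0 in the Abstract), the percentages match the table.</p>

```python
from collections import Counter, defaultdict

# Hypothetical alias table: section titles are lower-cased before lookup.
# Titles that are not listed (e.g. "Conclusions") are treated as non-IMRAD.
SECTION_ALIASES = {
    "abstract": "Abstract",
    "introduction": "Introduction",
    "background": "Introduction",
    "methods": "Methods",
    "materials and methods": "Methods",
    "results": "Results",
    "discussion": "Discussion",
}


def imrad_sense_table(relations):
    """relations: iterable of (section_title, type_level_sense) pairs."""
    table = defaultdict(Counter)
    for title, sense in relations:
        section = SECTION_ALIASES.get(title.strip().lower())
        if section is not None:  # relations outside IMRAD sections are skipped
            table[sense][section] += 1
    return table


def row_percentages(row):
    """Share of one sense's relations falling in each section, to one decimal."""
    total = sum(row.values())
    return {section: round(100 * count / total, 1) for section, count in row.items()}


alternative = Counter(Introduction=4, Methods=3, Results=7, Discussion=15)
print(row_percentages(alternative))
```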
<tbl id="T12"><title><p>Table 12</p></title><caption><p>Sense distributions in IMRAD segments</p></caption><tblbdy cols="7">
      <r>
         <c ca="left">
            <p>
               <b>Type-level Sense</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Introduction</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Methods</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Results</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Abstract</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Discussion</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Total</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Alternative</p>
         </c>
         <c ca="right">
            <p>4 (13.8%)</p>
         </c>
         <c ca="right">
            <p>3 (10.3%)</p>
         </c>
         <c ca="right">
            <p>7 (24.1%)</p>
         </c>
         <c ca="right">
            <p>0 (0.0%)</p>
         </c>
         <c ca="right">
            <p>15 (51.7%)</p>
         </c>
         <c ca="right">
            <p>29</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Background</p>
         </c>
         <c ca="right">
            <p>24 (19.8%)</p>
         </c>
         <c ca="right">
            <p>7 (5.8%)</p>
         </c>
         <c ca="right">
            <p>36 (29.8%)</p>
         </c>
         <c ca="right">
            <p>15 (12.4%)</p>
         </c>
         <c ca="right">
            <p>39 (32.2%)</p>
         </c>
         <c ca="right">
            <p>121</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Cause</p>
         </c>
         <c ca="right">
            <p>80 (17.0%)</p>
         </c>
         <c ca="right">
            <p>16 (3.4%)</p>
         </c>
         <c ca="right">
            <p>134 (28.5%)</p>
         </c>
         <c ca="right">
            <p>33 (7.0%)</p>
         </c>
         <c ca="right">
            <p>208 (44.2%)</p>
         </c>
         <c ca="right">
            <p>471</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Circumstance</p>
         </c>
         <c ca="right">
            <p>11 (7.1%)</p>
         </c>
         <c ca="right">
            <p>7 (4.5%)</p>
         </c>
         <c ca="right">
            <p>112 (71.8%)</p>
         </c>
         <c ca="right">
            <p>13 (8.3%)</p>
         </c>
         <c ca="right">
            <p>13 (8.3%)</p>
         </c>
         <c ca="right">
            <p>156</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Concession</p>
         </c>
         <c ca="right">
            <p>59 (21.7%)</p>
         </c>
         <c ca="right">
            <p>3 (1.1%)</p>
         </c>
         <c ca="right">
            <p>73 (26.8%)</p>
         </c>
         <c ca="right">
            <p>21 (7.7%)</p>
         </c>
         <c ca="right">
            <p>116 (42.6%)</p>
         </c>
         <c ca="right">
            <p>272</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Condition</p>
         </c>
         <c ca="right">
            <p>1 (5.3%)</p>
         </c>
         <c ca="right">
            <p>6 (31.6%)</p>
         </c>
         <c ca="right">
            <p>0 (0.0%)</p>
         </c>
         <c ca="right">
            <p>1 (5.3%)</p>
         </c>
         <c ca="right">
            <p>11 (57.9%)</p>
         </c>
         <c ca="right">
            <p>19</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Conjunction</p>
         </c>
         <c ca="right">
            <p>105 (13.9%)</p>
         </c>
         <c ca="right">
            <p>100 (13.3%)</p>
         </c>
         <c ca="right">
            <p>271 (35.9%)</p>
         </c>
         <c ca="right">
            <p>78 (10.3%)</p>
         </c>
         <c ca="right">
            <p>195 (25.9%)</p>
         </c>
         <c ca="right">
            <p>754</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Continuation</p>
         </c>
         <c ca="right">
            <p>80 (19.3%)</p>
         </c>
         <c ca="right">
            <p>121 (29.2%)</p>
         </c>
         <c ca="right">
            <p>112 (27.0%)</p>
         </c>
         <c ca="right">
            <p>17 (4.1%)</p>
         </c>
         <c ca="right">
            <p>85 (20.5%)</p>
         </c>
         <c ca="right">
            <p>415</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Contrast</p>
         </c>
         <c ca="right">
            <p>26 (10.6%)</p>
         </c>
         <c ca="right">
            <p>9 (3.7%)</p>
         </c>
         <c ca="right">
            <p>118 (48.0%)</p>
         </c>
         <c ca="right">
            <p>12 (4.9%)</p>
         </c>
         <c ca="right">
            <p>81 (32.9%)</p>
         </c>
         <c ca="right">
            <p>246</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Exception</p>
         </c>
         <c ca="right">
            <p>1 (16.7%)</p>
         </c>
         <c ca="right">
            <p>2 (33.3%)</p>
         </c>
         <c ca="right">
            <p>2 (33.3%)</p>
         </c>
         <c ca="right">
            <p>0 (0.0%)</p>
         </c>
         <c ca="right">
            <p>1 (16.7%)</p>
         </c>
         <c ca="right">
            <p>6</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Instantiation</p>
         </c>
         <c ca="right">
            <p>17 (23.9%)</p>
         </c>
         <c ca="right">
            <p>0 (0.0%)</p>
         </c>
         <c ca="right">
            <p>9 (12.7%)</p>
         </c>
         <c ca="right">
            <p>3 (4.2%)</p>
         </c>
         <c ca="right">
            <p>42 (59.2%)</p>
         </c>
         <c ca="right">
            <p>71</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Purpose</p>
         </c>
         <c ca="right">
            <p>93 (20.2%)</p>
         </c>
         <c ca="right">
            <p>84 (18.3%)</p>
         </c>
         <c ca="right">
            <p>144 (31.3%)</p>
         </c>
         <c ca="right">
            <p>35 (7.6%)</p>
         </c>
         <c ca="right">
            <p>104 (22.6%)</p>
         </c>
         <c ca="right">
            <p>460</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Reinforcement</p>
         </c>
         <c ca="right">
            <p>14 (16.5%)</p>
         </c>
         <c ca="right">
            <p>3 (3.5%)</p>
         </c>
         <c ca="right">
            <p>14 (16.5%)</p>
         </c>
         <c ca="right">
            <p>4 (4.7%)</p>
         </c>
         <c ca="right">
            <p>50 (58.8%)</p>
         </c>
         <c ca="right">
            <p>85</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Restatement</p>
         </c>
         <c ca="right">
            <p>63 (19.2%)</p>
         </c>
         <c ca="right">
            <p>47 (14.3%)</p>
         </c>
         <c ca="right">
            <p>124 (37.8%)</p>
         </c>
         <c ca="right">
            <p>29 (8.8%)</p>
         </c>
         <c ca="right">
            <p>65 (19.8%)</p>
         </c>
         <c ca="right">
            <p>328</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Similarity</p>
         </c>
         <c ca="right">
            <p>0 (0.0%)</p>
         </c>
         <c ca="right">
            <p>0 (0.0%)</p>
         </c>
         <c ca="right">
            <p>2 (40.0%)</p>
         </c>
         <c ca="right">
            <p>0 (0.0%)</p>
         </c>
         <c ca="right">
            <p>3 (60.0%)</p>
         </c>
         <c ca="right">
            <p>5</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Temporal</p>
         </c>
         <c ca="right">
            <p>41 (8.0%)</p>
         </c>
         <c ca="right">
            <p>259 (50.3%)</p>
         </c>
         <c ca="right">
            <p>141 (27.4%)</p>
         </c>
         <c ca="right">
            <p>22 (4.3%)</p>
         </c>
         <c ca="right">
            <p>52 (10.1%)</p>
         </c>
         <c ca="right">
            <p>515</p>
         </c>
      </r>
   </tblbdy></tbl>
<p>It is revealing that the <it>Methods </it>segments contain "Temporal" relations more frequently than the other segments, since these segments describe the successive steps of the experiments that were conducted. The <it>Methods </it>segments also contain very few "Concession" relations, suggesting that these sections involve little reasoning or argumentation. Indeed, "Contrast" and "Concession" relations are found more frequently in the <it>Results </it>and <it>Discussion </it>segments, where comparisons are made with related work and arguments are made about the presented work. Also frequent in the <it>Discussion </it>section are "Causal", "Instantiation", and "Reinforcement" relations, since authors give justifications, reasons, and, in general, reinforcing arguments for their experiments and conclusions. There is a high proportion of "Circumstance" relations in the <it>Results </it>section, where the outcomes of experiments are presented. Curiously, "Background" relations are not more frequent in the <it>Abstract </it>and <it>Introduction </it>sections, as one would expect, but rather in the <it>Results </it>and <it>Discussion </it>sections. Overall, the distribution of senses across the IMRAD segments shows several useful patterns, suggesting that biomedical literature has a highly domain-specific distribution of relations that text-mining applications can exploit. In future work, we plan to explore the feasibility of using the IMRAD segment type as a feature for classifying the senses of explicit connectives.</p>
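<p>Each percentage in the table is a row proportion: the count of a sense in one segment divided by the total count of that sense across all segments. A minimal Python sketch of this normalization, together with a check that the segment counts add up to the reported row total, is shown below for the "Contrast" and "Instantiation" rows; the function names are illustrative only and are not part of any BioDRB tooling.</p>

```python
def row_percentages(counts):
    """Express each segment count as a percentage of the row total (1 decimal)."""
    total = sum(counts)
    return [round(100.0 * count / total, 1) for count in counts]


def row_is_consistent(counts, reported_total):
    """Check that the per-segment counts add up to the reported row total."""
    return sum(counts) == reported_total


# Per-segment counts copied from the table, in the table's column order.
contrast = [26, 9, 118, 12, 81]
instantiation = [17, 0, 9, 3, 42]

print(row_percentages(contrast))         # [10.6, 3.7, 48.0, 4.9, 32.9]
print(row_is_consistent(contrast, 246))  # True
print(row_percentages(instantiation))    # [23.9, 0.0, 12.7, 4.2, 59.2]
```

<p>A check of this kind is a quick way to catch transcription errors when a table is copied or re-typeset.</p>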
</sec>
</sec>
<sec>
<st>
<p>Conclusion</p>
</st>
<p>We have developed the Biomedical Discourse Relation Bank (BioDRB), which contains discourse-level annotations of explicit and implicit discourse relations, their abstract object arguments, and their senses. We adopted the Penn Discourse Treebank (PDTB) as the underlying discourse annotation framework because of its theory-neutral and lexically grounded approach, and we successfully adapted the PDTB annotation guidelines to biomedical text, while introducing some features specific to, and necessary for, the biomedical domain. We have also carried out experiments on sense detection of explicit connectives. Our results show that, as in the open domain, using the connective as the only feature for classification yields a very high baseline. At the same time, there are significant differences in the semantic usage of connectives across the two domains: a sense classifier trained on the PDTB data does not generalize to the BioDRB. Together with similar results that we have obtained in related work on identifying explicit connectives, this leads us to conclude that it is beneficial to take a "sublanguage" approach to discourse processing of biomedical literature and to develop an independent biomedical corpus of discourse annotations. Finally, we have found that while the BioDRB corpus is large enough for coarse-sense classification, more refined sense classification may require more training data, although future research should first explore the use of richer features. One such feature may be the IMRAD segment type, whose sense distributions show some useful patterns.</p>
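<p>With the connective as its only feature, a sense classifier can essentially do no better than assign each connective the sense it most often carries in the training data. The Python sketch below illustrates such a majority-sense baseline and, as one possible way of adding the IMRAD segment type as a feature, backs off from the (connective, segment) pair to the connective alone. The back-off scheme, the function names and the toy examples are illustrative assumptions, not the classifier used in the experiments reported here.</p>

```python
from collections import Counter, defaultdict


def majority(table):
    """Map each key to the most frequent sense observed for it."""
    return {key: counter.most_common(1)[0][0] for key, counter in table.items()}


def train_sense_baseline(examples):
    """Train two majority-sense lookup tables.

    examples: iterable of (connective, segment, sense) triples, where segment
    is an IMRAD label such as "Methods". Returns a pair of dicts: one keyed on
    the lower-cased connective alone, one keyed on (connective, segment).
    """
    by_conn = defaultdict(Counter)
    by_conn_seg = defaultdict(Counter)
    for connective, segment, sense in examples:
        conn = connective.lower()
        by_conn[conn][sense] += 1
        by_conn_seg[(conn, segment)][sense] += 1
    return majority(by_conn), majority(by_conn_seg)


def predict_sense(models, connective, segment=None, default=None):
    """Predict a sense, backing off from (connective, segment) to the connective.

    With segment=None this is the connective-only baseline; connectives
    never seen in training return `default`.
    """
    by_conn, by_conn_seg = models
    conn = connective.lower()
    if segment is not None and (conn, segment) in by_conn_seg:
        return by_conn_seg[(conn, segment)]
    return by_conn.get(conn, default)


# Hypothetical toy data for illustration only; not drawn from the BioDRB.
train_data = [
    ("while", "Methods", "Temporal"),
    ("while", "Methods", "Temporal"),
    ("while", "Results", "Contrast"),
    ("while", "Discussion", "Contrast"),
    ("while", "Discussion", "Contrast"),
    ("however", "Discussion", "Contrast"),
]
models = train_sense_baseline(train_data)
print(predict_sense(models, "While"))              # Contrast (connective only)
print(predict_sense(models, "while", "Methods"))   # Temporal (segment-specific)
print(predict_sense(models, "while", "Abstract"))  # Contrast (backs off)
```

<p>Keying on (connective, segment) splits the training counts more finely, so the back-off matters most for sparse combinations; this is also why adding features tends to increase the amount of training data needed.</p>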
</sec>
<sec>
<st>
<p>Availability and Requirements</p>
</st>
<p>
<b>Project name: </b>Biomedical Discourse Relation Bank Project</p>
<p>
<b>Project home page: </b>
<url>http://www.biodiscourserelation.org</url>
</p>
<p>
<b>Operating system(s): </b>Platform independent</p>
<p>
<b>Programming language: </b>None</p>
<p>
<b>Other requirements: </b>Java 1.5 or higher (for annotation tools)</p>
<p>
<b>License: </b>None</p>
<p>
<b>Any restrictions to use by non-academics: </b>None</p>
</sec>
<sec>
<st>
<p>Authors' contributions</p>
</st>
<p>RP designed and directed the development of the BioDRB corpus, carried out all experiments, and drafted the manuscript. SM contributed to the development of the annotation guidelines and provided critical intellectual content for revisions of the draft. NF participated in the pilot annotation study and contributed to the development of the annotation guidelines. AJ contributed to the comparative studies in this work. HY conceived of the study and participated in its design and coordination. All authors have read and approved the final manuscript.</p>
</sec>
</bdy><bm>
<ack>
<sec>
<st>
<p>Acknowledgements</p>
</st>
<p>This work was partially supported by a seed grant from University of Wisconsin-Milwaukee Graduate School to Hong Yu, and NSF grant IIS-07-05671 (PIs: Aravind Joshi, Rashmi Prasad). We thank Geraud Campion for tool support. We are grateful to the anonymous reviewers for their helpful and insightful comments.</p>
</sec>
</ack>
<refgrp><bibl id="B1"><title><p>Literature mining for the biologist: from information retrieval to biological discovery</p></title><aug><au><snm>Jensen</snm><fnm>L</fnm></au><au><snm>Saric</snm><fnm>J</fnm></au><au><snm>Bork</snm><fnm>P</fnm></au></aug><source>Nature Reviews Genetics</source><pubdate>2006</pubdate><volume>7</volume><fpage>119</fpage><lpage>129</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nrg1768</pubid><pubid idtype="pmpid" link="fulltext">16418747</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>Text-mining and information-retrieval services for molecular biology</p></title><aug><au><snm>Krallinger</snm><fnm>M</fnm></au><au><snm>Valencia</snm><fnm>A</fnm></au></aug><source>Genome Biol</source><pubdate>2005</pubdate><volume>6</volume><fpage>224</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2005-6-7-224</pubid><pubid idtype="pmcid">1175978</pubid><pubid idtype="pmpid" link="fulltext">15998455</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>Mining the biomedical literature in the genomic era: an overview</p></title><aug><au><snm>Shatkay</snm><fnm>H</fnm></au><au><snm>Feldman</snm><fnm>R</fnm></au></aug><source>J Comput Biol</source><pubdate>2003</pubdate><volume>10</volume><fpage>821</fpage><lpage>855</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1089/106652703322756104</pubid><pubid idtype="pmpid" link="fulltext">14980013</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>Frontiers of biomedical text mining: current progress</p></title><aug><au><snm>Zweigenbaum</snm><fnm>P</fnm></au><au><snm>Demner-Fushman</snm><fnm>D</fnm></au><au><snm>Yu</snm><fnm>H</fnm></au><au><snm>Cohen</snm><fnm>KB</fnm></au></aug><source>Briefings in Bioinformatics</source><pubdate>2007</pubdate><volume>8</volume><fpage>358</fpage><lpage>375</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bib/bbm045</pubid><pubid idtype="pmcid">2516302</pubid><pubid idtype="pmpid" 
link="fulltext">17977867</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>Toward information extraction: identifying protein names from biological papers</p></title><aug><au><snm>Fukuda</snm><fnm>K</fnm></au><au><snm>Tamura</snm><fnm>A</fnm></au><au><snm>Tsunoda</snm><fnm>T</fnm></au><au><snm>Takagi</snm><fnm>T</fnm></au></aug><source>Proceedings of the Pacific Symposium on Biocomputing</source><pubdate>1998</pubdate><fpage>707</fpage><lpage>718</lpage></bibl><bibl id="B6"><title><p>Identifying gene and protein mentions in text using conditional random fields</p></title><aug><au><snm>McDonald</snm><fnm>R</fnm></au><au><snm>Pereira</snm><fnm>F</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2005</pubdate><volume>6</volume><issue>Suppl 1</issue><fpage>S6</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-6-S1-S6</pubid><pubid idtype="pmcid">1866379</pubid><pubid idtype="pmpid" link="fulltext">16351755</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>Recognizing Biomedical Named Entities Using Skip-Chain Conditional Random Fields</p></title><aug><au><snm>Liu</snm><fnm>J</fnm></au><au><snm>Huang</snm><fnm>M</fnm></au><au><snm>Zhu</snm><fnm>X</fnm></au></aug><source>Proceedings of the Workshop on Biomedical Natural Language Processing, Uppsala, Sweden</source><pubdate>2010</pubdate><fpage>10</fpage><lpage>18</lpage></bibl><bibl id="B8"><title><p>GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles</p></title><aug><au><snm>Friedman</snm><fnm>C</fnm></au><au><snm>Kra</snm><fnm>P</fnm></au><au><snm>Yu</snm><fnm>H</fnm></au><au><snm>Krauthammer</snm><fnm>M</fnm></au><au><snm>Rzhetsky</snm><fnm>A</fnm></au></aug><source>Bioinformatics</source><pubdate>2001</pubdate><volume>17</volume><issue>Suppl 1</issue><fpage>S74</fpage><lpage>82</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/17.suppl_1.S74</pubid><pubid idtype="pmpid" 
link="fulltext">11472995</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>Lancet: a high precision medication event extraction system for clinical text</p></title><aug><au><snm>Li</snm><fnm>Z</fnm></au><au><snm>Liu</snm><fnm>F</fnm></au><au><snm>Antieau</snm><fnm>L</fnm></au><au><snm>Yu</snm><fnm>H</fnm></au></aug><source>Journal of the American Medical Informatics Association (JAMIA)</source><pubdate>2010</pubdate><volume>17</volume><issue>5</issue><fpage>563</fpage><lpage>567</lpage><xrefbib><pubid idtype="doi">10.1136/jamia.2010.004077</pubid></xrefbib></bibl><bibl id="B10"><title><p>A thematic analysis of the AIDS literature</p></title><aug><au><snm>Wilbur</snm><fnm>WJ</fnm></au></aug><source>Proceedings of Pacific Symposium on Biocomputing</source><pubdate>2002</pubdate><fpage>386</fpage><lpage>397</lpage></bibl><bibl id="B11"><title><p>An IR-aided machine learning framework for the BioCreative II.5 Challenge</p></title><aug><au><snm>Cao</snm><fnm>Y</fnm></au><au><snm>Li</snm><fnm>Z</fnm></au><au><snm>Liu</snm><fnm>F</fnm></au><au><snm>Agarwal</snm><fnm>S</fnm></au><au><snm>Zhang</snm><fnm>Q</fnm></au><au><snm>Yu</snm><fnm>H</fnm></au></aug><source>IEEE/ACM Transactions on Computational Biology and Bioinformatics</source><pubdate>2010</pubdate><volume>7</volume><issue>3</issue><fpage>454</fpage><lpage>461</lpage></bibl><bibl id="B12"><title><p>Mining MEDLINE for implicit links between dietary substances and diseases</p></title><aug><au><snm>Srinivasan</snm><fnm>P</fnm></au><au><snm>Libbus</snm><fnm>B</fnm></au></aug><source>Bioinformatics</source><pubdate>2004</pubdate><volume>20</volume><issue>Suppl 1</issue><fpage>I290</fpage><lpage>I296</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bth914</pubid><pubid idtype="pmpid" link="fulltext">15262811</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>Automatically generating gene summaries from biomedical 
literature</p></title><aug><au><snm>Ling</snm><fnm>X</fnm></au><au><snm>Jiang</snm><fnm>J</fnm></au><au><snm>He</snm><fnm>X</fnm></au><au><snm>Mei</snm><fnm>Q</fnm></au><au><snm>Zhai</snm><fnm>C</fnm></au><au><snm>Schatz</snm><fnm>B</fnm></au></aug><source>Proceedings of the Pacific Symposium on Biocomputing, Maui, Hawaii</source><pubdate>2006</pubdate><fpage>40</fpage><lpage>51</lpage></bibl><bibl id="B14"><title><p>FigSum: automatically generating structured text summaries for figures in biomedical literature</p></title><aug><au><snm>Agarwal</snm><fnm>S</fnm></au><au><snm>Yu</snm><fnm>H</fnm></au></aug><source>Proceedings of the 2009 AMIA Annual Symposium, San Francisco, CA</source><pubdate>2009</pubdate><fpage>6</fpage><lpage>10</lpage></bibl><bibl id="B15"><title><p>Ontology-Based Extraction and Summarization of Protein Mutation Impact Information</p></title><aug><au><snm>Naderi</snm><fnm>N</fnm></au><au><snm>Witte</snm><fnm>R</fnm></au></aug><source>Proceedings of the ACL Workshop on Biomedical Natural Language Processing, Uppsala, Sweden</source><pubdate>2010</pubdate><fpage>128</fpage><lpage>129</lpage></bibl><bibl id="B16"><title><p>Improving Summarization of Biomedical Documents Using Word Sense Disambiguation</p></title><aug><au><snm>Plaza</snm><fnm>L</fnm></au><au><snm>Stevenson</snm><fnm>M</fnm></au><au><snm>Diaz</snm><fnm>A</fnm></au></aug><source>Proceedings of the ACL Workshop on Biomedical Natural Language Processing, Uppsala, Sweden</source><pubdate>2010</pubdate><fpage>55</fpage><lpage>63</lpage></bibl><bibl id="B17"><title><p>Automated image analysis of protein localization in budding yeast</p></title><aug><au><snm>Chen</snm><fnm>SC</fnm></au><au><snm>Zhao</snm><fnm>T</fnm></au><au><snm>Gordon</snm><fnm>GJ</fnm></au><au><snm>Murphy</snm><fnm>RF</fnm></au></aug><source>Bioinformatics</source><pubdate>2007</pubdate><volume>23</volume><issue>13</issue><fpage>i66</fpage><lpage>171</lpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1093/bioinformatics/btm206</pubid><pubid idtype="pmpid" link="fulltext">17646347</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>Integrating image data into biomedical text categorization</p></title><aug><au><snm>Shatkay</snm><fnm>H</fnm></au><au><snm>Chen</snm><fnm>N</fnm></au><au><snm>Blostein</snm><fnm>D</fnm></au></aug><source>Bioinformatics</source><pubdate>2006</pubdate><volume>22</volume><fpage>e446</fpage><lpage>453</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btl235</pubid><pubid idtype="pmpid" link="fulltext">16873506</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>Accessing bioscience images from abstract sentences</p></title><aug><au><snm>Yu</snm><fnm>H</fnm></au><au><snm>Lee</snm><fnm>M</fnm></au></aug><source>Bioinformatics</source><pubdate>2006</pubdate><volume>22</volume><fpage>e547</fpage><lpage>556</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btl261</pubid><pubid idtype="pmpid" link="fulltext">16873519</pubid></pubidlist></xrefbib></bibl><bibl id="B20"><title><p>Are figure legends sufficient? 
Evaluating the contribution of associated text to biomedical figure comprehension</p></title><aug><au><snm>Yu</snm><fnm>H</fnm></au><au><snm>Agarwal</snm><fnm>S</fnm></au><au><snm>Johnston</snm><fnm>M</fnm></au><au><snm>Cohen</snm><fnm>A</fnm></au></aug><source>Journal of Biomedical Discovery and Collaboration</source><pubdate>2009</pubdate><volume>4</volume><fpage>1</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1747-5333-4-1</pubid><pubid idtype="pmcid">2631451</pubid><pubid idtype="pmpid" link="fulltext">19126221</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>Automatic Figure Ranking and User Interfacing for Intelligent Figure Search</p></title><aug><au><snm>Yu</snm><fnm>H</fnm></au><au><snm>Liu</snm><fnm>F</fnm></au><au><snm>Ramesh</snm><fnm>BP</fnm></au></aug><source>PLoS ONE</source><pubdate>2010</pubdate><volume>5</volume><issue>10</issue><fpage>e12983</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pone.0012983</pubid><pubid idtype="pmcid">2951344</pubid><pubid idtype="pmpid" link="fulltext">20949102</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>Development, implementation, and a cognitive evaluation of a definitional question answering system for physicians</p></title><aug><au><snm>Yu</snm><fnm>H</fnm></au><au><snm>Lee</snm><fnm>M</fnm></au><au><snm>Kaufman</snm><fnm>D</fnm></au><au><snm>Ely</snm><fnm>J</fnm></au><au><snm>Oshero</snm><fnm>JA</fnm></au><au><snm>Hripcsak</snm><fnm>G</fnm></au><au><snm>Cimino</snm><fnm>J</fnm></au></aug><source>Journal of Biomedical Informatics</source><pubdate>2007</pubdate><volume>40</volume><fpage>236</fpage><lpage>251</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.jbi.2007.03.002</pubid><pubid idtype="pmpid" link="fulltext">17462961</pubid></pubidlist></xrefbib></bibl><bibl id="B23"><title><p>Automatically extracting information needs from complex clinical 
questions</p></title><aug><au><snm>Cao</snm><fnm>YG</fnm></au><au><snm>Cimino</snm><fnm>JJ</fnm></au><au><snm>Ely</snm><fnm>J</fnm></au><au><snm>Yu</snm><fnm>H</fnm></au></aug><source>Journal of Biomedical Informatics</source><pubdate>2010</pubdate><volume>43</volume><fpage>962</fpage><lpage>971</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.jbi.2010.07.007</pubid><pubid idtype="pmpid" link="fulltext">20670693</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>Automated classification of citations using linguistic semantic grammars</p></title><aug><au><snm>Garzone</snm><fnm>M</fnm></au></aug><source>PhD thesis</source><publisher>The University of Western Ontario, Ontario, Canada</publisher><pubdate>1996</pubdate></bibl><bibl id="B25"><title><p>Towards an automated citation classifier</p></title><aug><au><snm>Garzone</snm><fnm>M</fnm></au><au><snm>Mercer</snm><fnm>R</fnm></au></aug><source>Proceedings of the 13th Biennial Conference of the Canadian Society for Computational Studies of Intelligence</source><pubdate>2000</pubdate><fpage>337</fpage><lpage>346</lpage></bibl><bibl id="B26"><title><p>Toward a catalogue of citation-related rhetorical cues in scientific texts</p></title><aug><au><snm>DiMarco</snm><fnm>C</fnm></au><au><snm>Mercer</snm><fnm>R</fnm></au></aug><source>Proceedings of Pacific Association for Computational Linguistics (PACLING 2003), Halifax, Canada</source><pubdate>2003</pubdate></bibl><bibl id="B27"><title><p>The language of bioscience: fact, speculations, and statements in between</p></title><aug><au><snm>Light</snm><fnm>M</fnm></au><au><snm>Qiu</snm><fnm>X</fnm></au><au><snm>Srinivasan</snm><fnm>P</fnm></au></aug><source>Proceedings of the HLT-NAACL 2004 Workshop: BioLINK, Linking Biological Literature, Ontologies and Databases, Boston, MA</source><pubdate>2004</pubdate><fpage>17</fpage><lpage>24</lpage></bibl><bibl id="B28"><title><p>Multi-Dimensional Classification Of Biomedical Text: Toward Automated, Practical Provision of 
High-Utility Text to Diverse Users</p></title><aug><au><snm>Shatkay</snm><fnm>H</fnm></au><au><snm>Pan</snm><fnm>F</fnm></au><au><snm>Rzhetsky</snm><fnm>A</fnm></au><au><snm>Wilbur</snm><fnm>WJ</fnm></au></aug><source>Bioinformatics</source><pubdate>2008</pubdate><volume>24</volume><issue>18</issue><fpage>2086</fpage><lpage>2093</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btn381</pubid><pubid idtype="pmcid">2530883</pubid><pubid idtype="pmpid" link="fulltext">18718948</pubid></pubidlist></xrefbib></bibl><bibl id="B29"><title><p>New directions in biomedical text annotation: definitions, guidelines and corpus construction</p></title><aug><au><snm>Wilbur</snm><fnm>WJ</fnm></au><au><snm>Rzhetsky</snm><fnm>A</fnm></au><au><snm>Shatkay</snm><fnm>H</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2006</pubdate><volume>7</volume><fpage>356</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-7-356</pubid><pubid idtype="pmcid">1559725</pubid><pubid idtype="pmpid" link="fulltext">16867190</pubid></pubidlist></xrefbib></bibl><bibl id="B30"><title><p>A baseline feature set for learning rhetorical zones using full articles in the biomedical domain</p></title><aug><au><snm>Mullen</snm><fnm>T</fnm></au><au><snm>Mizuta</snm><fnm>Y</fnm></au><au><snm>Collier</snm><fnm>N</fnm></au></aug><source>ACM SIGKDD Explorations Newsletter</source><pubdate>2005</pubdate><volume>7</volume><fpage>52</fpage><lpage>58</lpage><xrefbib><pubid idtype="doi">10.1145/1089815.1089823</pubid></xrefbib></bibl><bibl id="B31"><title><p>Automatically classifying sentences in full-text biomedical articles into Introduction, Methods, Results and Discussion</p></title><aug><au><snm>Agarwal</snm><fnm>S</fnm></au><au><snm>Yu</snm><fnm>H</fnm></au></aug><source>Bioinformatics</source><pubdate>2009</pubdate><volume>25</volume><issue>23</issue><fpage>3174</fpage><lpage>3180</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btp548</pubid><pubid 
idtype="pmcid">2913661</pubid><pubid idtype="pmpid" link="fulltext">19783830</pubid></pubidlist></xrefbib></bibl><bibl id="B32"><title><p>Merging corpus linguistic and discourse analytic research goals: Discourse units in biology research articles</p></title><aug><au><snm>Biber</snm><fnm>D</fnm></au><au><snm>Jones</snm><fnm>JK</fnm></au></aug><source>Corpus Linguistics and Linguistic Theory</source><pubdate>2005</pubdate><volume>1</volume><issue>2</issue><fpage>151</fpage><lpage>182</lpage></bibl><bibl id="B33"><title><p>TextTiling: Segmenting text into multi-paragraph subtopic passages</p></title><aug><au><snm>Hearst</snm><fnm>MA</fnm></au></aug><source>Computational Linguistics</source><pubdate>1997</pubdate><volume>23</volume><fpage>33</fpage><lpage>64</lpage></bibl><bibl id="B34"><title><p>BioContrasts: extracting and exploiting protein-protein contrastive relations from biomedical literature</p></title><aug><au><snm>jae Kim</snm><fnm>J</fnm></au><au><snm>Zhang</snm><fnm>Z</fnm></au><au><snm>Park</snm><fnm>JC</fnm></au><au><snm>Ng</snm><fnm>SK</fnm></au></aug><source>Bioinformatics</source><pubdate>2006</pubdate><volume>22</volume><issue>5</issue><fpage>597</fpage><lpage>605</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btk016</pubid><pubid idtype="pmpid" link="fulltext">16368768</pubid></pubidlist></xrefbib></bibl><bibl id="B35"><title><p>Anaphora resolution in biomedical literature</p></title><aug><au><snm>Castano</snm><fnm>J</fnm></au><au><snm>Zhang</snm><fnm>J</fnm></au><au><snm>Pustejovsky</snm><fnm>J</fnm></au></aug><source>International Symposium on Reference Resolution</source><pubdate>2002</pubdate></bibl><bibl id="B36"><title><p>The BioScope corpus: annotation for negation, uncertainty and their scope in biomedical texts</p></title><aug><au><snm>Szarvas</snm><fnm>G</fnm></au><au><snm>Vincze</snm><fnm>V</fnm></au><au><snm>Farkas</snm><fnm>R</fnm></au><au><snm>Csirik</snm><fnm>J</fnm></au></aug><source>Proceedings of BioNLP 2008: 
Current Trends in Biomedical Natural Language Processing, Columbus, Ohio</source><pubdate>2008</pubdate><fpage>38</fpage><lpage>45</lpage></bibl><bibl id="B37"><title><p>Detecting Hedge Cues and their Scope in Biomedical Literature with Conditional Random Fields</p></title><aug><au><snm>Agarwal</snm><fnm>S</fnm></au><au><snm>Yu</snm><fnm>H</fnm></au></aug><source>Journal of Biomedical Informatics</source><pubdate>2010</pubdate><volume>43</volume><issue>6</issue><fpage>953</fpage><lpage>961</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.jbi.2010.08.003</pubid><pubid idtype="pmpid" link="fulltext">20709188</pubid></pubidlist></xrefbib></bibl><bibl id="B38"><title><p>Biomedical Negation Scope Detection with Conditional Random Fields</p></title><aug><au><snm>Agarwal</snm><fnm>S</fnm></au><au><snm>Yu</snm><fnm>H</fnm></au></aug><source>Journal of the American Medical Informatics Association (JAMIA)</source><pubdate>2010</pubdate><volume>17</volume><fpage>696</fpage><lpage>701</lpage><xrefbib><pubid idtype="doi">10.1136/jamia.2010.003228</pubid></xrefbib></bibl><bibl id="B39"><title><p>GENIA corpus - semantically annotated corpus for bio-textmining</p></title><aug><au><snm>Kim</snm><fnm>J</fnm></au><au><snm>Ohta</snm><fnm>T</fnm></au><au><snm>Tateisi</snm><fnm>Y</fnm></au><au><snm>Tsujii</snm><fnm>J</fnm></au></aug><source>Bioinformatics</source><pubdate>2003</pubdate><volume>19</volume><issue>Suppl 1</issue><fpage>i180</fpage><lpage>182</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btg1023</pubid><pubid idtype="pmpid" link="fulltext">12855455</pubid></pubidlist></xrefbib></bibl><bibl id="B40"><title><p>Guidelines for the annotation of General Scientific Concepts</p></title><aug><au><snm>Liakata</snm><fnm>M</fnm></au><au><snm>Soldatova</snm><fnm>L</fnm></au></aug><pubdate>2008</pubdate><url>http://ie-repository.jisc.ac.uk</url><note>[JISC Project Report]</note></bibl><bibl id="B41"><title><p>Semantic Annotation of Papers: Interface 
&amp; Enrichment Tool (SAPIENT)</p></title><aug><au><snm>Liakata</snm><fnm>M</fnm></au><au><snm>Q</snm><fnm>C</fnm></au><au><snm>Soldatova</snm><fnm>LN</fnm></au></aug><source>Proceedings of the BioNLP 2009 Workshop, Boulder, Colorado: Association for Computational Linguistics</source><pubdate>2009</pubdate><fpage>193</fpage><lpage>200</lpage><url>http://www.aclweb.org/anthology/W09-1325</url></bibl><bibl id="B42"><title><p>Processing of Notch and amyloid precursor protein by gamma-secretase is spatially distinct</p></title><aug><au><snm>Tarassishin</snm><fnm>L</fnm></au><au><snm>Yin</snm><fnm>YI</fnm></au><au><snm>Bassit</snm><fnm>B</fnm></au><au><snm>Li</snm><fnm>YM</fnm></au></aug><source>Proceedings of the National Academy of Sciences USA</source><pubdate>2004</pubdate><volume>101</volume><issue>49</issue><fpage>17050</fpage><lpage>17055</lpage><xrefbib><pubid idtype="doi">10.1073/pnas.0408007101</pubid></xrefbib></bibl><bibl id="B43"><title><p>Characterization of otoconin-95, the major protein of murine otoconia, provides insights into the formation of these inner ear biominerals</p></title><aug><au><snm>Verpy</snm><fnm>E</fnm></au><au><snm>Leibovici</snm><fnm>M</fnm></au><au><snm>Petit</snm><fnm>C</fnm></au></aug><source>Proceedings of the National Academy of Sciences USA</source><pubdate>1999</pubdate><volume>96</volume><issue>2</issue><fpage>529</fpage><lpage>534</lpage><xrefbib><pubid idtype="doi">10.1073/pnas.96.2.529</pubid></xrefbib></bibl><bibl id="B44"><title><p>Using Syntax to Disambiguate Explicit Discourse Connectives in Text</p></title><aug><au><snm>Pitler</snm><fnm>E</fnm></au><au><snm>Nenkova</snm><fnm>A</fnm></au></aug><source>Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP (ACL-IJCNLP 2009: Short Papers), Suntec, Singapore</source><pubdate>2009</pubdate><fpage>13</fpage><lpage>16</lpage></bibl><bibl id="B45"><title><p>Identifying Discourse Connectives in Biomedical 
Text</p></title><aug><au><snm>Ramesh</snm><fnm>BP</fnm></au><au><snm>Yu</snm><fnm>H</fnm></au></aug><source>Proceedings of the AMIA 2010 Symposium, Washington, D.C.</source><pubdate>2010</pubdate><fpage>657</fpage><lpage>661</lpage></bibl><bibl id="B46"><title><p>Attribution and the (Non)-Alignment of Syntactic and Discourse Arguments of Connectives</p></title><aug><au><snm>Dinesh</snm><fnm>N</fnm></au><au><snm>Lee</snm><fnm>A</fnm></au><au><snm>Miltsakaki</snm><fnm>E</fnm></au><au><snm>Prasad</snm><fnm>R</fnm></au><au><snm>Joshi</snm><fnm>A</fnm></au><au><snm>Webber</snm><fnm>B</fnm></au></aug><source>Proceedings of the ACL Workshop on Frontiers in Corpus Annotation II: Pie in the Sky, Ann Arbor, MI</source><pubdate>2005</pubdate><fpage>29</fpage><lpage>36</lpage></bibl><bibl id="B47"><title><p>Automatically Identifying the Arguments of Discourse Connectives</p></title><aug><au><snm>Wellner</snm><fnm>B</fnm></au><au><snm>Pustejovsky</snm><fnm>J</fnm></au></aug><source>Proceedings of EMNLP-CoNLL, Prague, Czech Republic</source><pubdate>2007</pubdate><fpage>92</fpage><lpage>101</lpage></bibl><bibl id="B48"><title><p>Discourse connective argument identification with connective specific rankers</p></title><aug><au><snm>Elwell</snm><fnm>R</fnm></au><au><snm>Baldridge</snm><fnm>J</fnm></au></aug><source>Proceedings of the IEEE International Conference on Semantic Computing (ICSC), Santa Clara, CA</source><pubdate>2008</pubdate><fpage>198</fpage><lpage>205</lpage></bibl><bibl id="B49"><title><p>The Penn Discourse TreeBank 2.0</p></title><aug><au><snm>Prasad</snm><fnm>R</fnm></au><au><snm>Dinesh</snm><fnm>N</fnm></au><au><snm>Lee</snm><fnm>A</fnm></au><au><snm>Miltsakaki</snm><fnm>E</fnm></au><au><snm>Robaldo</snm><fnm>L</fnm></au><au><snm>Joshi</snm><fnm>A</fnm></au><au><snm>Webber</snm><fnm>B</fnm></au></aug><source>Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC), Marrakech, Morocco</source><pubdate>2008</pubdate></bibl><bibl 
id="B50"><title><p>Exploiting Scope for Shallow Discourse Parsing</p></title><aug><au><snm>Prasad</snm><fnm>R</fnm></au><au><snm>Joshi</snm><fnm>A</fnm></au><au><snm>Webber</snm><fnm>B</fnm></au></aug><source>Proceedings of the Seventh International Conference on Language Resources and their Evaluation (LREC), Valletta, Malta</source><pubdate>2010</pubdate><fpage>2076</fpage><lpage>2083</lpage></bibl><bibl id="B51"><title><p>Experiments on Sense Annotation and Sense Disambiguation of Discourse Connectives</p></title><aug><au><snm>Miltsakaki</snm><fnm>E</fnm></au><au><snm>Dinesh</snm><fnm>N</fnm></au><au><snm>Prasad</snm><fnm>R</fnm></au><au><snm>Joshi</snm><fnm>A</fnm></au><au><snm>Webber</snm><fnm>B</fnm></au></aug><source>Proceedings of the Fourth Workshop on Treebanks and Linguistic Theories (TLT), Barcelona, Spain</source><pubdate>2005</pubdate></bibl><bibl id="B52"><title><p>Easily Identifiable Discourse Relations</p></title><aug><au><snm>Pitler</snm><fnm>E</fnm></au><au><snm>Raghupathy</snm><fnm>M</fnm></au><au><snm>Mehta</snm><fnm>H</fnm></au><au><snm>Nenkova</snm><fnm>A</fnm></au><au><snm>Lee</snm><fnm>A</fnm></au><au><snm>Joshi</snm><fnm>A</fnm></au></aug><source>Proceedings of the 22nd International Conference on Computational Linguistics (COLING 2008: Posters), Manchester, U.K</source><pubdate>2008</pubdate><fpage>87</fpage><lpage>90</lpage></bibl><bibl id="B53"><title><p>An Unsupervised Approach to Recognizing Discourse Relations</p></title><aug><au><snm>Marcu</snm><fnm>D</fnm></au><au><snm>Echihabi</snm><fnm>A</fnm></au></aug><source>Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA</source><pubdate>2002</pubdate><fpage>368</fpage><lpage>375</lpage></bibl><bibl id="B54"><title><p>Recognizing Implicit Discourse Relations in the Penn Discourse Treebank</p></title><aug><au><snm>Lin</snm><fnm>Z</fnm></au><au><snm>Kan</snm><fnm>MY</fnm></au><au><snm>Ng</snm><fnm>HT</fnm></au></aug><source>Proceedings 
of the Conference on Empirical Methods in Natural Language Processing, Suntec, Singapore</source><pubdate>2009</pubdate><fpage>343</fpage><lpage>351</lpage></bibl><bibl id="B55"><title><p>Automatic sense prediction for implicit discourse relations in text</p></title><aug><au><snm>Pitler</snm><fnm>E</fnm></au><au><snm>Louis</snm><fnm>A</fnm></au><au><snm>Nenkova</snm><fnm>A</fnm></au></aug><source>Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, Suntec, Singapore</source><pubdate>2009</pubdate><fpage>683</fpage><lpage>691</lpage></bibl><bibl id="B56"><title><p>Sequence Models and Re-ranking Methods for Discourse Parsing</p></title><aug><au><snm>Wellner</snm><fnm>B</fnm></au></aug><source>PhD thesis, Brandeis University, Waltham, MA</source><pubdate>2009</pubdate></bibl><bibl id="B57"><title><p>Predicting Discourse Connectives for Implicit Discourse Relation Recognition</p></title><aug><au><snm>Zhou</snm><fnm>ZM</fnm></au><au><snm>Lan</snm><fnm>M</fnm></au><au><snm>Xu</snm><fnm>Y</fnm></au><au><snm>Niu</snm><fnm>ZY</fnm></au><au><snm>Su</snm><fnm>J</fnm></au></aug><source>Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010: Posters), Beijing, China</source><pubdate>2010</pubdate><fpage>1507</fpage><lpage>1514</lpage></bibl><bibl id="B58"><title><p>Using Entity Features to Classify Implicit Discourse Relations</p></title><aug><au><snm>Louis</snm><fnm>A</fnm></au><au><snm>Joshi</snm><fnm>A</fnm></au><au><snm>Prasad</snm><fnm>R</fnm></au><au><snm>Nenkova</snm><fnm>A</fnm></au></aug><source>Proceedings of the SIGDIAL Conference, Tokyo, Japan</source><pubdate>2010</pubdate><fpage>59</fpage><lpage>62</lpage></bibl><bibl id="B59"><title><p>The rhetorical parsing, summarization and generation of natural language texts</p></title><aug><au><snm>Marcu</snm><fnm>D</fnm></au></aug><source>PhD thesis, University of Toronto</source><pubdate>1997</pubdate></bibl><bibl id="B60"><title><p>Building a Large 
Annotated Corpus of English: The Penn Treebank</p></title><aug><au><snm>Marcus</snm><fnm>MP</fnm></au><au><snm>Santorini</snm><fnm>B</fnm></au><au><snm>Marcinkiewicz</snm><fnm>MA</fnm></au></aug><source>Computational Linguistics</source><pubdate>1993</pubdate><volume>19</volume><issue>2</issue><fpage>313</fpage><lpage>330</lpage></bibl><bibl id="B61"><title><p>Automatically Classifying the Role of Citations in Biomedical Articles</p></title><aug><au><snm>Agarwal</snm><fnm>S</fnm></au><au><snm>Choubey</snm><fnm>L</fnm></au><au><snm>Yu</snm><fnm>H</fnm></au></aug><source>Proceedings of American Medical Informatics Association Fall Symposium (AMIA), Washington, D.C</source><pubdate>2010</pubdate><fpage>11</fpage><lpage>15</lpage></bibl><bibl id="B62"><title><p>Anchoring a Lexicalized Tree-Adjoining Grammar for Discourse</p></title><aug><au><snm>Webber</snm><fnm>B</fnm></au><au><snm>Joshi</snm><fnm>A</fnm></au></aug><source>Discourse Relations and Discourse Markers: Proceedings of the Conference</source><publisher>Somerset, New Jersey: Association for Computational Linguistics</publisher><editor>Stede M, Wanner L, Hovy E</editor><pubdate>1998</pubdate><fpage>86</fpage><lpage>92</lpage></bibl><bibl id="B63"><title><p>Anaphora and Discourse Structure</p></title><aug><au><snm>Webber</snm><fnm>B</fnm></au><au><snm>Joshi</snm><fnm>A</fnm></au><au><snm>Stone</snm><fnm>M</fnm></au><au><snm>Knott</snm><fnm>A</fnm></au></aug><source>Computational Linguistics</source><pubdate>2003</pubdate><volume>29</volume><issue>4</issue><fpage>545</fpage><lpage>587</lpage><xrefbib><pubid idtype="doi">10.1162/089120103322753347</pubid></xrefbib></bibl><bibl id="B64"><aug><au><snm>Asher</snm><fnm>N</fnm></au></aug><source>Reference to Abstract Objects</source><publisher>Dordrecht: Kluwer</publisher><pubdate>1993</pubdate></bibl><bibl id="B65"><title><p>Review of 'coherence in natural language: data structures and 
applications'</p></title><aug><au><snm>Knott</snm><fnm>A</fnm></au></aug><source>Computational Linguistics</source><pubdate>2007</pubdate><volume>33</volume><issue>4</issue><fpage>591</fpage><lpage>595</lpage><xrefbib><pubid idtype="doi">10.1162/coli.2007.33.4.591</pubid></xrefbib></bibl><bibl id="B66"><title><p>Rhetorical Structure Theory: Toward a Functional Theory of Text Organization</p></title><aug><au><snm>Mann</snm><fnm>W</fnm></au><au><snm>Thompson</snm><fnm>S</fnm></au></aug><source>Text</source><pubdate>1988</pubdate><volume>8</volume><issue>3</issue><fpage>243</fpage><lpage>281</lpage></bibl><bibl id="B67"><title><p>The Linguistic Discourse Model: Towards a Formal Theory of Discourse Structure</p></title><aug><au><snm>Polanyi</snm><fnm>L</fnm></au></aug><publisher>Tech. Rep. 6409, Bolt Beranek and Newman, Inc., Cambridge, Mass</publisher><pubdate>1987</pubdate></bibl><bibl id="B68"><title><p>Evaluating and integrating treebank parsers on a biomedical corpus</p></title><aug><au><snm>Clegg</snm><fnm>A</fnm></au><au><snm>Shepherd</snm><fnm>A</fnm></au></aug><source>Proceedings of the Workshop on Software, Ann Arbor, Michigan</source><pubdate>2005</pubdate><fpage>14</fpage><lpage>33</lpage></bibl><bibl id="B69"><aug><au><snm>Asher</snm><fnm>N</fnm></au><au><snm>Lascarides</snm><fnm>A</fnm></au></aug><source>Logics of conversation</source><publisher>Cambridge University Press</publisher><pubdate>2003</pubdate></bibl><bibl id="B70"><title><p>Representing Discourse Coherence: A corpus-based study</p></title><aug><au><snm>Wolf</snm><fnm>F</fnm></au><au><snm>Gibson</snm><fnm>E</fnm></au></aug><source>Computational Linguistics</source><pubdate>2005</pubdate><volume>31</volume><issue>2</issue><fpage>249</fpage><lpage>288</lpage><xrefbib><pubid idtype="doi">10.1162/0891201054223977</pubid></xrefbib></bibl><bibl id="B71"><title><p>Complexity of Dependencies in Discourse: Are Dependencies in Discourse More Complex Than in 
Syntax?</p></title><aug><au><snm>Lee</snm><fnm>A</fnm></au><au><snm>Prasad</snm><fnm>R</fnm></au><au><snm>Joshi</snm><fnm>A</fnm></au><au><snm>Dinesh</snm><fnm>N</fnm></au><au><snm>Webber</snm><fnm>B</fnm></au></aug><source>Proceedings of the 5th International Workshop on Treebanks and Linguistic Theories (TLT), Prague, Czech Republic</source><pubdate>2006</pubdate></bibl><bibl id="B72"><title><p>Departures from Tree Structures in Discourse: Shared Arguments in the Penn Discourse Treebank</p></title><aug><au><snm>Lee</snm><fnm>A</fnm></au><au><snm>Prasad</snm><fnm>R</fnm></au><au><snm>Joshi</snm><fnm>A</fnm></au><au><snm>Webber</snm><fnm>B</fnm></au></aug><source>Proceedings of the Constraints in Discourse III Workshop, Potsdam, Germany</source><pubdate>2008</pubdate></bibl><bibl id="B73"><title><p>Annotating discourse connectives and their arguments</p></title><aug><au><snm>Miltsakaki</snm><fnm>E</fnm></au><au><snm>Prasad</snm><fnm>R</fnm></au><au><snm>Joshi</snm><fnm>A</fnm></au><au><snm>Webber</snm><fnm>B</fnm></au></aug><source>Proceedings of the HLT/NAACL Workshop on Frontiers in Corpus Annotation, Boston, MA</source><pubdate>2004</pubdate><fpage>9</fpage><lpage>16</lpage></bibl><bibl id="B74"><title><p>A Pilot Annotation to Investigate Discourse Connectivity in Biomedical Text</p></title><aug><au><snm>Yu</snm><fnm>H</fnm></au><au><snm>Frid</snm><fnm>N</fnm></au><au><snm>McRoy</snm><fnm>S</fnm></au><au><snm>Prasad</snm><fnm>R</fnm></au><au><snm>Lee</snm><fnm>A</fnm></au><au><snm>Joshi</snm><fnm>A</fnm></au></aug><source>Proceedings of the ACL:HLT 2008 BioNLP Workshop, Columbus, Ohio</source><pubdate>2008</pubdate><fpage>92</fpage><lpage>93</lpage></bibl><bibl id="B75"><title><p>Exploring Discourse Connectivity in Biomedical Text for Text 
Mining</p></title><aug><au><snm>Yu</snm><fnm>H</fnm></au><au><snm>Frid</snm><fnm>N</fnm></au><au><snm>McRoy</snm><fnm>S</fnm></au><au><snm>Simpson</snm><fnm>P</fnm></au><au><snm>Prasad</snm><fnm>R</fnm></au><au><snm>Lee</snm><fnm>A</fnm></au><au><snm>Joshi</snm><fnm>A</fnm></au></aug><source>Proceedings of the 16th Annual International Conference on Intelligent Systems for Molecular Biology BioLINK SIG Meeting, Toronto, Canada</source><pubdate>2008</pubdate></bibl><bibl id="B76"><title><p>Building and Refining Rhetorical-Semantic Relation Models</p></title><aug><au><snm>Blair-Goldensohn</snm><fnm>S</fnm></au><au><snm>McKeown</snm><fnm>KR</fnm></au><au><snm>Rambow</snm><fnm>O</fnm></au></aug><source>Proceedings of NAACL-HLT, Rochester, NY</source><pubdate>2007</pubdate><fpage>428</fpage><lpage>435</lpage></bibl><bibl id="B77"><title><p>Sentence-Initial Discourse Connectives, Discourse Structure and Semantics</p></title><aug><au><snm>Webber</snm><fnm>B</fnm></au><au><snm>Prasad</snm><fnm>R</fnm></au></aug><source>Proceedings of the Workshop on Formal and Experimental Approaches to Discourse Particles and Modal Adverbs, Hamburg, Germany</source><pubdate>2008</pubdate></bibl><bibl id="B78"><title><p>Genre distinctions for discourse in the Penn TreeBank</p></title><aug><au><snm>Webber</snm><fnm>B</fnm></au></aug><source>Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, Suntec, Singapore</source><pubdate>2009</pubdate><fpage>674</fpage><lpage>682</lpage></bibl><bibl id="B79"><title><p>A Discourse-based Approach to Generating Why-Questions from Texts</p></title><aug><au><snm>Prasad</snm><fnm>R</fnm></au><au><snm>Joshi</snm><fnm>A</fnm></au></aug><source>Proceedings of the Workshop on the Question Generation Shared Task and Evaluation Challenge, Arlington, VA</source><pubdate>2008</pubdate></bibl><bibl id="B80"><title><p>Refining the Meaning of Sense Labels in PDTB: 
"Concession"</p></title><aug><au><snm>Robaldo</snm><fnm>L</fnm></au><au><snm>Miltsakaki</snm><fnm>E</fnm></au><au><snm>Hobbs</snm><fnm>J</fnm></au></aug><source>Proceedings of Symposium on Semantics in Text Processing (STEP), Venice, Italy</source><pubdate>2008</pubdate><fpage>207</fpage><lpage>219</lpage></bibl><bibl id="B81"><title><p>Realization of Discourse Relations by Other Means: Alternative Lexicalizations</p></title><aug><au><snm>Prasad</snm><fnm>R</fnm></au><au><snm>Joshi</snm><fnm>A</fnm></au><au><snm>Webber</snm><fnm>B</fnm></au></aug><source>Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010: Posters), Beijing, China</source><pubdate>2010</pubdate><fpage>1023</fpage><lpage>1031</lpage></bibl><bibl id="B82"><title><p>A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension</p></title><aug><au><snm>Hernault</snm><fnm>H</fnm></au><au><snm>Bollegala</snm><fnm>D</fnm></au><au><snm>Ishizuka</snm><fnm>M</fnm></au></aug><source>Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2010), Cambridge, MA</source><pubdate>2010</pubdate><fpage>399</fpage><lpage>409</lpage></bibl><bibl id="B83"><title><p>Discourse Indicators for Content Selection in Summarization</p></title><aug><au><snm>Louis</snm><fnm>A</fnm></au><au><snm>Joshi</snm><fnm>A</fnm></au><au><snm>Nenkova</snm><fnm>A</fnm></au></aug><source>Proceedings of the SIGDIAL Conference, Tokyo, Japan</source><pubdate>2010</pubdate><fpage>147</fpage><lpage>156</lpage></bibl><bibl id="B84"><title><p>A PDTB-Styled End-to-End Discourse Parser</p></title><aug><au><snm>Lin</snm><fnm>Z</fnm></au><au><snm>Ng</snm><fnm>HT</fnm></au><au><snm>Kan</snm><fnm>MY</fnm></au></aug><source>Tech. Rep. 
TRB8/10, School of Computing, National University of Singapore</source><pubdate>2010</pubdate></bibl><bibl id="B85"><title><p>A Discourse Resource for Turkish: Annotating Discourse Connectives in the METU Corpus</p></title><aug><au><snm>Zeyrek</snm><fnm>D</fnm></au><au><snm>Webber</snm><fnm>B</fnm></au></aug><source>Proceedings of the 6th Workshop on Asian Language Resources, Hyderabad, India</source><pubdate>2008</pubdate><fpage>65</fpage><lpage>71</lpage></bibl><bibl id="B86"><title><p>The Hindi Discourse Relation Bank</p></title><aug><au><snm>Oza</snm><fnm>U</fnm></au><au><snm>Prasad</snm><fnm>R</fnm></au><au><snm>Kolachina</snm><fnm>S</fnm></au><au><snm>Sharma</snm><fnm>DM</fnm></au><au><snm>Joshi</snm><fnm>A</fnm></au></aug><source>Proceedings of the Third Linguistic Annotation Workshop (LAW-III), ACL-IJCNLP-2009, Suntec, Singapore</source><pubdate>2009</pubdate><fpage>158</fpage><lpage>161</lpage></bibl><bibl id="B87"><title><p>Experiments with Annotating Discourse Relations in the Hindi Discourse Relation Bank</p></title><aug><au><snm>Oza</snm><fnm>U</fnm></au><au><snm>Prasad</snm><fnm>R</fnm></au><au><snm>Kolachina</snm><fnm>S</fnm></au><au><snm>Meena</snm><fnm>S</fnm></au><au><snm>Sharma</snm><fnm>DM</fnm></au><au><snm>Joshi</snm><fnm>A</fnm></au></aug><source>Proceedings of the 7th International Conference on Natural Language Processing (ICON-2009), Hyderabad, India</source><pubdate>2009</pubdate></bibl><bibl id="B88"><title><p>Annotating Discourse Connectives in the Chinese Treebank</p></title><aug><au><snm>Xue</snm><fnm>N</fnm></au></aug><source>Proceedings of the ACL Workshop on Frontiers in Corpus Annotation II: Pie in the Sky, Ann Arbor, MI</source><pubdate>2005</pubdate><fpage>84</fpage><lpage>91</lpage></bibl><bibl id="B89"><title><p>From Sentence to Discourse: Building an Annotation Scheme for Discourse Based on Prague Dependency 
Treebank</p></title><aug><au><snm>Mladova</snm><fnm>L</fnm></au><au><snm>Zikanova</snm><fnm>S</fnm></au><au><snm>Hajicova</snm><fnm>E</fnm></au></aug><source>Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC&apos;08)</source><pubdate>2008</pubdate></bibl><bibl id="B90"><title><p>Annotation of Discourse Relations for Conversational Spoken Dialogs</p></title><aug><au><snm>Tonelli</snm><fnm>S</fnm></au><au><snm>Riccardi</snm><fnm>G</fnm></au><au><snm>Prasad</snm><fnm>R</fnm></au><au><snm>Joshi</snm><fnm>A</fnm></au></aug><source>Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010), Valletta, Malta</source><pubdate>2010</pubdate><fpage>2084</fpage><lpage>2090</lpage></bibl><bibl id="B91"><title><p>The Biomedical Discourse Relation Bank (BioDRB) Annotation Guidelines</p></title><aug><au><snm>Prasad</snm><fnm>R</fnm></au><au><snm>McRoy</snm><fnm>S</fnm></au><au><snm>Frid</snm><fnm>N</fnm></au><au><snm>Yu</snm><fnm>H</fnm></au></aug><pubdate>2010</pubdate><url>http://spring.ims.uwm.edu/uploads/biodrb_guidelines.pdf</url></bibl><bibl id="B92"><title><p>Presupposition and Linguistic Context</p></title><aug><au><snm>Karttunen</snm><fnm>L</fnm></au></aug><source>Theoretical Linguistics</source><pubdate>1974</pubdate><volume>1</volume><fpage>181</fpage><lpage>194</lpage><xrefbib><pubid idtype="doi">10.1515/thli.1974.1.1-3.181</pubid></xrefbib></bibl><bibl id="B93"><title><p>Sense Annotation in the Penn Discourse Treebank</p></title><aug><au><snm>Miltsakaki</snm><fnm>E</fnm></au><au><snm>Robaldo</snm><fnm>L</fnm></au><au><snm>Lee</snm><fnm>A</fnm></au><au><snm>Joshi</snm><fnm>A</fnm></au></aug><source>Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science</source><pubdate>2008</pubdate><volume>4919</volume><fpage>275</fpage><lpage>286</lpage><xrefbib><pubid idtype="doi">10.1007/978-3-540-78135-6_23</pubid></xrefbib></bibl><bibl id="B94"><title><p>The textual 
characteristics of traditional and Open Access scientific journals are similar</p></title><aug><au><snm>Verspoor</snm><fnm>K</fnm></au><au><snm>Cohen</snm><fnm>KB</fnm></au><au><snm>Hunter</snm><fnm>L</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2009</pubdate><volume>10</volume><fpage>183</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-10-183</pubid><pubid idtype="pmcid">2714574</pubid><pubid idtype="pmpid" link="fulltext">19527520</pubid></pubidlist></xrefbib></bibl><bibl id="B95"><title><p>A simple, fast, and effective rule learner</p></title><aug><au><snm>Cohen</snm><fnm>WW</fnm></au><au><snm>Singer</snm><fnm>Y</fnm></au></aug><source>Proceedings of the Sixteenth National Conference on Artificial Intelligence and the Eleventh Innovative Applications of Artificial Intelligence Conference (AAAI &apos;99/IAAI &apos;99), Orlando, FL</source><pubdate>1999</pubdate><fpage>335</fpage><lpage>342</lpage></bibl><bibl id="B96"><aug><au><snm>Harris</snm><fnm>Z</fnm></au></aug><source>A Grammar of English on mathematical principles</source><publisher>New York: Wiley</publisher><pubdate>1982</pubdate></bibl><bibl id="B97"><aug><au><snm>Harris</snm><fnm>Z</fnm></au></aug><source>A theory of language and information: a mathematical approach</source><publisher>Oxford: Clarendon Press</publisher><pubdate>1991</pubdate></bibl><bibl id="B98"><title><p>Two biomedical sublanguages: A description based on the theories of Zellig Harris</p></title><aug><au><snm>Friedman</snm><fnm>C</fnm></au><au><snm>Kra</snm><fnm>P</fnm></au><au><snm>Rzhetsky</snm><fnm>A</fnm></au></aug><source>Journal of Biomedical Informatics</source><pubdate>2002</pubdate><volume>35</volume><issue>4</issue><fpage>222</fpage><lpage>235</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S1532-0464(03)00012-1</pubid><pubid idtype="pmpid">12755517</pubid></pubidlist></xrefbib></bibl><bibl id="B99"><title><p>A qualitative comparison of 
scientific and journalistic texts from the perspective of extracting definitions</p></title><aug><au><snm>Gabbay</snm><fnm>I</fnm></au><au><snm>Sutcliffe</snm><fnm>R</fnm></au></aug><source>Proceedings of the ACL Workshop on Question Answering in Restricted Domains, Barcelona, Spain</source><pubdate>2004</pubdate><fpage>16</fpage><lpage>22</lpage></bibl><bibl id="B100"><title><p>Discoursal movements in medical English abstracts and their linguistic exponents: A genre analysis study</p></title><aug><au><snm>Salager-Meyer</snm><fnm>F</fnm></au></aug><source>INTERFACE: Journal of Applied Linguistics</source><pubdate>1990</pubdate><volume>4</volume><issue>2</issue><fpage>107</fpage><lpage>124</lpage></bibl><bibl id="B101"><aug><au><snm>Swales</snm><fnm>J</fnm></au></aug><source>Genre Analysis: English in Academic and Research Settings</source><publisher>Cambridge, England: Cambridge University Press</publisher><pubdate>1990</pubdate></bibl><bibl id="B102"><title><p>The introduction, methods, results, and discussion (IMRAD) structure: a fifty-year survey</p></title><aug><au><snm>Sollaci</snm><fnm>LB</fnm></au><au><snm>Pereira</snm><fnm>MG</fnm></au></aug><source>Journal of the Medical Library Association</source><pubdate>2004</pubdate><volume>92</volume><issue>3</issue><fpage>364</fpage><lpage>371</lpage><xrefbib><pubidlist><pubid idtype="pmcid">442179</pubid><pubid idtype="pmpid">15243643</pubid></pubidlist></xrefbib></bibl></refgrp>
</bm></art>