
Open Access Research article

Detecting causality from online psychiatric texts using inter-sentential language patterns

Jheng-Long Wu, Liang-Chih Yu and Pei-Chann Chang*

Author Affiliations

College of Informatics, Department of Information Management, Yuan Ze University, Chung-Li, Taiwan, Republic of China


BMC Medical Informatics and Decision Making 2012, 12:72  doi:10.1186/1472-6947-12-72

The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1472-6947/12/72


Received: 14 February 2012
Accepted: 18 July 2012
Published: 18 July 2012

© 2012 Wu et al.; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Online psychiatric texts are natural language texts expressing depressive problems, published by Internet users via community-based web services such as web forums, message boards and blogs. Understanding the cause-effect relations embedded in these psychiatric texts can provide insight into the authors’ problems, thus increasing the effectiveness of online psychiatric services.

Methods

Previous studies have proposed the use of word pairs extracted from a set of sentence pairs to identify cause-effect relations between sentences. A word pair is made up of two words, with one coming from the cause text span and the other from the effect text span. Analysis of the relationship between these words can be used to capture individual word associations between cause and effect sentences. For instance, (broke up, life) and (boyfriend, meaningless) are two word pairs extracted from the sentence pair: “I broke up with my boyfriend. Life is now meaningless to me”. The major limitation of word pairs is that individual words in sentences usually cannot reflect the exact meaning of the cause and effect events, and thus may produce semantically incomplete word pairs, as the previous examples show. Therefore, this study proposes the use of inter-sentential language patterns such as <<broke up, boyfriend>, <life, meaningless>> to detect causality between sentences. The inter-sentential language patterns can capture associations among multiple words within and between sentences, and can thus provide more precise information than word pairs. To acquire inter-sentential language patterns, we develop a text mining framework by extending the classical association rule mining algorithm such that it can discover frequently co-occurring patterns across the sentence boundary.

Results

Performance was evaluated on a corpus of texts collected from PsychPark (http://www.psychpark.org), a virtual psychiatric clinic maintained by a group of volunteer professionals from the Taiwan Association of Mental Health Informatics. Experimental results show that the use of inter-sentential language patterns outperformed the use of word pairs proposed in previous studies.

Conclusions

This study demonstrates the acquisition of inter-sentential language patterns for causality detection from online psychiatric texts. Such semantically more complete and precise features can improve causality detection performance.

Keywords:
Causality detection; Inter-sentential language patterns; Biomedical text mining; Natural language processing

Background

Online community-based services such as web forums, message boards, and blogs provide an efficient and effective way of sharing information and gathering knowledge [1-3]. In the field of mental health care, these services allow individuals to describe their life stresses and depressive problems to other Internet users or health professionals, who can then make recommendations that help the subject develop the knowledge needed to seek appropriate care. Examples of these websites include Depression Forums, PsychPark, SA-UK, WebMD, and Yahoo! Answers. This paper refers to this type of online post as online psychiatric texts; their major characteristic is that they are natural language texts featuring many cause-effect relations between sentences. Some examples of causality sentences are presented below:

(E1) I couldn’t sleep for several days because my boss cut my salary.

(E2) I failed again. I felt very upset.

(E3) I broke up with my boyfriend. Life now is meaningless to me.

These examples indicate three depressive problems caused by negative life events experienced by the speaker. Awareness of such cause-effect relations between sentences can improve our understanding of users’ problems and make online psychiatric services more effective. For instance, systems capable of identifying causality from online forum posts could assist health professionals in capturing users’ background information more quickly, thus decreasing response time. Additionally, a dialog system could generate supportive responses if it could understand depressive problems and their associated reasons embedded in users’ input. Recent studies also show that causality is an important concept in biomedical informatics [4], and identifying cause-effect relations as well as other semantic relations could improve the effectiveness of many applications such as question answering [5-7], biomedical text mining [8-10], future event prediction [11], information retrieval [12], and e-learning [13]. Therefore, this paper proposes a text mining framework to detect cause-effect relations between sentences from online psychiatric texts.

Causality (or a cause-effect relation) is a relation between two events: cause and effect. In natural language texts, cause-effect relations can generally be categorized as explicit and implicit depending on whether or not a discourse connective (e.g., “because”, “therefore”) is found between the cause and effect text spans [14-16]. For instance, the example sentence E1 contains an explicit cause-effect relation due to the presence of the discourse connective “because” which signals the relation. Conversely, both E2 and E3 lack a discourse connective and thus the cause-effect relation between the sentences is implicit. Traditional approaches to identifying explicit cause-effect relations have focused on mining useful discourse connectives that can trigger the cause-effect relation. Wu et al. [17] manually collected a set of discourse connectives to identify cause-effect relations from psychiatric consultation records. Ramesh and Yu [18] proposed the use of a supervised machine learning method called conditional random fields (CRFs) to automatically identify discourse connectives in biomedical texts. Inui et al. [19] used a discourse connective “tame” to acquire causal knowledge from Japanese newspaper articles. Although discourse connectives are useful features for identifying causality, the difficulty inherent in collecting a complete set of discourse connectives may result in this approach failing to identify the cause-effect relations triggered by unknown discourse connectives. In addition, it may also fail to identify implicit cause-effect relations that lack an explicit discourse connective between the sentences. Accordingly, other useful features and algorithms have been investigated to identify implicit causality within [20,21] and between sentences [22,23]. Efforts to identify causality within sentences have investigated features that consider sentence structure. Rink et al. 
[20] proposed the use of textual graph patterns obtained from parse trees to determine whether two events from the same sentence have a causal relation. Mulkar-Mehta et al. [21] introduced a theory of granularity to identify sentences containing causal relations. Features across the sentence boundary could be useful in identifying causality between sentences because such features can capture feature relationships between sentences. For instance, word pairs in which one word comes from the cause text span and the other comes from the effect text span have been demonstrated to be useful features for discovering implicit causality between sentences [22,23] because they can capture individual word associations between cause and effect sentences. In the E2 sample sentence pair, the word pair (fail, upset) helps identify the implicit cause-effect relation that holds between the two sentences.

However, individual words in sentences usually cannot reflect the exact meaning of the cause and effect events which, taking E3 as an example, may produce semantically incomplete word pairs such as (broke up, life), (broke up, meaningless), (boyfriend, life), and (boyfriend, meaningless). In fact, many cause and effect events can be characterized by language patterns, i.e., meaningful combinations of words. For instance, in E3, the first sentence (cause) can be characterized by a language pattern <broke up, boyfriend>, and the second sentence (effect) can be characterized by <life, meaningless>. Combining these two intra-sentential language patterns constitutes a more semantically complete inter-sentential language pattern <<broke up, boyfriend>, <life, meaningless>>. Such inter-sentential language patterns can provide more precise information to improve the performance of causality detection because they can capture the associations of multiple words within and between sentences. Therefore, this study develops a text mining framework by extending the classical association rule mining algorithm [24-28] such that it can mine inter-sentential language patterns by associating frequently co-occurring patterns across the sentence boundary. The discovered patterns are then incorporated into a probabilistic model to detect causality between sentences.

The rest of this paper is organized as follows. We first describe the framework for inter-sentential language pattern mining and causality detection. We then summarize the experimental results and present conclusions.

Methods

Figure 1(a) illustrates the framework of inter-sentential language pattern mining and causality detection. The online psychiatric texts are a collection of forum posts collected from PsychPark (http://www.psychpark.org), a virtual psychiatric clinic maintained by a group of volunteer professionals belonging to the Taiwan Association of Mental Health Informatics [29,30]. A set of discourse connectives based on the results of previous studies [16,17] was created to select causality sentences from the online psychiatric texts. These causality sentences are then split into cause and effect text spans by removing the discourse connectives between them. For instance, in Figure 1(b), the sample causality sentence can be split by removing the discourse connective “so”. Next, the sets of cause and effect text spans are processed by the algorithm in two steps: intra-sentential and inter-sentential language pattern mining. Intra-sentential language pattern mining is used to discover language patterns of frequently co-occurring words within the cause and effect text spans. Once the intra-sentential language patterns are discovered, the frequently co-occurring patterns between the cause and effect text spans are combined to form a set of inter-sentential language patterns. As indicated in Figure 1(b), two intra-sentential language patterns <broke up, boyfriend> and <life, meaningless> are discovered from their respective cause and effect text spans, and together they constitute an inter-sentential language pattern <<broke up, boyfriend>, <life, meaningless>>. Finally, the acquired inter-sentential language patterns are used as features to detect causality between sentences.

Figure 1. (a) Framework of inter-sentential language pattern mining and causality detection. (b) Example of inter-sentential language pattern mining.

The following subsections describe how the proposed mining algorithm extends the classical association rule mining to acquire both intra- and inter-sentential language patterns.

Intra-sentential language pattern mining

This section describes two methods for generating intra-sentential language patterns: extended association rule mining and sentence parsing.

Method 1: extended association rule mining

For the mining of intra-sentential language patterns, rather than mining frequent item sets as in the classical association rule mining problem, we attempt to mine frequent word sets (frequently co-occurring words) in the sets of cause and effect text spans. For this purpose, we adopted a modified version of the Apriori algorithm [24,31,32]. The basic concept behind the Apriori algorithm is the recursive identification of frequent word sets, from which intra-sentential language patterns are then generated. For simplicity, only nouns and verbs are considered in language pattern generation. The detailed procedure is described as follows.

Find frequent word sets within cause and effect text spans

A word set is frequent if it possesses a minimum level of support. The support of a word set is defined as the number of times the word set occurs in the set of cause (or effect) text spans. For instance, the support of a two-word set {w_i, w_j} denotes the number of times the word pair (w_i, w_j) occurs in the set of cause (or effect) text spans. The frequent k-word sets are discovered from the frequent (k-1)-word sets. First, the support of each word (i.e., the word frequency) is counted over the set of cause (or effect) text spans. The set of frequent one-word sets, denoted as L_1, is then generated by choosing the words with a minimum support level. To generate L_k, the following two-step process is performed iteratively until no more frequent k-word sets are found.

· Join step: A set of candidate k-word sets, denoted as C_k, is first generated by merging frequent word sets of L_{k-1}, in which only the word sets with identical first (k-2) words can be merged.

· Prune step: Candidate word sets in C_k containing any infrequent (k-1)-word subset are first eliminated. The support of each remaining candidate word set is then counted, and the candidates with a support count greater than or equal to the minimum support form L_k. Figure 2 shows an example of generating L_k. The maximum value of k is reached when no more frequent k-word sets are found in the generation process.
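As an illustration, the join and prune steps above can be sketched in Python. This is a minimal sketch of the Apriori-style procedure, not the authors' implementation: the function name is our own, and the join is simplified to merging any two (k-1)-word sets whose union has exactly k words (the classical algorithm merges only sets sharing their first k-2 items).

```python
from itertools import combinations

def frequent_word_sets(text_spans, min_support):
    """Mine frequent word sets from cause (or effect) text spans.
    Each text span is a list of words (e.g., nouns and verbs only)."""
    spans = [set(s) for s in text_spans]

    def support(word_set):
        # Number of spans in which every word of the set occurs.
        return sum(1 for s in spans if word_set <= s)

    # L1: frequent one-word sets.
    words = sorted({w for s in spans for w in s})
    current = [frozenset([w]) for w in words
               if support(frozenset([w])) >= min_support]
    all_frequent = list(current)
    k = 2
    while current:
        # Join step (simplified): merge (k-1)-word sets into k-word candidates.
        candidates = set()
        for a, b in combinations(current, 2):
            merged = a | b
            if len(merged) == k:
                candidates.add(frozenset(merged))
        # Prune step: drop candidates with an infrequent (k-1)-subset,
        # then keep those meeting the minimum support.
        prev = set(current)
        current = [c for c in candidates
                   if all(frozenset(sub) in prev
                          for sub in combinations(c, k - 1))
                   and support(c) >= min_support]
        all_frequent.extend(current)
        k += 1
    return all_frequent
```

For example, with three cause spans in which "broke" and "boyfriend" co-occur repeatedly, the two-word set {broke, boyfriend} is returned as frequent at min_support = 2.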

Figure 2. Generating intra-sentential language patterns for both cause and effect events.

Generate intra-sentential language patterns from frequent word sets

Once the frequent word sets have been identified, the intra-sentential language patterns can be generated via a confidence measure. Let lp_i = <w_1, w_2, ..., w_k> denote an intra-sentential language pattern of k words. The confidence of lp_i is defined as the mutual information of the k words [33-35], as shown below:

\mathrm{Confidence}(lp_i) = \log \frac{P(w_1, w_2, \ldots, w_k)}{P(w_1)\,P(w_2)\cdots P(w_k)} \qquad (1)

where P(w_1, w_2, ..., w_k) denotes the probability of the k words co-occurring in the set of cause (or effect) text spans, and P(w_m) denotes the probability of a single word occurring in the set of cause (or effect) text spans. Accordingly, for every frequent word set in L_k, an intra-sentential language pattern is generated if the mutual information of its k words is greater than or equal to a minimum confidence. The resulting intra-sentential language patterns are those meeting the minimum confidence level. Figure 2 shows an example of generating intra-sentential language patterns from L_k.
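A sketch of this confidence measure, computed as the log ratio of the joint co-occurrence probability to the product of the individual word probabilities (the function name and the maximum-likelihood estimation over spans are our assumptions):

```python
import math

def confidence(word_set, text_spans):
    """Pointwise mutual information of the words in `word_set`,
    estimated over a list of text spans (each a list of words)."""
    spans = [set(s) for s in text_spans]
    n = len(spans)
    # Joint probability: fraction of spans containing all the words.
    p_joint = sum(1 for s in spans if set(word_set) <= s) / n
    # Product of the individual word probabilities.
    p_product = 1.0
    for w in word_set:
        p_product *= sum(1 for s in spans if w in s) / n
    return math.log(p_joint / p_product)
```

A positive value indicates the words co-occur more often than they would by chance; frequent word sets (from the previous step) guarantee the joint probability is nonzero.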

Method 2: sentence parsing

In addition to the extended association rule mining presented above, sentence parsing, which considers sentence structure, can also be used to discover word dependencies in sentences. Therefore, this study uses a parser developed by Academia Sinica, Taiwan [36] to generate intra-sentential language patterns by deriving word pairs with proper dependencies from the parse trees of both cause and effect text spans. Figure 3 shows the parse tree output for the sample sentence “My boss cut my salary”.

Figure 3. Example of a parse tree.

The parser assigns a phrase label (e.g., NP, VP, PP) and a semantic label (e.g., Head, possessor, theme) to each constituent in the sentence. The dependencies between each word and its head are then taken as the intra-sentential language patterns. For example, in Figure 3, the intra-sentential language patterns for the sample sentence include (my, boss), (my, salary), (boss, cut), and (salary, cut).

Inter-sentential language pattern mining

An inter-sentential language pattern is composed of at least one intra-sentential language pattern for cause events and one for effect events. Therefore, once the intra-sentential language patterns for cause and effect events are generated using each of the abovementioned methods, the next step is to generate inter-sentential language patterns by finding frequently co-occurring patterns between the cause and effect text spans. This can be accomplished by repeating the same procedure presented above for extended association rule mining to find frequent pattern sets which are then used to generate inter-sentential language patterns.

Find frequent pattern sets between cause and effect text spans

The procedure for finding frequent pattern sets differs from that for finding frequent word sets only in the definition of the support measure. In finding frequent word sets, the support of a word set is defined as the number of times the word set occurs in the set of cause (or effect) text spans. In this step, a pattern set is composed of at least one pattern from cause events and one from effect events. Therefore, the support of a pattern set is defined as the number of times the pattern set occurs between the sets of cause and effect text spans. For instance, consider a two-pattern set {lp_i, lp_j}, where lp_i and lp_j respectively denote an intra-sentential language pattern for the cause and effect events. The support of this two-pattern set is the number of times lp_i and lp_j co-occur between the sets of cause and effect text spans. Therefore, in searching for frequent pattern sets, all combinations of the intra-sentential language patterns for the cause and effect events are considered as candidate pattern sets. The join and prune steps presented in the previous section can then be repeated to determine frequent pattern sets from all possible pattern combinations. Figure 4 shows an example.
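The modified support count can be sketched as follows: over aligned cause/effect span pairs, it counts how often a cause-side pattern and an effect-side pattern co-occur (the function and parameter names are illustrative, not from the paper):

```python
def pattern_set_support(cause_pattern, effect_pattern,
                        cause_spans, effect_spans):
    """Support of the two-pattern set {lp_i, lp_j}: the number of
    cause/effect span pairs in which every word of `cause_pattern`
    occurs in the cause span and every word of `effect_pattern`
    occurs in the corresponding effect span."""
    return sum(
        1
        for c, e in zip(cause_spans, effect_spans)
        if set(cause_pattern) <= set(c) and set(effect_pattern) <= set(e)
    )
```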

Figure 4. Generating inter-sentential language patterns for cause-effect relations.

Generate inter-sentential language patterns from frequent pattern sets

Similar to the procedure for generating intra-sentential language patterns, this step requires a confidence measure to generate inter-sentential language patterns from frequent pattern sets. In generating intra-sentential language patterns, the confidence score measures the mutual information of the words in a frequent word set; in this step, it measures the mutual information of the patterns in a frequent pattern set. Let islp_i = <lp_1, lp_2, ..., lp_k> denote an inter-sentential language pattern of k patterns. The confidence of islp_i is defined as the mutual information of the k patterns, as shown below:

\mathrm{Confidence}(islp_i) = \log \frac{P(lp_1, lp_2, \ldots, lp_k)}{P(lp_1)\,P(lp_2)\cdots P(lp_k)} \qquad (2)

where P(lp_1, lp_2, ..., lp_k) denotes the probability of the k patterns co-occurring between the sets of cause and effect text spans, and P(lp_m) denotes the probability of a single pattern occurring in the set of cause (or effect) text spans. The resulting inter-sentential language patterns are those with a minimum confidence score. Figure 4 shows an example.

Causality detection

This section describes the use of inter-sentential language patterns to detect causality between sentences, focusing on the detection of implicit cause-effect relations. Other studies have also demonstrated the use of surface text patterns for relation extraction [37,38]. Given a sentence pair (s_i, s_j) without any discourse connective between s_i and s_j, the goal is to classify the sentence pair as causality or non-causality, as shown below:

c^* = \arg\max_{c_k} P(c_k \mid s_i, s_j) \qquad (3)

where c^* is the prediction output, representing causality (c_k = 1) or non-causality (c_k = 0). Before prediction, the input sentence pair (s_i, s_j) is first transformed into a feature representation. This study uses both inter-sentential language patterns and the previously proposed word pairs as features. In the pattern representation, each sentence pair is represented by one or more inter-sentential language patterns, depending on the number of patterns the sentence pair matches in the set of discovered inter-sentential language patterns. Therefore, a sentence pair containing n inter-sentential language patterns can be formally represented as ISLP_{ij} = {islp_1, ..., islp_n}. In the word-pair representation, each sentence pair is represented by a set of word pairs, denoted as WP_{ij} = {wp_1, ..., wp_m}. Using these two feature sets, Eq. (3) can be re-written as

c^* = \arg\max_{c_k} P(c_k \mid ISLP_{ij}, WP_{ij}) \qquad (4)

where ISLP_{ij} and WP_{ij} represent the feature sets of inter-sentential language patterns and word pairs of the input sentence pair (s_i, s_j), respectively. Assuming that ISLP_{ij} and WP_{ij} are independent, Eq. (4) can be re-written as

c^* = \arg\max_{c_k} P(c_k)\, P(ISLP_{ij} \mid c_k)\, P(WP_{ij} \mid c_k) \qquad (5)

Assuming again that the elements in both ISLP_{ij} and WP_{ij} are independent, then

c^* = \arg\max_{c_k} P(c_k) \prod_{u=1}^{n} P(islp_u \mid c_k) \prod_{v=1}^{m} P(wp_v \mid c_k) \qquad (6)

where P(islp_u | c_k) and P(wp_v | c_k) denote the respective probabilities of an inter-sentential language pattern and a word pair occurring in the causality or non-causality class, and P(c_k) denotes the prior probability of the causality or non-causality class. These probabilities can be estimated from the training data:

P(islp_u \mid c_k) = \frac{f(islp_u, c_k)}{N_{c_k}} \qquad (7)

P(wp_v \mid c_k) = \frac{f(wp_v, c_k)}{N_{c_k}} \qquad (8)

P(c_k) = \frac{N_{c_k}}{N} \qquad (9)

where f(islp_u, c_k) and f(wp_v, c_k) denote the respective frequency counts of an inter-sentential language pattern and a word pair occurring in the causality or non-causality class, N_{c_k} denotes the number of causality or non-causality sentences in the training data, and N denotes the total number of sentences in the training data.
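The probability estimates above plug into a naive-Bayes-style decision rule over pattern and word-pair features. A minimal sketch, computed in log space for numerical stability (the add-one smoothing for unseen features is our assumption; the paper does not specify how zero counts are handled):

```python
import math
from collections import Counter

def train_counts(examples):
    """examples: list of (features, label) pairs, with label 1
    (causality) or 0 (non-causality); features is an iterable of
    pattern/word-pair identifiers. Returns per-class feature counts
    and class sizes, i.e. the raw counts behind the estimates."""
    feat = {0: Counter(), 1: Counter()}
    n_class = Counter()
    for features, label in examples:
        n_class[label] += 1
        feat[label].update(features)
    return feat, n_class

def classify(features, feat, n_class, alpha=1.0):
    """Pick the class maximizing prior * product of per-feature
    likelihoods, in log space. `alpha` is add-one smoothing."""
    total = sum(n_class.values())
    best, best_score = None, float("-inf")
    for c in (0, 1):
        score = math.log(n_class[c] / total)  # class prior
        for f in features:
            # Smoothed per-feature likelihood given class c.
            score += math.log((feat[c][f] + alpha) / (n_class[c] + alpha))
        if score > best_score:
            best, best_score = c, score
    return best
```

For instance, a feature seen only in causality training pairs pushes a test pair toward the causality class.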

Results and Discussion

This section presents the experimental results for causality detection. We first explain the experimental setup, including experiment data, features used for causality detection, and evaluation metrics. The selection of optimal parameter settings for inter-sentential language pattern mining is then described, followed by the evaluation results of causality detection with different features.

Experimental setup

· Data: A total of 9716 sentence pairs were collected from PsychPark [29,30], from which 8035, 481, and 1200 sentence pairs were randomly selected as the training set, development set, and test set, respectively. For each data set, a set of discourse connectives collected based on the results of previous studies [16,17] was used to select causality sentence pairs. The statistics of the data sets are presented in Table 1. The training set was used to generate the inter-sentential language patterns and word pairs. The development set was used to select the optimal values of the parameters used in inter-sentential language pattern mining. The test set was used to evaluate the performance of causality detection.

· Features used for causality detection: This experiment used word pairs (WP) and inter-sentential language patterns (ISLP) as features to detect causality between sentences. For ISLP, we use ISLP_ARM and ISLP_parsing to denote the sets of inter-sentential language patterns generated from the intra-sentential language patterns discovered by extended association rule mining and by sentence parsing, respectively. Thus, the causality detection method was implemented using three feature sets: WP, WP + ISLP_ARM, and WP + ISLP_parsing. WP was used as a baseline for causality detection, while WP + ISLP_ARM and WP + ISLP_parsing were used to determine whether the newly proposed inter-sentential language patterns could further improve detection performance, and which method (i.e., extended association rule mining or sentence parsing) generates intra-sentential language patterns more useful for subsequent inter-sentential language pattern mining.

· Evaluation metrics: The metrics used for performance evaluation included recall, precision, and F-measure, defined as follows:

\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (10)

\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (11)

\mathrm{F\text{-}measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (12)

where TP, FP, and FN denote the numbers of true positive, false positive, and false negative predictions of the causality class, respectively.

Table 1. Statistics of experimental data
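The three metrics can be computed directly from gold labels and predictions for the causality class (TP, FP, and FN denote true positives, false positives, and false negatives; the function name is illustrative):

```python
def evaluate(gold, predicted):
    """Recall, precision, and F-measure for the causality class (label 1)."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return recall, precision, f_measure
```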

Evaluation of inter-sentential language pattern mining

In inter-sentential language pattern mining, two parameters may affect the quantity and quality of the discovered patterns: the size of the training data and the confidence threshold (Eq. (2)). The size of the training data set controls the number of documents used for pattern generation; the confidence threshold controls the number of patterns generated from the training data. The optimal values of both parameters were determined by maximizing causality detection performance on the development set. Figure 5 shows the F-measure of causality detection for different proportions of training data. The results show that increasing the size of the training data set improved the performance of WP, WP + ISLP_ARM, and WP + ISLP_parsing, mainly because more useful features can be discovered from a larger training set.

Figure 5. Performance against different proportions of training data.

For the confidence threshold, a higher value represents a more confident pattern. In the pattern generation process, all discovered patterns were sorted in descending order of their confidence values, and a threshold percentage was then applied to select the top N percent of patterns for causality detection. Figure 6 shows the F-measure of causality detection for different percentages of selected patterns. For WP + ISLPARM, performance increased as the threshold value rose to 0.3, indicating that the top 30% of patterns were useful for detecting causality due to their higher confidence. When the threshold value exceeded 0.3, performance decreased because the lower-ranked patterns were noisier and tended to increase ambiguity in causality detection. For WP + ISLPparsing, the optimal threshold value was 0.7.

Figure 6. Performance against different threshold values of confidence.
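The rank-and-cutoff step described above can be sketched as follows; `select_top_patterns` and the sample confidence values are hypothetical, not from the paper:

```python
def select_top_patterns(patterns, threshold):
    """Keep the top `threshold` fraction of patterns, ranked by confidence.

    `patterns` is a list of (pattern, confidence) tuples; e.g. a
    threshold of 0.3 keeps the top 30% most confident patterns.
    """
    ranked = sorted(patterns, key=lambda pc: pc[1], reverse=True)
    n_keep = max(1, round(threshold * len(ranked)))
    return [pattern for pattern, _ in ranked[:n_keep]]
```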

Results of causality detection

This section presents the comparative results of using different feature sets for causality detection. The results presented in Table 2 were obtained from the test set with 10-fold cross validation, using the optimal parameter settings selected in the previous section. A paired, two-tailed t-test was used to determine whether the performance difference was statistically significant.
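The significance test can be sketched as a paired t statistic over matched per-fold scores; the fold scores in the test are invented for illustration, and in practice the two-tailed p-value would be read from the t distribution with n − 1 degrees of freedom:

```python
import math
from statistics import mean, stdev

def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic over matched per-fold scores of two systems.

    t = mean(d) / (stdev(d) / sqrt(n)), where d holds the per-fold
    score differences; t is compared against the t distribution with
    n - 1 degrees of freedom for a two-tailed test.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
```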

Table 2. Comparative results of causality detection with different feature sets

The row labeled WP used word pairs alone as features, providing a baseline result for causality detection. Once inter-sentential language patterns were used, both WP + ISLPparsing and WP + ISLPARM improved the recall, precision, and F-measure over WP, indicating that the proposed inter-sentential language patterns are significant features for causality detection. As listed in Table 3, the inter-sentential language patterns are more semantically complete and can provide more precise information because they capture the associations of multiple words within and between sentences. Conversely, word pairs such as (friend, energy) and (investment, life), which consider only individual word relationships, are usually semantically incomplete and ambiguous, thus yielding lower performance. WP + ISLPparsing and WP + ISLPARM achieved a similar F-measure, indicating that both extended association rule mining and sentence parsing can generate intra-sentential language patterns that are useful for subsequent inter-sentential language pattern mining for causality detection.
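The cross-sentence pairing behind the WP baseline can be sketched as follows; the helper name and tokenized example are illustrative:

```python
from itertools import product

def extract_word_pairs(cause_tokens, effect_tokens):
    """Pair every word in the cause span with every word in the
    effect span, yielding the WP features for one sentence pair."""
    return list(product(cause_tokens, effect_tokens))
```

Applied to the sentence pair from the Abstract, this yields pairs such as (broke up, life) and (boyfriend, meaningless) alongside the other cross-sentence combinations, including semantically incomplete ones.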

Table 3. Examples of inter-sentential language patterns

Conclusions

This study proposes the use of inter-sentential language patterns to detect cause-effect relations in online psychiatric texts. We also present a text mining framework to mine inter-sentential language patterns by associating frequently co-occurring language patterns across the sentence boundary. Experimental results show that using the proposed inter-sentential language patterns improved performance over the use of word pairs alone, mainly because the inter-sentential language patterns are semantically more complete and can thus provide more precise information for causality detection. Future work will be devoted to investigating more useful cross-sentence features and information fusion methods to further improve system performance.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

JLW collected the corpus, designed the experiment, and contributed to writing the paper. LCY designed the study, interpreted experiment results, and contributed to writing the paper. PCC restructured the paper and contributed to writing the paper. All authors read and approved the final manuscript.

Acknowledgement

This work was supported by the National Science Council, Taiwan, ROC, under Grant No. NSC99-2221-E-155-036-MY3 and NSC100-2632-S-155-001. The authors would like to thank the reviewers and editors for their constructive comments.


Pre-publication history

The pre-publication history for this paper can be accessed here:

http://www.biomedcentral.com/1472-6947/12/72/prepub