<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-244X-4-13</ui>
   <ji>1471-244X</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Computerized adaptive measurement of depression: A simulation study</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Gardner</snm>
               <fnm>William</fnm>
               <insr iid="I1"/>
               <email>gardnerw@pediatrics.ohio-state.edu</email>
            </au>
            <au id="A2">
               <snm>Shear</snm>
               <fnm>Katherine</fnm>
               <insr iid="I2"/>
               <email>shearmk@msx.upmc.edu</email>
            </au>
            <au id="A3">
               <snm>Kelleher</snm>
               <mi>J</mi>
               <fnm>Kelly</fnm>
               <insr iid="I1"/>
               <email>kellehek@pediatrics.ohio-state.edu</email>
            </au>
            <au id="A4">
               <snm>Pajer</snm>
               <mi>A</mi>
               <fnm>Kathleen</fnm>
               <insr iid="I1"/>
               <email>pajerk@pediatrics.ohio-state.edu</email>
            </au>
            <au id="A5">
               <snm>Mammen</snm>
               <fnm>Oommen</fnm>
               <insr iid="I2"/>
               <email>mammenok@msx.upmc.edu</email>
            </au>
            <au id="A6">
               <snm>Buysse</snm>
               <fnm>Daniel</fnm>
               <insr iid="I2"/>
               <email>buyssedj@msx.upmc.edu</email>
            </au>
            <au id="A7">
               <snm>Frank</snm>
               <fnm>Ellen</fnm>
               <insr iid="I2"/>
               <email>franke@msx.upmc.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Pediatrics, Children's Research Institute and Ohio State University, Columbus, OH, USA</p>
            </ins>
            <ins id="I2">
               <p>Psychiatry, Western Psychiatric Institute and University of Pittsburgh, Pittsburgh, PA, USA</p>
            </ins>
         </insg>
         <source>BMC Psychiatry</source>
         <issn>1471-244X</issn>
         <pubdate>2004</pubdate>
         <volume>4</volume>
         <issue>1</issue>
         <fpage>13</fpage>
         <url>http://www.biomedcentral.com/1471-244X/4/13</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="doi">10.1186/1471-244X-4-13</pubid>
               <pubid idtype="pmpid">15132755</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>23</day>
               <month>12</month>
               <year>2003</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>06</day>
               <month>5</month>
               <year>2004</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>06</day>
               <month>5</month>
               <year>2004</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2004</year>
         <collab>Gardner et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.</collab>
      </cpyrt>
      <kwdg>
         <kwd>Mood Disorders, Unipolar</kwd>
         <kwd>Computers</kwd>
         <kwd>Diagnosis and Classification</kwd>
         <kwd>Tests/Interviews, Psychometrics</kwd>
      </kwdg>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Efficient, accurate instruments for measuring depression are increasingly important in clinical practice. We developed a computerized adaptive version of the Beck Depression Inventory (BDI). We examined its efficiency and its usefulness in identifying Major Depressive Episodes (MDE) and in measuring depression severity.</p>
            </sec>
            <sec>
               <st>
                  <p>Methods</p>
               </st>
               <p>Subjects were 744 participants in research studies in which each subject completed both the BDI and the SCID. In addition, 285 patients completed the Hamilton Depression Rating Scale.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>As an indicator of a SCID diagnosis of MDE, the adaptive BDI had an AUC of 88%, equivalent to that of the full BDI. The adaptive BDI asked fewer questions than the full BDI (an average of 5.6 versus 21 items). The adaptive latent depression score correlated <it>r </it>= .92 with the BDI total score, and it correlated more highly with the Hamilton (<it>r </it>= .74) than the BDI total score did (<it>r </it>= .70).</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>Adaptive testing for depression may provide greatly increased efficiency without loss of accuracy in identifying MDE or in measuring depression severity.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>There is a pressing need for accurate and efficient instruments to screen for depression and to measure its severity, for several reasons. First, the U.S. Preventive Services Task Force <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> recommended adults be screened for depression, based on findings that feedback of depression screening to clinicians increased the recognition of depressive illness. Moreover, a great proportion of depression care is in the hands of clinicians who lack specialized mental health training <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>. These clinicians may benefit from methods to detect cases and evaluate the outcomes of care. For example, it has been recognized for some time that depression is an important source of morbidity in primary care <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>, and that improvement is needed in the recognition and management of depression in that setting <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. However, clinician time and attention are highly constrained in many health care settings <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>.</p>
         <p>Second, efficient yet accurate instruments to measure depression would also be of considerable value in monitoring the progress of treatment by mental health specialists and other treating clinicians. Systematic case management activities involving regular outcome measurement are part of an emerging paradigm of high quality care of chronic illnesses <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. However, such programs will fail if patients are unwilling to adhere to measurement protocols. Therefore, follow-up measurement protocols cannot burden patients with repeated exposures to long questionnaires. Finally, researchers frequently assess severity of depression using standard instruments such as the Beck Depression Inventory (BDI) <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. Efficiency is important here as well, because researchers often assess many constructs, straining study participants' endurance.</p>
         <p>Given that many instruments permit assignments of diagnoses, measurement of symptoms, or assessment of functioning, one might ask why routine depression screening and treatment monitoring are not already common. Although these instruments are commonly used in research, they have had less effect on clinical care. One important reason is that the time and other costs of mental health assessments may outweigh their benefits for busy patient-care settings. These costs may also make screening or treatment monitoring suboptimal from a societal perspective. For example, a recent simulation study by Valenstein and her colleagues <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> suggested that screening for depression would not meet a reasonable criterion for cost-utility if the cost of administering a single test was substantially higher than $5, where that cost comprised a fee for the instrument, six minutes of staff time, and one minute of physician time.</p>
         <sec>
            <st>
               <p>Computerized adaptive measurement of depression</p>
            </st>
            <p>Two technical advances could substantially improve the tradeoff between efficiency and accuracy in the measurement of mental health problems, such as depression. First, the Internet is reducing the cost and other barriers to the delivery of computerized testing services to clinical offices. Wireless Internet connectivity is becoming widely available, and powerful, mobile tablet and handheld computers are now available at commodity prices. These technologies should substantially reduce the cost of putting computer-administered tests in the hands of patients and clinicians in front-line clinical settings. Computerized tests, particularly those that can be self-administered by patients, can reduce the staff and clinician time required to administer and score an instrument. There are several computerized mental health instruments including, for example, a computerized version of the Composite International Diagnostic Interview (CIDI) <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>.</p>
            <p>The second technical advance, Computerized Adaptive Testing (CAT) <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, has been widely used by educational and vocational testers, but has seen surprisingly little application in physical or mental health settings <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>. CAT is a technology for interactive administration of tests that tailors the test to the examinee (or, in our application, to the patient). These tests are 'adaptive' in the sense that the testing is driven by an algorithm that selects questions in real time and in response to the ongoing responses of the patient. We believe that computerized, adaptive mental health assessment services, delivered on stand-alone computers or over the web, could make significant contributions to both mental health research and clinical care. In this article, we discuss how CAT can be used to screen for, and measure severity of, depression.</p>
            <p>The need to achieve both accuracy and efficiency poses a difficult tradeoff for an instrument developer, for two reasons. First, classical test theory <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> teaches that, everything else being equal, the way to make a test more accurate is to increase its length, so that random errors in the responses to individual items cancel each other out.</p>
            <p>Second, the need to accurately measure patients with varying levels of severity of disorder lengthens tests. Failing to include items about symptoms reflecting a wide range of severity of disorder will result in an instrument with a floor or ceiling effect <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. Thus, an accurate and wide-ranging instrument should include several questions at each relevant level of severity of disorder.</p>
            <p>Unfortunately, a fixed instrument that has multiple questions for each of several ranges of severity is an inefficient instrument for any individual patient. That individual patient has a disorder the severity of which falls into only one of those ranges, and questions that ask about much more or much less severe symptoms are often irrelevant to that patient. In summary, until recently the goals of having a brief, efficient instrument and an accurate, wide-ranging instrument have seemed mutually incompatible.</p>
         </sec>
         <sec>
            <st>
               <p>Computerized adaptive testing</p>
            </st>
            <p>CAT can improve the terms on which accuracy and efficiency are traded off. It has two components. First, one administers the instrument via computer, using a device such as a touch screen, or through a computer-administered telephone interview <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>. Research on computerized tests <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> has shown that the medium has few negative effects on how subjects respond. To the contrary, computerized data collection directly from patients appears to reduce social desirability bias in the reporting of alcohol and drug use, sexual activity, and medication noncompliance <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. Of particular interest is the suggestion that people seem to prefer revealing some types of very personal information (e.g., gynecological details <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, sexual abuse <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, or suicidal ideation <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>) to a computer rather than to a person. Similarly, alcoholics seeking treatment disclosed greater levels of consumption of alcohol to a computer than to a person <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>.</p>
            <p>CAT, however, goes farther. 'Adaptive' means that the computer follows an algorithm that administers a test (for example, the BDI) to a patient one question at a time. At each step, the patient's prior responses determine (a) whether to ask another question and (b) which question to ask <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B28">28</abbr></abbrgrp>. The test stops when the patient's score has been estimated to a prescribed level of precision. Hence, the computer adapts the test to use the fewest items required to assess <it>that particular patient </it>accurately. By comparison, an instrument using a fixed list of items may have too few items to accurately measure some patients, while posing unnecessary questions to others.</p>
            <p>To that end, at each step, the program uses the current subset of responses to estimate the patient's score on a latent trait, in this case depression, as well as a confidence interval (CI) around that estimate <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. The latent trait is conventionally denoted as &#952;, and is conventionally expressed in standardized units. However, &#952; could be rescaled to the same units as the BDI to aid clinicians familiar with that instrument. The CI around &#952; is then compared to the 'cut' or criterion score on the latent trait that defines a positive screening result. If the upper bound of the CI were to fall below the cut score, the program would declare the screening result negative, and stop testing. Conversely, if the lower bound of the CI were to fall above the cut score, testing would stop with a positive result. Otherwise, the CI includes the cut score, and testing continues.</p>
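            <p>The stopping rule described above can be illustrated with a minimal Python sketch. The function name, the normal-approximation interval, and the default 95% level (z = 1.96) are assumptions made for illustration; they are not taken from the software used in this study. The cut score is passed in as a parameter.</p>
            <p><![CDATA[
```python
def screening_decision(theta_hat, standard_error, cut_score, z=1.96):
    """Compare a confidence interval around theta_hat with the cut score.

    Returns "negative" if the whole interval lies below the cut score,
    "positive" if the whole interval lies above it, and "continue" if the
    interval still contains the cut score (so another item should be asked).
    The normal-approximation interval and z = 1.96 are illustrative assumptions.
    """
    lower = theta_hat - z * standard_error
    upper = theta_hat + z * standard_error
    if upper < cut_score:
        return "negative"
    if lower > cut_score:
        return "positive"
    return "continue"
```
]]></p>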
            <p>Suppose now that we are in mid-test, and the adaptive algorithm has to choose another question to pose to a respondent. Using the data already collected about the respondent, the program calculates an information statistic for each of the test items that have not yet been posed. The information statistic for an item is larger if the response to that item is expected to make a greater reduction in our uncertainty about the patient's true score on the latent depression dimension. The computer then presents the maximally informative item to the respondent. Everything else being equal, a question will be more informative if the severity of the symptoms it concerns is similar to our current estimate of the severity of the patient's depression. For example, if we already have substantial evidence of depression based on the responses thus far, the computer will discount the value of items that primarily ask about minor symptoms, and focus on those that ask about severe symptoms. Note that the adaptive algorithm we describe here differs from the branching logic used in many computerized tests to skip questions based on earlier patient responses. Programs that use branching identify questions to be skipped because those questions are irrelevant based on a patient's previous answers. Adaptive tests choose questions to be asked because those questions maximize the precision of the patient's estimated score on a latent dimension of interest.</p>
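            <p>One common way to implement maximum-information item selection is sketched below in Python. It assumes a graded-response model with a discrimination parameter and ordered severity thresholds for each item, and it uses Fisher information as the information statistic. The function names and the item parameters in any usage are hypothetical, and the sketch is not the algorithm as implemented for this study.</p>
            <p><![CDATA[
```python
import math


def _cumulative(theta, discrimination, thresholds):
    # P(response at or above each category boundary), padded with 1.0 and 0.0.
    inner = [1.0 / (1.0 + math.exp(-discrimination * (theta - b)))
             for b in thresholds]
    return [1.0] + inner + [0.0]


def item_information(theta, discrimination, thresholds):
    """Fisher information of one graded-response item at a given theta."""
    cum = _cumulative(theta, discrimination, thresholds)
    # Derivative of each cumulative curve with respect to theta
    # (zero for the two padded end points).
    slope = [discrimination * p * (1.0 - p) for p in cum]
    info = 0.0
    for k in range(len(cum) - 1):
        prob = cum[k] - cum[k + 1]        # probability of response category k
        dprob = slope[k] - slope[k + 1]   # its derivative with respect to theta
        if prob > 0.0:
            info += dprob * dprob / prob
    return info


def pick_next_item(theta_hat, item_params, asked):
    """Return the not-yet-asked item with the largest information at theta_hat.

    item_params maps an item name to (discrimination, thresholds).
    The caller should stop before every item has been asked.
    """
    remaining = [name for name in item_params if name not in asked]
    return max(remaining,
               key=lambda name: item_information(theta_hat, *item_params[name]))
```
]]></p>
            <p>With two hypothetical items of equal discrimination, one with low thresholds (minor symptoms) and one with high thresholds (severe symptoms), the sketch selects the severe-symptom item when the current estimate of &#952; is high and the minor-symptom item when it is low, which mirrors the behavior described above.</p>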
            <p>We reasoned that adaptive technology could substantially improve the efficiency of psychometric measurement in clinical settings, with little or no cost in the accuracy of measurement. We sought to test this by developing an adaptive version of the BDI. We chose the BDI because it is a well-validated instrument for depression and representative of the many screening instruments available for this common condition. It is brief, has been very widely used, and is already in a self-report format.</p>
            <p>The goals of this study were (a) to test whether an adaptive version of the BDI would predict a Structured Clinical Interview for DSM-IV Axis I Disorders (SCID) <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> diagnosis as accurately as the full BDI, (b) to estimate how many fewer questions the adaptive BDI would ask, and (c) to determine whether the adaptive BDI would measure the severity of depression as well as the full BDI. The statistical methodology underlying adaptive testing is well established and there is considerable experience in using it in other domains of measurement <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B31">31</abbr></abbrgrp>. In a previous study <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>, we showed that the screening decisions made by an adaptive version of the Pediatric Symptom Checklist (PSC) <abbrgrp><abbr bid="B33">33</abbr></abbrgrp> agreed nearly perfectly with the screening decisions made by the full PSC (&#954; = .97). The adaptive PSC achieved that agreement by asking an average of only 10.5 questions per patient, compared to the 35 items required by the full PSC. However, that study did not examine whether adaptive testing affected the PSC's accuracy, which would have required comparing screening decisions based on adaptive data to independent psychometric criteria. To our knowledge, there have been no studies of how an adaptive implementation of a screen for mental health problems affects the agreement between the screen and criterion measures. In this study, we evaluated the performance of an adaptive version of the BDI against an independent SCID diagnosis and Hamilton depression measure.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Study group and data</p>
            </st>
            <p>This study combined data from nine projects at the University of Pittsburgh. We looked for recent studies in which subjects had both a BDI and a SCID. 1) Two hundred nineteen assessments were obtained from mothers seeking treatment for their children in a rural mental health clinic from 1998 to 2000. 2) Seventeen depressed women were recruited from a rural mental health center in Western Pennsylvania in 1999 for a pilot psychotherapy protocol. 3) Twenty-three subjects participated in a pilot study of cognitive behavioral treatment for traumatic grief in 1999 and 2000. 4) Forty-three women came from a descriptive study of anger in pregnant or post-partum women <abbrgrp><abbr bid="B34">34</abbr></abbrgrp> who presented for treatment of mood and anxiety disorders in a psychiatric clinic in a university medical center in 1996 and 1997. 5) Eighty-seven subjects came from a study of maintenance therapy in bipolar disorder. 6) Nine subjects came from a study of borderline personality disorder. 7) Fourteen subjects came from a pilot study of brief interpersonal psychotherapy. 8) One hundred eighty-three subjects came from a study of maintenance psychotherapy in women with recurrent major depression. 9) Finally, 149 subjects came from a study of normal sleeping patterns in adults. These latter subjects were selected based on having no lifetime history of mental disorders as measured by the SADS or the SCID, as well as no first-degree family history of mental disorders.</p>
            <p>For 285 of these patients, we also had Hamilton Depression Rating Scale <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> scores obtained within one week of the BDIs to serve as an independent measure of the severity of depression. For this subset, we were able to test whether the adaptive BDI correlated with the Hamilton as strongly as the total score of the full BDI did.</p>
            <p>Pooling these data sets resulted in 744 subjects. Of these, 84% were female, 91% were European-Americans, and the average age was 37 (<it>SD </it>= 8.6 years). All subjects in these studies had completed a BDI and had received a diagnostic evaluation with the SCID. Patients completed the BDI before treatment, during a symptomatic period at or near the time of the diagnostic interview. Three hundred thirty-nine participants had either a SCID diagnosis of major depressive disorder, or bipolar disorder in which it could be established through independent and concurrent assessments that the patient had completed the BDI in a depressed phase. These unipolar and bipolar depressives were classified as having an MDE. Of the remaining 405 participants, 256 had diagnoses other than MDE, and 149 had no diagnosed disorders.</p>
         </sec>
         <sec>
            <st>
               <p>Beck Depression Inventory (BDI)</p>
            </st>
            <p>The BDI is a widely used 21-item depression survey (there is an additional skip-out item that was ignored in this analysis). Each item on the BDI includes four response statements describing increasing severity of depression. A few (&lt;0.2%) scores on specific BDI questions were missing. Randomly imputed scores replaced these values.</p>
         </sec>
         <sec>
            <st>
               <p>Item Response Theory (IRT) modeling</p>
            </st>
            <p>IRT <abbrgrp><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr></abbrgrp> has replaced classical test theory <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> as the leading psychometric theory for surveys and tests in education, the social sciences, and increasingly, for patient-reported data in health care <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B32">32</abbr></abbrgrp>. In a test created using classical test theory, there are points assigned to each response to each question (for example, 1 point for a 'yes' response to a yes/no question about a depression symptom and 0 points for a 'no' response). You would score the test by summing the points to compute a total score. You would interpret the result by locating that total score in the distribution of total scores in a normative sample, perhaps judging the result problematic if the total score fell in the upper 10% of a national sample of respondents. IRT is based on a mathematical model that, for each item on a test, regresses the person's response to the item on a latent score that represents the attribute of the person that the instrument measures. The person's score on the test is an estimate of the value of the latent variable that maximizes the likelihood of the person's pattern of responses. In this study, the latent dimension of interest is the severity of the patient's depression. To model the BDI, we used the graded-response model <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>, a variant of IRT for polytomous data.</p>
         </sec>
         <sec>
            <st>
               <p>Adaptive testing simulation</p>
            </st>
            <p>The goal of this study was to determine how well the adaptive BDI predicted SCID diagnoses of MDE, how well it measured depression severity, and how efficient it was compared to the regular BDI. To simulate the adaptive use of the BDI, we wrote a program that interacted with the Adaptive BDI. This program simulated a patient taking the test by using participants' paper-and-pencil BDI data as if they had been collected adaptively. For each participant, the simulation began by asking the question that was most informative based on the assumption that the participant's latent depression score was the population mean; this is the BDI's question 7, which concerns the subject's disappointment with self. However, we knew how each participant had responded to question 7 on the paper-and-pencil BDI, and we assumed that he or she would have made the same response if the question had been asked through an adaptive process. Taking the participant's actual response to question 7 as the response to the first question in the simulated adaptive testing session, the computer used the adaptive algorithm to choose the next question. Similarly, at each subsequent step we used the participants' actual responses to drive the algorithm forward.</p>
            <p>Next, we used the simulated patient program to measure how well the Adaptive BDI would predict MDE and the BDI total score, using the strategy of internal cross-validation <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. In this strategy, we first partitioned the 744 cases into 100 groups of seven or eight participants. We then held out the cases in the first of the hundred groups and estimated the IRT model underlying the adaptive BDI using the remaining 99% of the data. The parameter estimates from the IRT model were then substituted into the program implementing the Adaptive BDI. The patient simulation program then used the 1% of participants whose data had not been used in the IRT estimation in a simulation of the adaptive use of the BDI. We then repeated this procedure for each of the other 1% subgroups of participants, until all 744 participants had served as 'fresh' cases in the simulations of the adaptive use of the BDI.</p>
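            <p>The hold-out procedure described above can be sketched in Python as follows. The random assignment of cases to groups, the seed, and the placeholder callables <it>fit</it> and <it>simulate</it> are assumptions of this sketch; they stand in for the IRT estimation and the simulated adaptive administration, and the study's own partitioning method is not specified here.</p>
            <p><![CDATA[
```python
import random


def make_folds(n_cases, n_folds, seed=0):
    """Randomly split case indices into n_folds groups of near-equal size."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    # Taking every n_folds-th index yields group sizes that differ by at most one.
    return [indices[i::n_folds] for i in range(n_folds)]


def cross_validate(cases, n_folds, fit, simulate, seed=0):
    """Hold out each group in turn, fit on the rest, simulate on the held-out cases.

    fit(training_cases) returns a fitted model (hypothetical placeholder).
    simulate(model, case) returns the result for one held-out case
    (hypothetical placeholder).
    """
    results = {}
    for held_out in make_folds(len(cases), n_folds, seed):
        held = set(held_out)
        training = [cases[i] for i in range(len(cases)) if i not in held]
        model = fit(training)
        for i in held_out:
            results[i] = simulate(model, cases[i])
    return results
```
]]></p>
            <p>With 744 cases and 100 groups, this partition produces groups of seven or eight cases, so each model is estimated on roughly 99% of the data and every case serves exactly once as a held-out case.</p>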
            <p>To compute an ROC curve for the adaptive BDI, we evaluated how the adaptive BDI would behave at each of 30 evenly spaced cut points on &#952; (the latent depression score), ranging from -4.0 to 4.0 (&#952; has mean 0 and standard deviation 1). Through simulation, for each cut point we determined how many questions the algorithm would ask for each participant, and what screening decision it would make when it stopped. Thus, it was possible to compute the sensitivity and specificity of the adaptive BDI for each &#952; cut point, and therefore to calculate its ROC curve and the AUC.</p>
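            <p>The computation of an ROC curve and its AUC from per-cut-point screening decisions can be sketched in Python as follows. The function names and the trapezoidal rule for the area are assumptions of this sketch. In the study the decision at each cut point came from the simulated adaptive test, whereas any usage of this sketch supplies the decisions directly.</p>
            <p><![CDATA[
```python
def cut_points(low=-4.0, high=4.0, count=30):
    """Evenly spaced cut points on theta, end points included."""
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]


def sensitivity_specificity(decisions, has_mde):
    """decisions and has_mde are parallel lists of booleans (one per participant)."""
    true_pos = sum(1 for d, y in zip(decisions, has_mde) if d and y)
    true_neg = sum(1 for d, y in zip(decisions, has_mde) if not d and not y)
    n_pos = sum(1 for y in has_mde if y)
    n_neg = len(has_mde) - n_pos
    return true_pos / n_pos, true_neg / n_neg


def roc_auc(sens_spec_pairs):
    """Trapezoidal area under the ROC curve from (sensitivity, specificity) pairs."""
    # Convert to (false positive rate, sensitivity) and anchor the curve's corners.
    points = {(1.0 - spec, sens) for sens, spec in sens_spec_pairs}
    points |= {(0.0, 0.0), (1.0, 1.0)}
    ordered = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```
]]></p>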
            <p>To measure the efficiency of the adaptive BDI, we calculated the average number of questions asked by the adaptive algorithm at a cut point that offered high levels of both sensitivity and specificity. To assess how well it functioned as a measure of depression severity, we calculated the correlation between the adaptive BDI and the Hamilton scale.</p>
         </sec>
         <sec>
            <st>
               <p>Simulating variance in the prevalence of MDE</p>
            </st>
            <p>An important problem in our study was that we used research samples, in which the prevalence of MDE was higher than would be found in many clinical settings. To address this, we compared the adaptive BDI and the regular BDI in several bootstrapping analyses <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> in which we oversampled cases that did not have a diagnosis of MDE.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>IRT analysis of the BDI</p>
            </st>
            <p>The transformation of an existing instrument into an adaptive test begins with a psychometric analysis of the instrument, based on Item Response Theory (IRT) <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B36">36</abbr><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr></abbrgrp>. To this end, we first performed a factor analysis of the BDI data to assess the dimensionality of the instrument. Unidimensionality of the factor structure of the test items &#8211; which means that the associations among patients' responses to the BDI items can be accounted for by a single factor &#8211; is an important assumption underlying unidimensional IRT and CAT <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. We used a factor analysis model appropriate for ordinal categorical data <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>, and estimated it using the program Mplus <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>.</p>
            <p>In our factor analysis, the first factor accounted for 58% of the variance in the BDI (eigenvalue = 12.2), while the next factor accounted for 6% (eigenvalue = 1.2). Fitting a one-factor model to the data produced a root mean-square residual statistic of .048. This statistic ranges between 0 and 1, with small values reflecting a better fit; .05 is often used as a criterion for adequacy of fit. We concluded that a unidimensional model fit the data adequately, as did Clark and his colleagues <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>. Other authors have fit more than one correlated factor to different sets of BDI data <abbrgrp><abbr bid="B46">46</abbr><abbr bid="B47">47</abbr><abbr bid="B48">48</abbr></abbrgrp>. Differing results in factor analyses often reflect differences in sample selection. Our study involved a mixture of patients and healthy participants and it is likely that there was greater variance in the severity of depression among these patients than in studies including primarily psychiatric cases or primarily healthy participants such as college students. If so, we would expect to find a large first factor measuring severity of depression that accounted for a high proportion of the variance in BDI responses. Having established that a unidimensional solution fit the data, the IRT modeling was performed using the program PARSCALE <abbrgrp><abbr bid="B49">49</abbr><abbr bid="B50">50</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>ROC analysis of the Adaptive BDI</p>
            </st>
            <p>The baseline for evaluating the accuracy of the adaptive BDI was the accuracy of the 21-item BDI, so we began by computing the Area Under the Curve (AUC) of the ROC curve <abbrgrp><abbr bid="B51">51</abbr></abbrgrp> for the BDI total score (<graphic file="1471-244X-4-13-i1.gif"/> = 16.3, SD = 12.8), when the latter was used as an indicator of a SCID diagnosis of MDE. The ROC curve for the 21-item BDI total score had an AUC = 89.4% (95% confidence interval = [87.1%, 91.7%]). The ROC curve for the adaptive BDI (Figure <figr fid="F1">1</figr>) was almost identical, with AUC = 88.4%. Note that this statistic and all the following results are cross-validated estimates.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>ROC curve for Adaptive BDI as an indicator of Major Depressive Disorder</p>
               </caption>
               <text>
                  <p>ROC curve for Adaptive BDI as an indicator of Major Depressive Disorder</p>
               </text>
               <graphic file="1471-244X-4-13-1"/>
            </fig>
            <p>We then examined the ROC curve for the adaptive BDI and chose the point that offered the best combination of high sensitivity and high specificity (sensitivity = 87.6%, specificity = 79.3%, positive predictive value = 78.0%, negative predictive value = 88.4%; cases were judged to be positive if &#952; &#8805; .135; this point is labeled 'Best Case' in Figure <figr fid="F1">1</figr>). Table <tblr tid="T1">1</tblr> presents the agreement between the adaptive BDI and the SCID for this case. Figure <figr fid="F2">2</figr> presents the distributions of estimated &#952; scores, depending on whether the patient had no diagnoses, a depression diagnosis other than MDE, or MDE. The Kappa <abbrgrp><abbr bid="B52">52</abbr></abbrgrp> for BDI-SCID concordance was .66, which is considered a good level of agreement by conventional standards. The average number of questions asked was 5.6 (SD = 6.6), and for 69% of the subjects the algorithm asked fewer than five questions (Figure <figr fid="F3">3</figr>). In addition, the algorithm asked more questions about cases in which it decided that the participant had an MDE (<graphic file="1471-244X-4-13-i1.gif"/> = 6.3, SD = 7.2) than in cases in which it decided that an MDE was not present (<graphic file="1471-244X-4-13-i1.gif"/> = 4.9, SD = 5.8; Levene's <it>t</it>(725) = 2.83, <it>p </it>&lt; .005).</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Cross-validated agreement between adaptive BDI 'best case' and SCID Major Depressive Episode: Unweighted results</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2" ca="center">
                        <p>SCID</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Adaptive BDI</p>
                     </c>
                     <c ca="center">
                        <p>Negative</p>
                     </c>
                     <c ca="center">
                        <p>Positive</p>
                     </c>
                     <c ca="center">
                        <p>Total</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Negative</p>
                     </c>
                     <c ca="center">
                        <p>321</p>
                     </c>
                     <c ca="center">
                        <p>42</p>
                     </c>
                     <c ca="center">
                        <p>363</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Positive</p>
                     </c>
                     <c ca="center">
                        <p>84</p>
                     </c>
                     <c ca="center">
                        <p>297</p>
                     </c>
                     <c ca="center">
                        <p>381</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Total</p>
                     </c>
                     <c ca="center">
                        <p>405</p>
                     </c>
                     <c ca="center">
                        <p>339</p>
                     </c>
                     <c ca="center">
                        <p>744</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
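            <p>The accuracy statistics quoted for the 'best case' can be recomputed from the cell counts in Table <tblr tid="T1">1</tblr>. The sketch below is a plain arithmetic check; the variable names are illustrative, and kappa is computed with the standard formula for Cohen's kappa on a two-by-two table.</p>

```python
# Cell counts copied from Table 1 (rows: adaptive BDI, columns: SCID).
tn, fn = 321, 42    # adaptive BDI negative: SCID negative, SCID positive
fp, tp = 84, 297    # adaptive BDI positive: SCID negative, SCID positive
n = tn + fn + fp + tp

sensitivity = tp / (tp + fn)   # SCID-positive cases flagged by the adaptive BDI
specificity = tn / (tn + fp)   # SCID-negative cases cleared by the adaptive BDI
ppv = tp / (tp + fp)           # positive predictive value
npv = tn / (tn + fn)           # negative predictive value

# Cohen's kappa: observed agreement corrected for chance agreement.
p_observed = (tp + tn) / n
p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
kappa = (p_observed - p_chance) / (1 - p_chance)

print(f"sensitivity = {sensitivity:.1%}, specificity = {specificity:.1%}")
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}, kappa = {kappa:.2f}")
```

            <p>The results agree with the values reported in the text (sensitivity 87.6%, specificity 79.3%, positive predictive value 78.0%, negative predictive value 88.4%, kappa .66).</p>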
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Box plot of distribution of &#952; by depression diagnosis</p>
               </caption>
               <text>
                  <p>Box plot of distribution of &#952; by depression diagnosis</p>
               </text>
               <graphic file="1471-244X-4-13-2"/>
            </fig>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Histogram of the number of questions asked in the 'best case' simulation</p>
               </caption>
               <text>
                  <p>Histogram of the number of questions asked in the 'best case' simulation</p>
               </text>
               <graphic file="1471-244X-4-13-3"/>
            </fig>
            <p>Finally, we asked whether the adaptive BDI would be as useful as the full BDI as a measure of the severity of depression. The estimated latent depression scores (<graphic file="1471-244X-4-13-i2.gif"/>) in the 'best case' simulation were highly correlated (<it>r </it>= .92, <it>N </it>= 744) with the BDI total score. For the 285 clinical cases for whom we had both a BDI and a Hamilton score, the BDI total score had a correlation of <it>r </it>= .70 with the Hamilton, while the correlation between <graphic file="1471-244X-4-13-i2.gif"/> and the Hamilton was <it>r </it>= .74. This difference is statistically significant [p &lt; .006, <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>].</p>
         </sec>
         <sec>
            <st>
               <p>The effect of prevalence of MDE</p>
            </st>
            <p>Our study group included more positive cases than a sample that might be found in medical settings other than specialty mental health settings. To examine whether our results would still hold if the prevalence of MDE were lower, we conducted additional simulations in which we created new study groups by randomly sampling cases with replacement from our data (i.e., bootstrapping). We generated 1000 bootstrap samples in which the sampling weights on cases were set such that the average prevalence of MDE in the bootstrap samples was 10%. We then repeated our AUC analyses in each bootstrapped sample. The results suggested that both the regular BDI and the adaptive BDI performed as well as or better when the prevalence of MDE was lower. That is, the average AUC for the agreement between the adaptive BDI and the SCID was 92.4%, and the average AUC for the agreement between the regular BDI and the SCID was 92.3%.</p>
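            <p>One way to set sampling weights so that bootstrap samples have a target average prevalence is sketched below. The weighting scheme (each positive case drawn with probability 0.10 divided by the number of positive cases, each negative case with probability 0.90 divided by the number of negative cases) is an assumption made for illustration, since the exact weights are not given in the text. The function names are hypothetical, and only the diagnosis labels are simulated, using the SCID counts from Table 1.</p>

```python
import random


def prevalence_weights(labels, target_prevalence):
    """Per-case sampling weights giving the target expected prevalence."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    w_pos = target_prevalence / n_pos
    w_neg = (1.0 - target_prevalence) / n_neg
    return [w_pos if y == 1 else w_neg for y in labels]


def bootstrap_samples(labels, target_prevalence, n_samples, seed=0):
    """Yield lists of case indices drawn with replacement."""
    rng = random.Random(seed)
    weights = prevalence_weights(labels, target_prevalence)
    cases = range(len(labels))
    for _ in range(n_samples):
        yield rng.choices(cases, weights=weights, k=len(labels))


# SCID diagnoses reconstructed from the Table 1 totals (1 = MDE).
labels = [1] * 339 + [0] * 405

prevalences = [
    sum(labels[i] for i in sample) / len(sample)
    for sample in bootstrap_samples(labels, 0.10, n_samples=1000)
]
mean_prevalence = sum(prevalences) / len(prevalences)
print(f"mean bootstrap prevalence: {mean_prevalence:.3f}")
```

            <p>In a full analysis, the index list for each bootstrap sample would be used to select the corresponding test scores and diagnoses, and the AUC would be recomputed on each sample and then averaged.</p>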
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>The AUC for the adaptive BDI was a respectable 88%, indicating that the adaptive test could correctly classify large proportions of both positive and negative cases. The 'best case' adaptive BDI was able to classify a subject using an average of only 5.6 questions. The latent depression score generated in that simulation was highly correlated with the BDI total score. In addition, for the subset of the data for which Hamilton scores were available, the latent score was more highly correlated with the Hamilton than the BDI total score was. The latter results indicate that the adaptive BDI would be as useful as the full BDI as a measure of the severity of depression. Thus, in our simulation the adaptive BDI was as accurate as the full-scale BDI while dramatically improving efficiency. We note that the CAT algorithm can be 'tuned' to the assessment purpose at hand: one might choose another point on the ROC curve that emphasizes either sensitivity or specificity.</p>
         <p>The results also suggested, however, that five or six questions would be the required number of items for only a few participants. Indeed, the adaptive BDI asked fewer than five questions for the majority of patients. Even when the adaptive BDI asked few questions, it usually made the same screening decision as the full BDI: the rate of disagreement between the adaptive BDI and the SCID was slightly higher when more questions were asked. In addition, the algorithm asked more questions about positive cases. This is an attractive outcome, because the patients' answers provide useful symptom data for the clinician. Thus, an adaptive test budgets the patient and clinician time spent on measuring depression, allocating it primarily to persons for whom there are reasons for concern.</p>
         <p>A reduction of 15 questions may not seem important, given that the full BDI takes only a few minutes to complete. In our experience, however, clinicians are very concerned about both office visit time and maintaining a smooth flow of patients through the waiting room. In addition, health care providers are confronted with recommendations that they screen for many illnesses and health-related problems. Saving questions on a depression screen might free time to screen for other problems such as domestic violence or substance abuse.</p>
         <p>Based on this simulation, it appears that adaptive testing has significant promise for settings where both high efficiency and high accuracy are essential. These settings include primary care, where clinician time is a rate-limiting factor, and ongoing monitoring after successful treatment in specialty mental health care, where respondent burden is a constraint. The next step should be to field test adaptive instruments and validate them prospectively, to make certain that they are acceptable to patients and clinicians, and to measure the costs of implementing them.</p>
         <sec>
            <st>
               <p>Limitations</p>
            </st>
            <p>The principal limitation of our study is that it is a simulation. In particular, we assumed that participants would respond to questions presented adaptively similarly to the way they responded on the paper BDI, and that the order in which questions are asked does not have an important effect on accuracy. We have some confidence in these assumptions, based on prior research showing that adaptive versions of tests in other fields have accuracy that is similar to paper and pencil versions <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>.</p>
            <p>A second limitation is that we compiled our study group from several research studies. Although the study group included a wide range of both healthy and acutely ill individuals, it would be preferable to have a random sample from a defined health care setting. We attempted to statistically control the prevalence of MDE in our bootstrap analyses and found that the performance of the regular and adaptive BDI improved slightly when the prevalence of MDE was decreased. While this is reassuring, we speculate that our study group may have lacked some of the mildly depressed and difficult-to-classify cases that are found in many real-world settings. The sample that we have collected may also explain why we found evidence for only one factor in the BDI data, where other researchers have found evidence for two or more. Although we view our study as providing strong support for the <it>concept </it>of adaptive measurement of depression, because of these limitations it needs to be replicated in an actual sample.</p>
            <p>A final limitation of our study is the BDI itself. Our results argue that a computerized adaptive test has the same sensitivity and specificity as its full-length version, and the same validity as a measure of symptom severity. Conversion to CAT cannot, however, produce a test with <it>higher </it>validity than the base instrument. Although the BDI is widely used, one may still judge that, in light of a false positive rate of 21%, neither the regular nor the adaptive BDI is sufficiently accurate to serve as a screen for MDE. Our results nevertheless suggest that whatever depression instrument one chooses, it would be more efficient to administer it adaptively.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>We believe that adaptive testing could substantially improve the tradeoff between accuracy and efficiency in the assessment of psychopathology. The simulation conducted here showed that the BDI, a widely used depression instrument, could be converted to a far more efficient adaptive test without loss of accuracy. We need more studies to assess the performance of adaptive tests in both mental health specialty and other clinical settings.</p>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>None declared.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>William Gardner: Principal author, data analyst, computer programming, and conceptualization of the study.</p>
         <p>Katherine Shear: Conceptualization of study, contribution of data, expertise on psychiatric measurement, critical revision of text.</p>
         <p>Kelly J. Kelleher: Conceptualization of study, critical revision of text.</p>
         <p>Kathleen A. Pajer: Conceptualization of study, expertise on psychiatric measurement, critical revision of text.</p>
         <p>Oommen Mammen: Contribution of data, expertise on psychiatric measurement, critical revision of text.</p>
         <p>Daniel Buysse: Contribution of data, expertise on psychiatric measurement, critical revision of text.</p>
         <p>Ellen Frank: Contribution of data, expertise on psychiatric measurement, critical revision of text.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This study was supported by grants from the NIMH (MH30915, MH29618, MH49115, MH53817, MH56848) and the Staunton Farm Foundation. We thank Drs. David Kupfer and Victoria Grochocinski for comments, and Mary McShea and Deb Stapf for assistance in assembling the data.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Guide to Clinical Preventive Services. Screening: Depression</p>
            </title>
            <aug>
               <au>
                  <cnm>U.S. Preventive Services Task Force</cnm>
               </au>
            </aug>
            <publisher>Agency for Health Care Research and Quality</publisher>
            <edition>3rd Edition: Periodic Updates</edition>
            <pubdate>2002</pubdate>
            <volume>2002</volume>
            <url>http://www.ahcpr.gov/clinic/uspstf/uspsdepr.htm</url>
         </bibl>
         <bibl id="B2">
            <title>
               <p>The hidden mental health network</p>
            </title>
            <aug>
               <au>
                  <snm>Schurman</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Kramer</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Mitchell</snm>
                  <fnm>JB</fnm>
               </au>
            </aug>
            <source>Archives of General Psychiatry</source>
            <pubdate>1985</pubdate>
            <volume>42</volume>
            <fpage>89</fpage>
            <lpage>94</lpage>
            <xrefbib>
               <pubid idtype="pmpid">3966857</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>The de facto US mental and addictive disorders service system:  Epidemiologic Catchment Area prospective 1-year prevalence rates of disorders and services</p>
            </title>
            <aug>
               <au>
                  <snm>Regier</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Narrow</snm>
                  <fnm>WE</fnm>
               </au>
               <au>
                  <snm>Rae</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Manderscheid</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Locke</snm>
                  <fnm>BZ</fnm>
               </au>
               <au>
                  <snm>Goodwin</snm>
                  <fnm>FK</fnm>
               </au>
            </aug>
            <source>Archives of General Psychiatry</source>
            <pubdate>1993</pubdate>
            <volume>50</volume>
            <fpage>85</fpage>
            <lpage>94</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8427558</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Epidemiology of depression in primary care</p>
            </title>
            <aug>
               <au>
                  <snm>Katon</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Schulberg</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>General Hospital Psychiatry</source>
            <pubdate>1992</pubdate>
            <volume>14</volume>
            <fpage>237</fpage>
            <lpage>247</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0163-8343(92)90094-Q</pubid>
                  <pubid idtype="pmpid">1505745</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Recognition, management, and outcomes of depression in primary care</p>
            </title>
            <aug>
               <au>
                  <snm>Simon</snm>
                  <fnm>GE</fnm>
               </au>
               <au>
                  <snm>Von Korff</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Archives of Family Medicine</source>
            <pubdate>1995</pubdate>
            <volume>4</volume>
            <fpage>99</fpage>
            <lpage>105</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1001/archfami.4.2.99</pubid>
                  <pubid idtype="pmpid">7842160</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Best clinical practice:  Guidelines for managing major depression in primary medical care</p>
            </title>
            <aug>
               <au>
                  <snm>Schulberg</snm>
                  <fnm>HC</fnm>
               </au>
               <au>
                  <snm>Katon</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Simon</snm>
                  <fnm>GE</fnm>
               </au>
               <au>
                  <snm>Rush</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Journal of Clinical Psychiatry</source>
            <pubdate>1999</pubdate>
            <volume>60</volume>
            <fpage>19</fpage>
            <lpage>26</lpage>
            <xrefbib>
               <pubid idtype="pmpid">10326871</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Are patients' office visits with physicians getting shorter?</p>
            </title>
            <aug>
               <au>
                  <snm>Mechanic</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>McAlpine</snm>
                  <fnm>DD</fnm>
               </au>
               <au>
                  <snm>Rosenthal</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>New England Journal of Medicine</source>
            <pubdate>2001</pubdate>
            <volume>344</volume>
            <fpage>223</fpage>
            <lpage>225</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1056/NEJM200101183440307</pubid>
                  <pubid idtype="pmpid" link="fulltext">11172147</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Collaborative management of chronic illness</p>
            </title>
            <aug>
               <au>
                  <snm>Von Korff</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gruman</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Schaefer</snm>
                  <fnm>JK</fnm>
               </au>
               <au>
                  <snm>Curry</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Wagner</snm>
                  <fnm>EH</fnm>
               </au>
            </aug>
            <source>Annals of Internal Medicine</source>
            <pubdate>1997</pubdate>
            <volume>127</volume>
            <fpage>1097</fpage>
            <lpage>1102</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9412313</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Improving outcomes in chronic illness</p>
            </title>
            <aug>
               <au>
                  <snm>Wagner</snm>
                  <fnm>EH</fnm>
               </au>
               <au>
                  <snm>Austin</snm>
                  <fnm>BT</fnm>
               </au>
               <au>
                  <snm>Von Korff</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Managed Care Quarterly</source>
            <pubdate>1996</pubdate>
            <fpage>12</fpage>
            <lpage>25</lpage>
            <xrefbib>
               <pubid idtype="pmpid">10157259</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>A randomized trial of relapse prevention of depression in primary care</p>
            </title>
            <aug>
               <au>
                  <snm>Katon</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Rutter</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ludman</snm>
                  <fnm>EJ</fnm>
               </au>
               <au>
                  <snm>Von Korff</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Simon</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Bush</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Walker</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Unutzer</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Archives of General Psychiatry</source>
            <pubdate>2001</pubdate>
            <volume>58</volume>
            <fpage>241</fpage>
            <lpage>247</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1001/archpsyc.58.3.241</pubid>
                  <pubid idtype="pmpid" link="fulltext">11231831</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Managing depression as a chronic disease: A randomised trial of ongoing treatment in primary care</p>
            </title>
            <aug>
               <au>
                  <snm>Rost</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Nutting</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Elliott</snm>
                  <fnm>CE</fnm>
               </au>
               <au>
                  <snm>Dickinson</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>British Medical Journal</source>
            <pubdate>2002</pubdate>
            <volume>325</volume>
            <fpage>934</fpage>
            <lpage>939</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1136/bmj.325.7370.934</pubid>
                  <pubid idtype="pmpid" link="fulltext">12399343</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Manual for the Beck Depression Inventory</p>
            </title>
            <aug>
               <au>
                  <snm>Beck</snm>
                  <fnm>AT</fnm>
               </au>
               <au>
                  <snm>Steer</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <publisher>San Antonio, TX, Psychological Corporation</publisher>
            <pubdate>1993</pubdate>
         </bibl>
         <bibl id="B13">
            <title>
               <p>The cost-utility of screening for depression in primary care</p>
            </title>
            <aug>
               <au>
                  <snm>Valenstein</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Vijan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Zeber</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Boehm</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Buttar</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Annals of Internal Medicine</source>
            <pubdate>2001</pubdate>
            <volume>134</volume>
            <fpage>345</fpage>
            <lpage>360</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11242495</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>CIDI-Auto Version 2.1: Administrator's Guide and Reference</p>
            </title>
            <aug>
               <au>
                  <cnm>World Health Organization</cnm>
               </au>
            </aug>
            <publisher>Sydney, Training and Reference Centre for WHO CIDI</publisher>
            <pubdate>1997</pubdate>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Are computerized interviews equivalent to human interviewers? CIDI-Auto versus CIDI in anxiety and depressive disorders</p>
            </title>
            <aug>
               <au>
                  <snm>Peters</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Carroll</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Psychological Medicine</source>
            <pubdate>1998</pubdate>
            <volume>28</volume>
            <fpage>893</fpage>
            <lpage>901</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1017/S0033291798006655</pubid>
                  <pubid idtype="pmpid">9723144</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Computerized adaptive testing: A primer</p>
            </title>
            <aug>
               <au>
                  <snm>Wainer</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <publisher>Hillsdale, NJ, Erlbaum Associates</publisher>
            <edition>2nd</edition>
            <pubdate>2000</pubdate>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Generic health measurement: Past accomplishments and a measurement paradigm for the 21st century</p>
            </title>
            <aug>
               <au>
                  <snm>McHorney</snm>
                  <fnm>CA</fnm>
               </au>
            </aug>
            <source>Annals of Internal Medicine</source>
            <pubdate>1997</pubdate>
            <volume>127</volume>
            <fpage>743</fpage>
            <lpage>750</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9382391</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Item Response Theory and health outcomes measurement in the 21st century</p>
            </title>
            <aug>
               <au>
                  <snm>Hays</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Morales</snm>
                  <fnm>LS</fnm>
               </au>
               <au>
                  <snm>Reise</snm>
                  <fnm>SP</fnm>
               </au>
            </aug>
            <source>Medical Care</source>
            <pubdate>2000</pubdate>
            <volume>38</volume>
            <fpage>II-28</fpage>
            <lpage>II-42</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1097/00005650-200009002-00007</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Introduction to classical and modern test theory</p>
            </title>
            <aug>
               <au>
                  <snm>Crocker</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Algina</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <publisher>New York, Holt, Rinehart, &amp; Winston</publisher>
            <pubdate>1986</pubdate>
         </bibl>
         <bibl id="B20">
            <title>
               <p>An evaluation of a computer assisted telephone interview for screening for mental disorders among primary care patients</p>
            </title>
            <aug>
               <au>
                  <snm>Leon</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Kelsey</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Pleil</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Burgos</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Portera</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Lowell</snm>
                  <fnm>KN</fnm>
               </au>
            </aug>
            <source>Journal of Nervous and Mental Disease</source>
            <pubdate>1999</pubdate>
            <volume>187</volume>
            <fpage>308</fpage>
            <lpage>311</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1097/00005053-199905000-00008</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>A computer-administered telephone interview to identify mental disorders</p>
            </title>
            <aug>
               <au>
                  <snm>Kobak</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Dottl</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Greist</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Jefferson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Burroughs</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Mantle</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Katzelnick</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Norton</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Henk</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Serlin</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Journal of the American Medical Association</source>
            <pubdate>1997</pubdate>
            <volume>278</volume>
            <fpage>905</fpage>
            <lpage>910</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1001/jama.278.11.905</pubid>
                  <pubid idtype="pmpid">9302242</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Equivalence of computerized and paper-and-pencil cognitive ability tests: A meta-analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Mead</snm>
                  <fnm>AD</fnm>
               </au>
               <au>
                  <snm>Drasgow</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Psychological Bulletin</source>
            <pubdate>1993</pubdate>
            <volume>114</volume>
            <fpage>449</fpage>
            <lpage>458</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1037//0033-2909.114.3.449</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Acceptability of computer-acquired sexual histories in adolescent girls</p>
            </title>
            <aug>
               <au>
                  <snm>Millstein</snm>
                  <fnm>SG</fnm>
               </au>
               <au>
                  <snm>Irwin</snm>
                  <fnm>CE</fnm>
               </au>
            </aug>
            <source>Journal of Pediatrics</source>
            <pubdate>1983</pubdate>
            <volume>103</volume>
            <fpage>815</fpage>
            <lpage>819</lpage>
            <xrefbib>
               <pubid idtype="pmpid">6631616</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Patient reactions to computer-based medical interviewing</p>
            </title>
            <aug>
               <au>
                  <snm>Slack</snm>
                  <fnm>WV</fnm>
               </au>
               <au>
                  <snm>Van Cura</snm>
                  <fnm>LJ</fnm>
               </au>
            </aug>
            <source>Computers and Biomedical Research</source>
            <pubdate>1968</pubdate>
            <volume>1</volume>
            <fpage>527</fpage>
            <lpage>531</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/0010-4809(68)90018-9</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Psychology of computer use: XX. Sexual abuse recalled: Evaluation of a computerized questionnaire in a population of young adult males</p>
            </title>
            <aug>
               <au>
                  <snm>Bagley</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Genuis</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Perceptual and Motor Skills</source>
            <pubdate>1991</pubdate>
            <volume>72</volume>
            <fpage>287</fpage>
            <lpage>288</lpage>
            <xrefbib>
               <pubid idtype="pmpid">2038524</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>A computer interview for suicide-risk prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Greist</snm>
                  <fnm>JH</fnm>
               </au>
               <au>
                  <snm>Gustafson</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Strauss</snm>
                  <fnm>FF</fnm>
               </au>
               <au>
                  <snm>Rowse</snm>
                  <fnm>GL</fnm>
               </au>
               <au>
                  <snm>Laughren</snm>
                  <fnm>TP</fnm>
               </au>
               <au>
                  <snm>Chiles</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>American Journal of Psychiatry</source>
            <pubdate>1973</pubdate>
            <volume>130</volume>
            <fpage>1327</fpage>
            <lpage>1332</lpage>
            <xrefbib>
               <pubid idtype="pmpid">4585280</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Psychiatrists and a computer as interrogators of patients with alcohol-related illnesses: A comparison</p>
            </title>
            <aug>
               <au>
                  <snm>Lucas</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Mullin</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Luna</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>McInroy</snm>
                  <fnm>DC</fnm>
               </au>
            </aug>
            <source>British Journal of Psychiatry</source>
            <pubdate>1977</pubdate>
            <volume>131</volume>
            <fpage>160</fpage>
            <lpage>167</lpage>
            <xrefbib>
               <pubid idtype="pmpid">334310</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Health status assessment for the twenty-first century: Item response theory, item banking and computerized adaptive testing</p>
            </title>
            <aug>
               <au>
                  <snm>Revicki</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Cella</snm>
                  <fnm>DF</fnm>
               </au>
            </aug>
            <source>Quality of Life Research</source>
            <pubdate>1997</pubdate>
            <volume>6</volume>
            <fpage>595</fpage>
            <lpage>600</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1023/A:1018420418455</pubid>
                  <pubid idtype="pmpid">9330558</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Adaptive EAP estimation of ability in a microcomputer environment</p>
            </title>
            <aug>
               <au>
                  <snm>Bock</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Mislevy</snm>
                  <fnm>RJ</fnm>
               </au>
            </aug>
            <source>Applied Psychological Measurement</source>
            <pubdate>1982</pubdate>
            <volume>6</volume>
            <fpage>431</fpage>
            <lpage>444</lpage>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I)</p>
            </title>
            <aug>
               <au>
                  <snm>First</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Spitzer</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Gibbon</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>JBW</fnm>
               </au>
            </aug>
            <publisher>Washington, DC, American Psychiatric Press</publisher>
            <pubdate>1997</pubdate>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Adaptive testing by computer</p>
            </title>
            <aug>
               <au>
                  <snm>Weiss</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Journal of Consulting and Clinical Psychology</source>
            <pubdate>1985</pubdate>
            <volume>53</volume>
            <fpage>774</fpage>
            <lpage>789</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1037//0022-006X.53.6.774</pubid>
                  <pubid idtype="pmpid">3841355</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Multidimensional adaptive testing for mental health problems in primary care</p>
            </title>
            <aug>
               <au>
                  <snm>Gardner</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Kelleher</snm>
                  <fnm>KJ</fnm>
               </au>
               <au>
                  <snm>Pajer</snm>
                  <fnm>KA</fnm>
               </au>
            </aug>
            <source>Medical Care</source>
            <pubdate>2002</pubdate>
            <volume>40</volume>
            <fpage>812</fpage>
            <lpage>823</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1097/00005650-200209000-00010</pubid>
                  <pubid idtype="pmpid" link="fulltext">12218771</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Pediatric symptom checklist: Screening school-age children for psychosocial dysfunction</p>
            </title>
            <aug>
               <au>
                  <snm>Jellinek</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Murphy</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Robinson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Feins</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lamb</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Fenton</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Journal of Pediatrics</source>
            <pubdate>1988</pubdate>
            <volume>112</volume>
            <fpage>201</fpage>
            <lpage>209</lpage>
            <xrefbib>
               <pubid idtype="pmpid">3339501</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Anger attacks: Correlates and significance of an underrecognized symptom</p>
            </title>
            <aug>
               <au>
                  <snm>Mammen</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Shear</snm>
                  <fnm>MK</fnm>
               </au>
               <au>
                  <snm>Pilkonis</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Kolko</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Thase</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Greeno</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Journal of Clinical Psychiatry</source>
            <pubdate>1999</pubdate>
            <volume>60</volume>
            <fpage>633</fpage>
            <lpage>642</lpage>
            <xrefbib>
               <pubid idtype="pmpid">10520986</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Development of a rating scale for primary depressive illness</p>
            </title>
            <aug>
               <au>
                  <snm>Hamilton</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>British Journal of Social and Clinical Psychology</source>
            <pubdate>1967</pubdate>
            <volume>6</volume>
            <fpage>278</fpage>
            <lpage>296</lpage>
            <xrefbib>
               <pubid idtype="pmpid">6080235</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Item response theory for psychologists</p>
            </title>
            <aug>
               <au>
                  <snm>Embretson</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Reise</snm>
                  <fnm>SP</fnm>
               </au>
            </aug>
            <publisher>Mahwah, NJ, Lawrence Erlbaum</publisher>
            <pubdate>2000</pubdate>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Fundamentals of item response theory</p>
            </title>
            <aug>
               <au>
                  <snm>Hambleton</snm>
                  <fnm>RK</fnm>
               </au>
               <au>
                  <snm>Swaminathan</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>HJ</fnm>
               </au>
            </aug>
            <publisher>Newbury Park, CA, Sage</publisher>
            <pubdate>1991</pubdate>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Handbook of modern item response theory</p>
            </title>
            <aug>
               <au>
                  <snm>van der Linden</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Hambleton</snm>
                  <fnm>RK</fnm>
               </au>
            </aug>
            <publisher>New York, Springer-Verlag</publisher>
            <pubdate>1997</pubdate>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Estimating the error rate of a prediction rule: Improvements on cross-validation</p>
            </title>
            <aug>
               <au>
                  <snm>Efron</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Journal of the American Statistical Association</source>
            <pubdate>1983</pubdate>
            <volume>78</volume>
            <fpage>316</fpage>
            <lpage>331</lpage>
         </bibl>
         <bibl id="B40">
            <title>
               <p>The jackknife, the bootstrap, and other resampling plans</p>
            </title>
            <aug>
               <au>
                  <snm>Efron</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <publisher>Philadelphia, SIAM</publisher>
            <pubdate>1982</pubdate>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Computerized adaptive testing with polytomous items</p>
            </title>
            <aug>
               <au>
                  <snm>Dodd</snm>
                  <fnm>BG</fnm>
               </au>
               <au>
                  <snm>De Ayala</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Koch</snm>
                  <fnm>WR</fnm>
               </au>
            </aug>
            <source>Applied Psychological Measurement</source>
            <pubdate>1995</pubdate>
            <volume>19</volume>
            <fpage>5</fpage>
            <lpage>22</lpage>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Item response theory: Principles and applications</p>
            </title>
            <aug>
               <au>
                  <snm>Hambleton</snm>
                  <fnm>RK</fnm>
               </au>
               <au>
                  <snm>Swaminathan</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <publisher>Boston, Kluwer-Nijhoff</publisher>
            <pubdate>1985</pubdate>
         </bibl>
         <bibl id="B43">
            <title>
               <p>A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators</p>
            </title>
            <aug>
               <au>
                  <snm>Muth&#233;n</snm>
                  <fnm>BO</fnm>
               </au>
            </aug>
            <source>Psychometrika</source>
            <pubdate>1984</pubdate>
            <volume>49</volume>
            <fpage>115</fpage>
            <lpage>132</lpage>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Mplus user's guide</p>
            </title>
            <aug>
               <au>
                  <snm>Muth&#233;n</snm>
                  <fnm>LK</fnm>
               </au>
               <au>
                  <snm>Muth&#233;n</snm>
                  <fnm>BO</fnm>
               </au>
            </aug>
            <publisher>Los Angeles, CA, Muth&#233;n &amp; Muth&#233;n</publisher>
            <pubdate>1998</pubdate>
         </bibl>
         <bibl id="B45">
            <title>
               <p>The core symptoms of depression in medical and psychiatric patients</p>
            </title>
            <aug>
               <au>
                  <snm>Clark</snm>
                  <fnm>DC</fnm>
               </au>
               <au>
                  <snm>vonAmmon Cavanaugh</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gibbons</snm>
                  <fnm>RD</fnm>
               </au>
            </aug>
            <source>Journal of Nervous and Mental Disease</source>
            <pubdate>1983</pubdate>
            <volume>171</volume>
            <fpage>705</fpage>
            <lpage>713</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid">6644280</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Application of modern psychometric theory in psychiatric research</p>
            </title>
            <aug>
               <au>
                  <snm>Gibbons</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>DC</fnm>
               </au>
               <au>
                  <snm>Cavanaugh</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Journal of Psychiatric Research</source>
            <pubdate>1985</pubdate>
            <volume>19</volume>
            <fpage>43</fpage>
            <lpage>55</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-3956(85)90067-6</pubid>
                  <pubid idtype="pmpid">3989737</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Confirmatory hierarchical factor analyses of psychological distress measures</p>
            </title>
            <aug>
               <au>
                  <snm>Tanaka</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Huba</snm>
                  <fnm>GJ</fnm>
               </au>
            </aug>
            <source>Journal of Personality &amp; Social Psychology</source>
            <pubdate>1984</pubdate>
            <volume>46</volume>
            <fpage>621</fpage>
            <lpage>635</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1037//0022-3514.46.3.621</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Structures of psychological distress: Testing confirmatory hierarchical models</p>
            </title>
            <aug>
               <au>
                  <snm>Tanaka</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Huba</snm>
                  <fnm>GJ</fnm>
               </au>
            </aug>
            <source>Journal of Consulting &amp; Clinical Psychology</source>
            <pubdate>1984</pubdate>
            <volume>52</volume>
            <fpage>719</fpage>
            <lpage>721</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1037//0022-006X.52.4.719</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Fitting a polytomous item response model to Likert-type data</p>
            </title>
            <aug>
               <au>
                  <snm>Muraki</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Applied Psychological Measurement</source>
            <pubdate>1990</pubdate>
            <volume>14</volume>
            <fpage>59</fpage>
            <lpage>71</lpage>
         </bibl>
         <bibl id="B50">
            <title>
               <p>PARSCALE: IRT item analysis and test-scoring for rating-scale data</p>
            </title>
            <aug>
               <au>
                  <snm>Muraki</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Bock</snm>
                  <fnm>RD</fnm>
               </au>
            </aug>
            <publisher>Chicago, Scientific Software International</publisher>
            <pubdate>1997</pubdate>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Evaluating medical tests: Objective and quantitative guidelines</p>
            </title>
            <aug>
               <au>
                  <snm>Kraemer</snm>
                  <fnm>HC</fnm>
               </au>
            </aug>
            <publisher>Newbury Park, CA, Sage Publications</publisher>
            <pubdate>1992</pubdate>
         </bibl>
         <bibl id="B52">
            <title>
               <p>A coefficient of agreement for nominal scales</p>
            </title>
            <aug>
               <au>
                  <snm>Cohen</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Educational and Psychological Measurement</source>
            <pubdate>1960</pubdate>
            <volume>20</volume>
            <fpage>37</fpage>
            <lpage>46</lpage>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Comparing correlated correlation coefficients</p>
            </title>
            <aug>
               <au>
                  <snm>Meng</snm>
                  <fnm>X-L</fnm>
               </au>
               <au>
                  <snm>Rosenthal</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>DB</fnm>
               </au>
            </aug>
            <source>Psychological Bulletin</source>
            <pubdate>1992</pubdate>
            <volume>111</volume>
            <fpage>172</fpage>
            <lpage>175</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1037//0033-2909.111.1.172</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
      <sec>
         <st>
            <p>Pre-publication history</p>
         </st>
         <p>The pre-publication history for this paper can be accessed here:</p>
         <p>
            <url>http://www.biomedcentral.com/1471-244X/4/13/prepub</url>
         </p>
      </sec>
   </bm>
</art>
