Abstract
Background
Log-linear association models have been extensively used to investigate the pattern of agreement between ordinal ratings. In 2007, log-linear non-uniform association models were introduced to estimate, from a cross-classification of two independent raters using an ordinal scale, varying degrees of distinguishability between distant and adjacent categories of the scale.
Methods
In this paper, a simple method based on simulations was proposed to estimate the power of non-uniform association models to detect heterogeneities across distinguishabilities between adjacent categories of an ordinal scale, illustrating some possible scale defects.
Results
Different scenarios of distinguishability patterns were investigated, as well as different scenarios of marginal heterogeneity within rater. For a sample size of N = 50, the probabilities of detecting heterogeneities within the tables were lower than .80, whatever the number of categories. In addition, even for large samples, marginal heterogeneities within raters led to a decrease in power estimates.
Conclusion
This paper provided some guidance on how many objects have to be classified by two independent observers (or by the same observer at two different times) to be able to detect a given scale structure defect. Our results also highlighted the importance of marginal homogeneity within raters to ensure optimal power when using non-uniform association models.
Background
Initially developed in psychometrics to assess the severity of behavioral troubles or disturbances [1-3], ordinal rating scales (ORS) are now essential tools in health research and health care, for example to measure clinical outcomes such as symptom grading [4], pathologists' findings [5], disease severity [6], treatment response [7-9], as well as health-related quality of life [10,11]. When the same objects are classified twice on a scale, differences in perception from one observer to another, or of the same observer at two successive times, lead to inter-rater and intra-rater variability. For patients, reproducibility of ratings made using an ORS is a major issue because their classification into one of the different categories may have important consequences on their therapeutic follow-up and possibly on their quality of life. There are two main components of reproducibility. The first component is marginal homogeneity between raters, which corresponds to the differences in raters' marginal distributions and refers to the tendency of a rater to make classifications higher or lower than those of the other rater. The second component is category distinguishability, that is to say the ability of observers to distinguish between categories. Recently, non-uniform association models (NUA) were proposed by Valet et al. [12] to estimate degrees of distinguishability between adjacent categories of an ORS. These models make it possible to test different patterns of distinguishability and thus give information on the quality of the scale structure.
When designing a reproducibility study with two observers (or one observer at two different times) assessing the same objects on an ORS, two major questions have to be answered: How many objects have to be classified by the two observers to be able to detect a given heterogeneous pattern of distinguishability between adjacent categories? Is it important to select these objects in an attempt to approximate some marginal distributions? In this study, simulations were used to estimate the power of non-uniform association models to detect heterogeneities across distinguishabilities between adjacent categories as a function of typical distinguishability patterns and total number of objects classified, assuming homogeneous marginal distributions within reader and between readers. Then, for the same numbers of objects classified twice, the influence of different patterns of marginal heterogeneity within reader on the power estimates was studied.
Methods
Log-linear non-uniform association models
Log-linear modelling and parameter interpretation
Classifications of N objects by two independent raters A and B (or by the same rater at two different times) using an ORS with I categories can be summarized in an I × I contingency table. In this table, let us define counts n_{ij} as the numbers of objects rated i (i = 1,..., I) by observer A and j (j = 1,..., I) by observer B, and suppose that these counts have a full multinomial distribution with expected mean m_{ij} = N × π_{ij}, where N is the sample size and π_{ij} is a probability distribution on the cells of the I × I table. Log-linear modelling expresses the logarithm of these m_{ij} as a linear combination of parameters that illustrate raters' effects on categories, as well as sources of agreement and disagreement. For the independence model, which assumes that ratings are statistically independent, the model is written as:
log(m_{ij}) = μ + λ^{A}_{i} + λ^{B}_{j}    (1)
where μ is the overall effect and λ^{A}_{i} and λ^{B}_{j} are A and B effects on category i and j, respectively. For this model, agreement between raters is expected to be due to chance only.
When analyzing agreement in an ordered contingency table, we can usually expect an association between ratings due to the natural ordering of the scale. As described by several authors [12-15], this association between ratings is expected to increase as the distance between categories increases. For instance, on a five-level severity scale, if an object is rated "1" by A, the probability for this object to be rated "5" by B is very low [16]. This association can be expressed through the odds ratio τ_{ij} = m_{ii}m_{jj}/(m_{ij}m_{ji}). An odds ratio equal to 1 indicates that the two ratings are independent. From the odds ratio τ_{ij}, Darroch and McCloud defined δ_{ij} = 1 − 1/τ_{ij} as the degree of distinguishability (DD) between two categories of an ORS, that is to say the readers' ability to distinguish between these two categories [17]. A DD value close to 1 indicates an almost perfect distinguishability between the two corresponding categories, whereas a DD value close to 0 indicates that these two categories are very hard to distinguish.
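As a small illustration of these two quantities, the sketch below computes the odds ratio of a pair of categories from a table of counts and converts it to a degree of distinguishability. It assumes the Darroch and McCloud form δ_{ij} = 1 − 1/τ_{ij}, which is consistent with the DD of 2/3 quoted later for an odds ratio of 3. The function names and the 3 × 3 table of counts are hypothetical and serve only as an example.

```python
def odds_ratio(table, i, j):
    """Odds ratio tau_ij = m_ii * m_jj / (m_ij * m_ji) for categories i and j (0-based)."""
    return (table[i][i] * table[j][j]) / (table[i][j] * table[j][i])


def distinguishability(tau):
    """Degree of distinguishability, assuming delta = 1 - 1 / tau."""
    return 1.0 - 1.0 / tau


# Hypothetical counts for a 3-category scale (rows: rater A, columns: rater B).
counts = [
    [30, 10, 2],
    [10, 30, 10],
    [2, 10, 30],
]

tau_adjacent = odds_ratio(counts, 0, 1)  # categories 1 and 2
tau_distant = odds_ratio(counts, 0, 2)   # categories 1 and 3
print(distinguishability(tau_adjacent), distinguishability(tau_distant))
```

With these made-up counts the distant pair has the larger odds ratio and therefore the DD closer to 1. A zero off-diagonal count makes the ratio undefined, so sparse tables would need separate handling.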
Uniform Association (UA) and Non-Uniform Association (NUA) models
In order to take into account this association, Goodman introduced the uniform association (UA) model. In 2007, Valet et al. [12] proposed an equivalent but simpler parameterization of the UA model as:
log(m_{ij}) = μ + λ^{A}_{i} + λ^{B}_{j} − (β/2)(i − j)^{2}    (2)
where i = 1,..., I and j = 1,..., I. From the UA model, odds ratios are written as τ_{ij} = exp(β(i − j)^{2}). Hence, DDs between two categories i and j are written as δ_{ij} = 1 − exp(−β(i − j)^{2}), so that the DDs between categories vary according to the distance between them. However, as pointed out by Valet et al. [12], the DDs between adjacent categories are supposed to be constant, which can be a limiting a priori hypothesis, since it assumes that the categories of the scale are regularly spaced in terms of distinguishabilities, a desirable property that an ORS may not have. They proposed log-linear non-uniform association (NUA) models to take into account the variations of the DDs between both distant and adjacent categories of an ORS. For ORS with I ≥ 3, NUA models are defined by:
For this model, DDs are written as:
illustrating the possible DD variations between categories, even between adjacent ones. NUA models are a generalization of UA models. Indeed, the UA model is a particular case of a NUA model where the parameters β_{k,k+1} are all equal (do not depend on k). Comparing the log-likelihoods of the data under the UA and NUA models allows us to test DD homogeneity between adjacent categories and can provide useful information on scale structure. See Valet et al. [12,16] for a complete description of the NUA models and the possible patterns of distinguishability that can be tested.
Power estimation of tests in NUA models
To investigate the ability of NUA models to detect heterogeneities within the DDs between adjacent categories, a simple method was proposed to simulate ordered contingency tables resulting from the use of ORS having different patterns of distinguishability between their adjacent categories. Hereafter, tests were defined for a null hypothesis H_{0} corresponding to the UA model defined by equation (2), and alternative hypotheses H_{1} corresponding to NUA models defined by equation (3). Different scenarios of DD heterogeneity were proposed to illustrate different typical scale structures. In all situations, marginal homogeneity between readers was assumed, which can be expressed as: π_{i+} = π_{+i} for i = 1,..., I.
Simulation of I × I contingency tables from the NUA models
The total sample size N was fixed, but the row and column totals were not. Counts n_{ij }were drawn from a full multinomial distribution M(π_{ij},N). In order to simulate different patterns of DDs heterogeneity between adjacent categories, theoretical probabilities π_{ij }were defined, using equation (3), as a function of the parameters of the NUA model:
When N and the association parameters β_{k,k+1} (k = 1,..., I − 1) are fixed, the probabilities π_{ij} depend only on the unknown parameters μ and λ_{i} (i = 1,..., I). These I + 1 unknown parameters can be defined as the solutions of the following non-linear system of I + 1 equations:
The first set of equations of the system defined by (6) allows us to control the distribution of marginal probabilities during simulations, i.e. to control the marginal probabilities π^{S}_{i} (the superscript "S" stands for simulations). The second condition of the system ensures that μ remains the overall effect [18]. As the number of equations is equal to the number of unknown parameters, the system can be solved using a classical root-finding algorithm for non-linear systems, such as the well-known Newton-Krylov method [19,20]. However, in this paper, a more recent method proposed by La Cruz et al. [21] was used. This "non-monotone spectral residual" method finds roots of non-linear systems without gradient information, and it was shown to be competitive with, and frequently better than, usual algorithms.
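The two simulation steps described above (matching the marginal probabilities, then drawing a table) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' R implementation. First, the association term is assumed to be −(1/2)(u_i − u_j)^2 with category scores u spaced by sqrt(β_{k,k+1}), which gives adjacent odds ratios exp(β_{k,k+1}) and reduces to a UA model when all β_{k,k+1} are equal; the exact NUA equation should be taken from Valet et al. [12]. Second, the same λ_i is used for both raters, as marginal homogeneity implies. Third, the system is solved with a damped symmetric scaling iteration instead of the spectral residual method of [21]. All function names are hypothetical.

```python
import math
import random


def association_kernel(betas):
    """exp() of the association term, ASSUMED to be -0.5 * (u_i - u_j)**2 with
    scores spaced by sqrt(beta_{k,k+1}); this form is not taken from the paper."""
    scores = [0.0]
    for beta in betas:
        scores.append(scores[-1] + math.sqrt(beta))
    return [[math.exp(-0.5 * (ui - uj) ** 2) for uj in scores] for ui in scores]


def cell_probabilities(betas, margins, n_total, max_iter=1000, tol=1e-13):
    """Find mu and lambda_i such that pi_ij = exp(mu + lambda_i + lambda_j) * K_ij / N
    has row (and, by symmetry, column) totals equal to `margins` (all > 0)."""
    kernel = association_kernel(betas)
    size = len(margins)
    a = [math.sqrt(p) for p in margins]  # a_i stands for exp(lambda_i + mu / 2) / sqrt(N)
    for _ in range(max_iter):
        row = [sum(kernel[i][j] * a[j] for j in range(size)) for i in range(size)]
        # Damped update whose fixed point satisfies a_i * sum_j K_ij a_j = margins[i].
        new_a = [math.sqrt(a[i] * margins[i] / row[i]) for i in range(size)]
        change = max(abs(x - y) for x, y in zip(a, new_a))
        a = new_a
        if change < tol:
            break
    mean_log = sum(math.log(x) for x in a) / size
    lambdas = [math.log(x) - mean_log for x in a]  # constrained to sum to zero
    mu = math.log(n_total) + 2.0 * mean_log
    probs = [[a[i] * a[j] * kernel[i][j] for j in range(size)] for i in range(size)]
    return mu, lambdas, probs


def draw_table(probs, n_total, rng):
    """Draw one I x I table of counts from a full multinomial with cell probabilities probs."""
    size = len(probs)
    cells = [(i, j) for i in range(size) for j in range(size)]
    weights = [probs[i][j] for i, j in cells]
    table = [[0] * size for _ in range(size)]
    for i, j in rng.choices(cells, weights=weights, k=n_total):
        table[i][j] += 1
    return table


betas = [math.log(3)] * 4   # all adjacent odds ratios equal to 3
margins = [0.2] * 5         # homogeneous marginal distribution
mu, lambdas, probs = cell_probabilities(betas, margins, n_total=200)
table = draw_table(probs, 200, random.Random(1))
```

Only the total N is fixed in `draw_table`; the row and column totals of each simulated table vary around the target marginal probabilities.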
Many different scenarios of distinguishability patterns can be simulated, using different sets of {β_{k,k+1}; k = 1,..., I − 1} in the NUA model. If we aimed to test all possible patterns of distinguishability, we would have to compare the null UA model (all β_{k,k+1} are equal) with NUA models having all possible combinations of association parameters, i.e. to test all possible equalities between association parameters. For example, testing equality of exactly B (B = 2,..., I − 1) association parameters in a NUA model with I − 1 association parameters would already lead to C(I − 1, B) = (I − 1)!/(B!(I − 1 − B)!) comparisons. However, our aim was not to simulate exhaustively all possible patterns of distinguishability but credible patterns corresponding to typical scale structures in inter- or intra-observer variation studies. Therefore, as defined in Valet et al. [12], only combinations of "symmetric" and "close" association parameters were considered, that is to say NUA models where equality of some symmetric and close association parameters was assumed, respectively.
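To give an idea of how quickly the number of candidate models grows, the sketch below counts the ways of choosing which B of the I − 1 adjacent association parameters are constrained to be equal. It assumes that the count meant above is the binomial coefficient C(I − 1, B); the function name is illustrative.

```python
from math import comb


def equality_patterns(n_categories):
    """Number of ways to pick B of the I - 1 association parameters, for B = 2, ..., I - 1."""
    n_params = n_categories - 1
    return {b: comb(n_params, b) for b in range(2, n_params + 1)}


print(equality_patterns(5))
print(sum(equality_patterns(8).values()))
```

Restricting attention to "symmetric" and "close" patterns, as done in the paper, keeps this set small enough to simulate.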
Definition of alternative hypotheses
For simplicity, we will consider hereafter contingency tables resulting from the use of ORS with I = 5 categories. The generalization to any I × I contingency table is straightforward. To exemplify our simulation scenarios, examples of the different values of association parameters that can be simulated in the case of a 5 × 5 contingency table are described in table 1.
Table 1. Examples of association parameters and distinguishability patterns between adjacent categories from NUA models in a 5 × 5 contingency table
Starting from the UA model where all association parameters are equal (H_{0} hypothesis), a different value for just one association parameter can be used to account for a scale defect between two categories only (categories are regularly spaced along the scale in terms of distinguishabilities, except two). Equal values for symmetric association parameters (for instance, it is easier to distinguish extreme categories than intermediate categories) or for close association parameters (for instance, it is easier to distinguish lower categories on the scale than upper categories) can also be used. Finally, taking different values for all association parameters illustrates an ORS where all categories are irregularly spaced in terms of distinguishabilities.
Distribution of marginal probabilities
In addition to the different sets of distinguishability values, i.e. different sets {β_{k,k+1}; k = 1,..., 4} illustrating the different alternative hypotheses that can be tested, different sets of marginal probabilities were assumed for each alternative hypothesis, to investigate the possible effects of marginal distribution heterogeneity within reader on the ability of NUA models to detect significant DD heterogeneities. These distributions were chosen to illustrate different realistic marginal distributions that can be observed in a contingency table resulting from the classification of objects on an ORS. These different sets of marginal probabilities are described in table 2. The first set corresponds to a homogeneous distribution of marginal probabilities. The next three sets correspond to homogeneous distributions except for one category with a low prevalence. The following two sets correspond to homogeneous distributions except for two extreme or intermediate categories with low prevalences. The last set corresponds to a heterogeneous marginal distribution.
Table 2. Sets of marginal theoretical probabilities in a 5 × 5 contingency table used in our simulations
Power and Type I error estimation
For each specific set of {β_{k,k+1}; k = 1,..., 4} and each set of marginal probabilities, parameters μ and λ_{i} were calculated using the non-linear system defined by (6). Probabilities π_{ij} of the multinomial distribution were calculated from equation (5), using the specific set of {β_{k,k+1}; k = 1,..., 4} and the previously calculated values of μ and λ_{i}. Then, 10000 simulations of 5 × 5 contingency tables summarizing classifications of N objects were drawn. The same null hypothesis of equal DDs between all adjacent categories was used. For this null hypothesis, a common value β_{1,2} = β_{2,3} = β_{3,4} = β_{4,5} = log(3) was chosen, corresponding to similar association between adjacent ratings (τ_{1,2} = τ_{2,3} = τ_{3,4} = τ_{4,5} = 3) and hence similar DDs between all adjacent categories. To account for different null hypotheses, we also considered the common values β_{1,2} = β_{2,3} = β_{3,4} = β_{4,5} = log(2) and β_{1,2} = β_{2,3} = β_{3,4} = β_{4,5} = log(4). For each simulation, the log-likelihoods of the UA model (H_{0}) and of the NUA model defined by H_{1} were calculated. As proposed by several authors [12,18], the G^{2} likelihood-ratio statistic was used to compare these two models. Specifically, we used the difference statistic G^{2}(UA) − G^{2}(NUA), which is asymptotically chi-squared distributed with Δdf = df_{UA} − df_{NUA} degrees of freedom. For the tests of one differing association parameter, of equal symmetric or close association parameters, and of all association parameters differing, Δdf was equal to 1, 1 and 3, respectively. For each scenario, power was estimated as the proportion of significant NUA models when applied to contingency tables simulated under the same alternative hypothesis. Type I error α was estimated as the proportion of significant NUA models when applied to contingency tables simulated under the null hypothesis.
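The testing step can be sketched as follows, assuming the UA and NUA log-likelihoods of each simulated table have already been obtained from a model-fitting routine (not shown here). The difference G^{2}(UA) − G^{2}(NUA) equals twice the difference in maximized log-likelihoods, and its p-value comes from a chi-squared upper tail with Δdf degrees of freedom. The function names, the significance level of 0.05 and the example log-likelihood values are assumptions made for illustration only.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))
    tail = sum(half ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail


def lr_test_pvalue(loglik_ua, loglik_nua, delta_df):
    """p-value of G2(UA) - G2(NUA) = 2 * (loglik_NUA - loglik_UA)."""
    statistic = max(0.0, 2.0 * (loglik_nua - loglik_ua))
    return chi2_sf(statistic, delta_df)


def rejection_rate(pvalues, alpha=0.05):
    """Proportion of simulated tables for which the NUA model is significant."""
    return sum(p < alpha for p in pvalues) / len(pvalues)


# Hypothetical (UA, NUA) log-likelihood pairs for four simulated tables, delta_df = 1.
fits = [(-310.2, -307.1), (-298.4, -298.0), (-305.9, -303.2), (-301.0, -300.9)]
pvalues = [lr_test_pvalue(ua, nua, 1) for ua, nua in fits]
print(rejection_rate(pvalues))
```

Applied to tables simulated under an alternative hypothesis, `rejection_rate` estimates power; applied to tables simulated under the null hypothesis, it estimates the type I error.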
Results
All simulations and power estimations were performed using R software [22]. Association parameters were equal to log(3) under the null hypothesis (i.e. OR equal to 3) and, for each alternative hypothesis, the values K of the tested OR ranged from 1 to 16, which corresponds to association parameters ranging from log(1) = 0 to log(16) = 2.77. Thus, for a specific alternative hypothesis, each specific set of association parameters {β_{k,k+1}; k = 1,..., 4} contained some fixed parameters equal to log(3), depicting the null hypothesis, and some varying parameters ranging from 0 to 2.77, depicting the alternative hypothesis. Simulation results are first displayed in Figure 1, which illustrates, for each simulated scenario, the power estimates of tests with alternative hypotheses corresponding to the different NUA models tested. In other words, this figure represents the probability of finding significant heterogeneities within the DDs between adjacent categories, according to the total sample size N, three different alternative hypotheses, and different values K of the tested OR. The left panel (Figure 1, examples a. to c.) corresponds to simulated scenarios with homogeneous marginal distributions within rater, whereas the right panel (Figure 1, examples d. to f.) corresponds to simulated scenarios with three different sets of heterogeneous marginal distributions. Power estimates were consistently lower in scenarios with heterogeneous marginal distributions (right panel) than in those with homogeneous marginal distributions (left panel). In some cases, the influence of marginal heterogeneity was even drastic and strongly penalized the ability of NUA models to detect significant heterogeneities within DDs between adjacent categories (Figure 1, example d.). For total sample sizes of N ≤ 100, none of the simulated scenarios provided power estimates greater than 80%.
Conversely, except for the scenario shown in Figure 1, example d., power estimates were greater than 80% for tested OR K ≥ 12, for all the tested hypotheses. Power estimates are also given in table 3. As in Figure 1, this table shows power estimates as a function of N, the three different alternative hypotheses, and the different values K of the tested OR. Similarly, the left panel corresponds to simulated scenarios with a homogeneous marginal distribution, whereas the right panel corresponds to different situations of heterogeneity within marginal distributions. For example, from the null hypothesis that all OR are equal to 3, i.e. DDs between all adjacent categories equal to 2/3, the power estimates of the test corresponding to i) the alternative β_{1,2} ≠ β_{2,3} = β_{3,4} = β_{4,5}, ii) a homogeneous marginal distribution, and iii) a total sample size equal to N = 250, are greater than 80% for OR greater than or equal to 10. In other words, for N = 250, NUA models are able to detect, with a probability greater than 80%, a DD between adjacent categories 1 and 2 greater than 1 − 1/10 = .90. For the left panel of this table and for the hypothesis of a different DD between the first two adjacent categories as compared to the others, NUA models are able to detect with a probability greater than 80%: a null DD or DDs greater than .92 for N ≥ 200, and DDs greater than .94 for N ≥ 150. In a similar way, for N = 200, NUA models are able to detect different DDs between close and symmetric adjacent categories with a probability greater than 80% for a null DD or DDs greater than .90.
Figure 1. Power estimates of tests with alternative hypotheses given by β_{1,2} ≠ β_{2,3} = β_{3,4} = β_{4,5} = log(3), β_{1,2} = β_{2,3} ≠ β_{3,4} = β_{4,5} = log(3), and β_{1,2} = β_{4,5} ≠ β_{2,3} = β_{3,4} = log(3) for (a, d), (b, e) and (c, f) respectively. Marginal probabilities are taken from the sets described in table 2.
Table 3 does not provide power estimates for all possible values of the association parameters tested, that is, for all decimal values between log(K) = 0 and log(K) = 2.77. However, interpolating a power estimate for a specific value of an association parameter is straightforward. Suppose for example that we want to calculate, from table 3, the required sample size for a common value β_{1,2} = β_{2,3} = 2.25. From the power estimates corresponding to β_{1,2} = β_{2,3} = 2.20 (namely .32, .53, .69, .81 and .89) and those corresponding to β_{1,2} = β_{2,3} = 2.30 (namely .35, .57, .76, .87 and .92), we can interpolate those corresponding to 2.25 = 2.20 + (2.30 − 2.20)/2 as 0.32 + (0.35 − 0.32)/2,..., 0.89 + (0.92 − 0.89)/2. The corresponding new values are .335, .55, .725, .84 and .905 (.34, .55, .73, .84 and .91 after rounding) for N equal to 50, 100, 150, 200 and 250, respectively. Then, for a probability equal to .80, which lies between .725 (N = 150) and .84 (N = 200), the required sample size can be interpolated as N = 150 + (200 − 150)/C, where C is obtained from the equation 0.80 = 0.725 + (0.84 − 0.725)/C, that is, C = 1.53. For this example N has to be greater than 182.61, that is to say greater than or equal to 183.
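The interpolation described above can be reproduced with a few lines of code. The sketch below uses only the power estimates quoted in the text for β = 2.20 and β = 2.30; the function and variable names are illustrative, and linear interpolation is itself an approximation of the underlying power curve.

```python
import math


def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation of y at x between the points (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


sample_sizes = [50, 100, 150, 200, 250]
power_at_2_20 = [0.32, 0.53, 0.69, 0.81, 0.89]
power_at_2_30 = [0.35, 0.57, 0.76, 0.87, 0.92]

# Step 1: power estimates for beta = 2.25, interpolated for each sample size.
power_at_2_25 = [
    interpolate(2.25, 2.20, low, 2.30, high)
    for low, high in zip(power_at_2_20, power_at_2_30)
]


def required_sample_size(target, sizes, powers):
    """Step 2: interpolate the sample size at which the power reaches `target`."""
    for k in range(len(sizes) - 1):
        if powers[k] <= target <= powers[k + 1]:
            return interpolate(target, powers[k], sizes[k], powers[k + 1], sizes[k + 1])
    raise ValueError("target power is outside the tabulated range")


n_required = required_sample_size(0.80, sample_sizes, power_at_2_25)
print(round(n_required, 2), math.ceil(n_required))
```

Keeping the unrounded intermediate value .725 (rather than .73) is what leads to the 182.61 quoted in the text.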
Table 3. Power estimates of tests in a 5 × 5 table, as a function of N, with three different alternative hypotheses, with homogeneous (left column) and heterogeneous (right column) marginal theoretical distributions. Estimates greater than 80% are in bold
In a similar way, tables 4 and 5 provide power estimates for the same three alternative hypotheses, this time under the null hypotheses that all OR are equal to 2 and 4, respectively. These tables allow the reader to estimate power for different null hypotheses through interpolation. Supplementary tables are also provided for 4 × 4 (Additional file 1: table S1) and 6 × 6 (Additional file 1: table S2) contingency tables. In addition, results for other alternative hypotheses, scenarios and sample sizes are available from the authors on request.
Table 4. Power estimates of tests in a 5 × 5 table, as a function of N, with three different alternative hypotheses, with homogeneous (left column) and heterogeneous (right column) marginal theoretical distributions. Estimates greater than 80% are in bold
Table 5. Power estimates of tests in a 5 × 5 table, as a function of N, with three different alternative hypotheses, with homogeneous (left column) and heterogeneous (right column) marginal theoretical distributions. Estimates greater than 80% are in bold
Additional file 1. Power estimates of tests in a 4 × 4 table, as a function of N, with three different alternative hypotheses, with homogeneous (left column) and heterogeneous (right column) marginal theoretical distributions. Estimates greater than 80% are in bold.
Discussion
Results given by Figure 1 and tables 3 to 5 highlighted the strong influence that marginal heterogeneity within reader may have on power estimates of tests in NUA models. Conversely, when assuming marginal homogeneity within reader, NUA models are able to detect, from a null hypothesis of a DD equal to 2/3 between all adjacent categories and for a reasonable value of N = 200, a null DD between two or three categories with a probability greater than 80%. For a five-level scale with an equal DD of 2/3 between its adjacent categories, NUA models are hence able to detect two or more confusing categories with satisfactory power. In the same way, for N = 200, NUA models are able to detect with good power two or more adjacent categories (close or symmetric) for which the DDs are greater than or equal to .92.
In our simulations of contingency tables resulting from cross-classifications of the same objects twice on an ordinal rating scale, marginal homogeneity between readers was assumed, which can be seen as a limiting constraint. However, as described by the authors [12,16], NUA models are based on the assumption that, in agreement studies, high counts are expected on the diagonal of the contingency table and on the parallels immediately above and below this diagonal, whereas low counts are expected in other parts of the table. Thus defined, NUA models are suitable for contingency tables with marginal homogeneity and may not be adapted to contingency tables showing other patterns of marginal distribution. In addition, it should be noted that such contingency tables usually show a non-null baseline association between adjacent ratings, which may support the choice of OR = 3 under the null hypothesis.
For each simulation, the algorithm of La Cruz et al. [21] was used to estimate parameters μ and λ_{i}. Like many other systems, this system of non-linear equations appeared to be very sensitive to initial values. In order to handle this problem and to avoid local maxima, the solutions μ and λ_{i} of the system associated with a specific value K of the tested OR were used as initial values for the system associated with the next tested K value.
In this simulation study we presented three alternative hypotheses illustrating different patterns of distinguishability between adjacent categories. The first tested hypothesis (DD between categories 1 and 2 different from the others), the corresponding symmetric hypothesis (DD between categories 4 and 5 different from the others), and the last hypothesis (DDs between extreme adjacent categories different from the others) make it possible to detect significant differences between extreme adjacent categories (1 and 2, 4 and 5, or both) and the other, intermediate ones. This is a usual pattern in ordinal rating scales, as the first category often corresponds to "no intensity" and the last one often corresponds to the "highest intensity" of the measured phenomenon. These two extreme adjacent categories are more likely to be distinguishable than the others because they correspond to extreme situations. Finally, the second hypothesis (DDs between close adjacent categories from 1 to 3 different from the others) and the corresponding symmetric one (DDs between close adjacent categories from 3 to 5 different from the others) make it possible to detect higher or lower DDs between some close adjacent categories of the scale. This can also be a typical pattern, corresponding for example to an ordinal scale where some consecutive grades show many similarities and may be hard to distinguish.
Conclusions
In this paper we proposed a new, simple method based on simulations to estimate the power of tests in log-linear non-uniform association models. To this aim, we first presented a method to simulate contingency tables resulting from cross-classifications of the same objects, using ordinal rating scales having different patterns of distinguishability between their adjacent categories. Then, taking typical situations of scale structures, we proposed a table summarizing the main effects of sample size, alternative hypotheses and marginal distributions on power estimates for the detection of DD heterogeneities within the scale structure. Results were given for three typical alternative hypotheses, in the case of a 5 × 5 contingency table.
In health research, assessments of disease severity or patients' well-being are increasingly performed using ordinal rating scales. One of the major components of an ordinal scale is the distinguishability between its adjacent categories. Using a simple method based on simulations, this paper provided some guidance on how many objects have to be classified by two observers to be able to detect a given scale structure defect, which may be of prime interest for improving the quality of an ordinal scale and hence of the other assessments made using this scale.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
FV and JYM developed the method, performed all statistical analyses and participated in writing the article. FV and JYM read and approved the final manuscript.
Acknowledgements
The authors would like to thank Pr. Sylvie Chevret for her great interest and support of this work.
References

1. Biggs JT, Wylie LT, Ziegler VE: Validity of the Zung Self-rating Depression Scale. British Journal of Psychiatry 1978, 132:381-385.
2. Goga JA, Hambacher WO: Psychologic and behavioral assessment of geriatric patients: a review. Journal of the American Geriatrics Society 1977, 25:232-237.
3. Endicott J, Spitzer RL, Fleiss JL, Cohen J: The global assessment scale. A procedure for measuring overall severity of psychiatric disturbance. Archives of General Psychiatry 1976, 33:766-771.
4. Mortimer AM: Symptom rating scales and outcome in schizophrenia.
5. Le T, Williams K, Senterman M, Hopkins L, Faught W, Fung-Kee-Fung M: Histopathologic assessment of chemotherapy effects in epithelial ovarian cancer patients treated with neoadjuvant chemotherapy and delayed primary surgical debulking. Gynecologic Oncology 2007, 106:160-163.
6. Mahler DA, Ward J, Waterman LA, McCusker C, Zuwallack R, Baird JC: Patient-reported dyspnea in COPD reliability and association with stage of disease. Chest 2009, 136:1473-9.
7. Chevallier B, Roche H, Olivier JP, Chollet P, Hurteloup P: Inflammatory breast cancer. Pilot study of intensive induction chemotherapy (FEC-HD) results in a high histologic response rate. Journal of Clinical Oncology 1993, 16:223-228.
8. Nurnberg HG, Hensley PL, Heiman JR, Croft HA, Debattista C, Paine S: Sildenafil treatment of women with antidepressant-associated sexual dysfunction: a randomized controlled trial. Journal of the American Medical Association 2008, 300:395-404.
9. Kappos L, Freedman MS, Polman CH, Edan G, Hartung H, Miller DH, Montalban X, Barkhof F, Radu EW, Metzig C, Bauer L, Lanius V, Sandbrink R, Pohl C: Long-term effect of early treatment with interferon beta-1b after a first clinical event suggestive of multiple sclerosis: 5-year active treatment extension of the phase 3 BENEFIT trial.
10. Bowling A: Measuring Health: A review of Quality of Life Measurement Scales. Philadelphia: Open University Press, Inc; 1991.
11. McDowell I, Newell C: Measuring Health: A guide to Rating Scales and Questionnaires. New York: Open University Press, Inc; 1996.
12. Valet F, Guinot C, Mary JY: Log-linear non-uniform association models for agreement between two ratings on an ordinal scale.
13. Goodman LA: Simple models for the analysis of association in cross-classifications having ordered categories. Journal of the American Statistical Association 1979, 74:537-552.
14. Becker MP: Using association models to analyze agreement data: two examples. Statistics in Medicine 1989, 8:1199-1207.
15. Agresti A: A model for agreement between ratings on an ordinal scale. Biometrics 1988, 44:539-548.
16. Valet F, Guinot C, Ezzedine K, Mary JY: Quality assessment of ordinal scale reproducibility: log-linear models provided useful information on scale structure. Journal of Clinical Epidemiology 2008, 61:983-990.
17. Darroch JN, McCloud PI: Category distinguishability and observer agreement. Australian Journal of Statistics 1986, 28:371-388.
18. Agresti A: Categorical Data Analysis. In Wiley series in probability and mathematical statistics. New York: John Wiley and Sons; 2002.
19. Brown PN, Saad Y: Hybrid Krylov methods for nonlinear systems of equations. SIAM Journal of Scientific Computing 1990, 11:450-481.
20. Brown PN, Saad Y: Convergence theory of nonlinear Newton-Hybrid Krylov algorithms.
21. La Cruz W, Martinez JM, Raydan M: Spectral residual method without gradient information for solving large-scale nonlinear systems of equations. Mathematics of Computation 2006, 75:1429-1448.
22. R Development Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria; 2009. [http://www.R-project.org] ISBN 3-900051-00-3.