Abstract
Background
The therapeutic efficacy of an intervention is often assessed in clinical trials by scales measuring multiple diverse activities that are added to produce a cumulative global score. Medical communities and health care systems subsequently use these data to calculate pooled effect sizes to compare treatments. This is done because major doubt has been cast on the clinical relevance of statistically significant findings that rely on p values, which may report chance findings. Hence, pooling the results of clinical studies into a meta-analysis by means of a statistical calculus has been assumed to be a more definitive way of deciding on efficacy.
Methods
We simulate therapeutic effects as measured with additive scales in patient cohorts with different disease severity and assess the limitations of calculating effect sizes from additive scales; these limitations are proven mathematically.
Results
We demonstrate that the major problem, which cannot be overcome by current numerical methods, is the complex nature and neurobiological foundation of clinical psychiatric endpoints in particular and additive scales in general. This is particularly relevant for endpoints used in dementia research. 'Cognition' is composed of functions such as memory, attention, orientation and many more. These individual functions decline in varied and nonlinear ways. Here we demonstrate that with progressive diseases cumulative values from multidimensional scales are subject to distortion by the limitations of the additive scale. The nonlinearity of the decline of function impedes the calculation of effect sizes based on cumulative values from these multidimensional scales.
Conclusions
Statistical analysis needs to be guided by boundaries of the biological condition. Alternatively, we suggest a different approach avoiding the error imposed by overanalysis of cumulative global scores from additive scales.
Keywords:
dementia; neurodegeneration; clinical studies; meta-analysis; effect sizes; Cohen's d

Background
Analysis of treatment efficacy is warranted to guarantee the quality of medical treatment and the effective spending of resources. Across diseases, meta-analyses are assumed to be one of the major tools to achieve this [1-4]. Meta-analyses are performed to reach an overall conclusion from clinical studies with different numerical results or different assessment methods. One critical step in performing a meta-analysis is to calculate the effect sizes for the studies to be included [5].
Degenerative diseases are of long duration, and the diversity of their symptoms poses methodological difficulties not known in other fields of medicine: symptoms vary over time, fluctuate for random reasons, and may be replaced by new and different ones. To illustrate the reasoning on whether effect sizes and meta-analyses are suited to resolve the ambiguity of clinical study results in degenerative disease, one of the most prevalent degenerative diseases, Alzheimer's disease (AD), will be used.
AD is the most frequent cause of dementia in old age and typifies the variability in clinical presentation and the change of symptoms over time that occur in a degenerative disease. At the onset of AD the medial temporal lobe is affected [6]. This results in the episodic memory deficit which is an early clinical hallmark of the disease [7]. As the disease spreads, other brain regions such as the frontal and parietal cortex are affected as well. The parietal cortex mediates activities such as spatial orientation and visuospatial functions [8,9]; the frontal cortex mediates executive functions, planning, attention, and working memory [10-12]. Spread of AD beyond the temporal lobe thus is characterized in functional terms by accruing deficits of spatial orientation, attention, and executive functions as well as working memory and language [7]. This affliction of different brain regions and functions can be visualized using advanced imaging methods [13-15]. Despite overall progression, symptoms may also fluctuate over the course of dementia for random reasons: apathy may turn to agitation, which may disappear and be followed by apathy again. Regardless of this complexity, effect size calculations and meta-analyses of different studies use the addition of scores from many disparate functions to provide a global score for constructs like 'cognition', 'behavior', or 'activities of daily living'. 'Cognition' comprises a multitude of activities such as episodic or working memory, attention, calculation, cognitive flexibility, and praxis; 'behavior' comprises affect and emotion, delusion, agitation, and irritability; and 'activities of daily living' comprise a wide variety of tasks for which performance depends not only on the actual capabilities of the patient but also on her or his prior habits.
Over the whole course of the disease, 'cognition' or 'behavior' may be appropriate constructs to assess overall dementia, but over the time frame of clinical studies, usually one to two years, individual cognitive functions need to be focused on, as over such short time spans the disease process is confined to specific functions and specific regions rather than the whole brain. At present, however, and for the last 30 years, clinical studies in AD have used global scales, i.e. multidimensional scales, to appraise the efficacy of interventions, using instruments such as the Alzheimer's Disease Assessment Scale (ADAS) [16], the Mini-Mental State Examination (MMSE) [17], the Severe Impairment Battery (SIB) [18], the Neuropsychiatric Inventory (NPI) [19], and the Katz activities of daily living scale (Katz-ADL) [20], amongst others.
Physicians and statisticians not well acquainted with the administration of neuropsychological tests neglect the impact of test difficulty on neurobiological associations. Task difficulty has a profound impact on the neural substrates engaged to solve a task: it was shown recently that task difficulty is associated with the recruitment of different neural patterns even in healthy subjects [21]. Thus, despite being similar activities, two tasks may rely on the integrity of different brain areas if the tasks vary in difficulty. Clearly then, the likelihood of maintaining performance on a specific task measured with a particular instrument depends on disease severity and on time since diagnosis. The task may come to rely on different areas of the brain being recruited, as degeneration reduces the relative amount of input from areas normally engaged in that function, and performance shows a nonlinear decline in dementia patients [22,23].
Multidimensional clinical scales combine different tasks, i.e. different activities, to assess the overall severity of brain dysfunction. The cumulative score for these multidimensional scales results from summation of subscores representing specific activities. The relative contribution of the subscores to the total score, however, is variable, as is the task difficulty with which specific activities are assessed in the different scales. For example, the MMSE has a total score of 30 and awards 3 points for the recall of three words on single presentation, a task that is preserved until very late in the disease; this carries the same weight as the 3 points that could be obtained from recalling those words 5 minutes later, a task whose failure is very often one of the earliest signs of impairment. The ADAS-cog asks for recall of ten words after threefold presentation of the list and, together with other memory items, the function 'memory' is represented with 27 points out of 70.
It was our goal to address the impact of the nonlinearity of disease progression and of the construction of multidimensional scales on the analysis of these additive global scales.
Methods
Basic model for the representation of function
Modeling the decline of function needs to reflect that easy tasks show a ceiling effect in early disease (i.e. the task is so easy, or the underlying brain circuits are so insensitive to the disease process, that the score does not decline over the initial period of the degenerative process) and a floor effect in the later stages (i.e. the task is so difficult, or the underlying brain circuits are so severely affected by the disease process, that the score is not sensitive enough to pick up further decline). Such a pattern was demonstrated for the items of the Mini-Mental State Examination [23,24]: repeating words is a task with an early ceiling effect, and delayed recall of memorized words is a task with an early floor effect. Accordingly, we used an inverse exponential rule for modeling the decline of function with progressing disease, f_i(t; a_i, b_i, c_i), where i = 1, 2, t_min ≤ t ≤ t_max, c_i < 0.
Different f_i represent different symptoms (e.g. memory, praxis, and so forth) declining over time according to parameters a_i, b_i, and c_i, which are accessible by empirical studies, with t indicating time. Qualitatively, the arguments outlined below are also valid for various functions other than the inverse exponential.
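The exact inverse exponential formula used in the original did not survive reproduction here; the sketch below therefore illustrates only the qualitative behaviour (ceiling early, floor late, with the drop occurring at different times for different tasks) using a hypothetical declining sigmoid. The function form and the parameters k and m are assumptions for illustration, not the paper's model.

```python
import math

def decline(t, k, m):
    """Hypothetical declining sigmoid: close to its ceiling for t << m,
    close to its floor for t >> m; k controls how abrupt the drop is."""
    return 1.0 / (1.0 + math.exp(k * (t - m)))

# A task that declines early (e.g. delayed recall): drop centred at t = 2.
f_early = lambda t: decline(t, k=1.0, m=2.0)
# A task preserved until late (e.g. repeating words): drop centred at t = 8.
f_late = lambda t: decline(t, k=1.0, m=8.0)

for t in (0, 5, 10):
    print(t, round(f_early(t), 3), round(f_late(t), 3))
```

Both curves run from a ceiling near 1 to a floor near 0, but in any fixed observation window only one of them may be changing appreciably, which is the point made above.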
Results
Vulnerability and difficulty
Two examples for the decline of performance over time using the basic model are shown in Figure 1.
Figure 1. Selective vulnerability and task difficulty. f_i(t; a_i, b_i, c_i), i = 1, 2, t_min ≤ t ≤ t_max. For the orange curve the parameters are a_1 = b_1 = 1, c_1 = 1/6; for the green curve, a_2 = b_2 = 1, c_2 = 1/20. The orange curve represents a symptom with a ceiling effect at the beginning of clinical disease (e.g. praxis in Alzheimer's disease); the green curve represents a symptom with a floor effect early during progression of disease (e.g. episodic memory in Alzheimer's disease).
These curves can be interpreted in two different ways: I) function f_{1 }and f_{2 }represent different tasks, e.g. memory and praxis. In this interpretation, f_{1 }represents an activity that early and rapidly declines with progression of disease (e.g. episodic memory in patients with Alzheimer's disease). The function f_{2 }represents an activity that is upheld early during progression of disease with decline only occurring later (e.g. praxis in patients with Alzheimer's disease). Within this framework the neurobiological reason for the distinct time course of decline of function is selective vulnerability of brain regions. II) Alternatively, it may be assumed that the two curves represent the same task (e.g. spatial orientation). With this interpretation f_{1 }represents measurement of the task with an instrument without a ceiling effect but with an early floor effect (e.g. spatial orientation in an unknown environment in patients with Alzheimer's Disease). The function f_{2 }in this interpretation represents an instrument with an early ceiling effect and a late floor effect (e.g. spatial orientation in a known environment in patients with Alzheimer's Disease). In other words, f_{1 }has a high task difficulty (reflecting disease progression or design of instrument) and f_{2 }has a low task difficulty (reflecting disease progression or design of instrument).
Multidimensional additive scales
We now assume two scales (e.g. the MMSE and the ADAScog), one scale represented by F_{A }and another scale represented by F_{B}, both comprised of two tasks following functions f_{1 }(a task that declines early and rapidly over the course of disease) and f_{2 }(a task that declines later during the course of disease) but weighted differently in F_{A }and F_{B}:
F_j(t; a_i, b_i, c_i, λ_{ji}, i = 1, 2) = λ_{j1} f_1(t; a_1, b_1, c_1) + λ_{j2} f_2(t; a_2, b_2, c_2) for j ∈ {A, B}, where λ_{j1}, λ_{j2} are the weights with which the functions f_1 and f_2 enter the scales F_A and F_B, respectively. Without loss of generality: λ_{j1} + λ_{j2} = 1 for j ∈ {A, B}.
To illustrate: the cognitive part of the Alzheimer's Disease Assessment Scale (ADAS-cog) weights 'memory' with 27 out of 70 points (word recall, max. 10; word recognition, max. 12; remembering test instructions, max. 5). The Severe Impairment Battery (SIB) weights 'memory' with a maximum of 14 out of 100 points. The Mini-Mental State Examination weights 'memory' with 6 out of 30 points. In contrast, 'orientation' is reflected in these scales with a maximum of 8 out of 70, 6 out of 100, and 10 out of 30 points, respectively.
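These weightings can be made explicit. The following snippet merely tabulates the relative weight of 'memory' and 'orientation' in the three scales, using only the point totals quoted above:

```python
# Point totals and subscores as quoted in the text.
scales = {
    "ADAS-cog": {"total": 70, "memory": 27, "orientation": 8},
    "SIB":      {"total": 100, "memory": 14, "orientation": 6},
    "MMSE":     {"total": 30, "memory": 6, "orientation": 10},
}

mem_share = {name: s["memory"] / s["total"] for name, s in scales.items()}
ori_share = {name: s["orientation"] / s["total"] for name, s in scales.items()}

for name in scales:
    print(f"{name}: memory {mem_share[name]:.0%}, orientation {ori_share[name]:.0%}")
```

Memory thus contributes roughly 39% of the ADAS-cog total but only 14% of the SIB, while orientation ranges from 6% to 33%; the same deficit is therefore weighted very differently depending on the instrument.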
How the combination of assessments of different tasks into one scale affects the assessment of disease progression is shown in Figure 2.
Figure 2. Composite scales. Functions f_1 and f_2 as in Figure 1. Scale F_A: F_A(t; a_i, b_i, c_i, i = 1, 2) = 3/8 f_1(t; a_1, b_1, c_1) + 5/8 f_2(t; a_2, b_2, c_2). Scale F_B: F_B(t; a_i, b_i, c_i, i = 1, 2) = 2/3 f_1(t; a_1, b_1, c_1) + 1/3 f_2(t; a_2, b_2, c_2). Hence, scale F_A is dominated by function f_2 and scale F_B is dominated by function f_1. The graph shows normalized scores over time.
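A minimal sketch of such composite scales, reusing a hypothetical declining sigmoid in place of the original f_i (all function forms and parameters here are assumptions for illustration; only the weights 3/8, 5/8, 2/3, 1/3 come from Figure 2):

```python
import math

def f(t, k, m):
    # Illustrative declining sigmoid standing in for the paper's f_i.
    return 1.0 / (1.0 + math.exp(k * (t - m)))

f1 = lambda t: f(t, 1.0, 2.0)   # early-declining task
f2 = lambda t: f(t, 1.0, 8.0)   # late-declining task

F_A = lambda t: 3/8 * f1(t) + 5/8 * f2(t)   # weights as in Figure 2
F_B = lambda t: 2/3 * f1(t) + 1/3 * f2(t)

# The two scales agree near the extremes but diverge mid-course,
# when only one component task is changing appreciably.
for t in (0, 5, 10):
    print(t, round(F_A(t), 3), round(F_B(t), 3))
```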
Treatment effects
We now assume treatment effects given by scaling factors 1 + δ_i, i = 1, 2, such that a purely symptomatic treatment effect on the progression of the disease in the treated group is described by (1 + δ_i) f_i(t; a_i, b_i, c_i) for i = 1, 2, t_min ≤ t ≤ t_max. Comparison of effect sizes, or calculation of a common effect size in a meta-analysis, naturally has to assume time-independence of the effect size; otherwise the result of bringing together results from multiple studies would strongly depend on how many studies with milder or more advanced disease severity, respectively, enter the analysis. The mathematical analysis below shows that a sufficient condition for time-independent effects is that the standard deviation is proportional to the mean of the observed data. From a practical point of view this can be interpreted as a constant relative deviation. More precisely, Theorem 1 states that the effect size Cohen's d of both measurements is then independent of the time of observation, i.e., d_i(t) ≡ d_i. Hence, the necessary condition for applying meta-analysis is satisfied. In general, however, meta-analyses are also performed with cumulative values of multidimensional scales, and the question of time-independent effects has to be answered again. For this, consider the additive scales F_j(t; a_i, b_i, c_i, λ_{ji}, i = 1, 2) = λ_{j1} f_1(t; a_1, b_1, c_1) + λ_{j2} f_2(t; a_2, b_2, c_2) for j ∈ {A, B} introduced before. Time-independence would follow if the effect size could be calculated in the intuitive way as d_j(t) = λ_{j1} d_1(t) + λ_{j2} d_2(t).
"Unfortunately", mathematical analysis (see below for more details) shows that the effect size is a function of the weights λ_{j1}, λ_{j2}, j ∈ {A, B}, of the functions f_1 and f_2 in the composite scales F_A and F_B, of the treatment effects δ_1, δ_2, and, contrary to intuition, in general also of the functions f_i, i = 1, 2, and, most important, of the time t (Figure 3).
Figure 3. Treatment effects. Functions f_1 and f_2 and scales F_A and F_B as in Figure 2. A treatment effect of 30% is assumed for f_1 (Treatment 1) or f_2 (Treatment 2). Upper panel: effects on scale F_A at early and late time points. Lower panel: effects on scale F_B at early and late time points. The graphs show normalized scores over time and demonstrate that the size of the treatment effect depends on the scale that is used.
It is natural to ask under which assumptions the general time-dependence disappears, so that time-independence can still be guaranteed for additive scales. The mathematical analysis shows that this is the case if we assume that over time the observed data are perfectly correlated with respect to the different scales and, in addition, either δ_1 = δ_2 (the treatment effect is identical for both functions f_i, i = 1, 2, representing different cognitive functions) or λ_i = 0 for one i ∈ {1, 2}. The latter assumption means that the function of interest is no longer multidimensional. Whether these assumptions are realistic or of relevant interest has to be decided in a preprocessing step.
However, calculating the time-dependent scaling factor in the general case would require knowing the treatment effect on individual functions of given task difficulty, the exact weights of the individual functions in the composite scales, and the time-dependency of the individual functions.
For example, a treatment effect of 30% improvement in function f_1 or function f_2 yields quite different effect sizes for early and late patients as assessed with scales F_A or F_B, with results ranging between 0.4624 and 0.6039 (Table 1).
Table 1. Calculation of effect sizes (Cohen's d) for early and late treatment as assessed with scale F_{A }and F_{B}.
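The exact values in Table 1 depend on the original parameterization. The sketch below reproduces only the mechanism: it computes the composite Cohen's d under the assumptions of the mathematical analysis (σ proportional to μ, perfectly correlated tasks), with a hypothetical sigmoid decline; the ratio z_{α/2}/β and all other numbers are illustrative assumptions, not the paper's values.

```python
import math

def f(t, k, m):
    # Hypothetical declining sigmoid standing in for the paper's f_i.
    return 1.0 / (1.0 + math.exp(k * (t - m)))

def composite_d(t, lam1, delta1, delta2, z_over_beta=2.0):
    """Cohen's d of a two-task additive scale when sigma is proportional
    to the mean and the tasks are perfectly correlated. z_over_beta and
    every other number here are illustrative assumptions."""
    mu1, mu2 = f(t, 1.0, 2.0), f(t, 1.0, 8.0)
    lam2 = 1.0 - lam1
    num = lam1 * delta1 * mu1 + lam2 * delta2 * mu2
    den = lam1 * mu1 + lam2 * mu2
    return z_over_beta * num / den

# A 30% effect on the early-declining task only, assessed early vs late:
early = composite_d(t=1.0, lam1=3/8, delta1=0.3, delta2=0.0)
late = composite_d(t=9.0, lam1=3/8, delta1=0.3, delta2=0.0)
print(round(early, 3), round(late, 3))
```

The same treatment measured with the same scale yields a respectable d early in the disease and a negligible one late, solely because the treated component has left the window in which it still changes.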
Inductive mathematical proof
Assume that the average progression of a disease with regard to two instruments within some specified period of time can be described by f_i(t; a_i, b_i, c_i), i = 1, 2, t_min ≤ t ≤ t_max, and that for any time t the underlying distribution of the random variable X_i(t) is normal with mean μ_i(t) := f_i(t).
For its standard deviation σ_i(t) we assume that a proportion 1 − α of the distribution always has a relative deviation from the mean of at most β percent. To be more precise, if z_{α/2} denotes the (1 − α/2) quantile of the standard normal distribution, then σ_i(t) can be determined by the equation z_{α/2} σ_i(t) = β μ_i(t), whence for any time t we have σ_i(t) = (β / z_{α/2}) μ_i(t).
While the above models the case of untreated patients, the effect of a proper medication is expressed by scaling factors 1 + δ_i, i = 1, 2; i.e., on average the progression of the disease for the treated group is described by (1 + δ_i) f_i(t; a_i, b_i, c_i), i = 1, 2, t_min ≤ t ≤ t_max, where we assume as before that for any time t the random variable that describes the observed data is again normally distributed with mean (1 + δ_i) μ_i(t) and, since the calculation of Cohen's d requires unchanged standard deviations, the same standard deviation as before, i.e., σ_i^δ(t) = σ_i(t).
Accepting the assumptions made above, we obtain the following result for the effect size Cohen's d, d_i(t), of the treatment at time t for instrument i, i = 1, 2.
Theorem 1
The effect size Cohen's d is independent of the time of observation, i.e., d_{i}(t) ≡ d_{i}.
Proof 1
From the definition of Cohen's d we straightforwardly obtain d_i(t) = ((1 + δ_i) μ_i(t) − μ_i(t)) / σ_i(t) = δ_i μ_i(t) / ((β / z_{α/2}) μ_i(t)) = δ_i z_{α/2} / β =: d_i, which does not depend on t.
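This time-independence is easy to verify numerically. The sketch below determines σ from the stated relative-deviation assumption (with illustrative values α = 0.05, β = 0.2) via Python's standard-library NormalDist and confirms that d does not change as the mean declines:

```python
from statistics import NormalDist

def sigma(mu, alpha=0.05, beta=0.20):
    """SD chosen so that a fraction 1 - alpha of observations deviate
    from the mean by at most beta * mu: z_{alpha/2} * sigma = beta * mu."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return beta * mu / z

def cohens_d(mu, delta, alpha=0.05, beta=0.20):
    # ((1 + delta) * mu - mu) / sigma reduces to delta * z / beta.
    return ((1 + delta) * mu - mu) / sigma(mu, alpha, beta)

# The mean declines over time, yet d stays pinned at delta * z / beta:
ds = [cohens_d(mu, delta=0.3) for mu in (0.9, 0.5, 0.1)]
print([round(d, 4) for d in ds])
```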
Next consider the case that we are interested in the composed function f(t; a_i, b_i, c_i, λ_i, i = 1, 2) = λ_1 f_1(t; a_1, b_1, c_1) + λ_2 f_2(t; a_2, b_2, c_2), where λ_1, λ_2 are nonnegative scaling factors with, say, λ_1 + λ_2 = 1. From an intuitive point of view we expect d(t) = λ_1 d_1 + λ_2 d_2 for the effect size d(t) of the composed scale. And in case our intuition is correct, time-independence as a desirable prerequisite for meta-analysis on, say, additive scales would immediately follow.
To compute d(t) for f(t; a_i, b_i, c_i, λ_i, i = 1, 2) we have to consider the random variable X(t) = λ_1 X_1(t) + λ_2 X_2(t) for untreated patients and the corresponding variable X^δ(t) for treated patients. Obviously, both variables are normally distributed with mean μ(t) = λ_1 μ_1(t) + λ_2 μ_2(t) and μ^δ(t) = λ_1 (1 + δ_1) μ_1(t) + λ_2 (1 + δ_2) μ_2(t), respectively. For the variance σ²(t) of X(t), and hence by assumption also of X^δ(t), we have the basic formula σ²(t) = λ_1² σ_1²(t) + λ_2² σ_2²(t) + 2 λ_1 λ_2 cor(X_1(t), X_2(t)) σ_1(t) σ_2(t), where cor(X_1(t), X_2(t)) denotes the correlation of X_1(t) and X_2(t).
In the general case, i.e. without any restrictions on the correlation, the effect size d(t) of the composed scale depends on time. To be more precise, we have d(t) = (μ^δ(t) − μ(t)) / σ(t) = (λ_1 δ_1 μ_1(t) + λ_2 δ_2 μ_2(t)) / σ(t).
To become more specific, and to answer the question whether time-independence can be guaranteed for composed scales under special assumptions, we consider as a simple example the case cor(X_1(t), X_2(t)) = 1. This assumption yields σ²(t) = (λ_1 σ_1(t) + λ_2 σ_2(t))², hence σ(t) = λ_1 σ_1(t) + λ_2 σ_2(t), and we can calculate Cohen's d. Using σ_i(t) = (β / z_{α/2}) μ_i(t) we finally obtain d(t) = (z_{α/2} / β) · (λ_1 δ_1 μ_1(t) + λ_2 δ_2 μ_2(t)) / (λ_1 μ_1(t) + λ_2 μ_2(t)), which in general is still not independent of the time t.
In order to further analyze the dependence of the "composed Cohen's d" on the involved parameters we rewrite its formula. Under the assumptions on standard deviations and correlation made above we obtain for the effect size:
Theorem 2
d(t) = (λ_1 μ_1(t) d_1 + λ_2 μ_2(t) d_2) / (λ_1 μ_1(t) + λ_2 μ_2(t)).
Proof 2
We calculate, using δ_i = (β / z_{α/2}) d_i, d(t) = (z_{α/2} / β) · (λ_1 δ_1 μ_1(t) + λ_2 δ_2 μ_2(t)) / (λ_1 μ_1(t) + λ_2 μ_2(t)) = (λ_1 μ_1(t) d_1 + λ_2 μ_2(t) d_2) / (λ_1 μ_1(t) + λ_2 μ_2(t)).
From a theoretical point of view we can now observe the following:
1) If δ_1 = δ_2, then Cohen's d of the composed measure is independent of the time and in particular equals the weighted sum of the effect sizes d_1 and d_2, i.e., d(t) ≡ λ_1 d_1 + λ_2 d_2 = d_1 = d_2.
2) If λ_i = 0 for one i ∈ {1, 2}, then Cohen's d of the composed measure is independent of the time; to be more precise, d(t) ≡ d_j for the remaining index j ≠ i. (Actually this choice of parameter implies that the function of interest is no longer a composed one.)
The second observation straightforwardly leads to the question whether the choices λ_i = 0, i ∈ {1, 2}, are the extreme ones concerning d(t) over the domain D := {λ = (λ_1, λ_2) : λ_1, λ_2 ≥ 0, λ_1 + λ_2 = 1}.
Theorem 3
For all λ ∈ D and all t: min{d_1, d_2} ≤ d(t) ≤ max{d_1, d_2}.
Proof 3
Without loss of generality assume that δ_1 ≤ δ_2, and hence d_1 ≤ d_2. Then it follows on the one side d(t) = (λ_1 μ_1(t) d_1 + λ_2 μ_2(t) d_2) / (λ_1 μ_1(t) + λ_2 μ_2(t)) ≥ (λ_1 μ_1(t) + λ_2 μ_2(t)) d_1 / (λ_1 μ_1(t) + λ_2 μ_2(t)) = d_1,
and on the other side d(t) ≤ (λ_1 μ_1(t) + λ_2 μ_2(t)) d_2 / (λ_1 μ_1(t) + λ_2 μ_2(t)) = d_2.
Note that we always have equality if δ_1 = δ_2, which reflects the first observation made above: scaling cannot change the effect size. However, if, say, δ_1 < δ_2, then Cohen's d can be changed by a factor of up to δ_2/δ_1 by choosing different scales.
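Theorem 3 can be checked numerically over a grid of weights and times. In the sketch below the means come from a hypothetical sigmoid decline and the per-task effect sizes d_1, d_2 are made-up values; only the weighted-average form of d(t) is taken from Theorem 2.

```python
import math

def f(t, k, m):
    # Hypothetical declining sigmoid standing in for the paper's f_i.
    return 1.0 / (1.0 + math.exp(k * (t - m)))

def composite_d(t, lam1, d1, d2):
    """Theorem 2 form: a mu-weighted average of the per-task effect sizes."""
    mu1, mu2 = f(t, 1.0, 2.0), f(t, 1.0, 8.0)
    lam2 = 1.0 - lam1
    return (lam1 * mu1 * d1 + lam2 * mu2 * d2) / (lam1 * mu1 + lam2 * mu2)

d1, d2 = 0.2, 0.6
vals = [composite_d(t, lam1, d1, d2)
        for t in (0, 2, 4, 6, 8, 10)
        for lam1 in (0.0, 0.25, 0.5, 0.75, 1.0)]
# Every composite d stays within the bounds of Theorem 3 (up to rounding):
print(min(vals) >= d1 - 1e-12 and max(vals) <= d2 + 1e-12)
```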
Next let us consider the situation that either δ_{1 }= 0 or δ_{2 }= 0.
Corollary 1
Under the assumptions made above on standard deviations and correlation we obtain for the effect size: if δ_i = 0 for one i ∈ {1, 2}, then d(t) = λ_j μ_j(t) d_j / (λ_1 μ_1(t) + λ_2 μ_2(t)) for the remaining index j ≠ i.
Proof
First note that δ_i = 0 is equivalent to d_i = 0. Hence, using Theorem 2, we obtain d(t) = λ_j μ_j(t) d_j / (λ_1 μ_1(t) + λ_2 μ_2(t)) for the index j ∈ {1, 2} with δ_j ≠ 0.
If δ_1 = 0 ≠ δ_2 we conclude d(t) = λ_2 μ_2(t) d_2 / (λ_1 μ_1(t) + λ_2 μ_2(t)).
If δ_1 ≠ 0 = δ_2 we conclude d(t) = λ_1 μ_1(t) d_1 / (λ_1 μ_1(t) + λ_2 μ_2(t)).
Finally, let us compare, in the situations δ_1 = 0 or δ_2 = 0, the composed Cohen's d with the intuitive choice d(t) = λ_i d_i.
Corollary 2
Under the assumptions made above on standard deviations and correlation, and assuming μ_1(t) < μ_2(t) for t ∈ [t_min, t_max] and 0 < λ_1, λ_2 < 1, we obtain for the effect size: d(t) < λ_1 d_1 if δ_1 ≠ 0 = δ_2, and d(t) > λ_2 d_2 if δ_1 = 0 ≠ δ_2.
Proof
Using Corollary 1, in the case δ_1 ≠ 0 = δ_2 we obtain, since μ_1(t) < λ_1 μ_1(t) + λ_2 μ_2(t), d(t) = λ_1 μ_1(t) d_1 / (λ_1 μ_1(t) + λ_2 μ_2(t)) < λ_1 μ_1(t) d_1 / μ_1(t) = λ_1 d_1.
And in the case δ_1 = 0 ≠ δ_2 we obtain, since μ_2(t) > λ_1 μ_1(t) + λ_2 μ_2(t), d(t) = λ_2 μ_2(t) d_2 / (λ_1 μ_1(t) + λ_2 μ_2(t)) > λ_2 μ_2(t) d_2 / μ_2(t) = λ_2 d_2.
Discussion
Rather than drawing conclusions from clinical trials via the differences in the cumulative scores of clinical scales, it has become customary to calculate effect sizes, the intention being to allow comparison of the effect of treatments in the same indication while using different instruments. Using meta-analytic procedures, a pooled effect size is then calculated. Meta-analyses are assumed to be the tools to achieve an unbiased analysis of disease severity and the efficacy of treatments [1-4]. Meta-analyses thus are used to summarize results across studies and even across different indications. Considering the multitude of clinical trials and the multitude of treatments, such methods are urgently needed, and with certain study designs and endpoints this may be an appropriate procedure. It is one limitation of the present study that the demonstrated modulation of effect size calculation by the instruments applied and the disease stages analyzed applies only to additive scales. These, however, are used frequently in neurodegenerative disease, and it is therefore necessary to be aware of the methodological boundary conditions for the calculation of effect sizes for additive scales.
Simulation of the decline of function in neurodegenerative disease with a nonlinear representation of function demonstrates that the calculation of effect sizes for early and late patients is subject to distortion by differences in the vulnerability of brain tissue or task difficulty and scale construction, respectively. Effect sizes are not inert to disease progression and to the instruments used to detect it, and therefore do not replace experienced clinical assessment of disease impact and treatment effect. Meta-analyses must not pool effect sizes from clinical trials in patients with different severity of disease. Clearly, the use of the same scales across the whole disease process is not possible, because differences in task difficulty create floor and ceiling effects.
It has already been reported that the ADAS-cog and its subscales provide maximum information at moderate levels of cognitive dysfunction [25,26]. Raw score differences toward the lower and higher ends of the scale corresponded to large differences in cognitive dysfunction, whereas raw score differences toward the middle of the scale corresponded to smaller differences [25]. In more severe stages of dementia the ADAS-cog loses its sensitivity to change so much that the SIB was developed to assess patients who are unable to complete tests such as the ADAS-cog [18]. However, using different composite scales interchangeably is not possible, since the subscales are not scaled according to task difficulty, are not balanced across different neuropsychological functions, and are weighted differently in different composite scales. A recent post-hoc analysis of published data agrees well with the conclusions from the simulation and the mathematical analysis provided here [27]. In that study [27] it was shown that effect size calculation is subject to an interaction of cognitive domain, disease severity, and the instruments used for assessment.
In principle, these distortions by disease stage and by treatments affecting different functions within a given scale could be measured, and the mathematical analysis (above and appendix) shows a way to estimate the scaling factor that would need to be introduced. The analysis of current shortcomings would then need to be extended: in the present model we assume only two functions representing two activities, which already yields a scaling factor of up to δ_2/δ_1 (cf. above). Clinical scales such as the MMSE or the ADAS-cog are composed of a multitude of functions; when analyzing the ADAS-cog, for instance, at least four functions need to be considered: memory, orientation, language, and praxis. Estimating the relative scaling factors would therefore require a very large population.
It has been suggested to call effect sizes below 0.2 'small' and above 0.5 'medium' [28]. The above analysis demonstrates that naïve analysis of composite measures may bring about a false categorization of effect size. Effect size calculation for composite endpoints therefore cannot be used as a guideline for the judgment of therapeutic efficacy, for neurobiological and statistical reasons. The numerical value of the analysis depends on the choice of the instrument and is subject to distortion by disease progression. Calculation of effect sizes, therefore, cannot substitute for clinical assessment. Clinical expertise determines the choice of the instrument; the results therefore need to be interpreted with clinical expertise. Overall, statistical measures and meta-analyses of additive scales obfuscate, rather than clarify, the evidence on therapeutic efficacy in neurodegenerative disease.
In the past, clinical global assessments were the gold standard by which assessment scales were validated. In other words, scales were devised to act as a good proxy for clinical judgment that could be administered by less experienced clinicians. However, these scales clearly have great difficulties when extended over the range and time course of a degenerative disease. A more satisfactory method of measuring change than combining many less-than-satisfactory study results would be to design a more sensitive way of capturing the clinical assessment. Clinical assessment uses parallel processing and multiple inputs, which can account for variations in severity or even input from care persons. Perhaps a more detailed global assessment, with maybe 10 to 15 anchor points on a Likert scale, would allow clinicians to provide a far more nuanced assessment than the present 7-point (often condensed to 5-point) scale. For example, in most clinicians' view it requires much greater evidence and confidence to move from minimal to major improvement than from no change to minimal improvement, and yet these steps represent similar degrees of improvement on typical current global assessment scales. This tendency toward conservative 'no change' assessments, caused by the lack of sensitivity of the scale, may be why in the past the clinician's global assessment, whilst being the standard by which all patients in the real world and all other scales are assessed, has not been regarded as a useful tool in clinical trials.
Conclusions
In the face of the clear lack of credibility of pooling effect size calculations from grouped and yet disparate studies for meta-analysis, it may be time to put the clinical appraisal that has served for generations back where it belongs: as the cornerstone of our efficacy assessments and of decision making about the utility of treatments in neurodegenerative diseases.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
MWR, DW, and HF raised the ideas and elaborated the medical content. AB performed the mathematical proof. All authors read and approved the final manuscript.
Acknowledgements
The research was performed without external funding.
References

Chalmers TC, Sacks H: Randomized clinical trials in surgery.
N Engl J Med 1979, 301:1182. PubMed Abstract

Chalmers TC: Meta-analysis in clinical medicine.
Trans Am Clin Climatol Assoc 1988, 99:144150. PubMed Abstract  PubMed Central Full Text

Lau J, Chalmers TC: The rational use of therapeutic drugs in the 21st century. Important lessons from cumulative meta-analyses of randomized control trials.
Int J Technol Assess Health Care 1995, 11:509522. PubMed Abstract  Publisher Full Text

Sacks HS, Berrier J, Reitman D, Ancona-Berk VA, Chalmers TC: Meta-analyses of randomized controlled trials.
N Engl J Med 1987, 316:450455. PubMed Abstract  Publisher Full Text

Field AP, Gillett R: How to do a meta-analysis.
Br J Math Stat Psychol 2010, 63:665694. PubMed Abstract  Publisher Full Text

Hyman BT, Van Horsen GW, Damasio AR, Barnes CL: Alzheimer's disease: cellspecific pathology isolates the hippocampal formation.
Science 1984, 225:11681170. PubMed Abstract  Publisher Full Text

Hodges JR: Memory in the dementias. In The Oxford Handbook of Memory. Edited by Tulving E, Craik FIM. Oxford, New York: Oxford University Press; 2000:441459.

Marshall JC, Fink GR: Spatial cognition: where we were and where we are.
Neuroimage 2001, 14:S2S7. PubMed Abstract  Publisher Full Text

Save E, Poucet B: Hippocampalparietal cortical interactions in spatial cognition.
Hippocampus 2000, 10:491499. PubMed Abstract  Publisher Full Text

Godefroy O, Cabaret M, PetitChenal V, Pruvo JP, Rousseaux M: Control functions of the frontal lobes. Modularity of the centralsupervisory system?
Cortex 1999, 35:120. PubMed Abstract  Publisher Full Text

Nagahama Y, Okada T, Katsumi Y, Hayashi T, Yamauchi H, Oyanagi C, et al.: Dissociable mechanisms of attentional control within the human prefrontal cortex.
Cereb Cortex 2001, 11:8592. PubMed Abstract  Publisher Full Text

Rowe JB, Toni I, Josephs O, Frackowiak RS, Passingham RE: The prefrontal cortex: response selection or maintenance within working memory?
Science 2000, 288:16561660. PubMed Abstract  Publisher Full Text

Gron G, Bittner D, Schmitz B, Wunderlich AP, Riepe MW: Subjective memory complaints: objective neural markers in patients with Alzheimer's disease and major depressive disorder.
Ann Neurol 2002, 51:491498. PubMed Abstract  Publisher Full Text

Gron G, Riepe MW: Neural basis for the cognitive continuum in episodic memory from health to Alzheimer disease.
Am J Geriatr Psychiatry 2004, 12:648652. PubMed Abstract

Bittner D, Gron G, Schirrmeister H, Reske SN, Riepe MW: [18F]FDG-PET in patients with Alzheimer's disease: marker of disease spread.
Dement Geriatr Cogn Disord 2005, 19:2430. PubMed Abstract  Publisher Full Text

Rosen WG, Mohs RC, Davis KL: A new rating scale for Alzheimer's disease.
Am J Psychiatry 1984, 141:13561364. PubMed Abstract  Publisher Full Text

Folstein MF, Folstein SE, McHugh PR: "Mini-mental state". A practical method for grading the cognitive state of patients for the clinician.
J Psychiatr Res 1975, 12:189198. PubMed Abstract  Publisher Full Text

Saxton J, Swihart AA: Neuropsychological assessment of the severely impaired elderly patient.
Clin Geriatr Med 1989, 5:531543. PubMed Abstract

Cummings JL, Mega M, Gray K, RosenbergThompson S, Carusi DA, Gornbein J: The Neuropsychiatric Inventory: comprehensive assessment of psychopathology in dementia.
Neurology 1994, 44:23082314. PubMed Abstract

Katz S, Ford AB, Moskowitz RW, Jackson BA, Jaffe MW: Studies of illness in the aged. The index of adl: A standardized measure of biological and psychosocial function.
JAMA 1963, 185:914919. PubMed Abstract  Publisher Full Text

Ulrich M, Jonas C, Gron G: Functional compensation of increasing memory encoding demands in the hippocampus.
Neuroreport 2010, 21:5963. PubMed Abstract  Publisher Full Text

Mendiondo MS, Ashford JW, Kryscio RJ, Schmitt FA: Modelling mini mental state examination changes in Alzheimer's disease.
Stat Med 2000, 19:16071616. PubMed Abstract  Publisher Full Text

Ashford JW, Kolm P, Colliver JA, Bekian C, Hsu LN: Alzheimer patient evaluation and the minimental state: item characteristic curve analysis.

Ashford JW, Shan M, Butler S, Rajasekar A, Schmitt FA: Temporal quantification of Alzheimer's disease severity: 'time index' model.
Dementia 1995, 6:269280. PubMed Abstract

Benge JF, Balsis S, Geraci L, Massman PJ, Doody RS: How well do the ADAS-cog and its subscales measure cognitive dysfunction in Alzheimer's disease?
Dement Geriatr Cogn Disord 2009, 28:6369. PubMed Abstract  Publisher Full Text

Panisset M, Roudier M, Saxton J, Boller F: Severe impairment battery. A neuropsychological test for severely demented patients.
Arch Neurol 1994, 51:4145. PubMed Abstract  Publisher Full Text

Riepe MW, Janetzky W, Lemming OM: Measuring therapeutic efficacy in patients with Alzheimer's disease: role of instruments.
Dement Geriatr Cogn Disord 2011, 31:233238. PubMed Abstract  Publisher Full Text

Cohen J: Statistical power analysis for the behavioral sciences. 2nd edition. Lawrence Erlbaum Associates; 1988.