Academic Cardiology, University of Hull, Kingston-upon-Hull, UK

NAPP Pharmaceuticals Research Limited, Cambridge, UK

Medical Statistics Unit, University of Sheffield, Sheffield, UK

Division of Primary Care & Psychological Medicine, University of Hull, Kingston-upon-Hull, UK

Abstract

Background

Many medical specialities have reviewed the statistical content of their journals. To our knowledge, this has not been done in general practice. Given the main role of a general practitioner as a diagnostician, we thought it would be of interest to see whether the statistical methods reported reflect the diagnostic process.

Methods

Hand search of three UK journals of general practice namely the

Results

A wide variety of statistical techniques were used. The most common methods included t-tests and Chi-squared tests. Few articles reported likelihood ratios or other useful diagnostic methods. There was evidence that the journals with the more thorough statistical review process reported more complex and more varied statistical techniques.

Conclusions

The

Background

"Diagnosis is the keystone of good medical practice"

General practitioners (GPs) are primarily diagnosticians

From a statistical viewpoint, the binary decision-making process has considerable appeal. For example, the use of the naïve Bayes discriminant function (and the likelihood ratios derived from it) is appropriate. Proponents of the Bayesian approach argue for its simplicity and ease of interpretation
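To make the link between likelihood ratios and the diagnostic process concrete, the following minimal sketch computes positive and negative likelihood ratios from a test's sensitivity and specificity, then updates a pre-test probability using the odds form of Bayes' theorem. The function names and the numerical values (sensitivity 0.90, specificity 0.80, pre-test probability 0.10) are assumptions chosen purely for illustration; they are not taken from any article in this review.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary diagnostic test."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability via the odds form of Bayes' theorem."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Hypothetical test characteristics, for illustration only.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.90, specificity=0.80)

# Hypothetical pre-test probability of disease of 10%.
p_after_positive = post_test_probability(0.10, lr_pos)
p_after_negative = post_test_probability(0.10, lr_neg)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
print(f"Post-test probability after a positive result: {p_after_positive:.3f}")
print(f"Post-test probability after a negative result: {p_after_negative:.3f}")
```

Under the naïve Bayes assumption that findings are conditionally independent given disease status, the likelihood ratios of several findings can be multiplied together before the final conversion from odds back to probability.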

Many medical journals, both generalist

Methods

Three statisticians (MJC, ASR and GKA), comprising one Professor, one Senior Lecturer and one Lecturer, two of whom hold Chartered status of the Royal Statistical Society, each reviewed one leading UK journal in general practice. The fourth author (NS) is a Primary Care Physician. The journals chosen were the

Classification of design methods (after Wang and Zhang, 1988) [19]

Design method

Case report

Cross-sectional survey

Retrospective study

Prospective study

Clinical trial

Basic science study

Classification of statistical methods (after Emerson and Colditz, 1983) [10]

| Category | Brief description |
| --- | --- |
| No statistical methods or descriptive statistics | No statistical content, or descriptive statistics only (e.g., percentages, means, standard deviations, standard errors, histograms) |
| Contingency tables | Chi-square tests, Fisher's test, McNemar's test |
| Multiway tables | Mantel-Haenszel procedure, log-linear models |
| Epidemiological studies | Relative risk, odds ratio, log odds, measures of association, sensitivity, specificity |
| t-tests | One-sample, matched pair, and two-sample t-tests |
| Pearson correlation | Classic product-moment correlation |
| Simple linear regression | Least-squares regression with one predictor and one response variable |
| Multiple regression | Includes polynomial regression and stepwise regression |
| Analysis of variance | Analysis of variance, analysis of covariance, and F-tests |
| Multiple comparisons | Procedures for handling multiple inferences on same data sets (e.g., Bonferroni techniques, Scheffé's contrasts, Duncan's multiple range procedures, Newman-Keuls procedure) |
| Non-parametric tests | Sign test, Wilcoxon signed ranks test, Mann-Whitney test, Spearman's rho, Kendall's tau, test for trend |
| Life table | Actuarial life table, Kaplan-Meier estimates of survival |
| Regression for survival | Includes Cox regression and logistic regression |
| Other survival analysis | Breslow's Kruskal-Wallis, log rank, Cox model for comparing survival |
| Adjustment & standardisation | Pertains to incidence rates and prevalence rates |
| Sensitivity analysis | Examines sensitivity of outcome to small changes in assumptions |
| Power | Loosely defined, includes use of the size of detectable (or useful) difference in determining sample size |
| Transformation | Use of data transformation (e.g., logs), often in regression |
| Cost-benefit analysis | The process of combining estimates of cost and health outcomes to compare policy alternatives |
| Other | Anything not fitting the above headings; includes cluster analysis, discriminant analysis, and some mathematical modelling |

The main study was preceded by a pilot phase in which the three statisticians classified a random sample of 10 articles by both statistical content and study design. We met once to discuss our classification system, and differences of opinion were resolved by consensus. One problem lay in how we classified study design: for example, one of us used the phrase 'cross-sectional survey' while another used 'questionnaire survey', although both meant the same study design. Another problem was that we missed some of the statistical techniques in articles that used many, so the main survey required much more careful reading of the articles. We did not carry out a formal reliability study of the pilot phase, relying instead on our experience both as statisticians and as journal reviewers. Similarly, we chose not to carry out a formal reliability analysis in the main study.
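For readers unfamiliar with what a formal reliability analysis of such classifications would involve, one common summary of agreement between two raters is Cohen's kappa, which compares observed agreement with the agreement expected by chance. The sketch below is illustrative only: the ratings are hypothetical and are not data from this review, and the function name `cohens_kappa` is an assumption.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters classifying the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must classify the same non-empty set of items.")
    n = len(ratings_a)

    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical category; kappa is undefined.
        raise ValueError("Kappa is undefined when chance agreement equals 1.")
    return (observed - expected) / (1.0 - expected)


# Hypothetical design classifications of 10 articles by two raters.
rater_1 = ["survey", "survey", "survey", "survey", "cohort",
           "cohort", "cohort", "RCT", "RCT", "RCT"]
rater_2 = ["survey", "survey", "survey", "cohort", "cohort",
           "cohort", "RCT", "RCT", "RCT", "RCT"]

kappa = cohens_kappa(rater_1, rater_2)
print(f"Cohen's kappa = {kappa:.3f}")
```

In this hypothetical example the raters agree on 8 of 10 articles (observed agreement 0.80) against a chance agreement of 0.33, giving a kappa of about 0.70.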

Results

The total number of articles reviewed over a one-year period was as follows:

Study design

The most common design was that of a cross-sectional survey being found in 24.1%, 39.3% and 35.1% of articles in the

Design methods

| Designs | BMJ n (%) | BJGP n (%) | Family Practice n (%) | Overall n (%) |
| --- | --- | --- | --- | --- |
| Cross-sectional survey | 19 (24.1) | 57 (39.3) | 31 (34.8) | 107 (35.1) |
| Qualitative study | 3 (3.8) | 16 (11.0) | 17 (21.0) | 36 (11.8) |
| Cohort study | 8 (10.1) | 21 (14.5) | 4 (4.9) | 33 (10.8) |
| RCT | 14 (17.7) | 7 (4.8) | 8 (9.9) | 29 (9.5) |
| Reviews | 4 (5.1) | 8 (5.5) | 2 (2.5) | 14 (4.6) |
| Reliability/diagnostic | 2 (2.5) | 8 (5.5) | 1 (1.2) | 11 (3.6) |
| Case-control study | 4 (5.1) | 1 (0.7) | 3 (3.7) | 8 (2.6) |
| Cluster RCT | 4 (5.1) | 1 (0.7) | 2 (2.3) | 7 (2.3) |
| Other | 21 (26.6) | 26 (17.9) | 13 (16.0) | 60 (19.7) |
| Total articles | 79 | 145 | 81 | 305 |

Note: RCT = randomised controlled trial.

Proportion of papers ranked by a qualitative design


Statistical methods

The range of statistical methods reported can be seen in Table

Statistical methods

| Methods | BMJ n (%) | BJGP n (%) | Family Practice n (%) | Overall n (%) |
| --- | --- | --- | --- | --- |
| No statistics or simple summaries | 23 (29.1) | 47 (32.4) | 33 (40.7) | 103 (33.8) |
| Chi-squared tests | 13 (16.5) | 40 (27.6) | 19 (23.5) | 72 (23.6) |
| t-tests | 7 (8.9) | 22 (15.2) | 17 (21.0) | 46 (15.1) |
| Logistic regression | 14 (17.7) | 19 (13.1) | 11 (13.6) | 44 (14.4) |
| Nonparametric | 11 (13.9) | 24 (16.6) | 4 (4.9) | 39 (12.8) |
| Odds ratios/relative risks | 11 (13.9) | 13 (9.0) | 14 (17.3) | 38 (12.5) |
| Regression | 9 (11.4) | 10 (6.9) | 11 (13.6) | 30 (9.8) |
| Sample size/power | 6 (7.6) | 17 (11.7) | 3 (3.7) | 26 (8.5) |
| Summaries with CIs | 9 (11.4) | 3 (2.1) | 6 (7.4) | 18 (5.9) |
| Kappa | 2 (2.5) | 9 (6.2) | 4 (4.9) | 15 (4.9) |
| Sensitivity/specificity | 4 (5.1) | 10 (6.9) | 1 (1.2) | 15 (4.9) |
| Pearson correlation | 2 (2.5) | 6 (4.1) | 6 (7.4) | 14 (4.6) |
| Multiple comparisons | 2 (2.5) | 4 (2.8) | 4 (4.9) | 10 (3.3) |
| ANOVA | 5 (6.3) | 4 (2.8) | | 9 (3.0) |
| Mantel-Haenszel | 1 (1.3) | 5 (3.4) | 2 (2.5) | 8 (2.6) |
| Random effects models | 4 (5.1) | 4 (2.8) | | 8 (2.6) |
| Cronbach's alpha | 1 (1.3) | 5 (3.4) | 1 (1.2) | 7 (2.3) |
| Fisher's exact test | | 7 (4.8) | | 7 (2.3) |
| Likelihood ratio | 3 (3.8) | 3 (2.1) | | 6 (2.0) |
| Survival analysis | 6 (7.6) | | | 6 (2.0) |
| Other | 4 (5.1) | 37 (25.2) | 10 (12.3) | 51 (16.7) |
| Total articles | 79 | 145 | 81 | 305 |

Notes: CIs = confidence intervals. ANOVA = analysis of variance.

One-third of all articles reported no statistics or simple summaries (for example, mean, median, percentage, standard deviation, interquartile range). No journal article with a qualitative design had any statistical content.

A large number of articles reported other statistical methods, in particular the

Table

Ranking of statistical techniques

| Methods | BMJ rank | BJGP rank | Family Practice rank |
| --- | --- | --- | --- |
| Chi-squared tests | 2 | 1 | 1 |
| t-tests | 7 | 3 | 2 |
| Logistic regression | 1 | 4 | 4.5 |
| Nonparametric | 3.5 | 2 | 9 |
| Odds ratios/relative risks | 3.5 | 6 | 3 |
| Regression | 5.5 | 7.5 | 4.5 |
| Sample size/power | 8.5 | 5 | 11 |
| Summaries with CIs | 5.5 | 17 | 6.5 |
| Kappa | 15 | 9 | 9 |
| Sensitivity/specificity | 11.5 | 7.5 | 13.5 |
| Pearson correlation | 15 | 11 | 6.5 |
| Multiple comparisons | 15 | 15 | 9 |
| ANOVA | 10 | 15 | |
| Mantel-Haenszel | 17.5 | 12.5 | 12 |
| Random effects models | 11.5 | 15 | |
| Cronbach's alpha | 17.5 | 12.5 | 13.5 |
| Fisher's exact test | | 10 | |
| Likelihood ratio | 13 | 17 | |
| Survival analysis | 8.5 | | |

Notes: Excluding other methods and no statistics/simple summaries. CI = confidence interval. ANOVA = analysis of variance.
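The fractional ranks in the ranking table (for example 3.5 or 17.5) are consistent with the usual convention of giving tied methods the mean of the ranks they jointly occupy. The sketch below illustrates that convention using the BMJ article counts reported in the statistical methods table; the function name `average_ranks_descending` is an assumption, and the code is an illustration of the convention rather than a description of how the original ranking was produced.

```python
def average_ranks_descending(counts):
    """Rank items from most to least frequent; tied items share their mean rank."""
    ordered = sorted(counts.items(), key=lambda item: -item[1])
    ranks = {}
    i = 0
    while i < len(ordered):
        # Extend j to the last item tied with the item at position i.
        j = i
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        # Positions i..j correspond to ranks i+1..j+1; assign their mean.
        mean_rank = (i + 1 + j + 1) / 2
        for name, _ in ordered[i:j + 1]:
            ranks[name] = mean_rank
        i = j + 1
    return ranks


# BMJ counts from the statistical methods table, excluding "Other" and
# "No statistics or simple summaries".
bmj_counts = {
    "Chi-squared tests": 13,
    "t-tests": 7,
    "Logistic regression": 14,
    "Nonparametric": 11,
    "Odds ratios/relative risks": 11,
    "Regression": 9,
    "Sample size/power": 6,
    "Summaries with CIs": 9,
    "Kappa": 2,
    "Sensitivity/specificity": 4,
    "Pearson correlation": 2,
    "Multiple comparisons": 2,
    "ANOVA": 5,
    "Mantel-Haenszel": 1,
    "Random effects models": 4,
    "Cronbach's alpha": 1,
    "Likelihood ratio": 3,
    "Survival analysis": 6,
}

bmj_ranks = average_ranks_descending(bmj_counts)
for method, rank in sorted(bmj_ranks.items(), key=lambda item: item[1]):
    print(f"{method}: {rank:g}")
```

For these counts the two methods used in 11 articles share rank 3.5, the three used in 2 articles share rank 15, and the two used in 1 article share rank 17.5, matching the BMJ column of the ranking table.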

Discussion

Two-thirds of all journal articles relied on some type of statistical analysis beyond descriptive statistics (Table

Although these three journals publish a large proportion of the research in general practice within the UK, they by no means represent all of it. To examine this further, we undertook a MEDLINE search for the year 2000 using the key indexing phrase 'General Practice'. We found over 800 articles in a diversity of journals, in fields including rheumatology, medical ethics, obstetrics, public health, clinical pharmacology, clinical neurology and telemedicine, to name but a few.

We chose to look at the year 2000. Would our results have been different had we selected a different year? The published literature suggests otherwise. In a 20-year-old study, Emerson and Colditz

Now let us turn to study design. The randomised controlled trial (RCT) is considered to be the gold standard research design. It has been acknowledged that carrying out RCTs in general practice is difficult

What are the issues here? Are they really that different from secondary care? A recent publication posed the question 'What do residents really need to know about statistics?'

Conclusions

For all three journals there was a dearth of articles reflecting the diagnostic process. Why is this? It has already been said that diagnosis is the Achilles heel of GPs

Competing interests

The author(s) declare that they have no competing interests.

Authors' contributions

Three authors (ASR, GKA and MJC) carried out the literature review while all four authors contributed to the writing.

Acknowledgements

We wish to thank the referees for their constructive comments.
