Southampton Statistical Sciences Research Institute, Mathematics and Medicine, University of Southampton, Southampton SO17 1BJ, UK

Institute of Statistics, Graz University of Technology, Kopernikusgasse 24/III, 8010 Graz, Austria

Landeszentrum Gesundheit Nordrhein-Westfalen / NRW Centre for Health, Westerfeldstr. 35/37, 33609 Bielefeld, Germany

Abstract

Background

Life expectancy is of increasing interest for a variety of reasons. In many countries, life expectancy is growing linearly, without any indication of reaching a limit. The state of North Rhine–Westphalia (NRW) in Germany, with its 54 districts, is considered here; the above-mentioned growth in life expectancy is occurring there as well. However, there is also empirical evidence that life expectancy is not growing linearly at the same level for different regions.

Methods

To explore this situation further, a likelihood-based cluster analysis is suggested and performed. The modelling uses a nonparametric mixture approach for the latent random effect. Maximum likelihood estimates are determined by means of the EM algorithm, and the number of components in the mixture model is found on the basis of the Bayesian Information Criterion. Regions are classified into the mixture components (clusters) using the maximum posterior allocation rule.

Results

For the data analyzed here, 7 components are found, with a spatial concentration of lower life expectancy levels in the centre of NRW. This area, formerly an enormous conglomerate of heavy industry, is still the most densely populated part of the state, and Gelsenkirchen has the lowest level of life expectancy growth for both genders. The paper offers some explanations for this fact, including demographic and socio-economic sources.

Conclusions

This case study shows that life expectancy growth is largely linear, but that it may occur at different levels.

Background

Life expectancy in Germany is increasing steadily at a linear rate. This corresponds to a world-wide trend, despite controversial statements (see also Oeppen and Vaupel

In a nutshell, the approach is as follows. For each of the 54 regions a straight line model $Y_t = \alpha + \beta t + \epsilon_t$ at year

**Life expectancy for men and women, respectively, for 10 randomly selected regions; the lines are regression lines fitted separately for each of the 10 regions.**

Data

NRW is the most populous state of Germany, with four of the country’s ten largest cities. Its capital is Düsseldorf. The state consists of five provinces (Regierungsbezirke), until 2010 divided into 31 rural districts (Kreise) and 23 urban districts (kreisfreie Städte), forming the above mentioned total of 54 districts which is the basis of our analysis. The underlying dataset ‘LifeexpectancyNRW.xls’ consists of two sheets, separately aggregated according to gender, each with

–

–

–

–

Life expectancy is an important demographic indicator which is computed on the basis of the life-table technique. A birth cohort is followed over time and, on the basis of the number of persons who died in every life year, mortality rates are determined which allow the computation of life expectancy. Life expectancy can be calculated conditional upon having reached any given age, though it is typically considered from birth, as done here. To provide timely estimates of life expectancy, the current force of mortality (here for NRW) is applied to a hypothetical cohort; this provides the data used in this study. Life expectancy computed in this way has to be interpreted as the expected lifetime of a newborn for the period in which the life table used was computed. For more details on life table techniques see Hinde
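
The period life-table computation described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: the death probabilities are hypothetical (they are not NRW data), age intervals are single years, deaths are assumed to occur on average half-way through the year, and the table is closed at the last age. The function name `period_life_expectancy` is likewise hypothetical.

```python
def period_life_expectancy(q):
    """Life expectancy at birth from age-specific death probabilities.

    q[x] is the probability of dying between exact ages x and x + 1
    (hypothetical input; the last entry should be 1.0 to close the table).
    """
    survivors = 1.0      # l_x: share of the hypothetical cohort alive at age x
    person_years = 0.0   # running sum of the person-years lived, L_x
    for q_x in q:
        deaths = survivors * q_x
        # Assumption: those who die live half of the year on average.
        person_years += survivors - 0.5 * deaths
        survivors -= deaths
    return person_years  # equals e_0 because the cohort starts with l_0 = 1


# Hypothetical three-year table: nobody dies in the first two years,
# everybody dies during the third year.
toy_q = [0.0, 0.0, 1.0]
e0 = period_life_expectancy(toy_q)
print(e0)
```

Applying current death probabilities to a hypothetical cohort in this way is what makes the resulting figure a period measure rather than the experience of a real birth cohort.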

The Table

| Number | Name | Number | Name |
|---|---|---|---|
| 1 | Düsseldorf | 28 | Bottrop |
| 2 | Duisburg | 29 | Gelsenkirchen |
| 3 | Essen | 30 | Münster |
| 4 | Krefeld | 31 | Borken |
| 5 | Mönchengladbach | 32 | Coesfeld |
| 6 | Mülheim a.d. Ruhr | 33 | Recklinghausen |
| 7 | Oberhausen | 34 | Steinfurt |
| 8 | Remscheid | 35 | Warendorf |
| 9 | Solingen | 36 | Bielefeld |
| 10 | Wuppertal | 37 | Gütersloh |
| 11 | Kleve | 38 | Herford |
| 12 | Mettmann | 39 | Höxter |
| 13 | Neuss | 40 | Lippe |
| 14 | Viersen | 41 | Minden-Lübbecke |
| 15 | Wesel | 42 | Paderborn |
| 16 | Aachen (city) | 43 | Bochum |
| 17 | Bonn | 44 | Dortmund |
| 18 | Köln | 45 | Hagen |
| 19 | Leverkusen | 46 | Hamm |
| 20 | Aachen (rural) | 47 | Herne |
| 21 | Düren | 48 | Ennepe-Ruhr-Kreis |
| 22 | Erftkreis | 49 | Hochsauerlandkreis |
| 23 | Euskirchen | 50 | Märkischer Kreis |
| 24 | Heinsberg | 51 | Olpe |
| 25 | Oberbergischer Kreis | 52 | Siegen-Wittgenstein |
| 26 | Rheinisch-Bergischer Kreis | 53 | Soest |
| 27 | Rhein-Sieg-Kreis | 54 | Unna |

Life expectancy is linearly growing in all regions in NRW as Figure

Methods

Model and associated likelihoods

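We assume that the life expectancy $Y_{it}$ in region $i$ ($i = 1, \ldots, n$, with $n = 54$) at year $t$ ($t = 1, \ldots, T$, with $T = 21$) arises from one of $J$ components,

$$Y_{it} = \alpha_j + \beta t + \epsilon_{it},$$

and that within this component

$$f(\mathbf{y}_i \mid \alpha_j, \beta, \sigma^2) = \prod_{t=1}^{T} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y_{it} - \alpha_j - \beta t)^2}{2\sigma^2} \right\}, \qquad (2)$$

where $\mathbf{y}_i = (y_{i1}, \ldots, y_{iT})^T$ is the vector of observations for region $i$ and $\sigma^2$ is the variance parameter of the mean-zero normal random error $\epsilon_{it}$.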

We should point out that we assume here that repeated observations of life expectancy are independent for the 21 observation years, conditional upon component membership. This assumption could be relaxed by introducing a covariance structure for $\mathbf{y}_i$ (which would ultimately lead to mixtures of multivariate normals). However, we prefer to remain in the spirit of random effects modelling where we assume that covariance structures are coped with by the introduction of random effects.

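Since we do not observe component membership, we can only work with the marginal distribution, which is the nonparametric mixture

$$f(\mathbf{y}_i) = \sum_{j=1}^{J} p_j \, f(\mathbf{y}_i \mid \alpha_j, \beta, \sigma^2),$$

where the $p_j$ represent the unknown weights of the components in the population.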

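Consequently, the observable mixture model likelihood for the $n = 54$ regions is

$$L = \prod_{i=1}^{n} \sum_{j=1}^{J} p_j \, f(\mathbf{y}_i \mid \alpha_j, \beta, \sigma^2), \qquad (4)$$

which needs to be maximized in $p_j$, $\alpha_j$ for $j = 1, \ldots, J$, $\beta$ and $\sigma^2$. Note the special form of the likelihood in its hierarchical structure. Conditional upon component membership it assumes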

Note that this form of random effects modelling is not uncommon for this situation (Arminger

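Since the observed likelihood function is difficult to maximize in the parameters, we consider the unobserved (complete-data) likelihood typical for mixture problems of this kind. Let $z_{ij}$ denote the unobserved indicator informing about component membership. In other words, $z_{ij} = 1$ if the $i$-th region belongs to component $j$, and $z_{ij} = 0$ otherwise. The unobserved likelihood is

$$\prod_{i=1}^{n} \prod_{j=1}^{J} \left[ p_j \, f(\mathbf{y}_i \mid \alpha_j, \beta, \sigma^2) \right]^{z_{ij}}, \qquad (5)$$

showing that the likelihood can now be separately maximized in $\alpha_j$, for $j = 1, \ldots, J$, $\beta$ and $\sigma^2$ on one hand, and $p_j$ for $j = 1, \ldots, J$ on the other.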

The problem is well-posed in the sense that if $\sigma^2$ approaches 0 (see also Wang

We use the following definition for the BIC

where

where

Expectation-maximization (EM) algorithm

To estimate the parameters by maximum likelihood we will use the EM algorithm (Dempster, Laird, and Rubin

E-step

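In the E-step the unobserved indicator variables $z_{ij}$ are replaced by their expected values conditional upon the current parameter estimates and the data $y_{it}$,

$$e_{ij} = E(z_{ij} \mid \mathbf{y}_i; \, p_j, \alpha_j, \beta, \sigma^2).$$

These expected values can be easily computed using Bayes' theorem as

$$e_{ij} = \frac{p_j \, f(\mathbf{y}_i \mid \alpha_j, \beta, \sigma^2)}{\sum_{l=1}^{J} p_l \, f(\mathbf{y}_i \mid \alpha_l, \beta, \sigma^2)}$$

and can be interpreted as the posterior probability that region $i$ belongs to component $j$. Note that $e_{ij} \geq 0$ and $\sum_{j=1}^{J} e_{ij} = 1$.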

M-step

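It is easy to see that the likelihood (5) is maximized for $p_j$ as

$$\hat{p}_j = \frac{1}{n} \sum_{i=1}^{n} e_{ij},$$

where $n$ is the number of regions. For the remainder we concentrate on

$$\sum_{i=1}^{n} \sum_{j=1}^{J} e_{ij} \log f(\mathbf{y}_i \mid \alpha_j, \beta, \sigma^2) = \sum_{i=1}^{n} \sum_{j=1}^{J} e_{ij} \left[ -\frac{T}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{t=1}^{T} (y_{it} - \alpha_j - \beta t)^2 \right].$$

Setting the partial derivative with respect to $\alpha_j$ to 0 leads to

$$\hat{\alpha}_j = \frac{\sum_{i=1}^{n} e_{ij} \sum_{t=1}^{T} (y_{it} - \beta t)}{T \sum_{i=1}^{n} e_{ij}}.$$

Similarly, setting all other partial derivatives to 0 we achieve

$$\hat{\beta} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{J} e_{ij} \sum_{t=1}^{T} t \, (y_{it} - \alpha_j)}{n \sum_{t=1}^{T} t^2}$$

and

$$\hat{\sigma}^2 = \frac{1}{nT} \sum_{i=1}^{n} \sum_{j=1}^{J} e_{ij} \sum_{t=1}^{T} (y_{it} - \alpha_j - \beta t)^2.$$

Here $e_{ij}$, $p_j$, $\alpha_j$, $\beta$ and $\sigma^2$ on the right-hand sides refer to the values of these parameters in the previous cycle of the EM algorithm.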

The EM algorithm toggles between E- and M-step until convergence, say until

Initial values

We need to compute initial values for the variables $e_{ij}$, $p_j$, $\alpha_j$, $\beta$ and $\sigma^2$. Only for this purpose we fit the following linear model:

$$Y_{it} = \alpha_i + \beta_i t + \epsilon_{it}$$

for each region $i$, giving estimates $\hat{\alpha}_i$, $\hat{\beta}_i$ and

Additionally we initialize $e_{ij}$, $p_j$, $\alpha_j$, $\beta$ and $\sigma^2$. With these values we compute the (maximized) likelihood (4).
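
The E- and M-steps can be put together in a short sketch. This is not the authors' implementation: the function name `fit_em`, the simulated data and the choice of initial values (intercepts at evenly spaced order statistics of the detrended region levels) are assumptions made purely for illustration. The sketch also uses the fact that, for a balanced design in which every region is observed at the same time points, the likelihood-maximizing common slope is the pooled within-region least-squares slope, so it is computed once instead of being updated in every cycle.

```python
import math
import random


def fit_em(y, times, J, max_iter=500, tol=1e-8):
    """EM for y[i][k] = alpha_j + beta * times[k] + error (common beta, sigma2)."""
    n, T = len(y), len(times)
    t_bar = sum(times) / T
    s_tt = sum((t - t_bar) ** 2 for t in times)
    # Balanced design: the maximizing common slope is the pooled
    # within-region slope and does not depend on the membership weights.
    beta = sum((t - t_bar) * y_i[k]
               for y_i in y for k, t in enumerate(times)) / (n * s_tt)
    # Detrended region levels: mean over t of (y_it - beta * t).
    level = [sum(y_i[k] - beta * t for k, t in enumerate(times)) / T for y_i in y]
    # Illustrative initial values (an assumption, not the paper's scheme).
    ordered = sorted(level)
    alpha = [ordered[int((j + 0.5) * n / J)] for j in range(J)]
    p = [1.0 / J] * J
    sigma2 = sum((y_i[k] - lv - beta * t) ** 2
                 for y_i, lv in zip(y, level)
                 for k, t in enumerate(times)) / (n * T)
    old_loglik = -math.inf
    for _ in range(max_iter):
        # E-step: posterior probabilities e[i][j] by Bayes' theorem,
        # evaluated on the log scale to avoid numerical underflow.
        e, loglik = [], 0.0
        for y_i in y:
            logw = []
            for j in range(J):
                rss = sum((y_i[k] - alpha[j] - beta * t) ** 2
                          for k, t in enumerate(times))
                logw.append(math.log(max(p[j], 1e-300))
                            - 0.5 * T * math.log(2 * math.pi * sigma2)
                            - rss / (2 * sigma2))
            m = max(logw)
            total = sum(math.exp(v - m) for v in logw)
            loglik += m + math.log(total)
            e.append([math.exp(v - m) / total for v in logw])
        # M-step: weights, weighted mean levels, pooled residual variance.
        for j in range(J):
            w_j = sum(e_i[j] for e_i in e)
            p[j] = w_j / n
            if w_j > 0:
                alpha[j] = sum(e_i[j] * lv for e_i, lv in zip(e, level)) / w_j
        sigma2 = sum(e_i[j] * (y_i[k] - alpha[j] - beta * t) ** 2
                     for y_i, e_i in zip(y, e) for j in range(J)
                     for k, t in enumerate(times)) / (n * T)
        if abs(loglik - old_loglik) < tol:
            break
        old_loglik = loglik
    # Maximum posterior allocation: component with the largest e[i][j].
    classes = [max(range(J), key=e_i.__getitem__) for e_i in e]
    return {"p": p, "alpha": alpha, "beta": beta, "sigma2": sigma2,
            "e": e, "classes": classes, "loglik": loglik}


# Simulated example (not NRW data): 20 regions, 21 years, two levels.
random.seed(1)
times = list(range(21))
true_alpha = [70.0] * 10 + [73.0] * 10
y = [[a + 0.25 * t + random.gauss(0.0, 0.2) for t in times] for a in true_alpha]
fit = fit_em(y, times, J=2)
print(fit["classes"], round(fit["beta"], 3), [round(a, 2) for a in fit["alpha"]])
```

Because the slope does not depend on the membership weights in this balanced setting, the estimated slope is the same for every choice of $J$; only the weights, the intercepts and the variance change with the number of components.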

Results

Cluster structure

Table _{2} show the same behaviour but have the minimum at ^{2}. The choice of ^{2} in dependence of

| J | β (men) | β (women) | σ (men) | σ (women) | BIC$_1$ (men) | BIC$_1$ (women) | BIC$_2$ (men) | BIC$_2$ (women) |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.2560 | 0.1673 | 0.8969 | 0.5147 | 1503.98 | 1497.21 | 1513.12 | 1506.34 |
| 2 | 0.2560 | 0.1673 | 0.4407 | 0.2593 | 1499.24 | 1359.64 | 1514.46 | 1374.86 |
| 3 | 0.2560 | 0.1673 | 0.3079 | 0.2088 | 1455.63 | 1294.24 | 1476.94 | 1315.56 |
| 4 | 0.2560 | 0.1673 | 0.2651 | 0.1882 | 1434.90 | 1264.26 | 1462.30 | 1291.66 |
| 5 | 0.2560 | 0.1673 | 0.2308 | 0.1760 | 1396.11 | 1246.16 | 1429.60 | 1279.65 |
| 6 | 0.2560 | 0.1673 | 0.2114 | 0.1655 | 1369.02 | 1239.95 | **1408.60** | 1279.53 |
| 7 | 0.2560 | 0.1673 | 0.2045 | 0.1594 | **1364.20** | 1232.36 | 1409.87 | **1278.02** |
| 8 | 0.2560 | 0.1673 | 0.2014 | 0.1566 | 1365.88 | **1229.81** | 1417.63 | 1281.57 |
| 9 | 0.2560 | 0.1673 | 0.2014 | 0.1547 | 1373.86 | 1232.48 | 1431.70 | 1290.33 |
| 10 | 0.2560 | 0.1673 | 0.2012 | 0.1545 | 1381.61 | 1240.11 | 1445.55 | 1304.05 |
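
Selecting the number of components from the tabulated BIC values amounts to picking the $J$ with the smallest criterion. The sketch below does this for the four BIC columns of the table above. It also checks one observation about the numbers: for every $J$ the two criteria differ by $(2J + 1)\log 21$. That pattern is what one would see if the two versions shared the same log-likelihood and used $k = 2J + 1$ parameters in a textbook penalty $k \log(\cdot)$ evaluated at the number of regions (54) and at the number of observations (54 × 21). This reading is an assumption and is not taken from the paper's own BIC definitions; `best_j` and `bic` are hypothetical helper names.

```python
import math

# BIC values transcribed from the table (rows J = 1, ..., 10).
bic1_men = [1503.98, 1499.24, 1455.63, 1434.90, 1396.11,
            1369.02, 1364.20, 1365.88, 1373.86, 1381.61]
bic1_women = [1497.21, 1359.64, 1294.24, 1264.26, 1246.16,
              1239.95, 1232.36, 1229.81, 1232.48, 1240.11]
bic2_men = [1513.12, 1514.46, 1476.94, 1462.30, 1429.60,
            1408.60, 1409.87, 1417.63, 1431.70, 1445.55]
bic2_women = [1506.34, 1374.86, 1315.56, 1291.66, 1279.65,
              1279.53, 1278.02, 1281.57, 1290.33, 1304.05]


def best_j(bic_values):
    """Number of components (1-based) with the smallest BIC."""
    return min(range(len(bic_values)), key=bic_values.__getitem__) + 1


def bic(log_lik, k, n_obs):
    """Textbook BIC, -2 log L + k log(n_obs); smaller is better."""
    return -2.0 * log_lik + k * math.log(n_obs)


selected = {
    "BIC1 men": best_j(bic1_men),
    "BIC1 women": best_j(bic1_women),
    "BIC2 men": best_j(bic2_men),
    "BIC2 women": best_j(bic2_women),
}
print(selected)

# Gap between the two criteria, observed versus the assumed penalty form.
gap_check = []
for J in range(1, 11):
    k = 2 * J + 1  # assumed parameter count: J - 1 weights, J intercepts, beta, sigma2
    assumed_gap = bic(0.0, k, 54 * 21) - bic(0.0, k, 54)
    observed_gap = bic2_men[J - 1] - bic1_men[J - 1]
    gap_check.append((J, round(observed_gap, 2), round(assumed_gap, 2)))
print(gap_check)
```

The minima found this way are the entries printed in bold in the table; they lie between 6 and 8 components depending on the criterion and the gender.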

| J | p (men) | p (women) | α (men) | α (women) |
|---|---|---|---|---|
| 7 | 0.0556 | 0.0185 | 70.32 | 77.27 |
|  | 0.1841 | 0.1112 | 71.19 | 78.08 |
|  | 0.1529 | 0.2655 | 71.76 | 78.54 |
|  | 0.2711 | 0.1978 | 72.23 | 78.84 |
|  | 0.1700 | 0.1684 | 72.72 | 79.29 |
|  | 0.0922 | 0.2016 | 73.07 | 79.63 |
|  | 0.0740 | 0.0371 | 73.71 | 80.22 |

Maximum a posteriori classification

Men

Since each $e_{ij}$ describes the probability that region $i$ belongs to component $j$, region $i$ is classified into the component for which $e_{ij}$ is largest.

The classification tables give, for each region, the assigned class and the posterior probabilities $e_{ij}$ (rounded to 2 digits after the decimal point). Note that in all cases the classification is unique in the sense that there is a high classification probability for a particular component. Now we are able to construct a graph in which the data points are coloured by the component to which they belong (Figure

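| Region | Name | Class | $e_{i1}$ | $e_{i2}$ | $e_{i3}$ | $e_{i4}$ | $e_{i5}$ | $e_{i6}$ | $e_{i7}$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Düsseldorf | 2 | 0 | 0.95 | 0.05 | 0 | 0 | 0 | 0 |
| 2 | Duisburg | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Essen | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 4 | Krefeld | 4 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 5 | Mönchengladbach | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 6 | Mülheim a.d. Ruhr | 4 | 0 | 0 | 0.01 | 0.99 | 0 | 0 | 0 |
| 7 | Oberhausen | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | Remscheid | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 9 | Solingen | 4 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 10 | Wuppertal | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 11 | Kleve | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 12 | Mettmann | 6 | 0 | 0 | 0 | 0 | 0.03 | 0.97 | 0 |
| 13 | Neuss | 6 | 0 | 0 | 0 | 0 | 0.01 | 0.99 | 0 |
| 14 | Viersen | 4 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 15 | Wesel | 4 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 16 | Aachen (city) | 6 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 17 | Bonn | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 18 | Köln | 4 | 0 | 0 | 0.03 | 0.97 | 0 | 0 | 0 |
| 19 | Leverkusen | 5 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 20 | Aachen (rural) | 4 | 0 | 0 | 0.16 | 0.84 | 0 | 0 | 0 |
| 21 | Düren | 4 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 22 | Erftkreis | 5 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 23 | Euskirchen | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 24 | Heinsberg | 4 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 25 | Oberbergischer Kreis | 4 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 26 | Rheinisch-Bergischer Kreis | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 27 | Rhein-Sieg-Kreis | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 28 | Bottrop | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 29 | Gelsenkirchen | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 30 | Münster | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 31 | Borken | 4 | 0 | 0 | 0 | 0.96 | 0.04 | 0 | 0 |
| 32 | Coesfeld | 6 | 0 | 0 | 0 | 0 | 0.14 | 0.86 | 0 |
| 33 | Recklinghausen | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 34 | Steinfurt | 5 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 35 | Warendorf | 5 | 0 | 0 | 0 | 0 | 0.96 | 0.04 | 0 |
| 36 | Bielefeld | 5 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 37 | Gütersloh | 6 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 38 | Herford | 5 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 39 | Höxter | 5 | 0 | 0 | 0 | 0 | 0.99 | 0.01 | 0 |
| 40 | Lippe | 5 | 0 | 0 | 0 | 0 | 0.92 | 0.08 | 0 |
| 41 | Minden-Lübbecke | 4 | 0 | 0 | 0.01 | 0.99 | 0 | 0 | 0 |
| 42 | Paderborn | 5 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 43 | Bochum | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 44 | Dortmund | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 45 | Hagen | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 46 | Hamm | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 47 | Herne | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 48 | Ennepe-Ruhr-Kreis | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 49 | Hochsauerlandkreis | 4 | 0 | 0 | 0 | 0.88 | 0.12 | 0 | 0 |
| 50 | Märkischer Kreis | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 51 | Olpe | 4 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 52 | Siegen-Wittgenstein | 4 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 53 | Soest | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 54 | Unna | 3 | 0 | 0 | 0.99 | 0.01 | 0 | 0 | 0 |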
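
The maximum posterior allocation rule behind the Class column can be illustrated directly with rows of the table above. The helper name `map_class` is hypothetical; the two probability vectors are the tabulated values for Aachen (rural) and Düsseldorf.

```python
def map_class(posterior):
    """1-based index of the component with the largest posterior probability."""
    return max(range(len(posterior)), key=posterior.__getitem__) + 1


# Posterior probabilities e_i1, ..., e_i7 from the men's classification table.
aachen_rural = [0, 0, 0.16, 0.84, 0, 0, 0]
duesseldorf = [0, 0.95, 0.05, 0, 0, 0, 0]

classes = (map_class(aachen_rural), map_class(duesseldorf))
print(classes)
```

Aachen (rural) has the smallest maximum posterior probability among the men's classifications (0.84), which is still a clear allocation.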

**Life expectancy for men, coloured by the component membership.**

In addition to the data points we have included in Figure

Women

In the following table the classification and the posterior probabilities $e_{ij}$ are given for women.

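| Region | Name | Class | $e_{i1}$ | $e_{i2}$ | $e_{i3}$ | $e_{i4}$ | $e_{i5}$ | $e_{i6}$ | $e_{i7}$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Düsseldorf | 3 | 0 | 0 | 0.98 | 0.02 | 0 | 0 | 0 |
| 2 | Duisburg | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 3 | Essen | 3 | 0 | 0.05 | 0.95 | 0 | 0 | 0 | 0 |
| 4 | Krefeld | 4 | 0 | 0 | 0.01 | 0.99 | 0 | 0 | 0 |
| 5 | Mönchengladbach | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 6 | Mülheim a.d. Ruhr | 4 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 7 | Oberhausen | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 8 | Remscheid | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 9 | Solingen | 4 | 0 | 0 | 0 | 0.96 | 0.04 | 0 | 0 |
| 10 | Wuppertal | 4 | 0 | 0 | 0.12 | 0.88 | 0 | 0 | 0 |
| 11 | Kleve | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 12 | Mettmann | 5 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 13 | Neuss | 5 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 14 | Viersen | 3 | 0 | 0 | 0.98 | 0.02 | 0 | 0 | 0 |
| 15 | Wesel | 4 | 0 | 0 | 0.13 | 0.87 | 0 | 0 | 0 |
| 16 | Aachen (city) | 5 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 17 | Bonn | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 18 | Köln | 3 | 0 | 0 | 0.88 | 0.12 | 0 | 0 | 0 |
| 19 | Leverkusen | 5 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 20 | Aachen (rural) | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 21 | Düren | 3 | 0 | 0 | 0.97 | 0.03 | 0 | 0 | 0 |
| 22 | Erftkreis | 3 | 0 | 0 | 0.83 | 0.17 | 0 | 0 | 0 |
| 23 | Euskirchen | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 24 | Heinsberg | 3 | 0 | 0 | 0.79 | 0.21 | 0 | 0 | 0 |
| 25 | Oberbergischer Kreis | 4 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 26 | Rheinisch-Bergischer Kreis | 6 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 27 | Rhein-Sieg-Kreis | 5 | 0 | 0 | 0 | 0 | 0.74 | 0.26 | 0 |
| 28 | Bottrop | 3 | 0 | 0.01 | 0.99 | 0 | 0 | 0 | 0 |
| 29 | Gelsenkirchen | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 30 | Münster | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 31 | Borken | 5 | 0 | 0 | 0 | 0 | 0.98 | 0.02 | 0 |
| 32 | Coesfeld | 6 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 33 | Recklinghausen | 2 | 0 | 0.94 | 0.06 | 0 | 0 | 0 | 0 |
| 34 | Steinfurt | 6 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 35 | Warendorf | 6 | 0 | 0 | 0 | 0 | 0.44 | 0.56 | 0 |
| 36 | Bielefeld | 6 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 37 | Gütersloh | 6 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 38 | Herford | 6 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 39 | Höxter | 6 | 0 | 0 | 0 | 0 | 0.09 | 0.91 | 0 |
| 40 | Lippe | 6 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 41 | Minden-Lübbecke | 6 | 0 | 0 | 0 | 0 | 0.01 | 0.99 | 0 |
| 42 | Paderborn | 6 | 0 | 0 | 0 | 0 | 0.02 | 0.98 | 0 |
| 43 | Bochum | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 44 | Dortmund | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 45 | Hagen | 4 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 46 | Hamm | 4 | 0 | 0 | 0.27 | 0.73 | 0 | 0 | 0 |
| 47 | Herne | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 48 | Ennepe-Ruhr-Kreis | 4 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 49 | Hochsauerlandkreis | 5 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 50 | Märkischer Kreis | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 51 | Olpe | 5 | 0 | 0 | 0 | 0 | 0.84 | 0.16 | 0 |
| 52 | Siegen-Wittgenstein | 5 | 0 | 0 | 0 | 0.06 | 0.94 | 0 | 0 |
| 53 | Soest | 4 | 0 | 0 | 0.32 | 0.68 | 0 | 0 | 0 |
| 54 | Unna | 4 | 0 | 0 | 0.07 | 0.93 | 0 | 0 | 0 |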

**Life expectancy for women, coloured by component membership.**

**Geographical map of the classification of the 54 regions of NRW into the 7 components of life expectancy, 1 (red) low to 7 (green) high, men.**

**Geographical map of the classification of the 54 regions of NRW into the 7 components of life expectancy, 1 (red) low to 7 (green) high, women.**

Explaining the cluster structure

The data also contain a variable characterizing each area as rural (= 0) or urban (= 1). The results can be summarized in the following cross-classified tables (Table

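| Men | 1 | 2 | 3 | 4 | 5 | 6 | 7 | Total |
|---|---|---|---|---|---|---|---|---|
| Rural | 0 | 1 | 6 | 11 | 7 | 4 | 2 | 31 |
| Urban | 3 | 9 | 2 | 4 | 2 | 1 | 2 | 23 |
| Total | 3 | 10 | 8 | 15 | 9 | 5 | 4 | 54 |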

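| Women | 1 | 2 | 3 | 4 | 5 | 6 | 7 | Total |
|---|---|---|---|---|---|---|---|---|
| Rural | 0 | 1 | 8 | 5 | 7 | 10 | 0 | 31 |
| Urban | 1 | 5 | 6 | 6 | 2 | 1 | 2 | 23 |
| Total | 1 | 6 | 14 | 11 | 9 | 11 | 2 | 54 |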

A simple chi-square test investigates the relation between these two categorical variables: the classification obtained from the cluster analysis and the binary variable rural/urban. For men we find $\chi^2 = 18.4645$ on 6 DF with p-value = 0.00517, which is highly significant. For women we find $\chi^2 = 15.3361$ on 6 DF with p-value = 0.0178, which is significant at the 5% level.
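
The two test statistics can be recomputed from the cross-classified counts given above. The following sketch uses only the standard library; the function names are hypothetical, and the p-value uses a closed-form expression for the chi-square upper tail that is valid for even degrees of freedom only (here 6).

```python
import math


def chi2_sf_even_df(x, df):
    """Upper tail probability of a chi-square variable (even df only)."""
    if df % 2:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))


def chi_square_test(table):
    """Pearson chi-square statistic, degrees of freedom and p-value."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    # Sum of (observed - expected)^2 / expected over all cells.
    stat = sum((obs - r * c / n) ** 2 / (r * c / n)
               for row, r in zip(table, row_tot)
               for obs, c in zip(row, col_tot))
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df, chi2_sf_even_df(stat, df)


# Rows: rural, urban; columns: components 1 to 7 (counts from the tables).
men = [[0, 1, 6, 11, 7, 4, 2], [3, 9, 2, 4, 2, 1, 2]]
women = [[0, 1, 8, 5, 7, 10, 0], [1, 5, 6, 6, 2, 1, 2]]

stat_men, df_men, p_men = chi_square_test(men)
stat_women, df_women, p_women = chi_square_test(women)
print(round(stat_men, 4), df_men, round(p_men, 5))
print(round(stat_women, 4), df_women, round(p_women, 4))
```

Several expected cell counts in these tables are small, so the chi-square approximation should be read with some caution.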

We conclude the results section with a final analysis as follows. We have done a separate cluster analysis for men and women. For men, a particular region will be classified into a component, but for women this region might be classified into a different component. To provide a consistent analysis both classifications should be correlated. This is what the last graphic is about. Figure

**Scatterplot of the life-expectancy level for women against the life-expectancy level for men of region $i$.**

Discussion and conclusion

The normal density (2) is frequently used as mixture kernel and is appropriate for our application. However, if necessary it allows easy extensions either in the mean structure or in the variance-covariance structure. First, one could allow component-specific variances, leading to $\sigma_j^2$. Second, instead of using a common slope model, this could be generalized to component-specific slopes $\beta_j$, leading to $f(\mathbf{y}_i \mid \alpha_j, \beta_j, \sigma^2)$ or $\sigma^2$ leading to

as before, and

which reflects the fact that there are component–specific slopes and

However, for our data constellation the proposed model (2) is appropriate as also Figure

Also, we have looked at the potential for curvature. This would correspond to an asymptotic change in life expectancy growth and relates to the discussion in Oeppen and Vaupel

A qualitatively different approach follows a conditional autoregressive model (CAR) which was originally suggested by Clayton and Kaldor

In North Rhine-Westphalia (NRW), there has been an apparently continuous rise in life expectancy at birth in men and women within the last twenty years. However, this pattern needs to be viewed in a differentiated way. Our analysis shows that in North Rhine-Westphalia, life expectancy is predominantly higher in rural than in urban districts and differs considerably by region. Within the observed period from 1990 to 2010, the estimated levels of life expectancy growth ranged from 70.3 to 73.7 years in men and from 77.3 to 80.2 years in women. Life expectancy in the 54 districts was influenced by a latent categorical variable, which consists of seven categories or clusters. Each of the 54 districts is allocated to one of the seven clusters. This latent variable might be a surrogate variable for socio-economic factors. Life expectancy, as well as its counterpart mortality, strongly depends on factors like education, income and occupational status, in addition to sex and age. The most recent analyses of the European Prospective Investigation into Cancer and Nutrition (EPIC) showed that total mortality among men with the highest education level is reduced by 43% compared to men with the lowest (hazard ratio (HR): 0.57, 95% confidence interval (CI) 0.52 – 0.61). Among women, the reduction was 29% (HR 0.71, 95% CI 0.64 – 0.78). In men, social inequalities were highly statistically significant for all causes of death examined. In women, the authors found a weaker, but statistically significant association with social inequalities for all causes of death except for cancer mortality and injuries (Gallo

These findings support results of a socio-spatial cluster analysis conducted in 2007 by Strohmeier

In relation to the NRW health indicators the authors found a significantly lower male and female average life expectancy in the poverty pole. In our analyses also more cities, especially of the Ruhr area, are categorized into the clusters of lower life expectancy. The Ruhr area is an urbanized, high density area comprising 11 cities and 4 counties with about 5 million inhabitants, formerly characterized by heavy industry and now undergoing a structural change towards e.g. information technology and health care industry. An additional underlying cause for lower life expectancy in this area might still be environmental. The Heinz Nixdorf RECALL study (Fuks

In a subgroup of the RECALL study population, namely participants residing in Essen (n=1,641) and Mülheim (n=1,742) for which digitized information on inner city roads was available, the prevalence of coronary heart disease at high traffic exposure showed a significantly elevated odds ratio (OR = 1.85, 95% CI 1.21 – 2.84, adjusted for cardiovascular risk factors and background air pollution) (Hoffmann

The cluster analysis of life expectancy once more stresses the differences between urban and rural regions in North Rhine-Westphalia. The latent component categorizing the 54 districts into seven categories can be interpreted as a surrogate comprising several underlying factors. The results point to districts where an accumulation of problems has a negative impact on health. For males, only three cities are classified into the lowest cluster category, with 5.4% of the total NRW population living there. For women, only Gelsenkirchen is classified into this cluster. Given the emerging insight into possible underlying causes, chances for these cities to improve their outcome may come into closer reach.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

All authors have made substantial individual contributions to the manuscript. All authors read and approved the final manuscript.

Authors’ information

Claudia Terschüren, Rolf Annuß and Rainer Fehr are public health scientists working for the Landeszentrum Gesundheit NRW, Germany. Sarah Karasek is a postgraduate student of statistics at the University of Graz and Dankmar Böhning is Chair in Medical Statistics at the University of Southampton.

Acknowledgements

The authors are grateful to Dr Antonello Maruotti for valuable input to the paper.
