Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK

Department of Social and Environmental Health Research, London School of Hygiene and Tropical Medicine, London, UK

Abstract

Background

The two-stage time series design represents a powerful analytical tool in environmental epidemiology. Recently, models for both stages have been extended with the development of distributed lag non-linear models (DLNMs), a methodology for investigating simultaneously non-linear and lagged relationships, and multivariate meta-analysis, a methodology to pool estimates of multi-parameter associations. However, the application of both methods in two-stage analyses is prevented by the high-dimensional definition of DLNMs.

Methods

In this contribution we propose a method to synthesize DLNMs to simpler summaries, expressed by a reduced set of parameters of one-dimensional functions, which are compatible with current multivariate meta-analytical techniques. The methodology and modelling framework are implemented in

Results

As an illustrative application, the method is adopted for the two-stage time series analysis of temperature-mortality associations using data from 10 regions in England and Wales.

Discussion and Conclusions

The methodology proposed here extends the use of DLNMs in two-stage analyses, obtaining meta-analytical estimates of easily interpretable summaries from complex non-linear and delayed associations. The approach relaxes the assumptions and avoids simplifications required by simpler modelling approaches.

Background

Research on the health effects of environmental stressors, such as air pollution and temperature, often relies on time series analysis using data from multiple locations, usually cities

Recently, the first-stage modelling approaches have been extended with the introduction of

In this contribution we propose a method to reduce estimates from DLNMs to summaries defined in only one dimension of predictor or lags, re-expressing the fit in terms of reduced parameters for the related uni-dimensional basis functions. This step decreases the number of parameters to be pooled in the second stage, offering a method to meta-analyse estimates from richly parameterized non-linear and delayed exposure-response relationships.

In the next section, we provide a brief recap of the algebraic development of DLNMs and multivariate meta-analysis, and then describe the main statistical development, establishing a method to reduce the fit of a DLNM to summaries expressed in a single dimension. A motivating example with an analysis of the relationship between temperature and all-cause mortality is used throughout the paper to illustrate the statistical framework. We finally note some limitations and indicate future directions for research. Supplementary online material provides information on algebraic notation and software (Additional file

**Online appendix.** This pdf document provides additional information on the algebraic notation, on the software and

**Data.** This csv file includes the time series data for the 10 regions of England and Wales during the period 1993–2006, used in the example.

**R scripts.** This .zip file contains 6

Methods

The two-stage time series design can be applied to series of observations collected at each time $t$, with $t = 1, \dots, N_i$, in each location $i$. In the first stage, a regression model is fitted to each series of $N_i$ observations, obtaining location-specific estimates of the association of interest. These estimates are then pooled across locations in the second stage, with the aim of estimating an average exposure-response relationship and inspecting heterogeneity across locations.

An illustrative example

As an illustration, we describe an analysis of the relationship between temperature and all-cause mortality using daily series of $N_i$ =

Distributed lag non-linear models

The DLNM framework has been extensively described

The DLNM modelling class

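Distributed lag linear and non-linear models are expressed through a lag-basis and cross-basis function $s(x_t)$, respectively, of the $N$-length exposure series $\mathbf{x} = [x_1, \dots, x_t, \dots, x_N]^T$. The definition of $s(x_t)$ first requires the derivation of the $N \times (L+1)$ matrix $\mathbf{Q}$ of lagged exposures, so that $\mathbf{q}_{t\cdot} = [x_t, \dots, x_{t-\ell}, \dots, x_{t-L}]^T$. This step defines the new lag dimension, identified by the vector $\boldsymbol{\ell} = [0, \dots, \ell, \dots, L]^T$, with $L$ as the maximum lag. Choosing a first basis with dimension $v_\ell$ to represent the association along the lag space, an $(L+1) \times v_\ell$ basis matrix $\mathbf{C}$ is obtained by applying the related functions to $\boldsymbol{\ell}$. Assuming a linear relationship along the predictor space, a DLM is then defined by:

$$s(x_t; \boldsymbol{\eta}) = \mathbf{q}_{t\cdot}^T \mathbf{C} \boldsymbol{\eta} = \mathbf{w}_{t\cdot}^T \boldsymbol{\eta} \tag{1}$$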

where different models are specified with different choices of the basis to derive $\mathbf{C}$. The transformed variables in $\mathbf{W} = \mathbf{QC}$ can be included in the design matrix of the first-stage regression model, in order to estimate the $v_\ell$-length parameter vector $\boldsymbol{\eta}$, together with its associated (co)variance matrix.
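A minimal sketch of how $\mathbf{Q}$ and $\mathbf{W} = \mathbf{QC}$ can be built, in plain Python, may help fix ideas. The helper names (`lag_matrix`, `poly_lag_basis`, `matmul`), the toy exposure series and the polynomial lag basis are assumptions made for illustration, not the basis functions or software used in the paper. Rows for the first $L$ time points, whose lagged exposures are not fully observed, are simply dropped here.

```python
def lag_matrix(x, L):
    """Matrix of lagged exposures: row t is [x_t, x_{t-1}, ..., x_{t-L}].

    Only rows with a complete lag history (t >= L) are returned.
    """
    return [[x[t - l] for l in range(L + 1)] for t in range(L, len(x))]


def poly_lag_basis(L, degree):
    """(L+1) x (degree+1) basis matrix C whose columns are powers of the lag."""
    return [[float(l) ** d for d in range(degree + 1)] for l in range(L + 1)]


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Toy exposure series (hypothetical values) with maximum lag L = 2
x = [10.0, 12.0, 15.0, 11.0, 9.0, 14.0]
L = 2
Q = lag_matrix(x, L)              # 4 x 3 matrix of lagged exposures
C = poly_lag_basis(L, degree=1)   # 3 x 2 lag basis (intercept and linear term)
W = matmul(Q, C)                  # 4 x 2 matrix of transformed variables
```

With a lag basis of dimension 2, each row of `W` holds the two variables that would enter the first-stage regression in place of the three lagged exposures.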

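The non-linear extension to DLNMs requires the choice of a second basis with dimension $v_x$ to model the relationship along the space of the predictor $x$, obtaining the $N \times v_x$ basis matrix $\mathbf{Z}$ from the application of the related functions to $\mathbf{x}$. Applied together with the transformation which defines the matrix of lagged exposures $\mathbf{Q}$ above, this step produces a three-dimensional $N \times v_x \times (L+1)$ array $\dot{\mathbf{R}}$. The parameterization of the cross-basis function $s(x_t)$ for DLNMs is then given by:

$$s(x_t; \boldsymbol{\eta}) = \sum_{j=1}^{v_x} \sum_{k=1}^{v_\ell} \mathbf{r}_{tj\cdot}^T \mathbf{c}_{\cdot k} \eta_{jk} = \mathbf{w}_{t\cdot}^T \boldsymbol{\eta} \tag{2}$$

where $\mathbf{r}_{tj\cdot}$ is the vector of lagged exposures for time $t$ transformed through the $j$-th basis function for the predictor, and $\mathbf{c}_{\cdot k}$ is the $k$-th column of $\mathbf{C}$.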

The simpler lag-basis for DLMs in (1) is a special case of the more complex cross-basis for DLNMs in (2). These models may be fitted through common regression techniques with the inclusion of the cross-basis matrix $\mathbf{W}$ in the design matrix. The length of the vector $\hat{\boldsymbol{\eta}}$ of estimated parameters, $v_x \times v_\ell$, is equal to the product of the dimensions of the bases for the two spaces. In completely parametric models such as those described here, this dimensionality is directly associated with the notion of degrees of freedom, and is determined by the bases chosen to derive $\mathbf{Z}$ and $\mathbf{C}$. The software implementation of this methodology in the
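The way the two bases combine into cross-basis variables can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the packaged implementation: the function names (`basis_x`, `basis_lag`, `cross_basis`), the toy polynomial bases and the ordering of the columns (predictor-basis index varying fastest) are all hypothetical choices.

```python
def basis_x(value):
    """Hypothetical predictor basis with v_x = 2 (linear and quadratic terms)."""
    return [value, value * value]


def basis_lag(lag):
    """Hypothetical lag basis with v_l = 2 (intercept and linear term)."""
    return [1.0, float(lag)]


def cross_basis(x, L):
    """Cross-basis matrix W with v_x * v_l columns.

    For each time t with a complete lag history, the (j, k) variable is
    sum over lags l of basis_x(x[t-l])[j] * basis_lag(l)[k].
    Columns are ordered with the lag-basis index k outermost (assumption).
    """
    C = [basis_lag(l) for l in range(L + 1)]
    v_l = len(C[0])
    W = []
    for t in range(L, len(x)):
        # Slice of the three-dimensional array for time t: (L+1) x v_x
        R_t = [basis_x(x[t - l]) for l in range(L + 1)]
        v_x = len(R_t[0])
        row = [
            sum(R_t[l][j] * C[l][k] for l in range(L + 1))
            for k in range(v_l)
            for j in range(v_x)
        ]
        W.append(row)
    return W


# Toy series (hypothetical values) with maximum lag L = 1
W = cross_basis([1.0, 2.0, 3.0], L=1)
```

Each row of `W` has $v_x \times v_\ell = 4$ entries, which is the number of coefficients the first-stage model has to estimate for this toy specification.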

Summarizing the results from a DLNM

Fitted bi-dimensional cross-basis functions from DLNMs can be interpreted by deriving predictions over a grid of predictor and lag values, usually computed relative to a reference predictor value. As a first example, we show the results of a single-location analysis, using data from the North-East region of England. The temperature-mortality relationship is modelled through the same cross-basis used for the full two-stage analysis illustrated later, composed of two B-spline bases.

The results are shown in Figure

**Temperature-mortality association in the North-East region of England, 1993–2006.**

This bi-dimensional representation contains details not relevant for some interpretative purposes, and does not easily allow presentation of confidence intervals. The analysis therefore commonly focuses on three specific uni-dimensional summaries of the association. First, a predictor-specific summary association at a given predictor value $x_0$ can be defined along the lag space. As an example, this is reproduced in the top-right panel for temperature $x_0$ = 22°C, together with 95% confidence intervals (CI), and corresponds to the red line parallel to the reference in the 3-D graph. Second, similarly, a lag-specific summary association at a given lag value $\ell_0$ can be defined along the predictor space. This is shown in the bottom-left panel for lag $\ell_0 = 4$, and coincides with the red line in the 3-D graph perpendicular to the reference. Third, the sum of the lag-specific contributions provides the overall cumulative summary association.

Multivariate meta-analysis

The framework of multivariate meta-analysis has been previously described

The multivariate extension of meta-analysis

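Specification of the model assumes that a $k$-dimensional set of outcome parameters $\hat{\boldsymbol{\theta}}_i$ and the associated $k \times k$ (co)variance matrix $\mathbf{S}_i$ have been estimated in each of the $m$ locations. The model is:

$$\hat{\boldsymbol{\theta}}_i \sim \mathrm{N}_k\left(\mathbf{U}_i \boldsymbol{\beta},\; \mathbf{S}_i + \boldsymbol{\Psi}\right), \qquad \mathbf{U}_i = \mathbf{I}_{(k)} \otimes \mathbf{u}_i^T \tag{3}$$

where the location-specific estimated outcome parameters $\hat{\boldsymbol{\theta}}_i$ are assumed to follow a $k$-dimensional multivariate normal distribution, and $\mathbf{U}_i$ is the $k \times kp$ design matrix obtained from the vector of $p$ meta-variables $\mathbf{u}_i = [u_1, \dots, u_p]^T$. The matrices $\boldsymbol{\Psi}$ and $\mathbf{S}_i$ represent the between and within-location (co)variance matrices, respectively. This multivariate meta-regression model is applied to estimate the parameter vectors $\boldsymbol{\beta}$ and $\boldsymbol{\xi}$, the latter defining $\boldsymbol{\Psi}$.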
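To make the role of the within and between-location (co)variance matrices concrete, the following plain-Python sketch computes the pooled mean for the simplest case: an intercept-only model in which the between-location matrix is treated as known. This is a deliberate simplification; in the analysis described here the between-location (co)variance is estimated, which is not shown. The function names (`inverse`, `pool_known_psi`) and the toy inputs are hypothetical.

```python
def inverse(A):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(A)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(A)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pool_known_psi(thetas, S_list, Psi):
    """Pooled mean and its (co)variance, weighting by the inverse of S_i + Psi."""
    k = len(thetas[0])
    sum_w = [[0.0] * k for _ in range(k)]
    sum_wt = [0.0] * k
    for theta, S in zip(thetas, S_list):
        sigma = [[S[a][b] + Psi[a][b] for b in range(k)] for a in range(k)]
        w = inverse(sigma)
        for a in range(k):
            for b in range(k):
                sum_w[a][b] += w[a][b]
                sum_wt[a] += w[a][b] * theta[b]
    V = inverse(sum_w)
    beta = [sum(V[a][b] * sum_wt[b] for b in range(k)) for a in range(k)]
    return beta, V


# Two hypothetical locations with k = 2 outcome parameters each
thetas = [[1.0, 2.0], [3.0, 4.0]]
S_list = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
Psi = [[0.0, 0.0], [0.0, 0.0]]
beta, V = pool_known_psi(thetas, S_list, Psi)
```

Setting `Psi` to zero gives the fixed-effects estimate; a larger `Psi` inflates the (co)variance of the pooled mean and flattens the differences in weights between locations.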

Limitations of multivariate meta-analysis

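In theory, the $v_x \times v_\ell$ parameters $\hat{\boldsymbol{\eta}}$ of the cross-basis estimated in the first stage could be meta-analysed directly in the second stage. In practice, the number of parameters $\boldsymbol{\xi}$ defining the true between-location variability, composed of $k(k+1)/2$ (co)variance terms for an unstructured $\boldsymbol{\Psi}$ of dimension $k = v_x \times v_\ell$, grows quadratically with $k$ and soon becomes too large to be estimated from a limited number of locations.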
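The size of the problem can be illustrated with a short count, assuming an unstructured between-location (co)variance matrix (an assumption of this sketch; the function name is hypothetical). The dimensions 4, 5 and 20 are those of the example specification described later in the paper.

```python
def n_between_terms(k):
    """Number of distinct (co)variance terms in an unstructured k x k matrix."""
    return k * (k + 1) // 2


full_cross_basis = n_between_terms(4 * 5)   # all cross-basis coefficients
overall_cumulative = n_between_terms(4)     # coefficients on the predictor space
predictor_specific = n_between_terms(5)     # coefficients on the lag space
```

Under this assumption, pooling the full set of 20 coefficients would require 210 between-location terms, against 10 or 15 for the uni-dimensional summaries.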

This limitation is one of the main reasons why DLNMs have so far not been fully applied in two-stage analyses. The first-stage model has often been simplified to make the second-stage multivariate meta-analysis feasible. For example, investigators have assumed a linear relationship in the dimension of the predictor

Reducing DLNMs

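Predictions from DLNMs such as those shown for the North-East region are computed over vectors of predictor and lag values, $\mathbf{x}_{[p]}$ and $\boldsymbol{\ell}_{[p]}$, from which the corresponding basis matrices $\mathbf{Z}_{[p]}$ and $\mathbf{C}_{[p]}$ are derived.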

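The definition of the reduced parameters depends on the specific summary among those listed above. They can be obtained by applying a related dimension-reducing matrix $\mathbf{M}$, which takes a different form for the predictor-specific summary association at $x_0$, for the lag-specific summary association at $\ell_0$, and for the overall cumulative summary association, respectively. These forms are built from the vectors of transformed values of $x_0$ and $\ell_0$, obtained by the application of the sets of basis functions for predictor and lags, respectively. The reduced parameter vector $\hat{\boldsymbol{\theta}}$ and its (co)variance matrix are given by:

$$\hat{\boldsymbol{\theta}} = \mathbf{M} \hat{\boldsymbol{\eta}}, \qquad V(\hat{\boldsymbol{\theta}}) = \mathbf{M} V(\hat{\boldsymbol{\eta}}) \mathbf{M}^T$$

The reduced parameters describe a uni-dimensional function. For example, the predictor-specific summary association at $x_0$ is defined on the lag space for values in $\boldsymbol{\ell}_{[p]}$, through the basis matrix $\mathbf{C}_{[p]}$.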
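One way to write the three dimension-reducing matrices explicitly is sketched below. It rests on an ordering assumption that is not stated in the surrounding text: the cross-basis coefficients are arranged in a $v_x \times v_\ell$ matrix $\mathbf{H}$ with $\boldsymbol{\eta} = \mathrm{vec}(\mathbf{H})$, so that the predictor-basis index varies fastest. The symbols $\mathbf{H}$, $\mathbf{z}(x)$, $\mathbf{c}(\ell)$, $\mathbf{z}_{[x_0]}$ and $\mathbf{c}_{[\ell_0]}$ are introduced here for illustration; with a different ordering of $\boldsymbol{\eta}$ the two factors of each Kronecker product swap places.

```latex
% Bi-dimensional association for predictor value x and lag l,
% with z(x) and c(l) the basis transformations of x and l:
s(x, \ell) = \mathbf{z}(x)^T \mathbf{H} \, \mathbf{c}(\ell),
\qquad \boldsymbol{\eta} = \mathrm{vec}(\mathbf{H})

% Predictor-specific summary at x_0 (a function of the lag):
\mathbf{M}_{[x_0]} = \mathbf{I}_{(v_\ell)} \otimes \mathbf{z}_{[x_0]}^T
\quad\Rightarrow\quad
\boldsymbol{\theta}_{[x_0]} = \mathbf{H}^T \mathbf{z}_{[x_0]}

% Lag-specific summary at l_0 (a function of the predictor):
\mathbf{M}_{[\ell_0]} = \mathbf{c}_{[\ell_0]}^T \otimes \mathbf{I}_{(v_x)}
\quad\Rightarrow\quad
\boldsymbol{\theta}_{[\ell_0]} = \mathbf{H} \, \mathbf{c}_{[\ell_0]}

% Overall cumulative summary (sum over lags 0, ..., L):
\mathbf{M}_{[c]} = \left(\mathbf{1}_{(L+1)}^T \mathbf{C}\right) \otimes \mathbf{I}_{(v_x)}
\quad\Rightarrow\quad
\boldsymbol{\theta}_{[c]} = \mathbf{H} \, \mathbf{C}^T \mathbf{1}_{(L+1)}

% In each case:
\boldsymbol{\theta} = \mathbf{M} \boldsymbol{\eta},
\qquad
V(\boldsymbol{\theta}) = \mathbf{M} \, V(\boldsymbol{\eta}) \, \mathbf{M}^T
```

Each identity follows from $\mathrm{vec}(\mathbf{A}\mathbf{X}\mathbf{B}) = (\mathbf{B}^T \otimes \mathbf{A})\,\mathrm{vec}(\mathbf{X})$. Under this convention $\mathbf{M}_{[x_0]}$ has $v_\ell$ rows, while $\mathbf{M}_{[\ell_0]}$ and $\mathbf{M}_{[c]}$ have $v_x$ rows, which is the length of the reduced parameter vector in each case.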

Results

The analysis is now extended to the full set of 10 regions, with the aim to produce pooled estimates of the overall cumulative association, and to compare the results with those obtained by simpler approaches, applying a moving average to the daily exposure series. Also, we investigate the lag structure for exposure to cold and hot temperatures through predictor-specific estimates. Finally, we assess heterogeneity and then the role of meta-variables through multivariate meta-regression.

Modelling strategy

The first-stage region-specific model is specified by adopting a standard analytical approach for time series environmental data

In the main first-stage model, the temperature-mortality association is estimated by a flexible cross-basis defined by a quadratic B-spline for the space of temperature, centered at 17°C, and a natural cubic B-spline with intercept for the space of lags, up to a maximum lag $L$. The bases have dimensions $v_x = 4$ and $v_\ell = 5$ for temperature and lag spaces, respectively. The same specification was previously applied for the single-region analysis.

The set of _{
x
} × _{
ℓ
$v_x \times v_\ell = 20$ coefficients of the cross-basis variables with associated (co)variance matrices, estimated in each region, are then reduced. Specifically, for each region we derive one vector of reduced coefficients for the uni-dimensional basis $\mathbf{Z}_{[p]}$ for the overall cumulative summary association, and two vectors of reduced coefficients for the basis $\mathbf{C}_{[p]}$ for predictor-specific summary associations at 0°C and 22°C. These temperatures correspond approximately to the 1^{st} and 99^{th} percentiles of the pooled temperature distribution, respectively. These effects along lags are interpreted using the reference of 17°C.
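The reduction step can be sketched in plain Python on a toy problem. This is an illustration under stated assumptions rather than the implementation used for the analysis: the function names are hypothetical, the cross-basis coefficients are assumed to be ordered with the predictor-basis index varying fastest, and the toy bases have dimension 2 instead of the 4 and 5 used in the example.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def apply_reduction(M, eta, V):
    """Reduced coefficients M eta and their (co)variance M V M^T."""
    theta = [sum(m * e for m, e in zip(row, eta)) for row in M]
    V_theta = matmul(matmul(M, V), transpose(M))
    return theta, V_theta


def reduce_overall(eta, V, C, v_x):
    """Overall cumulative summary: sum the lag basis C over all lags."""
    ones_C = [[sum(col) for col in zip(*C)]]   # row vector 1^T C
    return apply_reduction(kron(ones_C, identity(v_x)), eta, V)


def reduce_predictor(eta, V, z0, v_l):
    """Predictor-specific summary at x0, with z0 the basis transform of x0."""
    return apply_reduction(kron(identity(v_l), [z0]), eta, V)


# Toy first-stage output (hypothetical): v_x = 2, v_l = 2, maximum lag 1
eta = [1.0, 2.0, 3.0, 4.0]
V = identity(4)
C = [[1.0, 0.0], [1.0, 1.0]]   # lag basis evaluated at lags 0 and 1
theta_c, V_c = reduce_overall(eta, V, C, v_x=2)
theta_x, V_x = reduce_predictor(eta, V, z0=[1.0, 2.0], v_l=2)
```

In both cases four coefficients are reduced to two, and the reduced (co)variance matrix carries the first-stage uncertainty into the second stage.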

For comparison with methods not requiring dimensionality reduction, in two alternative first-stage models we simplify the lag structure by fitting one-dimensional splines to the moving average of the temperature series over lag 0–3 and 0–21, respectively. Such moving average models have been commonly used in weather and air pollution epidemiology. These models can be expressed in the DLNM framework with a lag basis of dimension $v_\ell = 1$, with $v_x \times v_\ell = 4 \times 1 = 4$ parameters re-scaled by the number of lags, giving a dimension-reducing matrix $\mathbf{M}_{[c]}$, as described in (4c), which in this case is a diagonal matrix with entries equal to the number of lags.
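The diagonal form can be checked with a short worked equation. It assumes that the lag basis of dimension 1 is a single constant column, $\mathbf{C} = \mathbf{1}_{(L+1)}$, and that the overall cumulative reduction sums the lag basis over all lags, $\mathbf{M}_{[c]} = (\mathbf{1}_{(L+1)}^T \mathbf{C}) \otimes \mathbf{I}_{(v_x)}$. Both are illustrative readings rather than expressions taken from the text.

```latex
\mathbf{M}_{[c]}
  = \left(\mathbf{1}_{(L+1)}^T \mathbf{1}_{(L+1)}\right) \otimes \mathbf{I}_{(4)}
  = (L+1)\,\mathbf{I}_{(4)}
```

Under these assumptions the constant is $L + 1 = 4$ for the moving average over lag 0–3 and $L + 1 = 22$ for the moving average over lag 0–21.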

The coefficients for each of the three summary associations from the main model are estimated in the 10 regions and then independently included as outcomes in three multivariate meta-analytical second-stage models. The ten estimated sets of coefficients from the two alternative models (equivalent to the overall cumulative summary) are directly meta-analysed. All the second-stage models are fitted here through restricted maximum likelihood (REML). As an example of meta-regression, latitude is included as a meta-variable, with results predicted at the 25^{th} and 75^{th} percentiles of its distribution, using the same baseline reference of 17°C. The significance of such an effect is assessed through a Wald test, given that a likelihood ratio test cannot be applied to compare models fitted with REML and different fixed-effects structures

Two-stage analysis

The overall temperature-mortality associations in the 10 regions of England and Wales are illustrated in Figure
^{th} percentile of the pooled temperature distribution. The multivariate Cochran Q test for heterogeneity is highly significant, and the $I^2$ statistic indicates that 63.7% of the variability is due to true heterogeneity between regions.
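The relation between the Q statistic and $I^2$ can be sketched as follows, assuming the usual definition $I^2 = \max\{0, (Q - df)/Q\}$. The function name and the value of Q used below are hypothetical; the degrees of freedom correspond to 4 reduced coefficients in 10 regions with an intercept-only second-stage model, which is an assumption of this sketch.

```python
def i_squared(q, df):
    """I-squared (as a percentage) from a Cochran Q statistic and its df."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


k, m = 4, 10               # reduced coefficients per region, number of regions
df = k * m - k             # df of the Q test for an intercept-only model
i2 = i_squared(100.0, df)  # Q = 100 is a made-up value for illustration
```

A value of Q below its degrees of freedom gives an $I^2$ of zero, so the statistic never becomes negative.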

**Pooled overall cumulative temperature-mortality association in 10 regions of England and Wales, 1993–2006.**

The right panel of Figure

Figure
$v_\ell = 5$ reduced coefficients. Consistent with previous research, the effect of hot temperature is immediate and disappears after 1–2 days, while cold temperatures are associated with mortality for a long lag period, after an initial protective effect. This complex lag pattern can explain the different results provided by the less flexible alternative models. The pooled overall RR estimated by the main model, cumulated along lags for these specific summaries and reported graphically in Figure
$I^2$ of 63.4% and 16.0%, respectively.

**Pooled predictor-specific temperature-mortality association in 10 regions of England and Wales, 1993–2006.** First-stage region-specific and pooled (95%CI as grey area) summaries at 22°C (left panel) and 0°C (right panel). Reference at 17°C.

It is interesting to note that the second-stage multivariate meta-analytical model for the predictor-specific summary associations at 22°C estimates perfectly correlated random components, with between-study correlations equal to −1 or 1. This is a known phenomenon in multivariate meta-analysis, frequently occurring in the presence of a small number of studies and/or a high within-study uncertainty relative to the between-study variation

The heterogeneity across regions can be partly explained as effect modification by region-specific variables. The results of the example of meta-regression with latitude are illustrated in Figure
^{th} and 75^{th} percentiles of latitude, respectively, while the same estimates are 1.106 (95%CI: 1.079–1.133) and 1.104 (95%CI: 1.059–1.150) for 22°C. Overall, the evidence for an effect modification is substantial, with a highly significant Wald test. The residual heterogeneity is lower, with $I^2$ reduced to 18.7% and a non-significant Cochran Q test.

**Pooled temperature-mortality association by latitude in 10 regions of England and Wales, 1993–2006.** Predictions for the 25^{th} (dot-dashed line) and 75^{th} (dashed line) percentiles of latitude from meta-regression for overall cumulative summary (top panel), and predictor-specific summaries at 22°C (bottom-left panel) and 0°C (bottom-right panel). Reference at 17°C. The 95%CI are reported as shaded areas.

Discussion

In this contribution we describe a method to re-express the bi-dimensional fit of DLNMs in terms of uni-dimensional summaries, involving reduced sets of modified parameters of the basis functions chosen for the space of predictor or lags. This development, in addition to simplifying the algebraic definition of the methodology, offers a more compact description of the bi-dimensional association modelled by DLNMs. In particular, the dimension of the sets of reduced parameters is usually compatible with the application of multivariate meta-analytical techniques in a two-stage framework, allowing the analysis of complex non-linear and delayed associations in multi-location studies.

Previous applications of the two-stage design for multi-location time series studies are based on simplified functions for modelling the association of interest at the first stage. In particular, the analyses are usually limited to splines or other non-linear functions of simple moving average of the exposure series

Most of the limitations of DLNMs and multivariate meta-analysis of multi-parameter associations, previously discussed

The problem of estimating perfectly correlated random components in the second-stage meta-analytical model, as described in the example, can bias upward the standard errors of the pooled estimates. This problem occurs in likelihood-based and method of moments estimation procedures of multivariate meta-analysis, as these estimators truncate the between-study correlations on the boundary of their parameter space

The definition of identical cross-basis functions in all the locations can be problematic in the presence of substantially different exposure ranges. In our example, the temperature distribution was similar across regions, and the placement of common knots was straightforward. However, this can hardly be generalized. The issue was previously discussed, and an alternative approach based on relative scale was proposed for pooling one-dimensional functions

Estimation methods for DLNMs not requiring the completely parametric approach proposed here seem attractive and possible, in particular based on penalized likelihood

Potentially, the number of parameters of the second-stage multivariate meta-analysis can also be decreased by structuring the between-study (co)variance matrix of random effects. However, the extent to which such a choice can bias the estimates of fixed-effects parameters is not known. Moreover, this option is not yet available in the

Conclusions

The extension of the DLNM framework presented here, involving the reduction of the complex two-dimensional fit to one-dimensional summaries, provides an improved method to study complex non-linear and delayed associations in two-stage analyses. Unlike previous approaches, this method requires less simplification of the exposure-response shape or lag structure. This framework may be applied in any setting where non-linear and delayed relationships need to be investigated in different populations or groups.

Abbreviations

DLM: distributed lag model.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

BA first conceived the idea of re-expressing summaries of DLNMs in terms of one-dimensional functions. AG then derived the algebraic expression. AG and BA contributed to the structure of the manuscript and the design of the analysis in the examples. AG implemented the methodology in the

Acknowledgements

Antonio Gasparrini is currently funded by a Methodology Research Fellowship from Medical Research Council UK (grant ID G1002296). Ben Armstrong and Antonio Gasparrini were supported by a grant from Medical Research Council UK during the preliminary stage of the project (grant ID G0701030).
