Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China

Department of Geography, San Diego State University, San Diego, California 92182-4493, USA

Beijing Institute of Pediatrics, Beijing, 100012, China

Institute of Population Science, Peking University, Beijing 100871, China

Abstract

Background

Neural tube defect (NTD) prevalence in northern China is among the highest worldwide. Dealing with the NTD situation is ranked as the number one task in China's scientific development plan in the population and health field for the next decade. Physical and social environments account for much of the disease's occurrence. The environmental determinants and their effects on NTD vary across geographical regions, and factors that play a significant local role in NTD occurrence may be masked when global statistics are applied to a dataset pooled over the entire study area. This study aims to identify the local determinants of NTD across the study area and to explore the epidemiological implications of the findings.

Methods

NTD prevalence rate is represented in terms of the random field theory, and Rushton's circle method is used to stabilize NTD rate estimation across the geographical area of interest; NTD determinants are represented by their measurable proxy variables, and the geographically weighted regression (GWR) technique is used to represent the spatial heterogeneity of the NTD determinants.

Results

Informative maps of the NTD rates and the statistically significant proxy variables are generated and rigorously assessed in quantitative terms.

Conclusions

The NTD determinants in the study area are investigated and interpreted on the basis of the maps of the proxy variables and the relationships between the proxy variables and the NTD determinants. No single determinant was found to dominate the NTD occurrence in the study area. Villages where NTD rates are significantly linked to environmental determinants are identified (some places are more closely linked to certain environmental factors than others). The results improve current understanding of NTD spread in China and provide valuable information for adequate disease intervention planning.

Background

Birth defects, and neural tube defects (NTD) in particular, are anomalies (functional or structural) that present in infancy or later in life. NTDs are birth defects primarily of the brain and spinal cord, and consist mainly of anencephaly, spina bifida, and encephalocele.

Important birth defect factors include heredity, environment (physical and psychological conditions, socioeconomic status, health etc.) and their interactions. The impact of a risk factor varies with the type of NTD and the presence or absence of other defects. Several studies have investigated the role that genetic and environmental factors play in triggering NTD cases.

China has been acknowledged as a geographical region with high NTD occurrence. Based on data collected from a hospital-based surveillance system, the average NTD prevalence rate during 1987 was calculated to be approximately 27.4 per 10,000 births (with considerably higher rates observed in certain regions). Infant deaths caused by birth defects have increased steadily in China since the end of the last century.

Given that Heshun county (Shanxi province, China) is one of the areas with the highest NTD prevalence in the world, the main objective of this paper is to accurately map the geographical distribution of NTD cases in that county and to identify the corresponding NTD determinants. Important issues to consider include the assessment of the specific factors causing NTD in a given region or population group in Heshun county, and the investigation of whether the NTD determinants apply globally or locally in space.

The above considerations make it appropriate that in the present Heshun NTD study we implement an adequate synthesis of quantitative techniques including spatial NTD analysis. The results of the spatial NTD analysis could be used to accurately identify intervention targets and offer valuable input to the systematic development of prevention strategies. This is an important matter, since it is widely accepted that the accurate identification of NTD determinants allows early intervention, which is a crucial component of any effort to minimize the consequences of birth defects.

Methods

Study area

Heshun county (Fig. ^{2}. Most of the people in this county are farmers and their living environment seldom changes. There is no large-scale human immigration in the region's history. Remarkably, most kinds of birth defects designated by the WHO (World Health Organization) are found in Heshun, and among them the defects linked to NTD are the predominant ones

**Location of Heshun County (the dots on the map denote the 326 villages in which data were collected)**.

Spatial random field theory

Let the geographical distribution of the epidemic attribute (number of NTD cases) be represented mathematically by the spatial random field (SRF), $Y_{\mathbf{s}}$, in the sense of Christakos; $\mathbf{s} = (s_1, s_2)$ denotes the location of the attribute, where $s_1$, $s_2$ are the associated spatial coordinates of the location. Also, let $m_{\mathbf{s}} = E[Y_{\mathbf{s}}]$ be a non-random quantity that represents the average value of all possible SRF realizations at the location $\mathbf{s}$, where $E[\cdot]$ denotes the stochastic expectation.

The observed population rate (OPR) of the NTD cases $Y_{\mathbf{s}}$ over an area ℜ is defined as

$$\bar{Y}_{\Re} = \frac{1}{|\Re|} \sum_{\mathbf{s} \in \Re} Y_{\mathbf{s}},$$

where $\mathbf{s}$ varies within ℜ, and |ℜ| denotes the total number of births in the area. The $\bar{Y}_{\Re}$ is a random quantity, i.e. even when considering the same area, one may get different results if $\bar{Y}_{\Re}$ is computed over different SRF realizations. The superpopulation rate (SPR), also called the stochastic rate, of the NTD cases $Y_{\mathbf{s}}$ over an area ℜ is defined as

$$\bar{m}_{\Re} = \frac{1}{|\Re|} \sum_{\mathbf{s} \in \Re} \int \psi_{\mathbf{s}} \, f_{y,\mathbf{s}}(\psi_{\mathbf{s}}) \, d\psi_{\mathbf{s}} = \frac{1}{|\Re|} \sum_{\mathbf{s} \in \Re} m_{\mathbf{s}},$$

where $f_{y,\mathbf{s}}$ is the probability density function of $Y_{\mathbf{s}}$, and $\psi_{\mathbf{s}}$ denotes a realization of $Y_{\mathbf{s}}$ at $\mathbf{s}$ (for the underlying mathematical details the readers are referred to Christakos and Hristopulos).

The OPR is directly observable and expresses the "here-and-now" crude disease rate, which makes $\bar{Y}_{\Re}$ a useful study parameter when the objectives include the study of infectious disease outbreaks and the assessment of emergency health services. The SPR, on the other hand, expresses an essential property of epidemiological phenomena, which makes $\bar{m}_{\Re}$ a useful tool in the study of the relationship between an epidemic and its determinants. The SPR is rarely observed directly, but it can be approximated in terms of the available observations and by incorporating neighboring samples in a Bayesian context. The $\bar{m}_{\Re}$ will be the prime focus of the present study, which means that in the following the term "NTD rate" refers to SPR values.
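The distinction between the two rates can be illustrated numerically. The following is a minimal sketch, not the authors' procedure: it assumes, purely for illustration, that case counts in each village follow a binomial distribution, and the village births and per-birth risks are hypothetical values. Each simulated realization yields a different OPR, while the SPR is the single expected value around which those OPRs scatter.

```python
import numpy as np


def observed_population_rate(cases, births):
    """OPR: observed cases summed over the area, divided by total births."""
    return float(np.sum(cases) / np.sum(births))


def superpopulation_rate(births, risk):
    """SPR under the binomial assumption: expected cases divided by total births."""
    births = np.asarray(births, dtype=float)
    risk = np.asarray(risk, dtype=float)
    return float(np.sum(births * risk) / np.sum(births))


# Hypothetical villages: number of births and underlying NTD risk per birth.
births = np.array([12, 40, 7, 25])
risk = np.array([0.02, 0.01, 0.03, 0.015])

rng = np.random.default_rng(0)
# Each row is one realization of the random field (case counts per village).
realizations = rng.binomial(births, risk, size=(20000, births.size))
opr_per_realization = realizations.sum(axis=1) / births.sum()

opr_first = observed_population_rate(realizations[0], births)
spr = superpopulation_rate(births, risk)

print("OPR of the first realization:", opr_first)
print("mean OPR over all realizations:", opr_per_realization.mean())
print("SPR:", spr)
```

With few births per village, a single OPR is a noisy estimate of the SPR, which is the motivation for borrowing information from neighboring villages.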

Prevalence rates

In order to investigate possible NTD determinants, the prevalence rates need to be estimated across space. In this study, determinants were considered at the village scale (Fig.

**Population and NTD cases in Heshun county**.

In spatial analysis one assumes that events are spatially correlated (dependent); see

The Rushton circle

In this study, all 270 villages with at least 5 births (the "≥ 5" rule) were used to predict the "NTD_rate circle". The remaining 56 villages, which did not satisfy the "≥ 5" rule, were left to be predicted. The method used to predict the NTD rate across space assumed that each village had the same rate as the "NTD_rate circle" to which it belonged. Note that the "NTD_rate circles" may overlap when the distance between two centers is less than 6 km. A village to be estimated may therefore fall within more than one circle, in which case its NTD rate was taken to be the average of the rates of the "NTD_rate circles" to which it belonged. Fig.

**Rushton circles to adjust NTD rates**. (a) Village centers; and (b) Village centers with "NTD_rate circles".
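The circle-based smoothing described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the 3 km radius is inferred from the statement that circles overlap when their centers are less than 6 km apart, the rate of a circle is assumed to be the pooled cases divided by the pooled births of all villages inside it, and center villages are assumed to take the rate of their own circle. The function name `rushton_rates` and the toy coordinates are hypothetical.

```python
import numpy as np


def rushton_rates(xy, births, cases, radius_km=3.0, min_births=5):
    """Smooth village rates with circles centered on villages with enough births."""
    xy = np.asarray(xy, dtype=float)
    births = np.asarray(births, dtype=float)
    cases = np.asarray(cases, dtype=float)

    # inside[c, v] is True when village v lies within the circle centered on village c.
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    inside = dist <= radius_km

    # Only villages satisfying the ">= 5 births" rule act as circle centers.
    centers = np.flatnonzero(births >= min_births)
    circle_rate = np.array(
        [cases[inside[c]].sum() / births[inside[c]].sum() for c in centers]
    )

    rates = np.full(len(births), np.nan)
    rates[centers] = circle_rate  # assumption: a center takes its own circle's rate

    # Villages below the threshold take the average rate of all circles containing them.
    for v in np.flatnonzero(births < min_births):
        member = inside[centers, v]
        if member.any():
            rates[v] = circle_rate[member].mean()
    return rates


# Toy example (coordinates in km); the last village has too few births.
xy = [[0, 0], [4, 0], [10, 0], [2, 0]]
births = [10, 10, 20, 2]
cases = [1, 0, 2, 1]
rates = rushton_rates(xy, births, cases)
print(rates)
```

In the toy example the small village lies inside two overlapping circles, so its rate is the average of the two circle rates; a village outside every circle would keep a missing value.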

Determinants of NTD and their proxies

Three prime types of factors are suspected to cause NTD: (1) environment (physical, social, economic etc.); (2) heredity (genetic, pre-existing conditions etc.); and (3) synthetic (interaction between (1) and (2)). Recent studies show that most NTD cases are the result of environment-gene interactions.

• Physical NTD determinants that are spatially distributed. Potential NTD hazards include surface and subsurface water contaminated by insufficiently oxygenized ancient geological media; also, radiation emissions from certain rocks or along faults.

• Man-made pollution that is spatially distributed. Hazards of this kind include pesticides and chemical fertilizers spread over crop fields, as well as polluted air and water emitted from workshops and electromagnetic radiation in the workplace.

• Nutrition processes that are spatially distributed. For example, nutrition strongly depends on the spatially varying income of residents. Hence, it is usually proportional to the GDP, which is regularly surveyed across space and published in the government's annual statistics/census reports.

• Heredity and habits that are spatially distributed. Ethnic groups have specific genetically transmitted habits and behavioral patterns (e.g., related to food consumption), some of which are hazardous to health.

The explicit physical and human geographical proxies of the NTD determinants were collected for every village: elevation, accessibility (e.g., road buffer), geological background (fault buffer), water conditions (e.g., river buffer), per-capita income (per-capita net income), medical conditions (e.g., number of doctors), crop yield (e.g., vegetable and fruit production), agricultural chemical exposure (fertilizer and pesticide use), land cover, lithology, watershed and soil conditions. The socioeconomic factors were measured in terms of averaged annual levels during the period 1998-2005. Fumonisins in maize or other grains could be an important NTD factor.

Inference approach

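A main objective of the present study is to identify possible NTD determinants. The inference framework (see the figure on the relationship between direct determinants and their proxy variables) links the direct NTD determinants $X_{\mathbf{s}}$ to their explicit geographical proxies $Z_{\mathbf{s}}$. The latter are inserted into a GIS and then regressed against the NTD rate $Y_{\mathbf{s}}$ by means of the GWR technique; the regression results are subsequently used to infer $X_{\mathbf{s}}$ according to the same conceptual framework. The $Y_{\mathbf{s}}$, $X_{\mathbf{s}}$ and $Z_{\mathbf{s}}$ are represented in terms of the SRF theory, see above. In symbolic terms, we seek to calculate the conditionals $(X_{\mathbf{s}} \mid Y_{\mathbf{s}})$ and $(Z_{\mathbf{s}} \mid Y_{\mathbf{s}})$, which logically infer the direct NTD determinant $X_{\mathbf{s}}$ given the NTD rate $Y_{\mathbf{s}}$ and the proxies $Z_{\mathbf{s}}$ given the NTD rate $Y_{\mathbf{s}}$, respectively. In terms of Bayesian inference we can write

$$(Z_{\mathbf{s}} \mid Y_{\mathbf{s}}) = \frac{(Y_{\mathbf{s}} \mid Z_{\mathbf{s}})\,(Z_{\mathbf{s}})}{(Y_{\mathbf{s}})}, \qquad (X_{\mathbf{s}} \mid Y_{\mathbf{s}}) = (X_{\mathbf{s}} \mid Z_{\mathbf{s}})\,(Z_{\mathbf{s}} \mid Y_{\mathbf{s}}),$$

where $(Y_{\mathbf{s}} \mid Z_{\mathbf{s}})$ is estimated by means of GWR, $(Z_{\mathbf{s}})$ is known from GIS, $(Y_{\mathbf{s}})$ is known from the corresponding survey, and $(X_{\mathbf{s}} \mid Z_{\mathbf{s}})$ is calculated on the basis of physical and human geographical processes. The Bayesian equations above allow logical inference even when not all included variables and relationships are computable. In other words, the logical framework of Fig.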

**Relationship between direct determinants and their proxy variables**.
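The Bayesian step that turns the regression relationship (NTD rate given proxy) into the reverse conditional (proxy given NTD rate) can be illustrated on a small discrete example. This is only a sketch under stated assumptions: the two proxy classes, the two rate classes and all probabilities below are hypothetical values chosen for illustration, not results of the study, and the study itself works with spatial random fields rather than a two-by-two table.

```python
import numpy as np


def posterior_proxy_given_rate(p_y_given_z, p_z):
    """Bayes' rule on a discrete grid: P(Z | Y) = P(Y | Z) P(Z) / P(Y)."""
    p_y_given_z = np.asarray(p_y_given_z, dtype=float)  # rows: Z classes, columns: Y classes
    p_z = np.asarray(p_z, dtype=float)
    joint = p_y_given_z * p_z[:, None]  # P(Y, Z)
    p_y = joint.sum(axis=0)             # marginal P(Y)
    return joint / p_y                  # column y holds P(Z | Y = y)


# Hypothetical proxy classes: Z = "far from fault", Z = "near fault".
p_z = [0.7, 0.3]
# Hypothetical P(Y | Z): columns are Y = "low NTD rate", Y = "high NTD rate".
p_y_given_z = [[0.9, 0.1],
               [0.6, 0.4]]

posterior = posterior_proxy_given_rate(p_y_given_z, p_z)
print(posterior)
```

In this toy table, villages near a fault make up 30% of all villages but about 63% of the high-rate villages, which is the kind of reverse inference the framework formalizes.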

When implementing the GWR technique, categorical variables (including land cover, lithology, watershed and soil conditions) should be distinguished from the continuous explanatory variables. Typically, categorical variables enter an Ordinary Least Squares (OLS) regression as dummy variables. In GWR analysis, however, this would result in what is technically termed a "severe model design" problem (i.e., an explanatory variable is perfectly collinear with the intercept, which happens when a village and all its neighbors have the same values for one or more explanatory variables). As a consequence, we regressed only the non-categorical variables against the NTD rates using GWR. The categorical variables were checked by comparing their spatial patterns with those of the GWR outputs.
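The collinearity problem with dummy variables can be made concrete with a small numerical check. The sketch below is illustrative only: the six villages, the elevation values, the soil dummy and the weights are hypothetical, and the local weights are assumed to vanish outside the neighborhood (as with a compactly supported kernel). When every village that receives a non-zero weight has the same dummy value, the dummy column is a copy of the intercept column and the local design matrix loses rank.

```python
import numpy as np


def local_design_rank(design, weights, tol=1e-8):
    """Rank of the weighted design matrix sqrt(W) X used in one local fit."""
    active = weights > tol
    weighted = design[active] * np.sqrt(weights[active])[:, None]
    return int(np.linalg.matrix_rank(weighted))


# Hypothetical data for six villages.
intercept = np.ones(6)
elevation = np.array([1.2, 1.4, 1.1, 1.6, 1.3, 1.5])
soil_dummy = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])  # 1 = soil type A
design = np.column_stack([intercept, elevation, soil_dummy])

# Global fit: all villages weighted equally, both soil types present.
rank_global = local_design_rank(design, np.ones(6))

# Local fit around the first village: only same-soil neighbors get weight.
local_weights = np.array([1.0, 0.8, 0.6, 0.0, 0.0, 0.0])
rank_local = local_design_rank(design, local_weights)

print("global rank:", rank_global, "local rank:", rank_local)
```

The global design has full column rank, whereas the local one does not, so the local coefficients of the intercept and the dummy cannot be separated.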

Local regression by GWR

In standard regression applications, the regression coefficients of the NTD factors are assumed to be constant over space, which is the "one model fits all" case. The GWR technique extends the traditional regression framework by allowing local rather than global parameters to be estimated. In this study, therefore, we implemented the GWR technique to identify local relationships between NTD and environmental factors. The GWR software tool used was GWR3.0.

Note that the basic idea of GWR is that observations near a specified point $\mathbf{s}_i$ have more influence in the prediction of the disease parameters associated with $\mathbf{s}_i$ than do observations farther away. Accordingly, data close to $\mathbf{s}_i$ are weighted more heavily than data farther away, and the geographical weights of the observed data for prediction at point $\mathbf{s}_i$ are

$$\mathbf{W}(\mathbf{s}_i) = \mathrm{diag}(w_{i1}, w_{i2}, \ldots, w_{in}),$$

where $w_{ij}$ represents the weight of the datum at point $\mathbf{s}_j$ ($j = 1, \ldots, n$) in the prediction at $\mathbf{s}_i$. Normally, each $w_{ij}$ is a continuous function of $d_{ij}$, the distance between $\mathbf{s}_i$ and $\mathbf{s}_j$ (i.e. $d_{ij} = |\mathbf{s}_i - \mathbf{s}_j|$). One possible choice is the Gaussian kernel

$$w_{ij} = \exp\left[-\frac{1}{2}\left(\frac{d_{ij}}{b}\right)^{2}\right],$$

where $b$ is the bandwidth.

The village centroids in Heshun county are distributed unevenly: some are densely distributed, whereas others are sparsely distributed. This means that local regression may rely on relatively few data points in areas where these points are sparsely distributed. To address this potential problem we used a spatially adaptive weighting technique, in which the bandwidth is calculated experimentally rather than assigned directly in the GWR context. The bandwidths are relatively small in areas where the data points are densely distributed and rather large in areas where the data points are sparsely distributed. Better results, measured in terms of the global $R^2$ of GWR, were obtained when more points were involved in the bandwidth calculation.
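The local regression with an adaptive bandwidth can be sketched as follows. This is a minimal illustration and not a reproduction of GWR3.0: it assumes a Gaussian kernel, sets the bandwidth at each point to the distance to its k-th nearest neighbor (one common way to make the bandwidth adaptive), and uses synthetic coordinates and data. The function name `gwr_coefficients` and the parameter `n_neighbors` are hypothetical.

```python
import numpy as np


def gwr_coefficients(coords, X, y, n_neighbors=30):
    """Local weighted least squares at every data point, adaptive Gaussian kernel."""
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend the intercept
    betas = np.empty((len(y), design.shape[1]))
    k = min(n_neighbors, len(y) - 1)
    for i in range(len(y)):
        d = np.linalg.norm(coords - coords[i], axis=1)
        # Adaptive bandwidth: small where points are dense, large where sparse.
        bandwidth = np.sort(d)[k]
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        xtw = design.T * w
        betas[i] = np.linalg.solve(xtw @ design, xtw @ y)
    return betas


rng = np.random.default_rng(42)
coords = rng.uniform(0.0, 10.0, size=(200, 2))  # synthetic village centroids
x1 = rng.normal(size=200)                       # synthetic proxy variable

# Case 1: spatially constant relationship, recovered identically everywhere.
betas_const = gwr_coefficients(coords, x1, 2.0 + 3.0 * x1)

# Case 2: the slope grows from west to east; local slopes follow that trend.
betas_vary = gwr_coefficients(coords, x1, coords[:, 0] * x1)
west, east = np.argmin(coords[:, 0]), np.argmax(coords[:, 0])
print("local slope in the west:", betas_vary[west, 1])
print("local slope in the east:", betas_vary[east, 1])
```

A global regression on the second dataset would report a single average slope, which is exactly the "one model fits all" behavior that local coefficients are meant to avoid.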

Results

Maps of GWR performance

Fig. ^{2 }maps of local regression are presented in Fig.

**NTD rates (NTDR) in Heshun County**.

**Performance of the GWR technique**.

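Global performance of GWR.

| **Parameter** | **P-value** | **Significance** |
|---|---|---|
| Intercept | 0.00000 | *** |
| Elevation | 0.00000 | *** |
| Riverbuffer | 0.94000 | n/s |
| Roadbuffer | 0.00000 | *** |
| Faultbuffer | 0.02000 | * |
| Doctor | 0.13000 | n/s |
| Fertilizer | 0.00000 | *** |
| Fruit | 0.00000 | *** |
| Net_income | 0.00000 | *** |
| Pesticide | 0.00000 | *** |
| Vegetable | 0.00000 | *** |

Significance codes: *** = significant at better than the 1% level; * = significant at the 5% level; n/s = not significant.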

Maps of GWR coefficients

The local coefficients for every variable (elevation, river buffer, number of doctors, net income, vegetable production, pesticide use etc.), together with the associated significance levels, are plotted in Fig.

**Local GWR coefficients for every proxy variable together with the associated significance levels**.

Discussion and Conclusions

The series of maps in Figs.

Nonlinearity and multi factors

The local GWR coefficients of most environmental factors exhibit both significant positive and significant negative associations with the NTD rates, at different sites. This cannot be explained by a single linear relationship between NTD and its possible determinants (linearity being the working assumption, since GWR is essentially a linear regression technique). The situation may imply that a nonlinear association exists between NTD and the relevant environmental factors, or that no single determinant dominates NTD occurrence in the study area.

Elevation

NTD rates increase with increasing elevation in two village aggregations (the green color denotes significant positive association; Fig.

Distance from faults

As is shown in Fig.

Distance from roads

The sites with significant road buffer coefficients (Fig.

Health services

Fig.

Net income

Fig.

Fruit production

At its current level, fruit production (Fig.

Vegetable production

As one can see in Fig.

Baseline

The intercept (Fig.

Soil and lithology

The categorical variables are visually compared with Fig.

In summary, the local statistics approach identifies the villages where NTD rates are significantly linked to environmental determinants. In several village aggregations the NTD rates are found to be significantly associated with the proxy variables of radiation and ancient water released from the faults. Soil and lithology, river and road, health service, food production, pesticides and fertilizer are significantly related to NTD in some places, which can be interpreted in terms of etiology or social behavior. Some places are more active than others as far as the significance of the GWR coefficients is concerned, whereas most villages show no significant coefficient for any of the variables. This means that the NTD situation in these places may be more complicated than can be captured by the variables and the linearity assumption of the GWR technique. In such cases, a composite space-time analysis involving nonlinear predictors

Abbreviations

NTD: neural tube birth defects; NTDR: neural tube birth defect rate; GWR: geographically weighted regression; SRF: spatial random field; OPR: observed population rate; SPR: superpopulation rate; ARV: average root variance; GIS: geographical information system.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

This study was conceived and completed by JW. XL and YL assisted with calculation. GC assisted with the analyses and revision of the manuscript. XG and XZ assisted with the medical analysis of birth defects. All authors read and approved the final manuscript.

Acknowledgements

This study was supported by the NSF China (40471111, 70571076), MOST, China (2006AA12Z215, 2007DFC20180; 2007AA12Z233), CAS, China (KZCX2-YW-308), and the California Air Resources Board, USA (55245A).
