Abstract
Background
Many epidemiologic studies report the odds ratio as a measure of association for cross-sectional studies with common outcomes. In such cases, the prevalence ratios may not be inferred from the estimated odds ratios. This paper overviews the most commonly used procedures to obtain adjusted prevalence ratios and extends the discussion to the analysis of clustered cross-sectional studies.
Methods
Prevalence ratios (PR) were estimated using logistic models with random effects. Their 95% confidence intervals were obtained using the delta method and the clustered bootstrap. The performance of these approaches was evaluated through simulation studies. Using data from two studies with health-related outcomes in children, we discuss the interpretation of the measures of association and their implications.
Results
The results from data analysis highlighted major differences between the estimated OR and PR. Results from simulation studies indicate an improved performance of the delta method compared to the bootstrap when the number of clusters is small.
Conclusion
We recommend the use of a logistic model with random effects for the analysis of clustered data. The choice of method to estimate confidence intervals for PR (delta or bootstrap method) should be based on study design.
Background
While the odds ratio (OR) is one of the most frequently used measures of association between a risk factor and an outcome in epidemiology, the risk ratio (RR) and prevalence ratio (PR) are important indices to quantify the strength of association between a given disease and a suspected risk factor [1]. The main reason for the popularity of the OR is that it is the measure of association usually provided by logistic regression models. There is a large body of literature discussing the relationship between OR and RR (or PR) [2,3] and an ongoing debate on the appropriateness of odds ratios versus prevalence ratios as measures of effect in cross-sectional studies [4-9]. It is known that the OR overestimates the RR (or PR) [3,10,11] when the outcome of interest is common (larger than 10%, for instance). The major limitation of using OR under such circumstances is related to its misinterpretation as PR [11].
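A small numerical illustration may help here. The sketch below uses hypothetical prevalences (not taken from either study) to compare the OR and the PR for a rare and a common outcome that share the same PR of 2; the function names are ours.

```python
def prevalence_ratio(p_exposed: float, p_unexposed: float) -> float:
    """Ratio of outcome prevalences between exposed and unexposed groups."""
    return p_exposed / p_unexposed


def odds_ratio(p_exposed: float, p_unexposed: float) -> float:
    """Ratio of outcome odds between exposed and unexposed groups."""
    odds_exposed = p_exposed / (1.0 - p_exposed)
    odds_unexposed = p_unexposed / (1.0 - p_unexposed)
    return odds_exposed / odds_unexposed


# Hypothetical prevalences (exposed, unexposed); both pairs give PR = 2.
scenarios = {"rare": (0.02, 0.01), "common": (0.60, 0.30)}

for label, (p1, p0) in scenarios.items():
    print(label, round(prevalence_ratio(p1, p0), 2), round(odds_ratio(p1, p0), 2))
```

With these assumed values the OR is about 2.02 for the rare outcome but 3.5 for the common one, so reading the OR as a PR would be harmless in the first case and misleading in the second.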
The use of odds ratios in cross-sectional studies, a common practice among epidemiologists, has been criticized because prevalence odds ratios are good estimates of prevalence ratios only under specific circumstances [12-14]. More recent studies examining the differences between OR and PR, according to variations in the prevalence of exposure and disease, have shown that differences between odds ratios and proportion ratios, relative risks or prevalence ratios increase with increasing disease frequency [15]. There are several statistical models that can provide adjusted estimates for PR, including the logistic model, Poisson regression and log-binomial regression [3,10,16-18]. However, there is no consensus about the best approach to obtain the adjusted PR and these methods may lead to different conclusions. The main appeal of estimating PR as a measure of association is that PR is more easily interpreted than the OR in cross-sectional studies with common outcomes. For instance, a PR of 2 means that the proportion of cases among exposed subjects is 2 times that among unexposed subjects, while an OR of 2 does not necessarily have the same meaning. Previous reports have estimated the PR in the context of simple random samples, where the responses of distinct individuals can be considered independent of each other [10], but in many situations this assumption may not be satisfied. Clustered cross-sectional studies have become increasingly popular in epidemiology, especially when the use of simple random sample designs is not feasible. In such cases, the analysis must take into account the degree of similarity between subjects within clusters [19]. In the present paper we have evaluated methods for estimating adjusted PR in clustered cross-sectional studies using random-effects models.
The evaluation of these methods has been motivated by data from the SCAALA (Social Changes, Asthma and Allergy in Latin America Programme) studies in Brazil [20] and Ecuador [21], both of which use clustered data. The study in Brazil was conducted in Salvador, located in the Northeast of the country, and evaluated associations between the prevalence of asthma and other allergic diseases in children and potential risk factors, including living conditions and exposure to infections [20]. The participants of this study were recruited from 24 small geographical areas selected to represent the population without sanitation in the city of Salvador in 1997. The clustered study design could lead to dependence in asthma occurrence among children living in the same geographical area. The Ecuadorian study was conducted in the province of Esmeraldas, one of the poorest regions of the country, to investigate the impact of long-term treatment with the broad-spectrum anthelmintic drug, ivermectin, used for the control of onchocerciasis, on the prevalence and intensity of soil-transmitted helminth infections in school-age and pre-school children [21]. The data from this study were used to compare the prevalence of Trichuris infection between children living in treated and untreated communities.
In this paper, we have evaluated modeling strategies for the estimation of adjusted PRs in cross-sectional studies with common outcomes through simulation studies in the setting of clustered data.
Methods
Logistic regression is the most popular model used for the analysis of binary outcomes to estimate adjusted odds ratios. These can be expressed in terms of the estimated effect of the factor of interest on the outcome, or more simply as the exponential of the factor's coefficient (for instance, OR = exp(β_{1}), where β_{1} denotes this effect). The estimation of the PR, however, requires a more complicated mathematical expression that relates the effects and the values of the factors of interest. For example, suppose it is of interest to evaluate the effect of an exposure (X_{1}) on the occurrence of an outcome while controlling for k - 1 confounders (X_{2},..., X_{k}). In such circumstances, the PR between exposed and unexposed subjects could be expressed as:
$$PR = \frac{P(Y = 1 \mid X_{1} = 1, X_{2}, \ldots, X_{k})}{P(Y = 1 \mid X_{1} = 0, X_{2}, \ldots, X_{k})} = \frac{1 + \exp[-(\beta_{0} + \beta_{2}X_{2} + \cdots + \beta_{k}X_{k})]}{1 + \exp[-(\beta_{0} + \beta_{1} + \beta_{2}X_{2} + \cdots + \beta_{k}X_{k})]}$$
Note that the PR depends on the values of the covariates in the model. Some alternative models discussed in the literature that allow a simpler approach for the estimation of the PR include the log-binomial, the Poisson with robust variance estimator and the Cox model with the same follow-up time for all subjects [3,10,16-18]. A major limitation of the Poisson and the log-binomial models, however, is that they allow prediction of probabilities outside the interval [0, 1]. The log-binomial model fails to converge when this happens. Moreover, confidence intervals obtained using Poisson and Cox models are wider than those obtained from the log-binomial model, requiring the use of a robust variance estimator. According to Moineddin, Matheson and Glazier (2007) [22], direct conversion of adjusted OR into PR is impractical because the notions of linearity, confounding and interaction are not equivalent between the different models. Thus, the logistic regression model would appear to be an alternative approach for the estimation of PRs and their confidence intervals.
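The dependence of the PR on covariate values noted above can be made concrete with a short sketch. It assumes a logistic model with one exposure and one covariate; the coefficient values and the function name `pr_from_logistic` are hypothetical and serve only to illustrate that the OR stays fixed while the PR changes with X_{2}.

```python
import math


def expit(z: float) -> float:
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def pr_from_logistic(beta0: float, beta1: float, beta2: float, x2: float) -> float:
    """PR for exposed (X1 = 1) versus unexposed (X1 = 0) at covariate value x2."""
    p_exposed = expit(beta0 + beta1 + beta2 * x2)
    p_unexposed = expit(beta0 + beta2 * x2)
    return p_exposed / p_unexposed


# Hypothetical coefficients, chosen only for illustration.
BETA0, BETA1, BETA2 = -1.0, 0.7, 0.5

odds_ratio = math.exp(BETA1)  # the same for every value of x2
pr_by_x2 = {x2: pr_from_logistic(BETA0, BETA1, BETA2, x2) for x2 in (-2.0, 0.0, 2.0)}

print("OR:", round(odds_ratio, 3))
for x2, pr in pr_by_x2.items():
    print("x2 =", x2, "PR =", round(pr, 3))
```

Under these assumed coefficients the PR moves toward 1 as the baseline prevalence rises with X_{2}, which is why a reference value or an averaging rule has to be stated whenever a PR is derived from a logistic model.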
Standardization Procedures
Several standardization procedures for epidemiologic measures of effect based on regression models have been proposed. Wilcosky and Chambless [23], following Lane and Nelder [24], referred to three approaches for adjusting the prevalence ratio estimated from a logistic regression model: the conditional method, where a standard value, usually the mean, is chosen for the covariates and the prevalence is computed for each comparison group; the stratified method, where for each comparison group the prevalence is estimated as a weighted average over the strata defined by combinations of covariate values, with weights chosen from a standard population; and the marginal method, where, for each comparison group, the prevalence is computed for every combination of values of the covariates and averaged over all observations. The stratified and marginal methods will give the same results if the weights are chosen as the relative sizes of the strata in the study population. In the stratified method, all observations in a stratum share the same covariate values, as in the direct standardization procedure, which is a weighted average of the predictions for all strata formed by the covariates, with weights taken from a reference population.
As an example, suppose we have data on n individuals with a dichotomous exposure X_{1} (1 = exposed, 0 = nonexposed) and one continuous covariate, X_{2}. Using the conditional method, the adjusted PR is given by
$$PR = \frac{1 + \exp[-(\beta_{0} + \beta_{2}\bar{X}_{2})]}{1 + \exp[-(\beta_{0} + \beta_{1} + \beta_{2}\bar{X}_{2})]}$$
where $\bar{X}_{2}$ represents the mean of X_{2}. For the marginal method the adjusted PR is given by
$$PR = \frac{\sum_{i=1}^{n} \{1 + \exp[-(\beta_{0} + \beta_{1} + \beta_{2}X_{2i})]\}^{-1}}{\sum_{i=1}^{n} \{1 + \exp[-(\beta_{0} + \beta_{2}X_{2i})]\}^{-1}}$$
where the summation is over all n individuals. A similar expression is used in the stratified method, with k, the number of strata formed by categorizing X_{2}, replacing n and with the use of weights W_{k} chosen from a reference population.
An alternative approach for conditional standardization is by specifying a reference value for each covariate rather than using their mean values [10]. This approach is particularly useful when considering several levels of exposure for a covariate.
Flanders and Rhodes [25] provide formulae for the estimated variance of the adjusted prevalence for all methods. In the next subsection we discuss methods to obtain confidence intervals using a random effects logistic model in the setting of clustered data.
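The conditional and marginal procedures described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: one exposure, one continuous covariate, and hypothetical values for the coefficients and for X_{2}; the function names are ours and the coefficients are treated as already estimated.

```python
import math
from statistics import mean


def expit(z: float) -> float:
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def conditional_pr(beta, x2_values):
    """PR with the covariate fixed at its sample mean (conditional method)."""
    b0, b1, b2 = beta
    x2_bar = mean(x2_values)
    return expit(b0 + b1 + b2 * x2_bar) / expit(b0 + b2 * x2_bar)


def marginal_pr(beta, x2_values):
    """PR from predicted prevalences averaged over all observations (marginal method)."""
    b0, b1, b2 = beta
    p_exposed = mean(expit(b0 + b1 + b2 * x2) for x2 in x2_values)
    p_unexposed = mean(expit(b0 + b2 * x2) for x2 in x2_values)
    return p_exposed / p_unexposed


# Hypothetical coefficient estimates and covariate values, for illustration only.
beta_hat = (-1.0, 0.7, 0.5)
x2_sample = [-1.5, -0.5, 0.0, 0.5, 2.0]

print("conditional PR:", round(conditional_pr(beta_hat, x2_sample), 3))
print("marginal PR:", round(marginal_pr(beta_hat, x2_sample), 3))
```

The two procedures generally give different values because the logistic function is nonlinear, so the prevalence at the mean covariate value is not the mean of the individual prevalences.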
Estimating Prevalence Ratio using Logistic Model with Random Effects
Logistic Model with Random Effects
There are two large families of statistical models that account for the correlation in different ways, leading to estimated parameters that have different interpretations, which are denoted as marginal models and random effects models [26]. We will focus on a well-established approach for modeling clustered/correlated data that introduces random effects in the model of interest. This approach allows the relationship between the outcome and the covariates to vary from one subject to another. The random effects models adjust for unobserved individual characteristics, reflecting a natural heterogeneity across subjects. By using this approach, the correlation between the observations from the same analysis unit arises from their sharing specific but unobserved properties of the respective subject. A random effects logistic regression model can be used to predict binary outcomes when observations are correlated or come from clustered data. This method makes it possible to deal simultaneously with the problems of correlated observations and measurement error in the dependent variable. For illustration, let Y_{ij} be a dichotomous outcome at cluster j for subject i and X_{1ij} and X_{2ij} two covariates. The random effects logistic regression model can be written as
$$\operatorname{logit} P(Y_{ij} = 1 \mid X_{1ij}, X_{2ij}, u_{oj}) = \beta_{0} + \beta_{1}X_{1ij} + \beta_{2}X_{2ij} + u_{oj}$$
where u_{oj} ~ N(0, σ^{2}) represents a cluster-specific random effect, leading to a random intercept logistic regression model, which is the simplest example of a generalized linear mixed model (GLMM). This model describes the combined effect of all omitted subject-specific covariates that cause some subjects to be more prone to disease (for example) than others. It is appealing to model unobserved heterogeneity in the same way as observed heterogeneity by simply adding the random intercept to the equation.
Using the estimates for the effects (β's) of covariates on the outcome obtained with the random effects logistic model, we can choose a standardization procedure and estimate PR using the formulae presented previously relating the values of covariates and their effects to the prevalence ratio. Several investigators interpret the regression coefficients (or the odds ratios) obtained from the logistic model with random effects in the same way as in the usual logistic regression model, by conditioning on the random effects [27-29]. According to Hardin and Hilbe (2003), when explicitly modeling the source of heterogeneity in the logistic regression with random effects, the fixed regression parameters have an interpretation for individuals, which is subject specific [30].
Confidence Intervals for Prevalence Ratios
Methods for obtaining large sample confidence intervals for prevalence ratios include the delta method and the bootstrap. The delta method is a general technique for deriving asymptotic distributions of functions of random variables, based on a Taylor series approximation [31], and the bootstrap is based on resampling the data with replacement and using the bootstrap replications to estimate the functions of interest [32]. Both methods are used to estimate the standard error of PR from the random effects logistic regression model. For the delta method, adjusted confidence intervals are given by
$$\exp\left\{\log(\widehat{PR}) \pm z_{\alpha/2}\,\widehat{SE}[\log(\widehat{PR})]\right\}$$
where $\log(\widehat{PR})$ is the estimate of the adjusted log(PR), $\widehat{SE}[\log(\widehat{PR})]$ is the estimate of the standard error of log(PR) and z_{α/2} is the upper α/2 quantile of the standard normal distribution. In the bootstrap estimation, 1000 bootstrap replications are used to produce the bootstrap distribution of PR. The confidence interval is based on normal theory, assuming that log(PR) is normally distributed, which is often approximately the case in sufficiently large samples, and uses the bootstrap estimate of sampling variance. The bootstrap confidence intervals are given by
$$\exp\left\{\log(\widehat{PR}^{*}) \pm z_{\alpha/2}\,\widehat{SE}^{*}[\log(\widehat{PR})]\right\}$$
where $\log(\widehat{PR}^{*})$ is the bootstrap estimate of the adjusted log(PR) and $\widehat{SE}^{*}[\log(\widehat{PR})]$ is the bootstrap estimate of the standard error of log(PR). An alternative approach, called the bootstrap percentile interval [33], uses the empirical quantiles of the bootstrap estimates to form the interval. For a 95% confidence interval, for example, the limits are given by the 2.5 and 97.5 percentiles. Previous simulation studies have pointed out an equivalence between the delta and bootstrap methods in the analysis of independent observations [10]. We used a cluster bootstrap procedure in which clusters are selected by simple random sampling with replacement and there is no subsequent permutation [34]. The behavior of both methods in the clustered data setting is compared here via the simulation described in a following subsection.
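As a rough sketch of the delta method interval described above, the code below computes a Wald interval for the PR under conditional standardization. It assumes that the fixed-effect estimates and their covariance matrix are already available (for example, from a fitted random effects logistic model) and that the random intercept is set to zero; both the numerical values and the function name `delta_method_ci` are hypothetical, and this is not necessarily the exact computation used for the results reported here.

```python
import math
from statistics import NormalDist


def expit(z: float) -> float:
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def delta_method_ci(beta, cov, x_exposed, x_unexposed, level=0.95):
    """Return (PR, lower, upper) using a first-order Taylor expansion of log(PR)."""
    eta1 = sum(b * x for b, x in zip(beta, x_exposed))
    eta0 = sum(b * x for b, x in zip(beta, x_unexposed))
    p1, p0 = expit(eta1), expit(eta0)
    log_pr = math.log(p1 / p0)
    # Gradient of log(PR) in beta, using d/d(beta) log expit(eta) = (1 - p) * x.
    grad = [(1.0 - p1) * x1 - (1.0 - p0) * x0 for x1, x0 in zip(x_exposed, x_unexposed)]
    # Quadratic form grad' * cov * grad gives the approximate variance of log(PR).
    variance = sum(gi * cov[i][j] * gj for i, gi in enumerate(grad) for j, gj in enumerate(grad))
    se = math.sqrt(variance)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.exp(log_pr), math.exp(log_pr - z * se), math.exp(log_pr + z * se)


# Hypothetical estimates: intercept, exposure, one covariate centered at its mean.
beta_hat = [-1.0, 0.7, 0.5]
cov_hat = [[0.04, -0.02, 0.0], [-0.02, 0.05, 0.0], [0.0, 0.0, 0.01]]

# Design vectors for an exposed and an unexposed subject at the covariate mean (0).
pr, lower, upper = delta_method_ci(beta_hat, cov_hat, [1.0, 1.0, 0.0], [1.0, 0.0, 0.0])
print(round(pr, 3), round(lower, 3), round(upper, 3))
```

Working on the log scale keeps the interval limits positive and matches the form exp{log(PR) ± z SE} given in the text.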
We integrate these approaches to estimate PR using random effects logistic regression.
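The cluster bootstrap described above can be sketched as follows. The resampling step (whole clusters drawn with replacement, members kept intact) follows the text; everything else is an assumption made for illustration: the data are simulated, and a crude PR stands in for the adjusted PR that would be obtained by refitting the random effects logistic model in each replicate.

```python
import math
import random
from statistics import NormalDist, mean, quantiles, stdev


def expit(z: float) -> float:
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def crude_pr(records):
    """Crude PR from (exposed, outcome) pairs; a stand-in for the adjusted PR."""
    exposed = [y for x, y in records if x == 1]
    unexposed = [y for x, y in records if x == 0]
    return mean(exposed) / mean(unexposed)


def cluster_bootstrap_log_pr(clusters, n_boot, rng):
    """Resample whole clusters with replacement; return log(PR) for each replicate."""
    ids = list(clusters)
    estimates = []
    for _ in range(n_boot):
        sampled = rng.choices(ids, k=len(ids))  # clusters drawn with replacement
        records = [rec for cid in sampled for rec in clusters[cid]]
        estimates.append(math.log(crude_pr(records)))
    return estimates


# Hypothetical data: 24 clusters of 30 subjects, exposure alternating within cluster.
rng = random.Random(2024)
clusters = {}
for cid in range(24):
    u = rng.gauss(0.0, 0.5)  # cluster-specific random intercept
    clusters[cid] = [
        (i % 2, int(rng.random() < expit(-0.5 + 0.6 * (i % 2) + u)))
        for i in range(30)
    ]

log_prs = cluster_bootstrap_log_pr(clusters, n_boot=1000, rng=rng)

# Normal-theory interval on the log scale, using the bootstrap mean and standard error.
z = NormalDist().inv_cdf(0.975)
center, se = mean(log_prs), stdev(log_prs)
normal_ci = (math.exp(center - z * se), math.exp(center + z * se))

# Percentile interval: the first and last of 39 cut points are the 2.5th and 97.5th percentiles.
cuts = quantiles(log_prs, n=40)
percentile_ci = (math.exp(cuts[0]), math.exp(cuts[-1]))
```

Resampling clusters rather than individuals preserves the within-cluster dependence in every replicate, which is the point of the procedure.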
Epidemiological Studies
A brief description of two epidemiological studies whose data are used to illustrate the methods discussed in the paper is presented next. Both studies have outcomes with prevalence greater than 10% and are related to relevant health problems in children in developing countries.
SCAALA-Salvador Study
An epidemiological study is being conducted in the city of Salvador, in the Northeast of Brazil, to study the association between life conditions, immunological profile and occurrence of allergic diseases. The research project is called Social Changes and Asthma and Allergy in Latin America Programme (SCAALA). Information about asthma was obtained through the use of a Portuguese version of the questionnaire used by the International Study of Allergy and Asthma in Childhood (ISAAC) [35]. The design of this study has been reported elsewhere [20]. Briefly, children were recruited from 24 areas scattered around the city, making this a clustered study. In this study, information was collected on 1445 children aged 4 to 11 years.
Because causes of asthma are incompletely understood and there has been a recent interest in the relationship between psychosocial factors and asthma [36,37], the aim of the analyses presented here is to investigate the impact of maternal mental health status on the occurrence of asthma in their children. The data used were collected in 2005 and included information on maternal mental health status and other maternal characteristics, such as educational level, smoking status and history of asthma, as well as the child's characteristics, such as age, gender, and occurrence of asthma. The definition of maternal mental health status has been reported elsewhere [38]. Briefly, a self-reported questionnaire (SRQ) of 20 items was used for psychiatric screening of common minor mental disorders (depression, anxiety and other psychosomatic dysfunctions) [39]. A cut-off point for the definition of probable cases of common minor mental disorders was defined as 8 or more positive answers, a definition that, although not representing a psychiatric diagnosis, does indicate significant psychiatric suffering. For the analysis presented here, we considered data from 758 children and evaluated the impact of maternal mental health status on the occurrence of childhood asthma, controlling for child's age and gender, and maternal educational level.
SCAALA-Ecuador Study
Another important health problem throughout developing countries is parasite infections [40]. The National Program for Elimination of Onchocerciasis in Ecuador distributes ivermectin in endemic areas with the aim of eventually eliminating the infection from Ecuador. Ivermectin is a broad-spectrum anthelmintic drug that is efficacious for the treatment of geohelminth infections, including Ascaris lumbricoides, Trichuris trichiura and Strongyloides stercoralis [41]. To evaluate the effect of ivermectin on the epidemiology of these infections, a study was conducted with 3705 children aged 6–16 from rural Afro-Ecuadorian communities in the province of Esmeraldas, Ecuador. The children were selected from 31 communities that have been treated with ivermectin and from 27 other adjacent villages, which were matched with ivermectin-treated communities by ethnicity and social and economic activities but have never received treatment [21]. This study forms part of a larger study called SCAALA-Esmeraldas, which is examining the risk factors associated with differences in the prevalence of asthma and other allergic diseases in children from rural and migrant urban populations in Esmeraldas Province [42].
To evaluate the methods discussed in this paper, we analyzed data from a simple random sample of 2000 children from the original study. Here we are interested in investigating the effect of ivermectin on the prevalence of Trichuris trichiura after adjusting for children's age and gender.
Data analysis was done using STATA v.8 and R v.2.6.0 software [43].
Simulation Studies
To compare different methods for estimating confidence intervals for PR in clustered data using logistic regression with random effects, simulation studies were conducted with varying degrees of dependency, through the intraclass correlation coefficient (ICC), and levels of clustering (given by number and size of clusters). For each configuration, 1,000 samples were generated. We present the coverage probability (CP) of the Wald 95% confidence interval for the corresponding estimation method for each combination of ICC, number of clusters and cluster size.
The coverage probabilities (CP) represent the percentage of simulated datasets in which the corresponding confidence intervals contain the true PR. For the simulation studies conducted here, CP should be close to 95% to indicate that the method used for defining the confidence intervals is accurate.
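The coverage probability defined above is simple to compute once the intervals from the simulated datasets are available. The sketch below uses a handful of hypothetical intervals; the helper `monte_carlo_se` is our addition and gives the binomial standard error of an estimated CP, which indicates how far an observed CP can drift from 95% by simulation noise alone.

```python
import math


def coverage_probability(intervals, true_value):
    """Percentage of (lower, upper) intervals that contain the true value."""
    covered = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return 100.0 * covered / len(intervals)


def monte_carlo_se(cp_percent, n_samples):
    """Binomial standard error of an estimated CP, in percentage points."""
    p = cp_percent / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_samples)


# Hypothetical intervals for a true PR of 1.52, for illustration only.
intervals = [(1.2, 1.9), (1.6, 2.1), (1.0, 1.6), (1.4, 1.8)]
print(coverage_probability(intervals, 1.52))
print(round(monte_carlo_se(95.0, 1000), 2))
```

With 1,000 simulated samples per configuration, the standard error of a CP near 95% is roughly 0.7 percentage points, so differences of a point or so between methods should be read with that in mind.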
We generated correlated binary outcomes through a random effects logistic model using the algorithm presented by Moineddin, Matheson and Glazier [22]. The following steps were implemented to simulate data sets:
1. Set up values for fixed parameter β (the effect of covariates on the outcome), number and size of the clusters, and ICC.
2. Generate a dichotomous independent variable X_{1j} representing an intervention for each data unit. The number of clusters was the same in each intervention group.
3. Generate a continuous independent variable X_{2ij} from a Normal(0,1) distribution.
4. Generate a normal variable, such that for a given cluster j, u_{oj} ~ N(0, σ^{2}), where u_{oj} and u_{oj'} are independent for j ≠ j'. The intraclass correlation coefficient (ICC) [22] is defined by $$ICC = \frac{\sigma^{2}}{\sigma^{2} + \pi^{2}/3}$$
5. Calculate p_{ij} = E(Y_{ij} | X_{1j}, X_{2ij}, u_{oj}) using a random effects logistic model, such that $$\operatorname{logit}(p_{ij}) = \beta_{0} + \beta_{1}X_{1j} + \beta_{2}X_{2ij} + u_{oj}$$
6. The correlated binary outcome (Y_{ij}) for the i^{th} subject of the j^{th} cluster is generated from a Bernoulli distribution with probability p_{ij}.
In the simulations, we considered 15, 30 and 100 clusters of sizes (m) 10 and 30. The ICC was defined to be 0.03, 0.29 and 0.71. The bootstrapping procedure took into account the clustering of the data. The simulation studies were implemented using R version 2.6.0 software [43].
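The data-generating steps listed above can be sketched as follows. This is an illustration rather than the code used for the reported simulations: the coefficient values are hypothetical, the function names are ours, and the random intercept variance is derived from the ICC under the assumption that the ICC follows the latent-variable definition σ^{2} / (σ^{2} + π^{2}/3).

```python
import math
import random


def expit(z: float) -> float:
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def sigma2_from_icc(icc: float) -> float:
    """Random intercept variance implied by the assumed latent-variable ICC definition."""
    return icc * (math.pi ** 2 / 3.0) / (1.0 - icc)


def simulate_dataset(n_clusters, cluster_size, icc, beta, rng):
    """Return rows (cluster, x1, x2, y) of clustered binary data."""
    beta0, beta1, beta2 = beta  # step 1: fixed parameters
    sigma = math.sqrt(sigma2_from_icc(icc))
    rows = []
    for j in range(n_clusters):
        x1 = j % 2  # step 2: cluster-level intervention, half the clusters in each group
        u = rng.gauss(0.0, sigma)  # step 4: cluster-specific random intercept
        for _ in range(cluster_size):
            x2 = rng.gauss(0.0, 1.0)  # step 3: continuous covariate
            p = expit(beta0 + beta1 * x1 + beta2 * x2 + u)  # step 5
            y = 1 if rng.random() < p else 0  # step 6: Bernoulli draw
            rows.append((j, x1, x2, y))
    return rows


# Hypothetical configuration: 30 clusters of size 10 with ICC = 0.29.
rng = random.Random(1)
data = simulate_dataset(30, 10, 0.29, (0.2, 0.5, 0.3), rng)
print(len(data), round(sigma2_from_icc(0.29), 3))
```

Repeating this generation 1,000 times per configuration, fitting the model to each dataset and recording whether each interval contains the true PR would give the coverage probabilities reported in the Results.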
Results
Data Analysis
SCAALA-Salvador Study
This analysis included data from 1087 children, aged 4 to 12 years, with 81.0% being 8 years old or younger, 47.1% being girls and 26.7% with asthma. Among the mothers, 30.7% completed high school or college and 37.4% had probable mental health problems. For modeling, age was centered at its mean value. We estimated the effect of maternal mental health status on asthma occurrence using random effects logistic regression, considering two standardization methods and three approaches for obtaining the confidence intervals for PR. Results are presented in Table 1.
Table 1. Comparison of prevalence ratio (PR) estimates using random effects logistic regression: Impact of maternal mental health on child's asthma in Brazil.
The adjusted odds ratio is 1.87 (95%CI = 1.41, 2.47), which is larger than the estimated prevalence ratio (PR = 1.52–1.54, depending on the standardization procedure). When estimating PR using conditional standardization, we specified a mean age of 6.8 years, boys as the reference group for gender, and less than elementary school as the reference level of maternal education. Based on the results of the conditional standardization, the prevalence of asthma for a boy aged 6.8 years whose mother has a low educational level and a probable mental health problem is about 52% greater than the prevalence for a boy of the same age whose mother has a low educational level and no evidence of a mental health problem. On the other hand, under marginal standardization, the prevalence of asthma assuming that all children in the study have mothers with mental health problems is 54% larger than the prevalence assuming that no children in the study have mothers with mental health problems. Note that the 95% bootstrap confidence intervals are wider than those obtained from the delta method when considering conditional standardization.
Robust Poisson regression and robust log-binomial regression were also implemented. The results obtained using the robust Poisson model (PR = 1.54, 95%CI = 1.22, 1.94) were very close to those obtained from the random effects logistic regression. Convergence was not achieved using the log-binomial model.
SCAALA-Ecuador Study
We analyzed data from 2000 children aged 6 to 16 years, of which 15.2% were aged 6–7 years, 23.5% 8–9 years, 24.3% 10–11 years, 21.7% 12–13 years, and 15.5% 14–16 years. Fifty-eight percent of the children were boys and 46.5% had received ivermectin. To evaluate the association between infection with Trichuris trichiura and ivermectin, we considered a random effects logistic model. We modelled the occurrence of infection as a function of ivermectin treatment, adjusting for gender and age. Age was centered at its mean value. The prevalence of infection was 57.9%. As expected in such scenarios, the odds ratio overestimated the effect of treatment (OR = 0.07 [95%CI = 0.05; 0.11]) compared to the prevalence ratio (PR = 0.33 [95%CI = 0.27; 0.42], using conditional standardization and the delta method). The bootstrap confidence intervals based on normal theory were narrower than those obtained through the delta method for the random effects logistic model in this application (Table 2).
Table 2. Estimation of prevalence ratio of Trichuris using standard and random effects logistic regression, and robust Poisson model: Effectiveness of a health program in Ecuador.
The estimated PR for T. trichiura infection using robust Poisson was 0.38 [95%CI = 0.31; 0.47]. These results indicated a reduction of approximately 62% in the prevalence of T. trichiura infection in children treated with ivermectin compared to untreated children of the same age and gender. Convergence was not achieved for the analysis of these data using log-binomial models.
For these data, the intraclass correlation coefficient (ICC) was 0.415, indicating an important effect of clustering that should be considered in the analysis. Standard logistic regression models tend to underestimate the standard errors for PRs compared to random effects logistic regression models. In general, the confidence intervals obtained in the random effects logistic regression are wider than those from the standard logistic regression (Table 2).
Results of Simulation Studies
The findings of the simulation studies comparing the coverage probability (CP) of the Wald 95% confidence interval obtained through delta method and clustered bootstrap for random effects logistic model are shown in Table 3. The prevalence of disease for each of the configurations was between 55% and 60%, with a PR of 1.52.
Table 3. Coverage probability of the Wald 95% confidence interval of PR for delta method and bootstrap varying the degree of correlation, number and size of clusters.
The results suggest that the delta method outperforms the bootstrap method, especially when the number of clusters is small. For instance, considering 10 clusters of size 10, the CPs of the Wald 95% confidence interval were 94.7% and 88.3%, respectively, for the delta and bootstrap methods when the ICC equals 0.03, and 95.0% and 87.3% when the ICC equals 0.71.
Standard logistic regression (no adjustment for clustering) performed poorly, particularly with an increasing number of clusters and increasing correlation between individuals within clusters. Considering 50 clusters of size 10, the CP dropped from 93.1–82.3% when the ICC equals 0.03 to 46.3–41.8% when the ICC equals 0.71, for the delta and bootstrap methods, respectively [data not shown].
The results for the comparison of logistic and Poisson random effects models are presented in Table 4. The delta method was used to obtain the 95% confidence intervals for PR for the random effects logistic model. The logistic model generally performed better than the Poisson model. Performance of the random effects Poisson model for estimating PR declined when there was a high degree of within-cluster correlation (ICC = 0.71) and with an increasing number of clusters (k).
Table 4. Coverage probability of the Wald 95% confidence interval of PR using random effects logistic and Poisson model varying the degree of correlation and number of clusters of size 10.
Discussion
A major advantage of the odds ratio is that it can be estimated for all study types. However, investigators should avoid interpreting odds ratios as an approximation to prevalence ratios when the prevalence of the event of interest is high (greater than 10%). In such situations, the odds ratio generally overestimates the prevalence ratio. The importance of differences in the interpretation of the OR compared to PR/RR, particularly when prevalence is high, has been discussed by others [3,11,16].
If the adjusted prevalence ratio (PR) is the measure of interest, logistic regression is one of the approaches that can be used for its estimation [10,16]. However, the choice of standardization procedure may affect the point estimates and, most importantly, their interpretation. To our knowledge, there are few reports discussing implications of the choice of standardization for the interpretation of PR in the context of logistic regression [16]. The most recent effort to discuss this issue was made by Localio and colleagues (2007), in which the standardization procedure is linked to the question of interest. In contrast to the OR, which is computed regardless of the values of other covariates, the calculation of PR using logistic regression depends on the fixed levels of covariates included in the model. Thus, a clear interpretation of PR depends on the definition of the reference values used in the computational procedure.
There is also no consensus about the best way to interpret regression coefficients in the context of random effects models. Some authors interpret the fixed regression coefficients similarly to the usual logistic regression model, conditioning on the random effects [27-29]. When modeling explicitly the source of heterogeneity in the logistic regression with random effects, the fixed regression parameters should be interpreted as effects of covariates on a typical subject in the study [30,44]. Thus, as an illustration using our application regarding the impact of ivermectin on the prevalence of Trichuris infection, the estimated PR using the logistic model with random effects represents the ratio of the probability of a given child having Trichuris infection if he/she receives ivermectin to the probability of the same child having Trichuris infection if he/she does not receive treatment. In this way the PR is adjusted for unobserved individual characteristics.
Alternatively, population-averaged estimates for the regression coefficients can be obtained using approximate formulae as suggested by Zeger and colleagues (1988), which can be interpreted in terms of the response averaged over the population [45]. In some situations, however, the subject-specific interpretation is of more interest than the average effect on the population as a whole [46]. Another approach was proposed by Larsen and colleagues (2000), who discussed the interpretation of both fixed and random effects parameters in the context of logistic regression with random effects [27]. They proposed a measure for the fixed effect called the median odds ratio (MOR) in order to take into account the fact that, in practice, conditioning on the random effects is unrealistic because the random effects are unobservable.
The confidence intervals for the prevalence ratio using logistic regression should be defined using appropriate approaches, such as the delta and bootstrap methods. Other methods discussed in the literature, such as the substitution method [2], have been shown to have theoretical limitations leading to unsatisfactory statistical performance [10,16]. The use of the delta and bootstrap methods has been discussed in the literature for situations where the observations are uncorrelated. In such cases, the performance of these methods seems to be equivalent.
Other model-based approaches that have been commonly used to estimate PR are the Poisson and log-binomial models [3,10,16-18]. The main advantage of these methods is the direct estimation of PR and its confidence intervals [47]. At the same time, both models can present estimation problems due to restrictions needed to avoid predicting probabilities outside the interval [0,1]. When this happens, the model does not converge. There has been no consensus about the best model-based approach for estimating PR. Barros and Hirakata (2003) suggested that more than one modeling strategy should be used to evaluate the robustness of the results. A shortcoming of this strategy is that different models imply different relationships between the outcome and covariates, even when the same covariates are included in the model. Furthermore, identification of interaction effects may differ across models.
All previous discussions about the estimation of PRs have been conducted in the context of independent observations. In this paper we have extended this discussion to include clustered design studies, in which the dependence between observations is taken into account. We used random effects logistic models to deal with intra-cluster correlation. We evaluated the performance of methods for defining confidence intervals through simulation studies with several levels of correlation between observations in the same cluster. For the scenarios considered here, the delta method outperformed the clustered bootstrap method when data were available for only a small number of clusters. However, for situations where the size and number of clusters are large, they show equivalent performance. We also noticed a poorer performance of the Poisson model with random effects, especially with increasing level of clustering and number of clusters, and there were problems with convergence when the number of clusters was small.
Conclusion
We illustrated the estimation of prevalence ratios using data from two studies with health-related outcomes in children, and we observed major differences between the estimated PR and OR in these studies. Therefore, we highlight the importance of avoiding the interpretation of odds ratios as prevalence ratios, particularly when the outcome is not rare. Based on the results of the simulation studies, we recommend the use of the logistic model with random effects for the analysis of clustered data when there are at least 30 clusters of size greater than or equal to 10. The choice of method for calculating confidence intervals for PRs (delta or clustered bootstrap) should be based on the study design.
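A short numerical sketch shows why the OR should not be read as a PR when the outcome is common. The PR is held at 1.5 while the baseline prevalence increases; the prevalences are illustrative values, not estimates from the two studies analysed here.

```python
def prevalence_ratio(p1, p0):
    """Ratio of the prevalence in the exposed (p1) to the unexposed (p0)."""
    return p1 / p0

def odds_ratio(p1, p0):
    """Ratio of the odds in the exposed to the odds in the unexposed."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

# Same PR (1.5) at three illustrative baseline prevalences.
results = []
for p0 in (0.01, 0.10, 0.40):
    p1 = 1.5 * p0
    results.append((p0, prevalence_ratio(p1, p0), odds_ratio(p1, p0)))

for p0, pr, or_ in results:
    print(f"baseline prevalence {p0:.2f}: PR = {pr:.2f}, OR = {or_:.2f}")
```

The OR stays close to the PR when the outcome is rare and moves away from it as the baseline prevalence grows.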
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
Studies concept and design: MLB, LR, PJC. Analysis and interpretation: MBBC, ALM, SC, CAT, LDA. Development and implementation of statistical methodology: CAT, RLF, NFO, LDA. Simulation Studies: CAT, RLF, NFO, LDA. Drafting of the manuscript: CAT, RLF, NFO, LDA. All authors critically revised and approved the final manuscript.
Acknowledgements
The SCAALA Study is funded by The Wellcome Trust, UK, HCPC Latin America Excellence Centre Programme, Ref 072405/Z/03/Z.
References

Lui KJ: Statistical Estimation of Epidemiological Risk. John Wiley & Sons Ltd; 2004.

Zhang J, Kai YF: What's the Relative Risk? A Method of Correcting the Odds Ratio in Cohort Studies of Common Outcomes.
Journal of American Medical Association 1998, 280:1690-1691.

Barros AJD, Hirakata VN: Alternatives for logistic regression in cross-sectional studies: an empirical comparison of models that directly estimate the prevalence ratio.
BMC Medical Research Methodology 2003, 3:2133.

Greenland S: Interpretation and choice of effect measures in epidemiologic analysis.
American Journal of Epidemiology 1987, 125:761-768.

Lee J, Chia KS: Estimation of prevalence rate ratios for cross-sectional data: an example in occupational epidemiology.
Br J Ind Med 1993, 50(9):861-862.

Lee J: Odds ratio or relative risks for cross-sectional data?
International Journal of Epidemiology 1994, 23:201-203.

Stromberg U: Prevalence odds ratio versus prevalence ratio.
Occupational Environmental Medicine 1994, 51:143-144.

Axelson O, Fredri M, Ekberg K: Use of prevalence ratio versus the prevalence odds ratio as a measure of risk in cross-sectional studies.
Occupational Environmental Medicine 1994, 51:574.

Pearce N: Effect Measures in Prevalence Studies.
Environmental Health Perspectives 2004, 112:1047-1050.

Localio AR, Margolis DJ, Berlin JA: Relative risks and confidence intervals were easily computed indirectly from multivariate logistic regression.
Journal of Clinical Epidemiology 2007, 60:874-882.

Newcombe RG: A deficiency of the odds ratio as a measure of effect size.
Statistics in Medicine 2006, 25:4235-4240.

Miettinen O: Estimability and estimation in case-referent studies.
American Journal of Epidemiology 1976, 103:226-235.

Greenland S, Thomas DC: On the need for rare disease assumption in case-control studies.
American Journal of Epidemiology 1982, 116:547-553.

Rodrigues L, Kirkwood BR: Case-control designs in the study of common diseases: updates on the demise of the rare disease assumption and the choice of sampling scheme for controls.
International Journal of Epidemiology 1990, 19:205-213.

Zochetti C, Consonni D, Bertazzi PA: Relationship between prevalence rate ratios in cross-sectional studies.
International Journal of Epidemiology 1997, 26:220-223.

Greenland S: Model-based Estimation of Relative Risks and Other Epidemiologic Measures in Studies of Common Outcomes and in Case-Control Studies.
American Journal of Epidemiology 2004, 160:301-305.

Blizard L, Hosmer DW: Parameter Estimates and Goodness-of-Fit in Log Binomial Regression.
Biometrical Journal 2006, 48:5-22.

Wacholder S: Binomial Regression in GLIM: Estimating Risk Ratios and Risk Differences.
American Journal of Epidemiology 1986, 123(1):174-184.

Amorim LD, Bangdiwala SI, McMurray RG, Creighton D, Harrell J: Intraclass correlations among physiologic measures in children and adolescents.
Nurs Res 2007, 56(5):355-360.

Barreto ML, Cunha SS, Alcântara-Neves N, Carvalho LP, Cruz AA, Stein RT, Gensen B, Cooper PJ, Rodrigues LC: Risk factors and immunological pathways for asthma and other allergic diseases in children: background and methodology of a longitudinal study in a large urban center in Northeastern Brazil (Salvador-SCAALA study).
BMC Pulmonary Medicine 2006, 6:1525.

Moncayo AL, Vaca MG, Amorim L, Rodriguez A, Erazo S, Oviedo G, Quinzo I, Padilla M, Chico M, Lovato R, Gomez E, Barreto ML, Cooper PJ: Impact of long-term treatment with ivermectin on the prevalence and intensity of soil-transmitted helminth infections.
PLoS Neglected Tropical Diseases 2:e293.

Moineddin R, Matheson FI, Glazier RH: A simulation study of sample size for multilevel logistic regression models.
BMC Medical Research Methodology 2007, 7:3443.

Wilcosky TC, Chambless LE: A comparison of direct adjustment and regression adjustment of epidemiologic measures.
J Chron Dis 1985, 34:849-856.

Lane PW, Nelder JA: Analysis of covariance and standardization as instances of prediction.
Biometrics 1982, 38:613-621.

Flanders WD, Rhodes PH: Large sample confidence intervals for regression standardized risks, risk ratios and risk differences.
J Chron Dis 1987, 40:697-704.

Diggle PJ, Liang KY, Zeger SL: Analysis of Longitudinal Data. New York: Oxford University Press; 1994.

Larsen K, Petersen JH, Budtz-Jørgensen E, Endahl L: Interpreting Parameters in the Logistic Regression Model with Random Effects.
Biometrics 2000, 56:909-914.

McCulloch CE, Searle SR: Generalized, Linear, and Mixed Models. New York: John Wiley & Sons Inc; 2001.

Urbach DR, Austin PC: Conventional models overestimate the statistical significance of volume-outcome associations, compared with multilevel models.
Journal of Clinical Epidemiology 2005, 58:391-400.

Hardin JW, Hilbe JM: Generalized Estimating Equations. Boca Raton: Chapman & Hall/CRC; 2003.

Bishop YMM, Fienberg SE, Holland PW: Discrete Multivariate Analysis: Theory and Practice. Cambridge, Mass: MIT Press; 1975.

Efron B, Tibshirani RJ: An Introduction to the Bootstrap. New York: Chapman & Hall; 1993.

Fox J: An R and S-Plus Companion to Applied Regression. New York: Chapman & Hall; 2002.

Field CA, Welsh AH: Bootstrapping clustered data.
Journal of Royal Statistical Society B 2007, 69:369-390.

ISAAC steering committee: Worldwide variation in prevalence of symptoms of asthma, allergic rhinoconjunctivitis, and atopic eczema: ISAAC.
Lancet 1998, 351:1225-1232.

Ortega AN, Goodwin RD, McQuaid EL, Canino G: Parental mental health, childhood psychiatric disorders, and asthma attacks in island Puerto Rican youth.
Ambul Pediatr 2004, 4:308-315.

Weil CM, Wade SL, Bauman LJ, Lynn H, Mitchell H, Lavigne J: The relationship between psychosocial factors and asthma morbidity in inner-city children with asthma.
Pediatrics 1999, 104:1274-1280.

Carmo MBB, Santos DN, Amorim LDAF, Fiaccone RL, Cunha SS, Rodrigues LC, Barreto ML: Minor psychiatric disorders in mothers and asthma in children.
Soc Psychiatry Psychiatr Epidemiol 2008.
Epub ahead of print

Mari JJ, Williams P: A validity study of a psychiatric screening questionnaire (SRQ 20) in primary care in the city of São Paulo.
Br J Psychiatry 1986, 148:236.

Brooker S, Clements AC, Bundy DA: Global epidemiology, ecology and control of soil-transmitted helminth infections.
Adv Parasitology 2006, 62:221-261.

Ranque S, Chippaux JP, Garcia A, Boussinesq M: Follow-up of Ascaris lumbricoides and Trichuris trichiura infections in children living in a community treated with ivermectin at 3-monthly intervals.
Annals of Tropical Medicine Parasitology 2001, 95:389-93.

Cooper PJ, Chico ME, Vaca MG, Rodriguez A, Alcantara-Neves NM, Genser B, Carvalho LP, Stein RT, Cruz AA, Rodrigues LC, Barreto ML: Risk factors for asthma and allergy associated with urban migration: background and methodology of a cross-sectional study in Afro-Ecuadorian school children in Northeastern Ecuador (Esmeraldas-SCAALA Study).
BMC Pulmonary Medicine 2006, 6:2445.

R Development Core Team: R: a Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2004. [http://www.R-project.org/]

Andreozzi VL, Bailey TC, Nobre FF, Struchiner DJ, Barreto ML, Assis AMO, Santos LMP: Random-Effects Models in Investigating the Effect of Vitamin A in Childhood Diarrhea.
Annals of Epidemiology 2006, 16:241-247.

Zeger SL, Liang KY, Albert PS: Models for longitudinal data: A generalized estimating equation approach.
Biometrics 1988, 44:1049-1060.

Lindsey JK, Lambert P: On the appropriateness of marginal models for repeated measurements in clinical trials.
Statistics in Medicine 1998, 17:447-469.

Petersen MR, Deddens JA: A comparison of two methods for estimating prevalence ratios.
BMC Medical Research Methodology 2008, 8:918.