Abstract
This paper compares design-based and model-based methods for analyzing complex survey data. The analysis of survey data collected under a multi-stage sampling design should account for stratification, clustering, and unequal inclusion probabilities. We compared the Rao-Wu bootstrap and Taylor linearization (design-based approaches) with logistic regression based on generalized estimating equations (GEE), a model-based approach. Both sets of methods were applied to Wave 5 (2002–2003) of the National Population Health Survey (NPHS). The NPHS, which is based on an initial stratified multi-stage design, is an ongoing longitudinal study that collects general health information on the Canadian population. Logistic regression was used because the outcome of interest was binary, namely self-reported physician-diagnosed asthma. When the three features of the complex survey design were overlooked, the standard errors were underestimated. When all three features were accounted for, the design-based and model-based methods produced similar parameter estimates, but the design-based methods yielded larger standard errors than their model-based counterpart.
