Abstract
Obtaining reliable and timely estimates of migration flows is critical for advancing the migration theory and guiding policy decisions, but it remains a challenge. Digital data provide granular information on time and space, but do not draw from representative samples of the population, leading to biased estimates. We propose a method for combining digital data and official statistics by using the official statistics to model the spatial and temporal dependence structure of the biases of digital data. We use simulations to demonstrate the validity of the model, then empirically illustrate our approach by combining geo-located Twitter data with data from the American Community Survey (ACS) to estimate state-level out-migration probabilities in the United States. We show that our model, which combines unbiased and biased data, produces predictions that are more accurate than predictions based solely on unbiased data. Our approach demonstrates how digital data can be used to complement, rather than replace, official statistics.
Get full access to this article
View all access options for this article.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
