Abstract
Data integration is becoming a crucial task for National Statistical Institutes seeking to exploit the information in existing data sources. The focus here is on statistical matching methods, which are designed to integrate data from traditional sample surveys that refer to the same target population. In particular, this work shows how popular statistical learning techniques can benefit matching. Two proposals are presented, each with a different final aim: creating a “fused” data set, or assessing the uncertainty inherent in the typical statistical matching scenario. The characteristics of these procedures are investigated through a series of simulations and an application to real survey data. The results are encouraging: they show that some statistical learning techniques can be very effective in exploiting the information in existing survey data, reducing the uncertainty that arises in the typical statistical matching setting.
