Statistical microdata – confidentiality protection vs freedom of information

Abstract

The paper discusses how a statistical office can strike a satisfactory balance between confidentiality protection and freedom of information. Flexible use of statistical data is of vital interest to researchers and to the democratic process. On the other hand, the willingness of respondents to provide data depends on the ability of the statistical office to guarantee their anonymity.

The paper argues that a combination of measures of different kinds is needed: legal, administrative, methodological, and technical. As long as statistical data are collected at all and statistical results are published, the risk of inadvertently disclosing information about identifiable individuals (persons or enterprises) cannot be completely eliminated. On the other hand, the motivation to spend much effort on breaking through protection measures is usually low, especially if such efforts are regarded as criminal and can be punished. Moreover, there are often easier ways to find out sensitive information about individuals than by malicious processing of statistical data.

The paper presents two new ideas that are currently being launched and discussed in Sweden: (i) the idea of transforming commonly known identifiers (of persons and other objects) into pseudoidentifiers by means of a table or an algorithm that is known only to the statistical office; (ii) the idea of a statistical firewall, which filters the queries from users of statistical data as well as the statistical outputs resulting from these queries, thus monitoring the traffic between external users and internal databases containing sensitive statistical microdata. The paper discusses how these two ideas can be used in practice to increase legitimate use and improve confidentiality protection at the same time.
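
To make the two ideas more concrete, the following Python sketch shows one way they might look in code. It is an illustration under stated assumptions, not the method described in the paper: the abstract does not say which algorithm produces the pseudoidentifiers or which rules the firewall applies. Here a keyed hash (HMAC-SHA256) stands in for the secret algorithm, and a minimum cell size stands in for the output filter. The names `SECRET_KEY`, `MIN_CELL_SIZE`, `pseudoidentifier`, and `filtered_counts`, as well as the sample identifiers, are hypothetical.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret, assumed to be held only by the statistical office.
SECRET_KEY = b"example-key-known-only-to-the-statistical-office"

# Hypothetical disclosure rule: counts below this size are not released.
MIN_CELL_SIZE = 3


def pseudoidentifier(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Map a commonly known identifier to a pseudoidentifier.

    Assumption: a keyed hash plays the role of the secret algorithm.
    The same identifier always yields the same pseudoidentifier, so
    records can still be linked, but the mapping cannot be recomputed
    without the key.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def filtered_counts(records, group_by, min_cell_size=MIN_CELL_SIZE):
    """Answer a simple count query and filter the output.

    Assumption: the firewall's output filter is a minimum cell size rule.
    Cells with fewer than `min_cell_size` records are returned as None.
    """
    counts = Counter(record[group_by] for record in records)
    return {
        group: (n if n >= min_cell_size else None)
        for group, n in counts.items()
    }


# Made-up example records; the identifiers are not real.
records = [
    {"pid": pseudoidentifier("person-001"), "region": "North"},
    {"pid": pseudoidentifier("person-002"), "region": "North"},
    {"pid": pseudoidentifier("person-003"), "region": "North"},
    {"pid": pseudoidentifier("person-004"), "region": "South"},
]

result = filtered_counts(records, "region")
print(result)
```

In this sketch the single record in the "South" region is suppressed, while the count for "North" is released. A real statistical firewall as outlined in the abstract would also inspect the incoming queries, which this sketch does not attempt.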
