1. Foreword
This article reviews the forty-year journey of the Journal of Official Statistics as a leading international scientific and professional publication in the field of official statistics. Lars Lyberg, founding editor of the journal and a distinguished statistician, was at its helm for the first twenty-five years. I appreciate the opportunity to contribute in honor of the fortieth anniversary of JOS.
2. Publication Policy
The Journal of Official Statistics was established to offer more versatile content than other journals in the field of official statistics. As stated in the Editorial Statement of the inaugural issue of JOS from 1985: “Problems connected with various steps of preparation and use of official statistics are numerous and varied, and thus methodology and policy should be understood in a fairly wide sense.” (Editorial Statement 1985, 3–4). The journal was to address not only sound methods from many disciplines (statistics, economics, computer science, social sciences, etc.) but also policy matters faced by statistical offices. As Editor-in-Chief Lars Lyberg, a great visionary, and his team summarized: “Therefore the new Journal of Official Statistics will cover wider methodological areas than most other statistical journals.”
Other statistical offices and statistical societies have also published, or considered publishing, journals with a focus on government statistics similar to that of the Journal of Official Statistics. A forward-looking exchange on this topic between Lars Lyberg and Leslie Kish, then President of the International Association of Survey Statisticians, was featured in the Letters to the Editor section of the opening issue of JOS in 1985.
It turned out that there was room and international support for a journal with a broad scope. In the 1995 anniversary issue, Pat Dean and Lars Lyberg observed: “Do we see any strong trends in the types of articles JOS publishes? No, JOS manuscripts seem to be so varied in topic and field that they defy a uniform classification.” The wide range of topics and fields relevant for official statistics seems to have persisted in the journal.
Referring to developments in society challenging statistical agencies and organizations, Lars Lyberg wrote in 1993: “These new developments have touched a broad spectrum of operations and issues, including the areas of: dissemination and marketing of statistical products, new policies regarding funding of statistical programs, increased demands on confidentiality safe-guards and privacy protection, the advent of new technology, development of new statistical programs, harmonizing statistical programs in Europe, and increased interest in quality and productivity.” (Lyberg 1993).
The statistical landscape was also in flux: model-based and Bayesian approaches challenged the long-dominant design-based inference in official statistics. Improved access to auxiliary information, along with investment in data infrastructures and production systems, became increasingly important. Together, these developments shaped the publication policies and the topics of articles published in JOS.
The editorial board specified the publication policy of JOS in 2010 as follows: “We aim to maintain the journal’s strong focus on methodological and other questions in the production of official statistics, and on survey methodology in general. At the same time we aim to boost even further the role of JOS as a forum where high-quality research meets advanced application.” (Jansson and Lorenc 2010). Research on the integration of statistics production processes and on new data sources was recommended as an important topic for publication. In 2015, submissions of theoretical and applied articles on statistical production methods in the fields of official statistics, policy making, and economic and social sciences were encouraged.
3. Methodology
Methodologies discussed in JOS throughout its history relate to quality and survey error, survey design and standardization of survey operations, register data, privacy protection and disclosure control, and approaches for finite population inference and analysis. I refer to these areas as: (1) Quality, Survey Design, and Survey Error; (2) Privacy, Confidentiality, and Disclosure Control; and (3) Survey Analysis and Inference. These three broad areas organize the material into the main sections of this article.
The area of survey quality and methodology discusses sample and census survey design, multi-source data use, questionnaire design and testing, modes of data collection, interviewer training, and data editing. Among the specific types of survey error, attention has been given to methodologies for dealing with nonresponse and measurement, processing, and coverage errors, as well as their interactions. This area also encompasses quality assessment throughout the survey lifecycle.
The area of respondent privacy and disclosure control concentrates on statistical methods for data swapping, perturbation, cell suppression, and multiple imputation, as well as on the integration of data collection with the estimation phase.
The area of survey statistics covers the inferential and statistical theories and methods and computational techniques of the survey process. Topics include sampling methods, data integration, modeling, survey inference, model-based and design-based approaches, variance estimation, complex survey analysis, time series analysis, and statistical software. Over time, JOS articles have illustrated the interplay between these three areas, reflecting the evolving complexity and integration of survey practices.
Research articles in JOS have been published as peer-reviewed or invited contributions, with some appearing in special issues or thematic sections with discussion. Special issues may be linked to scientific conferences. The journal also features articles from the internationally recognized Morris Hansen Lecture series of the Washington Statistical Society. For the review, I have selected a collection of articles that aim to cover the temporal and subject-specific dimensions sufficiently well.
The Journal of Official Statistics serves as a successor to Statistisk Tidskrift (Statistical Review), a publication of Statistiska centralbyrån (Statistics Sweden) that was discontinued in 1984. The scholarly continuity between these journals is represented, for instance, by Tore Dalenius, who wrote on confidentiality and statistical disclosure control, nonresponse, total survey design, and total survey error. Another diligent contributor to both journals is Ib Thomsen, known for his work in survey methodology, nonresponse treatment, and the integration of administrative data in the production of official statistics (see the reference list).
4. Quality, Survey Design, and Survey Error
The period from 1960 to 1990 is often referred to as the “Era of Expansion” of survey methods (Groves 2011) and sometimes as the “Golden Era” of survey research. Surveys were scientifically designed, employing highly standardized methods that yielded high-quality data at reasonable costs and with low nonresponse rates (Chun et al. 2018). Foundational textbooks on survey sampling and methodology were published, and statistical software for data processing became available on mainframe computers. Journals addressing challenges in the production of official statistics also emerged, with the Journal of Official Statistics among those at the forefront.
4.1. Special Issue of 1987 on Nonsampling Errors
On the other side of the coin, nonsampling errors in surveys began to draw significant attention. How were nonsampling errors defined? According to the distinguished statistician B. A. Bailar (1987), nonsampling errors include all errors except those resulting from using a sample rather than the entire population. She highlighted the unsystematic way in which nonsampling errors were studied and noted that interactions between different types of errors were seldom considered.
The special issue also included an article by Thomas Jabine, who referred to the U.S. Census Bureau survey model (Hansen et al. 1959) as a framework for addressing the components of nonsampling error jointly. Referring to Dalenius (1974a), he noted: “When sampling errors are included in this process, we speak of total survey design, a concept first introduced by Dalenius (1974).” (Jabine 1987, 335).
Models for controlling nonsampling errors, such as Process Quality Control Systems, were discussed by Tortora (1987). Trewin (1987) considered so-called “irregular factors,” which refer to errors arising from socio-economic shocks and may persist even under ideal measurement conditions. Tanur (1987, 473) asked: “How can survey researchers and cognitive psychologists collaborate?” and proposed the establishment of cognitive testing laboratories as a response to measurement error. Further developments in questionnaire design were presented in Press and Tanur (2004).
The use of multiple imputation techniques to address item nonresponse in data from the 1970 and 1980 United States Decennial Censuses was described by Rubin and Schenker (1987) within the framework of Bayesian inference.
Dalenius (1985) introduced relevance as a key conceptual component of quality and, in the 1987 special issue, discussed how to measure it and combine it with sufficient accuracy to ensure statistical usefulness and reliability (Dalenius 1987). Brackstone (1993, 49) proposed practical mechanisms and prerequisites for maintaining relevance, drawing on experience at Statistics Canada.
The focus in the 1980s and 1990s was often on individual components of nonsampling error. Nonresponse and measurement error within a CATI framework were addressed by Groves and Nicholls (1986) and Nicholls and Groves (1986). Weighting and imputation techniques for handling attrition in panel surveys were examined by Kalton (1986), while the effects of data collection mode on measurement error were studied using misclassification models by Nathan and Eliav (1988).
Computational techniques for treating nonresponse in both the data collection and estimation phases were proposed by Bethlehem and Kersten (1985). The weighting method was further developed by Bethlehem and Keller (1987) and Bethlehem (1988). The analysis of complex surveys by incorporating sources of random variation defined by superpopulation, sampling, and response models was explored by Nordberg (1989). Ekholm and Laaksonen (1991) modeled response probabilities in a Household Budget Survey using logistic modeling with unit-level tax and education register data.
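To illustrate the flavor of such response probability modeling, the following minimal Python sketch (with simulated data and hypothetical variable names, not the authors' data or exact implementation) fits a logistic response propensity model on register-type covariates and forms nonresponse-adjusted weights:

```python
# Illustrative sketch only: estimate response propensities by logistic
# regression on register-type covariates and adjust the design weights.
# Data and variable names are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000
income = rng.lognormal(10, 0.5, n)         # register-based covariate
educ_years = rng.integers(9, 20, n)        # register-based covariate

# Simulated response mechanism: propensity rises with education
p_true = 1 / (1 + np.exp(-(-2.0 + 0.15 * educ_years)))
responded = rng.random(n) < p_true

X = np.column_stack([np.log(income), educ_years])
model = LogisticRegression().fit(X, responded)
p_hat = model.predict_proba(X)[:, 1]       # estimated propensities

d = np.full(n, 100.0)                      # design weights (equal here)
w = d[responded] / p_hat[responded]        # propensity-adjusted weights
print("Adjusted weight sum:", round(w.sum()), "vs. population:", int(d.sum()))
```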
4.2. Can a Statistician Deliver?
Competing approaches to statistical theory have shaped the discourse on survey quality. A notable intervention in this debate is Platek and Särndal (2001), which drew comments from seventeen discussants. The title of the article, “Can a Statistician Deliver?”, hints at its polemical stance. I highlight here some of the contested points.
Richard Platek and Carl-Erik Särndal examined survey quality from statistical, methodological, and contextual perspectives. Drawing on the definition by Elvers and Rosén (1997) in the Encyclopedia of Statistical Sciences, they conceptualized quality through five dimensions: (1) Contents, (2) Accuracy, (3) Timeliness, (4) Coherence, particularly Comparability, and (5) Availability and Clarity, each with multiple sub-dimensions. A quarter-century later, the European Statistical System Handbook outlined a related set of output principles, but the concept of relevance of Dalenius (1985) was used instead of the term contents.
Platek and Särndal presented complex questions without definitive answers. For instance, they asked, “Is Survey Theory and Practice in a Crisis?”, a question linked to their description of survey methodology as “a collection of practices, backed by some theory and empirical evaluation.” They also questioned the feasibility of creating a fully functional total survey error model, as reflected in their query, “Creating a Functional Total Survey Error Model: An Impossible Dream?” (Platek and Särndal 2001, 14, 16).
I have collected below a summary of the stimulating arguments presented in the conclusions section (Platek and Särndal 2001, 18, 19): Survey methodology is a set of practices lacking a unifying theory that covers the entire process; A complete theory is not yet in sight; Variance is overemphasized and too little is known about different types of bias; Total survey error modeling is an admirable concept for the systematic evaluation of total accuracy, a crucial dimension of quality, but statisticians have not delivered on its promise; Progress has largely occurred only in the study of specific error components, including several types of nonsampling errors.
Was there, around the table, a widely accepted theoretical foundation for survey theory and practice? Perspectives varied among the discussants, who came from academia, national statistical offices, survey agencies, and research institutes. Let me consider a selection of points. In their comment, Alain Desrosières, Jean-Claude Deville, and Olivier Sautory of INSEE, France, referred to the total survey modeling approach and noted that “timeliness seems to be measurable, but not availability. On the contrary, accuracy (which is the major topic discussed by the authors), contents (or better still, relevance) and coherence/comparability are very related and interwoven. Among those criteria, only accuracy seems to be measurable, essentially with the use of probabilistic tools” (Desrosières et al. 2001, 33).
Additional points were made by Paul Biemer, who emphasized the slow progress of statisticians in developing the Total Survey Error (TSE) model initiated by Morris Hansen and his colleagues in the 1960s. He concluded by calling for a revised total error concept for surveys (Biemer 2001, 25, 31). Groves and Mathiowetz (2001, 53) highlighted the potential of behavioral and cognitive theories in understanding response error but noted that these theories have not yet been translated into models of statistical error. David Holt offered explanations for the shortcomings of the TSE model while also recognizing its achievements. He also considered it important to pay greater attention to the quality of administrative sources (D. Holt 2001, 59).
Barbara Bailar addressed the conclusion of the authors that “Total survey error does not yet provide a systematic evaluation of total accuracy” by asserting: “However, there has been admirable and substantial progress at looking at many of the specific error components,” and, “We know a lot more than we did 50 or even 20 years ago. The glass is filling steadily and is more than half full.” (B. Bailar 2001, 24).
Was the glass more than half full? Many fundamental issues in nonsampling error treatment persist, one of which is the joint handling of nonsampling error components. Deming (1944) identified thirteen separate components, including mode effect, interviewer effect, nonresponse, and processing errors. Dalenius (1962) developed theory and methods for specific sources of nonsampling errors and a theory of nonsampling errors, or of mixed models, as it was termed at the time. In his 1969 article, “Towards a Survey Measurement System,” Dalenius called for a measurement system capable of integrating errors from the various stages of the survey process. A more comprehensive survey design was proposed in Dalenius (1974a), published by Stockholm University, and in the JOS article of Dalenius (1987). Forsman (1989) examined early survey models and their use in survey quality work.
More recently, the 2017 JOS special issue on Total Survey Error (Eckman and de Leeuw 2017) emphasized a temporal shift from the separate investigation of error sources to the simultaneous analysis of two or more sources within the TSE model. The Wiley book on TSE in practice, edited by Biemer et al. (2017), provides a comprehensive source in the field.
Returning to the fascinating metaphor of Barbara Bailar, one optimistic perspective is that the glass itself is now larger than in the past, but the filling level has remained unchanged.
4.3. The Morris Hansen Lecture of Paul Biemer
Paul Biemer published his Morris Hansen lecture in JOS titled “The Twelfth Morris Hansen Lecture. Simple Response Variance: Then and Now” (Biemer 2004). Robert Groves and Keith Rust served as discussants. The lecture and the subsequent commentaries covered several important topics, a selection of which is presented here.
Biemer begins by revisiting the measurement error model proposed by Hansen et al. (1964), focusing on the concept of simple response variance in measurement error modeling. He discusses how the model extends the classical theory of finite population sampling to incorporate both randomization theory and nonsampling error theory. Biemer (2004, 417) compares this classical approach with more modern methods, such as latent class analysis, used to estimate error parameters, and provides numerous illustrations and practical examples.
Groves (2004, 442) emphasized the challenges of simple response variance modeling and highlighted the limited application of such models in survey inference. Rust (2004, 445) underscored Biemer’s caution against oversimplifying response error by relying too heavily on the Index of Inconsistency of Hansen, Hurwitz, and Pritzker. Rust also noted the risk of focusing excessively on response error as a source of variance, potentially overlooking its role as a source of bias.
Of historical interest are the Sankhyā paper by Bailar and Dalenius (1969), which examined correlated response variance components and proposed designs for estimating such components in practice, and Thomsen (1973), who in Statistisk Tidskrift discussed post-stratification for nonresponse bias adjustment in surveys.
In his subsequent work, Biemer et al. (2014) developed a general framework for continuous improvement of key statistical products, with empirical results presented for Statistics Sweden. His publication history in JOS spans from his first article in 1989 to his most recent in 2018.
Regarding nonresponse bias, Pfeffermann and Sikov (2011) proposed a method for imputation and estimation under nonignorable nonresponse in household surveys with missing covariate data. The method was applied to real data from the 2005 Household Expenditure Survey conducted by the Israel Central Bureau of Statistics.
Lundström and Särndal (1999) explored the use of calibration to reduce nonresponse bias and sampling error in surveys where reliable auxiliary information is available. Särndal and Lundström (2008) later proposed a simple and unified approach to incorporating auxiliary data into calibration estimation to reduce both sampling error and nonresponse bias. Applications involved data from a Swedish National Crime Victim and Security Study of 2006. The approach was extended by Montanari and Ranalli (2012) through the use of implicit semiparametric regression.
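As a reference point for the calibration idea, the standard chi-square distance formulation (consistent with Deville and Särndal 1992 and with the nonresponse setting above) seeks weights for the respondent set $r$ that stay close to the design weights $d_k$ while reproducing known auxiliary totals:

$$\min_{w} \sum_{k \in r} \frac{(w_k - d_k)^2}{d_k} \quad \text{subject to} \quad \sum_{k \in r} w_k \mathbf{x}_k = \mathbf{X},$$

with solution $w_k = d_k(1 + \boldsymbol{\lambda}^{\mathsf{T}} \mathbf{x}_k)$, where $\boldsymbol{\lambda}$ is determined by the constraints; the estimator of the total is then $\hat{Y}_W = \sum_{k \in r} w_k y_k$. Nonresponse bias is reduced to the extent that the auxiliary vector explains both the response propensity and the study variable.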
The Wiley book by Särndal and Lundström (2005) on estimation in surveys with nonresponse presents a comprehensive source in this field.
4.4. The Morris Hansen Lecture of Carl-Erik Särndal
A methodological innovation for the combined treatment of nonresponse bias in data collection and in estimation was presented by Carl-Erik Särndal in his Morris Hansen Lecture “Dealing with Survey Nonresponse in Data Collection, in Estimation,” published in JOS in 2011. Särndal introduced the concept of a balanced response set as an extension of the responsive design approach of Groves and Heeringa (2006) presented in the Journal of the Royal Statistical Society. Michael Brick and Roger Tourangeau were the discussants.
According to Särndal (2011, 1), nonresponse adjustment was aided by a bias indicator, a product of three factors involving selected powerful auxiliary variables. He pointed out that a well-balanced response set does not eliminate the need to seek effective adjustment at the estimation stage, and that efficient adjustment requires good auxiliary information. As countries that have had this option, Särndal (2011, 5) pointed to the Scandinavian countries and the Netherlands.
The method of Särndal required interconnected technical steps, which were developed in detail, and their statistical properties were studied theoretically. A stepwise procedure was presented for constructing the multivariate auxiliary vector for nonresponse treatment of all study variables in the survey and was illustrated using empirical data from Statistics Sweden.
The simultaneous examination of sample design, data collection, and estimation represents a significant extension of the traditional way of treating the three survey phases separately. One of the cornerstones of Särndal’s approach is the interaction between data collection from the sample elements and calibration estimation for the respondent set during the survey process, given a probability sample and a fixed set of auxiliary variables.
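To make the notion of response-set balance concrete, the following Python sketch computes a simple distance between respondent and full-sample weighted auxiliary means. This is an illustrative simplification, not the bias indicator or balance measure defined in Särndal (2011):

```python
# Simplified balance diagnostic (illustration only, not the exact
# indicator of Särndal 2011): Mahalanobis-type distance between the
# respondent and full-sample weighted means of the auxiliary vector.
import numpy as np

def imbalance(X, d, responded):
    """X: n x p auxiliary matrix; d: design weights; responded: bool mask."""
    w_s = d / d.sum()
    w_r = (d * responded) / (d * responded).sum()
    mean_s = X.T @ w_s                     # full-sample weighted mean
    mean_r = X.T @ w_r                     # respondent weighted mean
    cov = np.cov(X.T, aweights=d)          # weighted auxiliary covariance
    diff = mean_r - mean_s
    return float(diff @ np.linalg.solve(cov, diff))

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
d = np.ones(1000)
responded = rng.random(1000) < 1 / (1 + np.exp(-X[:, 0]))  # selective response
print("Imbalance:", imbalance(X, d, responded))  # zero under perfect balance
```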
In their extensive comments, Brick (2011) and Tourangeau (2011) highlighted the theoretical significance of the approach of Särndal to survey theory and methodology. In addition, the need for robust auxiliary data for the entire procedure was emphasized as one key aspect of the method.
In further work on this topic, Särndal and Lundquist (2017) investigated the relationship between nonresponse bias and regression inconsistency and demonstrated how both are influenced by response imbalance discussed in Särndal (2011). Their work provided valuable insights into how deviations from consistent regression models can contribute to bias in survey estimates, particularly when response patterns are uneven across different segments of the population.
Responsive and adaptive designs to optimize costs and survey quality were developed in Chun et al. (2018). The approach had four pillars: survey process data and auxiliary data, design features and interventions, explicit quality and cost metrics, and quality-cost optimization (Chun et al. 2018, 583). At its core are statistically well-founded methods for addressing multiple quality aspects simultaneously, rather than studying them in isolation. This view has been elaborated in the literature since Tore Dalenius’s JOS article of 1985.
4.5. On Combined Data Sources in Official Statistics
Population registers and other administrative sources have long served as sampling frames and sources of auxiliary data in the production of official statistics. Continuously updated register-based data infrastructures have been constructed in many countries. Administrative registers provide important alternatives to directly collected census and sample surveys and have been discussed over the years in the Journal of Official Statistics. The Wiley book by Wallgren and Wallgren (2007) provided a comprehensive source on register-based statistics in official statistical production.
Central Population Registers were established in the Nordic countries in the 1960s, and register-based population censuses were introduced in Statistics Denmark in 1981 and Statistics Finland in 1990. Redfern (1986) discussed the early developments in his article titled “Which Countries Will Follow the Scandinavian Lead in Taking a Register Based Census of Population?” Experiences of Statistics Finland in building a data infrastructure based on administrative registers were presented by P. Myrskylä (1991). Statistical registers are updated regularly and can be integrated with various public administration register sources using unique identification keys.
Case studies include Houbiers (2004), who presented methods of Statistics Netherlands for building a social statistics database in which several registers are linked to each other via a unique key, as well as to data from sample surveys (Houbiers 2004, 55). Thomsen and Villund (2011) presented an approach where sample data from the Norwegian Labour Force Survey are merged with register-based employment data over a time series from 1997 to 2008 to study the impact of proxy interviews on the quality of employment rate estimates. It was shown in earlier work that, because of register delays, the register data did not affect the accuracy of change estimates (Thomsen and Zhang 2001). Djerf (1997) presented a framework where records from the Register of Job Seekers and Register-Based Employment Statistics are uniquely merged with records from the Finnish Labor Force Survey to reduce nonresponse bias and sampling error using calibration and generalized raking estimation.
Discussions of coverage errors in censuses and administrative registers include Citro and Pratt (1986), who considered the recommendations of the Panel on Decennial Census Methodology for the U.S. bicentennial census in 1990. The main topics were the adjustment of census counts for coverage errors, methods of coverage evaluation, the use of sampling in obtaining the count, and the use of administrative records to improve the quality of selected content items (Citro and Pratt 1986, 359). Further, potential uses of administrative records were studied by Griffin (2014) for triple-system modeling to estimate census coverage error in the U.S. 2020 census post-enumeration survey. Righi et al. (2021) described a population coverage survey for the new Italian register-based census of 2018.
A 2015 special issue focused on coverage problems in administrative registers. In the preface by Bakker et al. (2015), it was pointed out that the assumptions underlying the dual-system capture-recapture methodology for correcting undercoverage in censuses have received considerable attention. A Bayesian approach to deriving estimates from multiple administrative sources was described by Bryant and Graham (2015) and applied to the estimation of regional populations in New Zealand. In his article “On Modelling Register Coverage Errors,” Zhang (2015) discussed possible approaches to modeling capture-recapture data with additional overcoverage error using various types of log-linear models.
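For orientation, the dual-system point estimator at the heart of this methodology is the classical Lincoln–Petersen form

$$\hat{N} = \frac{n_1 n_2}{m},$$

where $n_1$ is the number of persons on the first list (for example, the census), $n_2$ the number on the second list (the coverage survey), and $m$ the number matched to both. Its validity rests on assumptions such as independence of the two lists and homogeneous capture probabilities, which is precisely what the modeling work cited here scrutinizes.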
In further work, a robust method for capture-recapture was presented in Chipperfield et al. (2024) to adjust for census coverage errors with administrative data. They studied the method empirically in the context of the Australian population. A Bayesian approach was proposed by Ballerini et al. (2025) to correct coverage errors in register-based statistics by combining administrative and survey data. Their reference was the setting of the Italian Permanent Census.
5. Privacy, Confidentiality, and Disclosure Control
Confidentiality protection and disclosure control are key aspects in official statistics and have been discussed in the Journal of Official Statistics as well as its predecessor Statistisk Tidskrift/Statistical Review, where Tore Dalenius published two articles: “The Invasion of Privacy Problems and Statistics Production – An Overview” (Dalenius 1974b), and the widely cited “Towards a Methodology for Statistical Disclosure Control” (Dalenius 1977). He proposed a definition and theoretical framework for statistical disclosure control and warned against inferential disclosure.
A new data swapping method for statistical disclosure protection in confidential databases was published by Dalenius and Reiss (1982) in the Journal of Statistical Planning and Inference. In their JOS article of 2005, Fienberg and McIntyre (2005) discussed the innovative idea of data swapping and its role in recent approaches of statistical disclosure limitation and the release of statistically usable databases. According to Fienberg and McIntyre (2005, 310), “Dalenius and Reiss were the first to cast disclosure limitation firmly as a statistical problem.”
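As a toy illustration of the swapping idea (in the spirit of, but not reproducing, the Dalenius–Reiss method), the following Python sketch swaps a sensitive variable between records that are adjacent in rank, preserving univariate statistics exactly while perturbing record-level linkage:

```python
# Toy illustration of data swapping: exchange a sensitive value between
# records adjacent in rank. Univariate statistics are preserved exactly
# because the multiset of values is unchanged.
import numpy as np

rng = np.random.default_rng(42)
income = rng.lognormal(10, 0.6, 10)        # sensitive variable
order = np.argsort(income)                 # rank the records

swapped = income.copy()
for i in range(0, len(order) - 1, 2):      # swap adjacent ranks pairwise
    a, b = order[i], order[i + 1]
    swapped[a], swapped[b] = swapped[b], swapped[a]

print("Mean preserved:", np.isclose(income.mean(), swapped.mean()))
```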
In a JOS article titled “Finding a Needle in a Haystack or Identifying Anonymous Census Records,” Dalenius (1986) examined two variants of a simple method for identifying unique records, both of which involved sorting the records. He introduced a technical framework for handling records containing unique data and proposed data perturbation and data encryption as alternatives for this purpose. Dalenius (1995) revisited data confidentiality in longitudinal surveys in “Controlling Invasion of Privacy in Surveys of Change Over Time – A Non-Technical Review.”
5.1. Special Issues of 1993 and 1998 on Disclosure Protection
The literature on confidentiality protection and disclosure control expanded from the 1990s onward, highlighting the increasing needs and diversity of approaches. Significant advancements in statistical disclosure limitation methods and the analysis of manipulated data were discussed in the special issues of 1993 and 1998.
In his commentary in the 1993 special issue on disclosure protection, Dalenius (1993) expressed concern about the rising nonresponse rates in surveys and the increasing difficulties social researchers faced in gaining access to data collected by official statistical agencies. He proposed an approach that would enable social researchers to access the data they require.
Fuller (1993) proposed masking methods in which error was added to the units of a data set prior to its release, and examined the costs incurred by data providers and users of the masked data. Little (1993) presented a model-based likelihood theory and methodology for analyzing data masked for confidentiality purposes. Rubin (1993) suggested an approach to data masking in which only synthetic microdata, generated through multiple imputation, was released, allowing the use of standard statistical software for analysis.
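A heavily simplified Python sketch of the synthetic-data idea follows: a model is fitted on the confidential data, and draws from its predictive distribution are released instead of the real values. Proper practice, as in Rubin's proposal, generates m > 1 synthetic copies via multiple imputation and applies the corresponding combining rules:

```python
# Heavily simplified sketch of fully synthetic data release: replace the
# confidential outcome by draws from a fitted predictive model. Real
# implementations release m > 1 synthetic data sets with combining rules.
import numpy as np

rng = np.random.default_rng(7)
n = 2000
x = rng.normal(size=n)                     # non-sensitive covariate
y = 2.0 + 1.5 * x + rng.normal(0, 1, n)    # confidential outcome

coef = np.polyfit(x, y, 1)                 # [slope, intercept]
resid_sd = np.std(y - np.polyval(coef, x), ddof=2)

# Release synthetic outcomes drawn from the predictive distribution
y_synth = np.polyval(coef, x) + rng.normal(0, resid_sd, n)
print("Real slope: %.3f, synthetic slope: %.3f"
      % (coef[0], np.polyfit(x, y_synth, 1)[0]))
```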
An overview of the policies and practices of National Statistical Offices regarding the release of microdata to external users was presented in Citteur and Willenborg (1993). They noted that there was considerable variation among statistical offices in their approaches to releasing microdata for public use (Citteur and Willenborg 1993, 784). Sundgren (1993) emphasized computer security as a crucial component of data confidentiality within the data storage and processing systems. Beyond the practical measures related to data security, confidentiality, and access arrangements, he also addressed many theoretical and technical issues.
After an extensive introduction by Fienberg and Willenborg (1998), the special issue of 1998 focused on the progress in the field of statistical disclosure limitation. Fienberg et al. (1998) presented new methods for categorical data, including cell suppression and data switching in the context of log-linear modeling and simulation of exact distributions. The use of simulated data in the context of the perturbation approach was described in their earlier work. A conceptual framework for measuring re-identification risk per record was introduced in Skinner and Holmes (1998) and applied to data from the 1991 Census in Great Britain.
The case studies included articles on automated cell suppression for economic statistics, applied to the Manufacturing Energy Consumption Survey of the U.S. Energy Information Administration (Kirkendall and Sande 1998), experiments with controlled rounding for statistical disclosure limitation in tabular data (Fischetti and Salazar-González 1998), and microdata masking using micro-aggregation, applied to data from a survey on technological innovation in Europe (Defays and Anwar 1998).
For statistical disclosure control of microdata, De Waal and Willenborg (1997) discussed global suppression and optimal local suppression techniques, both applied to data from the Dutch Labour Force Survey. A study on post-randomization for statistical disclosure control was presented in Gouweleeuw et al. (1998), covering both the theory and implementation of the method.
5.2. Further Developments of Disclosure Protection Methods
Methodologies involving synthetic data in statistical disclosure limitation include Raghunathan et al. (2003), where multiple imputation was evaluated as a method to protect confidentiality, and Kinney and Reiter (2010), where tests of multivariate hypotheses were examined when multiple imputation was used both for handling missing data and for disclosure limitation. Reiter and Kinney (2012) showed mathematically that, in a partially synthetic design, point and variance estimates can be approximately unbiased. Advice by Loong and Rubin (2017) to the imputer in the context of multiply-imputed synthetic data was that the imputation method must condition on covariates for accurate inferences (Loong and Rubin 2017, 1013–4).
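For orientation, inference from $m$ partially synthetic data sets typically uses combining rules of the following form (due to Reiter; this is the setting analyzed by Reiter and Kinney 2012): with point and variance estimates $q^{(i)}$ and $v^{(i)}$ from synthetic data set $i$,

$$\bar{q}_m = \frac{1}{m} \sum_{i=1}^{m} q^{(i)}, \qquad T_p = \bar{v}_m + \frac{b_m}{m},$$

where $\bar{v}_m$ is the average within-set variance and $b_m$ the between-set variance of the $q^{(i)}$.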
Disclosure problems in web-based technologies that allow user-defined tabular outputs from census files of statistical agencies were discussed by Shlomo et al. (2015). They identified questions regarding which data should be used to generate the tables and which disclosure control methods should be applied, and examined new strategies for dealing with these issues in disseminating statistical information.
Goldstein and Shlomo (2020) introduced an approach related to the statistical integration of the data collection process with the estimation phase. A probabilistic framework was proposed for integrating the anonymization process with data analysis, where two stages are included (Goldstein and Shlomo 2020, 89). First, random noise with known distributional properties is added to some or all variables in the data to be released. Then, for the analysis of the (pseudonymized) data, the model of interest is specified so that parameter estimation accounts for the added noise. A Bayesian MCMC algorithm was considered for consistent estimation.
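A minimal frequentist analogue of this two-stage idea can be sketched in a few lines of Python (the article itself develops a Bayesian MCMC formulation): Gaussian noise with known variance is added to a covariate before release, and the analyst corrects the attenuation the noise induces in a regression slope:

```python
# Minimal frequentist analogue of the noise-then-adjust idea (the paper
# uses a Bayesian MCMC formulation): noise with KNOWN variance is added
# before release; the analyst corrects the induced slope attenuation.
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x = rng.normal(0, 1, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1, n)

sigma2_noise = 0.5                          # published with the data
x_released = x + rng.normal(0, np.sqrt(sigma2_noise), n)

cov_xy = np.cov(x_released, y)[0, 1]
naive = cov_xy / np.var(x_released)
adjusted = cov_xy / (np.var(x_released) - sigma2_noise)  # known-noise fix
print("naive: %.3f  adjusted: %.3f  (true slope 2.0)" % (naive, adjusted))
```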
Over its history, JOS has been active in promoting discussion on how to maintain privacy and confidentiality while increasing data accessibility for users. It is interesting to note that many of the original ideas published in JOS originate from Tore Dalenius.
6. Survey Analysis and Inference
Foundational debate on finite population inference emerged in the 1970s, when the long-dominant randomization-based inference with probability sampling was questioned, a development often traced back to the intervention of Royall (1970) in his Biometrika article “On Finite Population Sampling Theory Under Certain Linear Regression Models.” As Ken Brewer put it in Survey Methodology, the reinstatement of purposive sampling and prediction-based inference came as a shock to the finite population sampling establishment (Brewer 2013, 256).
6.1. How Survey Methodologists Communicate
The Scandinavian Journal of Statistics published in 1978 an article by Carl-Erik Särndal titled “Design-based and Model-based Inference in Survey Sampling,” with discussion by statisticians representing different schools of thought: Ib Thomsen (Statistics Norway), Jan Hoem (University of Copenhagen), D. V. Lindley (University College London), Ole Barndorff-Nielsen (Aarhus University), and Tore Dalenius (Brown University).
Two fundamental questions were posed by Särndal (1978, 27): (1) How do some of the established results in the classical design-based theory fit into model-based thinking? and (2) Could classical design-based theory be replaced by a model-based theory, and would anything be gained thereby? The article reflected the ongoing debate about the role of statistical models in official statistical production.
In the JOS article “How Research Methodologists Communicate,” Särndal (1985) continued the topic with a detailed and critical consideration of some key terms and their obscure replacements found in the statistical literature. For example, he argued that “design-based estimator” and “model-based estimator” are particularly unfortunate and ambiguous terms, and that writers additionally use related terms such as “model-dependent” (approach, estimator, etc.), “model-free” (approach, estimator, etc.), and even “design-free”.
Särndal also suggested interpretations in cases where meaning is not automatically clear (Särndal 1985, 49). In addition to conceptual analysis, Särndal reviewed the theoretical and statistical specifications and underlying assumptions of design-based and model-based inference. He introduced the general form of a (multiple) regression estimator in the design-based approach and its sample-based counterpart, which eventually became known as the generalized regression estimator in model-assisted estimation. Model-assisted estimation was formally presented in the Springer book Model Assisted Survey Sampling by Särndal et al. (1992).
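For reference, the generalized regression (GREG) estimator of a population total takes the standard textbook form (consistent with Särndal et al. 1992):

$$\hat{Y}_{\mathrm{GREG}} = \sum_{k \in s} d_k y_k + \Big(\sum_{k \in U} \mathbf{x}_k - \sum_{k \in s} d_k \mathbf{x}_k\Big)^{\mathsf{T}} \hat{\boldsymbol{\beta}},$$

where $s$ is the sample, $U$ the population, $d_k = 1/\pi_k$ the design weights, $\mathbf{x}_k$ an auxiliary vector with known population totals, and $\hat{\boldsymbol{\beta}}$ the sample-based weighted least squares coefficient vector of the assisting linear model.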
In 2005, on the anniversary of the Journal of Official Statistics, an interview conducted by Phillip Kott with Carl-Erik Särndal, Bengt Swensson, and Jan Wretman, the authors of the 1992 Springer book, was published, with a foreword by Lars Lyberg (Kott 2005). During the discussion, Bengt Swensson highlighted the success of the book in connecting survey sampling with mainstream statistics, and Carl-Erik Särndal expressed satisfaction that the book had proved to be a “lasting contribution” and a standard text. Jan Wretman contributed several points, one of which was that (generalized) regression estimation had emerged as an important unifying concept.
Further consideration of the basics of survey inference in JOS includes an article by J. N. K. Rao, who developed a general framework for survey inference for the estimation of totals and distribution functions using auxiliary information at the estimation stage (Rao 1994). He examined calibration estimation under a general class of estimators and conditional probability sampling for conditionally valid repeated sampling inferences. He showed that a certain special case of the general class of estimators reduces to the classical calibration estimator of Deville and Särndal, published in 1992, which is identical to the generalized regression estimator of Särndal (1980). The methods were illustrated for stratified simple random sampling and stratified multistage sampling.
The confidence interval coverage properties of regression estimators were examined both theoretically and with simulation experiments in Rao et al. (2003). For two-phase sampling, reasons for the poor performance of design-based normal theory intervals were identified, even with moderately large second-phase samples when the underlying model is severely misspecified. The authors proposed practical solutions to improve the coverage probability.
6.2. Small Area and Domain Estimation
Estimation of descriptive statistics such as totals, means, and proportions (or more complex parameters) for population subgroups or domains within a finite population is referred to as small area estimation. The Wiley book by J. N. K. Rao (2003) on small area estimation provides a comprehensive source on the topic. Estimation for small areas is generally based on model-based approaches, which “borrow strength” from other areas to improve estimation for each small area. The design-based approach can be a viable option for domains with large domain sample sizes. A critical requirement for success is the availability of strong auxiliary data for the estimation process, along with careful model building, particularly in the model-based approach.
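To make the idea of borrowing strength concrete, consider the area-level Fay–Herriot model, a standard formulation that also underlies work discussed later in this section:

$$\hat{\theta}_i = \theta_i + e_i, \qquad \theta_i = \mathbf{x}_i^{\mathsf{T}}\boldsymbol{\beta} + v_i, \qquad e_i \sim N(0, \psi_i), \quad v_i \sim N(0, \sigma_v^2),$$

where $\hat{\theta}_i$ is the direct estimate for area $i$ with sampling variance $\psi_i$ treated as known. The empirical best linear unbiased predictor,

$$\tilde{\theta}_i = \hat{\gamma}_i \hat{\theta}_i + (1 - \hat{\gamma}_i)\,\mathbf{x}_i^{\mathsf{T}}\hat{\boldsymbol{\beta}}, \qquad \hat{\gamma}_i = \frac{\hat{\sigma}_v^2}{\hat{\sigma}_v^2 + \psi_i},$$

shrinks the direct estimate toward a regression synthetic component, with more shrinkage (more borrowed strength) for areas with larger sampling variance.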
Small area and domain estimation has been discussed in JOS since the 1980s, beginning with the article by Kish and Verma (1986), which focused on producing post-census estimates for small domains by combining data from censuses, registers, and rolling samples. Isaki et al. (1988) compared small area estimators of census coverage using artificial populations and found that synthetic estimation, when combined with regression modeling, performed best among the methods considered. Both studies refer to the U.S. context.
The growing demand for official statistics from small areas became apparent over time. Marker (1999) conducted a literature review of existing small area estimators with the goal of providing a better understanding of their properties and applicability in various practical situations. He organized the estimators from a general linear regression perspective and presented a schematic overview of the interrelations among the estimation techniques considered, along with statistical explanations.
The use of conditional arguments for the estimation of domain totals was extended by Falorsi and Russo (1999) from simple random sampling to the classical multistage household survey design of Hansen and Hurwitz (1943), as implemented by the Italian Statistical Institute and elsewhere. Among the expansion, ratio, synthetic, and composite estimators compared empirically, the ratio estimator yielded the lowest values of the conditional Mean Squared Error (MSE; Falorsi and Russo 1999, 550).
A weighting method for small area estimation applicable to the relevant study variables in a multivariate survey was developed by Chandra and Chambers (2009). Their model-based direct small area estimation approach was an update of Chandra and Chambers (2005). The underlying idea relates to the design-based calibration approach of Deville and Särndal (1992; Chandra and Chambers 2009, 379).
Spatial models in small area estimation were developed in Pratesi and Salvati (2009), who included a common autocorrelation parameter among small areas in the Fay–Herriot model and derived the corresponding MSE. Their empirical experiments showed that the spatial empirical best linear unbiased predictor outperformed the basic EBLUP estimator in terms of efficiency and relative bias. Among recent contributions to small area estimation are also Sakshaug et al. (2019), who discuss a method in which small probability samples are supplemented with nonprobability samples using a Bayesian approach, and Parker (2024), who considers nonlinear Fay–Herriot models for small area estimation using random weight neural networks.
In a JOS article of 2004, Victor Estevao and Carl-Erik Särndal examined the potentials of borrowing strength in design-based estimation for population subgroups or domains. In response to the question “What can borrowing strength do for design-based domain estimation?” their conclusion was that “borrowing strength is unfruitful in the design-based tradition” (Estevao and Särndal 2004, 645). They showed that, for a fixed set of auxiliary information, the minimum asymptotic design-based variance is achieved with a direct estimator derived through calibration rather than regression fitting, where strength is borrowed from outside a domain via a fitted model.
The rationale for their result was that any domain of interest can always be so specific that its own y-values bear no resemblance to y-values from outside the domain, making direct estimation the preferable approach (Estevao and Särndal 2004, 668). In simulation experiments, their theory performed well when the expected domain sample size was around 150 units (or more). For small domain sample sizes, borrowing strength with models can offer a viable option, a scenario Särndal addressed elsewhere using model-assisted methods and model calibration.
Estevao and Särndal (2004) is the most recent JOS publication of Carl-Erik Särndal to date on “pure” design-based inference theory (nonsampling errors excluded). His publication record in JOS began in 1985. Other work by Särndal in JOS includes such topics as a functional form approach to calibration (Estevao and Särndal 2000), the use of auxiliary information for calibration in two-phase sampling (Estevao and Särndal 2002), and methodological principles for a generalized estimation system at Statistics Canada (Estevao et al. 1995), where the classical generalized regression (GREG) estimator and known auxiliary variable totals play a central role. His latest article to date in JOS is Särndal and Lundquist (2017).
6.3. Morris Hansen Lecture by Graham Kalton
The tenth Morris Hansen Lecture, given by Graham Kalton, was titled “Models in the Practice of Survey Sampling (Revisited).” He considered design-based methods appropriate for large-scale sample surveys but also emphasized the role of model-based inference for small area estimation and missing data problems. Kalton (2002, 129) considered issues of model-assisted inference, conditional inference, the effect of measurement errors, and analytic uses of survey data.
Remarks by Chris Skinner were broadly in line with Kalton’s views on descriptive inference with design-based methods. For analytic surveys, Skinner (2002, 156) took model-dependent inference as the preferred approach, complemented by weighting with auxiliary information to account for nonignorable sampling. William Bell offered “An Outsider’s Perspective,” observing that the statistical field in general has largely relied on a model-based approach, except in survey statistics, and noting that the differences between the respective approaches to inference hinder communication between design-based survey samplers and other statisticians in both directions (Bell 2002, 159).
6.4. Special Issue on Calibrated Bayes
Roderick Little has contributed to JOS since 1991, when he reviewed the analysis of disproportionate stratified samples from a Bayesian perspective and emphasized the importance of explicitly accounting for differences between strata (Little 1991). Elliott and Little (2000) compared weight trimming with random-effects models that shrink weights across strata. Extensions included a compound weight pooling model using Bayesian averaging over different trimming points, and a weight smoothing model with nonparametric splines. Simulation experiments indicated the superiority of weight smoothing compared to the alternatives (Elliott and Little 2000, 191). The most recent publication of Roderick Little in JOS considers diagnostics for selection bias (Boonstra et al. 2021).
In a 2012 special issue, Little introduced the Calibrated Bayes approach, where inference is Bayesian, but models are selected to produce inference with good design-based properties (Little 2012, 309). Calibrated Bayes was proposed as an alternative inferential paradigm for official statistics, also referred to by Roderick Little as “A Bayes/Frequentist Roadmap.” Jean-François Beaumont, Alan Dorfman and Paul Smith were the discussants.
The prevailing approach to statistical inference at the U.S. Census Bureau was a combination of design-based and model-based ideas, which Little (2012, 309) termed the “design/model compromise.” Design-based inference was devoted to descriptive statistics like means and totals in large samples, while models were used for small area estimation, survey nonresponse treatment, and time series analysis. According to Little, Calibrated Bayes would help to avoid “inferential schizophrenia” between these approaches. The Calibrated Bayes approach was illustrated with applications to data from the U.S. Census Bureau.
Beaumont (2012, 335) summarized the main idea of Calibrated Bayes for official statistics as making Bayesian (model-based) inferences that have good design properties, implemented in practice by incorporating design information into the model and by using weak prior distributions. Dorfman (2012, 349) interpreted the term calibration of Little to refer to a model that automatically reflects the (probabilistic) sample design and can be viewed, in large samples, as a particular version of model-assisted sample estimation. Smith (2012, 360) argued that for a paradigm shift of the magnitude proposed by Roderick Little, more information would be needed on model-based methods for business surveys.
In his response, Little (2012, 370) noted, among other things, that Bayesian inference for a model that includes sampling weight as a covariate (as he suggested), perhaps as a penalized spline as in Zheng and Little (2003), might yield better calibrated results than models that do not include weights as covariates.
6.5. Analysis of Complex Surveys
In analytical surveys with complex sampling, methods that account for the survey design are needed for valid results. A review of replication and linearization methods for variance estimation of nonlinear statistics, including Taylor series linearization, random groups, balanced repeated replication (BRR), jackknife, and bootstrap, was presented by Rust (1985). Andersson et al. (1987) used Monte Carlo methods for variance estimation with replication techniques for a consumer price index in a survey based on stratified PPS sampling. Valliant (1990) compared stratified ratio and regression estimators and associated variance estimators theoretically and empirically. Kott (2001) applied his delete-a-group jackknife estimator to a list-based survey of the National Agricultural Statistics Service of the U.S. Department of Agriculture.
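To fix ideas, the following Python sketch implements a delete-a-group jackknife for a weighted ratio estimator (an illustration of the replication principle; the details of Kott's estimator for list-based surveys are more involved):

```python
# Delete-a-group jackknife sketch for the weighted ratio sum(w*y)/sum(w*x):
# drop one random group at a time, re-weight, and combine the replicates.
import numpy as np

def dagjk_variance(y, x, w, groups, G):
    theta = np.sum(w * y) / np.sum(w * x)
    reps = np.empty(G)
    for g in range(G):
        keep = groups != g
        w_g = w[keep] * G / (G - 1)        # re-weight retained units
        reps[g] = np.sum(w_g * y[keep]) / np.sum(w_g * x[keep])
    return (G - 1) / G * np.sum((reps - theta) ** 2)

rng = np.random.default_rng(5)
n, G = 1200, 15
x = rng.lognormal(1, 0.4, n)
y = 0.8 * x + rng.normal(0, 0.3, n)
w = np.full(n, 50.0)                        # design weights
groups = rng.integers(0, G, n)              # random group assignment
se = np.sqrt(dagjk_variance(y, x, w, groups, G))
print("Ratio: %.4f  DAGJK s.e.: %.4f" % (np.sum(w * y) / np.sum(w * x), se))
```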
A design-based variance approximation comparable to jackknife was developed in Lu and Gelman (2003) and was applied to the New York City Social Indicators Survey, which used inverse-probability weighting, post-stratification, and raking to adjust for sampling design and nonresponse. Variance estimation for calibrated estimation with deterministic regression imputation in surveys with missing data was derived in Davison and Sardy (2007). The method used linearization and was compared with BRR, bootstrap, block jackknife, and multiple imputation for simulated data based on the Swiss Household Budget Survey (Davison and Sardy 2007, 371).
Rescaled bootstrap was compared with linearization and ultimate cluster variance estimation for the 2021 “Racism and Ethno-racial Discrimination” survey. In the experiments, linearization and rescaled bootstrap led to similar results on bias and accuracy, but the ultimate-cluster-based method appeared biased, leading the authors to suggest rescaled bootstrap as the relevant approach in the case considered (Guadarrama Sanz et al. 2025, 202).
Deville and Särndal (1994) developed variance estimation for the Horvitz-Thompson estimator, where multiple regression imputation with multivariate auxiliary information was used for missing values, and standard software was used for variance estimation. Andersson and Nordberg (1994) considered a method and theory for variance approximation of non-linear functions of totals and introduced the CLAN software developed at Statistics Sweden. CLAN has long been used in statistical offices.
A software package named ReGenesees developed by Zardetto (2015) at the Italian National Institute of Statistics provides an advanced R system for design-based and model-assisted estimation and sampling error assessment for complex estimators, provided they can be expressed as differentiable functions of Horvitz-Thompson or calibration estimators of totals in complex sample surveys. West et al. (2018) reviewed and compared the properties of statistical software for the analysis of complex surveys and their availability for implementation in practice.
Test statistics for goodness-of-fit, homogeneity, and independence are more complex for data from stratified cluster samples than for simple random samples. Observations within clusters can be positively correlated with respect to the phenomenon under study, so the test statistics are no longer asymptotically chi-squared. Test statistics for complex surveys are a frequent topic in the general statistical literature.
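A widely used first-order remedy, closely related to the adjustments developed in the work discussed next, is to rescale the Pearson statistic by an estimate of the mean generalized design effect (the Rao–Scott correction):

$$X^2_{\mathrm{adj}} = \frac{X^2}{\hat{\delta}_{\cdot}},$$

where $X^2$ is computed from the weighted estimated proportions and $\hat{\delta}_{\cdot}$ is the average of the estimated generalized design effects (the eigenvalues of the design-effect matrix); the adjusted statistic is then referred to the usual chi-squared distribution.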
Michael Hidiroglou and J. N. K. Rao published in JOS their research on chi-squared tests with categorical data from complex surveys based on cluster samples (Hidiroglou and Rao 1987a, 1987b). Part I developed tests for goodness-of-fit, homogeneity, and independence in a two-way table, and Part II presented a test of independence in a three-way table. Various adjustment methods were introduced to improve the asymptotic distributions of the test statistics. Applications were to the Canada Health Survey of 1978–1979. Binder (1991) presented a general statistical framework for the analysis of categorical survey data with nonresponse for complex samples.
The Wiley book by Valliant et al. (2000) provides a comprehensive source on the prediction approach in finite population inference. Zhang et al. (2025) is a recent contribution to predictive inference published in JOS.
7. Some Final Notes
7.1. Special Issue in Memory of Dr Lars Lyberg
A special issue in memory of Lars Lyberg was published, with forewords titled “In Memory of Dr Lars Lyberg, Remembering a Giant in Survey Research, 1944–2021,” written by Paul Biemer and colleagues. The issue was based on presentations by colleagues and friends from a session organized in his honor by Brady West and Michael Elliott at the 2021 Joint Statistical Meetings of the American Statistical Association, held virtually on August 8–12, 2021.
Referring to Japec and Lyberg (2021), Paul Biemer listed features of the “changing landscape” of survey methodology, including increasing data collection costs, declining response rates, the rise of nonprobability samples (especially in hybrid applications), clients’ desire for “wider, deeper, better, quicker, and cheaper data” (as stated by Holt (2007)), and the need to combine multiple data sources. Research on non-probability sampling is one of the new areas in JOS; the latest article to date in this area is Čiginas et al. (2025). Japec and Lyberg (2021) analyzed how national statistical institutes have reacted to the ongoing changes and suggested strategies to address them.
The publication policy of the Journal of Official Statistics throughout its history indicates the success of the journal in remaining at the forefront of publishing research articles that are topical and important for survey and official statistics.
7.2. Why Innovation Is Difficult in Government Surveys
The well-known article by Don Dillman, “Why Innovation Is Difficult in Government Surveys” (Dillman 1996), with comments from fifteen discussants, provides another example of challenging topics published in the Journal of Official Statistics. The provocative question in the title reflects his own experiences, particularly during his work at the U.S. Bureau of the Census from 1991 to 1995.
In the article, Dillman examines the barriers to implementing innovation within government survey organizations, with particular attention to issues related to nonresponse and measurement error. One of the main obstacles he mentions is the parallel existence of research and production cultures, which often pursue conflicting objectives. This tension may be increased by the imbalance in their underlying epistemological priorities: neither culture sufficiently integrates insights from cognitive psychology or sociology. Consequently, critical error sources, such as measurement and nonresponse error, can be inadequately addressed.
Dillman proposed several steps to advance innovation in this area. He also emphasized the importance of increasing staff awareness of the multidimensional nature of survey error. Interdisciplinary training programs offer a promising path forward for enhancing awareness of this multidimensionality.
Dillman referred to the Joint Program in Survey Methodology (JPSM), a collaboration between the University of Maryland, the University of Michigan, and Westat. Today, the program has a European dimension through a joint study program in survey and data science offered by the University of Mannheim and JPSM. The European Master’s Programme in Official Statistics (EMOS), offered at European universities, provides a collaborative arrangement between universities and national statistical institutes in cooperation with Eurostat, the statistical office of the European Union.
Historically, improving university-level training in survey methodology was the first item on the agenda of Dalenius (1987) for future work on error control in surveys.
Training of statisticians is among the topics supported by the Journal of Official Statistics. Notable contributions in JOS include the articles by Ntozi (1992), Lohr (2009), Valliant et al. (2010), Tucker (2010), Pullinger (2016), and Gal and Ograjenšek (2017). However, training-related topics remain relatively rare. Drawing on my background in both academia and official statistics, I would like to take this opportunity to gently encourage the journal to devote attention to this field. For example, special issues on the topic would be welcome.
