Abstract
This paper provides an overview of progress and opportunities in Stats NZ’s journey towards a register-based statistical system. It sets out to provide a status update of the components of the system at Stats NZ – Statistical Business Register (SBR), Statistical Person Register (SPR), and Statistical Location Register (SLR). The drivers for change and changes to the authorising environment are described, including the prioritisation of a register-based statistical system through Stats NZ’s strategic priorities and the updates to the legislative context through the Data and Statistics Act 2022. The current state of each of the base registers is briefly described and detail is provided on the evolution of a SPR and concept development of a property-centric location register.
Keywords
Introduction
Stats NZ is working towards a register-based statistical system that can connect people, business, and location-related data across domains through a system of integrated registers. This paper updates a previous paper prepared in 2022 [1].
The paper commences by describing the drivers of this development, within the wider context of transformation work at Stats NZ. Recent changes to the Stats NZ authorising environment, which increase the impetus for registers development, are then discussed. The paper then describes the current state of registers at Stats NZ, including the Statistical Business Register (SBR), Statistical Person Register (SPR), and Statistical Location Register (SLR). While we have existing components for each of these base registers, they were not developed with a register-based system in mind. To conclude, we discuss in more detail recent work towards an SPR and a property-centric SLR, including the opportunities and challenges this has presented.
Drivers of change
Stats NZ is looking at different ways to form the basis for future statistical production. The aim is to produce existing statistics more efficiently while remaining flexible to future requirements and supporting cross-domain insights using integrated social, economic and environmental registers and data.
Several factors are leading Stats NZ to this transformation, including the increasingly unsustainable cost of surveying, important groups being missed from national coverage, and increasing reluctance to engage with government.
These changes to context do not come without opportunities. There is potential in greater use of integrated data and there are now fewer barriers for producing official statistics through a registers system. There is also untapped potential for statistical models to add value, including working with Māori, the indigenous people of New Zealand.
However, there are also wider considerations around ethics, privacy and security of data, and social acceptability, that will need to be addressed to reap the benefits of a registers system.
An organisation-wide transformation programme of work will aim to address the challenges and seize the opportunities outlined above.
The delivery of benefits from a register-based statistical system will heavily rely on having a set of linked base registers. Three base registers can be linked together to form an integrated data system. The data contained in the registers is the best that can be obtained from available sources (such as surveys or administrative sources such as government departments). As New Zealand does not have full coverage or consistent personal or building IDs, the data used is not complete, and is linked using a mix of probabilistic and deterministic linking methods. This means that we would look at building a system of
Authorising environment
There are factors within our authorising environment and society that mean we are not able to produce statistics in the same way we always have.
Strategic priorities
In 2021, Stats NZ refreshed its organisation strategy. One of the priorities for the next five years is to become an organisation that uses administrative data first. A register-based system has been identified as a key enabler to achieving this priority.
“We will move away from using information gathered directly from New Zealanders, instead relying increasingly on data already held by other agencies. This data is routinely collected for administrative purposes by these organisations and is a valuable source of input for statistical insights that we intend increasingly to draw on. We will ensure our infrastructure and system assets are ready for this shift in focus, as part of which we will set up a network of registers to integrate data more efficiently” [2].
Data and Statistics Act 2022
On 1 September 2022, new legislation, the Data and Statistics Act [3] (the Act), was enacted. This legislation makes it easier for Stats NZ to make use of public sector data for official statistics. The Government Statistician is now able to request data from public sector agencies to produce official statistics. Another significant change is the requirement for public sector agencies to proactively inform the Government Statistician about matters affecting data used for official statistics, such as concerns about quality or changes being made to the data. The Act still requires a five-yearly census but does not prescribe how the data is to be collected, giving increased freedom to move towards census models that could use increasing amounts of administrative data.
The legislation also incorporates specific commitments to Māori, the indigenous peoples of Aotearoa New Zealand. The Act requires the Crown to give effect to the principles of te Tiriti o Waitangi/the Treaty of Waitangi, via the Government Statistician. Stats NZ is committed to working with Māori, iwi and hapū to improve the quality and quantity of data about Māori to better inform their decisions about their social, economic, environmental, and cultural wellbeing. The Treaty, an agreement (in Māori and English) made between the British Crown and Māori rangatira (chiefs), is one of New Zealand’s foundational documents.
Part of this commitment to Māori is in Stats NZ’s approach to the creation of a register-based statistical system. Building indigenous context design into the data model and giving thought and effect to a partnership approach with Māori will give a greater chance of a successful integrated data system that works for all.
Data supply, government support and use
As mentioned, Stats NZ has the power to request and use government data for statistical purposes under the Act. While this legislation gives the power for the Government Statistician to make a mandatory request for data for the purpose of producing official statistics, relying on compulsion is not the preferred approach. Instead, Stats NZ is building on years of cross-agency relationships and confirming that data suppliers understand their role as stewards – being aware of how their data is being used and ensuring that the data is collected with the appropriate permissions and consent for use beyond the original purpose of collection.
Public engagement, consultation, and indigenous rights
Innovations around the establishment of a register-based statistical system may alter the public’s awareness, boundaries and expectations of how Stats NZ does its work. A key consideration is the acceptability of repurposing data about people and the need for Stats NZ to be transparent about the ways in which data is used. Stats NZ is thinking about how best to approach and engage on the statistical use of person-centred data. Engagement with New Zealanders to determine public acceptability will influence the shape of our statistical solutions.
Stats NZ is committed, through both the Act and its direct relationships with Māori/Iwi (tribal groups), to ensuring that the register-based statistical system enables indigenous data requirements. As a Treaty partner, Stats NZ will work with Māori to include indigenous voices and experience in the design of the registers system.
Current state of registers
The base registers consist of a Statistical Business Register (SBR), a Statistical Location Register (SLR) and a Statistical Person Register (SPR).
Statistical Business Register
The SBR is a list of all known individual private and public sector businesses and organisations engaged in the production of goods and services in New Zealand. The SBR is primarily maintained from Inland Revenue (IR – New Zealand’s tax agency) data, with supplementary information on registered businesses coming from the Ministry of Business, Innovation and Employment. IR sources include Goods and Services tax (a ‘value added’ tax collection) and payday filing (a payroll tax collection). Only large and complex businesses are surveyed on an annual basis by Stats NZ to maintain structures, classifications, and size indicators.
The SBR is critical to the production of economic statistics. Economic indicator statistics, underpinned by the SBR, are critical to the production of key macro-economic measures such as Gross Domestic Product and the Balance of Payments. Without a stable, resourced SBR Stats NZ would be unable to meet international obligations and deliver official statistics on important measures of the economy.
The SBR is the most mature register at Stats NZ, with approximately 570,000 live businesses. It has evolved over the course of more than 30 years. There are established maintenance practices with the focus on continuous improvement, and further enhancements to allow the SBR to operate alongside a system of registers.
Statistical Location Register
The current state SLR aims to be a complete list of addresses in NZ. The SLR is less evolved than the SBR, having been set up specifically for the last two censuses (2018 and 2023). It was completely refreshed in preparation for the 2023 Census and does not currently have capability for administrative data maintenance (other than for address data). The SLR contains a list of dwellings, but these are dependent on census and are not updated between census cycles. It does not contain other property or building data.
A future SLR model will be based on property, not address, and will need to resolve issues around data quality, duplication, and omissions. In 2022 a proof of concept was trialled to test this property-centric model.
Statistical Person Register
A SPR aims to cover all the people who have ever resided in NZ, supported by a core set of important demographic attributes. Stats NZ does not have a SPR but is building towards the ability to derive population statistics from administrative data sources. Across the 2018 Census and 2023 Census, Stats NZ has used the Integrated Data Infrastructure (IDI) [4], a research database comprising person-level administrative and survey data collected from across government, to fill the data gaps. The IDI spine was used in place of a SPR to enable other attribute data to be linked to it.
While Stats NZ has some elements of a register-based statistical system, these have been brought together inefficiently. A future integrated set of statistical registers will allow Stats NZ to integrate data across a range of sources to meet customer and partner needs. Looking further to the future Stats NZ will be able to integrate new data sources more easily into existing statistical outputs to generate the data and statistics customers need.
Evolution towards a SPR
Current status
As mentioned above Stats NZ currently produces the IDI. While this has been used for some statistical production, much of the work is research conducted by other organisations, most of which supports other government agencies’ policy outcomes and service delivery design.
Developing an interim person spine.
While person-centred data integration was initially developed using only people registered to pay tax, the current IDI spine is built by linking births, visa, and tax data. This coverage is generally sufficient for research and statistics; however, there are gaps that need further investigation to fully understand their impact on the ability to produce statistics for specific sub-groups of the population.
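As an illustration of what linking sources to a spine with a mix of deterministic and probabilistic methods can involve, the following minimal sketch first tries an exact match on a shared identifier and otherwise falls back to a Fellegi-Sunter style match weight. It is not Stats NZ's method: the field names, the `shared_id` key, the m- and u-probabilities and the threshold are all hypothetical values chosen for the example.

```python
import math

# Hypothetical per-field probabilities (assumptions for illustration only):
# m = P(fields agree | same person), u = P(fields agree | different people).
FIELD_PARAMS = {
    "name": (0.95, 0.01),
    "date_of_birth": (0.98, 0.003),
}


def match_weight(rec_a, rec_b):
    """Sum Fellegi-Sunter style log2 weights over the comparison fields."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total


def link(rec_a, rec_b, threshold=10.0):
    """Return (method, is_match) for a pair of records."""
    id_a, id_b = rec_a.get("shared_id"), rec_b.get("shared_id")
    if id_a is not None and id_b is not None:
        # Deterministic: both records carry the same kind of identifier.
        return ("deterministic", id_a == id_b)
    # Probabilistic: compare the accumulated weight with a threshold.
    return ("probabilistic", match_weight(rec_a, rec_b) >= threshold)
```

In practice the probabilities would be estimated from the data and pairs near the threshold would typically go to clerical review; the sketch only shows how the two kinds of method can sit side by side.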
Although the IDI has been used in some use cases successfully, it is not suitable as a future SPR for a few major reasons:
The IDI is still a prototype which has been added to repeatedly.
It takes three months to refresh the data, as the whole process is completely re-run, creating new links and IDs each time. The timeliness of data processing is inadequate for regular statistical production.
The IDI was established primarily for research outside of Stats NZ, and hence the rules developed to allow safe use of the data do not always need to apply to Stats NZ, as the National Statistical Office, and currently impose barriers to effective access.
Stats NZ has more work to do in the areas of social acceptability, trust, and confidence before we expand our use of administrative data further into transforming our statistical production, particularly for population and social statistics.
Stats NZ has a project underway called the Interim Person Spine (IPS) to support use of administrative data in official statistical production for a specific use case. This project, while not yet complete, is a crucial step in enabling Stats NZ to build a register-based statistical infrastructure that is flexible enough to meet the many data demands for statistical production from across the organisation.
The IPS is an incrementally updated IDI spine (births, visa, and tax data) with some methodological improvements to underlying data processing and data structures. It picks up updates (dashed boxes in Fig. 1) with new linking or relinking when a record has changed (e.g., a person changes their name), and new entries (solid boxes in Fig. 1), e.g., a child is born. Additional life cycle information (date of death, when known) is recorded in the IPS, however migration events (important life cycle information for determining the resident population at a given point in time) are currently not comprehensively captured. The IPS is the prototype of the SPR.
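The difference between an incrementally updated spine and a full rebuild can be sketched as follows. This is a toy model, not the IPS implementation: the class name, the `(source, source_id)` key and the attribute names are assumptions made for the example. The point it shows is that a changed record updates an existing entry and keeps its person ID, while an unseen record creates a new entry.

```python
class IncrementalSpine:
    """Toy person spine that keeps person IDs stable across updates."""

    def __init__(self):
        self.links = {}    # (source, source_id) -> person_id
        self.people = {}   # person_id -> latest known attributes
        self._next_id = 1

    def apply(self, source, source_id, attrs):
        """Apply one incoming record and report whether it was new or an update."""
        key = (source, source_id)
        person_id = self.links.get(key)
        if person_id is None:
            # New entry, e.g. a child is born: allocate a fresh person ID.
            person_id = self._next_id
            self._next_id += 1
            self.links[key] = person_id
            self.people[person_id] = dict(attrs)
            return person_id, "new"
        # Existing entry, e.g. a name change: update in place, keep the ID.
        self.people[person_id].update(attrs)
        return person_id, "updated"
```

For example, applying a birth record twice, the second time with a changed name, returns the same person ID both times. A full rebuild, by contrast, would discard `links` and allocate every ID afresh.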
Potential future state of a SPR
Possible option for the build of a Statistical Person Register.
Building a base spine is a complex and time-consuming effort and the current IDI spine is rebuilt every time it is refreshed (three times a year). This timeline does not allow the flexibility that Stats NZ requires for the production of statistics as the spine cannot be changed or updated between the scheduled refreshes. With a spine updated incrementally such restrictions are reduced (Fig. 2).
Once formed, the base spine would be incrementally updated using the data sources that capture the only two ways of entering the country: being born or crossing the border. Data provided from across the data system in New Zealand is linked to this spine. This secondary linking of other person-centric datasets would also be incremental. The base spine would be fully refreshed periodically (likely every five or 10 years) to ensure accuracy and coverage.
In the processing of the data there will be a complete separation between identifying information required for linking and information required for statistical production. Identifying information would only be available to a dedicated team responsible for all data linking. This team would not have access to any of the information about the people available for statistical production.
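The separation described above can be pictured as splitting each record into two views that share only a person ID. The sketch below is a minimal illustration under stated assumptions: which fields count as identifying (`IDENTIFYING_FIELDS`) and the function name `split_record` are hypothetical, and real access control would be enforced by systems and processes rather than by a function.

```python
# Hypothetical set of identifying fields (an assumption for illustration).
IDENTIFYING_FIELDS = {"name", "date_of_birth", "address"}


def split_record(person_id, record):
    """Split a record into a linking view and a de-identified statistical view.

    The linking view holds only identifying fields and would be visible to the
    linking team; the statistical view holds only the remaining fields and
    would be visible to statistical production teams. Both carry the person ID
    so that they can be related without sharing content.
    """
    linking_view = {"person_id": person_id}
    statistical_view = {"person_id": person_id}
    for field, value in record.items():
        if field in IDENTIFYING_FIELDS:
            linking_view[field] = value
        else:
            statistical_view[field] = value
    return linking_view, statistical_view
```

Neither view contains the other's fields, which is the property the paragraph above requires of the two teams' access.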
This background infrastructure then supports the potential for a future SPR. Conceptually this is a de-identified table that contains statistical life cycle information for each person. Attributes required by the SPR are derived from the combination of other person-centric datasets linked to earlier. These include important demographic attributes and links to the other registers, required for the functioning of the register system. Sample frame attributes will, by necessity, contain identity information and will be managed accordingly.
Only the de-identified linked data could then be accessed by statistical production teams, via an application and approval process which would include ethics and privacy considerations, likely based upon current IDI practices, the Five Safes Framework and Ngā Tikanga Paihere [4]. The statistical production teams can then use their specialist knowledge to create other derived tables. Such tables would include standard derivations of population, ethnicity, income, addresses, household, and relationships, among many others. Access would be strictly managed and controlled, with statistical production teams only having access to the data they require and a valid, approved use case endorsed through the application process.
The SLR Stats NZ developed for the 2018 Census stored dwelling data with a direct relationship to an address; effectively, the address was used as a proxy for a dwelling. There are shortcomings with this approach that mean it is not always clear where dwellings are located, how many dwellings should be at a location, or even if dwellings exist. To understand dwelling types and dwelling density across New Zealand, Stats NZ requires a maintainable list of dwellings, each uniquely identified with its own identification number.
In a collaboration between Stats NZ and Toitū Te Whenua Land Information New Zealand (LINZ), a research project was undertaken to help the agencies understand the importance of connected property data (a property spine linking an address to a unit of property, to a building, to a dwelling), the quality of the data, the data gaps, and which indicators provide the right information relating to buildings and dwellings. The project explored the potential of an enduring dwelling solution that would not only meet Census needs but also provide a consistent way to integrate data on property. It used an existing framework, the Property Data Management Framework [5], and a platform, LINZ’s Connected Property Data Management System, as the key starting points.
Property data was sourced from LINZ. New Zealand maintains a Torrens land transfer registration system and use of this system is compulsory. Additionally, local councils maintain a district valuation roll used to determine rates. The key entity in a district valuation roll is a rating unit which is a representation of a property built and maintained using title and land parcel data from the land transfer system [6]. The data from these systems is available for reuse and this was the property data used in the project. LINZ also maintains a building outline dataset with national coverage, captured from aerial imagery, and this dataset was used as a proxy for a building register. The project set a goal of uniquely and unambiguously connecting all dwellings from the 2018 Census dwellings dataset to a building (and hence a property and an address). A stretch goal was to use the property spine data to test predicting the existence of dwellings built after 2018 (meaning no dwelling records existed in the census data).
The approach involved using the data relationships defined in the property data management framework, and knowledge of the structure of a property, to determine property scenarios that represent the most common configurations of properties. These scenarios were then applied to link a dwelling to a building (or part of a building). Using property scenarios simplified the cardinality on the data relationships and allowed complex property configurations to be broken down into combinations of simpler scenarios.
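The scenario-based approach described above can be sketched as a classification step followed by a scenario-specific linking rule. The paper does not list the scenarios used in the project, so the three scenarios, their names and the linking rules below are hypothetical, chosen only to show how simple configurations can be handled directly while anything more complex is set aside for decomposition or review.

```python
def classify_property(buildings, dwellings):
    """Assign a property to one of a few simple, hypothetical scenarios."""
    if not dwellings:
        return "no_dwelling"
    if len(buildings) == 1 and len(dwellings) == 1:
        return "one_building_one_dwelling"    # e.g. a standalone house
    if len(buildings) == 1:
        return "one_building_many_dwellings"  # e.g. an apartment block
    return "unresolved"                       # needs decomposition or review


def link_dwellings(buildings, dwellings):
    """Return (scenario, links); each link is (dwelling, building, part)."""
    scenario = classify_property(buildings, dwellings)
    if scenario == "one_building_one_dwelling":
        # The dwelling occupies the whole building, so no part number.
        return scenario, [(dwellings[0], buildings[0], None)]
    if scenario == "one_building_many_dwellings":
        # Each dwelling is linked to a numbered part of the single building.
        links = [(d, buildings[0], part)
                 for part, d in enumerate(dwellings, start=1)]
        return scenario, links
    # No automatic links for properties outside the simple scenarios.
    return scenario, []
```

Restricting each rule to one scenario is what keeps the cardinality of the relationships simple: within a scenario, the mapping from dwelling to building (or part of a building) is unambiguous.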
The proof of concept was successful: the project showed that existing Stats NZ dwellings can be connected into a property spine [7]. It also showed that, using the property data management framework and a property spine, new dwellings can be predicted from administrative data, furthering efforts towards administrative data maintenance and the use of data that government already collects.
The connections created were repeatable, transparent and could significantly enable maintenance of the dwellings dataset between census cycles and reduce the interventions currently required to ‘ready’ the dwellings dataset for each census or survey cycle. More importantly, the proof of concept application enabled Stats NZ to explore the connected data, gaining confidence that dwellings are being found and accurately represented.
Conclusion
As outlined in this paper, considerable progress is under way at Stats NZ towards a register-based statistical system. While many challenges must be overcome, Stats NZ is seizing the opportunities that have been presented through the wider government data system and authorising environment.
The SPR and property-centric SLR initiatives discussed demonstrate some of the development underway and will be built upon over time. Balancing this development across all domains while maintaining the trust of New Zealanders, ensuring appropriate ethical and cultural considerations, and demonstrating genuine commitment to te Tiriti o Waitangi/the Treaty of Waitangi will be critical to the success of an integrated data system and generating the range of potential benefits suggested.
