Abstract
This paper presents the principles of implementing register-based cohort studies as currently applied for real-time estimation of influenza vaccine effectiveness in Finland. All required information is retrieved from computerised national registers and deterministically linked via the unique personal identity code assigned to each Finnish resident. The study cohorts comprise large subpopulations eligible for a free seasonal influenza vaccination as part of the National Vaccination Programme. The primary outcome is laboratory-confirmed influenza. Each study subject is taken to be at risk of experiencing the outcome from the onset of the influenza season until the first of the following three events occurs: outcome, loss to follow up or end of season. Seasonal influenza vaccination is viewed as time-dependent exposure. Accordingly, each subject may contribute unvaccinated and vaccinated person-time during their time at risk. The vaccine effectiveness is estimated as one minus the influenza incidence rate ratio comparing the vaccinated with the unvaccinated within the study cohorts. Data collection in register-based research is an almost fully automated process. The effort, resources and the time spent in the field are relatively small compared to other observational study designs. This advantage is pivotal when vaccine effectiveness estimates are needed in real time. The paper outlines possible limitations of register-based cohort studies. It also addresses the need to explore how national and subnational registers available in the Nordic countries and elsewhere can be utilised in vaccine effectiveness research to guide decision making and to improve individual health as well as public health.
Background
The protective effectiveness of a vaccine against a given disease is defined as the relative reduction in the disease incidence (measured as a risk or rate) attributable to the administered vaccine in a real-world setting [1, 2]. Based on a comparison of vaccinated and unvaccinated individuals exposed to the same vaccination programme, the vaccine effectiveness is an important measure for evaluating the direct effect of the vaccine. Assessment of vaccine effectiveness supports further development of the vaccination programme, for example by suggesting a change of vaccine brand or a revision of the target groups eligible for the programme.
The estimation of the effectiveness of seasonal influenza vaccines is challenging. Owing to the continuous evolution of the influenza viruses and regular updates to the vaccine compositions, the effectiveness must be reassessed each season [1, 3]. Moreover, the performance of the vaccination programme should be evaluated early in the season to ensure high population-level protection, for example through recommendations for the use of antivirals if the vaccine effectiveness is low. The World Health Organization regularly reselects the influenza strains to be included in the vaccines, attempting a match with those predicted to dominate in the upcoming season [1, 3]. Timely effectiveness estimates therefore help in making recommendations for future vaccine compositions.
In Finland, seasonal influenza vaccination is offered free of charge as part of the National Vaccination Programme to several target groups. These currently include children aged 6 to 35 months (and will be extended to 3- to 6-year-olds starting from 2018/19), elderly people aged 65 years and over, pregnant women, health and social care workers, military conscripts and people with certain chronic diseases or underlying conditions. Under the Communicable Diseases Act [4], the implementation of the National Vaccination Programme and the monitoring of vaccine effectiveness are mandates of the National Institute for Health and Welfare (THL). Accordingly, the THL assesses and communicates the performance of the Finnish influenza vaccination programme.
Given the availability of population-based register data on vaccinations and influenza illness for secondary use and the permission to link these data at the individual level, the THL has recently established online surveillance of influenza vaccine effectiveness using computerised national registers [5, 6]. A cohort study approach in line with the protocol for administrative databases utilising cohort studies, commissioned by the European Centre for Disease Prevention and Control [7], has been preferred over other observational designs.
Within the scope of this article we present the principles of the register-based cohort study design currently applied in Finland for the estimation of influenza vaccine effectiveness in real time. Special emphasis is given to available data sources, exposure and outcome definitions, and consideration of time in the cohort design. Although technical details of statistical analysis are beyond the scope of this paper, we also outline the strengths and possible limitations of our approach.
Data sources and data linkage
The Finnish Population Information System contains individual-level data needed in defining the study cohorts [8], including the personal identity code, sex, place of residence, date of birth and the date of death of each citizen and permanent resident (Table I).
Table I. Computerised national registers currently utilised for monitoring the effectiveness of seasonal influenza vaccines in Finland.
The deadline for submitting the records of a full calendar year is the 28th of February (since 2017, previously 31st of March) of the following year. However, the final records have only been available in the register by September or October. Voluntary monthly data submission has been enabled since 2017 and is envisaged to become mandatory in the near future.
The deadline for submitting the records of a full calendar year is the 31st of March of the following year. The final records are available in the register by October. Voluntary monthly data submission has been enabled and is envisaged to become mandatory in the near future.
ICD-10: International Classification of Diseases, 10th revision; ICPC-2: International Classification of Primary Care, second edition.
The National Vaccination Register provides the vaccination information [9], i.e. vaccination records characterised by the vaccinee’s personal identity code, the administered vaccine, including the batch number and trade name, and the date of vaccination (Table I). Currently, the register mainly covers vaccinations given in the public primary healthcare sector, where the vaccines included in the Finnish National Vaccination Programme are administered. However, the expansion of the register to include vaccinations given in the private and secondary healthcare sectors is ongoing [9].
The National Infectious Diseases Register (NIDR) provides data about laboratory-confirmed influenza cases [10]. In Finland, every clinical microbiology laboratory notifies all influenza-positive findings to this register. Of the notified information, we utilise the date of specimen, the influenza type and the patient’s personal identity code (Table I). The subtype of influenza A positive specimens and the lineage of influenza B positive specimens are reported to the NIDR only if the specimens have been analysed in the Finnish National Influenza Centre.
The Register of Primary Health Care Visits (Avohilmo) contains diagnostic information on outpatient public primary healthcare delivered in Finland [11]. Each patient encounter is characterised by the patient’s personal identity code, diagnostic codes (International Classification of Diseases, 10th revision (ICD-10) or International Classification of Primary Care, second edition (ICPC-2)) and the calendar date (Table I).
The Care Register for Health Care (Hilmo), previously known as the Hospital Discharge Register, contains diagnostic information on emergency and inpatient healthcare provided in Finnish hospitals [11, 12]. The hospital visits are characterised by the patient’s personal identity code, ICD-10 diagnostic codes, the calendar date and the duration of hospitalisation (Table I).
The Medical Birth Register provides data on live births and stillbirths and contains socio-economic and other background information about the mother and the infant [11]. Relevant data include the infant’s personal identity code and, for example, the mother’s marital status, number of previous pregnancies, nationality and smoking behaviour during pregnancy, as well as the infant’s weight and gestational age at birth (Table I).
The first four of the six registers described above are real-time registers, i.e. their data content is updated daily. In contrast, the records in Hilmo and the Medical Birth Register are only available with a certain delay (Table I). However, as part of the reform of health and social services, improvements in the timeliness of these registers are underway.
The Communicable Diseases Act grants the THL the right to link data in the computerised national registers to fulfil its mandates [4]. After extracting the relevant data (Table I) from the registers, the records are linked deterministically using the personal identity code. This code is unambiguously assigned to all Finnish citizens as well as foreign citizens who have been registered in the Finnish Population Information System [13]. The linked records are pseudonymised before they are further processed and analysed. The right to access individual-level data is granted only to selected THL employees.
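The linkage step described above can be illustrated with a minimal sketch. It is not the THL implementation: the record layout, the field names and the use of a keyed hash (HMAC-SHA256) for pseudonymisation are assumptions made purely for illustration, and the identity codes are made up.

```python
import hashlib
import hmac


def pseudonymise(identity_code, secret_key):
    """Replace an identity code with a keyed hash (assumed pseudonymisation scheme)."""
    digest = hmac.new(secret_key, identity_code.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def link_registers(population, vaccinations, influenza_cases, secret_key):
    """Deterministically link register extracts on the exact identity code."""
    linked = {}
    for person in population:
        linked[person["id"]] = {
            "birth_date": person["birth_date"],
            "vaccination_dates": [],
            "specimen_dates": [],
        }
    # Deterministic linkage: a record is attached only on an exact code match.
    for record in vaccinations:
        if record["id"] in linked:
            linked[record["id"]]["vaccination_dates"].append(record["date"])
    for record in influenza_cases:
        if record["id"] in linked:
            linked[record["id"]]["specimen_dates"].append(record["specimen_date"])
    # Pseudonymise after linkage, before any further processing.
    return {pseudonymise(pid, secret_key): data for pid, data in linked.items()}


# Hypothetical extracts with made-up identity codes.
population = [
    {"id": "PERSON-A", "birth_date": "1950-05-01"},
    {"id": "PERSON-B", "birth_date": "1948-11-23"},
]
vaccinations = [{"id": "PERSON-A", "date": "2016-11-15"}]
influenza_cases = [{"id": "PERSON-B", "specimen_date": "2017-01-20"}]

linked = link_registers(population, vaccinations, influenza_cases, b"example-key")
```

In this sketch the linked data set no longer contains the original identity codes, and records for persons outside the population extract are silently dropped.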
Cohort definition
To estimate influenza vaccine effectiveness, we have designed a population-based cohort study in which all required information is retrieved exclusively from the above registers. The study period consists of the influenza season of interest, i.e. the time period when influenza viruses circulate in the population. In accordance with THL’s mandates, the current surveillance focusses on subpopulations eligible for seasonal influenza vaccination as part of the National Vaccination Programme. In particular, we have defined two study cohorts: children aged 6 to 35 months at the onset of the season and elderly people aged 65 years and over. Ideally, the study cohort should comprise the entire subpopulation of interest.
Outcome, exposure and covariate definitions
We have established two outcome definitions. The primary outcome is laboratory-confirmed influenza recorded in the NIDR, further identified as influenza A or influenza B. The secondary outcome is clinically suspected influenza-like illness (ICD-10 diagnostic codes J09, J10 and J11 and ICPC-2 diagnostic code R80) based on diagnostic information recorded in Avohilmo and/or Hilmo.
The exposure variable is influenza vaccination during the season under study and is identified based on the data recorded in the National Vaccination Register.
A set of covariates describing potential confounders and effect modifiers is formed using all the above registers except the NIDR. Age, sex, influenza vaccinations in previous seasons and diagnostic information indicating the presence of chronic underlying conditions have been considered relevant covariates. For analyses conducted at the end of the season and/or focussing on young children, further covariates such as the number of hospital visits in the year before the study period, and/or the socio-economic background recorded in the Medical Birth Register can be added to the set of covariates.
The National Vaccination Register, NIDR, Avohilmo and Hilmo only document the presence of chronic underlying conditions or events such as vaccination or influenza diagnosis but not their absence. We therefore assume a condition is absent or an event did not occur if there is no record in the respective register.
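As a minimal sketch of this presence-only logic, the following code derives the secondary outcome indicator from visit records and treats a missing record as absence of the event. The record layout is hypothetical, and matching ICD-10 codes by prefix (so that subcodes such as J10.1 count as J10) is an assumption rather than a documented rule of our analysis.

```python
INFLUENZA_ICD10_PREFIXES = ("J09", "J10", "J11")
INFLUENZA_ICPC2_CODES = ("R80",)


def is_influenza_like_code(code):
    """Return True for the diagnostic codes used in the secondary outcome."""
    normalised = code.strip().upper()
    # Assumption: ICD-10 subcodes are matched by their three-character prefix.
    return (
        normalised.startswith(INFLUENZA_ICD10_PREFIXES)
        or normalised in INFLUENZA_ICPC2_CODES
    )


def build_outcome_indicators(cohort_ids, visit_records):
    """Map each cohort member to True/False; no record means 'did not occur'."""
    indicators = {pid: False for pid in cohort_ids}
    for pid, code in visit_records:
        if pid in indicators and is_influenza_like_code(code):
            indicators[pid] = True
    return indicators


# Hypothetical visit records as (pseudonymised id, diagnostic code) pairs.
visits = [("a", "J10.1"), ("b", "J45"), ("x", "R80")]
indicators = build_outcome_indicators(["a", "b", "c"], visits)
```

Subject "c" has no record at all and is therefore classified as not having experienced the outcome, which is exactly the assumption stated above.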
Statistical analysis
We regard each study subject as being at risk of experiencing the outcome of interest from the onset of the study period until the first of the following three events occurs: outcome of interest, loss to follow-up (either due to death or emigration), or end of the study period (Figure 1). Consequently, the follow-up time can differ across study subjects. For simplicity, we do not include multiple events of influenza for the same subject in the analysis, as repeated infections within the same season are rare.

Figure 1. Time-to-event framework for estimating influenza vaccine effectiveness. An exemplary cohort of six study subjects is followed through an influenza season. The time at risk for subjects 2 and 5 ends at the occurrence of the outcome of interest, for subjects 1 and 4 at their loss to follow-up (either due to death or emigration), and for subjects 3 and 6 at the end of the season. All six subjects contribute unvaccinated person-time to the analysis. Subjects 4, 5 and 6 additionally contribute vaccinated person-time.
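The time-at-risk rule can be sketched as follows. The function name, the season dates and the convention of counting days as the difference between the end of follow-up and the season onset are illustrative assumptions, and all event dates are assumed to fall within the season.

```python
from datetime import date


def time_at_risk(season_start, season_end, outcome_date=None, loss_date=None):
    """Return (end_of_follow_up, days_at_risk, had_outcome) for one subject."""
    # Follow-up ends at the earliest of outcome, loss to follow-up, end of season.
    candidates = [season_end]
    if outcome_date is not None:
        candidates.append(outcome_date)
    if loss_date is not None:
        candidates.append(loss_date)
    end = min(candidates)
    # The outcome counts only if it is the event that ends the follow-up.
    had_outcome = outcome_date is not None and outcome_date == end
    return end, (end - season_start).days, had_outcome


# Hypothetical season and subjects.
start, stop = date(2017, 1, 1), date(2017, 3, 31)
with_outcome = time_at_risk(start, stop, outcome_date=date(2017, 2, 10))
lost = time_at_risk(start, stop, loss_date=date(2017, 1, 15))
full_season = time_at_risk(start, stop)
```

An outcome recorded after a subject's loss to follow-up would not be counted in this sketch, because the follow-up has already ended.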
During their time at risk, each subject can contribute unvaccinated as well as vaccinated person-time. In other words, seasonal influenza vaccination is a time-dependent exposure (Figure 1). We consider the study subject to be exposed from their first vaccination during the study period onwards, irrespective of whether they are vaccinated again at a later time point. If several influenza vaccines are used in parallel in the cohort but the interest is in estimating the effectiveness of a specific brand, the follow-up of all subjects vaccinated with another influenza vaccine brand is right-censored at the time of that vaccination (Figure 2).

Figure 2. Estimation of brand-specific influenza vaccine effectiveness. An exemplary cohort of three vaccinated study subjects is followed through an influenza season. Until their (first) vaccination, all three subjects contribute unvaccinated person-time to the analysis. Thereafter, subjects 1 and 2 additionally contribute vaccinated person-time. The follow-up of subjects 2 and 3 is, however, right-censored at the time of vaccination with an influenza vaccine other than the vaccine of interest.
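The split of each subject's time at risk into unvaccinated and vaccinated person-time, including the brand-specific right-censoring, can be sketched as below. Times are expressed as whole days since the onset of the season; the function and its arguments are hypothetical names chosen for illustration.

```python
def split_person_time(days_at_risk, vaccination_day=None, other_brand_day=None):
    """Return (unvaccinated_days, vaccinated_days) for one subject.

    vaccination_day: day of the first vaccination with the vaccine of interest.
    other_brand_day: day of vaccination with another brand, if any; the
    follow-up is right-censored at that day in a brand-specific analysis.
    """
    end = days_at_risk
    if other_brand_day is not None:
        end = min(end, other_brand_day)
    # No vaccination of interest before the end of follow-up.
    if vaccination_day is None or vaccination_day >= end:
        return end, 0
    return vaccination_day, end - vaccination_day


# Three hypothetical subjects mirroring the situation in Figure 2,
# each with 100 days at risk before any brand-specific censoring.
subject_1 = split_person_time(100, vaccination_day=30)
subject_2 = split_person_time(100, vaccination_day=30, other_brand_day=60)
subject_3 = split_person_time(100, other_brand_day=40)
```

An outcome occurring after the censoring day would have to be excluded from the event count as well; that bookkeeping is left out of this sketch.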
In this time-to-event framework, the vaccine effectiveness is defined as one minus the influenza incidence rate ratio comparing vaccinated with unvaccinated subjects. We estimate incidence rate ratios using the Cox proportional hazards model [14] with time since the onset of the study period as the underlying timescale. To take the time since vaccination into account, we split the exposure variable into multiple levels, for example ‘unvaccinated’, ‘vaccinated ⩽14 days ago’ and ‘vaccinated >14 days ago’. Accordingly, the vaccine effectiveness is estimated for the two vaccinated categories relative to the unvaccinated.
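To make the definition concrete, the following sketch classifies person-time by time since vaccination and computes a crude (unadjusted) vaccine effectiveness from event counts and person-time. It is a simplified analogue of the estimand, not the Cox model used in our analyses, and the numbers are invented for illustration.

```python
def exposure_level(days_since_vaccination):
    """Classify exposure by time since vaccination (None means unvaccinated)."""
    if days_since_vaccination is None:
        return "unvaccinated"
    if days_since_vaccination <= 14:
        return "vaccinated <=14 days ago"
    return "vaccinated >14 days ago"


def crude_vaccine_effectiveness(events_vacc, time_vacc, events_unvacc, time_unvacc):
    """Return one minus the crude incidence rate ratio (vaccinated vs unvaccinated)."""
    rate_vaccinated = events_vacc / time_vacc
    rate_unvaccinated = events_unvacc / time_unvacc
    return 1.0 - rate_vaccinated / rate_unvaccinated


# Invented counts: 20 cases in 10,000 vaccinated person-days and
# 50 cases in 10,000 unvaccinated person-days.
ve = crude_vaccine_effectiveness(20, 10_000, 50, 10_000)
print(round(ve, 2))  # 0.6
```

A crude estimate of this kind ignores calendar time and confounding, which is why the analysis relies on a regression model with the season onset as the time origin.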
To control for confounding, we consider two approaches. One option is to include the covariates, i.e. potential confounders and relevant interaction terms, in the model. The other option is to use each subject’s propensity of being vaccinated, which is estimated conditionally on their covariates, in the model instead of directly adjusting for covariates [15].
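One way to write down the model described above is given below. The notation is ours and introduced only for illustration: $x_{i1}(t)$ and $x_{i2}(t)$ indicate whether subject $i$ was vaccinated at most 14 days or more than 14 days before time $t$, $z_i$ is the covariate vector and $h_0(t)$ is the baseline hazard on the timescale starting at the onset of the study period.

```latex
\begin{align}
  h_i(t) &= h_0(t)\,\exp\bigl(\beta_1 x_{i1}(t) + \beta_2 x_{i2}(t)
            + \gamma^{\top} z_i\bigr), \\
  \mathrm{VE}_k &= 1 - \exp(\beta_k), \qquad k = 1, 2 .
\end{align}
```

Here $\exp(\beta_k)$ is the hazard ratio of exposure level $k$ relative to the unvaccinated, which serves as the estimate of the incidence rate ratio. In the propensity-based alternative, the term $\gamma^{\top} z_i$ would be replaced by a function of the estimated propensity $e_i = \Pr(\text{vaccinated} \mid z_i)$; the exact functional form is not specified here and would be a modelling choice.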
Discussion
In this paper, we have presented the principles of the register-based cohort study design currently applied in Finland for the estimation of influenza vaccine effectiveness in real time. Special emphasis has been given to available data sources, exposure and outcome definitions, and consideration of time in the cohort design.
Influenza seasons can differ greatly [1]. Therefore, ecological trend designs, such as those utilised to assess the impact of pneumococcal vaccination against pneumonia [16], are not applicable. In contrast, the test-negative design, in which study subjects are sampled from among patients seeking medical care for influenza-like symptoms, is frequently used [17]. Because all recruited patients are laboratory tested for influenza and classified as cases (test-positives) or controls (test-negatives), the categorisation of the test-negative design as cohort or case-control study has been discussed [18, 19]. In practice, great efforts are needed to test and distinguish the cases and controls and to collect their vaccination histories and other background information.
In a register-based study, all data are extracted through an almost fully automated process. This means that resources and time spent in the field are relatively small compared to other observational designs. This is an important advantage when vaccine effectiveness estimates are needed in real time. Computerised national registers offer a considerable opportunity to estimate the effectiveness early in the season using large and highly representative cohorts. Nevertheless, estimating the effectiveness in real time places high demands on the registers: outcome and exposure data must be accessible in real time, and all other information defining and describing the study cohort must be available from the beginning of the study period.
As in every observational study, the resulting estimates might be affected by selection bias, information bias and confounding, although to different extents [20]. We assume that selection bias is a minor concern in this population-based cohort study design. The entire subpopulation of interest is enrolled by the beginning of the influenza season before exposure or outcome occur, which means there is no selection other than the restriction to the two age groups generally eligible for seasonal influenza vaccination in Finland. In addition, loss to follow up due to death or emigration is taken into account in the statistical analysis as censoring.
The major concern is information bias. We expect to misclassify exposure to vaccination, the influenza outcome and covariates to an unknown extent, when defining them solely based on register data. In particular, differential outcome misclassification due to different case detection rates among the vaccinated and the unvaccinated could lead to systematic bias in the estimates. Future research must clarify the role and examine the magnitude of such biases. Yet another question is how to define optimal outcomes for vaccine effectiveness studies based on register data, including the consideration of disease severity and an analysis of the usage of diagnostic codes.
The set of covariates outlined here and in our earlier publications [5, 6] to adjust for confounding is by no means exhaustive and might not yet control for all potential differences in health-seeking behaviour and infection pressure. The use of negative-control outcomes [19] may prove to be a viable approach to detect and measure the impact of residual confounding. Furthermore, we intend to include additional variables in the analysis, in analogy to a previously conducted register-based cohort study evaluating perinatal survival and health after maternal influenza vaccination [21]. In that study, data on prescribed drugs were retrieved from the Benefits Register of the Social Insurance Institution of Finland, which provides statistics on medical reimbursements. Moreover, data from the Finnish Cancer Registry could provide further information on chronic comorbidities like cancer. In general, the methods we apply are steadily refined in accordance with the continuing improvements in the availability and quality of the registers.
In addition to Finland, several European countries have established pivotal registers at the national level (e.g. Denmark [22, 23], England [24], the Netherlands [25, 26], Norway [27], and Scotland [28, 29]) or subnational level (e.g. Navarre, Spain [30, 31] and Stockholm, Sweden [6, 32]) monitoring vaccinations and influenza cases in the total population or defined subpopulations including representative samples of the total population. Administrative databases and medical registers are widely recognised tools for signal detection and hypothesis generation [33]. However, their efficient use for signal confirmation and evidence, e.g. for assessing the effectiveness of different treatments in real-world settings, is still under development. Bias, caused particularly by differential outcome misclassification, and confounding may have implications for the validity, accuracy and generalisability of the results [34]. Therefore, the robustness of the currently implemented design still needs to be evaluated and the effectiveness estimates must be interpreted carefully. Finally, there are also legal implications of linking records from different data sources at the individual level using personal identity codes that might hinder the implementation of our approach elsewhere.
We have already published two studies based on the design outlined in this paper, demonstrating the potential of register data in the estimation of influenza vaccine effectiveness. The first study compared the effectiveness of a live-attenuated and an inactivated influenza vaccine given to the cohort of 2-year-olds in the influenza season 2015/16 in Finland [5]. The second study presented the 2016/17 mid-season vaccine effectiveness estimates observed in two cohorts of elderly people aged 65 years and over in Stockholm, Sweden and in Finland [6]. Further studies and insights are also expected from the Integrated Monitoring of Vaccines in Europe network, which has dedicated one work package solely to computerised administrative databases assessing the effectiveness of seasonal influenza vaccines [35]. Likewise, the newly formed Development of Robust and Innovative Vaccine Effectiveness consortium will look at register-based cohort studies to estimate brand-specific influenza vaccine effectiveness [36].
In conclusion, having various national and subnational registers available in the Nordic countries and elsewhere, we see the need to further explore how these powerful tools can be utilised in vaccine effectiveness research to guide decision making and to improve individual health as well as public health.
Acknowledgements
We thank Niina Ikonen and Outi Lyytikäinen for their valuable comments on an early draft of this paper.
Authors’ contributions
UB, RS, HN and JJ acquired the permission to use the data and adapted the study design. UB, KA, SK and JJ planned the statistical analysis. All authors contributed to the conception and design of the manuscript. UB drafted the manuscript. All authors reviewed, provided comments and approved the final version of the manuscript.
Conflict of interest
The authors declare that there is no conflict of interest.
Funding
THL has been supported in conducting register-based studies estimating seasonal influenza vaccine effectiveness through funding from EpiConcept under the Framework Contract ECDC/2014/026 and through funding from the European Union’s H2020 research and innovation programme under the grant agreement 634446.
