Abstract
Aim:
Linking information on family members in the Danish Civil Registration System (CRS) with information in Danish national registers provides unique possibilities for research on familial aggregation of diseases, health patterns, social factors and demography. However, the CRS is limited in the number of generations that it can identify. To allow more complete familial linkages, we introduce the lite Danish Multi-Generation Register (lite MGR) and the future full Danish MGR that is currently being developed.
Methods:
We generated the lite MGR by linking the current version of the CRS with historical versions stored by the Danish National Archives in the early 1970s, which contain familial links not saved in the current CRS. We describe and compare the completeness of familial links in the lite MGR and the current version of the CRS. We also describe planned procedures for generating the full MGR by linking the current CRS with scanned archived records from Parish Registers.
Results:
Among people born in Denmark in 1960 or later, the current CRS contains information on both parents. However, it has limited parental information for people born earlier. Among the 732,232 people born in Denmark during 1950–1959, 444,084 (60.65%) had information on both parents in the CRS. In the lite MGR, it was 560,594 (76.56%).
Conclusions:
Keywords
Introduction
Data in the Danish Civil Registration System (CRS) represent an important research tool for epidemiological and registry-based research. The CRS permits researchers to carry out population-based studies on such topics as the potential clustering of disease and death in families [1,2]. CRS data are an important and rare asset. Similar data are currently available only in Sweden [3], Norway [4], Finland [5] and Taiwan [6]. In connection with other registers and biobanks, the CRS provides the basis for generating important new knowledge relevant to the etiological and prognostic understanding and possible prevention of diseases [7].
For people born in Denmark in 1960 or later, information on parents is readily available in the CRS. The CRS also can identify a broader range of family members but with variations in coverage, depending on familial relationship, year of birth and birth in Denmark versus abroad [1,2].
Recently, a pilot study initiated by the Danish Advisory Board on Register-based Research explored the feasibility of creating a Danish Multi-Generation Register (MGR), which would include more extensive familial links than those recorded in the CRS. The study concluded that more complete familial links could be obtained by linking the official version of the CRS with transcriptions of scanned hard-copy records in the parish registers (PRs; also known as church books).
Our pilot study also ascertained that some parental links are missing in the official version of the CRS but are available in versions of the CRS transferred to the Danish National Archives in the early 1970s, especially for people born in Denmark in the 1950s. This led to the development of a lite version of the MGR, which we describe here for the first time. Finally, in 2019, the Novo Nordisk Foundation funded the National Archives, Aarhus University and the University of Copenhagen to establish a full-scale Danish MGR to strengthen research for the benefit of researchers and society [8].
The full Danish MGR will be established by linking information in the CRS and the PRs to create a resource covering all people born in Denmark in 1920 or later. For each person, it will include the personal identifier, parish of birth, dates of birth and death, name and vital status, as well as the parents’ personal identifiers and names.
Our aims here are (a) to describe and compare the completeness of generational information in the official versions of the CRS with the lite version of the MGR, and (b) to outline methods to generate the full version of the MGR, which is now in development.
Methods
National registration of Danish citizens has existed for decades, recorded in different data repositories [1,2,7,9–11]. Primary information sources relevant for creating the full MGR include PRs, population censuses, index cards stored by municipalities and the electronic CRS. These data sources are described below.
Data: PRs
For each parish in Denmark, PRs record major life and church events such as births, baptisms, confirmations, marriages and deaths [12]. The first nationwide PRs were introduced by royal legislation in 1645. The early PRs had very few formal registration requirements, resulting in huge variation in the recorded information. In 1813, it was decided that all events should be recorded on pre-printed PR forms, with two copies saved. Since then, information on all recorded church ceremonies has been preserved. Currently, the PRs have been scanned but not transcribed into machine-readable text. For each scanned page, the parish is preregistered, and the type of life event is pre-printed at the top of the page. The information in the PRs is considered legally valid. All citizens must report all births to the church authorities, irrespective of religion.
Data: population censuses
Nationwide population censuses were conducted in Denmark in 1787, 1801, 1834, 1840, 1845, 1850 and 1860 and then roughly every five years from 1880 to 1970. Each population census typically collected respondents’ name, sex, place of birth and age or date of birth. People living in the same household were linked, along with information on their position in the household, for example head of family, spouse, children, servants or visitors. Our pilot study [8] demonstrated that the quality of information from population censuses is low compared to similar information in PRs. It is challenging to track individuals through different population censuses due to missing and dual registrations, along with variations in recorded names and other identifying information. In some cases, servants may be incorrectly noted as biological children and vice versa.
Data: index cards and the Danish CRS
Longitudinal national registration of Danish residents was established in Denmark in 1924 [13,14]. At that time, members of each family residing at the same address were registered manually on index cards, which were kept in the municipality in which the family lived. From 1924 to 1967, the index cards contained information on the name, sex, date of birth and place of birth of all people in the family, i.e. the head of the family (typically the father), the secondary head (typically the mother) and all children younger than 15 years who were living at home and who were neither married nor had children themselves. The information on the index cards was updated continuously by the local municipal registration offices. This form of registration was used until 1968, when it was replaced by the current Danish CRS, which records information electronically.
When the electronic CRS [1,2,13–17] was established on 2 April 1968, all people alive and living in Denmark were registered, particularly for tax collection and other administrative purposes. Thereafter, all people have been registered at birth or upon immigration. In addition, on 1 May 1972, all people alive and living in Greenland were added to the CRS. Although the Kingdom of Denmark consists of Denmark, Greenland and the Faroe Islands, only people living in Denmark and Greenland are included in the CRS. A few people (1470) who died before the CRS was established have also been recorded for various reasons. Today, the CRS includes information on all people who were alive and permanent residents on or after 2 April 1968 (Denmark) or 1 May 1972 (Greenland).
All people registered in the CRS are assigned a unique personal identification number called the CPR number. CPR is an abbreviation for ‘Centrale Person Register’ (the Danish name of the CRS). The CPR number, consisting of 10 digits, is used in all national registers, enabling accurate inter-register linkage. The first six digits indicate the date of birth (day, two digits; month, two digits; year, two digits). The next three digits are a serial number used to distinguish among people born on the same day. The combination of digits at the fifth, sixth and seventh positions indicates century of birth. The final digit indicates the sex. Until 2012, the last digit also served as a control digit originally introduced to minimise recording errors. Once a person has been assigned a CPR number, it follows him or her throughout life. The same CPR number is not used for any other person. If errors occur in an assigned CPR number (e.g. incorrect date of birth or sex), the person is assigned a new CPR number. The CRS keeps a record of any historical CPR number(s).
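To make the structure of the CPR number concrete, the following Python sketch decodes a number into date of birth and sex and evaluates the legacy control digit. The century table (seventh digit combined with the two-digit year), the odd/even convention for sex and the modulus-11 weights are not given in this paper; they are assumed here from the CPR office’s public description of the number and should be checked against the official documentation before use. The function names are illustrative.

```python
from datetime import date

# Assumed modulus-11 weights for the legacy control-digit check (not stated in this paper).
MOD11_WEIGHTS = (4, 3, 2, 7, 6, 5, 4, 3, 2, 1)


def birth_year(yy: int, seventh_digit: int) -> int:
    """Four-digit birth year from the two-digit year (digits 5-6) and digit 7.

    Assumed century table:
      digit 7 = 0-3     -> 1900-1999
      digit 7 = 4 or 9  -> 2000-2036 if yy <= 36, else 1937-1999
      digit 7 = 5-8     -> 2000-2057 if yy <= 57, else 1858-1899
    """
    if seventh_digit <= 3:
        return 1900 + yy
    if seventh_digit in (4, 9):
        return (2000 if yy <= 36 else 1900) + yy
    return (2000 if yy <= 57 else 1800) + yy


def parse_cpr(cpr: str) -> dict:
    """Decode a CPR number written as DDMMYY-SSSS or DDMMYYSSSS."""
    digits = cpr.replace("-", "")
    if len(digits) != 10 or not digits.isdigit():
        raise ValueError(f"not a 10-digit CPR number: {cpr!r}")
    d = [int(c) for c in digits]
    day, month, yy = int(digits[0:2]), int(digits[2:4]), int(digits[4:6])
    return {
        # date() raises ValueError for impossible dates such as 31 February.
        "birth_date": date(birth_year(yy, d[6]), month, day),
        # Assumed convention: final digit odd for men, even for women.
        "sex": "male" if d[9] % 2 == 1 else "female",
        # Only meaningful for numbers issued while the control digit was in use.
        "mod11_ok": sum(w * x for w, x in zip(MOD11_WEIGHTS, d)) % 11 == 0,
    }
```

For example, the fictitious number `010170-0003` decodes to 1 January 1970 and male, and its weighted digit sum is 55, so it passes the modulus-11 check. Because numbers issued after the control digit was dropped may legitimately fail `mod11_ok`, the check is informative rather than a validity test.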
Vital status is updated continuously, recording whether a person is alive and resident in Denmark, is alive and resident in Greenland, is lost to follow-up (people whose residence is unknown to Danish authorities), has emigrated, or is deceased, along with the date of these events. For people who are lost to follow-up or who have emigrated, information on death is available only if they died in Denmark or the Danish authorities were informed of their death.
Parents
The CRS also includes links to parents (parents’ CPR numbers), and the way this information is recorded has changed over time. Beginning in 1968, the CRS established links to parents based on information on families living at the same address, as recorded on the index cards stored at municipal offices. From 1968 to 1978, this link was erased when a child moved away from home, when a parent moved, when a child had children of his or her own or when a child reached 18 years of age. From 1978 onwards, the links were changed from being based on addresses to being based on legal relationships, and links to parents were kept permanently. From 2001 to 2003, an extensive review was performed in which parental links for all people born in Denmark from 1960 to 1978 were updated and validated by manual linkage between the CRS and the PRs. Today, parental links for people born in Denmark in 1960 or later closely reflect legally valid parental relationships [12] (details in Supplemental Appendix C).
The lite MGR
As mentioned above, some links to parents have been lost in the current operational CRS (official CRS) [8]. Our pilot study demonstrated that, for people born in Denmark in the 1950s, this information could be retrieved from an older copy of the Danish CRS stored at the Danish National Archives (archival CRS) [18].
The lite MGR is based on the official 2021 version of the CRS. If the official 2021 CRS contains links to parents, the links are maintained. If a link to either parent is missing, we search for parental links hierarchically in the archival versions of the CRS, first in 1969 and then in 1968 [19].
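The hierarchical look-up can be sketched in Python as follows. This is a minimal illustration under assumptions not spelled out in the text: each register version is represented as a hypothetical dictionary mapping a CPR number to a `(mother, father)` pair, mother and father are filled independently, and an existing link in the official 2021 CRS is never overwritten by an archival one.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical representation: CPR number -> (mother's CPR, father's CPR); None = no link.
Parents = Tuple[Optional[str], Optional[str]]
Register = Dict[str, Parents]


def lite_mgr_parents(
    person: str,
    official_2021: Register,
    archival: Dict[int, Register],
    search_order: Iterable[int] = (1969, 1968),
) -> Parents:
    """Return parental links, falling back to archival CRS versions in order."""
    mother, father = official_2021.get(person, (None, None))
    for version in search_order:
        if mother is not None and father is not None:
            break  # both links already known; no need to search further
        arch_mother, arch_father = archival.get(version, {}).get(person, (None, None))
        # Fill only the missing parent(s); keep links already found.
        if mother is None:
            mother = arch_mother
        if father is None:
            father = arch_father
    return mother, father
```

Filling each parent separately means a person can obtain the mother from one archival version and the father from another; whether the lite MGR does exactly this is an assumption of the sketch.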
The full Danish MGR
The full Danish MGR will be established by linking information in both the CRS and PRs. This will create a register including each person born in Denmark in 1920 or later, along with personal identifier, parish of birth, date of birth, name, date of death, date of emigration, date of disappearance, parents’ names and parents’ personal identifiers (i.e. identical to the basic information recorded in the current version of the CRS).
To complete this task, we will use artificial intelligence (AI) to transcribe records in all PRs from 1920 to 1968 (from 1900 to 1968 for marriages). Our AI pipeline is composed of two sequential modules: an image segmentation module and a text transcription module. The segmentation module processes the input images of the PRs to locate the relevant information (initially, birth dates) and crops it into individual patches, which generally correspond to cells in the PR tables (Supplemental Appendix B). The transcription module takes these cells as input and outputs an electronic transcription of the handwritten birth dates they contain. We have evaluated approaches based on convolutional neural networks and transformer networks [20].
In the best-case scenario, our AI models will be able to detect, read and accurately transcribe all the information recorded in the PRs, securing the completion of the project. In the worst-case scenario, our AI models will be able to detect and transcribe only the dates of birth recorded in the PRs, and the additional information will not be digitised. However, as shown in Supplemental Appendix A, most individuals in the CRS can be linked uniquely to their birth records in the PRs based only on their parish of birth, sex and date of birth. Thus, to identify an index person’s record in the PRs, it would in most cases suffice for the AI to read dates of birth. To this end, we have evaluated different approaches to automatic transcription of birth dates on a publicly available portion of the PRs. Here, we can recognise full dates with an error rate of 4%. When our model predicts a wrong date, the correct date is among the five most likely predicted dates in 75% of cases [21].
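Two quantities in this paragraph lend themselves to a short sketch: the share of people whose (parish of birth, sex, date of birth) combination is unique, which is what makes date-only transcription sufficient for most linkages, and the use of the model’s ranked candidate dates when its top prediction is wrong. The Python below is illustrative only; the record layout, the function names and the rule of stopping at the first candidate date that matches anyone are assumptions, not the project’s actual linkage procedure.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

# Hypothetical key: (parish of birth, sex, date of birth as ISO string).
Key = Tuple[str, str, str]


def share_uniquely_linkable(keys: Iterable[Key]) -> float:
    """Fraction of people whose (parish, sex, date of birth) occurs exactly once."""
    keys = list(keys)
    if not keys:
        return 0.0
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


def link_with_candidates(
    parish: str,
    sex: str,
    candidate_dates: Sequence[str],
    crs_index: Dict[Key, List[str]],
) -> Optional[str]:
    """Link a transcribed PR birth entry to a CPR number using ranked date guesses.

    candidate_dates is ordered from most to least likely (e.g. a model's top 5).
    The first candidate that matches anyone in the CRS decides the outcome:
    a single match is returned, several matches are treated as ambiguous.
    """
    for dob in candidate_dates:
        matches = crs_index.get((parish, sex, dob), [])
        if matches:
            return matches[0] if len(matches) == 1 else None
    return None
```

Stopping at the first candidate with any match, rather than searching further for a unique one, avoids accepting a lower-ranked date merely because it happens to be unambiguous.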
Since 1926, the PRs have collected information on parents’ dates of birth. For all marriages recorded in the CRS since its establishment, the combination of the spouses’ dates of birth uniquely identifies the couple in 98.8% of marriages (disregarding parish). Therefore, by linking the parental dates of birth recorded in a child’s birth record in a PR to marriages recorded electronically in the CRS, we can establish parental identity (disregarding names). This requires that both parents were alive and married when the CRS was established.
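The parent-identification step can be sketched as an exact join on the pair of parental dates of birth. The sketch assumes a hypothetical list of CRS marriages given as (mother’s CPR, mother’s date of birth, father’s CPR, father’s date of birth), ignores parish and names as the text describes, and accepts a match only when the date pair points to a single couple. It also computes the share of marriages with a unique date pair, the quantity reported as 98.8% for the CRS.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical record: (mother_cpr, mother_dob, father_cpr, father_dob).
Marriage = Tuple[str, str, str, str]
Couple = Tuple[str, str]
MarriageIndex = Dict[Tuple[str, str], List[Couple]]


def build_marriage_index(marriages: Iterable[Marriage]) -> MarriageIndex:
    """Group CRS marriages by the spouses' pair of dates of birth."""
    index: MarriageIndex = defaultdict(list)
    for mother, mother_dob, father, father_dob in marriages:
        index[(mother_dob, father_dob)].append((mother, father))
    return index


def identify_parents(mother_dob: str, father_dob: str, index: MarriageIndex) -> Optional[Couple]:
    """Return the couple matching a PR birth record's parental dates, if unique."""
    couples = index.get((mother_dob, father_dob), [])
    return couples[0] if len(couples) == 1 else None


def share_unique_pairs(index: MarriageIndex) -> float:
    """Fraction of marriages whose date-of-birth pair is shared with no other marriage."""
    total = sum(len(c) for c in index.values())
    unique = sum(1 for c in index.values() if len(c) == 1)
    return unique / total if total else 0.0
```

Using `index.get` rather than item access keeps look-ups from inserting empty entries into the `defaultdict`, so `share_unique_pairs` is unaffected by failed queries.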
Thus, if our AI models can read names as expected, we can generate the full MGR. In the worst-case scenario, assuming AI can read only dates of birth, we can identify missing familial relationships among most people recorded in the CRS. A detailed AI transcription manual of PRs used in the project is provided in Supplemental Appendix B.
Familial relationships
Familial relationships considered are defined below and in Figure 1:

The relationship in each box shows what the respective relative’s relationship is to you (as ‘self’). You, your siblings, first cousins, second cousins and third cousins are all in the same generation. ‘Once removed’ means one generation removed. The red numbers in the upper right of each box show your average percentage of genetic similarity with that relative. For example, you share on average 50% of your genetic make-up with your siblings, father and mother, but only 25% with your grandparents, grandchildren, uncles and aunts, and nieces and nephews.
Parents are identified when an index person has a recorded link to his or her parents.
Grandparents are parents of an index person’s parents.
Brothers/sisters are people who share both parents with an index person (full brothers/sisters).
Children are people for whom an index person is a parent.
Grandchildren are children of an index person’s children.
Nieces/nephews are children of an index person’s brothers/sisters.
Uncles/aunts are parents’ siblings.
First cousins are children of an index person’s uncles/aunts, i.e. people who share a pair of grandparents (on the same side of the family) with an index person.
Great-grandparents are parents of an index person’s grandparents.
Great-uncles/aunts are brothers/sisters of an index person’s grandparents.
Grandnieces/nephews are children of an index person’s nieces/nephews.
Great-grandchildren are children of an index person’s grandchildren.
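Because every relationship above is defined through parent links, the relatives of an index person can be derived recursively from a single mapping of people to their parents. The Python sketch below illustrates this for a subset of the definitions; the `(mother, father)` dictionary is a hypothetical stand-in for the CRS or lite MGR, and the linear scans are for clarity only (a production version would precompute a child index).

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical parent map: person -> (mother, father); None means no recorded link.
ParentMap = Dict[str, Tuple[Optional[str], Optional[str]]]


def parents(p: str, pm: ParentMap) -> Set[str]:
    return {x for x in pm.get(p, (None, None)) if x is not None}


def children(p: str, pm: ParentMap) -> Set[str]:
    # People for whom p is recorded as a parent.
    return {q for q, (m, f) in pm.items() if p in (m, f)}


def full_siblings(p: str, pm: ParentMap) -> Set[str]:
    # Share both parents; requires both links to be recorded for p.
    mother, father = pm.get(p, (None, None))
    if mother is None or father is None:
        return set()
    return {q for q, (m, f) in pm.items() if q != p and (m, f) == (mother, father)}


def grandparents(p: str, pm: ParentMap) -> Set[str]:
    return {g for par in parents(p, pm) for g in parents(par, pm)}


def uncles_aunts(p: str, pm: ParentMap) -> Set[str]:
    return {u for par in parents(p, pm) for u in full_siblings(par, pm)}


def first_cousins(p: str, pm: ParentMap) -> Set[str]:
    return {c for u in uncles_aunts(p, pm) for c in children(u, pm)}


def nieces_nephews(p: str, pm: ParentMap) -> Set[str]:
    return {n for s in full_siblings(p, pm) for n in children(s, pm)}
```

A single missing link anywhere on the path removes every relative reached through it, which is why completeness drops for relationships whose most recent common ancestor lies further back in time.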
Results
In 2021, the Danish CRS included 10,104,068 people; of these 5,953,711 (58.92%) were alive, 3,082,985 (30.51%) had died, 1,060,655 (10.50%) had emigrated and 6717 (0.07%) were lost to follow-up. Overall, 4,906,149 (48.56%) had information on both parents in the CRS, and 5,033,612 (49.82%) had information on both parents in the lite MGR. Among the 732,232 people born in Denmark during 1950–1959, 444,084 (60.65%) had information on both parents in the CRS, while 560,594 (76.56%) had information on both parents in the lite MGR (i.e. an additional 116,510 people in the lite MGR with links to both parents).
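The reported percentages follow directly from the counts. The short Python check below recomputes them (counts copied from the text, percentages rounded to two decimals) and can serve as a template when tabulating completeness for other subgroups.

```python
# Counts reported for the 2021 CRS.
TOTAL = 10_104_068
STATUS = {"alive": 5_953_711, "dead": 3_082_985, "emigrated": 1_060_655, "lost": 6_717}
BOTH_PARENTS = {"CRS": 4_906_149, "lite MGR": 5_033_612}

# People born in Denmark during 1950-1959.
BORN_1950S = 732_232
BOTH_PARENTS_1950S = {"CRS": 444_084, "lite MGR": 560_594}


def pct(n: int, d: int) -> float:
    """Percentage rounded to two decimals, as in the text."""
    return round(100 * n / d, 2)


status_pct = {k: pct(v, TOTAL) for k, v in STATUS.items()}
both_pct = {k: pct(v, TOTAL) for k, v in BOTH_PARENTS.items()}
both_1950s_pct = {k: pct(v, BORN_1950S) for k, v in BOTH_PARENTS_1950S.items()}
gain_1950s = BOTH_PARENTS_1950S["lite MGR"] - BOTH_PARENTS_1950S["CRS"]

print("vital status:", status_pct)
print("both parents:", both_pct)
print("both parents, born 1950-1959:", both_1950s_pct)
print("additional people with both parents in the lite MGR:", gain_1950s)
```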
Familial relationships in the lite MGR versus the CRS
In the following, we describe the completeness of familial links in the CRS and compare it with similar information in the lite MGR. This material is included not only for comparative purposes but also as a reference for researchers designing familial aggregation studies. For simplicity, we consider only index people (‘self’ in Figure 1) who were born in Denmark [2].
Figure 2(a) shows the average number of parents registered, according to an index person’s year of birth. The blue line refers to information from the official CRS, and the red line refers to information from the lite MGR. In the official CRS, the average number of parents registered was 0 for people born in 1900. For people born in 1960 or later, basically everyone had both parents registered. The lite MGR offers more complete information on parental links for people born from 1950 to 1960.
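Curves such as those in Figure 2(a) are per-birth-year averages of the number of recorded links. A minimal Python sketch of that computation, assuming a hypothetical list of index people with birth years and one `(mother, father)` mapping per data source, is:

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical parent map: person -> (mother, father); None means no recorded link.
ParentMap = Dict[str, Tuple[Optional[str], Optional[str]]]


def mean_parents_by_year(people: Iterable[Tuple[str, int]], pm: ParentMap) -> Dict[int, float]:
    """Average number of recorded parents (0-2) per birth year of the index person."""
    links: Dict[int, int] = defaultdict(int)
    persons: Dict[int, int] = defaultdict(int)
    for cpr, year in people:
        mother, father = pm.get(cpr, (None, None))
        links[year] += (mother is not None) + (father is not None)
        persons[year] += 1
    return {year: links[year] / persons[year] for year in sorted(persons)}
```

Calling the function once with the official CRS mapping and once with the lite MGR mapping gives the two lines being compared; the same pattern extends to other relatives by replacing the per-person count with the size of the corresponding relative set.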

Completeness of selected familial relationships by year of birth and data source among index people born in Denmark.
Figure 2(b) shows the average number of grandparents registered by an index person’s year of birth, according to data source. Among people born in the year 1900, the average number of grandparents registered was 0, whereas among people born in 1990, the average number of grandparents registered was 3.5.
Figure 2(d) shows the average number of children registered by data source. Among people born in 1935, the average number of children registered was roughly 2.1, whereas among people born in 2000, it was close to zero (0.02). This graph illustrates the combination of incomplete registration in the CRS for earlier birth cohorts (left truncation) and the fact that younger people may not yet have had all their children (right truncation). This complexity increases for other types of relatives. These truncation mechanisms are essential for understanding the completeness of familial links by year of birth.
Figures 2–4 show the average number of family members by an index person’s year of birth for the different familial relationships considered. As expected, for all familial relationships under consideration, the lite MGR offers more complete information than the official CRS. The left truncation in completeness by year of birth depends on the most recent common ancestor (parents, grandparents or great-grandparents in Figure 1). For example, for an index person born in 1960 or later, the CRS contains information on all parents, siblings, nieces and nephews, and grandnieces and grandnephews. For relatives having grandparents as the most recent common ancestor, complete links are available for index people born from 1990 onwards, and for relatives having great-grandparents as the most recent common ancestor, complete links are still not available for index people born in 2021. When the index person is the most recent common ancestor (children, grandchildren and great-grandchildren), complete links are available if the index person was born in approximately 1935 or later.

Completeness of selected familial relationships by year of birth and data source among index people born in Denmark.

Completeness of selected familial relationships by year of birth and data source among index people born in Denmark.
Discussion
This is the first paper to describe the completeness of familial links in the official version of the CRS and to compare it with the corresponding figures in the lite MGR. This material is included not only for comparative purposes but also as a reference for researchers designing familial aggregation studies. There is large variation in the completeness of familial relationships by year of birth, type of relationship, sex and age at moving away from home [2].
Compared with parental links in the official CRS, the increase in completeness in the lite MGR may seem modest. The increase in completeness becomes more apparent when considering other familial relationships, such as first cousins. This is a natural consequence of familial links being established using a trio of information (index person, mother, father) from the registers, combined with the higher frequency of first cousins compared to the other familial relationships considered.
The Danish registries contain data about health, education, income and other variables for the entire Danish population since the 1970s. When these data are linked within and across generations, researchers will have novel opportunities for research on family-related diseases, health patterns, social factors and demography. This information represents a foundation for epidemiologists’ ‘wildest dreams’ [22]. Only a few countries have an MGR allowing such research [23]. Many Danish nationwide data resources are available, for example treatments at somatic hospitals from 1977 onwards [11], treatments at psychiatric hospitals from 1970 onwards [9], obstetric information from 1973 onwards [24], causes of death from 1970 onwards [25], cancer diagnoses from 1943 onwards [26], prescribed medicines from 1994 onwards [27] and social and socio-economic information from 1981 onwards [28,29]. All nationwide registers can be linked within and between registers, individuals, families and neighbourhoods [30].
Within the field of psychiatric epidemiology, familial information already recorded in the CRS has allowed researchers to conduct novel studies on familial aggregation of childhood- and adult-onset mental disorders [31,32]. However, information in the CRS currently does not permit examination of topics such as familial aggregation of Alzheimer’s disease, owing to its late onset (from age 70 onwards [33]) and the resulting lack of parental information for this group. A full MGR will make it possible for researchers to undertake studies of late-onset diseases and disorders, as well as more detailed analyses of familial aggregation.
In combination with longitudinal Danish registers containing data from the 1970s onwards, the MGR will foster novel studies comparing risks among people in the same generation. An index person and his or her siblings and first cousins belong to the same generation; they have thus lived during the same time period and have the same type and quality of information recorded in the Danish registers. This is a benefit of identifying grandparents, combined with the rich Danish population-based registers currently available.
Automatic transcription of historical sources is an emerging field (e.g. https://read.transkribus.eu). Crucial steps in the PR transcription pipeline are layout analysis and handwritten text recognition, both of which are active areas of research [34–37]. Currently, there are no ready-made algorithms suitable for transcribing Danish PRs. However, based on our proof-of-concept experiments [21] and the existing literature, our approach to transcribing the PRs automatically, which combines a convolutional neural network with a transformer architecture, is likely to achieve accurate text recognition with error rates as low as 3–5%.
During 2025, the full MGR will be available for research purposes through the Danish National Archives, the Danish Health Data Authority and Statistics Denmark following their standard data access guidelines.
A full MGR will represent much more than an additional research register. It will provide an infrastructure tying existing research infrastructures and registries more strongly together, increasing their value and applicability. In the long term, a full MGR will provide much improved options for investigating causal relationships, leading to research on better treatment options and preventive efforts in many areas, including health, history and socio-economic disciplines. A full MGR will improve the value of the unique Danish registers and biobanks [38], allowing researchers to analyse patterns across generations.
Supplemental Material
Supplemental material, sj-docx-1-sjp-10.1177_14034948221147096 for Towards more comprehensive nationwide familial aggregation studies in Denmark: The Danish Civil Registration System versus the lite Danish Multi-Generation Register by Jeppe Klok Due, Marianne Giørtz Pedersen, Sussie Antonsen, Joen Rommedahl, Esben Agerbo, Preben Bo Mortensen, Henrik Toft Sørensen, Jonas Færch Lotz, Laura Cabello Piqueras, Constanza Fierro, Antonia Karamolegkou, Christian Igel, Phillip Rust, Anders Søgaard and Carsten Bøcker Pedersen in Scandinavian Journal of Public Health
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship and/or publication of this article: The study was funded by the Novo Nordisk Foundation. The funder had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. Carsten Bøcker Pedersen, Marianne Giørtz Pedersen and Sussie Antonsen had full access to all the data in the study and take full responsibility for the integrity of the data and the accuracy of the data analysis.
Supplemental material
Supplemental material for this article is available online.
References
