Abstract
Like all behaviour, personality traits are substantially heritable, but their genetic background is poorly understood. Investigating traits’ genetic background could help explain disparities in health and other life outcomes they contribute to. We describe two cohorts of the Estonian Biobank for whom, besides self- and informant-rated personality traits, detailed data are available on a wide range of measures including health behaviour, biomarkers, anthropometric measurements, and medical diagnoses and treatments. The first cohort (Nself-report = 3,640, Ninformant-report = 3,488) filled out the NEO Personality Inventory-3 (NEO-PI-3) between 2008 and 2018. The second cohort (Nself-report = 77,400, Ninformant-report = 21,986), collected between 2021 and 2022, responded to a large and diverse item pool called the 100 Nuances of Personality (100NP) covering the Big Five and other traits. Research opportunities include investigation of personality traits’ properties, gene discovery, prediction of health and well-being, and causal modelling. New data are added periodically through additional data collection waves and linkage with various registries and databases.
Introduction
Persistent individual differences in personality traits—relatively stable patterns of thought, affect, motivation, and behaviour (Baumert et al., 2017; Kandler et al., 2014)—are well-documented but their causes are still poorly understood. For instance, behavioural genetic studies have estimated the Big Five personality traits’ heritability to be about 40% (Vukasović & Bratko, 2015), suggesting strong genetic contributions while leaving ample room for environmental influences. But the heritability estimates say little about the specific factors (e.g., genetic variants or life experiences) that drive variation in traits. As newer advances in genomics have enabled exploring genetic mechanisms in more detail, it has become increasingly clear that personality traits are polygenic with thousands of variants making small contributions to observable personality differences. The genetic variants identified so far account for very small proportions of the heritable variance, and even together, they only explain a fraction of traits’ heritable component (Gupta et al., 2024; Realo et al., 2017; Sanchez-Roige et al., 2018), suggesting that much is still to be discovered. The same applies to trait-relevant life experiences (Bühler et al., 2023).
Knowing personality traits’ genetic and experiential backgrounds can advance not only personality science but also healthcare and well-being. For example, traits’ links with health and both mental and physical well-being have been widely studied (Strickhouser et al., 2017) but whether and when these links are causal is generally still unclear; knowing which genetic variants shape traits can help test causality. Further, designing and testing targeted interventions on personality traits could help people change or work around their traits as desired to improve well-being.
Here, we describe two cohorts of the Estonian Biobank (EstBB) for whom detailed measurements of self- and informant-reported personality traits have been collected. The first cohort, henceforth called the EstBB PS08, was assessed with the NEO Personality Inventory-3 (NEO-PI-3; McCrae et al., 2005) between 2008 and 2018. The second and larger cohort, henceforth called the EstBB PS21, underwent personality assessment between 2021 and 2022 with the 100 Nuances of Personality (100NP; Henry & Mõttus, 2024), an item pool designed for comprehensive and reliable personality assessment. Additional questionnaires have been administered to assess well-being, life satisfaction, lifestyle, sleep, life experiences, and attitudes. Both self-reported and registry data on physical and mental health, medical service use, medication use, and health behaviour are obtained through linkage with national registries. Below, we outline some of the conceptual considerations that guided personality data collection in the EstBB, highlighting how detailed, multi-method measurements allow investigating associations in greater detail while minimizing biases; give an overview of the EstBB, including its conception and general principles of data collection; describe the two partially overlapping personality cohorts and provide snapshots of the research done with them to date; list some of the additional data available in EstBB; outline some future data collection plans including follow-up personality assessments; and describe data sharing principles as well as opportunities for collaboration.
Conceptual Background
The knowledge that can be gained with research directly depends on how the phenomena of interest are conceptualized and measured. For instance, when we seek to identify genetic variants that shape traits, the traits need to be measured in sufficient detail, with sufficient reliability and validity, and in sufficiently large samples. Below, we elaborate on these central considerations of personality data collection and analysis.
Comprehensive Personality Measurement
People differ in their personalities in countless ways. These differences are typically summarized with a few broad traits, each encompassing a range of more specific traits. The best-known “Big Few” personality trait taxonomies are the Five-Factor Model (FFM; McCrae & John, 1992) and the nearly identical Big Five (Goldberg, 1990) which summarize personality variance with the domains extraversion, openness to experience, agreeableness, conscientiousness, and neuroticism (or its opposite, emotional stability). Another widely used Big Few model, the HEXACO (Ashton & Lee, 2020), adds a sixth, honesty-humility domain.
While the Big Few domains have been instrumental in describing the major ways people differ, a comprehensive description of personality requires a more thorough approach. Thus, the domains are often split into a number of narrower facets. For instance, the NEO personality inventories (McCrae et al., 2005) split the FFM domains into 30 facets and the HEXACO model splits the six domains into 25 facets (Lee & Ashton, 2018); a recently proposed model contains 70 facets within and beyond the Big Few (Irwing et al., 2023). In recent years, researchers have started to distinguish an even lower level of traits within facets, called nuances (Condon et al., 2021; McCrae, 2015), which are theorized to constitute the lowest level of the personality trait hierarchy. Facets and nuances resemble domains in important ways: variance in facets and nuances is also partly stable, heritable, and agreed on by different raters (e.g., Kandler et al., 2010; Mõttus et al., 2019). But besides having these trait-properties, facets and nuances can provide incremental value for describing, predicting, and explaining important phenomena (Revelle, 2024; cf. Allik et al., 2024): many facets and nuances show unique developmental trajectories and associations with life outcomes that are often stronger than domains’ (Stewart et al., 2022).
With its 30 facets, the NEO-PI-3, used to assess the EstBB PS08 cohort, provides considerably richer descriptions of personality traits than those solely assessing domains. However, no commonly accepted taxonomy of facets currently exists (Condon et al., 2021; Irwing et al., 2023), and even detailed measures like the NEO-PI-3 have their limitations. For instance, it is possible and even likely that the existing facet systems (a) have not achieved optimal ways to group lower-level traits or (b) leave out potentially important trait content (Irwing et al., 2023). Whereas the former limitation can easily be overcome by using the unaggregated items of existing inventories or aggregating them to suit any specific research purpose, the latter needs a different solution—one that is not limited by the assumption that an inventory designed to measure 5 (or 6) personality domains covers all trait content. Hence, the second personality cohort, EstBB PS21, was assessed with the 100NP, an item pool specifically developed in a bottom-up way for a maximally comprehensive and reliable coverage of the broad trait spectrum (described further below). Its items were carefully selected to measure the Big Five and HEXACO facets and nuances as well as others beyond these, resulting in broader coverage than any standard inventory. The 100NP is publicly available and free to use.
Detailed assessments are valuable for many research and practical objectives. Deep phenotyping is often discussed as an approach to enhance understanding of complex diseases and disorders and advance personalized medicine (Bycroft et al., 2018). Yet, biobanks and panel studies that incorporate genetic data often rely on very short personality inventories that do not distinguish lower-order traits and inevitably have lower reliability and validity than longer inventories (e.g., lower content validity due to incomplete coverage of the intended constructs; McCrae, 2015). For example, a recent large study on personality trait genetics (Gupta et al., 2024) assessed the Big Five domains with two items each; in comparison, the NEO-PI-3 includes 48 items per domain and eight items per facet. Reliance on short inventories also limits what can be learned about traits’ genetics or their interplay with health, but detailed measurements enabled by facet- and nuance-based instruments open possibilities for in-depth analyses.
Complementing Self-Reports With Informant-Reports
Personality traits are most commonly assessed through self-reports that are easy to collect from large numbers of respondents. But self-reports are not direct trait assessments, and it is important not to confuse assessments with the assessed constructs themselves. While traits represent broad collections of thoughts, feelings, behaviours, and motivations, self-reporting is itself a specific kind of behaviour that requires reading and interpreting questionnaire items, retrieving autobiographical and other relevant information, comparing this information to the items’ interpretations and judging their similarity, and clicking on a button to relay the ultimate decision. It is therefore not surprising that around 40% of reliable variance in self-reports is method-specific, likely reflecting rating and unique item/trait interpretations and unique aspects of people’s identities that cannot be independently verified (McCrae et al., 2019).
The only way to address these issues is to complement self-reports with other methods. Currently, the only supplementary method with sufficiently high convergent and face validity that is also deployable at scale is ratings by knowledgeable informants. Available evidence suggests informant-ratings have comparable accuracy to self-reports. For instance, self- and informant-ratings are highly correlated by psychological research standards, typically r = .40 to .60, supporting the two sources’ convergent validity (Allik et al., 2010, 2016; Connelly & Ones, 2010). Further, the high similarity between self–informant and informant–informant correlations suggests that discrepancies between self- and informant-ratings largely reflect rater-specific biases rather than raters’ asymmetrical information about the target; and, from a more practical perspective, informant-reports’ predictive validity is often comparable to and sometimes higher than self-reports’ (Connelly & Ones, 2010). All in all, by reducing rater-specific biases as well as occasion-specific and random error, self- and informant-reports’ shared variance provides more reliable and valid trait assessment than either source alone (McCrae et al., 2019). In the two EstBB cohorts, self-reports of personality traits are available for all or most participants and informant-reports are additionally available for large subsets. In fact, the EstBB PS21 cohort may be the largest sample to date in which personality traits have been comprehensively assessed using both self- and informant-reports. Before that, the EstBB PS08 was likely the largest for over a decade.
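As a rough illustration of how the two rating sources can be used together, the sketch below computes a cross-rater (Pearson) correlation for one trait and a simple equal-weight composite of standardized self- and informant-ratings. The scores, the function names, and the averaging approach are illustrative assumptions; averaging only approximates the shared variance discussed above, which would more properly be estimated with a latent variable model.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def standardize(values):
    """Convert raw scores to z-scores using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def two_rater_composite(self_scores, informant_scores):
    """Average the z-scored self- and informant-ratings of each target."""
    z_self = standardize(self_scores)
    z_informant = standardize(informant_scores)
    return [(a + b) / 2 for a, b in zip(z_self, z_informant)]


# Hypothetical trait scores for six targets (not EstBB data).
self_scores = [22, 30, 18, 27, 25, 15]
informant_scores = [20, 28, 21, 24, 29, 16]

cross_rater_r = pearson_r(self_scores, informant_scores)
composite = two_rater_composite(self_scores, informant_scores)
```

Because both sources are standardized before averaging, neither rater type dominates the composite because of differences in scale use.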
Sample Size
Many of the associations of interest in personality science, such as those used in predictive or causal modelling, may be too weak to be reliably estimated in small samples. Large samples may therefore be needed to accurately map personality traits to outcomes as diverse as longevity, number of children, social preferences, and place of residence or associated environmental markers. Rare phenotypes in particular, such as certain sexual orientations or uncommon life events, require large samples for association testing. Large samples are also needed to detect genetic variants’ effects on traits, given that there seem to be no variants with large effects to be discovered. Yet, small and even minuscule gene–trait associations may be useful to know as genetic variants that individually have no detectable association with a given phenotype can together contribute substantially to its prediction—for instance, within polygenic scores (Dudbridge, 2013). Thus, very large samples are needed to build maximally predictive (or practically meaningful) models.
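To make the idea of a polygenic score concrete, the following minimal sketch computes a score as a weighted sum of allele dosages. The variant weights and genotypes are invented for illustration, and the function name is hypothetical; real scores are built from genome-wide association results across very many variants and typically involve additional steps (e.g., handling linkage disequilibrium) that are not shown here.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies) for one person."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same variants")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical per-allele effect estimates for four variants.
weights = [0.010, -0.008, 0.005, 0.012]

# Hypothetical genotypes coded as counts of the effect allele.
genotypes = {
    "person_a": [0, 1, 2, 1],
    "person_b": [2, 0, 1, 0],
}

scores = {pid: polygenic_score(d, weights) for pid, d in genotypes.items()}
```

Each weight is tiny on its own, which mirrors the point made above: the predictive value comes from summing many small contributions.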
To summarize, the EstBB personality cohorts are unique and valuable resources for research on personality traits and their effects thanks to their large sample sizes, their detailed, multi-method measurements, and the ability to link them to various other datasets and registries.
Estonian Biobank
EstBB is a population-based biobank that includes over 212,000 people aged 18 years and older (https://genomics.ut.ee/en/content/year-2023-institute-genomics), representing about 20% of the Estonian adult population. The Estonian Biobank Project was initiated by the Estonian Genome Project Foundation in 1999, and its central aim has been to improve population health by investigating risk factors for diseases and traits and applying the results. After its reorganization in 2007, it became the Estonian Genome Center of the University of Tartu. The project is regulated by the Estonian Human Genes Research Act (https://www.riigiteataja.ee/en/eli/531102013003/consolide) and the Personal Data Protection Act (https://www.riigiteataja.ee/en/eli/523012019001/consolide).
The first participants joined the EstBB and gave blood samples for DNA extraction in 2002. During the first decade of the EstBB, participants were primarily recruited by general practitioners across the country from among individuals visiting general practitioners’ offices and hospitals (more details are given by Leitsalu et al., 2014). Along with giving blood samples, participants completed an extensive computer-assisted phenotype questionnaire consisting of about 330 questions and 1,000 data fields about their demographic, genealogical, educational and occupational backgrounds, as well as lifestyle, health status, and medical history (Leitsalu et al., 2015). Anthropometric measurements were taken, and blood pressure and resting heart rate were measured (Leitsalu et al., 2014). By the end of 2010, 52,000 people, corresponding to about 5% of the Estonian adult population, had been recruited, fulfilling the initial goal of data collection (Leitsalu et al., 2015).
When a new rapid wave of recruitment began in 2017, the sign-up procedure was considerably simplified: participants only needed to sign an online consent form, visit the nearest healthcare provider or pharmacy to donate a blood sample, and complete an online questionnaire. The questionnaire covered similar domains as previously although in less detail (Milani et al., 2025). The majority of participants (about 150,000) joined the biobank between 2018 and 2019 (https://geenidoonor.ee/). Figure 1 depicts a timeline of participants joining EstBB and the two personality cohorts.
Figure 1. Timeline of New Participants Joining EstBB and its Personality Cohorts. Note. The percentage of new participants each month relative to the total sample as of the end of 2022 is shown.
Among EstBB participants, women and Estonians are overrepresented and men, Russians and other ethnic groups underrepresented compared to the Estonian population. EstBB participants also tend to be more educated than the Estonian population on average. Note that due to the change in recruitment practices, the participants recruited in the second wave may differ from those who joined in the first wave in their sociodemographic characteristics. The personality cohorts are described and compared below.
In accordance with the Estonian Human Genes Research Act and the broad informed consent form signed by all participants, the data collected directly from participants can be linked with health data from various national databases and registries. Additionally, as it is permitted to recontact participants, various data have been collected throughout the years from different EstBB subsamples, including the personality measurements and other data described below. By signing the informed consent form, all participants have given broad consent to use their data in future research not foreseen at the time of giving consent in studies approved by the Research Ethics Committee of the University of Tartu (Leitsalu et al., 2015).
Genotyping
Genotyping was done at the Core Genotyping Lab of the Institute of Genomics, University of Tartu, using Illumina Global Screening Array (GSA) v1.0, v2.0, and v2.0_EST, which contain 700,000 markers across the genome and specific add-on content of 2,000 single-nucleotide variants (SNVs) identified by whole genome sequencing in the Estonian population. A population-specific imputation reference panel of 2,297 samples was used (Mitt et al., 2017). Whole genome sequencing is available for over 2,800 and whole exome sequencing for over 2,400 participants. Typically, genomic data from about 206,000 participants are used, excluding those with fewer than five rows in the health record database over 15 years, those with non-European group ancestry, and heterozygosity outliers for common single-nucleotide polymorphisms (≥3 SDs), as deviations can indicate sample contamination or inbreeding. Similarly to Privé (2022), participants are aligned to 21 major ancestry groups, and people with assigned non-European ancestry are removed, keeping Europeans, Finns and Italians.
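The three exclusion criteria described above can be sketched as a simple filter. The record fields (`het_rate`, `health_rows`, `european`), the function name, and the toy data are hypothetical, and the sketch assumes that heterozygosity outliers are defined as samples lying 3 or more standard deviations from the mean heterozygosity rate of all genotyped samples; the actual EstBB pipeline may differ in its details.

```python
from statistics import mean, stdev


def qc_filter(samples, min_health_rows=5, max_abs_z=3.0):
    """Return IDs of samples that pass all three exclusion criteria."""
    rates = [s["het_rate"] for s in samples]
    # Mean and SD are computed over all samples, before any exclusion.
    rate_mean, rate_sd = mean(rates), stdev(rates)
    kept = []
    for s in samples:
        z = (s["het_rate"] - rate_mean) / rate_sd
        if s["health_rows"] < min_health_rows:
            continue  # too few rows in the health record database
        if not s["european"]:
            continue  # assigned to a non-European ancestry group
        if abs(z) >= max_abs_z:
            continue  # heterozygosity outlier (>= 3 SDs from the mean)
        kept.append(s["id"])
    return kept


# Toy data: 30 typical samples plus one heterozygosity outlier.
samples = [
    {
        "id": f"s{i:02d}",
        "het_rate": 0.299 if i % 2 else 0.301,
        "health_rows": 40,
        "european": True,
    }
    for i in range(30)
]
samples.append(
    {"id": "het_outlier", "het_rate": 0.400, "health_rows": 40, "european": True}
)
samples[0]["health_rows"] = 2   # fails the health record criterion
samples[1]["european"] = False  # fails the ancestry criterion

kept_ids = qc_filter(samples)
```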
From this sample, genetic relationships can be inferred for classical heritability and family studies. The full EstBB cohort and the EstBB PS21 cohort’s self-report and informant-report subsamples (described further below) respectively contain 285, 58, and 9 monozygotic twin pairs (100% genetically identical relatives), 32,218, 6,488, and 938 nuclear-family (genetically nonidentical) sibling pairs including dizygotic twin pairs, 70,754, 13,220, and 1,964 parent–offspring pairs (first degree relatives), and 115,341, 16,426, and 1,578 pairs of genetically nonidentical second degree relatives. The respective cohorts also include 8,270, 1,228, and 152 family trios with both parents and at least one offspring present. Notably, each EstBB participant can be related to several others: for instance, a participant can simultaneously be sibling to one, child to another, and parent to yet another participant. In fact, 90% of EstBB participants have third-degree or closer relatives in the biobank (Milani et al., 2025), making EstBB especially well suited for family-based studies.
First Wave of Personality Measurement: The EstBB PS08 Cohort (N = 3,640)
The first wave of personality data collection took place between 2008 and 2018. A subset of EstBB participants was asked to complete the self-report version of the Estonian NEO-PI-3 (described below) and to find an informant who would describe their personality using the inventory’s informant-report form. Along with the NEO-PI-3, items about life satisfaction, happiness, general trust, self-reported cognitive abilities, and general self-esteem were included. The data were collected from EstBB participants partaking in various additional data collection projects as well as from new joiners to the EstBB. Between 2008 and 2010, data were collected using paper questionnaires. Consenting participants received a questionnaire to fill out at home and return either by mail or in person. Starting in 2011, respondents could choose between completing the questionnaire on paper and filling it out online. Responses were received from 1,756 people between 2008 and 2010 and from 1,831 people from 2011 onward. In the latter group of people, 948 (52%) completed the questionnaire on paper (53 people had missing data on year or mode of responding).
Ethics approval for collecting NEO-PI-3 measurements was obtained from the Research Ethics Committee of the University of Tartu (document number 170/T-38, dated April 28, 2008). All participants signed a separate consent form either on paper or digitally.
Sample Characteristics
Altogether, NEO-PI-3 data were obtained for 3,640 EstBB participants. After excluding respondents with missing data, complete self-report data are available for 3,601 participants and complete informant-report data for 3,457. Ratings from both sources (self and informant) are available for 3,419.
Table 1. Characteristics of the Two EstBB Personality Cohorts
Note. Data on participants with self-report data available are presented. More data on the cohorts can be found in Supplemental Table 1 (EstBB PS08 cohort’s informants’ characteristics) and Supplemental Table 5 (EstBB PS21 cohort’s respondents compared to non-responding EstBB participants).
a Settlement type is determined as a function of population size and density (see https://andmed.stat.ee/en/stat/rahvastik__rahvastikunaitajad-ja-koosseis__rahvaarv-ja-rahvastiku-koosseis/RV0291U).
b For participants with informant data available.
Estonian NEO-PI-3
The NEO-PI-3 (McCrae et al., 2005) is a 240-item personality inventory slightly modified from its predecessor, the Revised NEO Personality Inventory (NEO-PI-R; Costa & McCrae, 1992). This inventory assesses the FFM domains, as well as six narrower facets within each domain. Each facet is assessed with eight items and each domain with 48 items (described in Supplemental Table 2). Answers are provided on a 5-point Likert scale ranging from 0 (strongly disagree) to 4 (strongly agree). Other items administered to the EstBB PS08 cohort are shown in Supplemental Table 3.
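The hierarchical structure of the inventory (items within facets, facets within domains) can be illustrated with a small scoring sketch. The item identifiers, the scoring key, and the facet labels below are hypothetical, and the sketch assumes that reverse-keyed items are recoded as 4 minus the response and that scale scores are simple sums; the actual item key is provided with the inventory and is not reproduced here.

```python
MAX_RESPONSE = 4  # responses range from 0 to 4


def score_facets(responses, key):
    """Sum item responses into facet scores, recoding reverse-keyed items.

    responses: dict mapping item ID -> response (0..4)
    key: dict mapping item ID -> (facet label, is_reversed)
    """
    totals = {}
    for item_id, (facet, is_reversed) in key.items():
        value = responses[item_id]
        if not 0 <= value <= MAX_RESPONSE:
            raise ValueError(f"response out of range for {item_id}")
        if is_reversed:
            value = MAX_RESPONSE - value
        totals[facet] = totals.get(facet, 0) + value
    return totals


def score_domains(facet_scores, facet_to_domain):
    """Sum facet scores into domain scores."""
    totals = {}
    for facet, score in facet_scores.items():
        domain = facet_to_domain[facet]
        totals[domain] = totals.get(domain, 0) + score
    return totals


# Hypothetical key: two facets with eight items each, every second item reversed.
key = {}
for facet_index, facet in enumerate(["N1", "N2"]):
    for j in range(8):
        item_id = f"item_{facet_index * 8 + j + 1}"
        key[item_id] = (facet, j % 2 == 1)

facet_to_domain = {"N1": "N", "N2": "N"}

# A respondent who answers 3 to every item.
responses = {item_id: 3 for item_id in key}

facet_scores = score_facets(responses, key)
domain_scores = score_domains(facet_scores, facet_to_domain)
```

With eight items per facet, each facet score in this scheme ranges from 0 to 32. Keeping the item-level responses alongside the aggregates allows items to be regrouped for other research purposes.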
The NEO-PI-R was adapted into Estonian by Kallasmaa et al. (2000) and later modified to create the Estonian NEO-PI-3, ensuring that all items correspond to the English NEO-PI-3 in content and direction. The process of modifying the Estonian NEO-PI-R into the NEO-PI-3 is described in detail in Supplemental Information. The Estonian NEO-PI-R/NEO-PI-3 has excellent psychometric properties, with the five domains’ 2-year retest reliabilities ranging from .67 to .86 (Kallasmaa et al., 2000). On average, more than 60% of the genetic variance in NEO-PI-R items is unique to them, not shared with the variance of any domain or facet: median reliability-corrected heritability estimates of raw item scores and items’ unique variances were .42 and .28, respectively (Mõttus et al., 2019, 2022). Likewise, of the stable variance in NEO-PI-R items, over 75% is their unique variance (Mõttus et al., 2019).
Studies to Date
The EstBB PS08 cohort has already been used in over 40 publications. The works published to date have addressed a variety of topics, including genetics of personality, personality differences by age and sex, cross-cultural comparisons, social desirability, configuration of personality traits, reliability and self–observer agreement in personality research, harmonization of phenotypes across inventories, within-trait heterogeneity, and personality traits’ relations with outcomes like body mass index (BMI), eating habits, longevity, COVID-19 vaccination uptake, various mental and physical health issues, risk-taking, adverse drug reactions, chronotype, general well-being, mobility patterns, educational attainment, and occupations. The published papers and their main results are summarized in Supplemental Table 4.
Second Wave of Personality Measurement: The EstBB PS21 Cohort (N = 77,400)
The EstBB PS21 cohort’s personality was assessed between November 2021 and April 2022. All EstBB participants who were alive at the time and had provided valid e-mail addresses were invited to participate, with one or two reminders sent out as necessary. The study was additionally advertised in various media channels including television, radio, newspapers, and social media. Participation was voluntary and no compensation was offered besides feedback on the respondents’ Big Five personality traits. Along with personality traits, data were collected on socioeconomic and demographic status, life satisfaction, various attitudes, and recent life events (see below). At the beginning of the questionnaire, each participant was asked to recruit an informant, such as a spouse, partner, relative, or friend, to provide informant ratings about their personality traits. After completing the questionnaire, respondents were given feedback on their Big Five domains if they opted in.
Ethics approval to collect 100NP measurements was given by the Estonian Committee on Bioethics and Human Research (document number 1.1-12/202, dated 12.10.2021). At the start of the study, participants were presented with an informed consent form, which they had to digitally sign in order to proceed.
Sample Characteristics
Out of the 179,055 people who were invited to participate, 45% (80,116) participated. After removing participants with over 10% missing personality ratings, 77,400 were retained, corresponding to over 7% of the Estonian adult population as of January 2022. Invitation e-mails were also sent to 32,272 informants, 72% of whom responded (23,209). After removing those with over 10% missing personality ratings, 21,986 informants’ ratings remained. The EstBB PS21 and PS08 cohorts partially overlap: 1,452 participants are present in both cohorts; for 538, informant ratings were collected in both waves.
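The retention rule and the response rates reported above can be expressed in a few lines. The function names and the toy ratings are illustrative assumptions; only the 10% threshold and the invitation and response counts come from the text.

```python
def missing_fraction(ratings):
    """Share of personality items left unanswered (coded as None)."""
    return sum(r is None for r in ratings) / len(ratings)


def retain_respondents(respondents, max_missing=0.10):
    """Keep respondents whose share of missing ratings does not exceed 10%."""
    return {
        rid: ratings
        for rid, ratings in respondents.items()
        if missing_fraction(ratings) <= max_missing
    }


def response_rate_percent(responded, invited):
    """Response rate rounded to the nearest whole percent."""
    return round(100 * responded / invited)


# Toy respondents with 20 items each (the real item pool is larger).
respondents = {
    "complete": [3] * 20,
    "two_missing": [3] * 18 + [None] * 2,    # exactly 10% missing: retained
    "three_missing": [3] * 17 + [None] * 3,  # 15% missing: removed
}
retained = retain_respondents(respondents)

# Counts reported in the text.
self_rate = response_rate_percent(80_116, 179_055)
informant_rate = response_rate_percent(23_209, 32_272)
```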
The EstBB PS21 cohort is described in Table 1; further details are shown in Supplemental Table 5. As in the EstBB PS08 cohort, the age distribution is wide. Again, women and Estonians are overrepresented. Over half of the cohort had obtained a higher education by the time of responding. It was possible to complete the questionnaire in either Estonian or Russian; 5% chose Russian. Compared to the EstBB participants who did not participate in the 100NP data collection wave, 100NP respondents were younger (t = 35.19, p < .001) and more likely to have a university degree (χ²(1) = 836.4, p < .001), to be women (χ²(1) = 1,164.6, p < .001), and to be Estonian (χ²(1) = 3,706.7, p < .001). Compared to respondents with only self-report data, respondents with informant-report data available were more open (d = 0.25, p < .001) and extraverted (d = 0.06, p < .001) and less conscientious (d = −0.06, p < .001).
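For readers less familiar with the standardized mean differences (d) reported above, the sketch below shows one common way to compute Cohen's d, using the pooled standard deviation of the two groups. The trait scores are invented, and whether the reported values were computed with exactly this pooling is an assumption.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (a minus b) using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical openness scores for respondents with and without informant data.
with_informant = [3.2, 3.8, 4.1, 3.5, 3.9]
self_report_only = [3.0, 3.6, 3.9, 3.3, 3.7]

d = cohens_d(with_informant, self_report_only)
```

A positive value indicates that the first group scores higher on average. In very large samples, even small values such as d = 0.06 can be statistically significant, which is why the effect sizes are reported alongside the p values.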
The 100 Nuances of Personality (100NP) Item Pool
The 100NP item pool was developed for comprehensive and reliable personality measurement following the recommendations of Condon et al. (2021). Items were selected for high levels of reliability and minimal redundancy to ensure comprehensive, effective, and efficient assessment of personality traits. The item pool’s 198 items assess the Big Few domains as well as many nuances beyond them (note that the inventory’s name refers to its goal of comprehensive assessment rather than indexing exactly 100 narrow traits). The items can be aggregated into Big Five and HEXACO domains and facets, as well as into other (trait) composites based on specific research goals. The development and validation of the Big Five scores based on the 100NP items, alongside the evidence for the scales’ reliability and validity, are described by Anni et al. (2024). In short, the 100NP Big Five scores were nearly orthogonal, had test–retest reliabilities over r = .85 and above-usual cross-rater correlations, meaningfully tracked various criterion variables, and correlated highly with the corresponding Big Five scores of other instruments, but explained more variance in them than the other way around. Separate data collected in English and Estonian show that the 100NP items have higher-than-usual test–retest reliability (median r = .69 and .67, N = 888 and 545, respectively) and cross-rater correlations (median r = .37, N = 656; English-speakers only). The development and properties of the 100NP are described in detail by Henry and Mõttus (2024).
The 100NP has separate self- and informant-report forms in both Estonian and Russian and has also been translated into German, Norwegian, Italian, and Chinese. The informant-report forms closely correspond to the self-report forms with first-person language (“I am”, “I tend to”) replaced with third-person language (“he/she is”, “he/she tends to”; note that the Estonian language does not distinguish between genders). The 100NP is an open item pool freely available for use. The original English 100NP items as well as their self- and informant-form translations into Estonian and Russian can be found in Supplemental Tables 6–7 along with details on calculating the Big Five scores (see Supplemental Table 6 note). Additional items presented to the participants and informants are shown in Supplemental Table 8.
Studies to Date
Given that the 100NP data became available relatively recently, considerably less work has been published with this cohort. To date, the cohort has been used to predict subjective well-being and vaccination against COVID-19, showcasing the value of informant-report data in predicting important outcomes and reducing bias in estimating correlations. Ongoing projects explore links with genetic variants, Dark Triad traits, loneliness, and occupations, among others. Published and ongoing work using the EstBB PS21 cohort is summarized in Supplemental Table 9.
Data Beyond Personality Traits
Besides the detailed medical background questionnaire, various data are available for subsets of the EstBB, including EstBB PS08 and/or EstBB PS21. Below, we describe some of the data that personality researchers may find the most relevant. Figure 2 shows the approximate numbers of participants in the subsets; the data are also summarized in Supplemental Table 10 with additional references to publications using or describing these data.
Figure 2. Sample Sizes for a Selection of Measures Available in EstBB. Note. The measurement instruments are described in Supplemental Table 10. Health data from national databases and registries are described in the main text. More information on omics profiling is provided by Milani et al. (2025).
Health Status, Treatments and History From Linked Databases
EstBB data have been linked with various national registries and databases that provide information on different aspects of people’s health status and history. Information on medical diagnoses, treatment services, surgical procedures, analyses, and prescriptions is obtained from various sources, including the Estonian eHealth Foundation information system, the Estonian Health Insurance Fund database, as well as the databases from the largest hospitals in Estonia. Information on life history and status is received from the Estonian Population Register (birthplace, marital status, date of death) and the Estonian Causes of Death Registry (in-depth information on conditions and causes of death). More specialized linked databases include the Estonian Cancer Registry (descriptions of cancer cases and undertaken treatments), the Estonian Cancer Screening Registry, the Estonian Myocardial Infarction Registry, and the Estonian Tuberculosis Registry. The frequencies of linking data to the different registries are reported by Milani et al. (2025).
Medical history and current health status are coded following the International Classification of Diseases (ICD-10), and medications following the Anatomical Therapeutic Chemical (ATC) classification (Leitsalu et al., 2014). The NOMESCO Classification of Surgical Procedures (NCSP) is used for surgical procedures. The linked data are harmonized periodically, resulting in longitudinal data (including past diagnoses) that allow the study of disease trajectories and ensure up-to-date information. The process of linking EstBB data with the various databases and registries has been described in detail by Leitsalu et al. (2015).
Mental Health
A psychiatric module was added to the EstBB baseline questionnaire in 2007 (Leitsalu et al., 2014). This module included the Mini-International Neuropsychiatric Interview (MINI; Sheehan et al., 1998), a brief diagnostic interview for DSM-IV and ICD-10 psychiatric disorders, and the Swedish universities Scales of Personality (SSP; Gustavsson et al., 2000), a psychometrically improved shorter revision of the Karolinska Scales of Personality (KSP), which covers various personality constructs of interest in psychiatric and psychobiological research. However, data within this module were collected from very limited numbers of people (Ns < 300). In the same year, a sleep module was added, and the Munich Chronotype Questionnaire (MCTQ; Roenneberg et al., 2007) has since been administered to about 71,000 people.
A much larger mental health and well-being study was carried out more recently. Data were collected between March and July 2021 on psychiatric disorders and related phenotypes with a Mental Health Online Survey (MHoS) questionnaire covering self-reported current and lifetime symptom-level information assessed with brief screening instruments on a broad range of common psychiatric disorders, their risk factors, medication use and side effects (Ojalo et al., 2024). Symptoms of ADHD, anxiety, depression, eating disorders, gambling, mania, psychotic experiences, PTSD, substance use, and suicidality were covered. Information on contextual factors (lifestyle, life satisfaction, social support) and events (childhood adversity, stressful life events) was also collected, as well as measures of stress during the COVID-19 pandemic. The MHoS study had over 86,000 respondents with a response rate of 47%.
Other Questionnaires
Over the years, subsets of EstBB participants have responded to several other add-on questionnaires. A diet questionnaire has been administered to measure the frequency of consumption of 15 typical foods, and a food neophobia questionnaire to measure avoidance of new and unfamiliar foods. A COVID-19 severity questionnaire was added in 2020 along with mental health questions. In 2022, approximately 40,000 participants completed a medication side effects questionnaire that mapped self-reported adverse drug reactions to many common medications. Some participants have also had their movement activity tracked for one year using mobile phone cell-tower data.
Metabolomics and Biomarkers
Blood samples collected from EstBB participants can be tested for various biomarker levels, enabling investigations of potential mechanisms of disease. Several metabolomics datasets are available, as well as a dataset that contains clinical biomarker entries and written clinical epicrises; this dataset is periodically updated, allowing for longitudinal analyses. Nightingale Health’s biomarker analysis (based on nuclear magnetic resonance spectroscopy coupled with proprietary algorithms; Nightingale Health Biobank Collaborative Group et al., 2023), which quantifies over 200 biomarkers in absolute units from a single blood sample, has been used to generate plasma metabolite profiles. Mass spectrometry-based data, microbiome, and clinical biochemistry measurements are also available for smaller samples.
Data Collected From the EstBB PS21 Cohort Along With the 100NP
Whereas the data described above are available for different subsets of EstBB participants, including some or all of the two personality cohorts, the following data were collected alongside the 100NP. The exact phrasings of the items in Estonian and Russian along with English translations are given in Supplemental Table 8.
Recent Experiences and Life Events
To enable exploring common life events’ influences on personality traits, the questionnaire included questions about having experienced 12 life events in the domains of work, health, family, and relationships, as well as about experiencing adverse events, within the last year. Considering that personality traits may shape the subjective experience of life events and their influences, participants were also asked to rate these events’ subjective importance.
Attitudes and Life Satisfaction
Attitudes may also be linked with personality traits. The EstBB PS21 respondents were asked five questions about their social and political attitudes and five questions about their attitudes towards the environment and sustainability. Further, they were asked six questions regarding their life satisfaction in the domains of work, finances, relationships, residency, governance, and health.
Eating Behaviour
Five items were included to assess typical eating behaviours such as dieting, pace of eating, and perceived loss of control over eating. The questions were chosen to be maximally different to characterise eating behaviours as completely as possible while still keeping the number of items to a minimum. The items, administered in both self- and informant-report forms, are shown in Supplemental Tables 6–7.
Future Plans
Data collection in EstBB is ongoing. Specific plans in the psychological domain include follow-up personality measurements with the 100NP and repeated cognitive testing of all EstBB participants.
Personality Traits
Repeat administrations of the 100NP are planned to take place in 2025 and 2028 to enable longitudinal analyses of personality traits’ links with health outcomes, including predictive modelling.
Cognition: TestMyBrain Neurocognitive Tests
Cognitive abilities—intelligence, decision-making, and other mental operations—could provide incremental value over personality traits in understanding health behaviours and predicting health outcomes. In the future, all EstBB participants will be asked to complete various cognitive tests to efficiently but broadly characterise aspects of neurocognition that could link to health. For instance, the TestMyBrain toolkit (Singh et al., 2021; https://testmybrain.org/) is well suited for this aim as it offers a range of validated cognitive tests freely available to researchers. Pilot findings suggest acceptable test–retest reliability across computer and smartphone for most tasks (https://osf.io/xc65p).
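One common way to quantify test–retest reliability across two administrations is the Pearson correlation between the two sets of scores. The sketch below is a minimal Python illustration under that assumption; it is not necessarily the index used in the pilot study, and the scores and variable names are invented.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Note: raises ZeroDivisionError if either list has zero variance.
    return sxy / sqrt(sxx * syy)


# Hypothetical scores of five people on the same task, taken once on a
# computer and once on a smartphone (values invented for illustration).
computer_scores = [12, 15, 9, 20, 17]
smartphone_scores = [11, 16, 10, 19, 18]

retest_r = pearson_r(computer_scores, smartphone_scores)
print(round(retest_r, 2))
```

An intraclass correlation would additionally penalize systematic shifts in mean scores between devices, which the Pearson correlation ignores.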
Limitations
Participation bias is inherent in volunteer-based studies. Accordingly, compared to the general Estonian population, EstBB and both its personality cohorts include larger proportions of women and Estonians and have higher education levels, on average. Likewise, a healthy-volunteer bias has been reported for the second recruitment wave of the EstBB (see Milani et al., 2025). Although recruiting people from underrepresented groups remains a challenge in general, statistical corrections such as over- and undersampling may help reduce biases arising from these imbalances (see Arumäe et al., 2024 for an example).
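The idea behind over- and undersampling can be sketched in a few lines of Python: strata that are overrepresented in the sample are randomly thinned, and underrepresented strata are resampled with replacement, until the sample matches target shares. This is a minimal sketch, not the procedure of Arumäe et al. (2024); the function name, the 70/30 sample, and the 50/50 target shares are placeholders rather than actual EstBB or Estonian population figures.

```python
import random


def resample_to_targets(sample, key, target_shares, seed=0):
    """Over-/undersample strata so their shares match target_shares."""
    rng = random.Random(seed)
    strata = {}
    for row in sample:
        strata.setdefault(key(row), []).append(row)

    n_total = len(sample)
    balanced = []
    for group, share in target_shares.items():
        rows = strata.get(group, [])
        if not rows:
            continue  # an empty stratum cannot be resampled
        n_target = round(share * n_total)
        if n_target <= len(rows):
            # Undersample: draw without replacement.
            balanced.extend(rng.sample(rows, n_target))
        else:
            # Oversample: draw with replacement.
            balanced.extend(rng.choices(rows, k=n_target))
    return balanced


# Hypothetical sample with 70 women and 30 men; placeholder 50/50 targets.
sample = [{"id": i, "sex": "F"} for i in range(70)]
sample += [{"id": i, "sex": "M"} for i in range(70, 100)]
target_shares = {"F": 0.5, "M": 0.5}

balanced = resample_to_targets(sample, lambda row: row["sex"], target_shares)
print(len(balanced))
```

Resampling of this kind only corrects for the variables used to define the strata; biases related to unmeasured characteristics remain.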
Also, generalizations to other populations should be made cautiously, with the genetic and sociocultural background of Estonia and the EstBB subsamples in mind. For instance, genetic risk models may not generalize well to other (e.g., non-European) ancestry groups (Carlson et al., 2013), nor may all findings pertaining to personality traits and other psychological factors. That said, at least some of our findings so far have also replicated in other samples (Allik et al., 2010, 2015, 2016; Anni et al., 2024; Mõttus et al., 2024).
Finally, although using two sources of personality ratings is a particular strength of the EstBB personality samples over other large-scale studies, both self- and informant-reports may be susceptible to some biases, such as those arising from stereotypes (e.g., “I/they have obesity; therefore I/they must be lazy”). Moreover, ideal data would include ratings by multiple informants, providing more perspectives on the participants’ traits.
Data Sharing and Collaboration
Although EstBB data cannot be made publicly available, they can be shared in accordance with the Human Genome Research Act. The process for accessing EstBB data is described at https://genomics.ut.ee/en/content/estonian-biobank#nav-24032. To discuss collaborative projects involving personality data, please contact the corresponding authors of this manuscript.
Conclusion
Detailed self- and informant-reported personality measurements have been collected for two large cohorts within the EstBB, a volunteer-based biobank in Estonia. Both cohorts have data on a wide range of mental and physical health phenotypes, health behaviours, and life events. Besides genetic risk modeling, the EstBB samples offer extensive opportunities for within-family studies given the large proportion of related participants. Among various other possibilities, these data enable detailed investigations into personality traits’ links with and effects on various health outcomes.
Supplemental Material
Supplemental Material - Cohort Profiles: Personality Measurements at the Estonian Biobank of the Estonian Genome Center, University of Tartu
Supplemental Material for Cohort Profiles: Personality Measurements at the Estonian Biobank of the Estonian Genome Center, University of Tartu by Mariliis Vaht, Kadri Arumäe, Anu Realo, Liisi Ausmees, Jüri Allik, Sam Henry, Estonian Biobank Research Team, Mairo Puusepp, Sirje Lind, Innar Hallik, Helene Alavere, Andres Metspalu, Priit Palta, Tõnu Esko, René Mõttus, and Uku Vainik in Personality Science
Footnotes
Acknowledgements
Not applicable.
Author Contributions
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work has been funded by Estonian Research Council’s personal research funding start-up grants PSG656 and PSG759, and Estonian Research Council’s team grants PRG2190 and PRG1291. The research was conducted using the Estonian Center of Genomics/Roadmap II funded by the Estonian Research Council (project number TT17). Data analysis was carried out in part in the High-Performance Computing Center of University of Tartu.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
EstBB data cannot be made publicly available due to privacy regulations but can be shared in accordance with the Human Genome Research Act. The process for accessing EstBB data is described at the URL given under Data Sharing and Collaboration above. Please contact the corresponding authors to discuss collaborative projects involving EstBB personality data.
Supplemental Material
Supplemental material for this article is available online. Depending on the article type, these usually include a Transparency Checklist, a Transparent Peer Review File, and optional materials from the authors.
