Abstract
Purpose:
In this longitudinal study, we investigated the development of empathy during medical education and assessed potential predictors of empathy at different time points in the course of medical studies.
Methods:
In our longitudinal study, starting in 2011, we surveyed medical students at Lübeck Medical School, Germany at the beginning of their course of study and after 2, 4, and 6 years (t0-t3) using standard instruments for empathy (Jefferson Scale of Empathy, Student Version, JSE-S), anxiety and depression (Hospital Anxiety and Depression Scale, HADS), stress (Perceived Medical School Stress scale), and behavior and experience patterns (Arbeitsbezogene Verhaltens- und Erlebensmuster [Work-related Behavior and Experience Patterns]).
Results:
A total of 43 students completed all surveys. The cross-sectional samples for the different survey time points comprised between n = 220 and 658 students. We observed a slight, but statistically significant, increase of empathy scores from t0 to t3 (t(43) = −3.09, P < .01). Across all analyses, a preference for a people-oriented specialty was associated with a higher JSE-S sum score, as well as being female, whereas we saw a negative association between HADS depression and JSE-S scores.
Conclusion:
In our study, empathy scores were shown to be relatively stable during medical education with a tendency to increase. In line with previous research, individuals preferring a people-oriented specialty and women showed higher empathy scores.
Introduction
Different stakeholder groups, such as doctors, patients, and students, consider empathy to be a core quality of a good doctor. 1 A general consensus among medical education leaders and professional organizations exists with regard to the great importance of empathy as a desired outcome of medical education. 2
Empathy is a core component of emotional intelligence. 3 Among the different definitions of empathy are those of Hojat et al 4 and Mercer and Reynolds. 5 Hojat et al defined empathy in the context of healthcare as “. . .a predominantly cognitive (. . .) attribute that involves an understanding (. . .) of experiences, concerns and the perspectives of the patient, combined with a capacity to communicate this understanding, and an intention to help.” Mercer and Reynolds defined empathy in the context of healthcare as a “complex, multidimensional construct that has moral, cognitive, emotive, and behavioral components,” stating that it “involves an ability to: (a) understand the patient’s situation, perspective, and feelings (. . .); (b) to communicate that understanding and check its accuracy; and (c) to act on that understanding with the patient in a helpful (therapeutic) way.” 5
Several studies have shown a positive influence of doctors’ empathy on health outcomes in patients.5-8 There is evidence of a positive association of medical student empathy scores with communication scores on objective structured clinical examinations (OSCEs) and clinical competence.4,9
Female medical students consistently show higher levels of empathy when compared to their male counterparts. 10 Other factors that have been associated with the empathy scores of medical students are specialty preferences and nationality.
Whether empathy is a stable personality trait or an evolving ability with cognitive and emotional aspects remains controversial. Data on the development of empathy during medical education are inconsistent. 10 It remains unclear whether empathy scores are stable, increasing, or declining during medical education, and self-report measures used to assess empathy in medical students and candidates have drawn some criticism.10-13 Longitudinal studies are relatively scarce (24 of 30 studies in the review of Andersen et al were cross-sectional) and small (the median sample size in the review of Andersen et al was 142.5). 10
Objective
The objectives of our study were to describe the longitudinal development of empathy scores during medical education and to assess potential predictors of empathy scores at different time points in the course of medical studies.
Methods
Study design and setting
For the present study, we used data from an ongoing prospective, longitudinal, observational study on medical students’ health: the Lübeck University Students Trial (LUST). For a detailed description of the study design, see Kötter et al. 14 The LUST study is being conducted at the public, life-sciences oriented University of Lübeck, Germany. About 1600 students are enrolled in the medical study program at the Lübeck Medical School (LMS) (189 freshmen/year); 67% of them are female. 15 The first LUST survey was conducted in 2011, and since then every subsequent class has been followed up annually for the duration of their studies.14,16,17
Participant selection
We invited all medical students at LMS to complete the yearly LUST survey. There were no exclusion criteria.
Measures
Empathy was self-assessed using the Jefferson Scale of Empathy, Student Version (JSE-S; available in the German language).18,19 The JSE-S is an established instrument for the measurement of empathy in the context of medical education and patient care. 20 It comprises 20 items. Each item can be answered on a 7-point Likert scale (1 = “I strongly disagree”; 7 = “I strongly agree”). Possible sum scores range from 20 to 140. We assessed empathy at t0 (prior to the start of the course), t1 (at the end of the sophomore year), t2 (at the end of the fourth year of study), and t3 (at the end of the sixth year of study).
We assessed age (t0), gender (t0; female/male/diverse), prior completed vocational training (t0; yes/no), admission quota (t0; central allocation based on pre-university grade point average/university-internal selection procedure), specialty preferences (t0-t3; people oriented/technical oriented), anxiety (t0-t3), depression (t0-t3), perceived stress (t2-t3), and study-related behavior and experience patterns (t0-t3) as potential predictors.
In order to measure anxiety and depression, we used the Hospital Anxiety and Depression Scale in the German language (HADS-D).21,22 HADS was initially developed for clinical populations for the assessment of anxiety and depression, but it has been validated and widely used among students in general and medical students in particular.23,24 It comprises 14 items: 7 for each of the 2 sub-scales relating to anxiety and depression, respectively. Each item can be answered on a 4-point Likert scale (0 = “mostly”; 3 = “not at all”). Possible sum scores range from 0 to 21 for each sub-scale.
We measured perceived stress using the German language version of the Perceived Medical School Stress scale (PMSS-D). 25 The PMSS was first introduced by Vitaliano et al and has since been broadly used, validated, and translated into different languages.25-29 It comprises 13 items, each of which can be answered on a 5-point Likert scale (1 = “I strongly disagree”; 5 = “I strongly agree”). Possible sum scores range from 13 to 65.
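The score ranges given for the three sum-scored instruments follow directly from the number of items and the response range per item. The following Python sketch illustrates this arithmetic; it is not the scoring procedure used in the study. The class name `Instrument` and its fields are hypothetical, responses are assumed to be already coded in the scoring direction, and neither reverse-keyed items nor the missing-value rules from the instrument handbooks are modeled.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Instrument:
    """Item count and per-item response range of a sum-scored questionnaire.

    Hypothetical helper for illustration only.
    """
    name: str
    n_items: int
    item_min: int
    item_max: int

    @property
    def score_range(self) -> Tuple[int, int]:
        # Lowest and highest possible sum score.
        return self.n_items * self.item_min, self.n_items * self.item_max

    def sum_score(self, responses: Sequence[int]) -> int:
        """Validate one respondent's item responses and add them up."""
        if len(responses) != self.n_items:
            raise ValueError(
                f"{self.name}: expected {self.n_items} responses, "
                f"got {len(responses)}"
            )
        for value in responses:
            if not self.item_min <= value <= self.item_max:
                raise ValueError(f"{self.name}: response {value} out of range")
        return sum(responses)


# Item counts and response ranges as described in the text.
JSE_S = Instrument("JSE-S", n_items=20, item_min=1, item_max=7)
HADS_SUBSCALE = Instrument("HADS-D sub-scale", n_items=7, item_min=0, item_max=3)
PMSS_D = Instrument("PMSS-D", n_items=13, item_min=1, item_max=5)

for instrument in (JSE_S, HADS_SUBSCALE, PMSS_D):
    print(instrument.name, instrument.score_range)
```

The range check in `sum_score` is a design choice of this sketch: it rejects incomplete or miscoded response sets instead of silently producing a sum score outside the possible range.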
The AVEM (“Arbeitsbezogenes Verhaltens- und Erlebensmuster” [Work-related Behavior and Experience Pattern]) was originally developed to collect self-reported data about personal experiences with work-related stress and typical coping strategies. 30 We used the 44-item version adapted for students. 27 The AVEM has been used in numerous studies with (medical) students,16,27,31 and comprises 11 separate dimensions representing behavior-related risks or resources. These scales cover the following 3 major domains: (1) professional commitment; (2) resistance toward stress; and (3) emotional well-being (in the context of studies). Each scale comprises 4 items, which can be answered on a 5-point Likert scale (1 = “I strongly disagree”; 5 = “I strongly agree”). By means of cluster analyses of the initial AVEM sample group, 4 different types of work-related experience and behavior patterns were identified: Pattern G (“health”: high but not excessive work commitment, high resilience, and positive emotion, including a high satisfaction with life and experience of social support), Pattern S (“unambitious”: reduced work commitment combined with a high ability to distance from work and a high satisfaction with life), Risk pattern A (“overexertion”: high work commitment but low resilience, satisfaction with life is impaired), and Risk pattern B (“burnout”: low work commitment, especially low significance of work, and low career ambition, combined with low ability to distance from work, high tendencies to resign, low satisfaction with life). 32 We used the probability of the pattern allocation as a predictor in this study (possible range: 0-100).
Preventing selection bias
In order to reduce bias due to non-response, we offered all participants a reward in the form of a book or food voucher worth 5 euros.
Study size
The study size was predefined by the number of students per class at LMS.
Data management
Outcome data (JSE-S scores) and predictors were imported into IBM SPSS Statistics for Windows Version 25.0 (IBM Inc., Armonk, New York, United States) from a Microsoft Access 2010 database. Where possible, we substituted missing values following the rules provided in the handbooks for the instruments.
Statistical methods
Data analyses were conducted using IBM SPSS Statistics for Windows Version 25.0 (IBM Inc., Armonk, New York, United States). All statistical tests were performed 2-tailed with an α of .05. We used t-tests to compare means of continuous variables and report results as means (M) (standard deviation [SD]). For gender, data were analyzed using a chi-squared test, and the results are reported as a percentage. In order to express bivariate correlations, we used Spearman’s ρ. We used linear regression analyses in order to further analyze correlations between predictors and JSE-S-scores, including variables with a P-value of <.10 in the bivariate analysis. We conducted separate regression analyses for each survey time point, and we included age and gender in all analyses in order to control for them.
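Two of the procedures named above, the comparison of means within the same students and Spearman’s ρ, can be illustrated with a short Python sketch. The analyses in this study were run in SPSS; the code below is not the authors’ code. It assumes that the within-student comparisons correspond to a paired-samples t-test and that differences are taken as earlier minus later score. All function names are hypothetical and the example scores are invented for illustration.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def paired_t(earlier: Sequence[float], later: Sequence[float]) -> Tuple[float, int]:
    """Return the paired-samples t statistic and its degrees of freedom (n - 1)."""
    if len(earlier) != len(later) or len(earlier) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [e - l for e, l in zip(earlier, later)]  # assumed sign convention
    sd = statistics.stdev(diffs)  # sample SD, n - 1 denominator
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.mean(rx), statistics.mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Invented example scores for five students, not study data.
scores_first = [100, 105, 110, 98, 120]
scores_last = [104, 106, 115, 101, 121]
t_value, df = paired_t(scores_first, scores_last)
print(f"t({df}) = {t_value:.2f}")
print(f"rho = {spearman_rho(scores_first, scores_last):.2f}")
```

With the assumed sign convention, an increase in scores over time yields a negative t statistic. P-values are left out because computing them requires the t distribution, which the Python standard library does not provide.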
Ethical consideration
This study was approved by the Ethical Committee of the University of Lübeck on March 2, 2011 (file reference: 11-010).
This report was written in consideration of the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) statement. 33
Results
Participants
After the exclusion of incomplete data sets, the cross-sectional samples for the different survey time points comprised between n = 220 (from 3 consecutive classes, t3) and n = 658 (from 4 consecutive classes, t0) students (between 66% and 73% female; see Table 1). A total of 43 students from 1 class participated in all 4 survey time points (longitudinal sample; 77% female; see Table 1). We saw a slightly higher share of female students in the cross-sectional samples at the later survey time points and in the longitudinal sample. We did not collect reasons for non-participation.
Table 1. Sociodemographic characteristics and mean JSE-S sum scores of the study participants, cross-sectional and longitudinal samples.
Outcome
The course of the mean JSE-S sum scores in the longitudinal sample is displayed in Figure 1. The differences between t0 and t3 (t(43) = −3.09, P < .01) and between t1 and t2 (t(43) = −2.18, P = .03) were statistically significant.

Figure 1. Mean JSE-S sum scores at the different survey time points (t0 [prior to the start of course], t1 [at the end of the sophomore year], t2 [at the end of the fourth year of study], and t3 [at the end of the sixth year of study]; longitudinal sample).
Predictors
An overview of all descriptive results of potential predictors of the JSE-S sum score (t0-t3) can be found in Supplemental File 1. About 40% of the students reported a prior completed vocational training. About 60% of the students reported having been admitted via a university-internal selection procedure. The share of students preferring a people-oriented specialty ranged from 62% at t0 to 75% at t3. Both HADS-D anxiety and depression scores were highest at t1 and lowest at t3, with t0 and t2 scores lying in between. We also found the highest PMSS-D scores at t1 and the lowest at t3. The distribution of study-related behavior and experience patterns was best (highest mean probability for pattern G and lowest for risk pattern B) at t0 and worst at t1 (lowest mean probability for pattern G and highest for risk patterns A and B). At t3, we saw the highest mean probability for pattern S.
Correlation of predictors and empathy
An overview of all results of bivariate analyses can be found in Supplemental File 2.
Female students, students with a prior completed vocational training, and students preferring a people-oriented specialty showed statistically significant, higher JSE-S-scores at t0. Age and HADS-D depression score correlated significantly with JSE-S score at t0. Female students and students preferring a people-oriented specialty showed statistically significant, higher JSE-S-scores at t1. We found statistically significant correlations between the probability for AVEM risk pattern A, HADS-D anxiety score, HADS-D depression score, PMSS-D score, and JSE-S-score at t1. Female students showed statistically significant, higher JSE-S-scores at t2. Statistically significant correlations were seen between HADS-D depression score and JSE-S-score at t2, and female students showed statistically significant, higher JSE-S-scores at t3.
Linear regression analyses
We conducted cross-sectional linear regression analyses for t0, t1, and t2 (see Tables 2 to 4), including all predictors with statistically significant correlations in the bivariate analyses. Due to the small sample size, we did not conduct a cross-sectional linear regression analysis for t3. We conducted linear regression analyses for the longitudinal samples t0 to t1 and t1 to t2 (see Supplemental File 3). Earlier specialty preference proved to be a statistically significant predictor in both longitudinal analyses. In the t1 to t2 analysis, the PMSS-D score at t1 also remained significant in the regression model.
Table 2. Cross-sectional linear regression analysis t0, Nagelkerke’s R² = 0.11.
Table 3. Cross-sectional linear regression analysis t1, Nagelkerke’s R² = 0.14.
Table 4. Cross-sectional linear regression analysis t2, Nagelkerke’s R² = 0.10.
Discussion
In our prospective, longitudinal study, we found a slight increase in JSE-S sum score during 6 years of medical education. Across all analyses, the preference for a people-oriented specialty, as well as being female, was shown to be associated with a higher JSE-S sum score. Lower scores for depression were also associated with a higher JSE-S sum score. The association between a preference for a people-oriented specialty and a higher JSE-S sum score could also be found in the longitudinal analyses. Additionally, a lower level of self-rated medical school stress at t1 (at the end of the preclinical phase) was associated with a higher JSE-S sum score.
Our results are in line with the results of the systematic review by Andersen et al 10 with respect to the tendency toward higher levels of empathy among female students. We did not find that “empathy declines with the level of training,” 10 and we even saw a slight increase in JSE-S-scores from the beginning of the clinical phase (t1) toward the practical year (t3).
The consistent finding that a preference for a people-oriented specialty was associated with higher JSE-S sum scores in our study may not seem surprising at first sight. It is, however, not in line with the majority of studies examining the association between specialty preference and empathy. 10 To our knowledge, only 2 previous studies have reported an association between specialty preference and JSE-S-scores, while 1 previous study reported an association between a preference for a people-oriented specialty and the score in another instrument for the appraisal of empathy (Measure of Patient-Centered Communication).34-36 Given the evidence for the link between physician empathy and patient outcomes, assessing the specialty preference during medical education may be helpful in order to reach the right group of students with tailored interventions aimed at maintaining and fostering empathy.
We saw an association between depression scores and JSE-S-scores at several time points during our study. The negative relationship between depression and empathy has been examined extensively among physicians, as well as medical students. 37 Our results confirm earlier results on this relationship and underline the demand for interventions to foster mental health during medical education. Not only is the decline of mental health during medical education a personal tragedy for the affected students, but evidence suggests that the quality of patient care may also be impaired measurably. Among other factors, empathy may be a mediator between the suboptimal mental health of healthcare professionals and a suboptimal quality of their patient care. 38
Strengths and limitations
This study has several strengths and limitations. It is one of only a few longitudinal studies on the development of empathy during medical education, but the size of our longitudinal sample is comparably small, bearing a risk of selection bias. This and the single-centered nature of our study limit the generalizability of our findings. The higher share of female students in our longitudinal sample (77%) may reflect a higher willingness of female individuals in general to participate in this type of survey. Due to the nationally standardized curriculum, the central study place allocation, and the fact that the age (M = 23.1) and gender distributions resemble the distribution not only at LMS but also nationwide (67% female), our local findings may be generalized at least at a national level.15,39 We collected data using web-surveys, which may have reduced bias due to social desirability. The likelihood of type 1 error is increased because we did not adjust for multiple testing. The findings, even if consistent within the different analyses in this study and with earlier research, therefore need to be interpreted with caution and have to be confirmed in larger studies.
Implications for research and practice
Medical students, especially male individuals, should be supported in maintaining and expanding their empathy. Self-awareness training and Balint groups could decrease the risk of depression and the level of perceived medical school stress. Tailored interventions on patient-physician communication throughout the curriculum, flanked by regular assessments of communication skills, could help to reduce the negative impact of this kind of stress on empathy. 40
Future studies should examine the association between specialty preference and empathy in more depth. Larger scale longitudinal studies, as well as qualitative approaches, could help to shed some light on the question of whether curricular elements/changes could prevent those well suited and motivated to work in a people-oriented specialty from changing their preference during medical education. Future research should focus on the development and evaluation of tailored interventions to foster empathy during medical education, focusing both on individuals at risk of a loss of empathy and on the curriculum itself.
Supplemental Material
Supplemental material, sj-pdf-1-mde-10.1177_23821205211030176, sj-pdf-2-mde-10.1177_23821205211030176, and sj-pdf-3-mde-10.1177_23821205211030176, for The Development of Empathy and Associated Factors during Medical Education: A Longitudinal Study by Thomas Kötter, Leevke Kiehn, Katrin Ulrike Obst and Edgar Voltmer in Journal of Medical Education and Curricular Development
Footnotes
Funding:
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by a grant received by TK from Lübeck Medical School (E18-2011).
Declaration of conflicting interests:
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Author Contributions
TK: design, data collection, statistical analysis, data interpretation, and manuscript writing; LK: data collection, statistical analysis, data interpretation, and manuscript editing; KUO: design, data collection, statistical analysis, data interpretation, and manuscript editing; EV: senior supervision, data interpretation, and manuscript editing.
Availability of Data and Materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Supplemental Material
Supplemental material for this article is available online.
