Abstract
Physician-oriented online education could be a pathway to improve care for patients with heart failure (HF); however, it is difficult to measure the impact of such education. Self-efficacy is a potential outcome measure. In this article, we develop a methodology for analyzing an educational intervention for general practitioners (GPs) using self-efficacy as a concept. This study was partly conducted within the setting of an observational study, IMPACT-B, where we developed online education for GPs. We designed and refined a 24-item questionnaire using item analysis, and exploratory and confirmatory factor analysis. Ninety-one GPs completed the questionnaire before and after the online education. Follow-up data after 6 months were available for 13 GPs. Item analysis revealed a high degree of internal consistency (coefficient alpha 0.95) and validity. Each additional year of experience was associated with an increase of 0.50 points in the average baseline self-efficacy score (95% CI [0.21-0.80]), and each additional patient in HF follow-up with an increase of 2.0 points (95% CI [0.48-3.5]). The items that differentiated most between GPs with high and low self-efficacy were the treatment of congestion, the titration of medication, and the use of mineralocorticoid receptor antagonists (MRA) in heart failure with reduced ejection fraction. Factor analysis reduced the number of questions to 14, mapping to three factors (diagnosis, treatment, and follow-up), and improved the model fit as measured by the comparative fit index (from 0.83 to 0.91). We demonstrated a method to assess the impact of online education on general practitioners. This led to a questionnaire that was reliable, valid, and convenient to use in an implementation context.
Introduction
Heart failure (HF) is an important and growing health concern.1 Patients are mainly older and frail. Thus, family physicians or general practitioners (GPs) are ideally placed to deliver care.2 However, the adequate diagnosis and treatment of HF in this older and frail primary care population presents GPs with unique difficulties, and they report a lack of knowledge of recent guidelines as well as diagnostic uncertainty.3–6 Several studies conducted in primary care settings therefore sought to improve HF outcomes at the patient level by offering tailored HF education to GPs in place of or in addition to patient education and self-management support, albeit with limited success.7,8
This type of physician-oriented medical education is an integral part of quality improvement initiatives targeting the healthcare provider aspect of the so-called quadruple aim.9 It is, however, difficult to measure the effectiveness of these interventions. One particular outcome could be self-efficacy, defined as an individual's belief in their capacity to act in the ways necessary to reach specific goals.10 A high level of self-efficacy indicates a higher ability to use certain skills, even in more demanding environments, and vice versa. This outcome has been defined and used as a proxy for HF patient self-management for several decades11–13 and has recently been defined for caregivers as well.14 Self-efficacy is also related to competence, since individuals with high self-efficacy are more likely to engage in challenging tasks, persist in the face of setbacks, and achieve their goals, leading to a positive feedback cycle on actual competence. In occupational settings, several meta-analyses demonstrated an association between higher self-efficacy and workplace motivation and performance.15–17 As with other self-beliefs, this relationship can be prone to overestimation.18
However, to our knowledge, there is no instrument to measure GPs’ (or other physicians’) self-efficacy in heart failure, nor is there a clear methodology for developing instruments to assess educational interventions’ effectiveness. We therefore aim to develop a methodology for assessing educational interventions using self-efficacy as a concept.
Methods
Design and Recruitment
This study took place in Belgium from January 2020 until February 2021 and was partly conducted within the setting of an observational study, the IMPACT-B study.19 This project aims to improve HF care by implementing, among other measures, GP HF education and reimbursement of natriuretic peptides. Each participating GP was required to complete an online HF education session (a narrated slideshow video lasting half an hour). This education session (Supplemental File 1) summarized current HF guidelines (based on the guidelines available in 2020) and was reviewed by several general practitioners with expertise in heart failure. Participants had to be active GPs and had to give informed consent. We had no predefined exclusion criteria. We recruited GPs from the IMPACT-B study, as well as by email and online from a network of different vocational training and professional institutions, to complete the online HF education and test the corresponding questionnaire. Participants entered informed consent and data on the survey platform Qualtrics®. A first set of questions (pretest data) was followed by the education session, after which a second set of questions was completed (posttest data). Completing this entire trajectory took about 45 min. After 6 months, we asked GPs participating in the IMPACT-B study to fill out the survey again via email or telephone and sent 2 reminders via email.
Development of Questionnaire
Design
We used Bandura's concept of self-efficacy, defined as an individual's belief in their capacity to act in the ways necessary to reach specific goals.10 The design of our questionnaire to assess the impact of an educational intervention consisted of 2 steps. First, we conducted a literature review of articles that reported the results of online training on the self-efficacy of healthcare providers, in order to look for validated questionnaires (MEDLINE database, 2000-2021, varying MeSH terms based on the concepts of health personnel, online education, and self-efficacy). Second, we designed a first questionnaire based on the findings of the review and ran a preliminary test survey among five GPs (without special expertise in HF) to check the questions’ clarity. We found 17 articles that explicitly reported self-efficacy questions (Supplemental File 2). They were heterogeneous in form and content. There was no standardized and/or validated self-efficacy questionnaire for evaluating the impact of online education. We therefore based the design of our self-efficacy questionnaire on the principles reported by Bandura20 and on the contents of the educational session. We formulated 24 questions concerning 6 domains of heart failure care (see Supplemental File 3 for the full questionnaire): 5 questions regarded the diagnosis of HF, 6 the treatment of all types of HF, 6 the treatment of HF with reduced ejection fraction (HFrEF) specifically, 3 follow-up, 3 patient communication, and 1 advanced care planning.
Refinement
We then conducted extensive testing of the questionnaire in the broader GP population and combined item and factor analysis to reduce the number of items in our questionnaire and test internal consistency. The goal of the item analysis was to describe the contribution of different test items to our instrument and the validity of our instrument. The goal of the factor analysis was to identify possible different subdomains or factors within our instrument (exploratory factor analysis) and to conduct formal hypothesis testing (confirmatory factor analysis). Finally, we explored the questionnaire's validity within a nested observational study.
Outcomes
Our primary objective was to develop a methodology to assess an educational intervention among physicians, in our case by developing a physician self-efficacy questionnaire for HF. Each survey item was scored on a numerical scale from 0 (no competence) to 100 (full competence). Our main outcomes for the questionnaire were internal consistency, validity, availability, and usability. A secondary outcome was the impact of an educational module on self-efficacy.
Data Collection and Analysis
We used descriptive statistics (means, standard deviations, medians, and interquartile ranges where appropriate) for participant characteristics.
We used pretest data for item and factor analysis. We computed 3 statistics for the initial item analysis: item difficulty, item discrimination, and alpha-if-item-deleted. Item difficulty was expressed as the mean score and standard deviation of each response item. Item discrimination was estimated using the correlation between item responses and construct scores and expressed as item-total correlations (ITC) and corrected item-total correlations (CITC). Internal consistency was estimated using coefficient alpha, and each item's contribution to internal consistency was measured by estimating alpha with that item removed from the set (alpha-if-item-deleted). We computed a matrix of Pearson's correlation coefficients for all possible pairs of questionnaire items and calculated the corresponding p-values. We used a hierarchical clustering method with 4 possible clusters to order items in a correlation plot. We then conducted a factor analysis to identify unobserved variables, or factors, that explained the correlations among our observed variables (in this case, test item scores).21 In an exploratory factor analysis, we explored the possible number and types of factors that explained correlations within the questionnaire, and how these factors related to each other, by calculating eigenvalues (numerical values that express the total variance explained by each factor). We first fitted a 10-factor structure and evaluated the results in a scree plot, which visualizes factors in decreasing order of eigenvalue; an eigenvalue of 1 or more is considered the cutoff for acceptability.21 We then decided upon the optimal number of factors based on the strength of the underlying constructs (scree effect) and how well each item related to, or loaded on, these hypothesized factors. We therefore consecutively fitted 1-factor, 2-factor, 3-factor, and 4-factor models and discussed which structure was most appropriate to explain possible constructs or subdomains within the questionnaire.
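The item statistics described above can be illustrated with a minimal sketch. It is not the analysis code used in this study (which was run in R); the function names and the example scores are hypothetical, and the sketch assumes complete responses with at least 3 items and no item with constant scores.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(items):
    """Coefficient alpha; `items` holds one list of respondent scores per item."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    item_variance = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


def item_analysis(items):
    """Per item: difficulty (mean), ITC, CITC and alpha-if-item-deleted."""
    totals = [sum(row) for row in zip(*items)]
    report = []
    for i, col in enumerate(items):
        others = items[:i] + items[i + 1:]
        # CITC correlates the item with the total of the remaining items only
        rest = [t - v for t, v in zip(totals, col)]
        report.append({
            "mean": mean(col),
            "itc": pearson(col, totals),
            "citc": pearson(col, rest),
            "alpha_if_deleted": cronbach_alpha(others),
        })
    return report


# Hypothetical scores (0-100) for 4 items answered by 6 respondents
scores = [
    [60, 70, 80, 50, 90, 40],
    [55, 75, 85, 45, 95, 35],
    [65, 60, 70, 55, 80, 50],
    [30, 80, 40, 70, 60, 50],
]
print(round(cronbach_alpha(scores), 3))
for row in item_analysis(scores):
    print({name: round(value, 3) for name, value in row.items()})
```

An item whose removal raises alpha, or whose CITC is low compared with the other items, contributes little to the shared construct and is a candidate for removal.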
In confirmatory factor analysis, we investigated the appropriateness of a given factor structure and formally compared different hypotheses as to which items best assess the factors we were interested in measuring. Our main metrics for comparing model fits were the comparative fit index (CFI) and the root mean square error of approximation (RMSEA). We conducted factor analyses using the “lavaan” package and the “factanal” function in R 4.2.1, using a varimax rotation method and maximum likelihood estimator, with nonlinear minimization to optimize the confirmatory factor analysis.22 We tried to improve model fits using modification indices (MI). These are values that indicate how much a model fit would improve if a particular path (such as a covariance) were added to the model. Since these are a form of post hoc model modification, we judiciously assessed any identified MIs on plausibility. Finally, we investigated validity and usability by testing the association between self-efficacy measures at baseline and experience with HF care, as well as longitudinal measurements on the self-efficacy instrument. These last measurements were assessed in an exploratory boxplot, depicting the mean and interquartile range for all given question responses across 3 different sections.
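For readers unfamiliar with the two fit indices, the sketch below shows how CFI and RMSEA are commonly derived from model and baseline chi-square statistics. It uses the textbook formulas with N - 1 in the RMSEA denominator; software packages may use N instead, so values can differ slightly from those reported by a given package. The chi-square values in the example are hypothetical and are not results from this study.

```python
import math


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index from model and baseline (null) chi-square values."""
    d_model = max(chi2_model - df_model, 0.0)
    # d_model is already non-negative, so this equals max(d_baseline, d_model, 0)
    d_base = max(chi2_baseline - df_baseline, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def rmsea(chi2_model, df_model, n):
    """Root mean square error of approximation for a sample of n respondents."""
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))


# Hypothetical chi-square values; n = 91 matches the number of respondents
print(round(cfi(150.0, 74, 900.0, 91), 3))
print(round(rmsea(150.0, 74, 91), 3))
```

Both indices reward a model whose chi-square is close to its degrees of freedom; CFI does so relative to a baseline model, RMSEA per degree of freedom and respondent.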
Ethics
Participants needed to explicitly provide informed consent to the analysis of their responses before completing the survey. This procedure was approved by the ethics committee of the University Hospital Leuven (MP012922) on April 21, 2020.
Results
Step 1—Questionnaire Design and Testing
Participant Characteristics
Table 1 depicts the characteristics of participating GPs. There were 91 respondents, the majority of whom were female. More than half of all participants (n = 50) were recruited from within the IMPACT-B study. Participants had experience with heart failure patients, with about half of all participants (n = 43) counseling 6 HF patients or more, and more than half (n = 47) having diagnosed at least 2 HF patients in the previous 3 months. Thirteen GPs from the IMPACT-B study repeated the questionnaire after 6 months.
Table 1. Participant characteristics.
Abbreviations: GP, general practitioner; HF, heart failure.
n (%).
Item analysis
Table 2 depicts the item analysis for all questionnaire items. The consistency on individual items and on the entire questionnaire was very high with a total coefficient alpha of 0.95. The highest discriminating items (highest CITC scores) were the treatment of HFrEF with MRA, titration of HF medication in patients with HFrEF and adequate treatment of congestion.
Table 2. Item analysis of the POSE-HF questionnaire.
Abbreviations: NT-proBNP, N-terminal pro brain-natriuretic peptide; NYHA, New York Heart Association scale for heart failure; HFrEF, heart failure with reduced ejection fraction; HFpEF, heart failure with preserved ejection fraction; HFmrEF, heart failure with mid-range ejection fraction; ACE, angiotensin-converting enzyme; Signifier, abbreviations for the different questions used in the manuscript; M, mean; SD, standard deviation; n, number of questionnaires; na, incomplete responses; ITC, item-total correlation; CITC, corrected item-total correlation; Aid, alpha-if-item-deleted.
Figure 1 demonstrates a correlation plot of the 24 different item responses, based on a correlation matrix (Supplemental File 4) computed by calculating the correlations (Pearson's correlation coefficients) between each pair of questionnaire items. In this plot, items have been ordered according to a hierarchical clustering algorithm with 4 possible clusters. Hues of blue indicate the level of correlation (Pearson's correlation coefficient) between items and asterisks the corresponding p-values. This correlation plot indicates one strong (dark blue) cluster for items 8-15 and 18, as well as 2 weaker (lighter blue) ones. Item 4 (coding) correlated poorly with most other response items.

Figure 1. Questionnaire correlation plot indicating the degree of correlation between scores on each pair of questionnaire item responses. Items are ordered according to a hierarchical clustering algorithm with four different clusters (black squares). Hues of blue indicate the degree of correlation (Pearson's correlation coefficient) between items. Asterisks indicate the level of significance: p < .001 (***), p < .01 (**), p < .05 (*).
Validity and Usability
We hypothesized that self-efficacy would be greater in those physicians with a higher degree of clinical experience, particularly with HF. Figure 2 demonstrates the relationship between average self-efficacy scores per item and clinical experience.

Figure 2. Association between average self-efficacy score per item and measures of clinical experience for participating general practitioners (n = 91). Each dot represents the mean self-efficacy score per item. HF, heart failure.
Table 3 quantifies the associations. Each year of extra experience was associated with a 0.50-point average score increase per item, each extra HF patient with a 2.0-point increase and more than 2 recent HF diagnoses with an average increase of 7.3 points.
Table 3. Associations, with 95% confidence intervals (CI), between years of experience, number of HF patients in follow-up, and new HF diagnoses on the one hand and the average score per questionnaire item on the other.
Abbreviation: CI, confidence interval.
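As an illustration of how an association of this kind can be quantified, the sketch below estimates a slope and an approximate 95% confidence interval with simple linear regression. This is an assumption made for illustration: the exact model behind Table 3 is not described here, the function name and data are hypothetical, and the interval uses a normal approximation (a t-distribution would give slightly wider intervals in small samples).

```python
from statistics import NormalDist, mean


def slope_with_ci(x, y, level=0.95):
    """Ordinary least squares slope of y on x with a normal-approximation CI."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares around the fitted line
    rss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    se = (rss / (n - 2) / sxx) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return slope, (slope - z * se, slope + z * se)


# Hypothetical data: years of experience and average self-efficacy score per item
years = [1, 3, 5, 10, 15, 20, 25, 30]
score = [52, 58, 55, 63, 66, 64, 72, 75]
slope, (low, high) = slope_with_ci(years, score)
print(round(slope, 2), round(low, 2), round(high, 2))
```

The slope is read as the change in average score per additional unit of the predictor, which is how the per-year and per-patient associations above are expressed.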
With regard to usability, we used the questionnaire to estimate the effect of a training module on self-efficacy. Figure 3 illustrates the capacity of the questionnaire to capture an increase in perceived self-efficacy after the completion of a training session on HF (based on factors identified in the factor analysis, as explained in Step 2). The median time to complete the HF questionnaire both before and after the session (omitting the time to view the session itself) was 18 min (IQR: 7.29).

Figure 3. Longitudinal overview of average scores per item per factor across 3 different time points: before and immediately after an online training session, and after 6 months of follow-up.
Step 2—Questionnaire Refinement
After our item analysis revealed significant clustering, we were interested in whether we could remove superfluous questions and identify possible underlying factors, thereby improving the performance of our questionnaire in capturing a true self-efficacy construct and reducing time to completion.
Exploratory Factor Analysis
A scree plot analysis (Figure 4) revealed a strong scree effect with multiple factors having an eigenvalue higher than 1 (the cutoff for acceptability). After analyzing model fits for a 1-factor, 2-factor, 3-factor, and 4-factor exploratory factor analysis respectively, as well as the correlation plot, we hypothesized a 3-factor structure to be the most appropriate to explain possible internal constructs or subdomains within the item questionnaire. Item loadings for this 3-factor structure are depicted in Table 4.
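The eigenvalue criterion behind a scree plot can be sketched as follows. The example computes the eigenvalues of a small, hypothetical correlation matrix with a plain cyclic Jacobi iteration and counts how many reach the cutoff of 1. It is only an illustration of the principle, not the procedure used in this study; in practice a numerical library routine such as `numpy.linalg.eigvalsh` would normally be used.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the off-diagonal element a[p][q]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


# Hypothetical correlation matrix for 4 items forming two correlated pairs
corr = [
    [1.0, 0.7, 0.1, 0.1],
    [0.7, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
]
eigenvalues = jacobi_eigenvalues(corr)
retained = sum(1 for value in eigenvalues if value >= 1.0)
print([round(value, 3) for value in eigenvalues], retained)
```

Because the eigenvalues of a correlation matrix sum to the number of items, an eigenvalue of 1 corresponds to a factor that explains as much variance as a single item, which motivates the cutoff.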

Figure 4. Scree plot of the eigenvalues from an exploratory factor analysis with 10 hypothesized factors.
Table 4. Exploratory factor analysis with factor loadings on a three-factor structure.
Factor loadings higher than 0.5 are in bold font. Uniqueness is an expression of the leftover unexplained variance for an item, after removing the shared variance explained by the factors.
We hypothesized that these different factors could be translated to have the following meaning: HF diagnosis (Factor 3), HF treatment (Factor 1), and HF follow-up (Factor 2). Figure 5 visualizes each item's loading on the 3 factors.

Figure 5. Bar chart of factor loadings for each item on 3 hypothesized factors.
Items 1-3 loaded strongly on the “diagnosis” factor. Items 8-16 and 18 loaded strongly on the “treatment” factor. Items 6, 7, 17, and 19-24 loaded strongly on the “follow-up” factor. The items corresponding to the electronic health record (items 4 and 5) loaded weakly on the “follow-up” factor.
Confirmatory Factor Analysis
Based on the findings of our exploratory analysis, we investigated 3 different models to determine whether our hypothesized factor structure adequately captured the variability in scores (Table 5). The first was a 3-factor model without the “coding” and “parameters” queries, since these did not map strongly to any factor and were the least discriminating items. The second was a reduced 3-factor model that retained only queries with loading values of 0.5 or higher on the diagnosis factor and 0.6 or higher on the treatment and follow-up factors. This excluded 8 further items (salt, vaccination, ace, hfrefspecialist, hfrefiron, titration, stable, types), leaving 14 items in the questionnaire. The third was a version of this second model that included the five highest-scoring MI pathways. These indices seemed plausible: for example, in our second model, the highest-scoring MI was the correlation between the “acelabs” and “mralabs” questions, which was a valid MI to add.
Table 5. Confirmatory factor analysis and indices.
Abbreviations: df, degrees of freedom; CFI, comparative fit index; RMSEA, root mean square error of approximation; CI, confidence interval.
Using factor analysis, we could improve the CFI and RMSEA to values close to the recommended cut-offs of 0.95 and 0.06 for good model data fit.23
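The loading-based item reduction used for the second model can be sketched as a simple filter. The thresholds below are those reported above (0.5 for diagnosis, 0.6 for treatment and follow-up); the item names and loading values are hypothetical, and assigning each item to the factor on which it loads most strongly is an assumption made for this illustration.

```python
def select_items(loadings, thresholds):
    """Keep items whose strongest loading meets the threshold of that factor."""
    kept = {}
    for item, by_factor in loadings.items():
        # Assumption: an item belongs to the factor with its largest absolute loading
        factor = max(by_factor, key=lambda name: abs(by_factor[name]))
        if abs(by_factor[factor]) >= thresholds[factor]:
            kept[item] = factor
    return kept


thresholds = {"diagnosis": 0.5, "treatment": 0.6, "follow-up": 0.6}

# Hypothetical loadings for three hypothetical items
loadings = {
    "item_a": {"diagnosis": 0.55, "treatment": 0.10, "follow-up": 0.20},
    "item_b": {"diagnosis": 0.20, "treatment": 0.58, "follow-up": 0.30},
    "item_c": {"diagnosis": 0.05, "treatment": 0.25, "follow-up": 0.72},
}
print(select_items(loadings, thresholds))
```

Using a lower threshold for the diagnosis factor keeps a factor with few items from being emptied; the retained items and their factor assignment then define the reduced model that is tested in the confirmatory step.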
Discussion
We demonstrated the creation of an instrument for measuring the impact of online education on healthcare practitioners’ self-efficacy following four steps: a review of the literature, design, testing and refinement. This led to a questionnaire that was reliable, valid, and convenient to use.
To our knowledge, this is the first study that concretely demonstrates how to conceive and analyze a questionnaire to assess online education for healthcare practitioners within an implementation context. Our study differs in two respects from the other studies identified in our review that investigated the effects of online training on self-efficacy outcomes in healthcare practitioners. The first difference concerns data analysis. Only 3 such studies used analytic methods other than simple parametric or nonparametric null-hypothesis significance testing: Blackman et al used path analysis,24 Heard et al used mixed-methods ANOVA,25 and Leung et al used mixed methods modeling.26 None conducted item or factor analysis of their questionnaire to assess the discriminating value of items, their match with a hypothesized underlying construct, or their mapping onto meaningful factors. In our study, for example, this was particularly useful since we saw that our question items did not map clearly to our first list of 6 hypothesized factors, but rather to 3 different ones. We also saw that the items corresponding to accurate registration of HF in the electronic health record, an important topic for quality improvement,27 corresponded little to the general hypothesized “HF self-efficacy” construct. Such a finding can be useful within an implementation context since it suggests that GPs’ confidence in the achievement of goals in HF management stands separate from their confidence in accurately registering HF (or other chronic conditions). In this example, this might mean that adequate registration of HF and chronic disease should be the focus of a separate education and self-efficacy instrument. Second, all studies used Likert scales ranging from 3 to 10 points rather than a continuous numerical scale, which could impact results: scales with few response options are less sensitive and reliable due to the omission of differentiating information.28 A self-efficacy scale with multiple gradations of strength of self-efficacy is therefore a stronger predictor of performance than one with only a few choices.29
Since self-efficacy is an important contextual element in frameworks to evaluate implementation processes such as CFIR or PARIHS,30,31 our methods to develop a tailored questionnaire can be useful for other researchers. A first use concerns the analysis of the implementation process. Before implementation, a questionnaire can be used to identify possible targets for improvement or education. Questionnaire analysis can also be useful during the implementation itself. For example, in our case of HF, it could be useful to assess the adoption of new guidelines concerning the use of SGLT2 inhibitors in HF by adding a new item to the questionnaire. One would then initially expect a low level of consistency and factor loading. Longitudinal follow-up within the context of a quality improvement initiative aimed at stimulating guideline-recommended care might then demonstrate a gradual convergence and improved coefficient alpha and factor loading on the “treatment” domain. Such a longitudinal follow-up of self-efficacy scores can then also be used as an outcome measure for the program's impact. A second use concerns the translation of research findings. Since implementation research has a cyclical nature, a preliminary questionnaire such as ours can be used in other programs to improve heart failure management by general practitioners as well. A first objective could be to validate the questionnaire by investigating whether the observed item analysis and factor structure generalize to other general practitioner populations.
We conclude with a final note on the convenience of using the described methodology. We believe these results to be heartening to other researchers (in primary care or other specializations), since both the method of design20 and the method of analysis21 consist of a simple and well-described set of principles and are grounded in a subjective assessment of significance by a team of researchers and health professionals who are most acquainted with routine clinical practice. In our view, this results in a powerful combination of flexibility and precision.
Strengths and Limitations
Our study has several strengths. First, we designed our questionnaire after a systematic literature review and tested it in a large and nationally representative sample of general practitioners. Second, we used and reported a method of data analysis that is original in research surrounding online education and self-efficacy in healthcare practitioners. Our research has three limitations. First, we did not translate and test our questionnaire outside Belgium. However, the goal was to obtain a tool to measure local implementation rather than to obtain an internationally validated questionnaire. This also limits the relevance of the relatively large fraction of GP trainees in our sample. Belgian GP trainees follow a 3-year postgraduate program32 and largely operate as autonomous physicians.33 They are therefore also important actors to consider in health implementation processes. Second, our question items sometimes straddle the gray area between the concepts of self-efficacy and perceived competence, wherein self-efficacy is defined as a person's judgment of his or her capabilities to organize and execute the courses of action required to attain designated types of performance, and perceived competence as the extent to which a person feels he or she has the necessary attributes to succeed.34 However, the extent to which these dimensions overlap is unclear,35,36 and the nuance between the 2 concepts has little relevance in the context of our research. Third, we did not perform any sample size or power analysis, since our aim was not to assess a specific effect of online education, but rather to analyze the appropriateness of our questionnaire item structure.
Conclusion
We demonstrated the creation of an instrument for measuring the impact of online education on healthcare practitioners’ self-efficacy following four steps: a review of the literature, design, testing, and refinement. This led to a questionnaire that was reliable, valid, and convenient to use in an implementation context.
Supplemental Material
Supplemental Files 1 to 4 (sj-docx-1-mde-10.1177_23821205241232497 to sj-docx-4-mde-10.1177_23821205241232497): supplemental material for How to Evaluate Online Education for General Practitioners: Development of a Tailored Questionnaire for Heart Failure Education by Willem Raat, Evelyne Housiaux, Miek Smeets, Stefan Janssens, Birgitte Schoenmakers and Bert Vaes in Journal of Medical Education and Curricular Development
Footnotes
FUNDING
The authors received no financial support for the research, authorship, and/or publication of this article.
DECLARATION OF CONFLICTING INTERESTS
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Willem Raat, Evelyne Housiaux, Miek Smeets, Birgitte Schoenmakers, and Bert Vaes report no conflict of interest. S. Janssens is the holder of a named chair in Cardiology at the University of Leuven financed by AstraZeneca.
Supplemental material
Supplemental material for this article is available online.
