Abstract
Objective
The aim of the present study was to psychometrically test and validate the Well-being Coaching Inventory (WCI), a proposed measure of interconnected, whole-person well-being in the context of health and wellness coaching (HWC).
Methods
Initially 49 items, the WCI was conceived with 4 dimensions: Mind, Body, Work, and Life. The inventory was evaluated in 3 sequential studies to test: (a) face validity, (b) convergent validity, and (c) predictive validity. Expert judgment, correlational analyses, and factor analyses were techniques applied to collected WCI data.
Results
After statistical evaluation (n = 261) of fit to each dimension, the WCI was shortened to 20 items that demonstrated convergent validity. Further use of confirmatory factor analyses and exploratory structural equation model in a large sample study (n = 531) provided additional support for the inventory’s convergent validity. Through correlation analyses to theoretically related concepts predictive validity was established.
Conclusions
The WCI is a valid, applicable, and reliable scale for use in HWC research and practice. It is an instrument that will aid HWC practitioners and researchers as a central outcome measure for their practice.
Keywords
“The WCI query of love, hope, meaning, gratitude, and compassion is considered to embody an expression of spirituality.”
Introduction
Health and well-being coaching (HWC) is an increasingly adopted health care intervention demonstrating beneficial results in a variety of patient populations. The National Board for Health and Wellness Coaching (NBHWC) defines the role of the coach as one who “supports clients in activating internal strengths and external resources to make sustainable and healthy lifestyle behavior changes.” 1 Since 2017, NBHWC has certified over 10,000 HWC professionals. 2 In 2019, the American Medical Association (AMA) approved current procedural terminology (CPT) codes for HWC services. Moreover, in 2024 HWC services were added to the Medicare Telehealth Services list by the Centers for Medicare and Medicaid Services (CMS).
Examining collected HWC research, 3 HWC is effective at improving a wide diversity of health outcomes. These are usually patient-related, with body weight, blood pressure, and hemoglobin A1C measurements typically cited for obese, hypertensive, and diabetic patients, respectively. However, the coaching process impacts more than biometrics, potentially extending to psychological variables and many aspects of work-life. A good example of far-reaching coaching effects is seen in studies of clinician burnout.4-6 McGonagle et al. showed in 50 primary care physicians that 6 coaching sessions over 3 months significantly improved burnout scores, work engagement and psychological capital (i.e., resilience, hope, optimism, but not efficacy). 4 The authors recommended organizations make HWC available to promote clinician well-being. Outcomes from this study well-defined clinician burnout and HWC effects on important work-related functions. However, the beneficial effects of coaching on other critical aspects of work and life which impact health and performance, beyond burnout and psychological capital, were not as easily captured.
There is a growing interest in a whole-person approach to health and well-being that includes mind and body functioning at work as well as in personal life. 7 HWC impacts whole-person health and well-being but to date, there are no tools developed to measure these potential wide-reaching coaching effects. Specifically, an instrument is needed to target health and performance impact variables, that improve internal strengths and external resources of the whole client or patient, connected to both work and personal life.
The Well-being Coaching Inventory (WCI) was created to support coaching an interconnected whole person while addressing critical determinants of well-being that affect health and performance. The intention was to build an HWC-related assessment, distinct from other health risk or healthy lifestyle assessments. Leveraging a team of HWC experts and more than 30 recognized well-being measures, the WCI was developed as a set of 49 questions. The inventory is made up of 4 dimensions of personal health and well-being (Mind, Body, Work, Life) querying 47 well-established constructs. Note, the term “life” is not meant to subsume the totality of life but instead the “life” subdimension refers to one’s personal life, outside work or school. The complete structure of the WCI is illustrated in Figure 1. The WCI explores issues related to psychosocial theories, such as emotional and social intelligence, self-determination, mindfulness, and self-efficacy. Individually, these constructs are well-researched and most have an independent and validated measurement process. Collectively, they form an original group of HWC-related concepts highly connected to work well-being and overall well-being. The 47 psychological constructs addressed by the WCI are keys to one’s thriving and flourishing. As such, the WCI allows a much wider range of coaching benefits to be inspected than other questionnaires not specifically designed for HWC. Greater detail on how the WCI was created can be found in the Methods section. WCI Subdimensions and constructs.
The purpose of the present study was to test the 49 WCI items psychometrically and to establish various forms of instrument validity. If validated, the WCI can provide a psychometrically sound outcome measure, which captures coaching-specific effects more fully than any single existing measure. The WCI would serve to complement patient-specific outcomes (e.g., BP in hypertension) as a central measure in HWC research while also providing a useful tool for application by coaching professionals.
Methods
Well-Being Coaching Inventory (WCI)
The WCI was created by leveraging a core team (MM, EJ, RH) of HWC experts with more than 50 years of collective experience in HWC research, research translation, education, and clinical application. Before deciding to build a HWC inventory, the team conducted an extensive review of over 30 existing surveys with relevance to HWC and well-being. These surveys represented a diversity of approaches: government-developed or financed surveys (from countries including the United States, Canada, and Denmark), surveys developed by academics, and surveys developed by organizations. Some surveys addressed a particular construct (e.g., burnout, positivity, resilience, self-actualization, flourishing, job crafting) while others more broadly assessed health and well-being. None of these questionnaires, however, were specifically designed to assess HWC, as is the case when these constructs are brought together in the WCI.
Next, a group of experienced coaches and researchers were recruited to review the surveys collected and provide feedback on relevance to HWC. Considering this input, the core team concluded that no single existing survey fit HWC needs, and none were specifically designed for HWC interventions. It was determined that a new instrument with specific relevance to HWC needed to be developed, while drawing on the plethora of existing validated assessments. The core team initially developed a set of 49 questions with items expected to be highly relevant to HWC processes. This original version of the WCI can be accessed at https://survey.alchemer.com/s3/7971712/Wellcoaches-Well-being-Inventory.
The steps undertaken to test the psychometric features and validate the WCI in 3 sequential studies are described below. The overall strategy throughout the process was guided by recommendations from the psychometric literature.8,9 The STROBE guidelines for observational studies were used for the outlined projects. All studies were reviewed and approved by the Institutional Review Board at St. Francis Xavier University and all participants provided informed consent.
Three types of validity were investigated: (a) Face validity, or the measure’s ability to capture the true essence of the construct under examination. Expert judgments and ratings are commonly used to assess this type of validity, 10 which was done in Study 1. (b) Convergent validity, which is concerned with the selection of items that best represent an unobservable (perceived) variable. Correlation analysis and factor analytical techniques (e.g., confirmatory factor analysis, exploratory structural equation modeling), which were used in Studies 2 and 3, are the preferred choice to establish convergent validity. 10 (c) Predictive validity, or the degree to which the created scale is associated with other measures of different constructs, according to theory. These associations are commonly tested via correlation analysis, which is outlined in Study 3. 10
Study 1
Objective
The primary objective of Study 1 was to establish face validity of the 49 developed items of the WCI.
Participants and Procedure
To obtain the perspective of experts, email invitations were sent to experienced, credentialed coaches: 1. Health and well-being coaches trained and certified by Wellcoaches Corporation, a coaching school of health professionals. 2. International Coach Federation-certified coaches contracted by AceUp, a leadership coaching organization. In total, 52 (female n = 38; male n = 13, non-binary = 1, Did not report = 1) coaches viewed an online version of the WCI via Qualtrics.
On average the participants were 56.18 ± 7.43 years old. The majority of the responding participants were white (n = 41), followed by black (n = 3), Asian (n = 2), and other ethnicity (n = 3). The majority held a Master’s (n = 25) or doctoral degree (n = 18), as the highest degree earned, followed by a bachelor’s (n = 7) or a high-school degree (n = 2). Fifty participants practiced in the United States, spanning over 21 states. One coach practiced in Canada and one in Australia. On average, the participants had 11.65 ± 6.64 years of HWC experience.
Measures
The WCI consisted of 49 statements (e.g., “I handle setbacks as learning opportunities.”), comprising physical, mental, work, and life subscales. The participants were presented with each of the 49 statements and asked to rate them on single-item measures reflecting applicability (i.e., “This statement is applicable”), clarity (i.e., “This statement is clear”), and readability (i.e., “This statement is readable”). Each of the items was measured with a Likert scale, ranging from 1—Strongly disagree to 5—Strongly agree. After each statement, participants were invited to provide comments as to how the statement could be improved. At the end of the survey, an open-ended box for comments or suggestions for further inclusions was provided.
Statistical Analysis
Descriptive analysis (i.e., calculation of means and standard deviations) was conducted for each item (i.e., applicability, clarity, readability). In addition, the number of participants rating a particular item at 3 (neutral point on the Likert scale) or below was counted. Qualitative comments on rewording or addition of items were analyzed by frequency and content applicability.
Results & Discussion
Expert Ratings.
Items and Modifications by Studies.
Once the wording of all items was finalized, the research team discussed whether further face validity efforts were necessary. All team members agreed that no further changes to the content of the items were necessary, and the next phase of validation began.
Study 2
Objectives
The aim of Study 2 was the establishment of convergent validity of the WCI.
Participants and Procedure
An online survey was sent to practicing health and well-being coaches to forward to their clients. The survey was also shared with students enrolled in Wellcoaches courses to become a health and well-being coach. A total of 261 participants responded to the survey (161 clients and 100 students). The average age of participants was 50.6 ± 12.5 years old. The majority (n = 210, 78.1%) were female, while 47 (17.5%) identified as male, and 4 (1.5%) as non-binary. About 3 in 4 participants were white (n = 205, 78.8%), followed by black (n = 18, 6.7%), Asian (n = 12, 4.5%), Hispanic (n = 9, 3.3%) and multiple/other (n = 16, 6.1%). The respondents were mostly located in the United States (n = 238, 91.2%) and Canada (n = 12, 4.6%). No significant differences in demographic factors between client and student participants were detected.
Measures
As described above, the WCI with revised items was administered. Each item was measured on an 11-point Likert scale, ranging from 0—Never to 10—Always.
Statistical Analysis
A three-step approach to evaluating the items was applied. First, measures of central tendencies (item means and standard deviations), indicators of normality (skewness and kurtosis), along with frequency plots were evaluated. The goal of the evaluation was to check for non-normality and items reaching a potential ceiling effect. Non-normality was assumed if items had exceeded skewness values of 3 and kurtosis values of 10. 11 Items with a mean score of 9 out of 10 were considered problematic for a potential ceiling effect and removal was recommended.
The second step entailed the calculation of inter-item correlation to check for variance overlap between items. Items with a correlation over .8 were flagged for discussion. 12 In addition, corrected item-total correlations were conducted to evaluate the fit of the item with each intended dimension. Cut-off values for corrected item-total correlations are discussed in the literature. A conservative cut-off of .5 was set for the study to discuss removal of an item. 13
The last step included the use of factor analytical techniques to evaluate the theoretical structure of the inventory. Because a theoretical structure (dimensions and respective items) had been established for the present questionnaire by the core HWC team, confirmatory factor analysis (CFA) was chosen. 9 In addition, Exploratory Structural Equation Modelling (ESEM) was calculated as it permits the cross-loadings of items on several factors, as opposed to CFA, which forces cross-loadings to be 0. 14
Often CFAs are considered too stringent, whereas ESEM applies a less restrictive approach to estimations of model fit. Yet, ESEM and CFA should not be considered an either/or approach in preliminary analysis of items. Rather, “researchers should compare ESEM and CFA measurement models based on the constructs to be considered” (
Following established recommendations, individual items with factor loadings below .60 and/or modification indices over 10 from the CFA were considered for potential removal.19,20 Finally, the model fit for both CFA and ESEM were assessed using the following criteria: χ2/df ratio (acceptable fit 2-3, good fit <2) 21 , Incremental Fit Index (IFI, acceptable fit>.90, good fit >.95), 22 Non-normed Fit Index (NNFI or Tucker Lewis Index, acceptable fit >.90, good fit >.95), 23 Comparative Fit Index (CFI, acceptable fit >.90, good fit >.95), 23 and Root Mean Square Error of Approximation (RMSEA, acceptable fit .05-.08, good fit 0.00-0.05). 21 Reliability was examined using Cronbach’s α. All analyses were conducted in JASP 0.18.3 and MPlus 8.2.
Results & Discussion
Item Descriptives, Item-Total Correlations, and Factor Loadings – Study 2.
Next, a CFA was conducted with the remaining 41 items. The overall fit of the model was unsatisfactory (χ2/df ratio = 8.82, IFI = .81, NNFI = .80, CFI = .81, RMSEA = .09 95% CI = .08 - .09). Hence, the factor loadings and modification indices of each item were evaluated. This led to the removal of 2 further items (items 10 and 15) due to low factor loadings and 19 items (items 2, 4, 8, 11, 12, 13, 14, 19, 22, 23, 28, 29, 30, 31, 32, 33, 35, 37, and 38) due to high modification indices. In sum, 21 items were removed in this step.
While the reduction stage is intended to reduce the number of items, it should be noted that the removal of items based on modification indices is discussed in the literature and should only be done with theoretical justification. 13 As such, each item was evaluated carefully with an eye on content overlap to the items that were retained. In particular, the research team examined the semantic and theoretical closeness of 2 items before removal. Additionally, we calculated the correlation between the dimension scores with all 49 items and the scale without the removed items, following recommendations by Smith and colleagues. 24 The correlations between the dimensions exceeded .92, indicating that at least 83% of the variance of subscales was retained after item removal. The theoretical considerations and the evidence from the correlation provide support for the decision to remove the items due to high modification indices.
The remaining 20 items were subjected to another CFA and ESEM. The model showed a good fit of the data (χ2/df ratio = 1.21, IFI = .94, NNFI = .91, CFI = .94, RMSEA = .06 95% CI = .05 - .07). All items loaded well on their respective dimensions (>.57). The correlations between the dimensions ranged from .33 to .64, indicating sufficient uniqueness (shared variance between dimensions <42%) of each dimension. The exploratory structural equation model showed an excellent model fit of the data (χ2/df ratio = 1.54, NNFI = .96, CFI = .98, RMSEA = .05 95% CI = .03 - .06). Cross-loadings of items were minimal (<.2). The Cronbach’s α for each dimension ranged from .77 to .90, indicating satisfactory internal consistency.
The objective of the second study was to provide evidence for convergent validity, by selecting items that best reflect the unobservable, perceived construct. Reducing items should be done with caution, balancing brevity and time efficiency of the potential administration of the inventory with theoretical depth of content. In essence, it should be brief enough for time-efficient administration yet cover the fundamental aspects of the theoretical underpinnings of the phenomenon under investigation.
By reducing the number of items from 49 to 20, the scale is much shorter and will be easier to use, however, through theoretical discussions and statistical evidence face validity was preserved. In addition, the results from the CFA and ESEM showed satisfactory evidence for the factorial structure of the 20-item version of the scale, supporting its convergent validity. As such, we proceeded to the final phase of the validation process.
Study 3
Objectives
Confirming the convergent validity of the 20-item version of the WCI and testing the predictive validity of the scale.
Participants and Procedure
An online survey was shared broadly to include subscribers to a monthly Wellcoaches newsletter and on social media. The aim was to reach participants between 25 and 65 years of age and to capture a large sample of the full-time working population. A total of 531 participants responded to the survey. The average age of the respondents was 49.8 ± 10.6 years old. The majority (n = 473, 89.1%) were female, while 52 (9.8%) identified as male, and 4 (.8%) as non-binary. Over 80% of the participants were white (n = 430, 81.0%), followed by black (n = 19, 3.6%), Asian (n = 28, 5.3%), Hispanic (n = 18, 3.4%) and multiple/other (n = 32, 6%). The respondents were mostly located in the United States (n = 484, 91.1%) and Canada (n = 12, 2.3%). Most of the participants completed a Bachelor’s (n = 181, 34.2%), Master’s (n = 258, 48.6%), or Doctoral (n = 65, 12.2%) degree, while 25 participants held a high-school degree (4.7%). About 2 out of 3 participants were working full-time (n = 340), while 102 (19.4%) worked part-time and 18 (3.4%) were unemployed.
Measures: Well-being
Item Descriptives, Item-Total Correlations, and Factor Loadings – Study 3.
Life satisfaction
Life satisfaction was measured with the Satisfaction with Life Scale (SWLS) by Diener et al. 25 This five-item scale (example item: In most ways my life is close to my ideal), is measured on a 7-point Likert Scale ranging from 1—Strongly disagree to 7—Strongly agree. The SWLS is a widely used instrument with evidence of good reliability and validity. 26 In the present study, an excellent internal consistency (Cronbach’s α = .91) was detected.
Depression
The PHQ-8 27 is an 8-item scale which measures the prevalence of depressive symptoms over the past 2 weeks. It derived from a nine-item scale (PHQ-9), however, the removal of 1 item for brevity does not affect its clinical sensitivity and specificity. 28 On each item, the participants rate the frequency of the symptom, ranging from 0—Not at all to 3—Nearly every day (example item: Little interest or pleasure in doing things). The total score is the sum of all items. The PHQ-8 has demonstrated excellent reliability and validity. 29 In the present study, a satisfactory internal consistency (Cronbach’s alpha = .84) was detected.
Perceived Stress
Stress was measured with the Perceived Stress Measure (PSM-9), 30 a 9-item measure. The participants rated their symptoms of stress over the past 4 to 5 days. Each item (example item: I feel stressed) is measured on an 8-point Likert scale ranging from 1—Not at all to 8—Extremely. Evidence supports the reliability and validity of the PSM-9 as a measure of stress. 31 Accordingly, the PSM-9 showed satisfactory internal consistency (Cronbach’s α = .88) in the present study.
Statistical Analysis
To provide further support for the convergent validity and factorial structure of the 20-item version of the inventory from Study 2, the data was subjected to CFA and ESEM. The same goodness-of-fit indices as in Study 2 were used.
To provide evidence of predictive validity, measures of life satisfaction, depression, and stress were collected from the participants. Conceptually, life satisfaction and well-being should be positively related, 32 while depression 33 and perceived stress 34 should be negatively associated with well-being. Pearson correlation coefficients were calculated to test the relationships between the subdimensions of the WCI and life satisfaction, depression, and perceived stress.
Results & Discussion
All items were sufficiently distributed for parametric analyses (Kurtosis <4, Skewness <2). All item descriptives are summarized in Table 4. The model showed a satisfactory fit of the data (χ2/df ratio = 3.29, IFI = .92, NNFI = .91, CFI = .92, RMSEA = .07 95% CI = .06 - .07). All items loaded well on their respective dimensions (>.52). The correlations between the dimensions ranged from .43 to .61, indicating sufficient uniqueness (shared variance between dimensions <38%) of each dimension. The structural equation model showed a good model fit of the data (χ2/df ratio = 2.81, NNFI = .93, CFI = .96, RMSEA = .06 95% CI = .05 - .07).
Goodness of Fit Indices for Study 2 and Study 3.
Correlations Between Constructs.
Note: All correlations
General Discussion
The purpose of the 3 outlined studies was to provide psychometric testing and validation of the WCI, a proposed measure of whole-person well-being in the context of HWC. The first study outlined procedures for creating a measure with items that are readable, applicable, and understandable while reflecting the essence of well-being. This study’s results support the face validity of the WCI.
The second study included the statistical evaluation of the items and an examination of the fit to each dimension. This step resulted in a reduction of items from 49 to 20. The data showed evidence supporting convergent validity of the shortened scale. Using a large sample (N > 500), the last study provided more evidence for convergent validity (i.e., through the confirmation of the factorial structure of the previous step) and further supported the predictive validity of the scale, as anticipated correlations to other concepts (i.e., depression, stress, and life satisfaction) were found. Therefore, the WCI is a valid, applicable, and reliable scale for use in HWC research and practice.
A large heterogeneity in HWC outcome measures may be attributable to a lack of measures specifically designed for HWC. 35 Researchers in HWC are generally concerned with using valid and reliable measures of central outcomes of HWC interventions. Given the results of our study, it is recommended that researchers broadly adopt the 20-item version of the WCI as a key outcome measure in future HWC trials. The WCI serves as the first specifically designed HWC core outcome measure. Large data collection on the WCI would enable comparison of HWC effect sizes among different populations (e.g., obesity, cancer) and trial designs (e.g., length, frequency). In addition, meta-analytic evidence of HWC would benefit from less heterogeneity in outcome measures, as a core measure improves the comparability between studies. As such, the WCI will help further our understanding of the benefits of HWC.
From a practice perspective, health and well-being coaches are advised to consider using the 20-item version WCI or the longer 49-item version depending on their purpose. As a measurement tool, the 20-item version is recommended as a reliable and valid instrument that enables the comparison between subgroups and/or over time. For example, to simply determine the effectiveness of HWC services use the shorter WCI. In addition, if practitioners seek to compare measures among or between patients, the shortened version is again best used. However, the 49-item version of the WCI can be used as a powerful coaching tool. When applied as part of a patient or client’s intake process and later as part of their evaluation process, the longer WCI may be useful and preferred over the shorter version.
With a greater number of HWC relevant constructs in the 49-item version, the coach may find a topic that is salient to their client’s path to better well-being using the longer WCI. Coaches should consider the longer version of the WCI if they have the time and prefer greater depth of information for and from their clients. If the 49-item WCI is completed, only the scoring of the 20-item version of the scale is recommended for longitudinal tracking, practical comparison, and research purposes. Therefore, the WCI can be applied to an individual thinking about their own well-being, or as a population evaluation of well-being. The WCI is openly available online https://survey.alchemer.com/s3/7971712/Wellcoaches-Well-being-Inventory. Results are emailed to everyone completing the inventory, including average overall score, based on the core 20 items and average score in each subdimension. Results also include suggestions for where the individual is doing well, needs improvement, or may be near burnout (see Figure 2). For research purposes, the collection of large datasets for normative data would deliver benchmark scores for individual and collective comparisons. Such research could also inform the design of future interventional studies which may employ the WCI, as it provides comparative values for the assessment of well-being levels in potential participants. WCI sample results summary.
The results of the present series of studies should be interpreted considering several limitations. First, the WCI was developed by a team of researchers who bring extensive experience in HWC and questionnaire development. However, the team also lacked diversity, which is a limitation as bias in the creation of the items might have been present. The series of studies recruited more than 80% of its participants from a non-client population to ensure feasibility. Further validation efforts should test the validity and reliability of the WCI in a larger client population. In addition, the samples from all studies were predominantly working age, female, educated, and white. While we consider the WCI to be generalizable, additional testing in males, older (e.g., above 65) and younger (e.g., university students) people, diverse and minoritized groups, and less educated individuals may be desirable. In addition, it may be desirable to test the reliability and validity of the scale in countries outside of North America. In this context, it may be important to adapt the WCI-Work subscale to be relevant to school-age education or retirement activities. In addition, the longitudinal stability of the WCI should also be examined. Furthermore, future research should explore the relationship to other related constructs (e.g., thriving), which was beyond the scope of the present study. Finally, the WCI query of love, hope, meaning, gratitude, and compassion is considered to embody an expression of spirituality. However, spirituality, as related to religious practices or an independent concept, was not questioned in the WCI. This is a limitation given the important potential implications of religious practices for some groups, cultures, and individuals.
In summary, the WCI is a valid, applicable, and reliable tool to measure well-being as a core HWC outcome variable. The WCI is also a coaching tool allowing clients to identify relevant categories/items addressable during the coaching process to encourage individual thriving and flourishing. The inventory is available free of charge and can be found in the appendix to this study. Researchers are encouraged to use the WCI in all HWC studies while practitioners should use the WCI widely as a coaching tool and outcome measure.
Supplemental Material
Supplemental Material - The Well-Being Coaching Inventory (WCI): Questionnaire Development and Validation
Supplemental Material for The Well-Being Coaching Inventory (WCI): Questionnaire Development and Validation by Sebastian Harenberg, Gary Sforzo, Rosie Hunter, Erika Jackson, and Margaret Moore in American Journal of Lifestyle Medicine
Footnotes
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: SH and GS are consultants with Wellcoaches Corporation, EJ and MM are full-time employees of Wellcoaches Corporation, RH is an employee of Wellcoaches Corporation.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Wellcoaches Corporation which provided funding support for this study. SH received consulting fees from Wellcoaches for this study. SH holds funding through the Social Sciences and Health Research Council in Canada.
Ethical Statement
Supplemental Material
Supplemental material for this article is available online.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
