Abstract
Background
Mindfulness meditation is ubiquitous in health care, education, and communities at large. Mindfulness-Based Interventions (MBIs) are the focus of hundreds of NIH-funded trials given the myriad health benefits associated with this practice across multiple populations. Nevertheless, significant gaps exist in how mindfulness concepts are measured using currently available self-report instruments. Because the many available mindfulness measurement tools each measure different aspects, it is difficult to determine the extent to which individuals develop comparable mindfulness skills and attitudes and which health benefits can be attributed to which gains in mindfulness. The Patient-Reported Outcomes Measurement Information System (PROMIS®) has established a rigorous instrument development methodology to create brief, precise, and clinically relevant outcomes tools.
Objective
This is the first of 4 papers representing an NCCIH-funded initiative (R01AT009539), which has applied PROMIS® instrument development methodologies to mindfulness measurement to improve the rigor, relevance, and reproducibility of MBI research results.
Methods/Results
This introductory paper sets the stage for why improved mindfulness measurement tools are needed and briefly describes the PROMIS® development approach. The second and third papers highlight results from a national survey, focus groups, and expert interviews to identify and organize relevant mindfulness concepts, domains, and items for eventual item bank creation. The fourth paper reviews the item writing and development process of these new instruments, including results from stakeholder cognitive interviews and a translatability review.
Conclusion
Together these papers feature the rigorous development approach, rationale, logic, and significance that support the development, calibration, and validation of new PROMIS® measures of mindfulness and related concepts.
Keywords
Introduction
Although the practice of mindfulness 1 has a historical presence of more than 2500 years, 2 mindfulness-based interventions (MBIs) were first introduced into American medical settings less than 40 years ago. At that time, Dr Jon Kabat-Zinn, a researcher from the University of Massachusetts, created a structured group-based behavioral medicine intervention called Mindfulness-Based Stress Reduction (MBSR).3,4 This intervention offered patients with chronic illness intensive training in mindfulness meditation that was anchored in Buddhist teachings but taught in a secular manner consistent with Western adult learning styles. 5 Currently, MBSR is taught worldwide, as are similar evidence-based MBIs and related interventions such as Mindfulness-Based Cognitive Therapy (MBCT), 6 Dialectical Behavior Therapy (DBT), 7 and Acceptance and Commitment Therapy (ACT). 8 Research evidence for the efficacy of MBSR and other MBIs for improving health is mounting, particularly in the areas of chronic pain reduction,9,10 improvement in quality of life,11-13 and reduction of symptoms of depression and anxiety.14,15 Many researchers are investigating biological mechanisms by which MBIs influence health. For example, neuroendocrine,16,17 inflammatory,18,19 and neural pathways20,21 appear to be altered following MBIs. However, to fully characterize the active mechanisms of MBIs, it is essential to accurately measure participants’ assimilation of mindfulness skills, attitudes, and behaviors, and their association with health outcomes.
Challenges and Limitations of Self-Reported Mindfulness Measures
It is difficult to gauge effects and mechanisms of MBIs across studies due to the variety of measures currently assessing mindfulness. 22 As a construct, mindfulness is understood as multidimensional. While its dimensions are not universally agreed upon by researchers, teachers, and practitioners, many mindfulness measures assess concepts such as present moment focus; observation of one’s bodily sensations, emotions, and thoughts; and a non-judgmental and compassionate stance toward oneself and others. Currently there are more than 12 different mindfulness measurement tools, each one assessing different aspects of these dimensions in different ways. 23 This seriously limits the comparison of similar constructs across studies, and perpetuates a patient-reported outcome (PRO) Tower of Babel, where each measure speaks its own outcomes language, but none understands the other. 24
Further, there is no single gold standard to measure mindfulness. Mindfulness researchers have created a variety of questionnaires to quantify purported dimensions of mindfulness, such as non-judgment, observing, describing, awareness, attention, and acceptance. Although Kabat-Zinn operationally defined mindfulness simply, as “moment-to-moment, nonjudgmental awareness”, 25 the construct is complex, multidimensional, and challenging to describe, with important phenomena occurring at pre-, peri-, and post-awareness levels. A recent meta-analysis of the effects of mindfulness training on self-reported mindfulness found small to moderate effects for some, but not all, dimensions, and results were inconsistent across studies. 26 This lack of agreement across dimensions of mindfulness has led to a plethora of measures that overlap and differ slightly in domains, yet may exhibit only modest associations with one another 27 (see Figure 1).
Figure 1. Overlapping domains of mindfulness constructs from existing measures.
In addition to the lack of standardization, existing mindfulness measures have several methodologic and psychometric limitations. One limitation is the lack of theoretical conceptualization of mindfulness. 27 Existing measures differ in the extent to which they are based upon classical Buddhist descriptions of mindfulness (eg, Freiburg Mindfulness Inventory 28 ) or operationalized based on skills thought to be developed through contemporary behavioral interventions such as Dialectical Behavior Therapy (eg, Kentucky Inventory of Mindfulness Skills 29 ).
Another critique is the relative inability of existing mindfulness measures to demonstrate individual item level discrimination and information parameters, including whether certain items function differently because of membership in particular groups (referred to as differential item functioning or DIF). Item response theory (IRT), a modern measurement development approach, offers a unique window into the item-level performance characteristics of individual items in relation to the construct being measured. 30 For example, a recent investigation of the Mindful Attention Awareness Scale (MAAS) using IRT demonstrated that 10 of the 15 items did not differentiate respondents at varying levels of trait mindfulness (ie, these items provided little useful information). 31 Also, even the 5 best-performing items were not able to differentiate high from very high levels of mindfulness, nor low from very low levels.
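To make the notions of item discrimination and item information concrete, the following is a minimal sketch under stated assumptions. It uses the two-parameter logistic (2PL) model for a dichotomous item purely for brevity; the cited MAAS analysis and rating-scale items in general would call for a polytomous model, and the parameter values and item names here are hypothetical rather than estimates from any real instrument.

```python
import math

def p_endorse(theta, a, b):
    """2PL probability of endorsing an item at trait level theta.

    a = discrimination, b = difficulty (location) on the trait scale.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P)."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)

# Hypothetical items located at the same point on the trait scale:
# one discriminates sharply, the other barely at all.
sharp_item = {"a": 2.0, "b": 0.0}
flat_item = {"a": 0.4, "b": 0.0}

for theta in (-2.0, 0.0, 2.0):
    print(
        theta,
        round(item_information(theta, **sharp_item), 3),
        round(item_information(theta, **flat_item), 3),
    )
```

In this sketch an item with low discrimination contributes little information anywhere on the scale, and even a highly discriminating item is informative mainly near its own location, which is one way to picture why a scale may fail to separate high from very high trait levels.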
Another IRT investigation of the popular Five Facet Mindfulness Questionnaire (FFMQ) 32 found response bias, or DIF, between meditators and non-meditators. DIF occurs when persons who are at the same level on the trait measured (eg, mindfulness) have a different probability of endorsing a particular response on an individual item based upon their membership in a particular group, such as being a meditator or non-meditator. For an item to perform well, it should be invariant across group affiliations. The response bias appeared to be related to the wording of questions: on negatively worded items that showed DIF, meditators scored lower in mindfulness than non-meditators who were comparable on their overall FFMQ score.
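The definition of DIF can be illustrated with a small sketch, again assuming a dichotomous 2PL item for simplicity. Here uniform DIF is represented as a difference in the item's difficulty parameter between a reference and a focal group; the function names and parameter values are hypothetical and are not taken from the FFMQ study.

```python
import math

def p_endorse(theta, a, b):
    """2PL probability of endorsing an item at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def uniform_dif_gap(theta, a, b_reference, b_focal):
    """Gap in endorsement probability between two groups at the SAME theta.

    A nonzero gap means the item behaves differently by group membership
    even though both respondents have the same level of the trait.
    """
    return p_endorse(theta, a, b_reference) - p_endorse(theta, a, b_focal)

theta = 0.0  # both respondents share the same trait level

# Invariant item: identical parameters in both groups, so no gap.
invariant_gap = uniform_dif_gap(theta, a=1.5, b_reference=0.0, b_focal=0.0)

# Item with uniform DIF: harder to endorse for the focal group.
dif_gap = uniform_dif_gap(theta, a=1.5, b_reference=0.0, b_focal=0.5)

print(invariant_gap, round(dif_gap, 3))
```

In practice DIF is detected statistically from response data rather than read off known parameters; the sketch only shows what the property of invariance means for a single item.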
Another potential challenge lies in a measurement phenomenon called response shift, 33 which has received very little, if any, empirical attention in mindfulness measurement, despite being frequently observed anecdotally. Response shift occurs when there has been a change in the meaning of a person’s self-evaluation of a measured construct, such as a self-appraisal of how mindful one is. This change usually follows a recalibration of a respondent’s internal standards, a reprioritization of their values, or a reconceptualization of the construct itself. 34 For example, it is not uncommon for mindfulness-naïve individuals at the start of a mindfulness course to overestimate their self-reported mindfulness. However, once they experience some of the challenges inherent in the practice, including gaining a greater awareness or appreciation for how unmindful they may have been, this newfound understanding and humility may lead to a downward shift in their responses at a follow-up assessment, so that mindfulness scores are lower than at baseline even though they have likely become more mindful through the training. 35 A final challenge relates to ceiling and floor issues in measurement, whereby measures are relatively too easy or too difficult, resulting in large clusters of respondents at the same level. For example, Morone and colleagues 36 found that older adults aged 65+ who were naïve to meditation endorsed levels of mindfulness on the MAAS similar to norms for experienced meditators, both pre- and post-MBSR. This may be due to increased mindfulness with age, or the inability of the tests to distinguish the highest levels of mindfulness.
Learning From the Patient-Reported Outcomes Measurement Information System (PROMIS®)
Measurement development methodologies from the Patient-Reported Outcomes Measurement Information System (PROMIS®) hold great potential to improve and standardize self-reported measurement of mindfulness and related concepts. PROMIS® is a product of a multi-year cooperative agreement between the NIH and several research institutions and academic medical centers to help build a technological infrastructure that supports the work of NIH-funded clinical investigators across Institutes, disciplines, diseases, and subpopulations. 1 PROMIS® began as a health domain-focused, rather than a disease-focused, measurement system; however, over time, others have added disease-relevant domains to the system. To ensure comparable data and the accumulation of knowledge across patient subgroups and therapies, PROMIS® refers to aspects of functioning and well-being that are relevant across most chronic conditions (eg, cognitive functioning or fatigue). PROMIS® measures use “banks” of questions to address different domains of health (eg, sleep quality, pain, social functioning), wherein each response category to each question is calibrated to have a precise value on the continuum of health for that particular topic. These item banks were developed following rigorous protocols that involved extensive formative research and statistical analysis.
Since its beginnings in 2004, the influence of PROMIS® has significantly expanded both in the United States and internationally, with current research focusing on the validation of existing item banks, 37 development of new item banks using PROMIS® methods, 38 and application of PROMIS® item banks in specific populations.39,40 The systematic and rigorous methodology of PROMIS® includes creating banks of items based upon conceptual models informed by existing literature, interviews with content experts, and feedback from stakeholder focus groups. New items are written and iteratively reviewed for clarity and relevance by content experts and by patients or community members of varying education levels. Items are also reviewed for linguistic and cultural translatability, so that known difficult-to-translate words or regional colloquialisms are replaced with more appropriate choices. PROMIS® analytic methods include calibration on large, nationally representative samples and assessment of unidimensionality of items, followed by IRT analyses. Items are retained in final banks only if they contribute useful information about the domain or dimension and the respondent’s level on the domain, do not exhibit DIF based on variables such as age, gender, or education level, and do not overlap psychometrically with other items. Calibrated item banks may be used to create static short forms and computer adaptive tests (CATs) and allow for brief, precise, and conceptually relevant measurement of latent mindfulness domains, with scores based on a common measurement metric with a mean of 50 and a standard deviation of 10.
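The common metric and the basic idea behind adaptive testing can be sketched briefly. This is an illustrative simplification, not the scoring or item-selection procedure used by any particular system: it assumes the latent trait estimate (theta) is standardized to mean 0 and standard deviation 1 in the reference population, uses a dichotomous 2PL model, and selects the next item by maximum information. The item names and parameters are hypothetical.

```python
import math

def to_t_score(theta):
    """Map a standardized trait estimate onto a mean-50, SD-10 metric.

    Assumes theta has mean 0 and SD 1 in the reference population.
    """
    return 50.0 + 10.0 * theta

def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def next_item(theta_estimate, bank, administered):
    """Pick the not-yet-administered item that is most informative
    at the current trait estimate (maximum-information selection)."""
    candidates = [name for name in bank if name not in administered]
    return max(
        candidates,
        key=lambda name: item_information(theta_estimate, *bank[name]),
    )

# Hypothetical calibrated bank: name -> (discrimination a, difficulty b)
bank = {
    "item_low": (1.2, -1.5),
    "item_mid": (1.2, 0.0),
    "item_high": (1.2, 1.5),
}

print(to_t_score(1.0))                      # one SD above the reference mean
print(next_item(1.4, bank, administered=set()))
```

Because every item in a calibrated bank is located on the same scale, different subsets of items (a short form or an adaptively chosen sequence) can yield scores on the same metric, which is what makes results comparable across studies.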
The National Center for Complementary and Integrative Health (NCCIH) PROMIS® Mindfulness Measurement Study “COMMENCE”
With NCCIH funding support (R01AT009539), our multidisciplinary team engaged in a 6-year instrument development and validation initiative called COMMENCE (Creating and Optimizing Mindfulness Measures to Enhance and Normalize Clinical Evaluation). This study was approved by the IRB at Northwestern University (STU00206019). COMMENCE closely followed PROMIS® measurement development principles and utilized mixed methods and classical and modern test construction approaches to create self-administered item banks, short forms, and CATs of mindfulness, its sub-domains, and related constructs. COMMENCE included the 3 aims briefly described below; the subsequent papers in this collection expand upon Aim I specifically.
Aim I: Item Bank Development
Aim I included a comprehensive literature search and review of domains and measures of mindfulness and related concepts using known databases (eg, PubMed, OVID PsycINFO, CINAHL). Concurrently, we administered an online survey to 50 mindfulness specialists, comprising mindfulness teachers, leaders, practitioners, and researchers, to identify: 1) key concepts and domains of mindfulness and related concepts that are important for a self-reported measure to assess; 2) criteria for ensuring acceptance of a new measurement tool by the mindfulness research community; and 3) potential challenges and barriers to mindfulness measurement. We used findings from these sources as the basis for conducting 6 focus groups and 12 individual interviews with mindfulness specialists to explore these areas more deeply. Results informed the selection of mindfulness concepts and candidate items. From Aim I activities, our team developed initial mindfulness item pools, using existing items and measures as the basis of new item writing in addition to new content generated from focus groups and the online survey. This work included an expert item review, a cultural and translatability review, and cognitive interviews in which each item was reviewed by up to 5 mindfulness meditators and 5 meditation-naïve individuals in ‘think aloud’ individual interviews. Following these formative activities, item pools were ready for Aim II calibration testing.
Aim II: Calibration Testing and Score Linking
Calibration involved testing the new item pools alongside legacy mindfulness measures in a large (n = 4200) online general population sample that included mindfulness-naïve (n = 1500) and mindfulness-experienced (n = 1500) respondents, as well as a separate online sample of mindfulness teachers and meditators from across the United States and Canada (n = 500). Analytic steps included calibrating item banks using IRT models, selecting items for short forms, and simulating CATs.
Aim III: Validation
Validation testing included comparison of the new mindfulness short forms with legacy mindfulness measures and psychosocial measures in a sample of 300 persons participating in university and community-based mindfulness courses at multiple sites across the country (including Pittsburgh and Chicago) and through partnering groups such as Mindful Leader. Measures were collected at baseline (T1), 8 weeks (T2), and 16 weeks (T3). This investigation evaluated construct validity (concurrent/discriminant, known groups) and responsiveness to change.
The Promise of PROMIS®-Based Mindfulness Measures
Self-report instruments are only one of many measurement approaches in behavioral interventions and observational studies, and attempting to quantify a subjective construct such as mindfulness raises several challenges, both practical and conceptual. However, this conundrum is no different from other self-report measurement efforts to define, measure, and characterize complex, subjective latent traits such as pain, depression, or fatigue, for which no definitive objective test currently exists.
Precise self-report measures that can quantify the full range of mindfulness-related concepts and attitudes are needed to advance the field by strengthening and unifying MBI research outcomes. The COMMENCE project lays the groundwork to address complex issues such as establishing a common mindfulness measurement lexicon and addressing DIF and response shift. COMMENCE will also result in scales that are translatable into different languages. These new PROMIS® mindfulness measures hold great promise to elevate the science of mindfulness and related interventions through a rigorously developed and tested measurement system and set of tools that are clear, conceptually relevant, and psychometrically sound.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Center for Complementary and Integrative Health (R01AT009539).
