Abstract
The objective of this study was to compare unannounced standardized patient (USP) and patient reports of care. Patient satisfaction surveys and USP checklist results collected at an urban, public hospital were compared to identify items included in both surveys. Qualitative commentary was reviewed to better understand USP and patient satisfaction survey data. Analyses included χ2 and Mann-Whitney U tests. Patients provided significantly higher ratings on 10 of the 11 shared items when compared to USPs. USPs may provide a more objective perspective on a clinical encounter than real patients, reinforcing the notion that real patient ratings skew overly positive.
Introduction
Improving the patient experience of health care delivery is a core element of the Triple Aim, the Institute for Healthcare Improvement's three-dimensional framework for optimizing health system performance (1). To accomplish this aim, healthcare systems pursue patient-centered care by assessing patient satisfaction and patient experience (2). Although satisfaction and experience are often used interchangeably in the context of quality assurance, these terms represent distinct yet intertwined concepts (3).
Patient satisfaction is inextricably linked to the patient's expectations and preconceptions of the visit, and though satisfaction scores may be used to assess care quality, they are not methodologically robust measures (4). Questionnaires asking patients to rate how satisfied they are with their care tend to elicit overly positive ratings and may not reflect the specific processes that affect the quality of care delivery (5). Patient experience is a more explicit measure of healthcare quality and processes, defined by the Agency for Healthcare Research and Quality's standard assessment tool, the Consumer Assessment of Healthcare Providers & Systems Hospital Survey (CAHPS), as “including several aspects of health care delivery that patients value highly when they seek and receive care, such as getting timely appointments, easy access to information, and good communication with health care providers” (6). Assessments of patient satisfaction and patient experience often capture both clinical and nonclinical factors and are influenced by the specific measures used as well as by patient, setting, and provider characteristics (7).
Since its development in 1995, CAHPS has become a widely used method of measuring patient experience for quality improvement efforts. In some hospital systems, insurance reimbursement is adjusted based upon the quality scores received from these surveys (8). However, CAHPS may not be entirely reliable as patients tend to provide high scores; one study documented 79% of patients rated their providers as 9 or 10 on a 10-point scale (9). Other issues with traditional patient experience surveys include the impact of survey administration timing on results (i.e., proximity to time of visit and collection method) as well as patient age, gender, and other demographic and personal characteristics (10). Furthermore, there are significant costs to fielding surveys in order to obtain sample sizes that ensure reliable results.
Unannounced standardized patients (USPs) provide an innovative method for assessing patient experience and satisfaction, bypassing many of the methodological challenges associated with patient surveys. USPs are “secret shoppers,” actors extensively trained to portray actual patients incognito and document their experiences of routine healthcare visits. USPs can be used in a variety of clinical settings to assess a range of measures including communication skills, clinical assessment and care management, physical examination skills, diagnostic measures, and laboratory testing (11,12). They also assess the entire “door to door” experience of a clinical visit from the vantage point of the patient, including visit flow, interaction with office staff, and patient safety indicators such as handwashing (12). Using a behaviorally anchored checklist, USPs can be trained, tested, and calibrated to make reliable and valid assessments of the full range of patient experiences before, during, and after a clinical encounter (13).
Although USPs provide reliable ratings of providers and clinic experience, differences between the perspectives of USPs and real patients are not well understood. USPs may provide more objective, and more critical, feedback on clinical practice than current patient surveys. The few studies that have explored this question focus solely on the assessment of communication skills and suggest that USPs are more discerning and have higher reliability of ratings when compared to real patients (14–16).
To date, there have been no studies examining the congruence between USP and real patient ratings of clinic experience. To address this gap in the literature, we used cross-sectional survey data to compare experiences of primary care clinic visits as assessed by both real patients and USPs. We hypothesized that USP-reported experience would be broadly consistent with real patient ratings while offering more explicit and actionable feedback on the encounter.
Methods
Patient experience feedback surveys were collected from 2 populations: (1) real patients who spoke either English or Spanish and (2) USPs who completed visits to English-speaking clinicians in the NYC Health + Hospitals/Bellevue Adult Primary Care Clinic (APCC), between 2017 and 2018. The APCC is a busy clinic within an urban, safety net hospital system in New York City that provides clinical care to approximately 22,000 adult primary care patients (17). Patient experience and satisfaction surveys were completed as part of an ongoing quality improvement initiative. A convenience sample of patients, including both new and returning patients, was approached upon completion of their primary care visit by research assistants (RAs) and asked to answer questions focused on their experience at the clinic. The RAs excluded patients who declined to participate or who indicated that they did not speak either English or Spanish. All surveys were anonymous.
USPs participated in a minimum of 6 h of character and checklist training prior to completing a clinic visit. The USP visits were part of a larger, ongoing educational evaluation of internal medicine residents in the NYU Langone Health system who receive feedback on their performance throughout the duration of their training program (18,19). The APCC uses USPs routinely to evaluate the patient experience and reviews aggregate USP performance data periodically with clinic staff and leadership. USPs were registered as new patients at the clinic to minimize their detection by clinic staff, who were informed that USPs would be visiting the clinic at random for quality improvement purposes. Upon visit completion, USPs returned to a trained research assistant's office and completed a post-visit checklist and debrief. Residents were asked to consent to have their routinely collected educational data entered into a research registry during orientation (NYU IRB #i06-683) and only USP visits with residents who consented were included in the current study.
Measures and Data Collection
The real patient survey consisted of 37 items describing the visit, including registration, screening by the medical assistant, the encounter with the physician, checkout, and general clinic functioning. Response options included yes/no items and 3- and 5-point Likert-style category rating scale items, with behavioral anchors for each point on the scale. Patients self-administered the survey on paper or an iPad in the waiting room. Patients were also given the option to have the RA administer the survey verbally.
The USP post-visit checklist focused on the encounter with the physician (assessment of physician communication, history gathering, counseling, and treatment plan/management) (20) and the patient experience from the time they entered the clinic until they left (and, in some cases, follow up telephone calls to address patient questions). For most visits at the APCC, clerical associates and medical assistants are the primary team members who interact with a patient prior to the physician, so nurses were excluded from the assessment. Experiences captured in the checklist included interactions with staff, patient safety items, ease of navigation of system, and impressions of clinic microsystem functioning. In our work, these data are traditionally gathered to provide a “clinical microsystem” report that is shared with clinic staff and clinic administration as a quality improvement measure. The questions on the checklist included dichotomous yes/no response options, 3- or 5-point Likert-style behavioral anchors, and open-ended text responses for USPs to further describe their experiences.
For the purposes of the current study, we included only the 11 questions that appeared on both assessments in our analyses (Table 1). These items targeted 4 areas: ratings of clinic staff (5 items), physician (2 items), team functioning (3 items), and clinic atmosphere (1 item) (21). Response categories across both real patient (RP) and USP surveys included yes/no questions and 3- and 5-point category response scales. We also examined USP open-ended text responses across 4 items (Table 1).
Table 1. Questions Appearing on Both Patient Survey and Unannounced Standardized Patient Checklist.
Data Management and Analysis
All survey data were entered and managed in the REDCap application hosted by NYU Langone Health System (22). Differences between USP and real patient survey responses were compared using χ2 analyses and a Mann-Whitney U (Wilcoxon rank sum) test. Open-ended text responses were coded by 2 coders (LA and SC) using a deductive thematic approach, following a typical timeline of activities during medical visits (eg, registration with clerical associates, assessment by medical assistants and physicians, and overall team and clinic atmosphere).
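As an illustrative sketch of the two statistical comparisons described above (not the study's actual code), the χ2 test for dichotomous items and the Mann-Whitney U test for ordinal items can be run in Python with `scipy.stats`; all counts and ratings below are hypothetical:

```python
# Illustrative sketch of the analyses described above; all counts and
# ratings are hypothetical, not the study's data.
from scipy.stats import chi2_contingency, mannwhitneyu

# Chi-square test on a dichotomous (yes/no) item, real patients vs USPs.
#              yes   no
table = [[169,  21],   # real patients (hypothetical counts)
         [ 97,  80]]   # USPs (hypothetical counts)
chi2, p_chi2, dof, expected = chi2_contingency(table)

# Mann-Whitney U (Wilcoxon rank sum) test on an ordinal item,
# e.g., clinic atmosphere rated 1 (calm) to 5 (chaotic).
rp_ratings  = [1, 2, 2, 1, 3, 2, 2, 1, 4, 2]   # hypothetical
usp_ratings = [3, 4, 2, 3, 5, 3, 4, 2, 3, 4]   # hypothetical
u_stat, p_u = mannwhitneyu(rp_ratings, usp_ratings, alternative="two-sided")

print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p_chi2:.4f}")
print(f"U = {u_stat:.1f}, p = {p_u:.4f}")
```

The χ2 test compares response proportions between the 2 groups on yes/no items, while the rank-based Mann-Whitney U test makes no normality assumption, which suits ordinal category-scale responses.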
Results
A total of 190 real patients completed the survey; 55% of respondents were female and 45% were male. The age range of the cohort was 21 to over 65 (2% under 25 years, 27% 25-44 years, 55% 45-64 years, and 16% over 65 years). Only 14% were new visits; most were established patients. Forty-one USPs completed checklists after 177 distinct visits. USPs portrayed patients in their mid 20s to 50s (exact ages of the USPs are not available). Detection rates remained low throughout (less than 10% of visits were detected).
Quantitative Results
USP and real patient ratings of the clinic differed significantly on 10 of the 11 items, with real patients consistently providing higher (more positive) ratings than the USPs. Real patients’ satisfaction ratings of the medical assistant were twice as high as the USPs’ (Figure 1). Further, real patients were significantly more likely to report that their clinician gave them sufficient information regarding follow-up (85% RP vs 53% USP, P = .001), that the clinician had answered all of their questions (89% vs 55%, P = .001), and that the clinician had spent enough time with them (86% vs 74%, P = .020). Nearly 75% of the real patients noted that the clinic staff “functioned well as a team” while less than half of the USPs reported the same. Real patients were also more likely to recommend the clinic to a friend when compared to the USPs (Figure 2).

Figure 1. Percent reporting that the clerical associate and medical assistant (MA) were helpful and explained instructions clearly: USPs versus real patients.

Figure 2. Percent endorsement of USPs versus real patients on items with 3-point response options.
Results of the Mann-Whitney U test indicated that USPs reported significantly higher mean ranks for chaos in the clinic atmosphere than real patients (P = .001). Real patients were more likely to rate the clinic as calm (a 1 or 2 on the category scale) than USPs (54% RP vs 32% USP), while USPs were more likely to rate the clinic as chaotic (a 4 or 5) than real patients (26% USP vs 7% RP). The median response was 2 for RPs and 3 for USPs, indicating that USPs tend to report somewhat higher levels of clinic busyness.
Qualitative Commentary
Open-ended text responses from USPs provided detailed descriptions of their experience of being a patient in the clinic (Table 2). A total of 537 comments (N = 104 on the clerical associate, 129 on the medical assistant, 129 on the provider, and 175 on clinic function) were provided by USPs across the 4 open-ended items included in the checklist. Although real patients were provided with an open-ended section, few provided actionable information. USP responses noted whether processes and procedures were completed during visits, described interpersonal interactions with staff and clinicians, and covered overall impressions of individual, team, and clinic functioning. Comments were often quite detailed, for example, describing which types of patient identifiers were used and the types of screening procedures that were completed or omitted. They also noted how clinic staff responded to unique challenges, such as a computer crashing. They reported exemplary behavior exhibited by clinic personnel, including interactions they witnessed between staff and other patients (“offering newspaper to patient who had no reading material”). The specificity of comments also revealed opportunities for improvement (eg, improved signage, changes to checkout procedures, and gaps in patient education about follow-up plans) and supported the notion that USP checklists are a comprehensive measure of an entire clinical encounter.
Table 2. Examples of USP Commentary by Assessment Domain and Actionable Recommendations.
Discussion
Although USPs cannot replace the direct report of patient experience captured through patient surveys, this study demonstrates that they can accurately reflect real patient perspectives and provide additional valuable information about clinical teams and systems. USP ratings of clinical encounters followed the same general pattern as real patient ratings but were consistently lower and more discerning. Trained to observe specific behaviors and given feedback during their training to ensure consistency of ratings, USPs may be more objective than real patients. They also typically have exposure to a diverse range of health-related scenarios as SPs in other contexts and can provide an informed, standardized perspective on care quality, minimizing differing expectations as well as the “halo effect.” The larger range of USP scores compared to RP scores suggests USPs may be more useful for quality improvement assessment because they do not show the “ceiling effect” seen in real patient surveys. North et al. note that CAHPS scores lack specificity and that items are highly intercorrelated, making it difficult to delineate concrete, actionable behaviors for providers to target in quality improvement efforts (23). They posit that this relationship is due to the time lag between the visit and completion of the survey, but patient expectations and the potential for cognitive dissonance may also impair patients’ ability to be critical of their clinicians. Patients at safety net sites and those with low socioeconomic status may be resigned to less optimal service or have difficulty voicing critiques due to power differentials or fear of mistreatment by the health system. It is also possible that the limited qualitative data provided by real patients compared to USPs reflects lower health literacy and language barriers in the survey population. USPs’ training and standardized health literacy can illuminate gaps in care that patients might not be able to articulate or acknowledge.
The breadth of USP ratings, as opposed to the clustering of real patient ratings, is consistent with previous work; Fiscella and colleagues (16) compared USP and real patients’ ratings of patient-centered communication and found that mean USP scores were lower and their standard deviation greater than those of real patients. Spearman rank-order correlations between USP and real patient ratings were positive though small. Similarly, Rezai et al. (14) found that only 12% of real patients rated the provider as “poor” in communication skills, while 47% of USPs did, further reinforcing the notion that USPs are more discerning.
Although not reported in this paper, USPs can report on the whole visit experience. In our USP program, telephone calls to the clinic after a visit afforded information about the accessibility of the care team for follow-up questions and behaviorally specific information and actionable improvement targets for the site and staff. In prior work, we describe the impact of systematic USP data about the implementation of patient-centered care and patient safety measures for improving outcomes (12,24–27).
Despite the advantages of the USP methodology, surveys of real patient satisfaction and experience are irreplaceable and provide essential information that is sensitive to local and cultural norms and perspectives. USPs can offer a more comprehensive assessment suitable for targeted quality improvement efforts and can provide follow-up on targeted issues detected by traditional patient surveys or identify new issues not typically addressed by patient surveys. For example, we have used USP visits to gather information about the clinical system's capacity to address social determinants of health and implement workflows for patient safety (25,28). With respect to feasibility, USP use has a modest cost (i.e., actor time), and many hospitals have simulation centers that can support the training and preparation of actors. Using USP and real patient surveys in tandem can often be the best method of obtaining quality information on care teams and systems. This indicates that future quality improvement and research on patient experience and satisfaction should include both USP and real patient surveys to gain a multifaceted understanding of the patient journey in all care settings. Further, both types of surveys should include the same quantitative and qualitative sections and questions in order to elucidate differences between responses.
Limitations
This is a single-institution study; thus, results may not be representative of other health systems. Future work will expand on our cross-institutional survey research (29) to include patients and USP visits in other clinics and types of hospital systems (private and federal). Additionally, USPs and real patients were not matched for age, education, or type of visit (new vs established patient) at this ambulatory care clinic. Although USPs provided rich details on experience, real patients provided minimal usable qualitative data, making comparisons challenging. Although we were not able to match specific demographics, the overall findings of both USP and real patient responses aligned with other site-specific surveys completed during the same period. Real patients surveyed were those returning for visits, whereas all USP visits were first visits; this may contribute to higher overall scores or satisfaction in real patient surveys versus USP checklists. Further, though real patient surveys were anonymized, patients may still have felt pressure to overreport satisfaction. Subsequent research will design USP cases to better reflect patient demographic information and create more comprehensive patient surveys to describe differences in demographics and visit contexts.
Conclusions
Results of this study indicate that real patients tended to provide higher ratings on patient satisfaction and experience measures than USPs. USPs provide detailed, critical, and objective evaluations of the entire clinical encounter, enhancing efforts to further understand and improve the experience of real patients in clinical settings. USPs are a useful tool for answering quality improvement-oriented questions and may provide more nuanced information about clinicians and the clinical systems than real patients, though they are not a substitute.
Footnotes
Authors’ Note
This project was designed as a quality improvement study through the NYU Grossman School of Medicine IRB process. That is, it aims “to assess a process/program/system” and “to prompt adoption of results to local site.” Our QI goal was to compare feedback from USPs and patients to better understand the value of these different evaluation methodologies. We have attached the NYU Grossman School of Medicine IRB attestation form here. We made every effort to maintain anonymity and participant confidentiality. Written informed consent was obtained from the USPs and patients for their anonymized information to be published in this article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Health Resources and Services Administration Primary Care Training and Enhancement program and the Agency for Healthcare Research and Quality (grant numbers T0BHP28577, 5R18HS021176, 5R18HS024669).
