Abstract
Background:
The use of telemedicine for routine primary care and for continuity of care during disruptions in health care delivery is now well known. Scientific evidence assessing telemedicine nevertheless remains insufficient: the widespread use of nonvalidated questionnaires makes it impossible to pool data on the quality of telemedicine encounters from numerous small-sample studies. This study examines the comprehensiveness of the recently developed Telemedicine Assessment Toolkit (TAT) among articles published after the toolkit was created.
Methods:
We conducted a PubMed search for articles that used questionnaires to assess telemedicine encounters, published between November 1, 2021, and July 31, 2023. We extracted individual questions from nonvalidated questionnaires and analyzed them to determine their similarity with TAT items. We used a statistical proportion test to see if rates of inclusion were similar across the initial TAT questionnaires and the follow-up set. We calculated p-values for proportions.
Results:
The database search initially yielded 277 articles, of which 21 each used its own nonvalidated questionnaire, for a total of 348 individual questions. After adjustment for multiple tests, the test of proportions revealed no statistically significant difference between the initial and follow-up articles. Further analysis found that 85.6% (298 of 348) of the questions used in the follow-up articles match items in the TAT.
Conclusion:
These results provide insight into the comprehensiveness and potential suitability of the TAT as a toolkit for telemedicine assessment. Future work to standardize the TAT, involving expert evaluation, cognitive testing, and weighting toward a single score, would improve telemedicine encounter assessment.
Introduction
Telemedicine is the exchange of medical information from one site to another via electronic communications to improve a patient’s clinical health status.1 Despite the growing adoption of audiovisual telemedicine, insufficient scientific evidence assessing telemedicine is a persistent problem.2 Several validated questionnaires specific to telemedicine exist, such as the Telemedicine Satisfaction Questionnaire (TSQ), the Telehealth Usability Questionnaire (TUQ), the Telemedicine Satisfaction and Usability Questionnaire (TSUQ), the Telehealth Satisfaction Scale (TESS), and the Patient Assessment of Communication in Telemedicine (PACT), but their use has remained limited for a variety of reasons. Barriers to using validated questionnaires include neglect of predictors for telemedicine encounter success,3,4 lack of inbuilt mechanisms for continual improvement,5 and lack of comprehensive assessment.5
The identified drawbacks among validated questionnaires have led to the widespread use of nonvalidated, customized questionnaires to assess audiovisual telemedicine encounters.5–7 Eschewing uniformly accepted measures reduces the capacity to pool samples and data from studies that have similar designs but lack common measures. Nonvalidated questionnaires might also exhibit technical shortcomings such as lack of response variability, poor reliability, inappropriate scale design, lack of construct validity, inadequate selection of constructs, and ambiguous questions.
The Telemedicine Assessment Toolkit (TAT) has been developed to enhance the comprehensive assessment of audiovisual telemedicine encounters. A two-step conceptual approach was used, first to identify potential domains and then to identify subdomains by qualitative analysis of a pool of questions from studies published from January 1, 2016, to November 1, 2021.8 The TAT is composed of 11 domains to assess audiovisual telemedicine encounters: “comparison to standard (in-person) care,” “patient costs,” “patient feeling,” “patient perspectives,” “patient–provider interaction,” “patient satisfaction,” “privacy,” “qualitative feedback,” “technology,” “telemedicine readiness,” and “usability.” These are further divided into 26 subdomains: audio quality, benefits, clarity of communication, clinical examination, comfort with encounter, comparison with standard care, convenience, device usability, ease of setup, ease of use, equipment readiness, future use intent, improvement suggestions, likelihood to recommend others, money saved, overall satisfaction, patient–provider relationship, preference, previous experience, provider’s skills and knowledge, respect for privacy, satisfaction with process, satisfaction with system, shared decision making, time saved, and video quality.
The TAT has a core block of 30 questions, preceded by a block aimed at retrieving demographic/patient characteristics (Block 1), and followed by a customizable clinical outcomes block (Block 3).
TAT was designed to apply to a wide range of telemedicine topics and thus be broadly accepted by researchers, with relevant questions (item measures) placed within coherent domains to provide a comprehensive set of measures to assess audiovisual telemedicine encounters.
The goal of this study is to examine the comprehensiveness of the TAT among articles published between November 1, 2021, and July 31, 2023.
Methods
We conducted a systematic search of the PubMed database to identify articles that used questionnaires to assess patients’ experiences with synchronous, audiovisual telemedicine consultations. We followed the same methodology used in the development of the TAT,8 employing a search strategy that combined the term “telemedicine,” a medical subject heading that enables semantic search, with the keywords “surve*,” “questionnaire*,” or “poll*” using the Boolean operator AND. We refined the search by applying the following filters: species (human), age (19+ years), and language (English). The time frame targeted articles published between November 1, 2021, and July 31, 2023, which we term the “follow-up cohort” of articles published after the creation of the TAT. Two authors (G.D. and S.H.) screened the titles and abstracts of the identified articles, removing those that met the exclusion criteria. The same two authors then reviewed the full text of the remaining articles to confirm eligibility, specifically that each was primary research about patient experience of telemedicine. Finally, articles that used validated measures or were included in the initial development of the TAT were removed. Figure 1 illustrates the screening process.

Figure 1. Article screening process.
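The search strategy described in the Methods can be written out as a single PubMed query string. The following is a minimal sketch, not the authors' exact query, which the text does not reproduce: the function name `build_pubmed_query` is hypothetical, and the field tags chosen for the filters (in particular `"adult"[MeSH Terms]` for the 19+ age filter) are assumptions about how the filters translate into query syntax.

```python
def build_pubmed_query(start="2021/11/01", end="2023/07/31"):
    """Assemble a PubMed search string from the strategy described in the text.

    Hypothetical helper: the exact query used by the authors is not given.
    """
    topic = '"telemedicine"[MeSH Terms]'
    # Truncated keywords are ORed together, then ANDed with the MeSH heading.
    keywords = " OR ".join(["surve*", "questionnaire*", "poll*"])
    filters = [
        '"humans"[MeSH Terms]',   # species: human
        '"adult"[MeSH Terms]',    # age 19+ (assumed equivalent of the age filter)
        "english[Language]",      # English-language articles
        f'("{start}"[Date - Publication] : "{end}"[Date - Publication])',
    ]
    return " AND ".join([topic, f"({keywords})", *filters])


print(build_pubmed_query())
```

The date arguments default to the follow-up window; passing the January 1, 2016, to November 1, 2021, window would give the corresponding string for the initial cohort under the same assumptions.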
The individual questions from the questionnaires used in the included articles were assessed to determine similarities with items from the TAT. Two authors (G.D. and R.A.) reviewed the questions and categorized each as either similar to one of the 30 TAT core items or not within the scope of the TAT. Items considered similar in text or meaning were then categorized with the associated TAT item and subdomain from the related TAT literature. To categorize the question stems, author 1 (G.D.) reviewed the nonsimilar items coded by author 2 (R.A.). Disagreements were resolved by consensus based on a discussion of the keywords associated with each TAT item. Items that were considered demographic or routinely provided by the system during usual care were categorized into Block 1, while clinical assessment items were categorized as Block 3 items. We identified questions not within the scope of telemedicine encounters by consensus among the authors (Table 3). For questions matching items in the TAT, we combined the following TAT pairs: TAT 1 and 2, TAT 10 and 11, TAT 12 and 13, and TAT 17 and 18, to be consistent with the analysis grouping used in creating the TAT.8 For each question type, we used a statistical proportion test to see whether rates of inclusion were similar across the initial set of questionnaires and the follow-up set. We calculated p-values for proportions and further compared them with threshold values adjusted for the number of tests. Statistical proportion test analyses were performed using R (version 4.2.0, R Foundation for Statistical Computing, Vienna, Austria). A p-value below 0.05 was considered significant. We also investigated which and how many TAT questions matched questions from one of the previously validated assessment tools.
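To illustrate the kind of comparison a test of proportions performs, the sketch below implements a pooled two-sample z-test in plain Python. It is an assumption-laden stand-in rather than the authors' analysis: the study used R, whose exact call and options are not stated, and the function name `two_proportion_test` is hypothetical. In the usage line, 34 of 298 is the follow-up count reported for “Time saved,” whereas the initial-cohort counts (50 of 600) are invented purely for illustration.

```python
import math


def two_proportion_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for the equality of two proportions.

    x1/n1 and x2/n2 are the counts of matching questions over the total
    questions in each cohort. Returns (z statistic, p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Follow-up count from the text; initial-cohort count is hypothetical.
z, p = two_proportion_test(34, 298, 50, 600)
print(f"z = {z:.2f}, unadjusted p = {p:.3f}")
```

One such test would be run per TAT item (or combined pair), and the resulting p-values then compared against thresholds adjusted for the number of tests.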
Results
We identified 277 articles from the initial search. Screening of titles and abstracts removed 226, and reading the full text removed another 20. Of the remaining 31 full-text articles that used questionnaires, two had been part of the initial set of articles used in creating the TAT, while eight (25.8%) used validated questionnaires specific to telemedicine; all 10 were excluded from analysis as being outside the objective of the current study. Ultimately, we found 21 articles, each with its own nonvalidated questionnaire, and 348 individual questions.
In reviewing the 348 individual questions, we found that 298 (85.6%) had similar meaning to question items in the TAT. Twenty questions (5.7%) elicited routine health system information collected in the precursor (Block 1) of the TAT. Thirty questions (8.6%) were considered outside the scope of the TAT. For instance, the question “Were you provided with a call back number in case you had an inquiry?” was considered a question about “follow-up” activities, which is not applicable to most telemedicine encounters, in which respondents are surveyed before the end of the encounter session. Tables 1–3 list the questions of each type, respectively.
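The counts reported above can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the Results; the variable names and the `share` helper are illustrative.

```python
# Screening flow reported in the Results.
identified = 277
after_title_abstract = identified - 226                    # title/abstract screening
full_text_with_questionnaires = after_title_abstract - 20  # full-text review
# Remove 2 articles already in the initial TAT cohort and 8 using validated tools.
included = full_text_with_questionnaires - 2 - 8

# Categorization of the individual questions from the included articles.
question_counts = {
    "matches a TAT item": 298,
    "Block 1 (routine health system information)": 20,
    "outside the scope of TAT": 30,
}
total_questions = sum(question_counts.values())


def share(count, total):
    """Percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(f"included articles: {included}, total questions: {total_questions}")
for label, count in question_counts.items():
    print(f"{label}: {count} ({share(count, total_questions)}%)")
```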
Table 1. Description of Individual Questions That Match TAT with Authors’ Names, Publication Year, Specialty, and Country of Study
TAT, Telemedicine Assessment Toolkit.
Table 2. Patient/Encounter Characteristics Questions Matching Routinely Generated Health Systems Information (Block 1)
Table 3. Questions Outside the Scope of TAT
Table 1 lists each of the 21 articles, presenting the authors’ names, publication year, specialty, and country of study, and specifies the matching TAT item for each question. Table 2 includes questions assessing patient and encounter characteristics routinely generated from health systems. Table 3 includes all the questions that are outside the scope of audiovisual telemedicine encounters and do not meet the criteria for inclusion in the TAT.
Table 4 provides information on the TAT items and the distribution (percentages) of questions from the 21 included articles that match each. The TAT items that matched the most questions were “Time saved” (n = 34; 11.4%), followed by “Clarity of communication” (n = 27; 9.06%), “Overall satisfaction” (n = 24; 8.05%), and “Comparison with in-person care” (n = 21; 7.05%). For each question type, we used a statistical test of proportions to see whether rates of inclusion were similar across the initial set of surveys and the follow-up set. p-Values for these tests are reported in Table 4. The test of proportions revealed no statistical difference between initial and follow-up articles for 24 of the 26 items tested. Two items, TAT 17 with 18 (combined) and TAT 25, were statistically significant at the unadjusted p < 0.05 level. However, neither difference remained significant when compared against threshold p-values adjusted with Bonferroni–Holm corrections for multiple hypothesis testing: items 17 with 18, “Clarity of communication” (padj = 0.0019), and item 25, “Time” (padj = 0.002). Both are therefore reported as not statistically different.
Table 4. Frequency of Items: Initial TAT Articles Versus Follow-Up Articles
Adjusted threshold for statistical significance for items 17 with 18 “Clarity of communication” (padj = 0.0019).
Adjusted threshold for statistical significance for item 25 “Time” (padj = 0.002).
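The adjusted thresholds quoted for the two flagged items are consistent with the Holm step-down rule applied to 26 tests: the smallest p-value is compared with 0.05/26 ≈ 0.0019, the next with 0.05/25 = 0.002, and so on. The sketch below shows that rule under the assumption of m = 26 tests and α = 0.05; the function names are hypothetical, and the p-values in the usage example are invented, since the individual values from Table 4 are not reproduced in the text.

```python
def holm_thresholds(m, alpha=0.05):
    """Holm thresholds for the sorted p-values: alpha / (m - k + 1), k = 1..m."""
    return [alpha / (m - k) for k in range(m)]


def holm_reject(p_values, alpha=0.05):
    """Step-down Holm procedure; returns one reject flag per input p-value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] > alpha / (m - rank):
            break  # first non-rejection stops the procedure
        reject[i] = True
    return reject


thresholds = holm_thresholds(26)
print([round(t, 4) for t in thresholds[:2]])

# Hypothetical p-values: two below 0.05 unadjusted, the other 24 clearly null.
example = [0.01, 0.03] + [0.5] * 24
print(any(holm_reject(example)))
```

With these illustrative inputs, neither of the two unadjusted "significant" p-values clears its Holm threshold, which mirrors the pattern described for items 17 with 18 and item 25.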
We analyzed the relative relevance of individual TAT items to researchers. Compared with the initial group of articles used in creating the TAT, 11 TAT items showed an increase in percentage use, though not a statistically significant one, in the follow-up set of articles: TAT 9, TAT 17, TAT 18, TAT 19, TAT 22, TAT 25, TAT 26, TAT 27, TAT 28, TAT 29, and TAT 30. These items relate to 10 subdomains: clarity of communication, preference for telemedicine, overall satisfaction, ease of use of audiovisual telemedicine, time saved, money saved, comparison to in-person care, respect for privacy, comfort with encounters, and qualitative feedback (improvement suggestions).
Ten TAT questions (TAT 6, TAT 9, TAT 12, TAT 13, TAT 14, TAT 16, TAT 19, TAT 20, TAT 22, TAT 27) matched a question from one of the previously validated assessment tools. We found four questions from the TSQ, two questions from the TUQ, one from the PACT, and three questions from the International Telecommunication Union (ITU) real-time video communication scale. These 10 TAT questions accounted for 116 (38.9%) questions in the follow-up articles and 244 (39.8%) in the initial set of articles.
Discussion
We reviewed 21 articles with nonvalidated questionnaires for telemedicine encounters out of 31 articles, reinforcing the need for standardized tools to solve the problem of small, diverse studies with no standardized, uniformly adopted assessment questionnaire.5,7,9–11 In our screening of the PubMed database, the proportion of articles that used validated rather than nonvalidated or generic questionnaires, 25.8% (8 of 31), was comparable to earlier studies, which found 32%7 and 22.5%.6
We observed that two articles with nonvalidated questionnaires that were initially screened for inclusion were found to have been included in the original cohort of articles used to create the TAT.12,13 We attributed this recurrence to overlaps in journal publication dates (advance electronic vs. print publication dates) and removed them from the analysis of the results to avoid double counting with questions repeated in both cohorts.
We acknowledge that this research may not have captured all questionnaires during the search for articles and is limited to one database, PubMed. This may increase the likelihood of selection bias through the exclusion of studies not indexed within the database. Also, in defining the scope of this research, only patient-focused questionnaires were included while provider assessments were excluded.
We found that 85.6% (298 of 348) of the questions used in the follow-up cohort of articles, published from November 2021 to July 2023, match items in the TAT, establishing the comprehensiveness and potential suitability of the TAT as a toolkit for researchers working after the initial articles that were aggregated to develop it.
Thirty questions (8.6%) were not within the scope of the TAT, for several reasons. Half (15 of 30) of these questions asked respondents about “attire,” “background,” or “follow-up” after the encounter. Questions about encounter follow-up could not reasonably be expected to assess most telemedicine encounters, in which respondents are surveyed before the end of the encounter session. Questions about the background and attire used during the encounter predominantly came from one study that focused on aesthetics; they were excluded to avoid nonrepresentative items that are also used for in-person visits. One question each was about trust, respect, and blood tests. Our opinion is that these questions were not specific to telemedicine, being applicable to every type of visit, were not relevant to the toolkit, and could add respondent burden without eliciting useful information. Importantly, these questions did not draw a comparison with in-person encounters and so offered little value in evaluating telemedicine.
One-third of the TAT questions come from previously validated instruments, indicating that researchers need a larger set of questions to successfully evaluate telemedicine encounters. However, we found that in both the initial and follow-up articles approximately 40% of questions matched items drawn from validated questionnaires, suggesting that, in the future, the TAT questions may be weighted differently.
Considering the relevance of individual TAT items to researchers, the increase in relative percentage use of 11 TAT items suggests that these items are more likely to be adopted by more recent researchers (TAT 9, TAT 17, TAT 18, TAT 19, TAT 22, TAT 25, TAT 26, TAT 27, TAT 28, TAT 29, TAT 30). These items relate to 10 subdomains: clarity of communication, preference for telemedicine, overall satisfaction, ease of use of audiovisual telemedicine, time saved, money saved, comparison to in-person care, respect for privacy, comfort with encounters, and qualitative feedback (improvement suggestions). Nine of these subdomains reflect major themes that are also found in the major existing validated telemedicine questionnaires, TSQ, TUQ, TESS, and PACT.8
The next logical steps for future research in standardizing the TAT for uniform adoption should involve the evaluation of items by experts, with focus groups and cognitive interviews among the intended target population of audiovisual telemedicine encounter users. Additionally, the remaining questions should be validated. Finally, the questions could be weighted to produce a single score.
Footnotes
Acknowledgments
The authors would like to thank the staff of the Karolinska Institutet, Sweden, and Greenblatt Library at Augusta University for the use of library materials and assistance in obtaining full-text articles.
Authorship Contribution Statement
All authors contributed meaningfully to the conceptualization, article selection, writing, reviewing, and editing of this article.
Author Disclosure Statement
The authors declare no competing, personal financial interests or relationships impacting this study.
Funding Information
This research received no specific grant from any funding agency, commercial, or not-for-profit sectors.
