Abstract

Speech recognition software can increase the frequency of errors in radiology reports, which may affect patient care. We retrieved 213,977 speech recognition software–generated reports from 147 different radiologists and proofread them for errors. Errors were classified as “material” if they were believed to alter interpretation of the report. “Immaterial” errors were subclassified as intrusion/omission or spelling errors. The proportion of errors and error type were compared among individual radiologists, imaging subspecialties, and time periods. In all, 20,759 reports (9.7%) contained errors, including 3992 (1.9% of all reports) with material errors. Among immaterial errors, spelling errors were more common than intrusion/omission errors (p < .001). Proportion of errors and fraction of material errors varied significantly among radiologists and between imaging subspecialties (p < .001). Errors were more common in cross-sectional reports, reports reinterpreting results of outside examinations, and procedural studies (all p < .001). Error rate decreased over time (p < .001), which suggests that a quality control program with regular feedback may reduce errors.

Keywords

PowerScribe, quality control, radiology report, report errors, speech recognition

Introduction

Although there are many ways for radiologists to provide the results of an imaging examination to referring clinicians, the signed written report remains the primary and often sole means of communication.^1,2 In many radiology practices, transcribed dictation by a professional transcriptionist has been replaced by real-time speech recognition and self-editing of reports. In general, speech recognition software (SRS) greatly improves turnaround times for reports compared with remote transcription and allows for more immediate control over report editing than traditional paper markup or asynchronous transcription modification.^3–6 However, this often leaves the radiologist as the sole author and editor of the final text that is placed in the radiology information system or electronic medical record.

Most currently available commercial SRS products rely on nearest match for each word transcribed and do not check for logical relevance or perform natural language processing for real-time recognition and transcription of dictation. Self-editing can be prone to typographic or other errors that may not be noticed by the radiologist. Consequently, persistent undetected report errors that could impede understanding or lead to erroneous conclusions are much more common than seen with expert transcription by a trained professional with excellent typing skills and an understanding of medical language and context. In addition to errors in radiologic diagnosis, simple errors in syntax and grammar that occur during report transcription can have dire consequences for which the signing radiologist is liable.⁷ Communication errors are extremely common and one of the top reasons radiologists are sued for medical malpractice.⁸ Self-editing errors by a radiologist represent a mitigable threat to appropriate patient care.

The primary reason for the emergence of errors in syntax and semantics in radiology reports is the relatively recent application of SRS and consequent assimilation of transcription duties by the radiologist.^4,6,9 Examples of such errors fall into several categories: (1) omission of appropriate words/phrases, which includes deletions and missing words; (2) intrusion of incorrect words/phrases, which includes interjection, incorrect words, wrong word substitution, insertions, or right–left substitutions; and (3) spelling errors, which include word truncation, most likely due to the manual editing of text by a radiologist through typing errors or inaccurate selection of text to be removed or edited. Additional errors that do not necessarily fit into the above categories include incorrect dates, image/series numbering errors, measurement scale errors (e.g., cm vs. mm), template errors, and punctuation errors.¹⁰ Different combinations of these error types may lead to nonsense phrases with variable effects on interpretation and comprehension of the final report. Some omissions or intrusions (particularly the word “no” in various contexts) and word substitutions (such as “new” rather than “no”) can potentially affect patient care.¹

At our institution, a portion of all the signed radiology reports generated with SRS are regularly audited by transcriptionists and assessed for syntactic and semantic errors. Any potential errors comprising apparently illogical or inappropriate words/phrases, misspellings, or other errors discovered by the transcriptionist trigger a notification of the staff radiologist with an opportunity to correct the report and notify the referring service. We reviewed the data from these audits to investigate any possible patterns that could help us improve our report quality in the future. Specifically, we investigated four hypotheses, on the basis of our clinical experience: (1) error rate and type vary by radiologist; (2) error rate and type vary by type of imaging examination—we expect more complicated and longer reports, such as cross-sectional (CS) imaging or procedural reports, to contain more errors than shorter plain radiography (CR) reports; (3) the implementation of a quality control program with regular feedback should decrease errors over time; and (4) a Dictaphone hardware upgrade should decrease error rates.

Methods

Our institutional review board approved the study protocol. Informed consent was waived because of the retrospective nature of the study and deidentification of the patients and radiologists involved in the analysis. As part of our department’s radiology report quality assurance, an in-house transcriptionist reads every single report dictated using an SRS dictation system (PowerScribe; Nuance Communications, Inc) 2 days per month for every staff radiologist and evaluates these reports for potential errors. If the percentage of reports containing at least one error exceeds 3 percent on either day for a particular radiologist, then all of their reports are similarly scrutinized every subsequent day until their error rate drops below 3 percent. All radiologists with error rates below 3 percent continue to have all their reports audited, on average, 2 days per month. A total of 13 trained medical transcriptionists with experience ranging from 1 to 23 years participated.

Errors are categorized as “material” or “immaterial.” Material errors are believed by the transcriptionist to potentially alter interpretation of the radiology report. Material errors trigger an email notification of the staff radiologist to allow him or her to correct or revise the report and notify the referring service. Immaterial errors are further subcategorized as spelling mistakes or intrusion/omission errors. Reports with multiple errors are only counted once and are classified by the most egregious error type (material > intrusion/omission > spelling). Incorrect date, incorrect measurement, and left–right substitution are all classified as material errors. Punctuation errors are ignored. Errors in radiologic interpretation were not included in this study. The use of macros or standard templates was not recorded.
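The counting rule described above (each report counted once, under the most severe error it contains) can be sketched in Python. This is only an illustration: the function name and the label strings are assumptions introduced here, not part of the audit's actual tooling.

```python
# Severity ranking taken from the text: material > intrusion/omission > spelling.
# The label strings and the function name are hypothetical, chosen for illustration.
SEVERITY = {"spelling": 1, "intrusion/omission": 2, "material": 3}


def classify_report(errors):
    """Return the single category a report is counted under, or None if it has none.

    `errors` is a list of category labels noted by the proofreader. Labels that
    are not in SEVERITY (for example punctuation) are ignored, as in the audit.
    """
    counted = [e for e in errors if e in SEVERITY]
    if not counted:
        return None
    return max(counted, key=SEVERITY.__getitem__)


print(classify_report(["spelling", "material", "spelling"]))  # material
```

A report with several spelling mistakes and one material error is therefore tallied once, as a material-error report.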

In addition to type of error and radiology staff member, the date of the report and imaging subspecialty of the examination (ISE) were captured. Imaging methods included computed tomography (CT), magnetic resonance imaging (MRI), plain radiography (CR), nuclear medicine (NM), neuroradiology (NR), and ultrasonography (US). Specific ISEs recorded were “CT Body” (computed tomography of the chest, abdomen, pelvis, or extremities), “CT Neuro” (computed tomography of the head, neck, or spine), “CR,” “MR Body” (magnetic resonance imaging of the chest, abdomen, pelvis, or extremities), “MR Neuro” (magnetic resonance imaging of the head, neck, or spine), “NM,” “NR” (NR procedures such as lumbar puncture and myelography), “US,” “V&I” (vascular or interventional procedures), and “OS” (reinterpretation of any radiology study from another/outside institution of any body part or modality). For analysis purposes, several groups of ISEs were created, including a cross-sectional (CS) imaging group (CT Body, CT Neuro, MR Body, MR Neuro, and US), a procedural group (V&I and NR), and a diagnostic group (CT Body, CT Neuro, CR, MR Body, MR Neuro, US, and NM).

We retrospectively retrieved all reports generated by SRS and signed by 147 different radiologists from 3 January 2011 through 16 April 2014. Mammography reports were excluded because only a small fraction of these examinations are interpreted and reported using SRS at our institution. Similarly, many of our CR examinations and procedures are transcribed without SRS, either by immediate direct transcription in the room, or asynchronously through a digital dictation and remote transcription system. The main reason for the use of direct transcription over SRS with CR examinations is that direct transcription generates a finalized report faster, and CR results are often emergently required. Therefore, the number of CR reports completed with SRS is far less than the total number of these types of examinations reported at our institution. Radiologists were grouped into four categories on the basis of their total error percentage quartiles over the entire time period: group 1 (<5.5% total errors), group 2 (5.5%–7.9% total errors), group 3 (8.0%–10.5% total errors), and group 4 (>10.5% total errors).
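As a rough sketch, the assignment of a radiologist to one of the four groups can be written as a small Python function. The cut points are those quoted above; rounding to one decimal place before comparing is an assumption made here so that a value such as 7.93 does not fall between the published ranges.

```python
def radiologist_group(total_error_pct):
    """Map a radiologist's overall error percentage to group 1-4.

    Cut points are the quartile boundaries quoted in the text. Rounding to one
    decimal place first is an assumption of this sketch.
    """
    pct = round(total_error_pct, 1)
    if pct < 5.5:
        return 1
    if pct <= 7.9:
        return 2
    if pct <= 10.5:
        return 3
    return 4


print([radiologist_group(p) for p in (0.8, 5.5, 8.0, 35.1)])  # [1, 2, 3, 4]
```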

Reports were divided into four time periods of exactly 300 days to analyze trends over time: 3 January to 29 October 2011, 30 October 2011 to 24 August 2012, 25 August 2012 to 20 June 2013, and 21 June 2013 to 16 April 2014. To mitigate confounding of time-based trends by individual radiologists, 36 radiologists were excluded from the time analysis only, because fewer than 100 of their reports were reviewed by transcription under the quality control project in any of the time periods.
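The period boundaries can be checked with standard date arithmetic. The sketch below uses Python's `datetime` module; the helper names are illustrative assumptions, and both endpoints of each period are treated as inclusive.

```python
from datetime import date

# Period boundaries as stated in the text (inclusive start and end dates).
PERIODS = [
    (date(2011, 1, 3), date(2011, 10, 29)),
    (date(2011, 10, 30), date(2012, 8, 24)),
    (date(2012, 8, 25), date(2013, 6, 20)),
    (date(2013, 6, 21), date(2014, 4, 16)),
]


def period_lengths(periods):
    """Inclusive length in days of each (start, end) pair."""
    return [(end - start).days + 1 for start, end in periods]


def period_index(report_date, periods=PERIODS):
    """Return the 1-based period containing report_date, or None if outside all."""
    for i, (start, end) in enumerate(periods, start=1):
        if start <= report_date <= end:
            return i
    return None


print(period_lengths(PERIODS))  # [300, 300, 300, 300]
```

Each period comes out at 300 days, including the second one, which spans the leap day of 2012.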

Our department updated the dictation microphones from the PowerMic I to PowerMic II Dictaphone (Nuance Communications, Inc) during August and September 2013. To test for differences in error rate as a result of this hardware modification, we also compared reports created in the 6 months immediately before (1 February through 31 July 2013) and immediately after (1 October 2013 through 31 March 2014) the upgrade.

Descriptive categorical data are presented using counts and percentages. Contingency (χ²) analysis and multiple logistic regression were used as appropriate for comparing nominal data, with calculation of odds ratios (ORs) and 95 percent confidence intervals (CIs). p values <.05 were considered statistically significant. Analyses were performed using JMP version 9.0.3 (SAS Institute, Inc).

Results

Errors by radiologist

A total of 213,977 reports were retrieved. Among these, 20,759 (9.7%) had errors, including 3992 (1.9%) with material errors. The mean (standard deviation (SD)) total error percentage by radiologist was 8.7 percent (5.0%; range, 0.8%–35.1%), and the percentage differed significantly among radiologists (p < .001; Table 1). The mean (SD) percentage of material errors per radiologist (out of total errors) was 16.2 percent (7.9%; range, 0.0%–38.7%), which also varied significantly among radiologists (p < .001). Among all immaterial errors (n = 16,767; 80.8% of all errors), spelling errors (n = 10,151; 60.5%) were more common than intrusion/omission errors (n = 6616; 39.5%; p < .001).

Table 1.

Percentage of errors per radiologist.

Total errors per radiologist (%)	No. (%) of radiologists (N = 147)
0–2.5	3 (2.0)
2.6–5.0	29 (19.7)
5.1–7.5	34 (23.1)
7.6–10.0	38 (25.9)
10.1–12.5	19 (12.9)
12.6–15.0	11 (7.5)
15.1–17.5	6 (4.1)
17.6–20.0	1 (0.7)
20.1–22.5	2 (1.4)
22.6–25.0	2 (1.4)
25.1–27.5	1 (0.7)
27.6–35.0	0 (0)
35.1–37.5	1 (0.7)

Errors by exam type

When the data were separated by ISE category, the mean (SD) overall error percentage per category was 11.4 percent (5.2%), material error percentage was 20.8 percent (4.6%) of total errors, and spelling error percentage was 63.1 percent (7.1%) of immaterial errors. These error rates also varied significantly by ISE (p < .001). The ISEs NR and MR Body had the highest error rates, and US and CR had the lowest (Table 2).

Table 2.

Errors by ISE category.

ISE	Material errors^a (n = 3992)	Immaterial errors^b: omission/intrusion (n = 6616)	Immaterial errors^b: spelling (n = 10,151)	All errors/no. of reports (%) (N = 20,759)
NR	96	92	208	396/2010 (19.7)
MR Body	455	870	1221	2546/14,079 (18.1)
OS	669	836	964	2469/17,924 (13.8)
V&I	100	49	185	334/2506 (13.3)
CT Body	1205	2179	3553	6937/55,094 (12.6)
MR Neuro	438	920	1197	2555/25,303 (10.1)
CT Neuro	316	506	823	1645/17,296 (9.5)
NM	187	295	553	1035/12,162 (8.5)
US	279	492	843	1614/28,925 (5.6)
CR	247	377	604	1228/38,678 (3.2)

ISE: imaging subspecialty of the examination; NR: neuroradiology; MR Body: magnetic resonance imaging of the chest, abdomen, pelvis, or extremities; OS: reinterpretation of any radiology study from another/outside institution of any body part or modality; V&I: vascular or interventional procedures; CT Body: computed tomography of the chest, abdomen, pelvis, or extremities; MR Neuro: magnetic resonance imaging of the head, neck, or spine; CT Neuro: computed tomography of the head, neck, or spine; NM: nuclear medicine; US: ultrasonography; CR: plain radiography.

^a Material errors are those believed by the transcriptionist to potentially alter interpretation of the radiology report.

^b Immaterial errors are those believed not to alter interpretation of the radiology report (i.e. intrusion/omission errors or spelling mistakes).
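The pooled groups defined in the Methods section can be rebuilt from the per-ISE counts in Table 2. The following Python sketch sums errors and reports over each group's member ISEs; the dictionary and function names are assumptions made for illustration.

```python
# (errors, reports) per ISE, copied from Table 2.
TABLE2 = {
    "NR": (396, 2010), "MR Body": (2546, 14079), "OS": (2469, 17924),
    "V&I": (334, 2506), "CT Body": (6937, 55094), "MR Neuro": (2555, 25303),
    "CT Neuro": (1645, 17296), "NM": (1035, 12162), "US": (1614, 28925),
    "CR": (1228, 38678),
}

# Group membership as defined in the Methods section.
GROUPS = {
    "CS": ["CT Body", "CT Neuro", "MR Body", "MR Neuro", "US"],
    "Procedural": ["V&I", "NR"],
    "Diagnostic": ["CT Body", "CT Neuro", "CR", "MR Body", "MR Neuro", "US", "NM"],
}


def pooled(members):
    """Sum errors and reports over the member ISEs; return (errors, reports, pct)."""
    errors = sum(TABLE2[m][0] for m in members)
    reports = sum(TABLE2[m][1] for m in members)
    return errors, reports, round(100 * errors / reports, 1)


for name, members in GROUPS.items():
    print(name, pooled(members))
# CS (15297, 140697, 10.9)
# Procedural (730, 4516, 16.2)
# Diagnostic (17560, 191537, 9.2)
```

These pooled counts agree with the group rows reported in Table 3.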

Percentages of errors for different types of reports are shown in Table 3. Compared with in-house dictations, reports dictated on outside examinations (OS category) were significantly more likely to result in an error (OR, 1.55; 95% CI, 1.48–1.62) or material error (OR, 1.67; 95% CI, 1.52–1.84; p < .001). CS reports were much more likely to contain an error (OR, 3.72; 95% CI, 3.51–3.95) than CR reports (p < .001), although material errors were more likely with CR than CS reports (OR, 1.18; 95% CI, 1.02–1.36; p = .03). There was no difference in spelling errors between CS and CR reports (p = .55). Total errors (OR, 1.91; 95% CI, 1.76–2.07), material errors (OR, 1.69; 95% CI, 1.43–2.00), and spelling errors (OR, 1.79; 95% CI, 1.47–2.17) all were more common in the procedural group than the diagnostic imaging group (all p < .001).

Table 3.

Comparison of errors by report type.

Report category	Total reports	All errors^a	Immaterial errors^b	Material errors^b
Origin of examination
In-house	196,053	18,290 (9.3)	14,967 (81.8%)	3323 (18.2%)
OS	17,924	2469 (13.8)	1800 (72.9%)	669 (27.1%)
Imaging type
CS	140,697	15,297 (10.9)	12,604 (82.4%)	2693 (17.6%)
CR	38,678	1228 (3.2)	981 (79.9%)	247 (20.1%)
Study type
Procedural	4516	730 (16.2)	534 (73.2%)	196 (26.8%)
Diagnostic	191,537	17,560 (9.2)	14,433 (82.2%)	3127 (17.8%)

OS: reinterpretation of any radiology study from another/outside institution of any body part or modality; CS: cross-sectional imaging group (see the “Methods” section); CR: plain radiography.

^a Number (%) of reports with an error.

^b Number (%) of error type per all errors.
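For readers who want to retrace the odds ratios from the counts in Table 3, a minimal Python sketch follows. It assumes a Wald confidence interval on the log odds ratio; the study's analyses were run in JMP and the exact interval method is not stated, so this is an assumption, and the function name is illustrative.

```python
import math


def odds_ratio_ci(events_a, total_a, events_b, total_b, z=1.959964):
    """Odds ratio of group A versus group B with a Wald 95% CI on the log scale."""
    a, b = events_a, total_a - events_a
    c, d = events_b, total_b - events_b
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(odds_ratio) - z * se)
    hi = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lo, hi


# Any error: CS (15,297 of 140,697 reports) versus CR (1228 of 38,678), from Table 3.
or_cs, lo_cs, hi_cs = odds_ratio_ci(15297, 140697, 1228, 38678)
print(f"CS vs CR: {or_cs:.2f} ({lo_cs:.2f}-{hi_cs:.2f})")

# Any error: OS (2469 of 17,924) versus in-house (18,290 of 196,053), from Table 3.
or_os, lo_os, hi_os = odds_ratio_ci(2469, 17924, 18290, 196053)
print(f"OS vs in-house: {or_os:.2f} ({lo_os:.2f}-{hi_os:.2f})")
```

Under this assumption, the CS versus CR comparison comes out at about 3.72 (3.51 to 3.95), in line with the values reported in the text.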

Error type trends

The four groups, representing quartiles of radiologists, comprised 38 radiologists in group 1 (<5.5% total errors), 36 in group 2 (5.5%–7.9% total errors), 37 in group 3 (8.0%–10.5% total errors), and 36 in group 4 (>10.5% total errors). With the exception of comparisons between adjacent groups 4 and 3 (p = .15) and 2 and 1 (p = .45), all other comparisons between groups demonstrated significantly increased probability of material error with increasing total error percentage (p < .001; Figure 1). The largest difference was between groups 4 and 2 (OR, 1.59; 95% CI, 1.42–1.79) and the smallest was between groups 3 and 1 (OR, 1.41; 95% CI, 1.23–1.62). There was no association between radiologist group and proportion of spelling errors (p = .12).

Figure 1.

Percentage of material errors by radiologist error quartile (N = 147). Boxes show the median, interquartile range, and range of percentage of material errors out of total errors by radiologist for each radiologist group. Radiologist groups with higher total percentage of errors generally also had a higher percentage of those errors being material.

Since all types of errors were more common in the procedural group of reports, and the distribution of radiologists is known to differ between this group and the diagnostic imaging group, we hypothesized that the different radiologist makeup of each group may explain this disparity. Testing for associations between radiologist group and ISE group (χ²) as potential covariates not surprisingly uncovered a possible relationship between the procedural reports and higher error rate radiologist group (Table 4; p = .004). To control for this potential confounding, a multiple logistic regression model was performed demonstrating that dictating procedural reports remains an independent predictor of total error rate (p < .001), with an adjusted OR of 1.27 (95% CI, 1.17–1.38) for a procedural report versus a diagnostic report, regardless of radiologist group. All other ISE groups and radiologist groups did not covary.

Table 4.

Errors by radiologist group: procedural versus diagnostic study type.

Study type	Errors by radiologist group^a,b
	Group 1 (n = 1732)	Group 2 (n = 2596)	Group 3 (n = 4492)	Group 4 (n = 9470)
Procedural (n = 730)	11/147 (7.5)	69/854 (8.1)	64/554 (11.6)	586/2961 (19.8)
Diagnostic (n = 17,560)	1721/47,710 (3.6)	2527/39,166 (6.5)	4428/50,206 (8.8)	8884/54,455 (16.3)

^a Values are all errors/number of reports (%).

^b Group 1 (<5.5% total errors), group 2 (5.5%–7.9% total errors), group 3 (8.0%–10.5% total errors), and group 4 (>10.5% total errors).

Error rate over time

The overall error rate decreased significantly over time when comparing either of the first 2 time periods with any later time period (p < .001). The largest decrease in total error rate occurred between the first and third periods (OR, 0.68; 95% CI, 0.65–0.71), with an actual mean (SD) error percentage change from 10.1 percent (6.7%) to 7.4 percent (4.3%). The error rate stopped decreasing between the last two time periods, with a significant increase in error rate only among group 4 radiologists (p = .003; Figure 2). The total error percentage for the 6-month period after upgrade to the PowerMic II Dictaphone was 9.0 percent, compared with 8.5 percent before the hardware change, which was not significantly different (p = .06).

Figure 2.

Box plot of total errors by time period (n = 111). Boxes show the median, interquartile range, and range of error rates for each time period. Lines show overall mean values and mean values of radiologists in the first and fourth error groups. The time-dependent variability is greater among radiologists with the most overall errors (group 4).

Discussion

SRS-related error rates reported in the literature vary between 4.8 and 38 percent among finalized radiology reports.^1,4,10,11 Higher error rates, in general, have been reported in studies that examined only CS modalities. For example, Pezzullo et al.⁴ reported a total error rate of 35 percent using SRS to interpret spine MRI, and Quint et al.¹⁰ reported 22 percent total errors in CT of the head, neck, chest, abdomen, and pelvis. However, only a 6 percent error rate was reported by Chang et al.¹ among radiography (“CR group”) reports, compared with a 38 percent error rate in their “non-CR” group. A study by McGurk et al.¹¹ that excluded MRI reports found a 4.8 percent error rate. Our results examining nearly a quarter-million reports confirm this trend, with a lower error rate of 3.2 percent for CR, compared with 10.9 percent for CS (OR, 3.72). Chang et al.¹ calculated a relative risk of error in their “non-CR” group of 3.5 compared with CR. Our reported OR converts to a relative risk of 3.42 (95% CI, 3.23–3.62), showing excellent agreement with Chang et al.¹ despite large differences in error rates for each modality group between the two studies. This is good evidence for a real increase in the probability of SRS-related error in CS imaging reports versus CR reports (3.4- to 3.5-fold increased risk). This effect may persist even when macros are used.⁵
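The step from odds ratio to relative risk can be retraced from the Table 3 counts. The Python sketch below computes the relative risk directly and also through the standard identity RR = OR / (1 − p0 + p0 × OR), where p0 is the error risk in the reference (CR) group. The function names are assumptions, and the source does not state which route was used to obtain the reported value.

```python
def relative_risk(events_a, total_a, events_b, total_b):
    """Risk in group A divided by risk in group B."""
    return (events_a / total_a) / (events_b / total_b)


def or_to_rr(odds_ratio, baseline_risk):
    """Convert an odds ratio to a relative risk given the reference-group risk."""
    return odds_ratio / (1 - baseline_risk + baseline_risk * odds_ratio)


cs_errors, cs_reports = 15297, 140697  # Table 3, CS row
cr_errors, cr_reports = 1228, 38678    # Table 3, CR row

p_cr = cr_errors / cr_reports
odds_ratio = (cs_errors / (cs_reports - cs_errors)) / (cr_errors / (cr_reports - cr_errors))

rr_direct = relative_risk(cs_errors, cs_reports, cr_errors, cr_reports)
rr_from_or = or_to_rr(odds_ratio, p_cr)
print(round(rr_direct, 2), round(rr_from_or, 2))  # 3.42 3.42
```

Because CR errors are uncommon (about 3.2% of reports), the odds ratio of 3.72 overstates the relative risk only modestly.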

There are many reasons for differences in error rate between various studies in the literature, not the least of which is heterogeneity of particular software vendors, versions, and equipment.¹² Our errors may be at the low end of the spectrum because of a strict quality control policy in place. There is evidence for this in the time-dependent decrease in error rate since the transcriptionist-auditing program began essentially at the beginning of our study period (Figure 2), with the notable exception that we do not have data predating the quality control program to compare. Of interest, the decrease did not continue throughout, suggesting that there may be a lower limit to the error rate achievable at a large institution. The significant variation in error rate among radiologists may play a role. The group 4 radiologists (with the highest and most variable error rates) actually had increased total error rates in the last time period (Figure 2), which suggests that routine feedback regarding errors does not affect every radiologist in the same way, at least over time. Another possibility may be degradation of speech recognition for some radiologist voice models over time, or perhaps aging hardware/microphones in some areas frequented by these radiologists. Radiologist-dependent variability in error rate has been documented by others, as well,¹ ranging from 0 to 100 percent.¹⁰

Despite the increased probability of error in CS compared with CR studies, a greater proportion of material errors were present among the CR reports. Similarly, Rana et al.⁵ found a greater incidence of “major errors” among their CR reports compared with CS, despite greater total errors in the CS cohort. Although CS reports may theoretically be more likely to contain an error because they contain more words and phrases than CR reports,⁵ this does not explain the opposite discrepancy in material errors. In fact, Chang et al.¹ found the opposite, with “very significant” errors found in 8 percent of reports in the “non-CR” group and 0.5 percent in the “CR group.” It is possible that the lower ratio of interpretation time to report dictation/editing time in CR examinations compared with CS examinations results in lower awareness of significant intrusions, omissions, or other errors in CR reports. Because of the large number of reports in our data set, we did not quantify length of report, although we would expect that it might correlate with total error frequency.⁵

Despite this notable exception in CR reports, we generally found a greater fraction of material errors to be associated with greater total error rates. This was specifically the case among OS reports, procedural reports, different radiologist groups, individual radiologists, and ISEs. We also found an increased error rate in procedural reports (e.g. vascular interventional, lumbar puncture) compared with diagnostic reports (OR, 1.91) and in OS compared with in-house examinations (OR, 1.55). We could not find other reports of similar findings in the literature. Although we did not specifically track the use of templates, reports for US, the CS ISE with the lowest error rate, were frequently made with templates.

Surprisingly, spelling errors were the most common type of immaterial error and usually did not significantly differ between group comparisons. SRS systems do not make spelling mistakes. This means that many of our radiologists type their reports or make edits when proofreading their reports rather than use the SRS process. The spellcheck function in our software is not automatic and must be manually triggered by the user before signing the report, no doubt further contributing to spelling errors in a busy environment. This strongly contradicts earlier work demonstrating decreased spelling errors with SRS.¹³ Ironically, then, a consequence of using a technique that should not result in spelling errors has been a rather large increase in spelling errors in an environment in which spellcheck is not mandatory or visible in real-time as highlighted or otherwise marked text. The human interaction component to technology cannot be overlooked. Unfortunately, we do not know the proportion of spelling errors that contributed to material errors, and therefore the clinical consequences are unknown. Contributors to this phenomenon, as well as other errors, may include cursory report editing due to pressure for quick turnaround time or other failures in the proofreading process, as well as an underestimation of actual error frequency.¹⁰ Busy inpatient working environments and nonnative English-speaking status have also been linked with increased error rates.^11,14 We did not look for associations between error rate and radiologist experience level or presence of trainees, although others have found no such relationships.^5,10,11

Our study has several limitations. Report errors were not automatically parsed by a computer but were elucidated via human proofreading, with its inherent fallibility and subjectivity. This may result in underestimation of the number of errors, as well as misclassification of types of errors. This limitation is likely to be small, however, given that experienced professional medical transcriptionists were used. Another limitation is the absence of subclassification of material errors. Whereas spelling errors are the most common immaterial error, it is unclear whether this is also true among the material errors, which are more likely to obfuscate report meaning to the extent of complicating or altering patient management. Although such judgments were made by transcriptionists, it was not feasible for a quarter-million reports to be re-reviewed by radiologists or other physicians. A related limitation is the difficulty in quantifying and comparing “very significant,” “major,” and “material” types of errors. Our use of “material” error, defined as any error that could potentially impede understanding of any part of the report, may then include errors that would not necessarily be categorized as “major” or “very significant” in other publications.

Other potential biases are related to the retrospective nature of the study and data collection. The most significant may be that reports from radiologists with error rates greater than 3 percent were reviewed more frequently than those with lower error rates, possibly skewing the data. However, since only 8 of 147 radiologists had an average total error rate less than 3 percent, this skew is likely mild. We also note that most CR reports at our institution are dictated directly to a transcriptionist, without the use of SRS, which may bias our results for studies that are performed off-hours or in areas where CR is not routinely reported.

SRS has clear advantages, including ease of integration with the radiology information system and picture archiving and communications systems,⁴ decreased report turnaround time^3–5,9,13,15,16 as dictation and transcription processes are combined,⁶ and shorter reports.¹³ Claims of cost-effectiveness^13,17 can be dubious depending on which costs are included in the analysis and how effectiveness is defined.¹⁸ SRS has been shown to contribute to decreased radiologist productivity.^9,19,20 Those who attempt to account for this fact in economic analyses find that SRS results in net increased cost.^4,20 SRS has been demonstrated to decrease productivity in other fields as well, such as among endocrinology and psychiatry secretaries.²¹ This may someday be overcome with continued improvements in SRS technology. For example, natural language processing software or use of “send-to-editor” functionality (e.g. report review by transcriptionist after SRS recognition) may potentially increase accuracy and efficiency of the radiologist.²² Probably the most serious downside to SRS is the higher error rate.^4,5,11,23 The American College of Radiology²⁴ recommends that radiologists proofread their final reports to minimize these types of semantic and syntactic errors.

Future research

Our results suggest several potential areas of focus for future research. Given the variability in error rates among different radiologists and different types of imaging examinations, departments with limited resources may wish to take a more targeted approach to the problem. Selective auditing of CS and procedural reports, for example, could potentially have a larger impact on overall error rates. Further study of differences in error frequency between reports generated using templates and macros, compared with traditional SRS, is certainly warranted. In addition to our US data results, other studies have found the regular use of macros or templates to be potentially helpful for reducing report errors.^11,25 We are currently expanding our use of templates into other divisions, as are many other departments throughout the world, to determine whether error frequency can be further decreased.

Given the high incidence of spelling errors in our data, we have decided to permanently enable the spellcheck feature of our SRS, so that it is no longer optional. It will be interesting to note any future changes in error frequency and report turnaround time. As suggested earlier, further research regarding the “send-to-editor” functionality of some SRSs is greatly needed. Although, in theory, the “send-to-editor” function may combine the advantages of SRS and transcription, there are also potential disadvantages, including some potential loss of efficiency and increased turnaround time.²² As SRS technology progresses, new implementations of hardware and software must be tested in real clinical environments before being widely adopted, particularly given the associated cost. For example, our Dictaphone hardware upgrade was expected to improve speech recognition, presumably based on vendor testing, but in our hands it had no effect on report error rates.

Conclusion

SRS-related errors are more common in CS reports (compared with CR), OS reports (compared with in-house examinations), and procedural studies (compared with diagnostic). When the total error rate increases, the fraction of material errors usually increases as well, except in the case of CR, in which material errors were more common than in CS reports. Error rates are highly variable among radiologists. Spelling errors are the most common type of immaterial error when automatic spelling correction is not mandatory, which suggests that editing radiologists often type rather than use SRS for report editing. A hardware upgrade from PowerMic I to PowerMic II Dictaphones had no effect on error rate. A quality control program with regular feedback can decrease errors over time, but there may be a limit.

For departments that use SRS, we recommend the following actions: (1) regularly audit reports, with feedback, for quality control; (2) focus efforts and resources on reports that are longer and more technical—in radiology departments, these include CS and procedural reports; (3) use automation whenever possible, including templates, macros, and automatic mandatory spellcheck; (4) perform trials of all costly hardware and software upgrades in your environment under your conditions; and (5) regularly retest the system for efficacy after any substantial changes.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

1. Chang, Strahan, Jolley. Non-clinical errors using voice recognition dictation software for radiology reports: a retrospective audit. J Digit Imaging 2011; 24(4): 724–728.

2. Kushner, Lucey; American College of Radiology. Diagnostic radiology reporting and communication: the ACR guideline. J Am Coll Radiol 2005; 2(1): 15–21.

3. Prevedello, Ledbetter, Farkas. Implementation of speech recognition in a community-based radiology practice: effect on report turnaround times. J Am Coll Radiol 2014; 11(4): 402–406.

4. Pezzullo, Tung, Rogg. Voice recognition dictation: radiologist as transcriptionist. J Digit Imaging 2008; 21(4): 384–389.

5. Rana, Hurst, Shepstone. Voice recognition for radiology reporting: is it good enough? Clin Radiol 2005; 60(11): 1205–1212.

6. Langer. Radiology speech recognition: workflow, integration, and productivity issues. Curr Probl Diagn Radiol 2002; 31(3): 95–104.

7. Smith, Berlin. Signing a colleague’s radiology report. Am J Roentgenol 2001; 176(1): 27–30.

8. Physician Insurers Association of America. PIAA data sharing reports. Rockville, MD: Physician Insurers Association of America, 1985–2003.

9. Mehta, McLoud. Voice recognition. J Thorac Imaging 2003; 18(3): 178–182.

10. Quint, Myles. Frequency and spectrum of errors in final radiology reports generated with automatic speech recognition technology. J Am Coll Radiol 2008; 5(12): 1196–1199.

11. McGurk, Brauer, Macfarlane. The effect of voice recognition software on comparative error rates in radiology reports. Br J Radiol 2008; 81(970): 767–770.

12. Devine, Gaehde, Curtis. Comparative evaluation of three continuous speech recognition software packages in the generation of medical reports. J Am Med Inform Assoc 2000; 7(5): 462–468.

13. Ramaswamy, Chaljub, Esch. Continuous speech recognition in MR imaging reporting: advantages, disadvantages, and impact. Am J Roentgenol 2000; 174(3): 617–622.

14. Kanal, Hangiandreou, Sykes. Initial evaluation of a continuous speech recognition program for radiology. J Digit Imaging 2001; 14(1): 30–37.

15. Lemme, Morin. The implementation of speech recognition in an electronic radiology practice. J Digit Imaging 2000; 13(2 Suppl. 1): 153–154.

16. Zick, Olsen. Voice recognition software versus a traditional transcription service for physician charting in the ED. Am J Emerg Med 2001; 19(4): 295–298.

17. Sferrella. Success with voice recognition. Radiol Manage 2003; 25(3): 42–49.

18. Reinus. Economics of radiology report editing using voice recognition technology. J Am Coll Radiol 2007; 4(12): 890–894.

19. Hayt, Alexander. The pros and cons of implementing PACS and speech recognition systems. J Digit Imaging 2001; 14(3): 149–157.

20. Issenman, Jaffer. Use of voice recognition software in an outpatient pediatric specialty practice. Pediatrics 2004; 114(3): e290–e293.

21. Mohr, Turner, Pond. Speech recognition as a transcription aid: a randomized comparison with standard transcription. J Am Med Inform Assoc 2003; 10(1): 85–93.

22. Williams, Kori, Williams. Journal Club: voice recognition dictation: analysis of report volume and use of the send-to-editor function. Am J Roentgenol 2013; 201(5): 1069–1074.

23. Al-Aynati, Chorneyko. Comparison of voice-automated transcription and human transcription in generating pathology reports. Arch Pathol Lab Med 2003; 127(6): 721–725 (also published Erratum in: Arch Pathol Lab Med 2003; 127(10): 1348).

24. ACR practice parameter for communication of diagnostic imaging findings. Resolution 11, http://www.acr.org/~/media/C5D1443C9EA4424AA12477D1AD1D927D.pdf (accessed 30 September 2014).

25. White. Speech recognition implementation in radiology. Pediatr Radiol 2005; 35(9): 841–846.

Syntactic and semantic errors in radiology reports associated with speech recognition software