Effect of Error Correction Strategy on Speech Dictation Throughput

Abstract

The eight participants in this experiment used two different commercially available speech recognition dictation systems to complete a variety of reading transcription tasks. Participants enrolled fully in both systems. They received training in two correction strategies for both systems: multimodal correction (voice plus mouse plus keyboard) and hands-free correction (voice-only), and used both strategies during the experiment. The key findings were:

• Both dictation systems were equally accurate.

• Throughput (corrected words per minute) was significantly (63%) faster using multimodal correction.

• Speaking rates were the same for both systems and correction strategies, averaging around 105–110 utterances (words and commands) per minute.

• Correction speeds for the multimodal correction strategy (13.2 seconds per correction) were significantly faster than (a little more than twice as fast as) those for hands-free correction (29.1 seconds per correction).

• At the end of the experiment, participants indicated they significantly preferred the multimodal correction strategy.

