
Abstract

Empirical studies comparing newer text-to-speech (TTS) synthesis systems to older systems are lacking. This study compared two speech synthesizers, DECtalk ‘Perfect Paul,’ one of the most popular ‘older’ synthesizers, and a ‘newer’ synthesizer, AT&T's Natural Voices ‘Mike,’ for intelligibility using the Modified Rhyme Test (MRT). Each system was evaluated at three speech-to-noise (S/N) ratios: −5 dB, −8 dB, and −11 dB, in a within-subjects design. Aircraft engine noise at 85 dB(A), produced by a Cessna 172R flight simulator, served as background noise. Normal-hearing non-pilots served as subjects. Results indicated differences in intelligibility between the two speech synthesizers at each speech-to-noise ratio, with the AT&T product showing significantly better intelligibility than the DECtalk product. Potential applications of this research include guidance for the integration of automated voice technologies in the cockpit and in similar systems that present elevated levels of background noise during normal communications and auditory display operations.



Comparison of Two Voice Synthesis Systems as to Speech Intelligibility in Aircraft Cockpit Engine Noise
