What Comes After /f/? Prediction in Speech Derives From Data-Explanatory Processes

Abstract

Acoustic cues are short-lived and highly variable, which makes speech perception a difficult problem. However, most listeners solve this problem effortlessly. In the present experiment, we demonstrated that part of the solution lies in predicting upcoming speech sounds and that predictions are modulated by high-level expectations about the current sound. Participants heard isolated fricatives (e.g., “s,” “sh”) and predicted the upcoming vowel. Accuracy was above chance, which suggests that fine-grained detail in the signal can be used for prediction. A second group performed the same task but also saw a still face and a letter corresponding to the fricative. This group performed markedly better, which suggests that high-level knowledge modulates prediction by helping listeners form expectations about what the fricative should have sounded like. This suggests a form of data explanation operating in speech perception: Listeners account for variance due to their knowledge of the talker and current phoneme, and they use what is left over to make more accurate predictions about the next sound.
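
The data-explanation idea in the last sentence can be made concrete with a small sketch. Everything below is hypothetical: the cue (a single spectral value in Hz), the talker and fricative offsets, and the size and direction of the vowel effect are invented, noise-free toy numbers, not measurements or analyses from the study. The sketch compares two ways of guessing the upcoming vowel: scoring the raw cue against the grand mean, and scoring the residual left over after subtracting the value expected for that talker and fricative.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, noise-free toy values in Hz; NOT data from the study.
BASE = 5000.0
TALKER_OFFSET = {"A": 600.0, "B": -600.0}         # assumed talker differences
FRICATIVE_OFFSET = {"s": 1500.0, "sh": -1500.0}   # assumed category differences
VOWEL_OFFSET = {"i": 200.0, "u": -200.0}          # assumed small coarticulatory shift


def make_tokens():
    """Build one fricative token per talker x fricative x upcoming-vowel cell."""
    tokens = []
    for talker, t_off in TALKER_OFFSET.items():
        for fricative, f_off in FRICATIVE_OFFSET.items():
            for vowel, v_off in VOWEL_OFFSET.items():
                tokens.append({
                    "talker": talker,
                    "fricative": fricative,
                    "vowel": vowel,
                    "cue": BASE + t_off + f_off + v_off,
                })
    return tokens


def raw_scores(tokens):
    """Score each token relative to the grand mean, ignoring talker and fricative."""
    grand_mean = mean(tok["cue"] for tok in tokens)
    return [tok["cue"] - grand_mean for tok in tokens]


def residual_scores(tokens):
    """Score each token relative to the value expected for its talker and fricative."""
    cells = defaultdict(list)
    for tok in tokens:
        cells[(tok["talker"], tok["fricative"])].append(tok["cue"])
    expected = {key: mean(values) for key, values in cells.items()}
    return [tok["cue"] - expected[(tok["talker"], tok["fricative"])] for tok in tokens]


def accuracy(tokens, scores):
    """Guess 'i' for a positive score and 'u' otherwise; return proportion correct."""
    hits = sum(("i" if s > 0 else "u") == tok["vowel"] for tok, s in zip(tokens, scores))
    return hits / len(tokens)


tokens = make_tokens()
raw_acc = accuracy(tokens, raw_scores(tokens))
resid_acc = accuracy(tokens, residual_scores(tokens))
print(f"raw cue: {raw_acc:.2f}, residual cue: {resid_acc:.2f}")
```

In this toy set the raw cue predicts the vowel no better than a coin flip (0.50), because the talker and fricative offsets swamp the small vowel-related shift, whereas the residual predicts it perfectly (1.00). That outcome follows from how the numbers were constructed and is not meant to reproduce the reported accuracies; it only illustrates why knowing the talker and the fricative category could make the leftover variance more useful for prediction.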

Keywords

speech perception, anticipation, predictive coding, generative models, social expectations, auditory processing, open data


