Abstract
The position of the tongue relative to the upper and lower jaws is regulated in part by the position of the hyoid bone, which, with the anterior and posterior suprahyoid muscles, controls the angulation and length of the floor of the mouth on which the tongue body ‘rides’. The instantaneous shape of the tongue is controlled by the ‘extrinsic muscles’ acting in concert with the ‘intrinsic’ muscles. Recent anatomical research in non-human mammals has shown that the intrinsic muscles can best be regarded as a ‘laminated segmental system’ with tightly packed layers of the ‘transverse’, ‘longitudinal’, and ‘vertical’ muscle fibers. Each segment receives separate innervation from branches of the hypoglossal nerve. These new anatomical findings are contributing to the development of functional models of the tongue, many based on increasingly refined finite element modeling techniques. They also begin to explain the observed behavior of the jaw-hyoid-tongue complex, or the hyomandibular ‘kinetic chain’, in feeding and consecutive speech. Similarly, major efforts, involving many imaging techniques (cinefluorography, ultrasound, electro-palatography, NMRI, and others), have examined the spatial and temporal relationships of the tongue surface in sound production. The feeding literature shows localized tongue-surface change as the process progresses. The speech literature shows extensive change in tongue shape between classes of vowels and consonants. Although there is a fundamental dichotomy between the referential framework and the methodological approach to studies of the orofacial complex in feeding and speech, it is clear that many of the shapes adopted by the tongue in speaking are seen in feeding. It is suggested that the range of shapes used in feeding is the matrix for both behaviors.
(I) Introduction
Interest in the oropharyngeal complex has been rising. For a long time largely ignored, it is now receiving serious attention. A fully functional oropharyngeal complex is essential for normal feeding, breathing, and speech sound production. Degradation of any oropharyngeal function is a serious medical management challenge and a major quality-of-life issue for the patient. The tongue is a very difficult organ to examine. However, new technology, e.g., the invention of advanced imaging techniques, supports highly sophisticated analyses of tongue movement and shape. There is also growing interest in the biomechanics of feeding behavior and speech and their neural control. Clinical issues, such as the role of the tongue in obstructive sleep apnea, are receiving new emphasis. The need for improved synthesized speech has also contributed to these developments.
The mammalian tongue has vital functions in feeding: It plays a major role in ingestion, as in licking, lapping, and browsing; and it moves food distally through the oral cavity from the incisors to the post-canines for chewing, and then to the pharynx for bolus formation and swallowing. In dogs, the tongue has a thermoregulatory function in panting. Tongue position relative to the posterior pharyngeal wall is important in respiration. Chemoreceptors and mechanoreceptors in the tongue surface sense the nature and mechanical properties of ingested food, and prevent the ingestion of noxious substances. In addition, tongue shape and position in the oral cavity influence the shape and dimensions of the airway between the palate and the tongue surface in mammals with Type I tongues (Doran and Baggett, 1971). Given the dominant role of speech in human interactions, an overwhelming proportion of the research on tongue movement focuses on its role in speech, specifically vowel and consonant production. [Nor is it surprising that speech research uses an internationally recognized alphabet of notations and abbreviations that are difficult for the oral biologist to follow. Where possible, those specialized usages have been avoided here.] Until recently, studies of the patterns of tongue movement in feeding were focused on non-human mammals (see Hiiemae and Crompton, 1985; Hiiemae, 2000). Since 1992 (Palmer et al.), there have been four reports on tongue, hyoid, and jaw movements in human subjects (complete sequences from initial ingestion to terminal swallow) when consuming foods of different initial consistencies (Palmer et al., 1997; Palmer, 1998; Hiiemae and Palmer, 1999; Hiiemae et al., 2002). In the same period, there have been significant developments in the approaches to tongue behavior in speech.
Doran (1975) and Doran and Baggett (1971) identified two types of mammalian tongues: Type I is the spatulate fleshy tongue found in almost all mammals. It can protrude up to 50% over its resting length and is capable of fairly complex movements. Type II tongues are the highly flexible whip-like organs found in anteaters and other myrmecophagous mammals. [For reviews of tetrapod feeding mechanisms, including a chapter on mammals, see Schwenk, 2000.] Although differing in their details, all Type I tongues share common characteristics and general architectures. Non-human mammals—e.g., opossums, rats, mice, rabbits, cats, tenrec, and a range of primates, but particularly the macaque—have been used in studies of feeding. Cats, rabbits, guinea pigs, and rats have also been used in studies investigating the neural control of that rhythmic activity. Clearly, tongue movements in speech can be studied only in humans.
There is almost no common ground between the ‘feeding/physiology’ literature and that focusing on speech, but a bridge may be appearing. MacNeilage (1998) hypothesizes that the movements of the tongue and jaw in speech (which he terms ‘cyclicities’) evolved from their movements in feeding. This idea has its supporters and detractors but is superficially very appealing. No one has yet attempted, as far as we know, to test it experimentally. This idea is particularly relevant if one is interested in the evolution of speech, since, as should become clear within the body of this review, human tongue behavior in feeding builds on the patterns of movement in the hyolingual complex observed in other mammals. It is, therefore, reasonable to hypothesize that the matrix of tongue movements during human speech was derived from the wide variety of tongue movements found in suckling and feeding, although this view is controversial.
(II) The Hyolingual Complex
The tongues of mammals share certain important characteristics, but there are also important differences. The mammalian tongue cannot be viewed as a ‘freestanding’ organ. Rather, for almost all its functions, it depends on its linkages with the hyoid apparatus and lower jaw (Fig. 1). This is the hyolingual complex. There is a fundamental anatomical difference between the non-human tongue and that of humans. Non-human mammals have flat hard palates, mostly with well-developed rugae, long tooth rows, and a long flat tongue (small vertical dimension). The hyoid is behind rather than below the oropharyngeal surface of the tongue, with the epiglottis extending dorsally and coming into contact with the soft palate on its pharyngeal surface (see Hiiemae and Crompton, 1985). The non-human larynx is linked to the hyoid bone but positioned behind rather than under it. Human neonates have a comparable relationship among tongue, hyoid, larynx, epiglottis, and soft palate (Negus, 1949). Within a few months of birth, however, the neck begins to elongate, culminating in the ‘descent of the larynx’, creating the vertical component of the supra-laryngeal vocal tract (Fig. 2). As a result, the posterior oral seal and the upper esophageal sphincter (UES) become widely separated, and a ‘bend’ develops in the tongue surface, creating both oral and oropharyngeal surfaces. This change in humans has been attributed to the development of speech and is considered a prime cause of the morbidity and mortality associated with disturbances of swallowing (Palmer et al., 1992).
(1) The functional anatomy of the hyolingual complex
Unlike the elephant’s trunk (Kier and Smith, 1985; Smith and Kier, 1989), the mammalian tongue is short and is anchored to the mandible, the hyoid, and cranial base by its extrinsic muscles. Although still an hypothesis, the evidence points to the biomechanics of the mammalian (and human) tongue as being consistent with that predicted for a muscular hydrostat: i.e., the tongue is of fixed incompressible volume such that distortion in one direction/axis affects the other two [see below]. Stone and Lundberg (1996) explain tongue shape based on this principle. Takemoto (2001) bases his recent analysis of the tongue musculature on the ‘hydrostatic’ interpretation of the anatomy of the tongue. Even when it acts as a hydrostat, the movements and shape changes of the human tongue occur in a space whose dimensions are dictated by movements of the jaw and hyoid.
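The constant-volume constraint can be illustrated with a deliberately crude sketch. The example below treats the tongue as a rectangular block, an assumption made purely for illustration; the function name `deform_block`, the `width_share` parameter, and the dimensions are hypothetical and are not taken from the studies cited. It shows only that a change along one axis must be compensated along the other two if volume is fixed.

```python
def deform_block(length, width, height, new_length, width_share=0.5):
    """Return (length, width, height) after a constant-volume change in length.

    The block is a hypothetical stand-in for the tongue body.
    width_share is the fraction (on a log scale) of the compensating
    change taken up by width; the remainder goes to height.
    """
    if min(length, width, height, new_length) <= 0:
        raise ValueError("all dimensions must be positive")
    if not 0.0 <= width_share <= 1.0:
        raise ValueError("width_share must lie between 0 and 1")
    # Factor by which the cross-sectional area must change to keep volume fixed.
    area_gain = length / new_length
    new_width = width * area_gain ** width_share
    new_height = height * area_gain ** (1.0 - width_share)
    return new_length, new_width, new_height


# Illustrative dimensions in mm (not measured values).
l, w, h = deform_block(80.0, 40.0, 20.0, new_length=64.0)
print(l * w * h)  # volume is unchanged, up to floating-point rounding
```

Setting `width_share` to 0 or 1 confines the compensation to a single axis, which is one simple way to explore how a given shortening might appear as ‘bunching’ in height alone.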
Tongue behavior cannot be divorced from hyoid movement, which is directly linked to motion of the mandible. The length and angulation of the floor of the mouth on which the tongue body rides are dictated by that linkage. Movements of the tongue surface can occur independently of hyomandibular movement within a limited range of jaw motion. [This relationship can be described as one in which a ‘tapered sausage’ is attached to a mobile surface (the oral floor formed by the hyomandibular muscles) so that the ‘sausage’ can change its shape as the floor moves.]
In a trenchant review of the ‘muscles of the mandible’, Last (1954) lays out the principles of these functional relationships. Last does not call these linkages a ‘kinetic chain’, but the thrust is just that. This concept can be represented as a series of linked muscle groups acting on two mobile skeletal elements, the hyoid and mandible (Fig. 1). Their relative positions determine the length and orientation of the floor of the mouth, and so the gross vertical and antero-posterior position of the tongue body relative to the hard palate. This concept is important, because many studies on the jaw musculature in the dental and related literature focus only on the adductors (i.e., temporalis, masseter, medial pterygoid). Some refer to the digastric as the primary abductor of the mandible (see Miller, 1991). By and large, the hyoid complex is ignored. The speech literature is quite different, in that the focus is on the relative position and movement of the articulators, emphasizing the tongue surface, palate, and lips (Fig. 2; also see Folkins and Kuehn, 1982). Folkins and Kuehn advance the concept of ‘bidirectionality’, in which they recognize that movement in one part of the system affects all the others. The literature on the anatomy of the tongue (e.g. Lowe, 1981) dismisses the hyoid muscles as ‘belonging to the floor of the mouth’. It is essential to emphasize that global tongue position and so its movements in feeding and speech are directly correlated with the length and orientation (position) of the floor of the mouth (the base of the tongue body), i.e., hyoid position.
(2) The kinetic chain
The muscular linkages among the mandible/lower jaw, the hyoid, the cranial base (see Fig. 1), and sternum have the following properties:
• First, during feeding, the hyoid is in continuous motion, so that the relationship between it and the lower jaw changes constantly. Hyoid movement is linked to that of the opening and closing of the jaws (the masticatory/chewing cycle) and therefore to activity in the mandibular adductors (Figs. 1, 2, 3, 4). Hyoid motion results from change in the relative positions, and distance between, the hyoid and the mandibular symphysis, which, in turn, depends on mandibular position relative to the cranium. Analysis of the experimental data shows that the hyoid can travel upward and forward toward a slowly opening jaw, and that it can be pulled sharply backward from a jaw held in a wide gape (Hiiemae et al., 2002; also see Carlsöö, 1956; Pancherz et al., 1986). It is also clear that the geometry of the relationship between the hyoid and the mandible in man is such that their relative positions are affected by the direction and amplitude of jaw movement (Folkins and Kuehn, 1982). It has been argued (Thexton and McGarrick, 1988, 1989) that if cinefluorographic (CFG) or videofluorographic (VFG) data are examined with the lower (mandibular) occlusal plane as the reference plane, then hyoid and tongue movements confined within the tongue body can be analyzed without the distorting effect of jaw movement. This is simply not the case for man, where the jaw and tongue are relatively much shorter than in other mammals—e.g., opossum, cat, and macaque (see Hiiemae and Crompton, 1985; Hiiemae, 2000)—and the hyoid with the larynx lies below the posterior tongue rather than behind it.
• Second, the relative positions of the hyoid and mandible directly affect the length and angle of the floor of the mouth on which the tongue mass ‘rides’: Increasing the distance between the hyoid and the mandibular symphysis lowers and lengthens the floor of the mouth relative to the lower jaw, and so the relative position of the tongue body will be lower relative to the tooth rows. Conversely, raising the hyoid can shorten the floor of the mouth. Such a movement facilitates tongue-palate contact. The effect of such shortening depends on whether the movement is simply vertical or upward and forward as in swallowing (Ishida et al., 2002).
• Third, the complex biomechanics of the hyolingual apparatus have not been thoroughly studied. An issue here is hyoid movement and the correlated position of the oropharyngeal surface of the tongue. It cannot be assumed that the two exactly parallel each other. The tongue can ‘bunch’, ‘heap’, and twist, increasing its vertical dimension while shortening its postero-anterior axis.
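The second property above, that hyoid position relative to the mandibular symphysis sets the length and angle of the floor of the mouth, can be sketched geometrically. This is a minimal illustration under stated assumptions: the floor is reduced to a straight line between two points, coordinates are in mm with x increasing anteriorly and y increasing upward, and the function name `floor_of_mouth` and all coordinate values are hypothetical.

```python
import math


def floor_of_mouth(symphysis, hyoid):
    """Length (mm) and angle (degrees) of the symphysis-to-hyoid line.

    Points are (x, y) in mm; x increases anteriorly, y increases upward.
    The angle is the inclination of the line below the horizontal as it
    runs backward from the symphysis to the hyoid.
    """
    dx = hyoid[0] - symphysis[0]
    dy = hyoid[1] - symphysis[1]
    length = math.hypot(dx, dy)
    angle = math.degrees(math.atan2(-dy, -dx))
    return length, angle


# Hypothetical positions: the hyoid moves back and down between the two calls.
rest = floor_of_mouth((0.0, 0.0), (-40.0, -10.0))
retracted = floor_of_mouth((0.0, 0.0), (-50.0, -20.0))
print(rest, retracted)
```

In this toy case the retracted hyoid gives both a longer and a more steeply inclined line, which corresponds to the lowering and lengthening of the floor described in the text. It says nothing about tongue-surface shape, which, as the third property notes, need not parallel hyoid movement.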
(III) Measuring Tongue Movements
The tongue moves rapidly in both speech and feeding. Abd-el-Malek (1939, 1955) provided the first description of its movements (1955) after studying its anatomy (1939). Using naked eye observation and still camera exposures to record what he determined to be the core tongue shapes involved in feeding, he demonstrated, in a human subject, that the tongue could protrude, retrude, twist, and produce a variety of ‘intrinsic’ shape changes (see Fig. 5). He could not provide data on the rate of tongue movement or how it changed from one posture to another. Beyond such straightforward descriptions, the issues in studying the tongue hinge on the question(s) to be addressed and the length of the behavioral sample needed to examine the problem at issue, e.g., whether: (a) actual motion is to be examined during the course of complete behaviors, such as a feeding sequence or reading a test paragraph with almost all the vowels and consonants in American English (e.g., the ‘Grandfather Passage’, Darley et al., 1975; also see Hiiemae et al., 2002); or (b) whether the changes in tongue body and surface shape produced during the production of a vowel sound or a consonant-vowel (C-V) phoneme are the subject of enquiry (e.g., Perkell, 1969; Kent, 1972; Stone and Lundberg, 1996). If the objective is to examine global measures such as rigid body motion, range of motion, and repetitive patterns, then long recordings such as those needed for (a) would be the choice. If local tongue shape is to be examined in depth, then approach (b) would be optimal. Another approach is to model the tongue, but that can only follow the acquisition of data on basic anatomy (Takemoto, 2001) and tongue function, such as electromyography (EMG) of tongue and hyoid musculature or its shape derived from images and tissue-point tracking (see Stone, 1990; Wilhelms-Tricarico, 1995; Akgul et al., 1999). 
Whatever the approach, some reference system is essential if shape changes are to be plotted, analyzed, and compared.
The history of tongue movement studies is correlated with the focus of the pioneers in the field who used what was then ‘state-of-the-art’ technology to address their questions. The classic swallowing paper (Ardran and Kemp, 1955) addressed the movements of the tongue using the earliest cinefluorographic (CFG) systems. The earliest CFG studies in speech focused on the supralaryngeal vocal tract, and particularly tongue-palate-lip interactions in distinct isolated phonemes or syllables (Perkell, 1969; Kent, 1972). These early and quite different disciplinary foci led to a situation where there were essentially no studies comparing movements in the two behaviors using the same subjects. The only comparative study (Hiiemae et al., 2002) shows, with subjects as their own controls, how mandibular, hyoid, and gross tongue movements differ between eating and speaking (Figs. 3, 4).
There are two continuing problems with investigations of tongue motion: first, the 2D representation of 3D events when standard imaging techniques are used (see Stone, 1990); and second, the speed with which these events can be recorded relative to their actual time course. These issues are discussed below in the context of each of the major data acquisition methods currently in use.
(IV) The Moving Tongue
(1) Methods of data acquisition
Major advances in the methods available for studying the tongue ‘in action’ have occurred in the last 15 or so years with the development of digital technology coupled with sophisticated computer software for data acquisition and reduction. However, there are important differences in the range of methodologies available to the speech language community as compared with those available to oral biologists interested in feeding mechanisms when intra-oral events (such as tongue movement) are to be examined. It is impractical to use intra-oral sensors to study feeding on solid foods.
(2) Electropalatography (EPG)
Electropalatography (EPG) uses intra-oral sensors. Subjects wear an individualized thin (5 mm thick) plastic ‘base-plate’ over the hard palate anchored to the maxillary teeth. Variable numbers of sensors that respond to tongue contact are embedded in the device: typically 32, 64, or 128 (Folkins and Kuehn, 1982), or 62 in the EPG3 device (Hardcastle et al., 1991). [The commercially available EPG instrument (Kay Elemetrics Palatometer, 6300 Lincoln Park, NJ, USA) has 96 sensors.] The device has limitations, since the actual movement of the tongue is not measured, only the points of contact between its surface and the hard palate.
Although attempts to use EPG to measure tongue-palate contacts in feeding on solid foods failed (Heath, personal communication; Heath et al., 1980), Jack and Gibbon (1995) successfully measured tongue-palate contacts during the consumption of milk (liquid), yogurt (thick and creamy, but semi-liquid), and ‘jelly’. Chi-Fishman and Stone (1996) argue that EPG can be successfully used to study swallowing. However, the greater value of this method in speech research, when used in conjunction with other methods, is clear from Stone and Lundberg (1996), who compared the data obtained with the results of ultrasound in a study of tongue shape relative to palate. Similarly, the ‘glossometer’ used by Flege (1988) had intra-oral sensors (2 x 3 x 6 mm) embedded in a thin (3 mm) plastic ‘pseudopalate’ (comparable with the EPG device). Each sensor assembly has an LED and paired phototransistor. During data acquisition, the LEDs are pulsed in rapid succession, sending a beam of infrared light downward in a plane perpendicular to the occlusal plane of the teeth, so that the light is reflected from the tongue surface. The method cannot be used if anything is between the tongue and the palate, thus making it unacceptable for studies of feeding.
(3) Electromagnetic articulometer (EMA)
Another intra-oral technique, the electromagnetic articulometer (EMA), designed for use in the transduction of articulatory movements during speech production, relies on the attachment of tiny transmitter coils (4 x 4 mm base with a thickness of 2.5 mm) to the tongue surface, lips, and velum (see Fig. 1 in Perkell et al., 1992). Coils of that size would rapidly detach if used during feeding on foods other than liquids, since the tongue surface twists toward the post-canine teeth in every chewing cycle (see Fig. 5). This device requires scrupulous calibration and much manipulation of the data obtained. Recently, Kaburagi and Honda (2001) have used an EMA system to obtain articulatory data to test their dynamic model of the tongue. An equivalent electromagnetic system (the Sirognathograph; see Hiiemae et al., 1996; Kazazoglu et al., 1994) accurately records jaw movement in 3D but cannot be used for the tongue, given the problem of intra-oral transducers when feeding. Other devices, such as strain gauges (Muller and Abbs, 1979), used to measure force or displacement of the lips and mandible, are viable tools for some speech research but, again, are unsuitable for feeding studies, because such methods cannot be applied to the tongue (Folkins and Kuehn, 1982).
(4) Applied diagnostic cineradiography (CFG) and videofluorography (VFG)
This technique (CFG) became available in the 1950s and was used for the earliest studies of human swallowing (e.g., Ardran and Kemp, 1955). Perkell (1969) performed an exhaustive analysis of tongue movements in a single male subject while recording 13 ‘nonsense’ utterances (each with an unstressed followed by a stressed syllable in combinations of 7 vowels and 6 consonants as well as a single short sentence). The first clinical cameras were slow (25–30 frames per sec), and they also used 35-mm film, which had to be laboriously analyzed with special equipment. Radiation exposure for human subjects soon became a concern. After an initial flurry of activity, human studies (CFG) effectively ceased, only to resume in the late 1980s, when videofluorography (VFG), which requires much lower radiation levels, became a standard radiological diagnostic tool. Hiiemae (1967, 1968; Hiiemae and Ardran, 1968) used a 35-mm CFG diagnostic machine to analyze patterns of mandibular motion in rats. That pioneering study was followed by a series with opossums and then other non-human mammals (see Hiiemae and Crompton, 1985; Hiiemae, 2000). The duration of single masticatory cycles in humans ranges from about 450 to 1000 msec, with swallowing cycles the longest. A cycle 600 msec long recorded at 30 fps would include about 18 frames of film, or 36 interlaced videofields. Chewing cycles in small mammals are much faster, i.e., on the order of 250–350 ms; 9 frames or 18 videofields are inadequate for the study of such movements. Many of the early records (see Hiiemae and Palmer, 2001) were jerky and difficult to interpret. If such rapid motion was to be investigated, recording speed had to increase. Cinefluorographic facilities for animal studies were installed, first at the Yale Peabody Museum and then at the Museum of Comparative Zoology at Harvard. 
Those dedicated systems, filming at 100 fps, provided the basis for a series of studies in which the complete feeding process in a wide variety of mammals, including the role of the tongue in food transport, was described (Hiiemae et al., 1978; Hiiemae and Crompton, 1985, et seq.). Those mammalian studies formed the basis for the Process Model of Feeding in humans [discussed below (Hiiemae and Palmer, 1999)].
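The frame counts quoted above follow directly from cycle duration and recording rate. The short sketch below reproduces that arithmetic; the function name `frames_per_cycle` is hypothetical, and the two-fields-per-frame default applies only to interlaced video, not to cine film.

```python
def frames_per_cycle(cycle_ms, frames_per_sec, fields_per_frame=2):
    """Return (frames, fields) recorded during one cycle of the given duration."""
    frames = cycle_ms * frames_per_sec / 1000.0
    return frames, frames * fields_per_frame


print(frames_per_cycle(600, 30))     # (18.0, 36.0): human cycle at 30 fps
print(frames_per_cycle(300, 30))     # (9.0, 18.0): small-mammal cycle at 30 fps
print(frames_per_cycle(300, 100)[0]) # 30.0 frames for the same cycle at 100 fps
```

Filming at 100 fps therefore gives a 300-ms cycle more frames than a 600-ms human cycle receives at 30 fps.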
Those early efforts highlighted the need for reproducible standardized reference points within a complex system which has all its parts in motion. The first markers for the measurement of jaw movement were simple amalgam fillings on the buccal surface of canines or molars which appeared as black dots in the films. To examine tongue surface motion in animals (opossums, cats, hyraces, and macaques), investigators used a hypodermic needle to insert small metal ‘pellets’ just under the gustatory mucosa in anesthetized animals (see Hiiemae and Crompton, 1985, for specific references). Our recent human studies (e.g., Palmer et al., 1997) have used small lead discs (4 x 0.4 mm) cemented to upper and lower teeth, and to the tongue. Similar markers have also been used by Kuehn (1976), Tomura et al. (1981), and Stone and Lele (1992); also see Gay et al. (1994). Gold pellet tongue markers were used at the Microbeam Facility at the University of Wisconsin (Hamlet, 1989; Westbury et al., 2000; Tasko et al., 2002). However, with that technique, the only images were of actual marker positions (see below).
Data reduction
To plot movements of markers over time in lateral projection radiographs (Figs. 3A, 3B), one must establish the Cartesian coordinates for each marker and then manipulate them to give its position relative to a reference plane within the orofacial complex (Fig. 2B). We have traditionally used a palatal reference with the X axis defined as the line between upper canine and molar markers (representative of the occlusal plane of both the upper post-canines and of the hard palate). This choice was dictated by the functional relationship between the tongue surface and hard palate in feeding. It works equally well for speaking (Fig. 3), since that also depends on the changing relationship between the tongue and hard-palate articulators. The ‘mandibular plane’ is defined by the line between lower canine and lower molar markers, and is perpendicular to the sagittal plane. This reference plane was used in some of the animal studies as a means of examining tongue movement ‘in isolation’ (see Thexton and McGarrick, 1988, 1989). [Details of the data reduction methods used in VFG studies on non-human mammals can be found in the references in Hiiemae and Crompton (1985) and Hiiemae (2000).]
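The re-expression of marker coordinates relative to a reference line can be sketched as a translation followed by a rotation. This is a generic 2D change of axes, not the authors' published procedure; the function name `to_reference_frame`, the choice of the molar marker as origin, the y-upward convention, and the coordinates are all assumptions for illustration.

```python
import math


def to_reference_frame(point, origin, axis_point):
    """Express `point` in a frame with origin at `origin` and X axis
    running from `origin` toward `axis_point`.

    All arguments are (x, y) pairs in image coordinates with y upward.
    The new Y axis is the X axis rotated 90 degrees counter-clockwise.
    """
    ax = axis_point[0] - origin[0]
    ay = axis_point[1] - origin[1]
    norm = math.hypot(ax, ay)
    if norm == 0:
        raise ValueError("reference markers coincide")
    ux, uy = ax / norm, ay / norm          # unit vector along the new X axis
    px = point[0] - origin[0]
    py = point[1] - origin[1]
    x_new = px * ux + py * uy              # component along the reference line
    y_new = -px * uy + py * ux             # component perpendicular to it
    return x_new, y_new


# Hypothetical frame: upper molar marker as origin, upper canine on the X axis.
molar, canine = (10.0, 10.0), (10.0, 20.0)
tongue_marker = (5.0, 10.0)
print(to_reference_frame(tongue_marker, molar, canine))
```

Applying the same function with lower canine and lower molar markers would give coordinates relative to the mandibular line instead, so the two reference systems discussed in the text differ only in which pair of markers is passed in.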
Lateral projection motion recording
Lateral projection motion recording of the orofacial complex provides a 2D image of 3D events (Hiiemae et al., 2002). This issue has been discussed by Stone (1990). However, many of the animal studies used a conventional 16-mm cinecamera, synchronized to the fluoroscopic camera, to record the animals in frontal view to provide a measure of medio-lateral jaw motion and to identify active and balancing sides in chewing.
In practice, research with human subjects can use one of two VFG projections: The lateral projection allows movements in the vertical and horizontal planes to be measured; the postero-anterior (P-A) projection, medio-lateral and vertical movements. (It should be noted that our human subjects research review boards approved protocols [Institutional Review Board, IRB] allowing us a lifetime total of 5 min of VFG recording per normal subject.) However, the rate of data acquisition is still 30 fps. This creates a problem when VFG is being recorded with other signals, such as EMG. Each videoframe is acquired over the entire 33.33-ms period as the videocamera tracks across and down the screen. Digital data, usually acquired at minimally 500 Hz, must be manipulated to reconcile with the VFG frame period. This means that an EMG event can be identified with a specific frame but not precisely where it occurs within the frame (see Palmer et al., 1992). High-speed digital cameras are now available but are not yet used for routine diagnostic VFG testing and so are not available for experimental purposes.
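The reconciliation of digital samples with video frames can be sketched as follows. The example assumes the nominal rates given in the text (500 Hz and 30 fps), a shared time zero for both recordings, and frames numbered from 0; the function name `frame_for_sample` is hypothetical. It returns the frame containing a sample together with that frame's time window, which is the limit of temporal resolution described above.

```python
def frame_for_sample(sample_index, sample_rate_hz=500, frame_rate_hz=30):
    """Return (frame_index, frame_start_s, frame_end_s) for a digital sample.

    Integer arithmetic is used for the frame index so that samples falling
    exactly on a frame boundary are assigned consistently.
    """
    if sample_index < 0:
        raise ValueError("sample_index must be non-negative")
    frame_index = (sample_index * frame_rate_hz) // sample_rate_hz
    frame_start = frame_index / frame_rate_hz
    frame_end = (frame_index + 1) / frame_rate_hz
    return frame_index, frame_start, frame_end


# Samples 0-16 fall in frame 0; sample 17 is the first to fall in frame 1.
print(frame_for_sample(16)[0], frame_for_sample(17)[0])
```

At these rates each frame spans 16 or 17 samples, so an EMG onset located to a single sample can be assigned to a frame but cannot be placed more precisely within the roughly 33-ms window the frame covers.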
(5) X-ray microbeam (XRMB)
An important data resource for tongue movement studies was created by the development of the x-ray microbeam. Invented in the early 1970s (Fujimura et al., 1973; Kiritani et al., 1977), this technology uses much lower levels of radiation than VFG. The limitation is that it images only the position of the gold pellets glued to the tongue and teeth. Additional instrumentation is needed to capture tongue surface information; for example, a sagittal ultrasound recording of the same utterance was made for each subject immediately after the microbeam record was obtained and matched to the pellet positions (see Stone, 1991). A large database (58 subjects) recorded by means of this instrument is now publicly available (Westbury, 1994). It has been used by Westbury et al. (2000) and Tasko et al. (2002) to examine tongue kinematics during speech and swallowing, respectively. Tasko et al. found so much variability in pellet trajectories among 12 subjects that it was remarkably difficult to develop a generalized description of tongue kinematics in liquid swallows.
(6) Ultrasonography (US)
Ultrasound (US) images soft tissue in real time (Sonies et al., 1981; Keller and Ostry, 1983; Stone et al., 1983; Stone and Shawker, 1986; Stone and Lundberg, 1996). It has several advantages over VFG: (a) There is no ionizing radiation, and (b) midline submental transducer placement minimizes masking of the tongue by the hard tissues (mandible and teeth). Recordings can be made at a 30-fps frame rate (30 Hz). Submental recordings show the changing shape of the tongue surface, although the presence of air under the anterior tongue and its lateral margins can prevent their imaging. To quantify tongue surface movements, investigators have used a ‘marker’ pellet technique (Shawker et al., 1983, 1985). A major disadvantage of US is the absence of spatial information on the relationship between the visualized tongue surface and the rest of the vocal tract. Moreover, during the rapid pharyngeal portion of the swallow, posterior tongue motion is faster than the available frame rate. No one appears to have used US for complete masticatory sequences.
Ultrasound is widely used in speech studies. It was used to ‘fill in’ the tongue profiles in the microbeam data (Stone, 1991). Combined with electro-palatography (EPG) and jaw motion recording, the interactions of the tongue, palate, and mandible have been explored in speech production (Stone and Vatikiotis-Bateson, 1995). The use of US in studies of feeding has been largely confined to the analysis of tongue movement in the liquid swallow (Shawker et al., 1983; Stone and Shawker, 1986; Chi-Fishman and Stone, 1996). Imai et al. (1995) imaged the tongue in real time in normal subjects who ate six foods of very different consistencies and were able to report on the tongue’s role in turning the food (toward the occlusal surface of the teeth on the active side; see Fig. 5), mixing it with saliva, sorting unsuitable particles (presumably too big to be swallowed), and contributing to bolus formation. They report that vertical motion of the tongue had two phases: sorting and bolus formation.
Stone and Lundberg (1996) generated elegant 3D models of tongue surface configuration for a substantial range of vowels and consonants (Fig. 6). They found that four classes of tongue shape were sufficient to account for and categorize all the sounds they imaged. The single female subject was asked to produce vowel and consonant sounds and sustain them for 15 sec to encompass the 10-second recording time needed. Although not a normal behavioral pattern, it was necessary if good experimental records were to be obtained. To develop the 3D images/models, the investigators reconstructed the data using essential parameters from the recording system and sophisticated software.
It is clear that new and very sophisticated ultrasound technology can generate the data to produce 3D models of the tongue surface (Stone and Lundberg, 1996). Equally, if the research objective is to understand tongue behavior in feeding and speech as a basis for developing either better clinical diagnostic tests or treatment approaches, this technique needs simplification.
(7) Magnetic resonance imaging (MRI)
MRI is a newer method for examining soft tissues for diagnostic and research purposes (see Lufkin et al., 1986). Readers are referred to the papers cited below for the details of the methods (signal generation, signal acquisition, and data reduction) used in each specific study. MRI has serious limitations as a research tool for studies of speech (phoneme production) or deglutition. First, the subject is supine, a particular problem for studies of feeding. Second, MRI data acquisition is slow when compared with the duration of normal feeding and speaking events, and especially with the pharyngeal transit time for a liquid bolus. The ‘rate of data acquisition problem’ can be ameliorated for short speech productions: The subject is asked to repeat the utterance several times, and images are obtained with the use of a timed trigger at various stages of the utterance (gated data acquisition). The data are then pooled to reconstruct the tongue shape for that utterance. The best example of this is reported by Stone et al. (2001a), whose single subject was asked to repeat each of 6 consonant-vowel (C-V) combination syllables 96 times in succession to allow for 32 repetitions for each of three MRI slices. This subject’s heroic effort did provide the basis for an evaluation of the method for the delineation of tongue surface shapes. However, the authors report that the study clearly demonstrated the potential problems with this method in any clinical context. The number of repetitions needed per slice continues to decrease with improved MRI methods. Stone et al. (2001b) report data using 13 and 4 repetitions for each slice. However, the biggest single problem remains the mandatory supine position.
Gilbert et al.(1998) used echoplanar MRI to examine tongue behavior (lingual tissue deformation) in swallowing by supine subjects who took 5 mL of water into their mouths through a plastic tube, swallowing the whole volume on command. Their results confirmed what is known from previous VFG and US studies (which had subjects seated upright). The MRI study, however, did produce time-varying geometric maps of the subsurface lingual tissue. Dry (saliva) swallows were examined by Napadow et al.(1999) to obtain data on the intra-lingual deformation of the tongue using eight normal human subjects. They developed a model for intra-lingual strain during these swallows. However, the somewhat ‘global’ areas for strain in their figures appear to have little correlation with the known anatomy and intrinsic structure of the human tongue. Rather, they provide the basis for a novel approach to testing the ‘muscular hydrostat’ model (Kier and Smith, 1985).
(8) Summary
VFG remains the ‘gold standard’ for the study of orofacial and pharyngeal behaviors in feeding. The authors are the only investigators to have used it for an extended speech passage (Hiiemae et al., 2002). That initial exploratory experiment could usefully be repeated with a design to allow for the ‘dissection’ of jaw movement in the context of the phonemes produced. For speech [and feeding] studies, VFG has the disadvantage that the 2D image ‘collapses’ valuable 3D data. It is clear that the other methodologies (US and MRI) can offer both the speech language community and the oral biology community methods by which the former, in particular, may be able to investigate appropriate and narrowly defined questions. For feeding studies, the current MRI data provide a demonstration of possibilities rather than any novel insights.
(V) Tongue Movements in Feeding and Speech
In addressing tongue ‘movement’, it is essential that one distinguish between gross change in the tongue’s center of mass (gross position in space) and local changes in surface shape regardless of that gross position. In feeding, the gross position and shape of the tongue relative to the palate change with jaw movement. In speaking, changes in tongue shape occur with relatively little jaw movement and so little change in the gross spatial position of the tongue. The jaw and tongue, therefore, have more independence during speech. While much is known about movement direction and associated tongue shape in both feeding and speaking, tongue kinematic data are limited. Westbury et al.(2000) review x-ray microBeam (XRMB) data and discuss the issues in kinematic event pattern analysis. Peng et al.(2000), using a ‘cushion-scanning’ US technique for echocardiography, obtained the speed of tongue surface movement in the five stages of the liquid swallow they recognized: Mean values for all phases were 10.34 mm/sec, SD 2.10, with a range of 2.10 (minimum) to 32.43 (maximum), N = 165. They report the calculated single fastest speed as 305.67 mm/sec in the first phase of transport. It is not clear whether Peng et al. used a standard set of spatial (rather than temporal) references to obtain their speed data. Using XRMB archive records, Tasko et al.(2002) obtained maximum speeds of 200 mm/sec for the trajectory of the tongue pellets during swallowing. These two datasets, given the differences in the methods used, are not inconsistent. Clearly, more work on tongue kinematics is urgently needed.
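Peak and mean speeds of the kind quoted above can both be derived from frame-to-frame displacement of a tracked point. The following is a minimal sketch, assuming two-dimensional positions in millimeters sampled at a fixed frame rate. The function names and the sample track are hypothetical and do not reproduce the procedures of Peng et al. or Tasko et al.

```python
import math


def marker_speeds(positions_mm, frame_rate_hz):
    """Speed (mm/sec) between successive positions sampled at a fixed rate."""
    if frame_rate_hz <= 0:
        raise ValueError("frame rate must be positive")
    return [
        math.dist(p, q) * frame_rate_hz
        for p, q in zip(positions_mm, positions_mm[1:])
    ]


def peak_and_mean_speed(positions_mm, frame_rate_hz):
    """Return (peak, mean) speed for one marker track."""
    speeds = marker_speeds(positions_mm, frame_rate_hz)
    if not speeds:
        raise ValueError("need at least two positions")
    return max(speeds), sum(speeds) / len(speeds)


# Hypothetical track: a 5-mm displacement in one frame, then no movement
track = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0)]
peak, mean = peak_and_mean_speed(track, frame_rate_hz=30)
```

Because the peak comes from a single inter-frame interval while the mean is taken over the whole track, the two statistics can differ widely for the same movement. That is one reason why datasets reporting different summary values need not be inconsistent.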
When the spatial domains used by the jaw, hyoid, and tongue markers in feeding and speech are compared (Fig. 4), there are clear differences. All markers show larger ranges of motion in feeding than in speech, at least in the sagittal plane. Tongue-palate contact is also less in speech. Our lateral projection images may mask movements of the lateral margins of the tongue, because of tooth radiopacity. These lateral tongue-palate contacts are important for certain phonemes (Stone and Lundberg, 1996). Gibbs and Messerman (1972) assert that the amplitude of jaw movement in speech is much smaller than in feeding, and this is confirmed by our data. When the centroid positions of the jaw in feeding and speech were compared, the difference was quite small (average, 1.2 mm). The centroid positions of anterior and posterior tongue markers also differed by only 1.1 and 0.8 mm, respectively. For the hyoid bone, however, the centroid position for speech was 10.2 mm antero-inferior to that for feeding (see Fig. 4). Analysis of the data presented in Hiiemae et al.(2002) shows that jaw and tongue marker movements in speech occur within the sagittal domains used for feeding, but that hyoid domains are significantly different. The data shown in Fig. 4 ‘collapse’ temporal data from long sequences (more than 30 sec) to give the spatial domains (centroids). Centroid analysis is limited in that it omits consideration of the time domain. Future studies (planned and under way) will address the dynamics of the system.
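The centroid comparison described above reduces each marker's spatial domain to a single mean position. A minimal sketch of that calculation follows, assuming sagittal-plane coordinates in millimeters. The function names and the sample coordinates are hypothetical and are not taken from the data in Fig. 4.

```python
import math


def centroid(points):
    """Mean (x, y) position of a set of sagittal-plane marker positions."""
    if not points:
        raise ValueError("at least one point is required")
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def centroid_offset(domain_a, domain_b):
    """Distance (mm) between the centroids of two spatial domains."""
    return math.dist(centroid(domain_a), centroid(domain_b))


# Hypothetical marker positions for two behaviors
feeding = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
speech = [(4.0, 5.0), (4.0, 5.0)]
offset = centroid_offset(feeding, speech)
```

As the text notes, this measure discards the order in which positions were visited. Two recordings with very different dynamics can share the same centroid.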
The Process Model of Feeding (Palmer et al., 1997; Hiiemae and Palmer, 1999) describes four main sequential stages: Stage I transport, in which ingested food is moved from the incisal area to the post-canine teeth (premolars, molars) for processing; Processing, in which the food is reduced; Stage II transport, in which triturated food is moved through the fauces for bolus formation; and, last, bolus formation and deglutition. Specific jaw and tongue movements are associated with each stage.
(1) Stage I transport
Our experiments use pre-cut, standard weights/volumes of the test foods. Subjects ‘deposit’ the food onto their tongues by using their fingers or by pulling the food off a cocktail stick with their anterior teeth. At the time of ingestion, the jaws are maximally open and the lips apart. As soon as the food is deposited, the ‘bite’ is cradled on the anterior-middle tongue surface, and the posterior oral tongue is ‘heaped’. The tongue surface is rapidly depressed to the level of the mandibular occlusal plane as the hyoid and tongue body are pulled sharply backward and somewhat downward. This hyoid movement has two results: First, the oropharynx is almost closed (at least in lateral projection); and second, the bite is carried bodily backward on the retracting tongue. As the jaws start closing, the tongue starts to rise. The bite, pulled back to the level of the last molars, is carried forward and upward toward the first upper molars as the jaws approach minimum gape. There is usually a further lower-amplitude open-close movement before the bite is finally positioned on the mandibular occlusal plane of the presumptive active side by a twisting tongue movement (Fig. 5). We are describing the retraction of the tongue-hyoid-jaw complex in this behavior as ‘pull-back’.
(2) Processing
Tongue movements occur in both the sagittal and coronal planes. In the sagittal plane, the hyoid (and with it, the tongue surface) ‘cycles’ (Palmer et al., 1997). As shown in Fig. 7, the anterior tongue marker orbits so that it moves from a downward position at maximum gape, upward and backward as the mandible moves up in the closing stroke. The tongue marker reaches its most backward position during closing, continuing to rise to reach its most palatal position just after the teeth reach occlusion. In the macaque, this upward movement was suspended for a few moments as the teeth reached occlusion at the end of the power stroke (Hiiemae et al., 1995). [Informally, we hypothesized that this pause explained the rarity of tongue biting during feeding!] During the intercuspal phase and as the jaws start to open, the tongue continues to cycle forward and then downward. As shown in Fig. 7, the orbit of the tongue surface cycle rises as processing proceeds, bringing the tongue surface progressively closer to the palate, culminating in palatal contact in the swallow. This cycling has the effect of moving chewed food progressively anteriorly. Intermittently, the tongue tip is elevated and used to collect this food from the anterior surface of the hard palate; as the jaws begin to separate, that bolus is then returned to the molar region, often by the ‘pull-back’ mechanism, as the jaw reaches the following maximum gape. Sagittal tongue cycling is found in all mammals studied with VFG. The amplitude of the vertical component is greatest in man, but the pattern is common across all mammalian groups studied (Hiiemae and Palmer, 2001).
Tongue movements in the coronal plane are important in processing. The tongue can twist about its antero-posterior axis to turn its gustatory surface toward one or the other post-canine tooth row (see Fig. 5). At the end of Stage I, bites of meat or cookie are essentially ‘tossed’ onto the occlusal surface of the mandibular post-canines. As processing proceeds, the tongue continues to maintain inadequately triturated food on the occlusal table, repositioning it during late opening and early closing. The interactive relationship between tongue and cheeks had not been documented until recently. Mioche et al.(2002) showed that as the tongue pressures food laterally to maintain it on the occlusal table, it also pushes it progressively into the cheek. About every 3 tongue-jaw cycles, the buccinator (the muscle of the cheek) contracts, pushing the food back toward the midline. Food is moved across the midline by a reverse longitudinal rotation, carrying material to the erstwhile balancing side. This reverse tongue rotation (away from the teeth on the active side) occurs during jaw opening (Mioche et al., 2002).
The relatively tight linkage between jaw-hyoid and tongue movements seen in processing often loosens after the first swallow in the sequence. The amplitude of jaw movement decreases and becomes irregular. At the same time, the tongue twists and turns. This period of clearance is used for the tongue to clear fragments of food from the vestibules of the cheeks and the floor of the mouth. Often one or more boli are formed during clearance, or a second processing sequence, and are then swallowed. Multiple swallows are normal in feeding sequences, particularly with harder or fibrous foods. Each sequence ends with a terminal swallow.
(3) Stage II transport
Stage II transport is defined as the movement of material through the pillars of the fauces or the ‘posterior oral seal’ (Dua et al., 1997). This movement marks the start of the liquid swallow and the beginning of bolus formation in the oropharynx (Hiiemae and Palmer, 1999). The mechanism is simple: The tongue rises with the tip and anterior surface, coming into contact with the anterior hard palate. This contact then spreads posteriorly, ‘squeezing’ the food distally behind the contact (as in finger compression on a toothpaste tube). Note that the tongue itself does not move backward; rather, points on the tongue sequentially move upward to come into contact with the palate. This mechanism is called ‘squeeze back’ and was first described in the opossum as ‘squeeze-wedge’ (see Hiiemae and Crompton, 1985). There is one very important difference between Stage II in non-human mammals and in man. In the former, the incipient bolus passes through the fauces during the late (fast) opening and early (fast) closing phases of the jaw movement cycle, whereas in man it occurs in early opening. This subtle difference may affect interpretations of neurophysiological data on swallowing control mechanisms.
(4) Bolus formation and deglutition
The liquid swallow is the most intensively studied feeding behavior. The typical paradigm calls for a subject to hold a bolus of liquid in the mouth and swallow on command (Dodds et al., 1990). In this context, most subjects will form a bolus between the surface of the tongue and the palate, but some will hold the liquid at the floor of the mouth (respectively, the so-called ‘tipper’- and ‘dipper’-type swallows). In the ‘swallow-ready’ position, the tongue perimeter forms a seal around the bolus anteriorly and laterally on each side. A posterior seal formed between the surface of the tongue and the palate at the junction of the hard and soft palates prevents premature passage of liquid into the pharynx. The tongue accommodates larger boluses by forming a deeper cavity (Kahrilas et al., 1993). When the command to swallow is given, the anterior area of tongue-palate contact expands posteriorly, squeezing the bolus toward the pharynx, and the back of the tongue drops, eliminating the posterior oral seal. These motions comprise the oral stage of swallowing. Note that the tongue motion is nearly identical to the ‘squeeze-back’ mechanism of Stage II transport (Palmer et al., 1992).
Once the bolus passes into the pharynx, the pharyngeal stage of swallowing is immediately initiated (Dodds et al., 1990). The larynx folds shut, and the velopharyngeal isthmus closes. The pharyngeal surface of the tongue pushes posteriorly (so-called tongue base retraction), making contact with the contracting pharyngeal walls. This action pushes the bolus through the pharynx and the upper esophageal sphincter, which opens actively at the onset of the pharyngeal stage. Bolus propulsion is assisted by elevation of the pharynx and larynx as well as by sequential (cephalo-caudal) contraction of the pharyngeal constrictor muscles. Bolus propulsion by the tongue is most effective with large bolus volumes, but the pharyngeal constrictors have a larger role for small volumes (Kahrilas et al., 1993).
The swallowing of semi-solid and chewed solid foods is quite different (Palmer et al., 1992; Palmer, 1998; Hiiemae and Palmer, 1999). As discussed above, triturated food is pushed/propelled into the pharynx by the tongue during Stage II transport cycles. A bolus accumulates in the oropharynx during multiple transport cycles (oropharyngeal aggregation time, which may last up to about 10 or 12 sec in healthy individuals). When the swallow is finally triggered, the pattern is very much like that described for liquids: The tongue surface sweeps remaining food from the oral cavity into the pharynx (squeeze-back), and the pharyngeal surface of the tongue pushes backward to propel food through the pharynx (tongue base retraction).
Chi-Fishman and Sonies (2000) studied rapid sequential swallowing of liquid. They report drink and swallow cycles with repeated sequences of tongue propulsion. Some of their subjects merged two successive boluses in the hypopharynx before the onset of a pharyngeal response, while holding the larynx closed continuously to prevent aspiration. These sequential swallows of liquid resembled swallows of triturated solid food, in that the bolus was formed in the pharynx before swallow onset.
(5) Tongue shapes in feeding
The drawings included in Fig. 5 illustrate the appearance of the tongue at various stages in feeding (Abd-el-Malek, 1955). The depression of the anterior surface and the heaped posterior surface of the tongue we have recorded in Stage I transport are shown. Similarly, the lower pair of drawings shows the twisting movement used to place and then maintain food on the occlusal plane. These shapes represent the changes in gross tongue-surface morphology seen in the lateral and antero-posterior VFG recordings. They also show two important features of tongue movement in feeding: First, the lateral margins of the tongue can move independently of the mid-body; second, the anterior and middle segments can move independently to produce anterior hollowing synchronously with posterior ‘heaping’. What Abd-el-Malek was unable to do was measure dimensional changes within the tongue. Expansion and contraction of the tongue surface, measured by changes in the relative positions of tongue markers in protrusion and retraction, have been reported in the macaque (Hiiemae et al.,1995). There is agreement in the human literature (see below) that such differential segmental behavior occurs in man (Stone, personal communication).
(VI) Tongue Movements in Speaking
In direct contrast to the paucity of sources/descriptions of tongue shapes in feeding, there is a voluminous and increasingly technically sophisticated literature on the shapes adopted by the tongue in the production of vowels and consonant-vowel (C-V) combinations. The converse is also true: We have not been able to find any reports of gross tongue movements in speaking which relate to hyoid position. These reciprocal data deficits reflect the different foci of the feeding and speaking studies referred to above.
Hiiemae et al.(2002) did not specifically address the movements of the tongue surface in speaking, but the relatively compact spatial domains in Fig. 4B show movement within a more restricted space than for feeding (Fig. 4A). Since we can find no evidence of significant medio-lateral hyoid movement in man or, for that matter, in other mammals (Anapol, 1988), it is probably reasonable to assume that the ‘speaking domains’ in Fig. 4B represent the actual sagittal range of tongue and hyoid markers, and of jaw movement. It must be noted that: (a) the teeth do not come into full occlusion during the reading of the ‘Grandfather Passage’, as evidenced by the spatial domains for the jaw marker; and (b) the anterior tongue marker makes almost no palatal contact except at the anterior margin of its range of movement.
The absence of published descriptions of global tongue movement in speaking is explained when the wide anatomical range of positions of the oral articulators in speech is considered (see Fig. 5-1 in Daniloff, 1973). Point-tracking techniques (EMA, XRMB) provide data on movements of the jaws, lips, and tongue. We are looking at a functional complex where the events of interest are both transitory and localized within the larger oral cavity and the oropharynx. It is therefore not at all surprising that the focus has been on tongue shape in phoneme production rather than on synchronous tongue, hyoid, and jaw movement patterns.
However, although Perkell (1969) did not focus on the tongue movements in speaking that made possible the articulator interaction he was analyzing, some of his figures (especially his 3.2-3.4 and 3.15) imply a movement trajectory. His Fig. 3.15 is particularly interesting, since it shows an orbital movement of a posterior tongue marker when the subject uttered the /hák¤/ sounds. A subsequent paper (Perkell et al., 1992) examined the velocity and acceleration of the lips in persons uttering a range of consonants and vowels in combination.
(VII) Tongue Shapes in Speaking/Vocalization
It is beyond the scope of this review, and especially the authors’ limited expertise in the subtleties of speech production, to attempt to address possible mechanisms whereby one phonetic tongue configuration changes into another during speech—i.e., with co-articulation of V-C or C-V combinations. The difficulties in determining the mechanisms of tongue-shape change are implicitly discussed in Stone et al. (1988 et seq.). She makes a particularly important point (1990), namely, that the cross-sectional profile of the ‘upper tongue’ (its upper surface) is important, since it can vary from anterior to posterior, and that ultrasound or other cross-sectional images will not accurately reflect tongue shape unless obtained seriatim from as far forward to as far back as possible.
Rather, we are focusing on those shapes given the hypothesis in the Introduction which suggests that tongue shapes in speech are consistent with, if not derived from, those seen in feeding. Using a novel 3-D ultrasound machine coupled with EPG, Stone and Lundberg (1996) were able to reconstruct the tongue surface in three dimensions when their subject was sounding 12 vowels and 6 consonants in American English. The EPG data complemented the ultrasound images by recording the extent and placement of tongue-palate contacts. This was important because analysis of the data showed the lateral margin contacts between the tongue and the palatal gingiva of the maxillary dental arcade. As Stone reported (1990), some tongue positions are stabilized by palatal contacts. After reconstructing the tongue-surface shapes for all 18 sounds, Stone and Lundberg concluded that the shapes could be grouped into four categories, one of which (‘two-point displacement’) was seen only with the consonant ‘ell’; the other three were seen with both vowels and consonants. In their first category, ‘front-raising’, the anterior and middle tongue are raised, with the formation of a midline groove posteriorly extending into the oropharyngeal surface (Fig. 6). A complete midline groove with elevated lateral tongue margins characterizes their second category. This shape was associated with ‘low vowels’. Shape three is described as ‘back-raising’ and is essentially the reciprocal of the first: The posterior and middle tongue are elevated, often with the appearance of an anterior groove or dimple. In ‘two-point displacement’, the tongue has an elevated anterior and posterior segment with a small central groove in the middle segment. Stone and Lundberg (1996) make a convincing case for the ‘muscular hydrostat’ approach to tongue shape, pointing out that the local expansions and contractions reflect the redistribution of tongue substance to form the shapes they identify.
Are there similarities between tongue shapes in feeding and those in sound production? Clearly, there are (compare Figs. 5 and 6). Combinations of ‘heaping’ and ‘hollowing’ occur in both behaviors. Interestingly, Stone and Lundberg’s front-raising shape is highly reminiscent of the tongue shape in ‘squeeze-back’, where the front of the tongue is raised and the posterior and oropharyngeal surface is grooved. The only tongue shape/movement not seen in speech is the twisting movement used to control food position and to retrieve food fragments from the vestibules and floor of the mouth in clearance (Fig. 5). However, the deformations of the tongue surface used in speech are more complicated than those in swallowing; it is the movements in food processing (including clearance) that show the full range of possible tongue shapes.
(VIII) The Hyolingual Musculature
Jaw motion is intricately related to hyoid and tongue motion. Jaw closing depends on the adductors (temporalis, masseter, and pterygoids). These muscles and their roles in positioning the jaw in humans are reviewed by Miller (1991), and in a variety of non-human mammals by Herring (1994) and Langenbach and van Eijden (2001). The activity in the muscles associated with functional behaviors is usually recorded electromyographically (EMG), and the data are used to interpret patterns of activity producing complex movement events [see Crompton et al.(1977), for an example of EMG with CFG recorded jaw and hyoid movement]. That study used surgically inserted fine-wire electrodes to obtain the EMGs of most of the hyolingual muscles in the opossum and correlated their activity with jaw and hyoid movements. Establishing an equivalent database for humans is a wholly different proposition. Limited numbers of fine-wire electrodes (Basmajian and Stecko, 1962) can be used. Even in the opossum study, no attempt was made to record from within the body of the tongue, since it would be impossible to determine electrode position relative to the intrinsic muscles. The conventional descriptions of extrinsic and intrinsic tongue muscles make the assumption that the ‘extrinsic’ muscles move the tongue in ‘orofacial space’ while the intrinsic muscles change its surface shape. That approach is no longer tenable. Stone and Lundberg (1996) argue that (a) the genioglossus is a midline muscle which can influence tongue shape, e.g., create midline grooving (and protract the tongue body); and that (b) the hyoglossus (HG), palatoglossus (PG), and styloglossus (SG) act to lower or raise the lateral margins of the tongue (and retract the tongue body). This work suggests that the distinction between ‘extrinsic’ and ‘intrinsic’ muscles may be artificial, a position held by many working with the tongues of non-human mammals and implicit in Takemoto (2001).
Recently, the functional anatomy of the hyolingual musculature, especially the intrinsic musculature of the tongue, has attracted substantial attention (DePaul and Abbs, 1996; Sutlive et al., 1999, 2000; Sokoloff, 2000), as has the interleaving of intrinsic and extrinsic muscle fibers (Takemoto, 2001). DePaul and Abbs (1996) examined the intrinsic tongue muscles of macaques. They performed a very thorough anatomical/histochemical analysis of the muscle fibers using carefully prepared blocks of tongue muscle tissue and sampling all regions from the tip to the oropharyngeal surface. The most important of their findings is the observation that the fiber type population changes within the tongue from tip to most posterior and from upper surface (superior longitudinalis, SL) to inferior (inferior longitudinalis, IL). These authors argue that their results indicate a functional intramuscular segregation within the tongue. Takemoto (2001) reviews the previous literature and, working on human tongues, has developed a model for the intrinsic structure of the tongue based on dissection and histology. He identifies 5 layers (Fig. 8). The bulk of the muscles in the tongue body, i.e., fibers of the transversus (T), genioglossus (GG), and verticalis (V), are identified as the core. Regions of the hyoglossus, styloglossus, and palatoglossus are identified as ‘fringes’ external to the main tongue body but with their fibers extending into it. This study is a major achievement. Takemoto justified the effort because “an understanding of the complex organization of the human tongue musculature is a critical requirement for modeling the speech production mechanism.”
A painstaking examination of the neuroanatomy of the tongue in dogs identified neuromuscular compartments within the intrinsic musculature. Mu and Sanders (1999) describe the superior longitudinal muscles as having an average of 40 distinct fascicles spanning the length of the tongue, with each fascicle supplied by a nerve branch from the hypoglossal nerve (XII). The inferior longitudinal was similarly organized. Each transverse and vertical muscle had over 140 separate sheets, with every sheet innervated by a separate terminal nerve from XII. The thin layers of transversus and verticalis were oriented in a precisely alternating sequence mutually at right angles.
The concept of functional segmentation within the tongue was proposed by Stone (1990) and advanced by others (e.g., Mao et al., 1992; Mu and Sanders, 1999). Stone (1991) argues that the tongue can be divided into 5 functional segments in both the coronal and the antero-posterior planes, giving a total of 25 segments. It is important to realize that analysis of current data supports the view that the medio-lateral segments are probably ‘tied’ into the antero-posterior segments such that elevation of the lateral tongue margins will be associated with midline tongue-surface depression, and that elevation of the front of the tongue has to be compensated for by some change in the configuration of its posterior parts. These segments may represent ‘compartments’, but it seems likely that there are smaller volumes within such segments which produce finely graded movements. If the segments were formed from ‘zones’ including the intrinsic muscles, as well as, perhaps, parts of the extrinsic muscles, then such an arrangement could explain the rapidly altering morphology of the tongue surface in speech (Stone, 1990; Stone and Lundberg, 1996) as well as the patterns of intrinsic expansion and contraction seen in feeding in the mid-tongue of macaques (Hiiemae et al., 1995) and the undulation described in the rabbit tongue by Cortopassi and Muhl (1990). Unfortunately, modeling activity in the hypoglossal nerve (XII) and brainstem centers to address this issue will be difficult, given the complexity of the intra-lingual distribution of the motor nerve.
(IX) Modeling the Tongue
Kier and Smith (1985) argued that the mammalian tongue is a muscular hydrostat. Although this idea remains hypothetical, there is widespread agreement in the literature that the human tongue meets that definition (see above). Muscular hydrostats have the following properties: (a) they are incompressible, since they have high water content; (b) they have essentially constant volume, so that change in any one dimension has to be compensated for by change in others; and (c) such changes can occur within regions of the organ, i.e., shape changes can be localized. Efforts to develop a ‘testable’ model of the tongue have been driven by the speech/language community. We are not aware of any such effort among oral biologists. The reasons may be both historical and clinical. Oral biology research now focuses on the CNS control of masticatory behaviors and the activity of the jaw muscles, with very little attention to the tongue per se (see Lund, 1991) or its intra-oral behavior (Sawczuk and Mosier, 2001). The clinical focus has been on the swallowing of liquids. It took the first study on the processing of solid foods (Palmer et al., 1992) to draw attention to the fact that the liquid swallow paradigm cannot be extrapolated to the human intra-oral management and deglutition of foods that do not flow and have to be chewed. Our findings in the study of hyoid and tongue-surface movements in speaking and feeding (Hiiemae et al., 2002) are consistent with the muscular hydrostat model. Although tongue-surface motion is correlated with that of the hyoid and jaw, there is also a high degree of independence, particularly during speaking (when jaw motion is relatively small in amplitude). On the other hand, our studies of food transport mechanisms show limitations of the model. The ‘pull-back’ mechanism of Stage I transport is an excellent example: It is accomplished by posterior motion of the entire tongue, including the hyoid bone. Since the food is sitting on the tongue surface, it is pulled from the front of the mouth to the molar region. This functionally important change in position of the tongue surface is accomplished with no significant change in tongue shape, only a shift in tongue position. Thus, the muscular hydrostat model, while it has obvious importance for the control of tongue shape, does not provide a complete picture of tongue motion.
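The constant-volume property of a muscular hydrostat, property (b) above, can be illustrated with a deliberately idealized sketch. It treats a region of tissue as a rectangular block whose volume is fixed, so that shortening in one dimension forces expansion in another. The block geometry, the function name, and the dimensions are assumptions made for illustration only and are not an anatomical model of the tongue.

```python
def compensating_height(length, width, height, new_length, new_width=None):
    """Height a constant-volume block must adopt after its length changes.

    Assumes an idealized rectangular block; if new_width is omitted,
    the width is held fixed and all compensation occurs in height.
    """
    if min(length, width, height, new_length) <= 0:
        raise ValueError("dimensions must be positive")
    if new_width is None:
        new_width = width
    if new_width <= 0:
        raise ValueError("dimensions must be positive")
    volume = length * width * height  # conserved quantity
    return volume / (new_length * new_width)


# Hypothetical block: 10 x 4 x 2 units shortened to a length of 8 units
new_height = compensating_height(10.0, 4.0, 2.0, new_length=8.0)
```

In this example a 20% shortening with fixed width requires a 25% increase in height. A rigid translation of the whole block, which is the closest analogue of ‘pull-back’, changes none of the three dimensions, and so lies outside what a constant-volume argument can describe.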
Modeling of the orofacial complex in feeding has approached the system functionally: i.e., what are the stages in the process and what appears to regulate progression through them? In contrast, models of the orofacial-oropharyngeal complex in speech have been directed at the changes in tongue shape required to produce vowels and consonant-vowel (C-V) combinations. It must be said that the tongue shapes shown in Fig. 6 are not strictly models; rather, they are data-derived ‘images’ based on a two- to three-dimensional image conversion by means of specially developed software, and so accurately reflect the shape of the tongue surface under the prescribed conditions (Stone and Lundberg, 1996).
One goal of those modeling the tongue, sensu stricto, is to develop a database of tongue behavior and underlying structure sufficiently detailed to allow for the development of a 3D computer model which can be manipulated to ‘produce sounds’ and, we have to assume, realistic electronic ‘speech’. As engineering and computer technology have developed, so have the approaches available to ‘tongue modelers’.
Another goal is the prediction of tongue-surface behavior from muscle contraction. Based on an earlier Finite Element Method (FEM) model (Kiritani et al., 1977), Kakita et al. (1985) ‘mapped’ patterns of known EMG activity in the muscles (using data from Alfonso et al., 1982) onto their FEM model to predict the patterns producing formant patterns in vowel space. Their static model used 86 tetrahedra to represent half the tongue body, assuming symmetry about the mid-sagittal plane. The tetrahedra were grouped into 30 functional units, giving 33 functional node points on which the effects of muscle contraction could be modeled. Importantly, they extended the model to factor in the extrinsic lingual muscles and, albeit in a preliminary fashion, the supra- and infrahyoid muscles. Based on reports of differential EMG behavior within the genioglossus (GG), depending on the vowel sounded, they divided that muscle into three parts: anterior, middle, and posterior (GGA, GGM, and GGP). The styloglossus (SG) was also divided into three elements based on the results of their modeling efforts and the anatomy of the muscle fibers [the authors cite Miyawaki (1974) as the source of the anatomical data they used]. Component 1 (their terminology) acts on the lateral surface of the tongue body, pulling it back toward the styloid process; Component 2 pulls the tip of the anterior part of the tongue backward and slightly downward; Component 3 pulls the middle tongue body upward. [This ‘bunching’ action of Component 3 was described by Ladefoged et al. (1978).] Hyoglossus, palatoglossus, and pharyngeal constrictors were included in the model. Its parameters were then compared with measured EMG activity. The authors obtained a good correspondence between the model and the EMG data for the extrinsic muscles. However, the model simulation indicated that other muscles, particularly the pharyngeal constrictors, had to contribute to the shape of the oropharyngeal airspace for the production of some vowels.
Perkell (1994) developed a model using a mosaic of 14 quadrilateral areas bounded by a network of tension-generating elements representing the tongue musculature and connective tissue. He varied lingual shape by specifying the degree of excitation in the intrinsic and extrinsic muscles. The beginnings of a very sophisticated computational and biomechanical engineering approach to the modeling of the tongue, again for speech production, are reported by Wilhelms-Tricarico (1995). The author states that this was the first stage in an effort to create a physiological model of speech production factoring in all the elements involved in vocalization. The Wilhelms-Tricarico paper addresses the problem of the soft tissues, particularly the tongue and lips. The approach “provides a foundation for applying finite element methods to simulate these structures in a biomechanical model of speech production”. The method is based on the anatomy of the muscles, specified as ‘fields’ within which the direction of active and passive tensile stress is modeled. The elastic elements of the passive components are modeled with a strain energy function, and the viscous stress components with a linear viscosity. The incompressibility of the tissues was also considered. The paper is technically very complex, but its conclusion is not: the simulation tests, with 8 muscles used, demonstrate the validity of the method and support the feasibility of developing a physiologically based model of speech production.
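The incompressibility condition mentioned above can be illustrated on a tetrahedral mesh of the kind used in FEM tongue models. The following is a minimal sketch, not the formulation of Kakita et al. (1985) or Wilhelms-Tricarico (1995): it only computes the total volume of a set of tetrahedra before and after a deformation and reports the relative change, which an incompressible tissue model would be expected to keep near zero. The function names and the single-element example mesh are hypothetical.

```python
def tet_volume(p0, p1, p2, p3):
    """Volume of one tetrahedron from four (x, y, z) node coordinates."""
    a = [p1[i] - p0[i] for i in range(3)]
    b = [p2[i] - p0[i] for i in range(3)]
    c = [p3[i] - p0[i] for i in range(3)]
    # Scalar triple product a . (b x c); its magnitude is six times the volume.
    triple = (a[0] * (b[1] * c[2] - b[2] * c[1])
              - a[1] * (b[0] * c[2] - b[2] * c[0])
              + a[2] * (b[0] * c[1] - b[1] * c[0]))
    return abs(triple) / 6.0


def mesh_volume(nodes, tets):
    """Total volume; `tets` holds 4-tuples of indices into `nodes`."""
    return sum(tet_volume(*(nodes[i] for i in tet)) for tet in tets)


def relative_volume_change(rest_nodes, deformed_nodes, tets):
    """(V_deformed - V_rest) / V_rest for the same mesh connectivity."""
    v_rest = mesh_volume(rest_nodes, tets)
    v_deformed = mesh_volume(deformed_nodes, tets)
    return (v_deformed - v_rest) / v_rest


# Hypothetical one-element mesh: lengthened along x, narrowed along y.
rest = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
deformed = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.0, 1.0)]
tets = [(0, 1, 2, 3)]
print(relative_volume_change(rest, deformed, tets))
```

In this example the element doubles in length and halves in width, so its volume is unchanged. This is the kind of shape change without volume change that an incompressible model must reproduce.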
Despite this promising start, we still do not have a satisfactory ‘testable’ model of the human tongue and its movements in vowel and consonant production. Great efforts are being made to build a biomechanical/computational model that explains the movements and morphology involved in phoneme production (e.g., Kakita et al., 1985; Wilhelms-Tricarico, 1995; Sanguineti et al., 1997), but the complexity of the biomechanics and dynamics is enormous.
Another model (Kaburagi and Honda, 2001) has been proposed, based on ‘articulatory postures’ but also dependent on complex mathematical modeling of phonemic tasks. The authors define phonemic postures using invariant features of articulatory posture. Statistically derived measures of articulatory movements with least variability are taken as ‘invariant features’. They found that typical examples of low variability occurred in articulatory movements involving vocal tract constrictions or relative movement among articulators, reflecting task-sharing structures. Articulatory movements were partly constrained by the sequence of phonemic tasks but were determined to satisfy both the constraints of specific phonemic tasks and the requirement for smoothness in the model. The authors tested the model as a predictor of actual articulatory movements using empirical data. They conclude that their simulation shows that the ‘task representation method’ they propose has major advantages: (i) the phonemic invariant feature is defined statistically, and (ii) the prediction accuracy of the dynamic model can be improved by use of the invariant feature task representation. They suggest that their model could serve as a basic framework for acoustic evaluation of the dynamic articulatory model in the context of speech synthesis.
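The idea of taking the least variable articulatory measure as the ‘invariant feature’ of a phoneme can be sketched in a few lines. This is an illustration of the general principle only, not the statistical procedure of Kaburagi and Honda (2001). The feature names and the numbers are hypothetical, invented purely for the example, and all measures are assumed to share the same unit so that their variances can be compared directly.

```python
from statistics import pvariance


def least_variable_feature(tokens):
    """Return (name, variances) for the feature that varies least.

    `tokens` is a list of dicts, one per repetition of a phoneme,
    each mapping a feature name to its measured value.
    """
    names = list(tokens[0].keys())
    variances = {n: pvariance([t[n] for t in tokens]) for n in names}
    best = min(variances, key=variances.get)
    return best, variances


# Hypothetical repetitions of an alveolar stop (all values in mm).
tokens = [
    {"tip_to_palate": 0.1, "jaw_height": 3.0, "dorsum_height": 10.0},
    {"tip_to_palate": 0.2, "jaw_height": 5.0, "dorsum_height": 12.0},
    {"tip_to_palate": 0.1, "jaw_height": 4.0, "dorsum_height": 14.0},
]
name, variances = least_variable_feature(tokens)
print(name)
```

With these made-up values the constriction measure (tongue tip to palate) varies least across repetitions, which matches the qualitative finding reported above that low variability occurs at vocal tract constrictions.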
Approaching the predictive modeling problem from a somewhat different perspective (a control model for speech), Sanguineti et al. (1997) used FEM to develop a biomechanical model in the framework of the equilibrium point hypothesis (gamma model) of motor control. They applied the model to the ‘estimation’ of the “central control commands” issued to the muscles, given a dataset of sagittal digitized tracings of vocal tract shape recorded by low-intensity CFG during speech. The authors determined that “despite the great mobility of the tongue and the highly complex arrangement of the tongue muscles, its movements can be explained in terms of the activation of a small number of independent muscle groups, each corresponding to an ‘elementary’ or ‘primitive’ movement”. The authors argue that their results are consistent with the hypothesis that the tongue is controlled by a small number of independent articulators. They also report that they evaluated the effects of jaw and hyoid articulators, and that there may be substantial interaction between them and the tongue. They conclude that the central nervous system may not need a detailed representation of tongue mechanics but may rely on a small number of muscle synergies invariant over the whole space of tongue configurations.
In summary, mathematical/engineering models of the tongue are increasing in sophistication and are used to represent empirical tongue physiology as well as to predict changes in shape from muscle activity, tongue-surface geometry, or acoustic spectra. The most recent models use an FEM construct, factoring in the properties of the tissues and their anatomical arrangement. To date, the models have focused exclusively on tongue movement/shape change in the production of speech sounds.
(X) Directions for Future Research
The goal of this review was to correlate and evaluate current knowledge, and identify questions that need to be addressed. The oral biology of feeding is rapidly becoming better understood. We have good data on the basic processes of eating and drinking in young adult subjects imbibing liquids and size-controlled samples of a range of solid foods. While there are data (e.g., Kohyama et al., 2002) showing an increase in cycle and sequence duration in older subjects eating solids, we need a thorough examination of both the feeding and speaking process in older subjects. In addition, there is a series of unresolved questions:
(1) We hypothesize that the hyolingual complex is part of a kinetic chain that also involves the jaws. We have examined movement domains, but further studies are necessary to address higher-order kinematics, i.e., velocities and accelerations as well as cross-correlations among jaw, hyoid, and tongue motions, and segmental shortening and lengthening within the tongue. These studies will permit us to determine whether tongue motions in feeding are consistent with the models of tongue function developed in studies of speech.
(2) Since the cyclic movements of the hyoid are a function of a kinetic chain, EMG of all accessible muscles will allow the system to be modeled in vivo. The continuing questions about the contributions of specific muscles (e.g., the geniohyoid, genioglossus, and anterior belly of the digastric) to jaw opening and tongue protrusion should be resolved. Similarly, the frequency and circumstances in which jaw adductors and depressors are co-activated can be determined.
(3) Jaw movements in feeding are regulated by a Central Pattern Generator (CPG) whose output is modified by sensory input from the oropharyngeal complex (Dellow and Lund, 1971; Lund, 1991). The question still to be addressed is whether tongue cycling is regulated by the same CPG. Equally, the cyclicities associated with speech show attributes that could argue in favor of the hypothesis that the movements of speech are a subset of those used in feeding.
(4) Models of tongue motion need further study and elaboration. The muscular hydrostat model is of particular interest and importance because of its elegance and simplicity as well as its face validity. Future studies should test its underlying assumptions, such as the isovolumic, incompressible nature of the tongue. Future modeling studies should also incorporate the physical links of the tongue to the jaw and hyoid, which have been neglected in the past.
(5) Developmental studies should carefully evaluate the milestones in acquisition of both feeding and speech skills in young children. These studies cannot be performed with current radiological techniques because of the uncertain risks of radiation in children. Ultrasound and MRI technologies may be more suitable methods of inquiry, particularly if the speed of data acquisition in cine-MRI becomes sufficiently rapid.
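The higher-order kinematics called for in question (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not an analysis used in the studies reviewed here: marker positions are assumed to be sampled at a fixed interval `dt`, velocity and acceleration are estimated by finite differences, and the coupling between two markers (e.g., jaw and hyoid) is summarized by the lag at which their correlation is highest. The function names are hypothetical.

```python
import math


def derivative(x, dt):
    """Central-difference derivative, one-sided at the two ends."""
    n = len(x)
    d = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        d.append((x[hi] - x[lo]) / ((hi - lo) * dt))
    return d


def cross_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag]."""
    if lag >= 0:
        xs, ys = x[:len(x) - lag], y[lag:]
    else:
        xs, ys = x[-lag:], y[:len(y) + lag]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


def best_lag(x, y, max_lag):
    """Lag (in samples) at which y best follows x."""
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda k: cross_correlation(x, y, k))


# Synthetic cyclic signals: the second trails the first by 3 samples.
jaw = [math.sin(2 * math.pi * t / 20) for t in range(100)]
hyoid = [math.sin(2 * math.pi * (t - 3) / 20) for t in range(100)]
jaw_velocity = derivative(jaw, dt=1.0)
jaw_acceleration = derivative(jaw_velocity, dt=1.0)
print(best_lag(jaw, hyoid, max_lag=5))
```

A positive lag means the second marker trails the first. The signals here are synthetic sinusoids chosen only to make the lag easy to see; real marker records would need filtering before differentiation, since finite differences amplify digitizing noise.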
The functional linkages among jaw, hyoid, and tongue movements in feeding and speech. Movement of one element affects that of most others. For simplicity, the Fig. does not include the cheeks and lips, the former having an important role in food management in feeding, and the latter being important articulators in speech.
Diagrammatic sagittal sections of the oropharyngeal complex.
Time/position records for mandible and hyoid over 10 sec of normal feeding. Sagittal domain plots for jaw, hyoid, and tongue markers (anterior and posterior) for the same subject.
Tongue shapes in feeding as presented in Abd El Malek (1955). These drawings were based on still photographic images taken from a single subject.
Examples of tongue shapes developed in the sounding of vowels and consonants. These images are 3D reconstructions from ultrasound slices (see text). The upper left-hand image is a consonant, the upper right a ‘front vowel’. The lower left image shows the shape for another front vowel, and the lower right a ‘back vowel’. The reconstructions are reproduced with permission from Dr. Maureen Stone and were originally published in Stone and Lundberg (1996).
The movement of the anterior tongue marker (see Fig. 2A) relative to the palate during complete sequences of feeding on soft food (chicken spread, left) and hard food (cookie, right). The tongue surface cycles so that it is traveling upward and forward as the teeth come into full occlusion, then forward and downward in the last stages of the intercuspal phase (as the teeth come out of occlusion) and in the first part of opening. This Fig. shows the progressive palatal (upward) ‘migration’ of the tongue surface cycle as the feeding sequence proceeds. The pattern of that migration differs between hard and soft foods. From Palmer et al. (1997), with permission from Pergamon Press.
A model of the intrinsic structure of the tongue based on Takemoto (2001).
Acknowledgements
Many people—students, support staff, colleagues, and friends, as well as reviewers of our previous manuscripts and the editors of the journals to whom they have been submitted—have assisted us over the decade we have been studying the orofacial complex in feeding and, recently, speech. We owe a particular debt to Xuezhen Wu and Chune Yang for their painstaking efforts in creating the datafiles from the VFG tapes from which all our results flow. We are deeply indebted to Dr. Maureen Stone for her assistance in the speech/language literature and her willingness to give extensive advice during the preparation of this review.
Our research is supported in part by an award (R01 DC02123) from the USPHS National Institute on Deafness and Other Communication Disorders. KMH has also been supported by Syracuse University in the preparation of this review.
