Abstract
Objective
Hyoid bone movement is potentially related to aspiration risk in post-stroke dysphagia (PSD) patients but is difficult to assess quantitatively. This study aimed to measure the distance of hyoid bone movement more efficiently and accurately using a deep learning model and determine the clinical usefulness of the model in PSD patients.
Methods
This study included 85 patients with PSD within 6 months from onset. Patients were grouped into an aspiration group (n = 35) and a non-aspiration group (n = 50) according to the results of a videofluoroscopic swallowing study. Hyoid bone movement was tracked using a deep learning model constructed with the BiFPN-U-Net(T) architecture. The maximum distance of hyoid bone movement was measured horizontally (Hmax), vertically (Vmax), and diagonally (Dmax).
Results
Compared with the non-aspiration group, the aspiration group showed significant decreases in hyoid bone movement in all directions. The area under the curve of Vmax was highest at 0.715 with a sensitivity of 0.680 and specificity of 0.743. The Vmax cutoff value for predicting aspiration risk was 1.61 cm. The success of oral feeding at the time of discharge was significantly more frequent when hyoid movement was equal to or larger than the cutoff value although no significant relationship was found between hyoid movement and other clinical characteristics.
Conclusion
Hyoid bone movement of PSD patients can be measured quantitatively and efficiently using a deep learning model. Deep learning model-based analysis of hyoid bone movement seems to be useful for predicting aspiration risk and the possibility of resuming oral feeding.
Introduction
Dysphagia is defined as the difficulty or discomfort experienced during the progression of a bolus from the mouth to the stomach. 1 More than 40% of stroke patients experience dysphagia, and the prevalence of dysphagia is steadily increasing. 2 Post-stroke dysphagia (PSD) is one of the most common and serious complications of stroke and can lead to malnutrition, dehydration, aspiration pneumonia, or even sudden death from airway asphyxia. 3 Consequently, the long-term prognosis may worsen or the risk of mortality may increase because of delayed functional recovery, increased dependence, or prolonged hospitalization.4–6 The increased risks of morbidity and mortality associated with PSD are usually due to aspiration, which involves inhaling liquid or solid into the respiratory tract.7,8 Therefore, it is important to predict aspiration in PSD patients early and accurately in order to prevent severe outcomes.
The hyoid bone is a horseshoe-shaped anatomical structure that plays an important role in the swallowing process by elevating the hyolaryngeal complex. Inadequate elevation of the hyolaryngeal complex due to decreased or untimely movement of the hyoid bone can cause aspiration in patients with PSD.9,10 The hyoid bone must move forward at the correct time with appropriate force to allow the epiglottis to protect the airway, preventing penetration or aspiration. Proper movement of the hyoid bone is also responsible for relaxing the upper esophageal sphincter and generating force that allows food to move into the esophagus.9–12 Discoordination or paralysis of laryngeal muscles resulting from stroke may compromise hyoid bone movement.13,14 Therefore, the movement of the hyoid bone should be assessed correctly in patients with PSD, but methods for objective and quantitative measurement have not yet been established. 15
Hyoid bone movement is usually evaluated via a videofluoroscopic swallowing study (VFSS), which is recognized as a standard tool for evaluating dysphagia.16,17 X-ray fluoroscopic video images are taken while the patient swallows a bolus mixed with radio-opaque barium. VFSS is advantageous in that it allows direct visualization of anatomical structures such as the hyoid bone and pathological conditions, including aspiration and accumulation of residual food. However, visual tracking and assessing the movement of the hyoid bone in fast-moving VFSS images are challenging for human eyes. A dichotomous grading scale to categorize hyoid movement as normal or decreased or a ranking scale such as the Modified Barium Swallowing Impairment Profile is commonly used, but their usefulness is limited by low reliability and insufficient quantification.
To address these limitations, automatic tracking of the hyoid bone in VFSS images has been attempted.18–21 Investigators designed diverse software for computerized tracking in which the hyoid bone was tracked automatically after manual marking of the boundary of the hyoid bone and cervical vertebrae in the first few image frames. Although computerized tracking allowed more quantitative and objective analysis, human effort was still needed, and the possibility of errors remained. The accuracy and efficiency of computerized tracking have improved greatly with the application of deep learning models, which have advanced considerably in recent years.22–24 Deep learning models can detect and segment the hyoid bone automatically, eliminating most assessor bias. 23 A previous study proposed a deep learning model that can identify and track the hyoid bone accurately without any manual demarcation. 23 It was constructed on the basis of BiFPN U-Net and Bottleneck Transformer and achieved an accuracy of up to 99.5%. However, the clinical usefulness of the model has not been verified in evaluating the swallowing function of patients with dysphagia, including those with PSD.
The purpose of this study was to confirm the diagnostic significance of a deep learning-based hyoid bone tracking model for predicting aspiration risk in PSD patients.
Methods
Subjects
After reviewing the medical records of patients who were admitted to Dankook University Hospital for stroke from March 2022 to May 2023, 125 patients were selected according to the following criteria: (a) showed symptoms of aspiration, such as coughing, difficulty swallowing, or hoarseness; (b) were evaluated by a VFSS within 6 months after the onset of stroke; and (c) had a penetration aspiration scale (PAS) score of 1–3 or 7–8 on the VFSS (Figure 1). 25 Patients with a PAS score of 4–6 were excluded because we intended to include only patients with definite aspiration and those with no possibility of aspiration. Among those, patients who had a history of other neurological conditions that may cause dysphagia (such as traumatic brain injury, Parkinson's disease, or brain tumors), severe cognitive dysfunction, anatomical or functional changes in the neck or pharynx (such as oropharyngeal cancer, Zenker's diverticulum, or enlarged thyroid gland), neck or pharyngeal surgery, or radiation therapy on the neck or pharynx were excluded (Figure 1).

Flowchart of subject selection.
Videofluoroscopic swallowing study (VFSS)
A VFSS was performed based on Logemann's protocol with some modifications. 12 Lateral video images of X-ray fluoroscopy were taken from the oropharyngeal region while the patients were sitting upright. To standardize the relative size of the anatomical structures, a 24 mm diameter coin was attached to the surface of the skin on the lateral neck. Five types of boluses (thick liquid, soft blended diet, yogurt, thin liquid, and liquid ingested by drinking from a cup) mixed with radio-opaque contrast medium were used for recording swallowing images. Fluoroscopic images were recorded at a rate of 30 frames per second (fps) and saved as digital video files on a personal computer. Only the video images of swallowing thick liquid were used in this study to avoid a possible effect of the viscosity difference.26–28
Collection of VFSS images
VFSS videos recorded from the 109 patients were collected after exclusion, as mentioned above. Videos from 14 patients were excluded because of improper posture or tracheostomy-cannula insertion. Another 10 videos were excluded because the model could not accurately detect the hyoid bone, possibly due to shading or obstruction by other structures, when the deep learning model was applied. Finally, 85 videos were included in the analysis and grouped into two groups: the ASP group (n = 35), in which aspiration was clearly observed (PAS 7 or 8), and the no ASP group (NoASP group) (n = 50), in which there was no evidence of aspiration (PAS 1, 2, or 3). Each video was separated into small files containing one swallowing event using lossless editing software (LosslessCut® version 2.6.2, Copyright Mikael Finstad). Each swallowing event was defined as the period from the beginning of laryngeal elevation to returning to its original position. Some extra frames were also included before and after the swallowing event to avoid editing errors.
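The grouping rule described above (PAS 1–3 as NoASP, PAS 7–8 as ASP, PAS 4–6 excluded) can be written out as a short sketch. This is only an illustration, not the authors' code; the function name `assign_group` and the use of `None` for excluded scores are assumptions made here.

```python
def assign_group(pas_score):
    """Map a penetration-aspiration scale (PAS) score to a study group.

    Returns "NoASP" for PAS 1-3, "ASP" for PAS 7-8, and None for
    PAS 4-6, which the study excluded. The name and return values
    are illustrative, not taken from the original study.
    """
    if pas_score not in range(1, 9):
        raise ValueError("PAS score must be an integer from 1 to 8")
    if pas_score <= 3:
        return "NoASP"
    if pas_score >= 7:
        return "ASP"
    return None  # PAS 4-6: excluded from the analysis
```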
Deep learning-based hyoid bone tracking
A deep learning model based on BiFPN U-Net was used for hyoid bone tracking. The construction and performance details are described in a previous study. 23 This model was chosen for its superior accuracy in pixel detection and box detection compared to the models used in other studies. 23
Briefly, the model calculates the relative distance between the center of the hyoid bone and the center of the cervical spinal column (second to fifth cervical vertebrae) after segmenting those anatomical structures (Figure 2). A 24 mm diameter coin was attached to the skin on the side of the neck for the purpose of standardizing the distance measurement. Once the video files are input into the model, the relative positions of the hyoid bone and cervical vertebrae are automatically calculated in each frame. The coordinates for their positions were calculated as the average of the real-time segmented pixel coordinates (Figure 3). The vertical axis was automatically adjusted for each frame according to the coordinates of the cervical spine, while the horizontal axis (x-axis) was positioned perpendicular to the lowest coordinate of that axis (Figure 3). To evaluate movement quantitatively, the maximum distances of hyoid bone movement were measured horizontally, vertically, and diagonally (Figure 4). Across the frames of each swallowing event, the two coordinates with the greatest horizontal separation were automatically selected, and the distance between them (Hmax) was measured by calculating the inter-coordinate distance. The distance of maximum vertical movement (Vmax) was measured by the same algorithm after the selection of the two coordinates with the greatest difference in the vertical direction. The maximum diagonal distance (Dmax) was measured in the same manner. All the distances were standardized with reference to the diameter of the segmented image of the coin, which was measured along the long axis in the case of an oval appearance. The coordinates and the maximum distances of hyoid bone movement were calculated four times, and all values of Hmax, Vmax, and Dmax were in 100% agreement; that is, given identical input videos, the deep learning model produces the same coordinates of the hyoid bone.
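The measurement steps above can be illustrated with a minimal sketch. It is not the published implementation and makes several simplifying assumptions: segmentation is taken as given (lists of pixel coordinates), the image axes are used directly instead of the spine-adjusted axes described in the text, and Dmax is interpreted as the largest straight-line distance between any two tracked positions. All function names are hypothetical.

```python
from math import hypot

COIN_DIAMETER_CM = 2.4  # the 24 mm reference coin described in the text


def centroid(pixels):
    """Mean (x, y) of a list of segmented pixel coordinates."""
    n = len(pixels)
    return (sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n)


def relative_position(hyoid_pixels, spine_pixels):
    """Hyoid centroid expressed relative to the cervical-spine centroid."""
    hx, hy = centroid(hyoid_pixels)
    cx, cy = centroid(spine_pixels)
    return (hx - cx, hy - cy)


def max_displacements(track, coin_diameter_px):
    """Return (Hmax, Vmax, Dmax) in cm for one swallowing event.

    track: per-frame (x, y) relative hyoid positions, in pixels.
    coin_diameter_px: diameter of the segmented coin, in pixels.
    Assumption: image axes stand in for the spine-adjusted axes.
    """
    cm_per_px = COIN_DIAMETER_CM / coin_diameter_px
    xs = [p[0] for p in track]
    ys = [p[1] for p in track]
    hmax = (max(xs) - min(xs)) * cm_per_px
    vmax = (max(ys) - min(ys)) * cm_per_px
    # Assumed reading of "diagonal": largest distance between any two positions.
    dmax = max(
        hypot(a[0] - b[0], a[1] - b[1]) for a in track for b in track
    ) * cm_per_px
    return hmax, vmax, dmax
```

Because every step is deterministic, repeated runs on the same track return identical values, which is consistent with the 100% agreement reported for repeated calculations.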
To avoid overfitting or underfitting, the data set was split into training, validation, and test datasets at a ratio of 7 : 1 : 2. 23
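A 7 : 1 : 2 split of this kind could be sketched as follows. The code is a generic illustration, not the procedure used to build the model; in particular, whether the split was made per frame, per video, or per patient is not stated here, so the unit of splitting, the function name, and the fixed seed are assumptions.

```python
import random


def split_dataset(items, ratios=(7, 1, 2), seed=0):
    """Shuffle items and split them into train/validation/test subsets.

    ratios gives the relative sizes (7 : 1 : 2 by default). Any
    remainder from integer division goes to the test subset.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test
```

Splitting at the patient level, where possible, keeps frames from one person out of both the training and test subsets.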

The salient anatomical structures were segmented by the BiFPN-U-Net(T) network. The purple and green boxes indicate the hyoid bone in the highest and lowest positions, respectively. The hyoid bones at the highest and lowest positions are shown in a single frame for convenience. The yellow box indicates the cervical spinal column from the second to fifth cervical vertebrae. The blue box indicates a 24 mm coin.

The distance of hyoid bone movement was measured by the BiFPN-U-Net(T) network. Green objects indicate the hyoid bone over time, and purple objects indicate the cervical vertebral column from the second to fifth cervical vertebrae. The red arrow indicates the y-axis, and the blue arrow indicates the x-axis. (Cx, Cy) denotes the coordinate of the center of the cervical vertebral column and (Hx, Hy) denotes the coordinates of the hyoid bone. DRx and DRy correspond to the horizontal and vertical distances from (Cx, Cy) to (Hx, Hy), respectively.

An example of hyoid bone tracking by BiFPN-U-Net(T) in one subject at a rate of 30 frames per second. The red dots indicate the coordinates in each frame, and the numbers are the frame numbers. The maximum distance was measured in the horizontal (Hmax), vertical (Vmax), and diagonal (Dmax) directions.
Collection of medical records
Basic clinical information of the subjects was collected from medical records. It included age, sex, characteristics of stroke (size, side, and type), status of feeding at the time of discharge (oral vs. tube), and occurrence of ASP pneumonia during the admission period.
The size of the lesion was measured as the area of the lesion in axial slices of the diffusion magnetic resonance imaging for ischemic infarct and computed tomography scan for hemorrhagic infarct, in the slice demonstrating the largest dimension of the lesion.29,30 Multiple lesions and extensive hemorrhagic lesions such as spontaneous subarachnoid hemorrhage (S-SAH) and spontaneous subdural hemorrhage (S-SDH) were excluded from the calculation. The number of subjects measured was 49 in total. The size of an infarct was dichotomized as large if it exceeded 4 cm2 or small if it was at most 4 cm2 because a size of 4 cm2 frequently contributed to the risk of a major stroke.29,31
For evaluating the clinical status of subjects, clinically significant parameters such as the status of feeding at the time of discharge, occurrence of aspiration pneumonia during the admission period, and follow-up PAS were collected. 32 The status of feeding at the time of discharge was collected from all of the 85 subjects. The medical records during admission were investigated to confirm whether aspiration pneumonia occurred. The follow-up VFSS was performed in 49 subjects in total. Among the 35 subjects with aspiration in the initial VFSS, the follow-up VFSS was conducted only in 23 subjects.
Analysis of results
Results were analyzed in two steps. In the first step, the hyoid movement distances in each direction obtained using the deep learning model were compared between the ASP and NoASP groups, and the cutoff value for classifying the groups was determined. In the second step, clinical characteristics were compared between the subjects with and without a decrease in hyoid bone movement according to the cutoff value.
Statistical analysis
SPSS (SPSS Version 21.0, IBM Corp., Armonk, NY) was used for the statistical analysis. Differences in baseline characteristics between the groups were compared using the chi-square test for noncontinuous variables and the Mann‒Whitney U test for continuous variables. Differences in the distance of hyoid bone movement between the groups were tested using the Mann‒Whitney U test. A p-value <0.05 indicated statistical significance. Receiver operating characteristic (ROC) curves were analyzed to assess the diagnostic value of three variables (Hmax, Vmax, and Dmax) for differentiating the ASP and NoASP groups, and the cutoff values were determined by Youden's index. The chi-square test was used for the analysis of correlation between clinical status and hyoid bone movement. When the sample size was small or distributed unequally, Fisher's exact test was used.
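For readers unfamiliar with the ROC procedure, the following sketch shows how an AUC and a Youden-index cutoff can be computed for a variable where smaller values indicate the positive (aspiration) class. It is not the SPSS procedure used in the study; the function names are hypothetical, and the rule "positive when the value is below the cutoff" is assumed from the definition of decreased movement given in the table notes.

```python
def auc_lower_is_positive(pos, neg):
    """AUC when smaller values indicate the positive class.

    Computed as the fraction of (positive, negative) pairs in which
    the positive case has the smaller value; ties count as one half.
    """
    wins = 0.0
    for p in pos:
        for n in neg:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_cutoff(pos, neg):
    """Return (cutoff, J, sensitivity, specificity) maximizing Youden's J.

    A case is predicted positive when its value is below the cutoff.
    Candidate cutoffs are the observed values; the first maximum wins.
    """
    best = None
    for c in sorted(set(pos) | set(neg)):
        sens = sum(v < c for v in pos) / len(pos)
        spec = sum(v >= c for v in neg) / len(neg)
        j = sens + spec - 1.0
        if best is None or j > best[1]:
            best = (c, j, sens, spec)
    return best
```

Youden's index weights sensitivity and specificity equally, so the selected cutoff may differ from one chosen to favor either measure.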
Results
There were no significant differences between the groups in terms of baseline characteristics (Table 1). The distribution of lesion locations is shown in Table 2.
Baseline characteristics of the subjects.
ASP: aspiration group; NoASP: non-aspiration group.
Value for age is median (range 25th–75th). Other values are the number of cases.
P values were obtained by the Mann–Whitney U test for age and by the chi-square test for gender and cause.
Distribution of lesion locations.
MCA: middle cerebral artery; PICA: posterior inferior cerebellar artery; PCA: posterior cerebral artery; S-SAH: spontaneous subarachnoid hemorrhage; S-SDH: spontaneous subdural hemorrhage; S-ICH: spontaneous intracerebral hemorrhage.
In all directions, the movement of the hyoid bone was significantly smaller in the ASP group than in the NoASP group (p < 0.05) (Table 3). In the classification of the ASP and NoASP groups, ROC curve analysis revealed an area under the curve (AUC) of 0.689 for Hmax, 0.715 for Vmax, and 0.709 for Dmax (Figure 5 and Table 4). As shown in Table 4, Vmax had the highest AUC value and sensitivity. Dmax showed the same sensitivity as Vmax, but its AUC was lower. Hmax showed much higher specificity than the other two variables, but its sensitivity and AUC value were much lower. Vmax seemed to have the highest diagnostic value for the prediction of aspiration, and its cutoff value was 1.61 cm.

ROC curve of the maximum distance of hyoid bone movement for classifying the ASP group and NoASP group.
Maximum distance of hyoid bone movement.
ASP: aspiration group; NoASP: non-aspiration; Hmax: maximum horizontal distance; Vmax: maximum vertical distance; Dmax: maximum diagonal distance.
The values are the medians (range 25th–75th).
P values were obtained by the Mann–Whitney U test.
Results of the ROC curve analysis.
ROC: receiver operating characteristic; AUC: area under the curve; Hmax: maximum horizontal distance; Vmax: maximum vertical distance; Dmax: maximum diagonal distance.
No significant relationship was found between hyoid movement and any clinical characteristic, such as side, size, type, or location of the lesion, when the subjects were divided into two groups based on the cutoff value for each direction (Hmax, Vmax, or Dmax) (Tables 5 and 6). The occurrence of aspiration pneumonia was also not significantly different between the subjects with and without decreased hyoid movement, although pneumonia occurred in only 10 patients, which limits the statistical power of this comparison (Table 7). However, the success of oral feeding at the time of discharge was significantly more frequent when hyoid movement was equal to or larger than the cutoff value (Table 8).
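For comparisons with few events, such as aspiration pneumonia versus decreased hyoid movement, Fisher's exact test is computed from a 2 × 2 table. The sketch below shows one common two-sided definition (summing the probabilities of all tables no more likely than the observed one). It is an illustration only, may not match the SPSS implementation in every detail, and uses a hypothetical function name; no study counts are reproduced here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7)
    )
```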
Hyoid bone movement versus clinical characteristics.
MCA: middle cerebral artery; PCA: posterior cerebral artery; Hmax: maximum horizontal distance; Vmax: maximum vertical distance; Dmax: maximum diagonal distance.
Not decreased: equal to or longer than the cutoff value.
Decreased: shorter than the cutoff value.
No significant relationship was found between hyoid bone movement and clinical characteristics (cause, side, and site) by chi-square test.
Change of hyoid bone movement according to the size of the lesion.
Hmax: maximum horizontal distance; Vmax: maximum vertical distance; Dmax: maximum diagonal distance.
Not decreased: equal to or longer than the cutoff value.
Decreased: shorter than the cutoff value.
P values were obtained by the chi-square test.
Diffuse or multiple lesions were excluded.
Occurrence of aspiration pneumonia during the hospital stay versus hyoid bone movement.
Hmax: maximum horizontal distance; Vmax: maximum vertical distance; Dmax: maximum diagonal distance.
Not decreased: equal to or longer than the cutoff value.
Decreased: shorter than the cutoff value.
P values were obtained by Fisher's exact test.
Status of feeding at discharge versus hyoid bone movement.
Hmax: maximum horizontal distance; Vmax: maximum vertical distance; Dmax: maximum diagonal distance.
Not decreased: equal to or longer than the cutoff value.
Decreased: shorter than the cutoff value.
P values were obtained by the chi-square test.
Discussion
Aspiration refers to the entry of foreign materials into the airway and is one of the most serious complications of PSD. 28 Serious conditions such as pneumonia or asphyxia may result when there is a considerable amount of aspiration, leading to a poor prognosis. Therefore, its prediction is essential for successful rehabilitation of PSD patients.
The hyoid bone plays a key role in protecting the airway from aspiration. Proper movement of the hyoid bone enables the epiglottis to appropriately protect the airway and move boluses to the esophagus by opening the esophageal sphincter.9–12 Many experts believe that the kinematic features of hyoid bone movement are related to the risk of aspiration.19,21,29 Patients with dysphagia may exhibit reduced hyoid bone movement due to discoordination or paralysis of the laryngeal muscles.13,14 Impaired movement of the hyoid bone can be an indicator of an increased risk of aspiration. Thus, accurate analysis of hyoid bone movement is essential for assessing swallowing function and predicting aspiration risk in patients with PSD.10,11
The movement of the hyoid bone is usually analyzed using the VFSS, which is regarded as the gold standard for evaluating swallowing function. A dichotomous scale, indicating the presence or absence of a reduction in hyoid bone movement, is most commonly used. However, it is difficult to measure the movement of a small structure, such as the hyoid bone, precisely with human eyes in fast-moving VFSS videos. It is a time-consuming and inefficient procedure that requires frame-by-frame inspection.
To overcome the limitations of visual observation, computerized analysis has been proposed for a more accurate and efficient estimation of hyoid bone movement. Previous studies have utilized programs that can measure the kinematics of hyoid bone movement and their correlation with aspiration.19,21,33–35 They indicated that the risk of aspiration decreases when the distance of anterior or horizontal movement of the hyoid bone increases.21,36,37 It has been demonstrated that the velocity of horizontal movement is decreased in patients with dysphagia, although its relationship with aspiration risk has not been analyzed directly. 29 The main drawback of previous computerized analysis methods is that they still require manual marking of the hyoid bone by examiners, leaving the possibility of human bias.
Methodologically, this was a retrospective study that used video files pre-recorded during VFSS evaluation of dysphagia, although the measurement of hyoid bone movement with a deep learning model was combined with a systematic collection of clinical data of the kind typical of prospective studies. The deep learning model used in this study is a prototype designed for study purposes only. Its applicability should be expanded in the future by making it more convenient for clinical use. Clinicians could benefit from reduced fatigue and from greater efficiency and accuracy in measuring the distance of hyoid bone movement. Additionally, the high reproducibility of the model, free of human bias, supports consistent results. The prediction of aspiration and clinical outcome would be another benefit of the model.
In this study, a deep learning model was employed to minimize human intervention and achieve completely automated analysis. The model was constructed with BiFPN U-Net to estimate the relative distance between the center of the hyoid bone and the center of the cervical vertebrae after segmenting those anatomical structures, as published in a previous study. 23 Its accuracy in detecting and tracking the hyoid bone has already been shown to be superior to that of other models. 23 The distance of hyoid bone movement could be automatically measured without any human intervention other than editing video files using the deep learning model.
The hyoid bone is known to move vertically first and then forward, resulting in the appearance of diagonally shaped movement by returning to its original position in reverse order.9,10 To assess the kinematics more comprehensively, we measured the distance in three directions, namely, the vertical, horizontal, and diagonal directions, using variables designated Vmax, Hmax, and Dmax, respectively. As shown in Table 4, Vmax exhibited the most balanced value for sensitivity and specificity. Dmax exhibited the same sensitivity as Vmax, but the specificity and AUC were lower. Hmax showed much higher specificity than the other two variables, but the sensitivity was too low. These results suggest that Vmax should be considered the best diagnostic variable for the prediction of aspiration, whereas Dmax might also be useful. An AUC value higher than 0.7 indicates a good diagnostic value, suggesting that the deep learning-based hyoid bone tracking model can predict aspiration correctly.
The results showed that the clinical characteristics were not different between the subjects with and without a decrease in hyoid bone movement. It was unexpected that aspiration pneumonia was not more frequent in the subjects with decreased hyoid bone movement. This result could have been affected by the low incidence of aspiration pneumonia in this population (only 10 of 85 subjects). Nevertheless, the possibility of regaining oral feeding was significantly higher when hyoid bone movement was not decreased below the cutoff value. We believe that this result is clinically important because the ultimate goal of dysphagia rehabilitation is resuming oral feeding without the risk of aspiration. It also underscores the importance of comprehensive swallowing rehabilitation to improve hyoid bone movement in patients with dysphagia, especially in the horizontal and diagonal directions. We conclude that hyoid bone movement deserves close attention in both diagnostic evaluation and clinical strategy.
The importance of each movement direction differed somewhat in this study. The distance of vertical movement was most useful in predicting aspiration in PSD, whereas horizontal and diagonal distances seemed to be of greater clinical value in predicting the possibility of oral feeding. This finding emphasizes that the whole kinematics of hyoid bone movement should be analyzed when evaluating and predicting the clinical status of patients with dysphagia.
This study also has several limitations. VFSS videos taken with only a single swallow event of thick liquid were included in this study. Swallowing patterns may vary with different viscosities and events even in the same patient. The inclusion of only subacute patients could also limit the diversity of training data. Further studies including data from multiple swallows with diets of different viscosities and chronic stroke patients are required to improve the accuracy of the model. Some of the video images had to be excluded because the hyoid bone could not be identified clearly due to poor image quality or was not in the frame on the screen due to abrupt changes in the head and neck position. Efforts to maintain optimal image quality are required when performing VFSS. The video files were edited before analysis to include a single swallowing event. The development of a more sophisticated algorithm that can detect swallow events in whole-video recordings is desirable to achieve a truly automated model.
To the best of the authors' knowledge, this is the first study that used a deep learning-based hyoid bone tracking model to predict aspiration in patients with PSD. This study showed that the movement of the hyoid bone is reduced in PSD patients when aspiration is present. The results support the view that aspiration could be predicted by using deep learning-based analysis of hyoid bone movement. In future studies, it is anticipated that more precise and valuable models, incorporating more features and factors, could be developed for efficient use in real clinical settings.
Conclusion
The movement of the hyoid bone can be measured efficiently and accurately using deep learning-based analysis. The deep learning model can help clinicians predict the presence of aspiration and the status of feeding after the completion of rehabilitation in PSD. It is expected that such models will be used widely in the clinical setting once user convenience is enhanced.
Footnotes
Acknowledgement
None.
Contributorship
YHR contributed to conceptualization, methodology, data curation, formal analysis, and writing–original draft. DK contributed to methodology, data curation, review, and editing. SYK contributed to supervision, review, and editing. JHK contributed to data curation, formal analysis, and editing. SJL contributed to project administration, conceptualization, methodology, data curation, formal analysis, visualization, review, and editing.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical approval
This study received ethical approval from the Dankook University Hospital Institutional Review Board (approval no. 2022–12-017) on 10 February 2023. Because this is an institutional review board-approved retrospective study and all patient information was de-identified, patient consent was not required. Patient data will not be shared with third parties.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1F1A1062248).
Guarantor
SJL
