Sage Journals: Discover world-class research

Abstract

Background

In the clinical assessment of chronic insomnia, the widespread adoption of objective methods such as polysomnography is limited by their high costs and complex operational requirements. Objective and diverse methods are therefore needed to assess chronic insomnia accurately.

Methods

Clinical information was collected from 594 patients diagnosed with chronic insomnia. Their facial and tongue features, as observed in traditional Chinese medicine, were recorded with the Tongue Face Diagnosis Analysis-1 (TFDA-1) instrument and analyzed with the tongue/face diagnosis analysis system (TDAS/FDAS). Stepwise regression analysis, principal component analysis, and zero-inflated negative binomial (ZINB) regression were used for variable screening. Classification models for assessing the severity of chronic insomnia were then constructed with six supervised machine learning methods: decision trees, neural networks, random forests, support vector machines, logistic regression, and naive Bayes. The models were evaluated using sensitivity, specificity, F1 score, precision, and accuracy. They were interpreted and assessed with the Shapley Additive exPlanations (SHAP) explainer and decision curve analysis (DCA), and finally calibrated using Platt scaling.

Results

A comprehensive evaluation showed that model 4 performed best. It integrated baseline data, sleep symptoms, and facial features, and it achieved the highest area under the receiver operating characteristic curve (AUC) of 0.822. DCA further showed that model 4 had significant clinical utility. SHAP analysis showed that the variables with the greatest influence on insomnia included Ch-Y and Ch-G. Their effects were notably more prominent than those of the conventional PSQI and SAS, followed by Ch-R. Finally, Platt scaling significantly improved the calibration of model 4, and the predicted probabilities closely matched the observed probabilities of occurrence.

Conclusions

This study emphasized convenient and non-invasive diagnostic methods for investigating specific facial and tongue features associated with chronic insomnia. A classification model for chronic insomnia has been developed through the integration of multiple features, enabling a more accurate assessment of insomnia severity and facilitating advancements in therapeutic interventions.

Keywords

Chronic insomnia, traditional Chinese medicine, tongue diagnosis, facial features, machine learning

Introduction

Socio-economic factors, modern lifestyles, work-related pressure, and living environments have had numerous adverse effects on human sleep. Insomnia must therefore be recognized as a significant public health issue,¹ not least because it is closely linked to cardiovascular and metabolic diseases.^2,3 Chronic insomnia is characterized by insomnia symptoms occurring at least three times per week for a minimum of three months.⁴ However, the widespread adoption of objective assessment methods such as polysomnography is hindered by their high costs and complexity.⁵ As a result, self-reported symptoms and related scales remain the predominant means of evaluating chronic insomnia. These symptoms include difficulty falling or staying asleep and daytime fatigue. It is therefore imperative to assess chronic insomnia accurately using objective and diverse methods.^6,7

Traditional Chinese medicine (TCM) has addressed insomnia for thousands of years.^8–11 However, most existing research on chronic insomnia has focused on behavioral factors and somatic symptoms. TCM diagnostics are simple, convenient, and cost-effective. Using them to identify features specific to chronic insomnia, and to assess insomnia precisely and prospectively, could create opportunities for timely clinical intervention. TCM observation diagnosis dates back to the Spring and Autumn period and the Warring States period. Color diagnosis became its most widely practiced form and a crucial diagnostic indicator in TCM. Advances in interdisciplinary research and informatics have also changed the field. Precision diagnostic equipment now allows visual assessment data to be derived from controlled visual assessment experiments, and the objectivity and reliability of such data have been demonstrated in color science and color engineering.^12,13 Quantitative analysis of tongue and facial color has a long research history.^14–16 Changes in disease and health status, blood circulation, and emotional state can induce subtle color changes in tongue and facial images that are often imperceptible to the naked eye.^17,18 In TCM, this phenomenon is captured by the “Pathological complexion” theory, which holds that abnormal tongue or skin color can indicate internal disorders.

Supervised machine learning has well-established applications in healthcare.¹⁹ Common methods include decision trees, neural networks, random forests, support vector machines (SVMs), logistic regression, and naive Bayes. These methods are well suited to processing quantitative tongue and facial features, classifying insomnia severity, and improving model interpretability. This study therefore used these supervised machine learning methods to construct classification models for assessing the severity of chronic insomnia. The approach offers a new perspective on the relationship between chronic insomnia and digitized TCM observation data grounded in TCM theory, and it may inspire new solutions to public health challenges.

Material and methods

Study population

The participants in this study comprised 1524 patients with chronic insomnia. They attended the Department of Psychiatry at Shanghai Municipal Hospital of TCM or the Department of Rehabilitation at Xiangshan TCM Hospital in Shanghai, China, between January 2021 and December 2022. Their basic information and clinical consultation records were collected. Their insomnia and their psychological and emotional states were assessed using self-rating scales. The study was conducted in accordance with the guidelines of the Declaration of Helsinki. It was approved by the Ethics Committees of Shanghai Municipal Hospital of TCM (batch number: 2021SHL-KY-11-01) and Xiangshan TCM Hospital (batch number: XSEC2021001). All participants provided written informed consent on a form approved by the Ethics Committees.

Insomnia severity assessment

The Insomnia Severity Index (ISI) is a validated screening instrument for assessing the severity of insomnia symptoms. It comprises seven items, and a higher total score (range 0–28) indicates more severe insomnia.²⁰ The items assess:

(1) the severity of difficulty falling asleep, difficulty staying asleep, and early morning awakening;
(2) satisfaction with the current sleep pattern;
(3) the degree to which sleep difficulties interfere with daytime functioning;
(4) how noticeable the sleep problems are to others; and
(5) the distress caused by the sleep difficulties.^21,22

Scores are categorized as follows: <15 indicates mild insomnia, 15–21 moderate insomnia, and 22–28 severe insomnia.²³ In this study, moderate and severe insomnia were analyzed as a single group (moderate-to-severe). All participants also completed the Pittsburgh Sleep Quality Index (PSQI) to evaluate their sleep quality. Its total score ranges from 0 to 21, with higher scores indicating poorer sleep quality.²⁴ The Self-Rating Depression Scale (SDS) and the Self-Rating Anxiety Scale (SAS) were used to assess participants’ mental and emotional states. Higher SDS or SAS scores indicate greater levels of depression or anxiety.^25,26
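The scoring rules above can be written as a small helper. This is a sketch that assumes each of the seven ISI items is rated 0 to 4, which is how the 0–28 range arises. The function names are hypothetical and not part of any published tool.

```python
ISI_ITEMS = 7   # number of ISI items
ITEM_MAX = 4    # assumed maximum rating per item (0-4)


def isi_total(items):
    """Sum the seven ISI item ratings (0-4 each) into a 0-28 total score."""
    if len(items) != ISI_ITEMS:
        raise ValueError(f"expected {ISI_ITEMS} item ratings, got {len(items)}")
    if any(not 0 <= r <= ITEM_MAX for r in items):
        raise ValueError("each ISI item must be rated between 0 and 4")
    return sum(items)


def isi_category(total):
    """Apply the cut-offs used in this study: <15 mild, 15-21 moderate, 22-28 severe."""
    if not 0 <= total <= ISI_ITEMS * ITEM_MAX:
        raise ValueError("ISI total must lie between 0 and 28")
    if total < 15:
        return "mild"
    if total <= 21:
        return "moderate"
    return "severe"


def is_moderate_to_severe(total):
    """Binary outcome used for modeling: moderate and severe are pooled."""
    return isi_category(total) != "mild"


ratings = [3, 3, 2, 2, 2, 2, 2]
print(isi_total(ratings), isi_category(isi_total(ratings)))
```

Under these assumptions, an ISI total of 15 is the first score counted in the moderate-to-severe (positive) group.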

Selection criteria

The inclusion criterion was presentation to the clinic with insomnia symptoms occurring at least three times per week and persisting for a minimum of three months. The exclusion criteria were:

(1) pregnant or breastfeeding women;
(2) individuals with serious primary diseases of the circulatory, respiratory, digestive, urinary, or endocrine systems;
(3) individuals with breathing-related sleep disorders, restless legs syndrome, periodic limb movement disorder, environmental sleep difficulties, or sleep deprivation syndrome; and
(4) individuals with severe mental disorders, alcoholism, or drug abuse.

Braces or dentures could influence tongue characteristics, so individuals wearing these appliances were also excluded, as were participants wearing facial makeup. Following the enrollment process (Figure 1), 594 patients with chronic insomnia were included. Based on the ISI, they were categorized into a mild insomnia group (n = 240) and a moderate-to-severe insomnia group (n = 354).

Figure 1.

Flow chart for enrollment of study participants.

TCM observation information collection and analysis

The facial regions selected for this study were based on the principles of the Huangdi Neijing (the Yellow Emperor's Canon of Internal Medicine), which was compiled more than 2000 years ago.^27–29 The representative facial areas were the forehead, the center of the eyebrows, the tip of the nose, the zygomas, the cheeks, and the jaw.³⁰ Each area was located as follows:

- The forehead was located near the intersection of the frontal hairline and the bisector of the line connecting the eyebrows to the anterior midline.
- The center of the eyebrows was located at the intersection of the line connecting the medial ends of the eyebrows with the anterior midline.
- The tip of the nose was located where the anterior midline intersects the nose.
- The left zygoma was located where the vertical line through the left lateral canthus meets the convexity of the left cheekbone. The right zygoma was located at the corresponding intersection on the right side.
- The left cheek was located at the midpoint of the line connecting the inner and outer canthi of the left eye with the tip of the nose. The right cheek was located at the corresponding midpoint on the right side.
- The jaw was located at the intersection of the anterior midline with the lower jaw.

In addition, tongue images were collected of both the sublingual veins (SV) and the tongue surface.

Researchers received professional acquisition training from the Intelligent Processing Laboratory of TCM Diagnostic Information at Shanghai University of TCM. They then used the self-developed Tongue Face Diagnosis Analysis-1 (TFDA-1) instrument (equipment number ER17005-201810-41/50) to acquire tongue and facial images between 8:00 and 11:00 a.m. The TFDA-1 is a registered medical device. It was developed under the National Key Research and Development Program for the Modernization of Chinese Medicine Research Special Project (No. 2017YFC1703301). It has been widely used for clinical collection of tongue and face images to assess physiological status.^31–33 Its D50 light source provides high stability and excellent color reproduction. The TFDA-1 is also equipped with a rest that holds the lower jaw in place and keeps the distance between the tongue or face and the lens standardized. In earlier studies, this minimized distortion of the digital images.³⁴

Before image acquisition, the instrument was disinfected with alcohol-soaked cotton balls. Researchers ensured that participants kept a neutral facial expression, were fasting, and had clean mouths and tongues free of foreign matter or discoloration. Participants sat comfortably with their lower jaw resting on the mandibular support. They first closed their eyes slightly so that the face could be aligned for image capture. They then opened their mouths and extended their tongues to fully expose the tongue surface for imaging. Finally, they gently placed the tip of the tongue against the maxillary and incisal boundaries to fully expose the SV for collection.

This study used the tongue/face diagnosis analysis system (TDAS/FDAS) to extract color features from the tongue and face.^31–33 The system was developed by the Smart Diagnosis Technology Research Team at Shanghai University of TCM.

For facial analysis, the system captures the color domain of each facial point and extracts color values from six facial regions. The color values for the zygomas and cheeks were calculated as the averages of their left and right sides. The regions were abbreviated as follows: forehead “Fh,” center of the eyebrows “Eb,” tip of the nose “Ns,” zygomas “Zg,” cheeks “Ch,” and jaw “Ja.”

For tongue analysis, the system automatically segments the tongue components and calculates their color values. On the dorsal surface it separates the tongue body from the tongue coating; on the ventral surface it separates the SV from other structures.

Face and tongue color indexes were derived from the RGB, Lab, and YCrCb color spaces. The prefix “SV-” indicates the sublingual veins, “TB-” the tongue body, and “TC-” the tongue coating (Figure 2).
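The sketch below shows how one RGB color value can be converted into Lab and YCrCb indexes, and how bilateral regions can be averaged. It is illustrative only, because the conversion formulas used inside TDAS/FDAS are not reported. It makes three assumptions:

- the input is 8-bit sRGB;
- Lab uses a D65 reference white (the acquisition light source is D50, and the software's reference white is not stated);
- YCrCb uses the full-range ITU-R BT.601 formula used by JPEG and OpenCV.

Averaging per-index values for the zygomas and cheeks, rather than averaging pixels, is also an assumption. The helper names are hypothetical.

```python
def _srgb_to_linear(c):
    """Undo the sRGB gamma for one 8-bit channel value."""
    c = c / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def rgb_to_lab(r, g, b):
    """8-bit sRGB -> CIE L*a*b*, assuming a D65 reference white."""
    rl, gl, bl = (_srgb_to_linear(v) for v in (r, g, b))
    # Linear sRGB -> CIE XYZ
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 white point

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def rgb_to_ycrcb(r, g, b):
    """8-bit RGB -> (Y, Cr, Cb) with full-range BT.601 coefficients (offset 128)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, (r - y) * 0.713 + 128, (b - y) * 0.564 + 128


def average_bilateral(left, right):
    """Average per-index values of the left and right sides (e.g. zygomas, cheeks)."""
    return tuple((lv + rv) / 2 for lv, rv in zip(left, right))
```

For example, `rgb_to_ycrcb(157, 89, 91)` gives a Y value of about 110. That is close to the TB-Y medians reported below, but it is only a consistency check, not a reproduction of the system.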

Figure 2.

Process of collecting and analyzing the information of TCM observation.

Statistical analysis

Data were analyzed with IBM SPSS Statistics version 25.0. Categorical data are reported as frequencies and percentages. Normally distributed continuous data are presented as means and standard deviations (SD), and non-normally distributed continuous data as medians and interquartile ranges (IQR). Groups were compared with the chi-square test for categorical variables and the Mann–Whitney U test for continuous variables, using the Bonferroni correction for multiple testing. A two-tailed p-value of less than 0.05 was considered statistically significant.
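To make the group-comparison step concrete, the following sketch computes a Pearson chi-square test for a 2 × 2 table, without continuity correction, and applies a Bonferroni adjustment. It uses only Python's standard library and is not the SPSS procedure. Multiplying each p-value by the number of tests, capped at 1, is one common way of applying the Bonferroni correction. The function names are hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals, col_totals = (a + b, c + d), (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Marital status from Table 1: rows = single/widowed/divorced vs. married,
# columns = mild vs. moderate-to-severe insomnia.
chi2, p = chi_square_2x2([[34, 84], [206, 270]])
print(round(chi2, 2), round(p, 4))
```

With the marital-status counts from Table 1, this gives a chi-square of about 8.2 and p ≈ 0.004, in line with the reported p-value.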

Stepwise-regression analysis

Stepwise regression is a variable selection method applied within the multiple linear regression framework. The general form of this model is:

Y = β_{0} + β_{1} X_{1} + β_{2} X_{2} + \dots + β_{p} X_{p} + ε

where Y denotes the dependent variable, $X_{i}$ represents the explanatory variable, $β_{i}$ is the regression coefficient, and $ε$ signifies the random error term.

Stepwise methods start from the null model, or from a model that already contains certain variables. They then add or remove a single variable at each step to improve model performance.³⁵ The evolving model can be written as

Y = β_{0} + \sum_{i \in S} β_{i} X_{i} + ε

where S is the set of independent variables, updated at each step according to the results of the statistical tests. In this study, stepwise regression was used to automatically identify tongue features that significantly influence insomnia severity. F-tests were used to assess the significance of the resulting stepwise regression model.
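A minimal stepwise selection with partial F-tests is sketched below, assuming NumPy is available. The entry and removal thresholds (F-to-enter 4.0, F-to-remove 3.9) and the function names are assumptions chosen for illustration. The study does not report its criteria, and statistical packages implement stepwise selection with their own defaults. Keeping F-to-remove below F-to-enter prevents a variable from being added and removed in an endless cycle.

```python
import numpy as np


def _rss(cols, X, y):
    """Residual sum of squares of an OLS fit with an intercept on the given columns."""
    design = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def _partial_f(small, big, X, y):
    """Partial F statistic for the one extra column in `big` relative to `small`."""
    rss_small, rss_big = _rss(small, X, y), _rss(big, X, y)
    df_resid = len(y) - (len(big) + 1)  # +1 for the intercept
    return (rss_small - rss_big) / (rss_big / df_resid)


def stepwise_select(X, y, f_enter=4.0, f_remove=3.9, max_steps=100):
    """Bidirectional stepwise selection; returns indices of the selected columns."""
    selected = []
    for _ in range(max_steps):
        changed = False
        # Entry step: add the candidate with the largest partial F if it passes f_enter.
        candidates = [c for c in range(X.shape[1]) if c not in selected]
        if candidates:
            f_in = {c: _partial_f(selected, selected + [c], X, y) for c in candidates}
            best = max(f_in, key=f_in.get)
            if f_in[best] >= f_enter:
                selected.append(best)
                changed = True
        # Removal step: drop the weakest selected column if it falls below f_remove.
        if len(selected) > 1:
            f_out = {c: _partial_f([s for s in selected if s != c], selected, X, y)
                     for c in selected}
            worst = min(f_out, key=f_out.get)
            if f_out[worst] < f_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return selected
```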

Principal component analysis

To select core indicators from the extensive color information, this study used principal component analysis (PCA) for feature dimensionality reduction.³⁶ PCA applies an orthogonal transformation that converts d-dimensional data into d′-dimensional data (d′ < d) while retaining as much information as possible. The original d variables are typically correlated with one another, whereas the d′ principal components are mutually uncorrelated.³⁷ PCA decomposes the total variance of the m original indicators into the sum of the variances of m uncorrelated composite indicators. The first principal component has the largest variance, denoted $λ_{1}$. The contribution ratio of the first principal component is the ratio of the variance of $Z_{1}$ to the total variance:

λ_{1} / \sum_{i = 1}^{m} λ_{i}

The larger this ratio, the better the first principal component summarizes the original indicators.

For standardized indicators, whose total variance equals m, the contribution rate of the i-th principal component $Z_{i}$ is

\frac{λ_{i}}{\sum_{j = 1}^{m} λ_{j}} = \frac{λ_{i}}{m} \quad (i = 1, 2, \dots, m)

The cumulative contribution rate of the first k principal components is expressed as

\sum_{i = 1}^{k} \frac{λ_{i}}{m} (k \leq m)

Principal components were retained according to the following principles:

(1) The number of components is determined from the scree plot: if the inflection point occurs at the k-th principal component, the first k components are retained.
(2) The first k components are retained when their cumulative contribution rate exceeds 60%.

The factor loading $q_{ij}$ is the correlation coefficient between the i-th principal component $Z_{i}$ and the j-th original indicator $X_{j}$. It indicates both the strength and the direction of their relationship. It is calculated as the product of two terms: the square root $\sqrt{λ_{i}}$ of the eigenvalue of $Z_{i}$, and the coefficient $α_{ij}$ of $X_{j}$ in $Z_{i}$:

q_{ij} = \sqrt{λ_{i}} \, α_{ij}

The rotated component matrix gives the absolute values of the loadings between each common factor (principal component) and the original face color indicators. A larger absolute value indicates that the corresponding face color indicator better represents the features of the original data.
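The quantities above can be computed from the eigendecomposition of the correlation matrix, as in the NumPy sketch below. These are the contribution rates, the 60% cumulative rule, and the loadings $q_{ij}$. The sketch assumes standardized indicators, so the eigenvalues sum to m. It returns unrotated loadings; the rotation used to obtain the rotated component matrix is not shown, and the scree-plot criterion still requires visual inspection. The function name is hypothetical.

```python
import numpy as np


def pca_summary(X, cum_threshold=0.60):
    """Eigenvalues, contribution rates, retained k, and loadings for standardized data."""
    X = np.asarray(X, dtype=float)
    m = X.shape[1]
    R = np.corrcoef(X, rowvar=False)        # correlation matrix = covariance of z-scores
    eigvals, eigvecs = np.linalg.eigh(R)    # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contrib = eigvals / m                   # lambda_i / m, since the eigenvalues sum to m
    cumulative = np.cumsum(contrib)
    # Smallest k whose cumulative contribution rate exceeds the threshold.
    k = min(int(np.searchsorted(cumulative, cum_threshold, side="right")) + 1, m)
    # q_ij = sqrt(lambda_i) * alpha_ij; rows = indicators j, columns = components i.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals, contrib, cumulative, k, loadings
```

Each loading equals the correlation between the component scores and the standardized indicator, which is a convenient way to check the formula for $q_{ij}$.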

Zero-inflated negative binomial (ZINB) regression model

This study used a ZINB regression model to examine the association between patients’ nighttime sleep patterns and ISI scores (range 0–28), and to identify key clinical indicators of sleep condition. Zero-inflated models combine a model for excess zeros with a regression model for count data.^38,39 ZINB regression is particularly suitable for count outcomes with many zeros. It assumes the excess zeros arise from a different process than the counts, and it models them separately.⁴⁰
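The sketch below shows how the two parts of a zero-inflated model combine. It writes out the ZINB probability mass function and the log-likelihood it implies, using only the standard library. It assumes a common parameterization: a log link for the count mean μ, a logit link for the zero-inflation probability π, and a single dispersion (size) parameter. The function names are hypothetical. Fitting would maximize this log-likelihood, for example in R with `pscl::zeroinfl(..., dist = "negbin")`; the package the authors used is not stated.

```python
import math


def nb_pmf(y, mu, size):
    """Negative binomial P(Y = y) with mean mu and dispersion (size) parameter."""
    log_p = (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
             + size * math.log(size / (size + mu))
             + y * math.log(mu / (size + mu)))
    return math.exp(log_p)


def zinb_pmf(y, pi, mu, size):
    """Zero-inflated NB: a structural zero with probability pi, otherwise an NB count."""
    if y < 0:
        return 0.0
    count_part = (1 - pi) * nb_pmf(y, mu, size)
    return pi + count_part if y == 0 else count_part


def zinb_loglik(ys, xs, zs, beta, gamma, size):
    """Log-likelihood with log(mu_i) = x_i . beta and logit(pi_i) = z_i . gamma."""
    total = 0.0
    for y, x, z in zip(ys, xs, zs):
        mu = math.exp(sum(b * v for b, v in zip(beta, x)))
        pi = 1 / (1 + math.exp(-sum(g * v for g, v in zip(gamma, z))))
        total += math.log(zinb_pmf(y, pi, mu, size))
    return total
```

Under this parameterization, the zero probability is inflated by π and the overall mean is (1 − π)μ.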

Classification by supervised machine learning approach

This study used several common supervised machine learning algorithms to develop classification models for insomnia severity: decision trees, neural networks, random forests, SVMs, logistic regression, and naive Bayes. Mild insomnia (ISI score below 15) served as the reference category. An ISI score of 15 or higher (moderate-to-severe insomnia) was designated the positive category. The algorithms work as follows:

- Decision trees are classical algorithms for classification and regression that split the training data on feature attributes to classify or predict.^41,42
- Neural networks, modeled on networks of neurons in the human brain, can address various machine learning tasks, including classification and regression.^43,44
- Random forests are ensemble learning algorithms based on decision trees. They build multiple trees and aggregate their results, and the randomization helps prevent overfitting.^45,46
- Support vector machines (SVMs) are used for both classification and regression. They project data into a high-dimensional space and identify the optimal separating hyperplane.^47,48
- Logistic regression estimates the probability of a specific output class from the input variables.⁴⁹
- Naive Bayes classification is based on Bayes’ theorem and classifies samples by calculating the probability that a sample belongs to each class.⁵⁰

This study is AI-driven digital TCM research focused on clinical prediction modeling. Models were built in the R programming language. The data were prepared as follows:

1. To handle missing values and outliers, features with more than 20% missing values were excluded, as were 16 samples with extreme values.
2. Continuous variables in the remaining samples were Z-score normalized.
3. The final dataset was randomly divided into training and test sets in a 7:3 ratio.

For each model, a Bayesian optimizer selected the hyperparameters that maximized the area under the receiver operating characteristic (ROC) curve (AUC) on the training set. The tuned model was then evaluated on the test set. All models were also evaluated with 10-fold cross-validation to improve robustness and reliability. Model performance was summarized with confusion matrices. It was evaluated using accuracy, sensitivity (recall), specificity, F1 score, precision, and AUC, and presented as radar plots. Sensitivity, specificity, F1 score, precision, and accuracy were calculated as follows:

\text{Sensitivity} = \frac{TP}{TP + FN}

\text{Specificity} = \frac{TN}{TN + FP}

\text{F1 score} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}

\text{Precision} = \frac{TP}{TP + FP}

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
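The preprocessing, splitting, and evaluation steps above can be sketched with NumPy as follows. The sketch follows the text for the 20% missingness cut-off, Z-score normalization, 7:3 split, 10-fold partitioning, and metric formulas. Everything else is an assumption: the function names, the random seed, the absence of stratification, and computing AUC by pairwise comparison. The Bayesian hyperparameter search and the models themselves are not shown.

```python
import numpy as np


def drop_sparse_features(X, max_missing=0.20):
    """Drop columns whose fraction of missing (NaN) values exceeds max_missing."""
    keep = np.isnan(X).mean(axis=0) <= max_missing
    return X[:, keep], keep


def zscore(X):
    """Column-wise Z-score normalization, ignoring NaNs."""
    return (X - np.nanmean(X, axis=0)) / np.nanstd(X, axis=0, ddof=1)


def train_test_indices(n, train_frac=0.7, seed=0):
    """Random 7:3 split of row indices."""
    perm = np.random.default_rng(seed).permutation(n)
    cut = int(round(train_frac * n))
    return perm[:cut], perm[cut:]


def kfold_indices(n, k=10, seed=0):
    """Yield (train, validation) index arrays for k-fold cross-validation."""
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


def classification_metrics(tp, tn, fp, fn):
    """Confusion-matrix metrics using the formulas above."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1, "accuracy": accuracy}


def auc(scores, labels):
    """AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

As written, normalization is applied before the split, matching the order described in the text. Fitting the normalization on the training set only is a common alternative that avoids leaking test-set statistics into training.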

Visualization of supervised machine learning

To improve the interpretability of the machine learning models, this study used the Shapley Additive exPlanations (SHAP) explainer to visualize the classification results. SHAP values served as a standardized measure of feature importance. They were used to analyze the importance and potential direction of effect of each factor influencing the severity of chronic insomnia. In addition, decision curve analysis (DCA) was conducted to evaluate the clinical utility of the models.^51–53
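For reference, a decision curve plots the standard net benefit at each threshold probability $p_t$. For n patients, it is TP/n − (FP/n) · $p_t$/(1 − $p_t$). The model is compared with two reference strategies: treating all patients, and treating none (net benefit 0). The standard-library sketch below assumes that a predicted probability at or above the threshold counts as positive. The function names are hypothetical.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of treating patients whose predicted probability is >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy that treats every patient; 'treat none' has net benefit 0."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def decision_curve(probs, labels, thresholds):
    """Rows of (threshold, model net benefit, treat-all net benefit)."""
    return [(t, net_benefit(probs, labels, t), net_benefit_treat_all(labels, t))
            for t in thresholds]
```

A model shows clinical utility at a given threshold when its curve lies above both the treat-all and treat-none lines.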

Calibration of supervised machine learning models

Calibration ensures that predicted probabilities accurately reflect the true likelihood of the outcome, which improves model reliability and credibility.⁵⁴ This study used Platt scaling for probability calibration. It applies a sigmoid function to the model's raw output scores, mapping them to calibrated probabilities in the 0–1 interval that align more closely with the observed proportion of positive samples. With f(x) denoting the model's raw output score, the calibrated probability is calculated as follows:

P(y = 1 \mid f(x)) = \frac{1}{1 + \exp(A \cdot f(x) + B)}

Here, A and B are parameters estimated by maximum likelihood, that is, by minimizing the following negative log-likelihood (cross-entropy) loss:

L (A, B) = - \sum_{i = 1}^{N} [y_{i} \log p_{i} + (1 - y_{i}) \log (1 - p_{i})]

where $p_{i} = \frac{1}{1 + \exp(A \cdot f(x_{i}) + B)}$ is the calibrated probability for the i-th sample and $y_{i}$ is its true label.
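The estimation of A and B can be sketched with Newton's method on L(A, B), as below. With z = A · f(x) + B, the derivatives are ∂L/∂z = y − p and ∂²L/∂z² = p(1 − p). This sketch assumes NumPy. It omits the target smoothing used in Platt's original formulation, and the tiny ridge term is only for numerical safety. In practice, the calibration map is usually fitted on data not used to train the classifier. The function names are hypothetical.

```python
import numpy as np


def _platt(f, A, B):
    """Platt's mapping p = 1 / (1 + exp(A * f + B)); clipping avoids overflow."""
    return 1.0 / (1.0 + np.exp(np.clip(A * f + B, -500, 500)))


def fit_platt(scores, labels, max_iter=100, tol=1e-10):
    """Estimate A and B by Newton's method on the cross-entropy loss L(A, B)."""
    f = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    # Start with A = 0 and B set so that p equals the (smoothed) base rate.
    A, B = 0.0, float(np.log((n_neg + 1) / (n_pos + 1)))
    for _ in range(max_iter):
        p = _platt(f, A, B)
        r = y - p                                  # dL/dz
        grad = np.array([np.sum(r * f), np.sum(r)])
        w = p * (1 - p)                            # d2L/dz2
        hess = np.array([[np.sum(w * f * f), np.sum(w * f)],
                         [np.sum(w * f), np.sum(w)]])
        step = np.linalg.solve(hess + 1e-12 * np.eye(2), grad)
        A, B = A - step[0], B - step[1]
        if np.max(np.abs(step)) < tol:
            break
    return A, B


def platt_predict(scores, A, B):
    """Calibrated probabilities for new raw scores."""
    return _platt(np.asarray(scores, dtype=float), A, B)
```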

After calibration, calibration quality was evaluated using four measures: the observed/expected ratio (O/E), the calibration slope, the integrated calibration index (ICI), and the expected calibration error (ECE). Ideal calibration corresponds to O/E ≈ 1, slope ≈ 1, and both ECE and ICI close to 0. The measures are interpreted as follows:

- An O/E close to 1 indicates that the predicted probabilities agree with the observed incidence overall.
- A slope close to 1 indicates that the predicted probabilities are neither too extreme nor too moderate.
- A lower ICI indicates better calibration across the whole probability range.
- A lower ECE indicates smaller average calibration error within probability bins.
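Three of the four calibration measures can be sketched as follows, assuming NumPy. The text does not specify how ECE or the slope is computed, so two choices here are assumptions. ECE uses ten equal-width probability bins. The calibration slope is the coefficient on logit(p) in a logistic regression of the outcome on the logit of the predicted probability. ICI is omitted because it requires a loess-type smoother of observed versus predicted risk. The function names are hypothetical.

```python
import numpy as np


def observed_expected(probs, labels):
    """O/E ratio: observed event count divided by the sum of predicted probabilities."""
    return float(np.sum(labels) / np.sum(probs))


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width bins: weighted mean |observed rate - mean prediction|."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(labels[mask].mean() - probs[mask].mean())
    return float(ece)


def calibration_slope(probs, labels, max_iter=50):
    """Slope b in logit(P(y = 1)) = a + b * logit(p), fitted by Newton's method."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1 - 1e-12)
    x = np.log(p / (1 - p))
    y = np.asarray(labels, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - mu)
        hess = X.T @ (X * (mu * (1 - mu))[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return float(beta[1])
```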

Ethics approval and consent to participate

The study was conducted according to the guidelines of the Declaration of Helsinki and was approved by the Ethics Committees of Xiangshan TCM Hospital (batch number: XSEC2021001) and Shanghai Municipal Hospital of TCM (batch number: 2021SHL-KY-11-01). All participants signed a written informed consent form that was approved by the Ethics Committee.

Results

Basic information of the insomniacs

Among the 594 patients with varying degrees of insomnia, the two groups differed significantly in age, marital status, living situation, and medication use (p < 0.05) (Table 1). Patients with moderate-to-severe insomnia were younger on average than those with mild insomnia (mean age 49.00 vs. 52.20 years). Most patients were married (80.13%) and lived with others (86.70%). However, the moderate-to-severe group included higher proportions of patients who were single, widowed, or divorced (23.73% vs. 14.17%) and of patients who lived alone (15.82% vs. 9.58%). A higher percentage of patients with moderate-to-severe insomnia were taking medication (50.85% vs. 34.17%). Finally, ISI, PSQI, SAS, and SDS scores all differed significantly between the groups (p < 0.001), indicating differences in insomnia severity, sleep quality, and mood.

Table 1.

General characteristics of insomniacs (n = 594).

Variables	All patients (n = 594)	Mild insomnia (n = 240)	Moderate-to-severe insomnia (n = 354)	p-Value
Demographic characteristics
Age, mean (SD)	50.00 (14.00)	52.20 (13.38)	49.00 (14.00)	0.014^a
BMI, mean (SD)	22.14 (3.00)	22.28 (2.92)	21.97 (3.06)	0.423^a
Gender, N (%)				0.120^b
Male	146 (24.58)	67 (27.92)	79 (22.32)
Female	448 (75.42)	173 (72.08)	275 (77.68)
Marital status, N (%)				0.004^b
Single, widowed, or divorced	118 (19.87)	34 (14.17)	84 (23.73)
Married	476 (80.13)	206 (85.83)	270 (76.27)
Educational level, N (%)				0.078^b
Postgraduate	48 (8.08)	15 (6.25)	33 (9.32)
University	284 (47.81)	115 (47.92)	169 (47.74)
Secondary school	235 (39.56)	104 (43.33)	131 (37.01)
Primary school and below	27 (4.55)	6 (2.50)	21 (5.93)
Living situation, N (%)				0.036^b
Living alone	79 (13.30)	23 (9.58)	56 (15.82)
Living together	515 (86.70)	217 (90.42)	298 (84.18)
Occupation, N (%)				0.778^b
Physical work	140 (23.57)	58 (24.17)	82 (23.16)
Mental work	454 (76.43)	182 (75.83)	272 (76.84)
Medical history
Causes of insomnia, N (%)				0.095^b
Physical illness	49 (8.25)	18 (7.50)	31 (8.76)
Socio-psychological stress	179 (30.13)	70 (29.17)	109 (30.79)
Both of the above	68 (11.45)	18 (7.50)	46 (12.99)
Unknown cause	298 (50.17)	134 (55.83)	168 (47.46)
Course of insomnia, N (%)				0.947^b
<6 months	249 (41.92)	101 (42.08)	148 (41.81)
≥6 months	345 (58.08)	139 (57.92)	206 (58.19)
Medication taken, N (%)				<0.001^b
Yes	262 (44.11)	82 (34.17)	180 (50.85)
No	332 (55.89)	158 (65.83)	174 (49.15)
Symptom assessment
ISI score, mean (SD)	16.00 (5.09)	11.08 (2.49)	19.34 (3.42)	<0.001^a
PSQI score, mean (SD)	13.18 (3.17)	11.53 (2.91)	14.30 (2.85)	<0.001^a
SAS score, mean (SD)	45.44 (10.00)	41.18 (8.44)	48.32 (9.95)	<0.001^a
SDS score, mean (SD)	48.84 (10.70)	44.43 (9.42)	51.83 (10.49)	<0.001^a

Note: ^a Used Mann–Whitney U test, ^b used the chi-square test.

Tongue and facial features

This study identified significant differences in SV color between patients with different degrees of insomnia (Table 2). Median differences were most pronounced for SV-R, SV-Y, SV-G, and SV-B. Patients with moderate-to-severe insomnia had significantly higher SV-R and SV-Y values than those with mild insomnia (p < 0.05). This suggests that the red coloration of the SV intensifies as insomnia severity increases. SV-G and SV-B were also significantly higher in patients with moderate-to-severe insomnia (p < 0.05), indicating a more pronounced blue-green hue in their SV. The connection between blood stasis and tongue color is well established in TCM theory,^55,56 in which blue-green discoloration is regarded as an objective indicator of blood stasis.⁴⁷ In summary, as insomnia severity increased, SV color showed a deepening red hue and increasing signs of blood stasis (Figure 3).

Figure 3.

Comparison of SV features in patients with different degrees of insomnia.

Table 2.

Statistical analysis of tongue index [median (P25, P75)].

Domain	Color space	Index	Mild insomnia (n = 240)	Moderate-to-severe insomnia (n = 354)
SV	RGB	SV-R	70.00 (58.00–83.00)	75.00 (61.00–85.00)*
		SV-G	42.00 (36.00–51.00)	45.00 (38.00–52.00)*
		SV-B	43.00 (36.00–56.00)	45.00 (39.00–55.00)*
	Lab	SV-L	19.33 (15.77–23.93)	21.13 (17.05–25.12)*
		SV-a	11.64 (9.00–14.68)	12.84 (10.12–15.56)*
		SV-b	3.93 (2.25–5.45)	4.51 (2.57–6.02)*
	YCrCb	SV-Y	47.38 (39.65–57.84)	51.37 (42.60–60.50)*
		SV-Cr	124.78 (123.56–125.96)	124.28 (123.11–125.67)*
		SV-Cb	139.88 (136.80–143.08)	141.08 (138.10–144.26)*
TB	RGB	TB-R	157.00 (150.00–164.00)	159.00 (152.00–165.25)
		TB-G	89.00 (81.00–96.00)	90.00 (84.00–97.00)
		TB-B	91.00 (83.75–97.25)	92.00 (85.00–99.00)
	Lab	TB-L	45.95 (42.75–48.51)	46.69 (43.96–48.57)
		TB-a	27.59 (25.78–29.04)	27.81 (25.92–29.98)
		TB-b	10.47 (8.47–12.69)	10.87 (8.78–13.34)
	YCrCb	TB-Y	110.51 (103.57–116.04)	111.95 (106.05–116.41)
		TB-Cr	157.03 (154.82–158.99)	157.49 (155.24–159.72)
		TB-Cb	119.39 (117.63–120.77)	119.11 (117.23–120.56)
TC	RGB	TC-R	141.00 (124.00–153.00)	143.00 (126.00–157.00)
		TC-G	92.50 (77.00–105.00)	95.00 (78.00–108.00)
		TC-B	92.00 (78.00–107.00)	94.00 (78.00–111.00)
	Lab	TC-L	44.22 (38.06–49.44)	45.67 (38.87–50.47)
		TC-a	19.59 (18.13–21.21)	19.69 (18.14–21.11)
		TC-b	7.46 (5.76–9.59)	7.73 (5.67–9.61)
	YCrCb	TC-Y	107.24 (94.23–118.71)	110.17 (95.75–120.99)
		TC-Cr	148.90 (147.45–150.18)	149.02 (147.11–150.53)
		TC-Cb	121.48 (119.82–122.65)	121.19 (119.86–122.64)

Note: Used Mann–Whitney U test vs. mild insomnia.

*p < 0.05.

Facial features also differed significantly between patients with different degrees of insomnia, particularly in the center of the eyebrows, the tip of the nose, the zygomas, the cheeks, and the jaw (Table 3, Figure 4). Specifically, the following indexes were significantly higher in patients with moderate-to-severe insomnia than in those with mild insomnia (p < 0.05): Eb-G, Eb-L, Ns-R, Ns-G, Ns-L, Ns-Y, Zg-R, Zg-G, Zg-L, Zg-Y, Ch-R, Ch-G, Ch-B, Ch-L, Ch-Y, Ja-G, Ja-L, and Ja-Y. These findings suggest three patterns in patients with moderate-to-severe insomnia:

- Brighter tones in the center of the eyebrows, the tip of the nose, the zygomas, the cheeks, and the jaw (Eb-L, Ns-L, Ns-Y, Zg-L, Zg-Y, Ch-L, Ch-Y, Ja-L, and Ja-Y).
- A blue-greenish hue in the center of the eyebrows (Eb-G), the tip of the nose (Ns-G), the zygomas (Zg-G), the jaw (Ja-G), and the cheeks (Ch-G, Ch-B).
- Increased redness in the cheeks (Ch-R) and more pronounced red coloration in the tip of the nose and the zygomas (Ns-R, Zg-R).

This may reflect the high capillary density of the cheeks, tip of the nose, and zygomas.¹⁶ These regions are also less affected by external factors such as hair and beard, so they reflect more red hues.

Figure 4.

Comparison of facial features in patients with different degrees of insomnia. (A) Differences in the center of eyebrow color values among insomniacs. (B) Differences in the tip of the nose color values among insomniacs. (C) Differences in zygomatic color values among insomniacs. (D) Differences in cheek color values among insomniacs. (E) Differences in jaw color values among insomniacs.

Table 3.

Statistical analysis of facial indices [median (P25–P75)].

Domain	Color space	Index	Mild insomnia (n = 240)	Moderate-to-severe insomnia (n = 354)
Forehead	RGB	Fh-R	155.71 (145.07–168.04)	157.66 (147.24–169.48)
		Fh-G	98.96 (86.22–86.22)	98.96 (87.53–110.22)
		Fh-B	82.20 (69.49–93.23)	82.31 (69.75–92.92)
	Lab	Fh-L	47.94 (43.78–51.57)	48.28 (44.04–52.76)
		Fh-a	22.12 (19.83–24.17)	22.12 (20.07–24.51)
		Fh-b	19.39 (16.88–21.56)	19.54 (16.76–22.83)
	YCrCb	Fh-Y	113.85 (104.07–121.96)	114.29 (104.66–124.06)
		Fh-Cr	153.97 (151.21–156.36)	153.97 (151.98–156.58)
		Fh-Cb	112.50 (110.98–114.41)	112.32 (110.07–114.36)
Eyebrow	RGB	Eb-R	172.47 (163.19–178.72)	173.39 (167.19–181.60)
		Eb-G	118.00 (106.93–127.24)	118.49 (110.86–129.50)*
		Eb-B	100.92 (89.57–110.25)	101.74 (92.22–111.41)
	Lab	Eb-L	55.21 (51.12–58.31)	55.22 (52.93–59.36)
		Eb-a	20.33 (18.27–22.41)	20.33 (18.26–22.17)
		Eb-b	18.31 (15.78–20.46)	18.31 (15.80–21.24)
	YCrCb	Eb-Y	129.98 (120.81–136.90)	129.98 (124.47–139.10)
		Eb-Cr	152.90 (150.65–155.06)	152.90 (150.61–155.31)
		Eb-Cb	112.64 (110.95–114.55)	112.64 (110.54–114.57)
Nose	RGB	Ns-R	182.03 (173.68–188.99)	182.92 (178.19–191.40)*
		Ns-G	124.40 (114.62–133.47)	125.59 (119.18–133.77)*
		Ns-B	109.78 (99.23–118.70)	110.30 (102.30–118.45)
	Lab	Ns-L	58.06 (53.77–60.94)	58.26 (56.22–61.24)*
		Ns-a	21.59 (19.62–23.02)	21.59 (19.71–23.25)
		Ns-b	17.13 (14.90–19.76)	17.47 (14.98–20.26)
	YCrCb	Ns-Y	136.48 (127.06–143.24)	136.71 (132.07–143.77)*
		Ns-Cr	153.93 (151.89–156.09)	154.15 (152.04–156.56)
		Ns-Cb	113.44 (111.31–115.17)	113.13 (110.90–115.13)
Zygoma	RGB	Zg-R	160.43 (148.61–168.61)	161.98 (155.02–170.47)*
		Zg-G	99.85 (88.63–109.69)	100.43 (93.89–110.03)*
		Zg-B	80.38 (70.21–90.69)	82.72 (73.06–92.36)
	Lab	Zg-L	48.93 (44.74–52.46)	49.01 (46.74–52.67)*
		Zg-a	23.48 (21.42–25.43)	23.48 (21.63–25.46)
		Zg-b	20.78 (18.36–22.81)	20.91 (18.42–23.81)
	YCrCb	Zg-Y	115.47 (106.35–123.70)	115.93 (110.94–123.69)*
		Zg-Cr	155.50 (153.06–157.54)	155.62 (153.66–158.14)
		Zg-Cb	111.43 (109.92–113.36)	111.33 (109.12–113.10)
Cheek	RGB	Ch-R	174.68 (163.91–181.04)	175.36 (170.14–184.89)*
		Ch-G	114.87 (100.87–123.74)	115.37 (108.68–126.13)*
		Ch-B	97.16 (85.10–107.20)	98.73 (88.67–109.03)*
	Lab	Ch-L	54.56 (49.72–57.63)	54.75 (52.23–58.65)*
		Ch-a	23.19 (20.85–25.12)	22.93 (20.87–24.55)
		Ch-b	19.24 (16.71–21.49)	19.24 (16.37–22.06)
	YCrCb	Ch-Y	128.43 (117.59–135.16)	128.86 (122.95–137.43)*
		Ch-Cr	155.42 (153.15–157.73)	155.42 (153.30–157.77)
		Ch-Cb	112.05 (110.48–114.01)	112.05 (109.83–114.11)
Jaw	RGB	Ja-R	162.27 (149.14–171.58)	163.39 (156.18–175.33)
		Ja-G	100.08 (88.01–109.14)	101.29 (92.72–112.64)*
		Ja-B	85.46 (76.06–94.52)	87.21 (77.90–96.93)
	Lab	Ja-L	49.32 (44.31–52.65)	49.48 (46.61–53.57)*
		Ja-a	24.22 (22.15–26.21)	24.22 (22.03–26.27)
		Ja-b	18.43 (16.00–21.35)	18.83 (16.19–21.88)
	YCrCb	Ja-Y	116.65 (105.82–124.02)	117.18 (110.42–126.18)*
		Ja-Cr	155.86 (153.40–158.18)	155.86 (153.23–158.59)
		Ja-Cb	113.19 (110.92–115.11)	112.83 (110.65–114.77)

Note: Used Mann–Whitney U test vs. mild insomnia.

*p < 0.05.
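The table notes specify the Mann–Whitney U test for the mild versus moderate-to-severe comparisons. The minimal sketch below implements the two-sided test using average ranks for ties and the normal approximation, with a tie-corrected variance and no continuity correction. The function names are ours. In practice `scipy.stats.mannwhitneyu` would be the usual choice; its defaults, including a continuity correction, may give slightly different p-values.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Two-sided Mann-Whitney U test (hypothetical helper), normal approximation.

    Uses a tie-corrected variance and no continuity correction.
    Returns (U for x, z, p).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts: dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())  # zero when there are no ties
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_x - n1 * n2 / 2) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return u_x, z, p
```

Dividing U by n1·n2 gives the probability that a randomly chosen value from the first group exceeds one from the second, with ties counting one half. This is the same quantity that the AUROC in Table 7 measures for a classifier's scores.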

The results of stepwise regression analysis

All tongue features requiring variable screening were entered as candidate independent variables into a stepwise regression model. The final model was statistically significant (F = 7.528, p = 0.006), and stepwise selection retained SV-R as the only variable (Table 4).
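When only one predictor is retained, the overall F-test and the t-test on that predictor are the same test, so F should equal t². Here 2.744² ≈ 7.53, which is consistent with the reported F of 7.528. The sketch below uses plain Python, hypothetical data, and a helper name of our own (`simple_ols`) to show this identity for one-predictor ordinary least squares. It does not reproduce the stepwise selection itself.

```python
import math


def simple_ols(x: list[float], y: list[float]) -> dict[str, float]:
    """One-predictor ordinary least squares (hypothetical helper).

    Returns intercept, slope, the slope's standard error, its t statistic,
    and the overall model F statistic.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    mse = sse / (n - 2)                # residual variance with n - 2 df
    se_slope = math.sqrt(mse / sxx)
    t = slope / se_slope               # t statistic for the slope
    f = (sst - sse) / 1 / mse          # regression mean square (1 df) over MSE
    return {"intercept": intercept, "slope": slope, "se": se_slope, "t": t, "f": f}


fit = simple_ols([1, 2, 3, 4, 5], [2.1, 2.9, 4.2, 4.8, 6.1])
print(fit["f"], fit["t"] ** 2)   # equal up to floating-point error
print(2.744 ** 2)                # compare with the reported F = 7.528
```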

Table 4.

Stepwise-regression iteration process and outcomes.

Iteration	Term	β	SE	t	p-value
1	Constant	1.382	0.081	17.124	<0.001
1	SV-R	0.003	0.001	2.744	0.006

Note: The iteration method is the stepwise approach.

The results of PCA

Following the principal component retention criterion described above, PC1 and PC2 (i.e., $Z_{1}$ and $Z_{2}$) best summarized the original facial color indicators, together retaining 76.282% of the original information (Table 5, Figure 5). Based on the factor loading plots and the rotated component matrix, the facial color indicators that best represented the original data were, in descending order of loading, Ch-L, Ch-Y, Ch-R, Ns-L, Ns-Y, and Ch-G (Table 6, Figure 6).
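The contribution rates in Table 5 follow from the eigenvalues if PCA was run on standardized indices, because the total variance then equals the number of variables. Dividing by 54 (six facial regions times nine color indices in Table 3) reproduces the reported 47.912% and 28.370%. That divisor is our inference and is not stated by the authors; the helper name is hypothetical.

```python
def contribution_rates(eigenvalues: list[float], n_variables: int) -> tuple[list[float], list[float]]:
    """Percent variance per component for PCA on a correlation matrix (hypothetical helper).

    With standardized inputs, the total variance equals the number of variables.
    """
    rates = [100.0 * ev / n_variables for ev in eigenvalues]
    cumulative, running = [], 0.0
    for rate in rates:
        running += rate
        cumulative.append(running)
    return rates, cumulative


# Assumed: 54 standardized facial indices.
rates, cumulative = contribution_rates([25.872, 15.320], n_variables=54)
print([round(r, 2) for r in rates], round(cumulative[-1], 2))  # [47.91, 28.37] 76.28
```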

Figure 5.

Scree plot of PCA. Note: The X-axis and Y-axis represent the components and eigenvalues, respectively. A steep section indicates components with large eigenvalues that carry more information.

Figure 6.

Factor loading plots. Note: The X-axis and Y-axis correspond to Principal component 1 (Dim1) and Principal component 2 (Dim2), respectively.

Table 5.

Variance contribution rate of PCA.

Principal component	Eigenvalue	Contribution rate (%)	Cumulative contribution rate (%)
PC1	25.872	47.912	47.912
PC2	15.320	28.370	76.282

Table 6.

Rotated component matrix

Variable	PC1 loading	PC2 loading	Communality
Ns-G	0.913	-0.054	0.921
Ns-L	0.936	0.009	0.923
Ns-Y	0.935	-0.025	0.924
Zg-b	0.006	0.912	0.900
Zg-Cb	-0.155	-0.909	0.898
Ch-R	0.942	0.178	0.960
Ch-G	0.934	-0.051	0.947
Ch-L	0.962	0.003	0.953
Ch-Y	0.959	-0.033	0.952
Ch-Cb	0.101	-0.932	0.934
Ja-R	0.901	0.187	0.948
Ja-L	0.909	0.022	0.970
Ja-Y	0.904	-0.015	0.971

Note: Loadings with an absolute value greater than 0.9 were marked in blue, and the six variables with the largest absolute loadings (Ch-L, Ch-Y, Ch-R, Ns-L, Ns-Y, and Ch-G) were marked in bold. Rotation method: varimax.
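The choice of the six representative indices can be checked directly against Table 6. The sketch below ranks each variable by its largest absolute rotated loading on either component. That ranking rule is our reading of the table note, although ranking on PC1 alone gives the same top six here. The loadings are transcribed from Table 6, and the helper name is hypothetical.

```python
# Rotated loadings (PC1, PC2) transcribed from Table 6.
LOADINGS = {
    "Ns-G": (0.913, -0.054), "Ns-L": (0.936, 0.009), "Ns-Y": (0.935, -0.025),
    "Zg-b": (0.006, 0.912), "Zg-Cb": (-0.155, -0.909), "Ch-R": (0.942, 0.178),
    "Ch-G": (0.934, -0.051), "Ch-L": (0.962, 0.003), "Ch-Y": (0.959, -0.033),
    "Ch-Cb": (0.101, -0.932), "Ja-R": (0.901, 0.187), "Ja-L": (0.909, 0.022),
    "Ja-Y": (0.904, -0.015),
}


def rank_by_abs_loading(loadings: dict[str, tuple[float, ...]]) -> list[tuple[str, float]]:
    """Sort variables by their largest absolute loading on any retained component."""
    strength = {name: max(abs(v) for v in row) for name, row in loadings.items()}
    return sorted(strength.items(), key=lambda item: item[1], reverse=True)


ranked = rank_by_abs_loading(LOADINGS)
top6 = [name for name, _ in ranked[:6]]
above_0_9 = [name for name, value in ranked if value > 0.9]
print(top6)  # ['Ch-L', 'Ch-Y', 'Ch-R', 'Ns-L', 'Ns-Y', 'Ch-G']
```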

The results of ZINB regression

The ZINB regression model showed that two items, the frequency of weekly insomnia and sleeplessness throughout the night, had significant positive effects on insomnia severity (Figure 7). The regression coefficient for weekly insomnia frequency was 0.050 (z = 1.979, p = 0.048), with an exponentiated coefficient, reported as an odds ratio (OR), of 1.051. Because this estimate comes from the negative binomial (count) component, it means that each additional insomnia episode per week multiplied the expected ISI score by 1.051. The regression coefficient for sleeplessness throughout the night was 0.242 (z = 2.090, p = 0.037), with an OR of 1.273. This means that each one-unit increase in all-night sleeplessness multiplied the expected ISI score by 1.273.
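The reported ratios are the exponentiated count-model coefficients, and the p-values follow from the Wald z statistics under a standard normal reference. The helper name below is hypothetical. Small differences in the third decimal are consistent with rounding of the published coefficients; for example, exp(0.242) ≈ 1.274, whereas 1.273 is reported.

```python
import math


def exponentiated_effect(beta: float, z: float) -> tuple[float, float]:
    """Return exp(beta) and the two-sided normal p-value for a Wald z (hypothetical helper)."""
    ratio = math.exp(beta)                      # multiplicative effect on the expected count
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return ratio, p_value


for label, beta, z in [("weekly insomnia frequency", 0.050, 1.979),
                       ("all-night sleeplessness", 0.242, 2.090)]:
    ratio, p = exponentiated_effect(beta, z)
    print(f"{label}: exp(beta) = {ratio:.3f}, p = {p:.3f}")
```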

Figure 7.

Results from the negative binomial portion of the ZINB regression model for estimating the effect of nighttime behaviors on insomnia.

The results using supervised machine learning

This study primarily analyzed the contribution of TCM observational features to classifying insomnia severity. Based on the preceding variable screening, baseline information, sleep symptoms, SV features, and facial features were combined into five input feature sets for modeling and evaluation (Table 7). Model 1 used the baseline variables (sex, age, SAS, SDS). Model 2 used baseline plus sleep symptoms (sex, age, SAS, SDS, PSQI, frequency of weekly insomnia, sleeplessness throughout the night). Model 3 used baseline plus sleep symptoms plus SV features (the model 2 variables plus SV-R). Model 4 used baseline plus sleep symptoms plus facial features (the model 2 variables plus Ch-L, Ch-Y, Ch-R, Ns-L, Ns-Y, and Ch-G). Model 5 used baseline plus sleep symptoms plus SV features plus facial features (the model 2 variables plus SV-R, Ch-L, Ch-Y, Ch-R, Ns-L, Ns-Y, and Ch-G).

Table 7.

Classification results of each model based on different data sets.

Classifier	Models	Sensitivity	Specificity	F1 score	Precision	Accuracy	AUROC
Decision tree	Model 1	0.593	0.677	0.656	0.736	0.626	0.634
	Model 2	0.731	0.549	0.721	0.712	0.659	0.655
	Model 3	0.731	0.563	0.725	0.718	0.665	0.663
	Model 4	0.748	0.535	0.731	0.711	0.665	0.656
	Model 5	0.750	0.535	0.730	0.711	0.665	0.656
Logistic regression	Model 1	0.657	0.662	0.700	0.747	0.659	0.712
	Model 2	0.796	0.648	0.785	0.775	0.737	0.805
	Model 3	0.778	0.662	0.778	0.778	0.732	0.809
	Model 4	0.796	0.676	0.793	0.789	0.749	0.820
	Model 5	0.796	0.662	0.789	0.782	0.743	0.807
Naive Bayes	Model 1	0.574	0.746	0.660	0.775	0.642	0.708
	Model 2	0.620	0.775	0.702	0.807	0.682	0.785
	Model 3	0.611	0.761	0.691	0.795	0.670	0.793
	Model 4	0.657	0.775	0.728	0.816	0.704	0.812
	Model 5	0.639	0.718	0.701	0.775	0.670	0.803
Neural network	Model 1	0.426	0.803	0.548	0.767	0.575	0.644
	Model 2	0.778	0.648	0.774	0.771	0.726	0.781
	Model 3	0.778	0.662	0.778	0.778	0.732	0.809
	Model 4	0.722	0.718	0.757	0.796	0.721	0.727
	Model 5	0.639	0.761	0.711	0.802	0.687	0.800
Random forest	Model 1	0.611	0.507	0.632	0.653	0.570	0.581
	Model 2	0.778	0.592	0.760	0.743	0.704	0.759
	Model 3	0.759	0.720	0.756	0.752	0.704	0.769
	Model 4	0.759	0.676	0.770	0.781	0.726	0.804
	Model 5	0.759	0.662	0.766	0.774	0.721	0.787
SVM	Model 1	0.639	0.690	0.693	0.758	0.659	0.712
	Model 2	0.787	0.662	0.783	0.780	0.737	0.808
	Model 3	0.806	0.662	0.795	0.784	0.749	0.813
	Model 4	0.778	0.676	0.781	0.785	0.737	0.822
	Model 5	0.796	0.676	0.793	0.789	0.749	0.807

Note: Model 1, “baseline,” Model 2, “baseline plus sleep symptoms,” Model 3, “baseline plus sleep symptoms plus SV features,” Model 4, “baseline plus sleep symptoms plus facial features,” Model 5, “baseline plus sleep symptoms plus SV features plus facial features.”

All supervised machine learning methods performed better with composite data than with baseline data alone. In model 2 (baseline information plus sleep symptoms), SVM performed best, with an accuracy of 0.737 and an AUC of 0.808, and logistic regression also performed strongly (accuracy 0.737, AUC 0.805). In model 3 (baseline data, sleep symptoms, and SV features), SVM again performed best, with an accuracy of 0.749 and an AUC of 0.813. In model 4 (baseline data, sleep symptoms, and facial features), logistic regression and SVM both performed well, with accuracies of 0.749 and 0.737 and AUCs of 0.820 and 0.822, respectively. In model 5 (baseline data, sleep symptoms, SV features, and facial features), logistic regression and SVM achieved accuracies of 0.743 and 0.749, respectively, and both had an AUC of 0.807.
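Each threshold-based column in Table 7 is a simple function of confusion-matrix counts. The counts in this sketch are our own back-calculation and are not published values; the confusion matrices themselves are in Supplementary Material 1. Assuming a test set of 108 moderate-to-severe (positive) and 71 mild (negative) cases, counts of 86 TP, 22 FN, 48 TN, and 23 FP reproduce the logistic regression row for model 4 to three decimals. The helper name is hypothetical.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Threshold-based metrics from confusion-matrix counts (hypothetical helper)."""
    sensitivity = tp / (tp + fn)          # recall / true-positive rate
    specificity = tn / (tn + fp)          # true-negative rate
    precision = tp / (tp + fp)            # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1, "accuracy": accuracy}


# Hypothetical counts (our back-calculation, not published by the authors).
metrics = classification_metrics(tp=86, fn=22, tn=48, fp=23)
print({name: round(value, 3) for name, value in metrics.items()})
# {'sensitivity': 0.796, 'specificity': 0.676, 'precision': 0.789, 'f1': 0.793, 'accuracy': 0.749}
```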

Radar plots of the classifiers within each of the five models (Figures 8–12) covered six performance dimensions: sensitivity, specificity, F1 score, precision, accuracy, and AUC. These comparisons showed that model 4 had the best overall performance, underscoring the value of combining baseline data, sleep symptoms, and facial features when evaluating insomnia severity. Adding patients' sleep symptoms and their objectified, digitized facial TCM observational features to the baseline information markedly improved the insomnia classification models.
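The radar-plot reading that "a larger area means better overall performance" can be made explicit. For n equally spaced axes, the polygon area equals one half of sin(2π/n) times the sum of the products of adjacent values. The sketch below (helper name ours) applies this formula to two logistic regression rows of Table 7. The area depends on the order of the axes, so it is a visual summary rather than a formal metric.

```python
import math


def radar_area(values: list[float]) -> float:
    """Area of a radar-chart polygon with equally spaced axes (hypothetical helper)."""
    n = len(values)
    wedge = math.sin(2 * math.pi / n) / 2  # triangle-area factor for each adjacent pair of axes
    return wedge * sum(values[i] * values[(i + 1) % n] for i in range(n))


# Logistic regression rows of Table 7; axis order:
# sensitivity, specificity, F1, precision, accuracy, AUROC.
model_1 = [0.657, 0.662, 0.700, 0.747, 0.659, 0.712]
model_4 = [0.796, 0.676, 0.793, 0.789, 0.749, 0.820]
print(round(radar_area(model_1), 3), round(radar_area(model_4), 3))
```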

Figure 8.

Performance comparison of the classifiers in model 1 (baseline). Note: (A) shows the six ROC curves for model 1; the closer a curve lies to the upper-left corner, the better the corresponding machine learning method performs. (B) shows the performance of model 1 on six metrics: sensitivity, specificity, F1 score, precision, accuracy, and AUC. Each metric is plotted on an axis radiating from the center, and a larger enclosed area indicates better overall performance of the corresponding classifier.

Figure 9.

Performance comparison of the classifiers in model 2 (baseline plus sleep symptoms). Note: (A) shows the six ROC curves for model 2; the closer a curve lies to the upper-left corner, the better the corresponding machine learning method performs. (B) shows the performance of model 2 on six metrics: sensitivity, specificity, F1 score, precision, accuracy, and AUC. Each metric is plotted on an axis radiating from the center, and a larger enclosed area indicates better overall performance of the corresponding classifier.

Figure 10.

Performance comparison of the classifiers in model 3 (baseline plus sleep symptoms plus SV features). Note: (A) shows the six ROC curves for model 3; the closer a curve lies to the upper-left corner, the better the corresponding machine learning method performs. (B) shows the performance of model 3 on six metrics: sensitivity, specificity, F1 score, precision, accuracy, and AUC. Each metric is plotted on an axis radiating from the center, and a larger enclosed area indicates better overall performance of the corresponding classifier.

Figure 11.

Performance comparison of the classifiers in model 4 (baseline plus sleep symptoms plus facial features). Note: (A) shows the six ROC curves for model 4; the closer a curve lies to the upper-left corner, the better the corresponding machine learning method performs. (B) shows the performance of model 4 on six metrics: sensitivity, specificity, F1 score, precision, accuracy, and AUC. Each metric is plotted on an axis radiating from the center, and a larger enclosed area indicates better overall performance of the corresponding classifier.

Figure 12.

Performance comparison of the classifiers in model 5 (baseline plus sleep symptoms plus SV features plus facial features). Note: (A) shows the six ROC curves for model 5; the closer a curve lies to the upper-left corner, the better the corresponding machine learning method performs. (B) shows the performance of model 5 on six metrics: sensitivity, specificity, F1 score, precision, accuracy, and AUC. Each metric is plotted on an axis radiating from the center, and a larger enclosed area indicates better overall performance of the corresponding classifier.

Visualization of model 4

Model 4 was further examined with DCA and the SHAP explainer. The DCA suggested that model 4 has clinical utility (Figure 13). Comparing the net benefit of the classifiers in model 4 at key threshold probabilities (Table 8) showed that the random forest classifier had the highest net benefit across the threshold range. At the clinically relevant threshold of 0.1, its net benefit was 0.531, which is 0.081 higher than that of the treat-all strategy (0.450). This is equivalent to identifying 8.1 additional true positives per 100 patients without any increase in false positives. The random forest may therefore be a suitable core method for clinical decision support systems. The confusion matrices (Supplementary Material 1) further illustrate the performance of the classification models.

Figure 13.

DCA curves of model 4.

Table 8.

Net benefit values at key threshold points across classifiers (treatment strategies) for model 4.

Threshold probability	Decision tree	Logistic regression	Naive Bayes	Neural network	Random forest	SVM	Treat all
0.1	0.498	0.512	0.505	0.519	0.531	0.524	0.450
0.2	0.405	0.423	0.412	0.428	0.447	0.438	0.400
0.3	0.328	0.352	0.335	0.350	0.381	0.369	0.300
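The standard decision-curve definition of net benefit (Vickers and Elkin) weights false positives by the odds of the threshold probability pt. The sketch below uses helper names of our own. It also converts a gain over treat-all into per-100-patient terms: 100 × ΔNB is the net number of additional true positives, and 100 × ΔNB × (1 − pt)/pt is the net reduction in unnecessary interventions.

```python
def net_benefit(tp: int, fp: int, n: int, threshold: float) -> float:
    """Net benefit of a model at threshold probability pt (hypothetical helper)."""
    return tp / n - fp / n * threshold / (1 - threshold)


def treat_all_net_benefit(prevalence: float, threshold: float) -> float:
    """Treat everyone: every positive is a TP and every negative is an FP."""
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def per_100_patients(delta_nb: float, threshold: float) -> tuple[float, float]:
    """Express a net-benefit gain over treat-all per 100 patients."""
    extra_true_positives = 100 * delta_nb
    fewer_unnecessary = 100 * delta_nb * (1 - threshold) / threshold
    return extra_true_positives, fewer_unnecessary


# Random forest vs. the treat-all column of Table 8 at pt = 0.1.
gain = 0.531 - 0.450
print(per_100_patients(gain, 0.1))
```

For the random forest at pt = 0.1, this gives about 8.1 net additional true positives and about 72.9 fewer unnecessary interventions per 100 patients relative to the treat-all column.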

Because model 4 contains more variables, SHAP was used to visualize the selected variables and to show the positive and negative contribution of each feature to individual samples. Figure 14 shows the mean absolute SHAP value of each feature. Among the variables with the greatest influence on insomnia severity, Ch-Y and Ch-G were notably prominent, even compared with the conventional PSQI and SAS, followed by Ch-R (Figure 14(B) and (D)).
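SHAP attributions are Shapley values of a model's prediction, and the global ranking averages their absolute values over samples. The brute-force sketch below uses a hypothetical toy model and helper names. It computes exact Shapley values by enumerating feature subsets and replaces absent features with a single baseline value. The SHAP library's explainers use faster algorithms and background data, so this illustrates the definition, not the authors' pipeline.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(f: Callable[[list[float]], float],
                  x: Sequence[float], baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values by subset enumeration (hypothetical helper).

    Features outside a coalition take a single baseline value. Cost grows as
    2**n, so this is only for small illustrative models.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                present = set(coalition)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]  # add feature i to the coalition
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def mean_abs_shap(shap_rows: Sequence[Sequence[float]]) -> list[float]:
    """Global importance: mean |SHAP value| of each feature over samples."""
    n_features = len(shap_rows[0])
    return [sum(abs(row[k]) for row in shap_rows) / len(shap_rows) for k in range(n_features)]


def toy_model(v: list[float]) -> float:
    """Hypothetical model with an interaction term, for illustration only."""
    return 2 * v[0] - v[1] + v[0] * v[2]


phi = exact_shapley(toy_model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
print(phi, sum(phi))
```

For the toy model, the attributions are [3.5, -2.0, 1.5]. The interaction term x0·x2 is split equally between features 0 and 2, and the values sum to f(x) − f(baseline) = 3.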

Figure 14.

SHAP explanations of model 4 with different classifiers. Note: Red represents higher values of a covariate, and blue represents lower values. Covariates are ordered according to the Gain statistic. The X-axis represents the contribution of a feature to the prediction for a single sample, and the Y-axis lists the feature names.

The results of Platt scaling calibration

Platt scaling calibration markedly improved model 4 (Figure 15). The observed-to-expected ratio (O/E) was 1.038 and the calibration slope was 1.026, suggesting that the predicted probabilities from model 4 closely matched the observed probabilities of occurrence. Furthermore, after calibration the model's scores approached the true probability distribution. Both the integrated calibration index (ICI) and the expected calibration error (ECE) decreased after calibration, indicating better global calibration and smaller local errors.
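Platt scaling fits a one-dimensional logistic regression that maps a classifier's raw scores to probabilities on held-out data. The sketch below fits this model with Newton-Raphson in plain Python on hypothetical scores, using helper names of our own. It fits raw 0/1 targets rather than the smoothed targets of Platt's original paper. Its sign convention, p = 1/(1 + exp(-(a·s + b))), corresponds to Platt's A = −a and B = −b. The sketch also computes the O/E ratio and an ECE with equal-width bins.

```python
import math


def fit_platt(scores: list[float], labels: list[int], max_iter: int = 100) -> tuple[float, float]:
    """Fit p = 1 / (1 + exp(-(a*s + b))) by Newton-Raphson (hypothetical helper)."""
    a = b = 0.0
    for _ in range(max_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            w = p * (1.0 - p)
            g_a += (p - y) * s      # gradient of log loss w.r.t. a
            g_b += p - y            # gradient w.r.t. b
            h_aa += w * s * s       # Hessian entries
            h_ab += w * s
            h_bb += w
        det = h_aa * h_bb - h_ab * h_ab
        if det <= 1e-12:
            break
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a, b = a - step_a, b - step_b
        if abs(step_a) + abs(step_b) < 1e-10:
            break
    return a, b


def calibrate(scores: list[float], a: float, b: float) -> list[float]:
    return [1.0 / (1.0 + math.exp(-(a * s + b))) for s in scores]


def observed_expected(probs: list[float], labels: list[int]) -> float:
    """O/E ratio: observed events divided by the sum of predicted probabilities."""
    return sum(labels) / sum(probs)


def expected_calibration_error(probs: list[float], labels: list[int], n_bins: int = 10) -> float:
    """ECE with equal-width probability bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    total = len(probs)
    return sum(
        len(b) / total * abs(sum(y for _, y in b) / len(b) - sum(p for p, _ in b) / len(b))
        for b in bins if b
    )


# Hypothetical classifier scores and outcomes, for illustration only.
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.50, 0.55, 0.60, 0.65, 0.70, 0.80, 0.90]
labels = [0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1]
a, b = fit_platt(scores, labels)
probs = calibrate(scores, a, b)
print(observed_expected(probs, labels), expected_calibration_error(probs, labels))
```

Because the fitted intercept makes the calibrated probabilities sum to the number of events, O/E is 1 by construction on the fitting data. It is therefore informative only when evaluated on separate data.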

Figure 15.

Results of Platt scaling calibration. Note: The X-axis represents the different models, and the Y-axis shows the values of the calibration evaluation indices. The values on the nested histograms are the calibration indices of the calibrated models.

Discussion

In the diagnosis and assessment of chronic insomnia, modern medicine faces challenges in popularizing objective assessment methods such as polysomnography because of their high cost and operational complexity.⁵ Consequently, assessment has been guided predominantly by patients' self-reported symptoms, such as difficulty falling or staying asleep and daytime fatigue, and by related rating scales. This reliance on subjective information can compromise accuracy. Notably, a mismatch between subjective experience and objective symptoms is common among patients with insomnia. Modern TCM diagnostic methods may help close this gap. With advances in image processing, artificial intelligence, and other computer sciences, the collection of diagnostic information has matured, including control of light-source illumination, color temperature, color rendering index, and color reproduction. Digital image analysis can now meet the data requirements of TCM, helping to overcome the limitations of subjective evaluation and the difficulty of reproducing visual findings in TCM diagnosis.⁵⁷,⁵⁸

Chinese medicine holds that the pathology of chronic insomnia resides primarily in the heart.⁵⁹ The tongue is regarded as a reflection of heart function and is often described as “the seedling of the heart,” so it provides significant insight for the diagnosis and treatment of chronic insomnia. Facial diagnosis, which involves observing changes in the color and luster of the facial skin, is another valuable diagnostic tool.⁶⁰ It provides intuitive and comprehensive information that assists practitioners in making assessments. In the future, combining TCM smart diagnostic technology with mobile and home healthcare may enable low-cost, full-cycle, non-invasive management of chronic insomnia.

This study examined the tongue and facial features of individuals with insomnia and found that, as insomnia severity increased, the redness and blue-green hues of the SV also intensified. This consistent change suggests that the tongue may reflect the progression of insomnia and could offer an objective basis for the auxiliary diagnosis of chronic insomnia. Patients with moderate-to-severe insomnia showed higher Eb-G, Ns-G, Zg-G, Ja-G, Ch-G, and Ch-B values (a blue-greenish hue) and higher Ch-R values (increased cheek redness). These observations are consistent with the TCM theory of “Damp-heat” in insomnia,⁶ which holds that individuals with insomnia may display a greenish or reddish facial hue.⁶¹ Furthermore, the SV color in patients with moderate-to-severe insomnia matches a typical TCM Zheng, a pattern of signs and symptoms that summarizes the disease process at a given stage.⁶²,⁶³ At this stage, the tongue and face may appear greenish-purple, indicating blood stasis, whereas the reddish hue of the cheeks and zygomatic areas suggests yin deficiency.

In addition, this study used three feature selection methods. For tongue characteristics, we used stepwise regression to automatically identify the tongue feature that significantly influenced insomnia severity, SV-R. For facial complexion, TCM divides the face into specific regions for observation, but this produces a large number of facial diagnostic indices. To address this, we used PCA to identify the core indices that best represent the original facial color features: Ch-L, Ch-Y, Ch-R, Ns-L, Ns-Y, and Ch-G. For the nighttime behaviors of insomnia patients, we used a ZINB regression model to identify the key clinical indicators of sleep condition, namely the frequency of weekly insomnia and sleeplessness throughout the night.

Our analysis of nighttime behaviors in chronic insomnia yielded notable results. ZINB regression showed that the frequency of weekly insomnia and sleeplessness throughout the night had significant positive effects on insomnia severity, stronger than those of difficulty falling asleep and frequent dreaming. Most previous epidemiological studies have focused on the association between sleep duration and health. Yet irregular sleep is often linked to an increased risk of various diseases,⁶⁴ and sleep regularity is increasingly recognized as a health indicator.⁹ The present results suggest that public health campaigns should emphasize the importance of sleep regularity rather than concentrating solely on nighttime sleep quality.

The application of machine learning to disease prediction and health-state classification using data from the four TCM diagnostic methods has been extensively documented. In facial diagnosis, Lin Ang et al.¹³ used multivariate selection methods, including the stepwise Akaike information criterion (AIC) and the least absolute shrinkage and selection operator (LASSO), to construct a hypertension prediction model based on facial CIELAB color-space features. In tongue diagnosis, binary logistic regression and neural networks were used to develop a hypertension prediction model from digital characteristics of the SV, which identified thickened SV width as an independent risk factor for hypertension.⁶⁵ Stacking ensemble learning and ResNet50 have also been applied to build a diabetes risk prediction model that incorporates TCM tongue features.⁶⁶ In integrated tongue and pulse diagnosis, Shi et al.⁶⁷ applied four machine learning methods (logistic regression, support vector machine, random forest, and neural network) to build a fatigue-state classification model from combined tongue and pulse data, enabling early differentiation between disease-related and non-disease-related fatigue. In this study, we analyzed the tongue features, facial features, and nocturnal symptoms of insomnia patients and fused these feature sets to construct chronic insomnia classification models with supervised machine learning. All machine learning approaches performed better with composite data than with baseline data alone. Notably, model 4, which combined baseline data with sleep symptoms and facial features, showed the best overall performance. This suggests that adding patients' sleep symptoms and their digitized, objectified facial features to the baseline information can substantially improve insomnia classification models.

Overall, this study used modern TCM diagnostic equipment and intelligent analysis techniques to investigate the facial and tongue features associated with chronic insomnia, with a focus on convenient, non-invasive diagnostic methods. A classification model for chronic insomnia was then developed by integrating multiple features, enabling a more accurate assessment of insomnia severity. This model holds promise for the early detection of patients with insomnia and may thereby support timely therapeutic intervention.

Limitations

This study had several limitations. First, the impact of coronavirus disease 2019 (COVID-19) on sleep cannot be overlooked,⁶⁸,⁶⁹ and the course of COVID-19 among the participants was not examined in detail. Second, all patients were first-time attendees at the Department of Psychiatry and Rehabilitation and had not previously received systematic treatment for insomnia. However, we cannot exclude the possibility that some were taking medication that could have influenced the results. Third, the sample size may have been insufficient for precise parameter estimation in all models. Finally, this was a single-center retrospective cohort study conducted in Shanghai, which may limit the quality and diversity of the data; the models still require external validation and optimization. Future research will incorporate sample-size calculations to ensure three criteria: (1) a global shrinkage factor of ≥0.9, (2) a small absolute difference (≤0.05) between the apparent and adjusted Nagelkerke R², and (3) precise estimation of the overall outcome risk in the population.⁷⁰ Future work will also broaden the sample, include multicenter studies, and integrate new technological approaches such as 3D pulse data⁷¹ and spectral information.⁷² In addition, multi-task supervised machine learning⁷³ could improve data utilization, capture the similarities and differences among degrees of chronic insomnia, and improve the accuracy of classification models. Ultimately, developing chronic insomnia assessment and early-warning models tailored to clinical practice will be a primary focus of future research.
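Criteria (1) and (3) above have closed-form sample-size formulas in Riley et al.'s framework for prediction-model development. We assume this is the approach of reference 70. The sketch uses hypothetical inputs: 13 candidate predictor parameters (the variables in model 4) and an anticipated Cox-Snell R² of 0.2. The helper names are ours. Criterion (2) also requires the maximum attainable R², so it is omitted here.

```python
import math


def n_for_shrinkage(n_params: int, r2_cs: float, shrinkage: float = 0.9) -> int:
    """Criterion (1): n giving expected uniform shrinkage >= `shrinkage` (hypothetical helper).

    n = p / ((S - 1) * ln(1 - R2_cs / S)), where R2_cs is the anticipated Cox-Snell R^2.
    """
    return math.ceil(n_params / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage)))


def n_for_overall_risk(prevalence: float, margin: float = 0.05) -> int:
    """Criterion (3): estimate the overall outcome proportion within +/- margin (95% CI)."""
    return math.ceil((1.96 / margin) ** 2 * prevalence * (1 - prevalence))


# Hypothetical inputs: 13 predictors (model 4) and an assumed Cox-Snell R^2 of 0.2.
print(n_for_shrinkage(13, 0.2))        # 518
print(n_for_overall_risk(354 / 594))   # moderate-to-severe share of the cohort
```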

Conclusion

In conclusion, this study employed modern technology in TCM diagnosis by utilizing the standardized diagnostic equipment, the TFDA-1 tongue and face diagnosis instrument, to collect clinical information from 594 chronic insomnia patients. This data included baseline conditions, sleep performance, mood scales, and digital images of the tongue and face. Tongue- and face-related color features were intelligently analyzed by the TDAS/FDAS. The study applied stepwise regression analysis, PCA and ZINB regression for variable screening, ultimately employing various machine learning methods to construct the chronic insomnia assessment models based on the integration of TCM observation diagnostic features. This approach offered a novel perspective for the clinical diagnosis and treatment of chronic insomnia.

Supplemental Material

sj-docx-1-dhj-10.1177_20552076251393331 - Supplemental material for A new method for assessing chronic insomnia: Machine learning-based fusion of TCM observation digital features

Supplemental material, sj-docx-1-dhj-10.1177_20552076251393331 for A new method for assessing chronic insomnia: Machine learning-based fusion of TCM observation digital features by Yu Wang, Jie Chen, Qincheng Chen, Xudong Huang, Yulin Shi, Zhentao Li, Xuemin Wang, Liang Xu and Jiatuo Xu in DIGITAL HEALTH

List of abbreviations

Acknowledgments

The authors are especially thankful for the positive support received from the Xiangshan Traditional Chinese Medicine Hospital and the Shanghai Hospital of Integrated Traditional and Western Medicine.

ORCID iD

Yu Wang

Ethical approval

Consent to participate

All participants signed a written informed consent form that was approved by the Ethics Committee.

Consent for publication

Not applicable.

Contributorship

The work was carried out in collaboration between all authors. WY was involved in the conception of the idea, analyzed the data, and drafted the manuscript with CJ. XJT and XL were responsible for overall guidance. CQC, HXD, and SYL were responsible for data analysis. LZT contributed to model optimization during the revision process, particularly the implementation of Platt scaling for probability calibration. WXM monitored the quality of the clinical samples. All authors read and approved the final manuscript.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the National Key Research and Development Program of China (2017YFC1703301), the National Natural Science Foundation of China (81873235), Construct Program of the Key Discipline of State Administration of Traditional Chinese Medicine of China (ZYYZDXK-2023069), Shanghai Science and Technology Committee Rising-Star Program (24YF2746000), Research Project of Shanghai Municipal Health Commission (2024QN018), Shanghai University of Traditional Chinese Medicine Science and Technology Development Programme (23KFL005). The funders did not influence the research in any way, and the research was carried out independently.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability

The datasets generated and analyzed during the current study are not publicly available owing to data confidentiality, as they form an important component of the National Key Technology Research and Development Program of the 13th Five-Year Plan (No. 2017YFC1703301) in China. They are available from the corresponding author on reasonable request.


References

1. The Lancet Diabetes & Endocrinology. Sleep: a neglected public health issue. Lancet Diabetes Endocrinol 2024; 12: 365.

2. Taheri. Sleep and cardiometabolic health — not so strange bedfellows. Lancet Diabetes Endocrinol 2023; 11: 532–534.

3. Nikbakhtian, Reed, Obika, et al. Accelerometer-derived sleep onset timing and cardiovascular disease incidence: a UK biobank cohort study. Eur Heart J Digit Health 2021; 2: 658–666.

4. American Academy of Sleep Medicine. International Classification of Sleep Disorders. 3rd ed. United States: American Academy of Sleep Medicine, 2014.

5. Liu, Chen, Yan, et al. Monitoring vital signs and postures during sleep using WiFi signals. IEEE Internet Things J 2018; 5: 2071–2084.

6. Zhu, Cai, et al. Application of machine learning models in predicting insomnia severity: An integrative approach with constitution of traditional Chinese medicine. Front Med (Lausanne) 2023; 10: 1292761.

7. Simon, Terhorst, Cohrdes, et al. The predictive value of supervised machine learning models for insomnia symptoms through smartphone usage behavior. Sleep Med X 2024; 7: 100114.

8. Liu, Bai, Zhang, et al. Utilizing network pharmacology and experimental validation to explore the potential molecular mechanisms of raw Pinellia ternate in treating esophageal cancer. J Gastrointest Oncol 2023; 14: 2006–2017.

9. Xia, Huang, Tong, et al. Pearl powder reduces sleep disturbance stress response through regulating proteomics in a rat model of sleep deprivation. J Cell Mol Med 2020; 24: 4956–4966.

10. Cheng, Zhang, et al. Shumian capsule improves the sleep disorder and mental symptoms through melatonin receptors in sleep-deprived mice. Front Pharmacol 2022; 13: 925828.

11. Yan, Shen, et al. Comparative pharmacokinetics of six major compounds in normal and insomnia rats after oral administration of Ziziphi spinosae semen aqueous extract. J Pharm Anal 2020; 10: 385–395.

12. Chu, Zhu, et al. Artificial intelligence in tongue image recognition. International Journal of Software Science and Computational Intelligence 2023; 15: 1–25.

13. Ang, Lee, Kim, et al. Prediction of hypertension based on facial complexion. Diagnostics (Basel) 2021; 11: 540.

14. Zhao, et al. Qualitative and quantitative analysis for facial complexion in traditional Chinese medicine. Biomed Res Int 2014; 2014: 207589.

15. Chen. Determination methods for inspection of the complexion in traditional Chinese medicine: a review. Zhong Xi Yi Jie He Xue Bao 2009; 7: 701–705.

16. Zhuo, Yang, Zhang, et al. Human facial complexion recognition of traditional Chinese medicine based on uniform color space. Int J Pattern Recognit Artif Intell 2014; 28: 1450008.

17. Nakajima, Minami, Tanabe, et al. Facial color processing in the face-selective regions: an fMRI study. Hum Brain Mapp 2014; 35: 4958–4964.

18. Sun, Nakayama, Dagdanpurev, et al. Remote sensing of multiple vital signs using a CMOS camera-equipped infrared thermography system and its clinical application in rapidly screening patients with suspected infectious diseases. Int J Infect Dis 2017; 55: 113–117.

19. Caruana, Niculescu-Mizil. An empirical comparison of supervised learning algorithms. In: Proceedings of the 23rd international conference on machine learning (ICML); 2006 Jun 25–29; Pittsburgh, PA. New York, NY: Association for Computing Machinery, 2006, pp.161–168.

20. Castronovo, Galbiati, Marelli, et al. Validation study of the Italian version of the insomnia severity index (ISI). Neurol Sci 2016; 37: 1517–1524.

21. Bastien. Validation of the insomnia severity index as an outcome measure for insomnia research. Sleep Med 2001; 2: 297–307.

22. Morin, Vallières, Ivers. Dysfunctional beliefs and attitudes about sleep (DBAS): validation of a brief version (DBAS-16). Sleep 2007; 30: 1547–1554.

23. Xiao, Liu, Zhang, et al. The association between depressive symptoms and insomnia in college students in Qinghai province: the mediating effect of rumination. Front Psychiatry 2021; 12: 751411.

24. Bertolazi, Fagondes, Hoff, et al. Validation of the Brazilian Portuguese version of the Pittsburgh Sleep Quality Index. Sleep Med 2011; 12: 70–75.

25. Yue, Wang, et al. Comparison of hospital anxiety and depression scale (HADS) and Zung self-rating anxiety/depression scale (SAS/SDS) in evaluating anxiety and depression in patients with psoriatic arthritis. Dermatology 2020; 236: 170–178.

26. Dunstan, Scott. Norms for Zung's self-rating anxiety scale. BMC Psychiatry 2020; 20: 90.

27. Kong. Huangdi Neijing: a synopsis with commentaries. Hong Kong: The Chinese University of Hong Kong Press, 2010.

28. Liu. The development history of Chinese medical inspection diagnosis. Harbin, China: Heilongjiang University of Chinese Medicine, 2010.

29. Deng. The evolution of the theories of facial color diagnosis, tongue diagnosis, pulse diagnosis and ulnar skin diagnosis in the Neijing and their regularity study. Beijing, China: Beijing University of Chinese Medicine, 2015.

30. Jiao. Study on the key techniques of TCM color diagnosis based on “image-spectrum” joint analysis. Shanghai, China: Shanghai University of Traditional Chinese Medicine, 2019.

31. Jiang, Yao, et al. Tongue image quality assessment based on a deep convolutional neural network. BMC Med Inform Decis Mak 2021; 21: 147.

32. Shi, Cui, et al. Clinical data mining on network of symptom and index and correlation of tongue-pulse data in fatigue population. BMC Med Inform Decis Mak 2021; 21: 72.

33. Yuan, et al. A tongue features fusion approach to predicting prediabetes and diabetes with machine learning. J Biomed Inform 2021; 115: 103693.

34. Wang, Lou, et al. Relationship chains of subhealth physical examination indicators: A cross-sectional study using the PLS-SEM approach. Sci Rep 2023; 13: 13640.

35. Borui. Variable and model selection method in linear regression for analysis. Greek: International Conference on Signal Processing and Machine Learning, 2023.

36. Jiao, et al. Tongue color clustering and visual application based on 2D information. Int J Comput Assist Radiol Surg 2020; 15: 203–212.

37. Zhou. Machine learning. Beijing: Tsinghua University Press, 2016.

38. Kim. Social determinants of health in relation to firearm-related homicides in the United States: A nationwide multilevel cross-sectional study. PLoS Med 2019; 16: e1002978.

39. Schaumberg, Reilly, Anderson, et al. Improving prediction of eating-related behavioral outcomes with zero-sensitive regression models. Appetite 2018; 129: 252–261.

40. UCLA Institute for Digital Research and Education. Zero-inflated negative binomial regression: SAS data analysis examples. Los Angeles, CA: UCLA Institute for Digital Research and Education, 2019.

41. Pashaei, Ozen, Aydin. Improving medical diagnosis reliability using Boosted C5.0 decision tree empowered by particle swarm optimization. Annu Int Conf IEEE Eng Med Biol Soc 2015; 2015: 7230–7233.

42. Pan, Chen. Research on diagnosis-related group grouping of inpatient medical expenditure in colorectal cancer patients based on a decision tree model. World J Clin Cases 2020; 8: 2484–2493.

43. Checcucci, Autorino, Cacciamani, et al. Artificial intelligence and neural networks in urology: current clinical applications. Minerva Urol Nefrol 2020; 72: 49–57.

44. Laudicella, Comelli, Stefano, et al. Artificial neural networks in cardiovascular diseases and its potential for clinical application in molecular imaging. Curr Radiopharm 2021; 14: 209–219.

45. Chen, Wang, Cao, et al. A random forest model based classification scheme for neonatal amplitude-integrated EEG. Biomed Eng Online 2014; 13: 20141211.

46. Seo, Lee, et al. A machine-learning approach to predict postprandial hypoglycemia. BMC Med Inform Decis Mak 2019; 19: 210.

47. Agyapong, Miller, Wilson, et al. Development of a proteochemometric-based support vector machine model for predicting bioactive molecules of tubulin receptors. Mol Divers 2022; 26: 2231–2242.

48. Tang, Tong, Zheng, et al. Using a selective ensemble support vector machine to fuse multimodal features for human action recognition. Comput Intell Neurosci 2022; 2022: 1877464.

49. Xiang, Yuan. Application analysis of combining BP neural network and logistic regression in human resource management system. Comput Intell Neurosci 2022; 2022: 7425815.

50. Asafu-Adjei, Betensky. A pairwise naive Bayes approach to Bayesian classification. Intern J Pattern Recogn Artif Intell 2015; 29: 20150728.

51. Vickers. Decision analysis for the evaluation of diagnostic tests, prediction models and molecular markers. Am Stat 2008; 62: 314–320.

52. Vickers, Van Calster, Steyerberg. Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. Br Med J 2016; 352: 6.

53. Vickers, van Calster, Steyerberg. A simple, step-by-step guide to interpreting decision curve analysis. Diagn Progn Res 2019; 3: 18.

54. Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in Large Margin Classifiers 1999; 10: 61–74.

55. Chen, Jiang, et al. Computational tongue color simulation in tongue diagnosis. Color Res Appl 2021; 47: 121–134.

56. Kawanabe, Kamarudin, Ooi, et al. Quantification of tongue colour using machine learning in Kampo medicine. Eur J Integr Med 2016; 8: 932–941.

57. Kanawong, et al. An automatic tongue detection and segmentation framework for computer aided tongue image analysis. Int J Funct Inform Person Med 2012; 4: 56–58.

58. Matos, Machado, Monteiro, et al. Can traditional Chinese medicine diagnosis be parameterized and standardized? A narrative review. Healthcare (Basel) 2021; 9: 20210207.

59. Zhou. Internal medicine of traditional Chinese medicine. Beijing: China Press of Traditional Chinese Medicine, 2007, pp.146–149.

60. Liao, Wen, et al. Convolutional herbal prescription building method from multi-scale facial features. Multimed Tools Appl 2019; 78: 35665–35688.

61. Zhang, Huang, Wang, et al. Characteristics of Chinese medicine inspection of identifying Yin syndrome and Yang syndrome of insomnia. China Journal of Traditional Chinese Medicine and Pharmacy 2018; 33: 2262–2264.

62. Jia, et al. Evidence-based ZHENG: A traditional Chinese medicine syndrome 2013. Evid Based Complement Alternat Med 2014; 2014: 484201.

63. Shi, Zhang, et al. A multi-objective hyper-heuristic clustering algorithm for formulas in traditional Chinese medicine. IEEE Access 2023; 11: 100355–100370.

64. Chaput, Biswas, Ahmadi, et al. Sleep irregularity and the incidence of type 2 diabetes: A device-based prospective study in adults. Diabetes Care 2024; 47: 2139–2145.

65. Wang, Shi, et al. Core characteristics of sublingual veins analysis and its relationship with hypertension. Technol Health Care 2023; 32: 1641–1656.

66. Chen, et al. Establishment of noninvasive diabetes risk prediction model based on tongue features and machine learning techniques. Int J Med Inform 2021; 149: 104429.

67. Shi, Yao, et al. A new approach of fatigue classification based on data of tongue and pulse with machine learning. Front Physiol 2021; 12: 708742.

68. Limongi, Siviero, Trevisan, et al. Changes in sleep quality and sleep disturbances in the general population from before to during the COVID-19 lockdown: a systematic review and meta-analysis. Front Psychiatry 2023; 14: 1166815.

69. Salfi, Amicucci, Corigliano, et al. Poor sleep quality, insomnia, and short sleep duration before infection predict long-term symptoms after COVID-19. Brain Behav Immun 2023; 112: 140–151.

70. Riley, Snell, Ensor, et al. Minimum sample size for developing a multivariable prediction model: PART II - binary and time-to-event outcomes. Stat Med 2019; 38: 1276–1296.

71. Cao, Zhu, et al. Artificial intelligence-enabled novel atrial fibrillation diagnosis system using 3D pulse perception flexible pressure sensor array. ACS Sens 2025; 10: 272–282.

72. Zhang, Chun, et al. A risk warning model for anemia based on facial visible light reflectance spectroscopy: Cross-sectional study. JMIR Med Inform 2025; 13: 64204.

73. Caruana. Multitask learning. Mach Learn 1997; 28: 41–75.
