Abstract
With the development of deep learning technology, the automated analysis of Traditional Chinese Medicine (TCM) inspections has advanced greatly in recent decades, particularly in tongue and face diagnosis. To improve the effectiveness of diagnosis and treatment in clinical practice, TCM doctors typically differentiate between deficiency and excess patterns. Therefore, an accurate TCM-based deficiency and excess pattern differentiation system is required to support TCM doctors in their work, including online diagnosis and treatment, applications on major health platforms, and other situations. This study aimed to develop a TCM-based inspection characteristic extraction model based on convolutional neural networks to extract significant characteristics from the face, lips, tongue, and other areas. Based on TCM theory and the clinical expertise of doctors, a mapping module was created for TCM-based deficiency and excess. The extraction model and the mapping module were combined to provide a complete TCM-based deficiency and excess pattern differentiation system. The experimental results showed that the average accuracy for inspection characteristics such as tongue body color, tongue coating color, tongue coating thickness, and lip color reached 90% in tests on the collected facial dataset. In addition, the trained TCM-based deficiency and excess pattern differentiation system attained an average accuracy of 81.67%.
Introduction
Traditional Chinese Medicine (TCM) is a holistic system that explores human physiology, as well as the diagnosis and prevention of diseases. The initial stage of TCM diagnosis is inspection.1,2 TCM doctors assess a patient's condition and internal pathological changes by observing external manifestations in various parts of the body, including the face, 3 lip, 4 and tongue.5–9
According to TCM, the face is a holographic depiction of the organs and their illnesses because the Zang-fu organs of the human body are connected to it through the meridians. Changes in the appearance of the face reflect the internal condition of the body. In TCM, facial observation is the most accessible method of evaluating a person during inspection. Important facial characteristics include tongue body color, tongue coating color, tongue coating thickness, lip color, and face color. According to TCM theory, the two basic states in the development of disease are “deficiency” and “excess,”10–13 which are essential for determining the type of illness and are relevant to many healthcare fields, such as home healthcare and intelligent rehabilitation. 14
Face color is one of the most direct indicators perceived by TCM doctors during consultation, and different face colors represent different bodily states. According to the TCM theory, a bluish face primarily indicates cold patterns and qi stagnation; a red face is associated with heat patterns and often suggests heart problems; a yellow face is indicative of spleen deficiency and a damp phlegm pattern; a white face typically signifies deficiency and a cold pattern, which frequently suggests qi and blood deficiency. Finally, patients with kidney deficiency patterns are more likely to have a dark face.15–17
The lips of healthy people are usually rosy, glossy, and have normal moisture content. However, when the body is ill, the lips can show different forms and colors that correspond to deficiency or excess in human organs; this is called “lip diagnosis” in TCM.18,19 A relative deficiency of blood and qi is indicated by pale white lips. Blood stasis and qi stagnation are suggested by dusky purple lips, which frequently signal a state of excess, whereas internal heat and surplus energy in the body are indicated by crimson lips.
The tongue state can also reveal the status of deficiencies and excesses in the body. The color and shape of the tongue provide important information about a person's health. For example, patients with a deficient pattern frequently exhibit reduced tongue coating, tender and enlarged tongue, tooth marks, and white tongue coating. In contrast, patients with an excess pattern usually have a rough tongue and a thick tongue coating.20–25
Different TCM doctors observing the same facial image may reach inconsistent evaluations because of both objective factors, such as lighting and surroundings, and subjective factors, such as the individual experience of each doctor. This subjectivity makes objective diagnosis difficult. Artificial intelligence image analysis can therefore help doctors differentiate patterns by automatically extracting objective facial traits.
Artificial intelligence technology, especially deep learning, has developed rapidly over the last decade. In several domains, convolutional neural network (CNN)-based image analysis has approached or even exceeded human performance.26–28
Image analysis of the face, lips, and tongue in TCM diagnosis has been the subject of in-depth research by numerous scholars. Chen 29 was able to extract facial characteristics using an active appearance model (AAM) and Gabor filters. Shahar 30 used a deep learning model and key points to extract facial colors. Borza 31 suggested using a fully convolutional multitasking neural network for facial color analysis. Zheng 32 used a support vector machine (SVM) to categorize lip color. Vergnaud 33 created a hierarchical ascending classification approach for color cluster analysis.
Convolutional neural network models have also been used to classify tongue color.34–36 To determine the color of the tongue coating, Zhang37,38 attempted deep learning, whereas Kamarudin 39 used the HSV color threshold method. One study 40 evaluated the tongue coating thickness using data augmentation and an image-to-image conversion model, whereas Liu41,42 utilized a convolutional neural network to classify the tongue coating thickness.
In the field of TCM-based pattern differentiation, relying solely on inspection of the face, lips, or tongue is insufficient for distinguishing between deficiency and excess patterns of the body. Therefore, a more comprehensive approach that combines the characteristics of the face, lips, and tongue is required.
This study addresses this need by combining the expertise of TCM doctors in identifying deficiency and excess patterns with image analysis methods built on open-source models. We designed and developed a TCM-based pattern differentiation system that incorporates face and lip characteristics, tongue appearance, and other facial elements to assess patient deficiency and excess patterns more thoroughly. The key components of this system are (1) open-source models trained on large-scale datasets, which provide prior knowledge of the face, lip, and tongue image regions and eliminate the need for manual calibration of these areas; and (2) mapping rules contributed by several TCM doctors, who mapped inspection characteristics to deficiency and excess patterns through task calibration.
The advances discussed in this paper cover three primary areas. First, an end-to-end computation was realized for five inspection characteristics: face color, lip color, tongue body color, tongue coating thickness, and tongue coating color. Second, we designed an attention-guided model using prior knowledge (AGPK) for characteristic extraction during TCM inspection, to simultaneously learn the positional information of the face, lips, and tongue. Third, we created a mapping module that links inspection characteristics to deficiency and excess patterns of the human body, drawing on TCM theory and the clinical expertise of TCM doctors.
Related technology
Convolutional neural network
Convolutional neural network models are commonly used in deep learning in the domain of image processing. These models frequently stack several convolutional layers to extract distinctive characteristics and enhance the network structure. However, training the model may become unstable if the network becomes too deep, which can result in problems such as vanishing and exploding gradients.
To overcome these obstacles, the ResNet model 26 adds a residual module with a skip connection topology, to help avoid performance deterioration and reduce the difficulty of training deep CNNs. Figure 1 displays the structure of the residual module.

Figure 1. Residual module of ResNet.
The mathematical expression of the residual module in the ResNet model is as follows:

$$y = F(x) + x$$

In this context, x represents the input feature map of the residual module; F(x) is the residual feature map to be learned; and y = F(x) + x is the actual output feature map of the residual module.
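As a minimal sketch of the skip connection, the snippet below computes F(x) + x on plain Python lists. The function `toy_branch` is a hypothetical stand-in for the convolutional layers that make up F in a real ResNet; it is chosen only to keep the example self-contained.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, residual_fn: Callable[[Vector], Vector]) -> Vector:
    """Return F(x) + x, where residual_fn plays the role of F."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("F(x) must have the same shape as x for the skip connection")
    return [f + xi for f, xi in zip(fx, x)]


def toy_branch(x: Vector) -> Vector:
    """Hypothetical residual branch: a fixed scaling standing in for conv layers."""
    return [0.5 * xi for xi in x]


out = residual_block([1.0, -2.0, 4.0], toy_branch)

# If the branch outputs zeros, the block reduces to the identity mapping,
# which is why very deep stacks of such blocks remain easy to optimize.
identity_out = residual_block([1.0, -2.0, 4.0], lambda x: [0.0] * len(x))
```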
Multitask learning
In some situations, a model must handle a larger task composed of several smaller tasks rather than a single task. A typical strategy is to break the original task into multiple subtasks and learn each one independently. However, modeling each subtask in isolation ignores the connections, constraints, and interactions among them, and is therefore insufficient to achieve optimal results.
One novel approach is to adopt a multitask learning (MTL) strategy to learn several tasks simultaneously. This strategy has two benefits: on the one hand, it can reduce overfitting and improve the model's capacity for generalization; on the other hand, it learns the subtasks simultaneously, which can help counteract some of the noise introduced by the various subtasks and increase the robustness of the model. From the actual modeling perspective, a model can be considered to belong to the category of multitask learning if it includes multiple losses.
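The idea of several losses computed from one shared representation can be sketched as follows. This is an illustrative toy, not the model used in this paper: the heads, targets, and squared-error loss are all hypothetical choices.

```python
from typing import Callable, Dict, List

Features = List[float]


def multitask_loss(
    features: Features,
    heads: Dict[str, Callable[[Features], float]],
    targets: Dict[str, float],
) -> Dict[str, float]:
    """Compute one squared-error loss per task from shared features, plus their sum."""
    losses = {name: (head(features) - targets[name]) ** 2 for name, head in heads.items()}
    losses["total"] = sum(losses.values())
    return losses


# Hypothetical shared features and two toy task heads that both read them.
shared = [1.0, 2.0, 3.0]
heads = {
    "task_a": lambda f: sum(f),            # predicts 6.0
    "task_b": lambda f: max(f) - min(f),   # predicts 2.0
}
result = multitask_loss(shared, heads, {"task_a": 5.0, "task_b": 2.0})
```

Because every head reads the same features, a gradient step on the total loss updates the shared representation with information from all tasks at once.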
Proposed method
The TCM-based deficiency and excess pattern differentiation system described in this paper has two primary modules: the TCM-based inspection characteristic extraction model and the TCM-based deficiency and excess mapping module. The extraction model extracts inspection characteristics from facial images, including face color, lip color, tongue body color, tongue coating color, and tongue coating thickness. Based on TCM theory and clinical experience, the mapping module defines the mapping rules between the set of inspection characteristics and the TCM-based deficiency and excess states. Figure 2 shows the overall structure of the pattern differentiation system.

Figure 2. Overall structure of the TCM-based deficiency and excess pattern differentiation system in this paper.
TCM-based inspection characteristic extraction model
A TCM-based inspection characteristic extraction model was designed to extract five key inspection characteristics: face color, lip color, tongue body color, tongue coating color, and tongue coating thickness. These characteristics are associated with three specific regions: face, lips, and tongue. Therefore, the model must be capable of learning the location information of these three facial regions during both the training and inference processes. Additionally, to enhance the patient experience, the inspection characteristic extraction process should be performed in real-time to minimize latency. These requirements make it essential to design an end-to-end characteristic extraction model for TCM-based inspection.
In the field of deep learning for image processing, significant research has been conducted on facial detection and segmentation of the lip region and the tongue. Scholars have published numerous well-performing open-source models in these areas. Consequently, this study utilized open-source pretrained models for face detection and segmentation of the lip and tongue, to obtain positional information on various facial parts necessary for training.
The input to the TCM-based inspection characteristic extraction model consists of two components: (1) the rectangular coordinates of the face region, along with the segmentation masks for the lip and tongue regions, which were derived from open-source models; and (2) the labels annotated by TCM doctors on the facial image, including face color, lip color, tongue body color, tongue coating color, and tongue coating thickness. Figures 3 and 4 illustrate the training and inference processes for the TCM-based inspection characteristic extraction model.

Figure 3. The training process of the characteristic extraction model for TCM-based inspection.

Figure 4. The inference process of the characteristic extraction model for TCM-based inspection.
To enable the model to learn the locations of the different facial parts and the inspection characteristics simultaneously during training, an attention-guided module that uses prior knowledge is proposed.
Without loss of generality, take face color as an example. Let the input face image be $X \in \mathbb{R}^{H_0 \times W_0 \times 3}$, where $H_0$ and $W_0$ are the height and width of the image. The image X is fed into the backbone network (ResNet18, chosen for its computational efficiency and on the basis of past experience) to obtain the coordinates of the face area and the feature map $F \in \mathbb{R}^{H \times W \times C}$, where H, W, and C are the height, width, and number of channels of F. The coordinates are converted into a mask image of the face area, $M \in \mathbb{R}^{H_0 \times W_0}$, whose entries are 0 or 1; the face area is square. The feature map F is converted into the attention map $A \in \mathbb{R}^{H \times W}$ by a convolution with a 3 × 3 kernel followed by pooling with a 2 × 2 kernel:

$$A = \mathrm{Pool}_{2 \times 2}\big(\mathrm{Conv}_{3 \times 3}(F)\big)$$
The feature map F is fused with the attention map A to generate the feature vector $v \in \mathbb{R}^{C \times 1}$ for color classification.
The face color classifier takes the feature vector v as input and outputs the prediction vector $p \in \mathbb{R}^{K \times 1}$, where K is the number of face color categories.
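The path from region coordinates to a class prediction can be sketched in plain Python. Two points are assumptions made for illustration rather than details confirmed by the text: the fusion of F and A is taken to be attention-weighted average pooling, and the classifier is taken to be a linear layer followed by softmax. In the toy example the binary mask itself is used as the attention map, whereas in the actual model A is learned.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def box_to_mask(box: Tuple[int, int, int, int], height: int, width: int) -> Grid:
    """Binary mask: 1.0 inside the (top, left, bottom, right) box, else 0.0."""
    top, left, bottom, right = box
    return [
        [1.0 if top <= r < bottom and left <= c < right else 0.0 for c in range(width)]
        for r in range(height)
    ]


def attention_pool(feature_map: List[List[List[float]]], attention: Grid) -> List[float]:
    """Fuse F (H x W x C) with A (H x W) by attention-weighted averaging (assumed)."""
    total = sum(sum(row) for row in attention)
    if total == 0:
        raise ValueError("attention map must have positive mass")
    channels = len(feature_map[0][0])
    v = [0.0] * channels
    for f_row, a_row in zip(feature_map, attention):
        for pixel, a in zip(f_row, a_row):
            for c in range(channels):
                v[c] += a * pixel[c]
    return [x / total for x in v]


def classify(v: List[float], weights: Grid, bias: List[float]) -> List[float]:
    """Linear layer followed by softmax, giving a K-dimensional prediction p."""
    logits = [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


# Toy 2 x 2 feature map with C = 2 channels; attention covers the top row only.
F = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
A = box_to_mask((0, 0, 1, 2), height=2, width=2)
v = attention_pool(F, A)
p = classify(v, weights=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
```

Only the pixels selected by the attention map contribute to v, which is how the attention map restricts each classifier to its own facial region.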
The loss for face color classification consists of two parts. The first part uses the mask M to guide the learning of the attention map, giving the facial attention loss:
The second part involves using facial label Y, as labeled by TCM doctors, to guide the learning process of face color classification. Classification loss is expressed as follows:
The final loss of face color classification is:
Similarly, the losses for the other inspection characteristics (lip color, tongue body color, tongue coating color, and tongue coating thickness) can be obtained by the same process. Thus, the final loss is the sum of the losses corresponding to all the inspection characteristics:
It should be noted that the coefficients α and β are dynamically determined during training.
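The loss terms described above can be written out under explicit assumptions; the forms below are one plausible reading, not necessarily the exact equations used. They assume that the attention loss is a pixel-wise binary cross-entropy between A and the mask M resized to H × W (written $\tilde{M}$), that the classification loss is the cross-entropy between p and the one-hot label Y, and that α and β weight the two terms for each characteristic. The symbols $L_{\text{att}}$, $L_{\text{cls}}$, $L_t$, and the task set T are introduced here for illustration.

```latex
\begin{aligned}
L_{\text{att}} &= -\frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}
  \left[\tilde{M}_{ij}\log A_{ij} + (1-\tilde{M}_{ij})\log(1-A_{ij})\right] \\
L_{\text{cls}} &= -\sum_{k=1}^{K} Y_k \log p_k \\
L_{\text{face}} &= \alpha\, L_{\text{att}} + \beta\, L_{\text{cls}} \\
L &= \sum_{t \in T} L_t, \qquad
T = \{\text{face color},\ \text{lip color},\ \text{tongue body color},\
      \text{tongue coating color},\ \text{tongue coating thickness}\}
\end{aligned}
```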
Several points should be noted. First, during training, the learning of all five characteristics (face color, lip color, tongue body color, tongue coating color, and tongue coating thickness) shared the same feature map. However, the attention maps differed across characteristics, each focusing on a distinct facial region, which guided the model to concentrate on the relevant facial part for each task. Second, face color and lip color classification are multiclass tasks trained with cross-entropy loss, whereas the tongue characteristics (tongue body color, tongue coating color, and tongue coating thickness) are multilabel tasks trained with binary cross-entropy loss. Third, the open-source models were used only in the training stage of the inspection characteristic extraction model and were not required in the inference stage, because the model learned the locations of the various facial parts during training and its parameters retain this information.
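The difference between the two loss types can be illustrated with a short sketch. A multiclass task picks exactly one category, so it uses softmax cross-entropy over all logits. A multilabel task scores each label independently, so it uses sigmoid binary cross-entropy per label. The logits and label layouts below are hypothetical.

```python
import math
from typing import List


def softmax_cross_entropy(logits: List[float], target_index: int) -> float:
    """Cross-entropy for a single-label (multiclass) prediction."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target_index]


def sigmoid_binary_cross_entropy(logits: List[float], targets: List[float]) -> float:
    """Mean binary cross-entropy over independent labels (multilabel)."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form of -[y*log(s) + (1-y)*log(1-s)], s = sigmoid(z).
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# Multiclass example: three hypothetical face color classes, true class index 1.
ce = softmax_cross_entropy([0.0, 0.0, 0.0], target_index=1)

# Multilabel example: two hypothetical tongue labels, the first present.
bce = sigmoid_binary_cross_entropy([0.0, 0.0], [1.0, 0.0])
```

With uninformative (all-zero) logits, the multiclass loss equals log 3 and the multilabel loss equals log 2, the entropy of a uniform guess in each setting.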
TCM-based deficiency and excess mapping module
Given the image of a human face, the TCM-based inspection-characteristic extraction model obtains a set of inspection characteristics. Subsequently, the TCM-based deficiency and excess mapping module is utilized to convert these characteristics into the final deficiency and excess pattern labels. The mapping rules for TCM-based deficiencies and excesses were jointly designed and discussed by multiple TCM doctors based on TCM theory and their clinical experience. The rules are listed in Table 1.
Table 1. TCM-based deficiency and excess mapping rules.
During application, each inspection characteristic contributed a weighted score to the deficiency and excess patterns associated with it, depending on which rule it matched. The weights were assigned by TCM doctors, and the resulting comprehensive weighted score was used to determine the final deficiency and excess pattern.
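A rule-based weighted scoring step of this kind might look like the sketch below. Every rule, weight, and the `margin` threshold is hypothetical and chosen only for illustration; the actual rules and weights are those defined by the TCM doctors. The three output labels follow the categories reported in the results.

```python
from typing import Dict, Tuple

# Hypothetical rules: (characteristic, observed value) -> (pattern, weight).
RULES: Dict[Tuple[str, str], Tuple[str, float]] = {
    ("face_color", "white"): ("deficiency", 1.0),
    ("lip_color", "pale white"): ("deficiency", 1.0),
    ("lip_color", "dusky purple"): ("excess", 1.0),
    ("coating_thickness", "thick"): ("excess", 1.5),
    ("coating_thickness", "thin"): ("deficiency", 0.5),
}


def map_to_pattern(
    characteristics: Dict[str, str],
    rules: Dict[Tuple[str, str], Tuple[str, float]] = RULES,
    margin: float = 0.5,
) -> Tuple[str, Dict[str, float]]:
    """Sum the weights of matching rules and pick the dominant pattern."""
    scores = {"deficiency": 0.0, "excess": 0.0}
    for item in characteristics.items():
        if item in rules:
            pattern, weight = rules[item]
            scores[pattern] += weight
    diff = scores["deficiency"] - scores["excess"]
    both_present = scores["deficiency"] > 0 and scores["excess"] > 0
    if both_present and abs(diff) <= margin:
        label = "mixed deficiency-excess"  # assumed tie-handling rule
    elif diff > 0:
        label = "deficiency"
    elif diff < 0:
        label = "excess"
    else:
        label = "undetermined"
    return label, scores


label_a, _ = map_to_pattern(
    {"face_color": "white", "lip_color": "pale white", "coating_thickness": "thin"}
)
label_b, _ = map_to_pattern({"lip_color": "dusky purple", "coating_thickness": "thick"})
label_c, scores_c = map_to_pattern({"face_color": "white", "lip_color": "dusky purple"})
```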
Dataset and model parameter settings
Dataset
The dataset used in this study was obtained from two sources: web scraping and collection from the TCM-based deficiency and excess pattern differentiation system (a self-developed product of the Xin-Huangpu Joint Innovation Institute of Chinese Medicine). The images collected through web scraping were deduplicated, filtered for clarity, and manually selected. The total number of images was 2500. Regarding the image selection criteria, all facial images were acquired under natural lighting conditions to ensure authentic representation and were subjected to rigorous data cleansing. The images were systematically included based on their alignment with predefined diagnostic categories to enrich the content coverage of each label.
The dataset was annotated in two steps: automatic annotation using open-source models and manual calibration by TCM doctors. Automatic annotation primarily identified the face, lip, and tongue areas. For manual annotation, three TCM doctors labeled each inspection characteristic of the same face image, and the final label of each inspection characteristic was determined by majority vote. Only samples on which all three doctors reached full consensus were included in the modeling dataset, to ensure high inter-rater agreement and subsequent model stability.
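The consensus filter described above can be sketched as follows for a single inspection characteristic. The image identifiers and labels are hypothetical, and the data layout (one list of three votes per image) is an assumption.

```python
from collections import Counter
from typing import Dict, List, Tuple


def majority_label(votes: List[str]) -> Tuple[str, int]:
    """Return the most frequent label and how many annotators chose it."""
    label, count = Counter(votes).most_common(1)[0]
    return label, count


def consensus_filter(annotations: Dict[str, List[str]]) -> Dict[str, str]:
    """Keep only the samples on which every annotator chose the same label."""
    kept = {}
    for image_id, votes in annotations.items():
        label, count = majority_label(votes)
        if count == len(votes):
            kept[image_id] = label
    return kept


# Hypothetical votes from three doctors for face color on two images.
annotations = {
    "img_001": ["red", "red", "red"],
    "img_002": ["red", "white", "red"],
}
kept = consensus_filter(annotations)
```

Here `img_002` has a majority label but not full agreement, so it is excluded from the modeling set.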
As shown in Table 2, the dataset used in this study was a multitask dataset containing five inspection characteristics. During the training process, the model simultaneously acquired the relevant knowledge of all five inspection characteristics.
Table 2. Label distribution of inspection characteristics after manual annotation.
Evaluation metrics and experimental parameters
The model was implemented in PyTorch and trained for 100 epochs using the Adam optimizer with a learning rate of 0.0001 and binary cross-entropy as the loss function.
In training deep learning classification models, common evaluation metrics include precision, recall, and accuracy. Precision and recall are computed for a specific category, whereas accuracy is computed over all categories:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Here, TP and TN represent the numbers of correctly predicted (true) positive and negative samples, respectively, and FP and FN represent the numbers of incorrectly predicted (false) positive and negative samples, respectively. To comprehensively observe the model during the training process, this study utilized precision, recall, and accuracy.
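These three metrics can be computed directly from the four counts. The sketch below uses the standard definitions; the label names in the example are hypothetical.

```python
from typing import List, Tuple


def confusion_counts(y_true: List[str], y_pred: List[str], positive: str) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) when treating `positive` as the positive category."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical labels for five samples, scoring the category "red".
y_true = ["red", "red", "white", "white", "red"]
y_pred = ["red", "white", "white", "red", "red"]
tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive="red")
```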
Experiment and result analysis
TCM-based inspection characteristic extraction model
The existence of multiple losses in this study made manual adjustment of each loss very complex. Therefore, multitask loss 43 was used to automatically adjust the weight of individual losses during the training stage.
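One widely used scheme for automatic loss weighting learns a log-variance parameter s_i per task and minimizes the sum of exp(-s_i) · L_i + s_i. Whether this is the exact formulation of the cited multitask loss is an assumption; the sketch below only illustrates the general mechanism, with hypothetical loss values.

```python
import math
from typing import List


def uncertainty_weighted_loss(losses: List[float], log_vars: List[float]) -> float:
    """Total loss: each task loss is scaled by exp(-s) and regularized by +s."""
    return sum(math.exp(-s) * loss + s for loss, s in zip(losses, log_vars))


def log_var_gradients(losses: List[float], log_vars: List[float]) -> List[float]:
    """Gradient of the total loss with respect to each log-variance s."""
    return [1.0 - math.exp(-s) * loss for loss, s in zip(losses, log_vars)]


# Hypothetical per-task losses (for example, an attention loss and a classification loss).
task_losses = [2.0, 0.5]

# With s = 0 for every task, all weights equal 1 and the total is the plain sum.
plain_total = uncertainty_weighted_loss(task_losses, [0.0, 0.0])

# For fixed losses, each s settles at log(loss), where its gradient vanishes.
optimal_log_vars = [math.log(loss) for loss in task_losses]
grads_at_optimum = log_var_gradients(task_losses, optimal_log_vars)
```

A task with a larger loss ends up with a larger s and therefore a smaller weight exp(-s), which prevents one loss from dominating training without manual tuning.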
In Table 3, the accuracy of the TCM-based inspection characteristic extraction model for facial color characteristic classification is 80.23%, which is lower than that of the other characteristics. The accuracy of the lip color characteristic classification is 96.01%, which is the best performance. The accuracies for tongue color, tongue coating color, and tongue coating thickness are 89.39%, 88.94%, and 93.72%, respectively. Overall, the TCM-based inspection characteristic extraction model yielded a relatively accurate set of inspection characteristics.
Table 3. Performance of the TCM-based inspection characteristic extraction model on the test set.
TCM-based deficiency and excess pattern differentiation system
The trained TCM-based inspection characteristic extraction model was combined with the designed TCM-based deficiency and excess mapping module, to obtain the final TCM-based pattern differentiation system for deficiency and excess.
To test the effectiveness of the TCM-based pattern differentiation system for deficiency and excess, the Xin-Huangpu Joint Innovation Institute of Chinese Medicine developed its own TCM-based pattern differentiation system, which was installed on a tablet (PAD) as an app. Patients' facial images were captured during clinic visits, and the system automatically extracted the relevant inspection characteristics and converted them into deficiency and excess pattern labels. The results are shown in Figure 5 and Table 4.

Figure 5. TCM-based pattern differentiation system of deficiency and excess.
Table 4. Effectiveness of the TCM-based pattern differentiation system of deficiency and excess.
As shown in Table 4, the TCM-based pattern differentiation system for deficiency and excess designed in this study achieved accuracies of 79% and 72% for the deficiency and mixed deficiency-excess labels, respectively, and 93% for the excess label, with an overall average accuracy of 81.67%.
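As a quick arithmetic check on the average, the snippet below computes the unweighted mean of the three per-label accuracies as quoted in the text. With these rounded values the mean is about 81.33%, so the reported 81.67% presumably reflects unrounded per-label accuracies or a different averaging scheme; this is an inference from the rounded figures alone.

```python
# Per-label accuracies as quoted in the text (rounded percentages).
per_label_accuracy = {
    "deficiency": 79.0,
    "mixed deficiency-excess": 72.0,
    "excess": 93.0,
}

# Unweighted (macro) average over the three labels.
macro_average = sum(per_label_accuracy.values()) / len(per_label_accuracy)
```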
Ablation experiment
The performance of the TCM-based inspection characteristic extraction model is influenced by three main factors: the training method, the weighting of the losses, and the use of a prior knowledge guidance mechanism such as AGPK. The results of the comparison are presented in Table 5.
Table 5. Ablation experiment results on the test set.
Note: IT: individual training, PK: prior knowledge, FC: face color, LC: lip color, TBC: tongue body color, TCT: tongue coating thickness, TCC: tongue coating color.
Among these factors, prior knowledge played the most important role, as shown in Table 5. This makes sense because the lip and tongue areas are relatively small, and a significant amount of noise exists in the input facial images. The model cannot precisely identify the locations of the lips and tongue without the aid of prior knowledge and is thus unable to produce accurate prediction results.
Conclusion and future analysis
The pattern differentiation between deficiency and excess is an important concept in TCM theory, referring to the state of yin, yang, qi, and blood, as well as the organs within the human body. To comprehensively analyze the inspection characteristics of different facial parts, such as the face, lips, and tongue, and determine the TCM-based deficiency and excess patterns, this study first constructed a TCM-based inspection characteristic extraction model to extract face color, lip color, tongue body color, tongue coating color, and tongue coating thickness. Subsequently, the patient's TCM-based deficiency-excess pattern was obtained using the deficiency and excess mapping module established by TCM doctors. The TCM-based deficiency and excess pattern differentiation system presented in this study is of practical value in guiding disease diagnosis and the prescription of medication.
Footnotes
Acknowledgements
This research was funded by “The Fundamental Research Funds for the Central Public Welfare Research Institutes” (YZX202406). We are grateful to Lingdong Kong for his assistance in data collection.
Ethics approval
Not applicable. This paper does not involve human or animal studies.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by “The Fundamental Research Funds for the Central Public Welfare Research Institutes” (YZX202406) and by the Special Project of the State Key Laboratory of Dampness Syndrome in Chinese Medicine Jointly Built by Provincial and Ministerial Levels (No. SZ2021ZZ01).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
