
Abstract

BACKGROUND:

Coronary heart disease (CHD) is the leading cause of death globally. Hypertension is considered to be the most important independent risk factor for CHD. Early and accurate diagnosis of CHD in patients with hypertension can play a significant role in reducing the risk and harm of hypertension combined with CHD.

OBJECTIVE:

To propose a non-invasive method for early diagnosis of coronary heart disease according to tongue image features with the help of machine learning techniques.

METHODS:

We collected standardized tongue images and extracted features using the Tongue Diagnosis Analysis System (TDAS) and ResNet-50. On the basis of these tongue features, common machine learning methods were used to build a customized non-invasive CHD diagnosis algorithm based on tongue images.

RESULTS:

Based on feature fusion, our algorithm has good performance. The results showed that the XGBoost model with fused features had the best performance with accuracy of 0.869, the AUC of 0.957, the AUPR of 0.961, the precision of 0.926, the recall of 0.806, and the F1-score of 0.862.

CONCLUSION:

We provide a feasible, convenient, and non-invasive method for the diagnosis and large-scale screening of CHD. Tongue image information is a possible effective marker for the diagnosis of CHD.

Keywords

Coronary heart disease, machine learning, hypertension, early diagnosis, feature fusion

1. Introduction

According to a report released in 2020 by the International Collaborative Research on the Global Burden of Disease, coronary heart disease (CHD) is the number one cause of death worldwide, seriously endangering human life and health [1]. Hypertension is recognized as the most important independent risk factor for CHD and currently affects over 1.25 billion people worldwide [2, 3]. Studies have shown that the prevalence of hypertension in patients with stable CHD is as high as 60%, and the risk of cardiovascular death is significantly increased when the two diseases coexist [4, 5]. Early screening and diagnosis of CHD in hypertension patients are essential to reduce the risk and harm caused by the comorbidity of these two diseases. However, it is still an unsolved issue to find a suitable method for extensive screening and diagnosis of CHD in patients with hypertension [6].

Currently, the conventional methods used in the clinical diagnosis of CHD include biochemical blood indicators, electrocardiogram, cardiac stress test, computed tomography angiography (CTA), and coronary angiography (CAG). Among them, CAG is considered to be the “gold standard” for the clinical diagnosis of CHD [7]. However, CAG is an invasive examination that requires a surgical procedure, which brings inconvenience and risk to patients, and it is also expensive. There is a dilemma in the early diagnosis of CHD at this stage: although in-depth examination is essential for accurate diagnosis, it is troublesome and consumes medical resources unnecessarily. Therefore, there is an urgent need for a sensitive, non-invasive, convenient, and low-cost diagnostic technology to fill the diagnostic gaps in large-scale CHD screening [8].

Previous research suggests that the tongue may have diagnostic value for CHD [9]. Tongue diagnosis is an important part of traditional Chinese medicine (TCM). As a terminal human organ, the tongue closely relates to the circulatory system, but its possible predictive value for CHD has been consistently ignored. The tongue’s color, shape, and coating contain much physiological and pathological information [10]. Studies have shown that diagnosing diseases through tongue features is effective [11, 12]. Although the tongue has such crucial diagnostic value, the relationship between changes in tongue images and CHD, and the value of tongue images in the diagnosis and screening of CHD, have not yet been studied, which is exactly what we set out to explore.

Given that tongue imaging is a non-invasive and low-cost diagnostic tool that is well-suited for large-scale screening, we conducted a prospective clinical study to further evaluate the value and stability of tongue imaging in the diagnosis of CHD. In recent years, the continuous development of tongue image research and analysis techniques has provided a foundation for our study [13], allowing us to observe the features of the tongue and objectively describe it. At the same time, the rapid development of artificial intelligence (AI) has provided tremendous assistance for screening, diagnosing, and treating various diseases. As the core AI method, machine learning has been widely used in the medical field [14, 15, 16]. Traditional machine learning technology has good performance in mining structured feature data of tongue images, whereas deep learning is a subfield of machine learning that can mine richer semantic information. Deep learning technology can learn valuable features of the original images for classification tasks through training [17, 18]. Although the use of AI to assist in the diagnosis of cardiovascular diseases has been increasingly studied by scholars [19, 20], a tongue-based diagnostic model for CHD has never been established.

In this study, we used the TFDA-1 tongue diagnosis instrument to collect standardized tongue images and studied the non-linear relationship between the features of the tongue and CHD. The Tongue Diagnosis Analysis System (TDAS) was used to extract features of the tongue image, including color, texture, and coating area. We applied ResNet-50 as the deep learning backbone to extract deep features, and used XGBoost to fuse the two feature sets, which led to a customized algorithm for diagnosing CHD in patients with hypertension. This study revealed the reference value of the tongue image for CHD diagnosis and explored a new method for non-invasive diagnosis of CHD.

2. Materials and methods

2.1 Data sources

From March 2019 to September 2019, a total of 360 participants with a clinical diagnosis of hypertension were recruited, including 125 patients without CHD and 235 patients with CHD (Table 1). Written informed consent was obtained from all participants prior to inclusion. Using a random seed of 0, 80% of the data were selected as the training set and the remaining 20% as the test set.
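The 80/20 split with a fixed seed can be sketched as follows, using only Python's standard library. The paper does not state which tool performed the split, whether it was stratified, or whether it was done per participant or per image; the function name `split_indices` and the shuffle-based approach are assumptions for illustration, so the resulting partition will not match the authors' actual split.

```python
import random

def split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices with a fixed seed and split them into train/test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed makes the split repeatable
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

# 360 participants, as reported in the text
train_idx, test_idx = split_indices(360)
print(len(train_idx), len(test_idx))  # 288 72
```

With 360 samples this yields 288 training and 72 test samples, and repeating the call with the same seed returns the same partition.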

Table 1
Basic information of participants

Item	Hypertension ($n=125$)	Hypertension with CHD ($n=235$)	$P$
Male (%)	51.20	50.21	0.858
Age (year)	69.273 (63.427–73.118)	69.707 (66.441–72.862)	0.016
Height (cm)	162.591 (158.865–166.317)	165.000 (162.695–167.305)	0.870
Weight (kg)	60.773 (55.400–66.146)	65.488 (62.281–68.695)	0.187
BMI (kg/m ${}^{2}$ )	22.978 $\pm$ 3.134	23.912 $\pm$ 3.232	0.041
Course of hypertension (year)	9.48 (6.88–12.07)	12.5 (9.27–15.74)	0.079
SBP (mmHg)	158.318 (143.613–173.023)	145.244 (139.146–151.341)	0.007
DBP (mmHg)	90.045 (84.460–95.631)	83.268 (79.642–86.895)	0.081

According to the experimental procedure, we trained a group of professionals to take tongue images of participants in a standard posture. The tongue image acquisition equipment was TFDA-1 (Fig. 1), and the tongue images were collected 2 hours after meals. The participants were asked to sit in front of the instrument, relax, place their chin on the support platform, and extend the tongue so that it flattened naturally. We took two tongue images of each participant. Image quality was controlled by removing photos with fogging, low resolution, staining of the tongue coating, overexposure, or abnormal tongue shapes.

Figure 1.

The tongue diagnosis instrument. 1: lens hood, 2: ring light source, 3: camera lens, 4: mandible support plate.

The Institutional Review Board (IRB) of Shuguang Hospital affiliated to the Shanghai University of Traditional Chinese Medicine reviewed and approved the study protocol (IRB number: 2018-626-55-01). The study was registered under clinical trial registration number ChiCTR1900026008. All participants signed informed consent forms and the study was conducted in accordance with the Declaration of Helsinki. All source code and data analyzed during the current study are available from the corresponding author upon reasonable request.

2.2 Label assignment

The annotation mainly covers the diagnosis of hypertension and CHD. The diagnostic criteria for hypertension follow the China Guidelines for the Prevention and Treatment of Hypertension [21]. The CHD diagnosis is based on patients’ CAG results, which are the “gold standard.” In this study, patients with coronary stenosis of more than 50% were diagnosed with CHD. Hypertensive patients without CHD were labeled 0, and hypertensive patients with CHD were labeled 1.
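The labeling rule above amounts to a single threshold on the CAG result. A minimal sketch, assuming the stenosis is available as a percentage per patient; the function name and the exact meaning of `stenosis_percent` (for example, the most severe lesion) are assumptions, since the paper only states the 50% criterion.

```python
def assign_label(stenosis_percent):
    """Return 1 (hypertension with CHD) if stenosis exceeds 50%, else 0."""
    # Assumption: stenosis_percent is one summary value per patient from CAG.
    return 1 if stenosis_percent > 50 else 0

labels = [assign_label(s) for s in (30, 50, 70)]
print(labels)  # [0, 0, 1]
```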

2.3 Extraction of tongue features by TDAS

Table 2
Statistical analysis of tongue features

Item	Hypertension	Hypertension with CHD
Per-all	0.396 (0.342–0.464)	0.343 (0.284–0.428) ${}^{**}$
TB-CON	130.262 (103.230–147.971)	142.924 (108.718–168.835)
TB-ASM	0.057 (0.051–0.623)	0.054 (0.047–0.059) ${}^{*}$
TB-ENT	1.332 $\pm$ 0.065	1.361 $\pm$ 0.080 ${}^{**}$
TB-MEAN	0.035 (0.031–0.037)	0.037 (0.032–0.040) ${}^{*}$
TC-CON	158.379 (116.942–200.706)	182.116 (132.055–217.313) ${}^{*}$
TC-ASM	0.059 (0.049–0.061)	0.051 (0.041–0.057) ${}^{***}$
TC-ENT	1.320 (1.283–1.389)	1.381 (1.320–1.457) ${}^{**}$
TC-MEAN	0.038 $\pm$ 0.007	0.041 $\pm$ 0.008 ${}^{**}$
TB-R	163.000 (160.000–168.000)	161.000 (156.250–165.000) ${}^{**}$
TB-G	105.000 (99.000–113.000)	102.000 (95.250–107.750) ${}^{**}$
TB-B	105.906 $\pm$ 10.625	101.355 $\pm$ 9.565 ${}^{***}$
TC-R	164.000 (155.000–168.000)	159.000 (152.000–165.000) ${}^{**}$
TC-G	120.000 (113.500–129.000)	117.000 (106.000–124.750) ${}^{*}$
TC-B	120.000 (112.000–126.500)	113.000 (106.000–124.000) ${}^{**}$
TB-H	359.305 (356.281–361.628)	360.801 (357.957–362.865) ${}^{*}$
TB-I	124.812 $\pm$ 8.650	121.283 $\pm$ 7.801 ${}^{**}$
TB-S	0.169 (0.151–0.186)	0.179 (0.157–0.202) ${}^{*}$
TC-H	360.421 $\pm$ 4.492	362.274 $\pm$ 4.727 ${}^{**}$
TC-I	132.000 (127.000–141.000)	130.000 (120.250–137.000) ${}^{*}$
TC-S	0.118 (0.097–0.131)	0.129 (0.107–0.146) ${}^{*}$
TB-L	107.036 $\pm$ 3.234	105.173 $\pm$ 2.820 ${}^{***}$
TB-a	19.579 $\pm$ 3.290	19.998 $\pm$ 3.025
TB-b	6.662 (5.068–7.801)	7.775 (5.931–8.978) ${}^{**}$
TC-L	110.170 (108.964–113.404)	107.96 (105.446–111.280) ${}^{***}$
TC-a	13.372 (11.388–14.976)	13.81 (12.080–15.410)
TC-b	4.930 (3.195–6.358)	5.799 (4.348–7.043) ${}^{**}$

Note: TB is tongue body, TC is tongue coating, R represents red, G represents green, B represents blue, H represents hue, I represents intensity, S represents saturation, L represents luminance, a represents the range from red to green, and b represents the range from yellow to blue. Compared with the hypertension group, $P<0.05$ is labeled ${}^{*}$, $P<0.01$ is labeled ${}^{**}$, and $P<0.001$ is labeled ${}^{***}$.

Figure 2.

The process of extracting the color, texture and coating area features of the tongue image by TDAS.

TDAS, a self-developed tongue feature analysis system, converts standardized tongue images into features of medical diagnostic significance. Our study used TDAS to extract the tongue’s color, texture, and coating area features. Figure 2 shows the extraction process. The division-merging algorithm and the chrominance threshold method were combined to separate the tongue body and tongue coating; the detailed algorithm is described in the relevant literature [22].

We calculated the color of the tongue body and tongue coating separately and used the RGB, HSI, and Lab color spaces to describe the color features. The RGB value of each pixel was calculated and then averaged over all pixels. Considering the visualization of color and the feasibility and practicability of classification, we also transformed the RGB color space into Lab and HSI [23]. The obtained parameters included red (R), green (G), blue (B), hue (H), saturation (S), intensity (I), lightness (L), red-green (a), and yellow-blue (b).

TDAS applied a gray-scale differential algorithm to describe the texture of the tongue body and coating. The texture features include contrast (CON), angular second moment (ASM), entropy (ENT), and mean value (MEAN) [24]. The index Per-all reflects the size of the coating area and is calculated as tongue coating area/tongue body area. In total, the features were TB-R, TB-G, TB-B, TC-R, TC-G, TC-B, TB-H, TB-S, TB-I, TC-H, TC-S, TC-I, TB-L, TB-a, TB-b, TC-L, TC-a, TC-b, TB-CON, TB-ASM, TB-ENT, TB-MEAN, TC-CON, TC-ASM, TC-ENT, TC-MEAN, and Per-all. To investigate whether any of these features differ between the tongue images of the two groups in a diagnostically meaningful way, we conducted a statistical analysis of the feature indexes extracted by TDAS (Table 2).
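The color and coating-area indexes described above can be illustrated with a short sketch. TDAS itself is not public, so this is not its implementation: the function names are hypothetical, the HSI conversion is the common textbook formula, and the hue convention TDAS uses may differ. The Lab conversion is omitted for brevity.

```python
import math

def mean_rgb(pixels):
    """Average the R, G, B channels over a collection of (R, G, B) pixel tuples."""
    pixels = list(pixels)
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

def rgb_to_hsi(r, g, b):
    """Convert one RGB triple (0-255) to HSI: H in degrees, S in [0, 1], I in 0-255."""
    intensity = (r + g + b) / 3
    saturation = 0.0 if intensity == 0 else 1 - min(r, g, b) / intensity
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        hue = 0.0  # grey pixel: hue is undefined, 0 is used by convention
    else:
        # Clamp to [-1, 1] to guard against floating-point drift before acos
        theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
        hue = theta if b <= g else 360 - theta
    return hue, saturation, intensity

def coating_ratio(coating_area, body_area):
    """Per-all as defined in the text: tongue coating area / tongue body area."""
    return coating_area / body_area

# Two made-up pixels standing in for a segmented tongue-body region
region = [(160, 100, 110), (166, 110, 100)]
r, g, b = mean_rgb(region)       # (163.0, 105.0, 105.0)
h, s, i = rgb_to_hsi(r, g, b)
```

In this sketch the region is averaged in RGB first and the mean color is then converted, which mirrors the order described in the text; whether TDAS converts per pixel or per region is not stated.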

2.4 Data pretreatment

2.4.1 Pretreating of tongue feature indexes

We pretreated the tongue feature indexes extracted by TDAS. First, since the tongue features had a few missing values (less than 5%), we filled the missing values with the mean value. Then, to make the model more robust, Tukey’s test was used to detect and eliminate outliers. In Tukey’s method, the minimum is the lower fence $({Q1-1.5\times\textit{IQR}})$ and the maximum is the upper fence $({Q3+1.5\times\textit{IQR}})$. Values below the minimum or above the maximum are defined as outliers (Eq. (1)) and were replaced by the mean or median, depending on the data distribution. In addition, the numerical ranges of different tongue image features varied substantially, so we applied Min-Max normalization to the data. Min-Max normalization is a linear transformation of the original data that maps values to the range [0, 1] (Eq. (2)).

$\displaystyle X>Q3+1.5\times\textit{IQR}\quad\text{or}\quad X<Q1-1.5\times\textit{IQR}$ (1)

Where $X$ denotes the original value of the features. $Q1$ denotes the lower quartile. $Q3$ denotes the upper quartile. IQR is the Interquartile range and denotes the difference between $Q3$ and $Q1$ .

$\displaystyle x^{\prime}=\frac{x-x_{\min}}{x_{\max}-x_{\min}}$ (2)

Where $x$ denotes the original data, $x^{\prime}$ denotes the Min-Max normalized data, $x_{\min}$ denotes the minimum value in the original data, and $x_{\max}$ denotes the maximum value in the original data.
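Eqs. (1) and (2) can be sketched in a few lines of standard-library Python. The quartile convention (`method="inclusive"`) and the use of the median as the replacement value are assumptions; the paper only says that outliers were replaced by the mean or median depending on the distribution.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return (Q1 - k*IQR, Q3 + k*IQR), the bounds used in Eq. (1)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def replace_outliers(values, k=1.5):
    """Replace values outside the Tukey fences with the median (assumed choice)."""
    low, high = tukey_fences(values, k)
    fill = statistics.median(values)
    return [fill if (x < low or x > high) else x for x in values]

def min_max_normalize(values):
    """Eq. (2): map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: avoid division by zero
    return [(x - lo) / (hi - lo) for x in values]

raw = [1, 2, 3, 4, 5, 6, 7, 8, 100]   # toy feature column with one outlier
cleaned = replace_outliers(raw)       # 100 is replaced by the median, 5
scaled = min_max_normalize(cleaned)
```

Removing outliers before normalization matters here: if 100 were left in place, Min-Max scaling would compress the remaining values into a narrow band near zero.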

2.4.2 Pretreating of tongue image

Since deep learning models are prone to overfitting on small datasets, training a well-performing model requires a large amount of tongue image data. This study used data augmentation techniques to obtain sufficient training data, so the original 360 labeled tongue diagnosis images were increased to 21,600 for network training. To complete the data augmentation, we applied the following transformations: first, the image was horizontally flipped once. Then, it was translated by 50 pixels in both the horizontal and vertical directions. Next, the image was scaled by 50%. Finally, it was rotated by 45 degrees, 135 degrees, 225 degrees, 315 degrees, and 0 degrees once each. The original tongue image and the augmented tongue images are shown in Fig. 3.

Figure 3.

Data augmentation of the original tongue image. (a) The original tongue image after image segmentation. (b) Tongue image rotations. (c) Tongue image translations. (d) Tongue image flips. (e) Tongue image scalings.

2.5 Statistical analysis

The statistical tools used in this study were SPSS 23 and the scikit-learn package in Python. When the data fit a normal distribution, we used $\bar{X}\pm SD$ for statistical description and ANOVA to compare the differences between groups. When the data were not normally distributed, we used quartiles to describe the data and the Kruskal-Wallis H test to compare the differences between the two groups. The dimension reduction of deep features was achieved by principal component analysis (PCA), which was implemented in Python.

2.6 Deep feature extraction

We applied deep transfer learning to the classification task, which not only avoided the decline in model performance caused by insufficient samples but also improved the stability and classification accuracy of the network model. When extracting deep features from the segmented tongue images, we compared the performance of several commonly used convolutional neural network (CNN) models, including ResNet-50, AlexNet, VGG-16, and GoogLeNet-v1. Finally, we used ResNet-50 as the backbone network for tongue image deep feature extraction in our study, because it showed better performance than the other common CNN models (Fig. 4 and Table 3).

Figure 4.

The structure of the ResNet-50.

First, ResNet-50 is trained with transfer learning to improve the model’s accuracy: weights pre-trained on ImageNet are loaded to initialize the network. Then, with the segmented and augmented tongue images as input, the backbone network extracts convolutional feature maps, which are pooled into global features through global average pooling. Finally, the multi-layer perceptron outputs a 2-dimensional vector, and the softmax function is used to output the probability of a diagnosis of CHD. The objective function of this method is the cross-entropy loss function, and the network uses the backpropagation algorithm with mini-batches of data to update the model parameters.
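The classification head described above, a 2-dimensional output passed through softmax and trained with cross-entropy, can be written out directly. This is a plain-Python sketch of the two formulas only, not the authors' training code; the function names are illustrative, and class index 1 is assumed to mean CHD, following the labels in Section 2.2.

```python
import math

def softmax(logits):
    """Turn raw network outputs into probabilities that sum to 1."""
    m = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])

def batch_loss(batch_logits, labels):
    """Mean cross-entropy over a mini-batch, the quantity minimized by backpropagation."""
    losses = [cross_entropy(softmax(z), y) for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)

# Two made-up output vectors; index 1 is the assumed CHD class
probs = softmax([0.0, 2.0])
loss = batch_loss([[0.0, 2.0], [1.5, -0.5]], labels=[1, 0])
```

Here `probs[1]` plays the role of the predicted probability of CHD, and the loss shrinks as that probability approaches 1 for true CHD cases.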

Table 3

The performance comparison of different CNN models on the test set

Algorithm	Accuracy	AUC	AUPR	Precision	Recall	F1-score
ResNet-50	0.833	0.918	0.918	0.843	0.915	0.878
AlexNet	0.805	0.877	0.914	0.830	0.867	0.848
VGG-16	0.819	0.840	0.862	0.840	0.894	0.866
GoogLeNet-v1	0.792	0.862	0.906	0.848	0.830	0.839

2.7 Implementation of the diagnostic model for CHD

2.7.1 Algorithm selection

We applied Decision Tree (DT), Random Forest (RF), K-Nearest Neighbor (KNN), Logistic Regression (LR), Support Vector Machine (SVM), Artificial Neural Network (ANN), and XGBoost as algorithms to achieve feature fusion and evaluated their performance. These seven algorithms are machine learning models widely used in medical diagnosis classification tasks. DT uses a tree-like structure to make predictions. It recursively partitions the data based on selected features and assigns a label to each tree branch. RF makes predictions by building and combining multiple decision trees. KNN predicts the category of a test sample based on the categories of the k training samples nearest to it. LR is used for binary classification problems; it optimizes the likelihood function to minimize the error and uses a logistic function to produce probabilities. SVM maps the data into a high-dimensional space and finds the best separating hyperplane, called the decision boundary, based on the support vectors. ANN is a computational model inspired by the structure and function of biological neural networks. It consists of interconnected nodes and learning algorithms that adjust the connection strengths to solve complex problems. XGBoost is an ensemble learning method that combines multiple weak learners and iteratively enhances them to create a powerful machine learning model. It belongs to the category of gradient boosting algorithms, which aim to minimize a predefined loss function by fitting multiple weak learners to the residuals of previous learners. We employed a random search strategy to optimize the model parameters during model implementation. Among the seven algorithms, XGBoost performed best (Table 4), so it was chosen as the final model.

Table 4
The performance comparison of different feature fusion algorithms on the test set

Algorithm	Accuracy	AUC	AUPR	Precision	Recall	F1-score
DT	0.745	0.746	0.841	0.795	0.897	0.843
KNN	0.711	0.695	0.841	0.754	0.891	0.817
LR	0.750	0.749	0.843	0.781	0.909	0.840
SVM	0.792	0.869	0.913	0.788	0.911	0.845
ANN	0.833	0.889	0.922	0.837	0.911	0.872
RF	0.804	0.811	0.939	0.914	0.821	0.865
XGBoost	0.869	0.957	0.961	0.926	0.806	0.862
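The random search strategy mentioned in Section 2.7.1 can be sketched generically: sample parameter combinations at random and keep the best-scoring one. The search space, the parameter names, and the toy scoring function below are hypothetical; the paper does not report which parameters were searched or over what ranges. In practice `score_fn` would train a model and return a cross-validated score.

```python
import random

def random_search(score_fn, space, n_iter=20, seed=0):
    """Sample n_iter random configurations from `space` and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(options) for name, options in space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical search space; the names mirror common gradient-boosting parameters
space = {
    "max_depth": [2, 3, 4, 5, 6],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
    "n_estimators": [50, 100, 200, 400],
}

def toy_score(params):
    # Placeholder objective so the sketch runs without a dataset;
    # a real score_fn would return, e.g., mean cross-validated accuracy.
    return -abs(params["max_depth"] - 4) - abs(params["learning_rate"] - 0.1)

best_params, best_score = random_search(toy_score, space, n_iter=50, seed=0)
```

Compared with an exhaustive grid, random search evaluates a fixed budget of configurations, which keeps the cost predictable when several parameters are tuned at once.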

2.7.2 Feature importance analysis

We applied XGBoost to calculate each feature’s importance. The importance of a feature was calculated by counting the number of times the feature was used as a basis for partitioning across all base classifiers. This study has 27 features related to color, texture, and tongue coating area, and 2048 deep features extracted by ResNet-50. The deep features have too many dimensions to be fused directly. Thus, PCA was used to reduce the dimensions of the deep features before performing feature fusion.

To be specific, we first calculated the feature importance with the TDAS features as input. In this case, the nine most important features were: TC-L $>$ TB-ENT $>$ TB-L $>$ TB-G $>$ TC-ASM $>$ TB-ASM $>$ TB-H $>$ Per-all $>$ TC-I (Fig. 5). We then used PCA to reduce the deep features to 30 dimensions and fused the tongue color, texture, and coating area features with the reduced deep features. With the fused features as input, the nine most important features were: DF-PC22 $>$ DF-PC22 $>$ TC-ASM $>$ DF-PC27 $>$ TB-B $>$ TC-b $>$ DF-PC30 $>$ DF-PC12 $>$ TC-CON (Fig. 6).

Figure 5.

Feature importance calculated by the XGBoost model with tongue features extracted by TDAS as input.

Figure 6.

Feature importance calculated by the XGBoost model with fused features as input.
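The importance measure used in this section, the number of times a feature serves as a split across all base classifiers, corresponds to what the XGBoost library calls the "weight" importance type. The counting itself can be sketched on a toy ensemble; the nested-dictionary tree format and the two small trees below are invented for illustration and are not the trained model.

```python
from collections import Counter

def count_splits(node, counts):
    """Recursively count how often each feature is used as a split in one tree."""
    if "leaf" in node:
        return
    counts[node["feature"]] += 1
    count_splits(node["left"], counts)
    count_splits(node["right"], counts)

def split_count_importance(trees):
    """Rank features by total number of splits across all trees."""
    counts = Counter()
    for tree in trees:
        count_splits(tree, counts)
    return counts.most_common()

# Toy ensemble of two trees (hypothetical structure and leaf values)
trees = [
    {"feature": "TC-L",
     "left": {"feature": "TB-ENT", "left": {"leaf": -0.2}, "right": {"leaf": 0.1}},
     "right": {"leaf": 0.3}},
    {"feature": "TC-L", "left": {"leaf": -0.1}, "right": {"leaf": 0.2}},
]
ranking = split_count_importance(trees)
print(ranking)  # [('TC-L', 2), ('TB-ENT', 1)]
```

Split counts favor features that are used often, even for small gains, so rankings based on them can differ from gain-based rankings.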

2.7.3 Model development based on feature fusion

In this study, TDAS was used to extract tongue features, including color, texture, and coating area, ResNet-50 was used to extract deep features, and the XGBoost was used to fuse the features. The 27 medically significant features extracted by TDAS were fused with 30 deep features after dimensionality reduction, which were used as model input to achieve XGBoost model training. The fusion of deep features and TDAS features is the key to establishing a superior CHD diagnostic model (Fig. 7).

Figure 7.

Flowchart of tongue image feature fusion strategy.
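The fusion step can be sketched with NumPy: reduce the 2048 deep features to 30 principal components and concatenate them with the 27 TDAS features to obtain a 57-dimensional input. The arrays below are random stand-ins, not study data, and `pca_reduce` is a hypothetical helper built on the singular value decomposition; the authors report only that PCA was implemented in Python.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project features onto their first n_components principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
n_samples = 288                                # assumed training-set size (80% of 360)
tdas = rng.random((n_samples, 27))             # stand-in for normalized TDAS features
deep = rng.normal(size=(n_samples, 2048))      # stand-in for ResNet-50 deep features

deep_30 = pca_reduce(deep, 30)                 # shape (288, 30)
fused = np.hstack([tdas, deep_30])             # shape (288, 57), input to the classifier
```

The paper does not state whether PCA was fitted on the training set only. In a pipeline like this one, fitting the projection on training data and reusing it for the test set avoids leaking test information into the features.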

2.8 Performance evaluation standard

$\displaystyle\textit{Accuracy}=\frac{TP+TN}{TP+FN+FP+TN}$ (3)

$\displaystyle\textit{Precision}=\frac{TP}{TP+FP}$ (4)

$\displaystyle\textit{Recall}=\frac{TP}{TP+FN}$ (5)

$\displaystyle F1=2\times\frac{\textit{Precision}\times\textit{Recall}}{\textit{Precision}+\textit{Recall}}$ (6)

We used accuracy, AUC, AUPR, precision, recall, and F1-score to evaluate the model performance. AUC can be used to comprehensively evaluate a model’s sensitivity and specificity. AUPR can be used to comprehensively describe a model’s precision and recall. In addition, we performed 10-fold cross-validation on the training set to evaluate the model’s effectiveness.

The cases can be divided into four categories according to their actual and predicted classes: True Positive (TP), False Positive (FP), False Negative (FN), and True Negative (TN).
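Eqs. (3) to (6) translate directly into code. The confusion-matrix counts in the example are illustrative and are not reported in the paper; they were picked only so that the outputs land near the fused-feature row of the test-set results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)           # Eq. (3)
    precision = tp / (tp + fp)                           # Eq. (4)
    recall = tp / (tp + fn)                              # Eq. (5)
    f1 = 2 * precision * recall / (precision + recall)   # Eq. (6)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts, not taken from the study
m = classification_metrics(tp=25, fp=2, fn=6, tn=28)
print({k: round(v, 3) for k, v in m.items()})
# {'accuracy': 0.869, 'precision': 0.926, 'recall': 0.806, 'f1': 0.862}
```

The example shows how a model can combine high precision with lower recall: only 2 of 27 positive predictions are wrong, while 6 of 31 true cases are missed.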

3. Results

3.1 CHD tongue features

As shown in Table 2, there were significant differences in tongue features between the CHD group and non-CHD group for hypertensive patients. The results obtained by comparing the index Per-all indicated that patients with CHD had a thinner tongue coating. With respect to the texture index, the ASM value of CHD patients was lower, and the other index values were higher, indicating that the tongue texture was rougher in the group with CHD. By comparing the color index, it was found that the CHD group showed lower brightness for both the tongue body and the tongue coating.

3.2 Training set results

The features extracted by TDAS, the deep features, and the fused features were each used as input to construct the diagnostic model, and cross-validation was applied to evaluate the model on the training set. The accuracy, recall, and AUC of the model with TDAS features as input were 0.694, 0.640, and 0.767, respectively; they were slightly better when using deep features as the input (accuracy $=$ 0.704, recall $=$ 0.654, and AUC $=$ 0.785). For the model with fused features as input, all the performance evaluation indicators improved significantly, with the accuracy, precision, and recall reaching 0.885, 0.902, and 0.868, respectively. This shows that the overall performance of the prediction model is good (Fig. 8 and Table 5).

Table 5
Summary of cross validation result on the training set

Input features	Accuracy	AUC	AUPR	Precision	Recall	F1-score
Color & texture feature	0.694	0.767	0.782	0.727	0.640	0.673
Deep features	0.704	0.785	0.772	0.726	0.654	0.684
Fusion features	0.885	0.944	0.934	0.902	0.868	0.880

3.3 Testing set results

We also evaluated the final models’ performance on the test set. The model with color, texture, and tongue coating area features as input had the same precision as the model with deep features as input, but on the other evaluation indicators (accuracy, AUC, AUPR, recall, and F1-score) the latter performed slightly better. It is worth noting that, similar to the results of cross-validation, the model with fused features as input performed clearly better, with an AUC of 0.957, an AUPR of 0.961, and a precision of 0.926. When the final model with fused features as input was used for the diagnosis of CHD, the true positive rate was high (recall $=$ 0.806), the false positive rate was low (Fig. 9), and the model showed good predictive ability (accuracy $=$ 0.869) (Table 6).

Table 6
The result of XGBoost algorithm on the test set

Input features	Accuracy	AUC	AUPR	Precision	Recall	F1-score
Color & texture feature	0.742	0.746	0.739	0.759	0.710	0.733
Deep features	0.759	0.806	0.788	0.759	0.846	0.800
Fusion features	0.869	0.957	0.961	0.926	0.806	0.862

Figure 8.

10-fold cross validation results of different types of features on the training set.

Figure 9.

The ROC curve of XGBoost algorithm on test set.

4. Discussion

Tongue diagnosis is an essential diagnostic approach in TCM, and the tongue contains a lot of important physiological and pathological information. TCM practitioners often observe and diagnose patients through the tongue’s color, texture, and coating features. Traditional tongue diagnosis is based on a doctor’s judgment and is heavily influenced by personal experience, which easily leads to bias. To avoid this error, our research established strict standards for tongue image acquisition and uniformly used TDAS for feature analysis. The tongue features selected by TDAS are based entirely on the theory of tongue diagnosis in TCM. These indexes have good interpretability and can accurately reflect tongue characteristics, with high clinical diagnostic value. In our study, we used TDAS to extract 27 tongue features that could be interpreted as indexes. Comparing the two groups, there were significant differences in 24 indexes, with 16 of them being highly significant (Table 2). The differences could be interpreted as a darker red color of the tongue body, a thinner and darker tongue coating, and a rougher tongue texture in patients with CHD. These results indicate that there are significant differences in tongue features between hypertension patients with and without CHD. This suggests that tongue features are sensitive to the presence of CHD and may serve as a new biological marker for the diagnosis of this condition.

Although the method of extracting tongue features using TDAS has many advantages, these features cannot cover all the tongue features that are meaningful for the diagnosis of CHD. In our research, we attempted to use TDAS features to implement a diagnostic model for CHD. However, the performance of this model was less than ideal, with an accuracy of 0.694 on the training set and 0.742 on the testing set. These results indicate that a diagnostic model built only on TDAS features requires further optimization. Therefore, our study conducted further extraction of tongue features. With the explosive growth of different forms of information in various fields, it is difficult for traditional processes and technologies to further mine the practical information in the data. Data-driven deep learning has developed rapidly in recent years. It is a branch of machine learning that can effectively solve the problem of insufficient data mining [25]. ResNet-50, applied in our study, is a deep residual CNN widely used in feature extraction [26]. It can learn complex features in unstructured tongue image data and discover hidden features that TDAS cannot obtain. In our research, we used ResNet-50 to extract deep features of the tongue and used them to diagnose CHD. We found that, compared with TDAS features, deep features had better diagnostic ability, with an accuracy of 0.759 on the testing set. However, the performance of this model was still not satisfactory.

To achieve better diagnostic performance, we used XGBoost to fuse TDAS features with deep features. XGBoost is a gradient boosting tree model that can achieve effective classification through an ensemble learning approach [27]. Due to its high computational efficiency and accuracy, it is widely used in many data mining tasks [28]. When ranking feature importance (Figs 5 and 6), the first nine indexes include both deep features and TDAS features, which means that for tongue image information mining, deep features and tongue diagnostic features can complement each other. The feature fusion method can draw on both medical experience and the data mining advantage of deep learning. The encouraging result is that the diagnosis model based on fused features showed stable performance and higher accuracy on the testing set (0.869), indicating that feature fusion is a valuable method for improving the diagnostic model’s performance. It should also be noted that, considering the instability of AI models, we used seven commonly used machine learning models to implement the diagnostic model. While the diagnostic performance of the other models was not as high as that of XGBoost, their accuracy rates were all above 0.7. This result indicates that tongue images can be used as a stable diagnostic tool for CHD regardless of which machine learning model is used, further validating the reliability of our algorithm customization process.

Using the tongue to diagnose CHD can be considered a novel approach, and the appearance changes of the tongue caused by coronary artery stenosis may be related to abnormal blood flow [29]. As an important terminal organ rich in blood vessels, the tongue often changes in appearance when there are problems with blood circulation [30]. Therefore, the tongue can reflect disease in a timely manner. Diagnosing CHD using tongue imaging only requires taking a photo of the participant’s extended tongue, a process that takes only a few minutes, is non-invasive, and does not incur any additional laboratory costs. This makes it a highly suitable screening method for large hypertensive populations. This non-invasive CHD diagnosis method based on the tongue image is very inexpensive, which can lower the threshold for patients to receive a diagnosis and allow more potential CHD patients to be detected and treated earlier. Moreover, this method is quick and efficient, reducing medical costs while improving medical efficiency.

In summary, we have found that for hypertension patients, tongue images have significant diagnostic value for CHD screening, and a CHD diagnostic model has been implemented using tongue images. This model has the advantages of being non-invasive, inexpensive, and convenient, making it highly suitable for large-scale CHD screening, helping patients receive appropriate treatment earlier and reducing the harm caused by CHD. This study provides a new perspective for diagnosing CHD and may help other researchers engaged in non-invasive diagnosis of CHD consider using tongue images as one of the diagnostic criteria to optimize their diagnostic models.

However, our research still has some limitations. This study included only a Chinese population, and the applicability of the findings to other ethnic groups needs further verification. In addition, this study mainly explored the feasibility of using tongue images for non-invasive diagnosis of CHD. Other risk factors for CHD were not included when building the diagnostic model, which may reduce the clinical significance of the method. Furthermore, although we adopted a standardized process for tongue image acquisition, the stability of tongue features is inevitably affected by the subject’s tongue-stretching posture and habits. Finally, although we employed data augmentation techniques, the dataset is still not large enough to build a robust diagnostic model. Therefore, there is room for further optimization in this study.

5. Conclusion

Based on the study of tongue features, we summarized the tongue manifestations of CHD patients and customized a diagnostic algorithm, which performed well in hypertensive patients with an accuracy of 0.869 and an AUC of 0.957. The algorithm relies entirely on tongue images to predict the risk of CHD in patients with hypertension and has the advantages of being non-invasive, simple, and accurate. It is well suited to large-scale screening and risk assessment of CHD. This result also provides evidence for the clinical value of the tongue in diagnosing CHD, suggesting that tongue image information is an effective marker. Furthermore, we demonstrated the great potential of deep learning frameworks for auxiliary disease diagnosis. We expect deep learning, as a powerful toolset, to promote tongue diagnosis in the field of non-invasive disease diagnosis.
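For readers less familiar with the two headline metrics, the following minimal sketch shows how accuracy and AUC can be computed from true labels and predicted scores. The labels, scores, and the 0.5 decision threshold are hypothetical toy values chosen for illustration and are not data from this study; the function names `accuracy` and `roc_auc` are likewise illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative sample
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 1 = CHD, 0 = no CHD.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(round(acc, 3))  # 0.667
print(round(auc, 3))  # 0.889
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two values can differ considerably for the same model.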

Author contributions

Mengyao Duan: Data collection, Methodology, Software, Validation, Formal analysis, Investigation, Writing original draft. Yiming Zhang: Conceptualization, Methodology, Validation, Plotting figures, Supervision. Yixing Liu: Revising draft, Methodology, Data curation. Boyan Mao: Revising draft, Validation, Investigation, Funding. Gaoyang Li: Revising draft, Methodology, Software. Dongran Han: Data collection, Conceptualization, Methodology, Investigation, Validation, Resources, Supervision, Funding. Xiaoqing Zhang: Editing, Conceptualization, Methodology, Investigation, Supervision, Funding.

Funding

The study was supported by the National Natural Science Foundation of China (grant number 12102064) and the Ministry of Science and Technology of the People’s Republic of China (grant numbers 2017FYC1703300 and 2022YFC3502300).

Footnotes

Acknowledgments

We would like to thank all participants and their family members for participating in this study.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Vos, Lim, Abbafati, Abbas, Abbasi, Abbasifard, et al. Global burden of 369 diseases and injuries in 204 countries and territories, 1990–2019: A systematic analysis for the Global Burden of Disease Study 2019. Lancet. 2020; 396(10258): 1204-1222. doi: 10.1016/S0140-6736(20)30925-9.

2. Carey, Moran, Whelton. Treatment of hypertension: A review. JAMA. 2022; 328(18): 1849-1861. doi: 10.1001/jama.2022.19590. PMID: 36346411.

3. NCD Risk Factor Collaboration (NCD-RisC). Worldwide trends in hypertension prevalence and progress in treatment and control from 1990 to 2019: A pooled analysis of 1201 population-representative studies with 104 million participants. Lancet. 2021; 398(10304): 957-980. doi: 10.1016/S0140-6736(21)01330-1.

4. Liu, Cao, Jin, Hua, Guo, et al. Lipoprotein (a), hypertension, and cardiovascular outcomes: A prospective study of patients with stable coronary artery disease. Hypertens Res. 2021; 44(9): 1158-1167. doi: 10.1038/s41440-021-00668-4.

5. Redon, Tellez-Plaza, Orozco-Beltran, Gil-Guillen, Fernandez, Navarro-Pérez, et al. Impact of hypertension on mortality and cardiovascular disease burden in patients with cardiovascular risk factors from a general practice setting: The ESCARVAL-risk study. J Hypertens. 2016; 34(6): 1075-1083. doi: 10.1097/HJH.0000000000000930.

6. Xiong, Mao, Zhao, Zhang, Tan, Liu, et al. Plasma Exosomal S1PR5 and CARNS1 as Potential Non-invasive Screening Biomarkers of Coronary Heart Disease. Front Cardiovasc Med. 2022; 9: 845673. doi: 10.3389/fcvm.2022.845673.

7. Albus, Barkhausen, Fleck, Haasenritter, Lindner, Silber, et al. The diagnosis of chronic coronary heart disease. Dtsch Arztebl Int. 2017; 114(42): 712-719. doi: 10.3238/arztebl.2017.0712.

8. Yao, Wang, Yan, Xie, et al. Research Progress of Machine Learning and Deep Learning in Intelligent Diagnosis of the Coronary Atherosclerotic Heart Disease. Comput Math Method. 2022; 3016532. doi: 10.1155/2022/3016532.

9. Wang, Guo, Hao, Chen, et al. Therapeutic effect in patients with coronary heart disease based on information analysis from Traditional Chinese Medicine four diagnostic methods. J Tradit Chin Med. 2014; 34(1): 34-41. doi: 10.1016/S0254-6272(14)60051-0.

10. Sun, Wei, Zhu, Pang, Jia, Liu, et al. Biology of the tongue coating and its value in disease diagnosis. Complement Med Res. 2018; 25(3): 191-197. doi: 10.1159/000479024.

11. Huang, Chang, Lee, Huang, Chiang, et al. Exploring the pivotal variables of tongue diagnosis between patients with acute ischemic stroke and health participants. J Tradit Complement Med. 2022; 12(5): 505-510. doi: 10.1016/j.jtcme.2022.04.001.

12. Sheen, Chiang, et al. Tongue diagnosis indices for upper gastrointestinal disorders: Protocol for a cross-sectional, case-controlled observational study. Medicine (Baltimore). 2018; 97(2): e9607. doi: 10.1097/MD.0000000000009607.

13. Tania, Lwin, Hossain. Advances in automated tongue diagnosis techniques. Integr Med Res. 2019; 8(1): 42-56. doi: 10.1016/j.imr.2018.03.001.

14. Artificial intelligence for medical image processing. Technol Health Care. 2021; 29(2): 361. doi: 10.3233/THC-202658.

15. Rabbani, Kim GYE, Suarez, Chen. Applications of machine learning in routine laboratory medicine: Current state and future directions. Clin Biochem. 2022; 103: 1-7. doi: 10.1016/j.clinbiochem.2022.02.011.

16. Wang, Zhang, Tupin, Qiao, Liu, et al. Prediction of 3D Cardiovascular hemodynamics before and after coronary artery bypass surgery via deep learning. Commun Biol. 2021; 4(1): 99. doi: 10.1038/s42003-020-01638-1.

17. Jang, Cho. Applications of deep learning for the analysis of medical data. Arch Pharm Res. 2019; 42: 492-504. doi: 10.1007/s12272-019-01162-9.

18. Liu, Zhao. Application of convolution neural network in medical image processing. Technol Health Care. 2021; 29(2): 407-417. doi: 10.3233/THC-202657.

19. Meng, Zhao, Dong, Pienta, Tang, et al. Automatic extraction of coronary arteries using deep learning in invasive coronary angiograms. Technol Health Care. 2023; 1-15. doi: 10.3233/THC-230278.

20. Gautam, Saluja, Malkawi, Rabbat, Al-Mallah, Pontone, et al. Current and future applications of artificial intelligence in coronary artery disease. Healthcare (Basel). 2022; 10(2): 232. doi: 10.3390/healthcare10020232.

21. Writing Group of 2018 Chinese Guidelines for the Management of Hypertension, Chinese Hypertension League, Chinese Society of Cardiology. 2018 Chinese guidelines for the management of hypertension. Chinese Journal of Cardiovascular Medicine. 2019; 24: 24-56.

22. Zhang, Zhou. The Region Partition of Quality and Coating for Tongue image based on Color Image Segmentation Method. 2008 IEEE International Symposium on IT in Medicine and Education. 2008; pp. 817-821. doi: 10.1109/ITME.2008.4743981.

23. Zhou, Fang, Zhang, Wang, Sun. Computerized analysis and recognition of tongue and its coating color in tongue diagnosis. Journal of Shanghai University of Traditional Chinese Medicine. 2004; 3: 43-47. doi: 10.16306/j.1008-861x.2004.03.015.

24. Sun, Zhang, Zhou, Bao. Analysis and discrimination of tongue texture characteristics by difference statistics. Journal of Shanghai University of Traditional Chinese Medicine. 2003; 3: 55-58. doi: 10.16306/j.1008-861x.2003.03.017.

25. Zhi, Qing. Intelligent medical image feature extraction method based on improved deep learning. Technol Health Care. 2021; 29(2): 363-379. doi: 10.3233/THC-202638.

26. He, Zhang, Ren, Sun. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016; pp. 770-778. doi: 10.1109/cvpr.2016.90.

27. Chen, Guestrin. XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016; pp. 785-794. doi: 10.1145/2939672.2939785.

28. Silva GFS, Fagundes, Teixeira, Chiavegatto Filho ADP. Machine learning for hypertension prediction: A systematic review. Curr Hypertens Rep. 2022; 24(11): 523-533. doi: 10.1007/s11906-022-01212-6.

29. Kainuma, Mitoma, Tsuji, Onozuka, Nakaguchi, Furue. The association between objective tongue color and the static blood findings of yusho patients. Asian J Complement Altern Med. 2021; 9(3): 89-97. doi: 10.53043/2347-3894.acam90016.

30. Skalidis, Zacharis, Hamilos, Skalidis, Anastasiou, Parthenakis. Transient lingual ischemia complicating coronary angiography. J Invasive Cardiol. 2019; 31(3): E51.

Machine learning aided non-invasive diagnosis of coronary heart disease based on tongue features fusion
