Abstract
Objective
To develop and evaluate a novel feature selection technique, using photoplethysmography (PPG) sensors, for enhancing the performance of deep learning models in classifying vascular access quality in hemodialysis patients.
Methods
This cross-sectional study involved creating a novel feature selection method based on SelectKBest principles, specifically designed to optimize deep learning models for PPG sensor data in hemodialysis patients. The method’s effectiveness was assessed by comparing the performance of multiple deep learning models using the feature selection approach versus the complete feature set. The model with the highest accuracy was then trained and tested using a 70:30 train/test split on both the full dataset and the SelectKBest dataset. Performance results were compared using Student’s paired t-test.
Results
Data from 398 hemodialysis patients were included. The 1-dimensional convolutional neural network (CNN1D) displayed the highest accuracy among different models. Implementation of the SelectKBest-based feature selection technique resulted in a statistically significant improvement in the CNN1D model’s performance, achieving an accuracy of 92.05% (with feature selection) versus 90.79% (with full feature set).
Conclusion
These findings suggest that the newly developed feature selection approach might aid in accurately predicting vascular access quality in hemodialysis patients. This advancement may contribute to the development of reliable diagnostic tools for identifying vascular complications, such as stenosis, potentially improving patient outcomes and their quality of life.
Introduction
Chronic kidney disease (CKD), a widespread health problem affecting millions of people worldwide, 1 is a progressive condition that can lead to end-stage renal disease (ESRD), which requires treatment to replace kidney function, such as hemodialysis. 2 Hemodialysis involves using a machine to filter the blood and remove waste products, excess fluid, and toxins. 3 Vascular access is a crucial component of hemodialysis for patients requiring regular treatments. Before the dialysis process can begin, a surgical procedure is often performed to create a blood channel in the arm of the patient. 4 This access point, either an arteriovenous fistula or an arteriovenous graft, enables the machine to effectively filter the blood.5,6 The surgical intervention involves connecting an artery and a vein, allowing for increased blood flow and easier access during dialysis sessions. Properly functioning vascular access ensures efficient removal of waste products, excess fluid, and toxins from the patient’s bloodstream.
One major complication in hemodialysis patients is the risk of stenosis and thrombosis at the vascular access site, which may lead to reduced blood flow and potentially life-threatening complications. ‘Vascular access quality’ specifically refers to the structural and functional integrity of the access point, including the absence of stenosis or thrombosis, adequate blood flow for effective dialysis, and minimal complications. Regular monitoring of these aspects is essential to preemptively identify and prevent complications associated with vascular access in hemodialysis patients.7–9 Photoplethysmography (PPG) sensors are non-invasive devices that measure variations in blood volume within the vascular system by detecting changes in light absorption or reflection. 10 This enables the estimation of parameters, such as pulse rate, blood flow, and oxygen saturation, without the need for invasive procedures.11,12 The reliability and real-time capabilities of PPG sensors have led to their widespread application in fields including healthcare, physiological monitoring, and biomedical research. 13 Deep learning algorithms are recognized for their capability to automatically learn and extract complex patterns from large data volumes. These algorithms can be trained using annotated datasets, which include PPG signals from individuals with established vascular conditions. Such training equips them to classify and distinguish between typical and atypical states of vascular access. Integrating PPG sensors with deep learning offers significant potential for improving the accuracy of vascular access evaluation, leading to more timely diagnoses and furthering advances in vascular healthcare.14–16
However, the high dimensionality of PPG sensor data presents a significant challenge for the effective application of deep learning models. 17 Addressing this issue, feature selection techniques are employed to mitigate the adverse effects of dimensionality, thereby enhancing model performance by selectively retaining features crucial for accurately classifying vascular access quality. 18 These methods, encompassing filter, wrapper, and embedded approaches, systematically evaluate the relevance and redundancy of features. By employing statistical measures, iterative optimization, and integration within the learning algorithm, they aim to streamline the dataset without sacrificing its discriminatory power.19–22 Among these techniques, SelectKBest stands out for its efficacy in isolating the most informative features by assessing their statistical relevance, using metrics such as χ2, mutual information, or analysis of variance F-value. 23 This process not only simplifies the dataset, but also facilitates efficient training and inference, reducing the computational burden while ensuring the retention of essential data properties. Furthermore, by enhancing the interpretability of deep learning models, SelectKBest aids in improving the visualization and comprehension of complex datasets.24–27
The primary aim of the present study was to develop and evaluate a novel feature selection technique tailored for preprocessing PPG signals prior to their use in deep learning models – an approach designed to enhance the predictive accuracy of these models in assessing vascular access quality in hemodialysis patients.
Patients and methods
Overview of framework
The feature selection algorithm was developed to reduce the dimensionality of the dataset used for predicting vascular access quality in hemodialysis patients (Figure 1). The dataset, derived from vascular access in hemodialysis patients, was collected from a prototype box designed to interface with three PPG sensors, providing detailed measurements and insights related to vascular access. The dataset was prepared by normalizing the data using the MinMaxScaler technique and labeled by specialized staff. To perform feature selection, the dataset was divided into two groups: the first comprised the full feature dataset with all available data, and the second comprised the feature selection dataset, derived from dimensionality reduction using the SelectKBest method. The dataset was then split into training and testing sets to train and test the algorithm. The effectiveness of multiple models in predicting vascular access quality was subsequently assessed based on the results obtained. The reporting of this study conforms to STROBE guidelines. 28

Figure 1. Overview of the framework used to develop a novel feature selection technique and evaluate its effectiveness in enhancing deep learning models for predicting vascular access quality in hemodialysis patients.
Study population and datasets
This observational study included data gathered from all hemodialysis patients who attended the hemodialysis unit at the Kidney Therapy Center, Songklanagarind Hospital, Songkhla, Thailand, between February 2021 and February 2022, comprising data from both arteriovenous fistula and arteriovenous graft types of vascular access. The representative sample size was calculated using Taro Yamane’s formula.
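As an illustration of the sample size calculation mentioned above, Taro Yamane’s formula gives n = N / (1 + N e²), where N is the size of the known population and e is the accepted margin of error. The following is a minimal sketch only: the population size and margin of error used by the study are not reported in this section, so the values in the example (N = 1000, e = 0.05) and the function name `yamane_sample_size` are hypothetical.

```python
import math


def yamane_sample_size(population: int, margin_of_error: float = 0.05) -> int:
    """Taro Yamane's formula, n = N / (1 + N * e**2), rounded up to a whole patient."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must lie between 0 and 1")
    return math.ceil(population / (1 + population * margin_of_error ** 2))


# Hypothetical inputs (not taken from the study): 1000 patients, 5% margin of error.
required = yamane_sample_size(1000, 0.05)
print(f"Required sample size: {required}")
```

Rounding up is a conservative choice, as it never yields fewer participants than the formula requires.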
All patient data utilized in this research, sourced from a real-world clinical setting, were rigorously de-identified to ensure individual privacy and confidentiality. The anonymization of data was strictly adhered to in both the primary training dataset used for developing the model and in the test dataset employed for validation purposes.
All methods were performed in accordance with relevant guidelines and regulations. The study protocol was approved by the Ethics Committee of the Faculty of Medicine, Prince of Songkla University, Thailand (Certificate of Approval No. REC.63-411-14-1). Written informed consent was obtained from the study participants prior to study commencement.
Data preprocessing
Following patient selection, data were collected by placing three PPG sensors on the forearm of each patient for a 5-min session. Each sensor generated 9 000 data points per 5-min recording, as detailed in Table 1, resulting in a total of 27 000 data points for each patient record. Subsequent data preprocessing involved applying the MinMaxScaler technique for normalization, a process that scaled the feature values to a range between 0 and 1, ensuring uniformity in feature scale across the dataset and preventing numerical instability during model training.29,30 Additionally, to tackle the issue of data imbalance, which poses a risk of bias in the model’s predictions, the Synthetic Minority Oversampling Technique (SMOTE) was employed. 31 SMOTE augmented the minority class within the dataset, thereby balancing class representation and enhancing the accuracy and robustness of the model.
Table 1. Sample data points from photoplethysmography sensor readings.
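The two preprocessing steps described in this section can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the study’s implementation: the helper names `min_max_scale` and `smote_like`, the toy readings, and the neighbour count `k` are all hypothetical, and in practice library implementations (for example, MinMaxScaler in scikit-learn and SMOTE in imbalanced-learn) would normally be used.

```python
import random


def min_max_scale(rows):
    """Scale every feature (column) to the range [0, 1]; constant columns map to 0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [
        [(value - low) / span if span else 0.0
         for value, low, span in zip(row, lows, spans)]
        for row in rows
    ]


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating between a minority-class row
    and one of its k nearest minority-class neighbours (the core idea of SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the line segment from base to other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Toy readings (hypothetical values): four records with two features each.
raw = [[510.0, 300.0], [530.0, 340.0], [520.0, 320.0], [550.0, 380.0]]
scaled = min_max_scale(raw)

# Treat the first three records as the minority class and add two synthetic cases.
synthetic = smote_like(scaled[:3], n_new=2)
print(scaled)
print(synthetic)
```

Because each synthetic row lies on a line segment between two existing minority-class rows, the oversampled data stay within the range already observed for that class.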
Label distribution
Measurements were obtained using an HD03 hemodialysis monitor (Transonic; Ithaca, NY, USA), a device widely recognized as the gold standard for assessing vascular access blood flow during hemodialysis procedures.32–34 The HD03 monitor quantitatively displays blood flow rates in ml/min, facilitating the observation of both maximum and minimum blood flow rates across a diverse patient sample.
While vascular access blood flow measurements from the Transonic HD03 monitor provide valuable data, it is important to note that these data alone do not sufficiently define the entire spectrum of vascular access quality. Thus, clinicians skilled in interpreting vascular access blood flow readings were enlisted, in order to assess potential narrowing or complications in vascular access, based on Transonic HD03 data. Subsequently, the dataset for the present study was meticulously labeled with classes corresponding to varying levels of stenosis in the vascular access of hemodialysis patients, ranging from 1 (high chance of vascular access stenosis and thrombosis) to 5 (highest vascular access quality). Combining gold standard vascular access blood flow measurements with expert clinical interpretation ensured a robust and accurate classification of the vascular access states in the present dataset (detailed in Table 2).
Table 2. Classification of vascular access (VA) quality and vascular access blood flow rate.
Feature selection
Two dataset groups were employed: the full feature dataset; and the feature selection dataset, derived using the SelectKBest feature selection technique, 23 which was chosen for its efficacy in isolating the most informative dataset features. This technique operates in tandem with the χ2-test to assess the relationship between each feature and the target variable, and ranks features based on χ2 scores to identify the most influential features, namely those that exhibit a strong association with the target variable.
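The χ2-test calculates the discrepancy between observed and expected frequencies of features, assuming independence between variables, with the following formula:
\[ \chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i} \]
where O_i is the observed frequency and E_i is the frequency expected under independence for category i.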
In applying SelectKBest to the PPG sensor dataset, the dataset was reduced to its most informative dimensions by selecting the top ‘K’ features based on their χ2-test scores, with a predefined cut-off value to identify the most informative features for the model. Specifically, features with χ2-test scores indicating P < 0.05 were considered statistically significant and were therefore retained for model development. The threshold of P < 0.05 was established to ensure that only features with a significant association with the target variable were included, thereby enhancing the model’s predictive capability.
The performance of these selected features was then validated on a separate set, with the optimal number of dimensions being those that delivered the highest performance metrics. The rationale behind this choice extends beyond mere feature reduction; it encompasses enhancing the model’s efficiency by lessening computational demands and, importantly, mitigating the risk of overfitting, a common concern in deep learning that arises when a model excessively adapts to the training data, including noise and outliers. By concentrating on the most informative and relevant features, SelectKBest helps in reducing this risk, ensuring that the model remains robust and generalizable.
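The χ2-based ranking and top-K selection described in this section can be sketched as follows. This is a simplified illustration rather than the study’s code: it scores non-negative features by comparing the per-class sum of each feature (observed) with the sum expected if the feature were independent of the class label, which is similar in spirit to χ2 scorers used with SelectKBest in common libraries. The function names, the toy data, and the value of K are hypothetical, and the P < 0.05 cut-off is not reproduced here because it requires the χ2 distribution function.

```python
def chi2_scores(X, y):
    """Chi-square score per feature: sum over classes of (observed - expected)^2 / expected."""
    n_samples = len(X)
    n_features = len(X[0])
    feature_totals = [sum(row[j] for row in X) for j in range(n_features)]
    scores = [0.0] * n_features
    for label in sorted(set(y)):
        class_rows = [row for row, target in zip(X, y) if target == label]
        class_probability = len(class_rows) / n_samples
        for j in range(n_features):
            observed = sum(row[j] for row in class_rows)
            expected = class_probability * feature_totals[j]
            if expected > 0:
                scores[j] += (observed - expected) ** 2 / expected
    return scores


def select_k_best(X, y, k):
    """Return the indices of the k highest-scoring features and the reduced dataset."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    kept = sorted(ranked[:k])  # keep the original column order
    return kept, [[row[j] for j in kept] for row in X]


# Toy data (hypothetical): three scaled features, class labels 1 and 5.
X = [[0.9, 0.5, 0.1], [0.8, 0.5, 0.2], [0.1, 0.5, 0.9], [0.2, 0.5, 0.8]]
y = [1, 1, 5, 5]

scores = chi2_scores(X, y)
kept, X_reduced = select_k_best(X, y, k=2)
print(scores)
print(kept, X_reduced)
```

In this toy example, the middle feature is identical in both classes and therefore receives a score of zero, so it is the one discarded. Note that this form of χ2 scoring assumes non-negative inputs, which is consistent with features scaled to the range 0 to 1.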
Model selection
The effectiveness of multiple models, including the 1-dimensional convolutional neural network (CNN1D), 35 Random Forest, 36 Support Vector Machine (SVM) with Radial Basis Function (RBF) kernel, 37 Gradient Boosting, 38 and Multilayer Perceptron (MLP), 39 was explored. The Random Forest model is an ensemble learning method that leverages the principle of ‘wisdom of the crowd’ to make predictions by aggregating the results of individual decision trees. Random Forests are known for their robustness, ability to handle high-dimensional data, and resistance to overfitting. SVM is a robust classification algorithm that separates data points using a hyperplane in a high-dimensional space. The SVM with RBF kernel is particularly effective in handling non-linear relationships in the data by mapping input data into a higher-dimensional space, enabling better separation of classes. Gradient Boosting is an ensemble learning method that sequentially combines multiple weak prediction models, typically decision trees, building the models iteratively, with each subsequent model focusing on correcting the mistakes made by the previous models. Gradient Boosting is known for handling complex relationships and producing accurate predictions. MLP is an artificial neural network comprising multiple layers of interconnected nodes, or ‘neurons’, that can learn complex patterns and relationships in data through forward and backward propagation. MLPs have been widely used in various fields for classification and regression tasks.
A CNN1D was included due to the CNN’s demonstrated ability to detect patterns in both spatial and temporal data. 27 While CNNs are typically associated with image processing, their capacity to extract significant features from various data types, including time-series data, such as PPG signals, makes them suitable for the present study. The convolutional layers of CNN1D can discern important patterns within PPG data, crucial for accurate classification of vascular access quality. The present evaluation involved training and testing each model on both dataset variations, allowing a thorough assessment of the performance of each model under different data conditions, and ensuring a comprehensive understanding of their respective strengths and weaknesses in the context of PPG signal analysis for vascular access assessment in hemodialysis patients.
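To illustrate how a convolutional layer can pick out local patterns in a PPG-like signal, the following minimal sketch applies a single hand-set filter, a rectified linear activation, and max pooling to a short window. It is not the architecture used in the study: the signal values, the kernel, and the pooling size are hypothetical, and a trained CNN1D would learn many such filters from the data.

```python
def conv1d(signal, kernel):
    """Valid-mode sliding dot product, the operation computed by a 1-D convolutional layer."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]


def relu(values):
    """Rectified linear activation: negative responses are set to zero."""
    return [max(0.0, v) for v in values]


def max_pool(values, size):
    """Non-overlapping max pooling, keeping the strongest response in each block."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Hypothetical window of a normalized PPG pulse and a filter that responds to rising edges.
ppg_window = [0.0, 0.1, 0.6, 1.0, 0.6, 0.1, 0.0, 0.1]
rising_edge_kernel = [-1.0, 0.0, 1.0]

feature_map = conv1d(ppg_window, rising_edge_kernel)
pooled = max_pool(relu(feature_map), 2)
print(feature_map)
print(pooled)
```

The filter responds most strongly on the upstroke of the pulse, and pooling keeps that response while shortening the sequence passed to later layers.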
Model evaluation
The performance of the proposed deep learning model was evaluated using various metrics, including accuracy, sensitivity, specificity, F1-score, and area under the curve score. 40 These metrics provide a comprehensive understanding of the model’s ability to predict the stenosis state from blood volume. To validate model performance, a train/test split of 70:30 was employed, with 70% of the data used for training the model, and 30% retained for testing. The best combination of hyperparameters was determined based on the highest accuracy score on the test set. The model’s performance was then evaluated on the test set using the aforementioned performance metrics.
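The metrics listed above can all be derived from a confusion matrix. The sketch below shows one way to do this for a multi-class problem by treating each class in turn as the positive class (one-vs-rest); whether the study averaged per-class values in this way is not stated, so this is an assumption, and the labels and predictions are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes and columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, prediction in zip(y_true, y_pred):
        matrix[index[truth]][index[prediction]] += 1
    return matrix


def accuracy(matrix):
    """Proportion of all cases on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(map(sum, matrix))


def one_vs_rest_metrics(matrix, i):
    """Sensitivity, specificity, precision and F1-score for class index i."""
    total = sum(map(sum, matrix))
    tp = matrix[i][i]
    fn = sum(matrix[i]) - tp
    fp = sum(row[i] for row in matrix) - tp
    tn = total - tp - fn - fp
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denominator = precision + sensitivity
    f1 = 2 * precision * sensitivity / denominator if denominator else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1}


# Hypothetical test-set labels and predictions for three quality classes.
labels = [1, 2, 3]
y_true = [1, 1, 2, 2, 2, 3, 3, 3]
y_pred = [1, 1, 2, 3, 2, 3, 3, 2]

matrix = confusion_matrix(y_true, y_pred, labels)
overall_accuracy = accuracy(matrix)
class_2 = one_vs_rest_metrics(matrix, labels.index(2))
print(matrix, overall_accuracy, class_2)
```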
Results
A total of 398 hemodialysis patients were included in the study. The study population demonstrated a predominant male representation (246/398 [approximately 62% of the sample]). The age of patients varied from 29 to 90 years, with a mean age of 63 years, indicating that the majority of the patient sample were middle-aged to elderly. In terms of vascular access for hemodialysis, an arteriovenous graft was employed in about 39% of patients (155/398), whereas an arteriovenous fistula was more common, accounting for roughly 61% of cases (243/398). Notably, the majority of these vascular accesses (75%) were located in the left arm, with the remaining 25% in the right arm. The demographic and clinical characteristics of the study population, a cohort in Southern Thailand, are summarized in Table 3, and provide critical insight into the regional characteristics of hemodialysis patients. In addition to the 398 patients, use of SMOTE to balance the dataset resulted in 397 synthetic cases, giving a total of 795 cases in the balanced dataset.
Table 3. Demographic and clinical characteristics of 398 hemodialysis patients from Southern Thailand included in the present study.
Data presented as n (%) patient prevalence, or min, max and mean age.
AVF, arteriovenous fistula; AVG, arteriovenous graft.
The primary study aim was to develop a novel feature selection technique and evaluate its effectiveness in enhancing deep learning models. The performance of models integrated with the feature selection approach was compared with those using the complete feature set.
The accuracies achieved using the test set by different models with varying K values are presented in Table 4. Among the models utilizing the SelectKBest feature selection technique, the CNN1D model demonstrated the highest accuracy at 0.9288 when trained on a reduced dataset of 14 500 components. Comparatively, the Random Forest model achieved an accuracy of 0.8661 with 20 000 components, the SVM model with RBF kernel attained an accuracy of 0.7782 with 22 000 components, the Gradient Boosting model achieved an accuracy of 0.8368 with 10 000 components, and the MLP model achieved an accuracy of 0.8702 with 5 000 components. These findings emphasize the importance of feature selection in optimizing model performance, and indicate that a reduced feature set of 14 500 components can yield highly accurate predictions when classifying vascular access quality in hemodialysis patients using PPG sensor data.
Table 4. Performance comparison of feature selection algorithms in predicting vascular access quality in 30% of the balanced dataset of hemodialysis patients.
CNN1D, 1-dimensional convolutional neural network; SVM, Support Vector Machine; RBF, Radial Basis Function; MLP, Multilayer Perceptron.
*Highest accuracy achieved (CNN1D) or lowest number of features used (MLP).
To evaluate performance, the CNN1D model was trained on 70% of the dataset using the selected 14 500 components, and tested on the remaining 30% of the dataset. Table 5 presents the performance of various algorithms, with and without feature selection, including accuracy, sensitivity, specificity, precision, and F-measure.
Table 5. Comparison of performance of various algorithms in predicting vascular access quality using full feature dataset versus selected feature dataset, assessed using the test set (30% of the data).
Data presented as mean ± SD.
CNN1D, 1-dimensional convolutional neural network; SVM, Support Vector Machine; RBF, Radial Basis Function; MLP, Multilayer Perceptron.
P < 0.05, SelectKBest feature selection versus full feature set.
Using the full feature data test set (30% of the data without feature selection), the CNN1D model achieved an accuracy of 0.9079, sensitivity of 1.0000, specificity of 0.8738, precision of 0.9009, and F-measure of 0.9001. In comparison, the Random Forest, SVM with RBF, Gradient Boosting, and MLP models exhibited lower performance metrics in this setting.
When feature selection was applied using SelectKBest, the performance of the CNN1D model significantly improved. With the selected features, the CNN1D model achieved an accuracy of 0.9205 (the highest among the SelectKBest models), sensitivity of 1.0000, specificity of 0.9006, precision of 0.9267, and F-measure of 0.9209. These results demonstrate that the combination of SelectKBest feature selection and the CNN1D model provided the best performance in predicting vascular access quality in hemodialysis patients, with a statistically significant increase in accuracy in the feature selection dataset versus full feature dataset (P = 0.041 [Student’s paired t-test]; Table 6). The model’s performance on the testing set further confirms its good generalization ability, which is crucial for practical applications.
Table 6. Comparison of CNN1D accuracy in predicting vascular access quality using the full feature dataset versus feature selection dataset.
CNN1D, 1-dimensional convolutional neural network.
Statistically significant difference at P < 0.05 (Student’s paired t-test).
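For readers unfamiliar with the paired comparison reported above, the following sketch shows how a Student’s paired t statistic is computed from matched accuracy values. The accuracy values below are hypothetical and are not the study’s data; they are chosen only to make the arithmetic easy to follow. The P value itself would normally be obtained from a statistics library (for example, `scipy.stats.ttest_rel`); here the statistic is simply compared with the two-sided 5% critical value for 4 degrees of freedom (2.776).

```python
import math
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """Student's paired t statistic and degrees of freedom for matched observations."""
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    standard_error = stdev(differences) / math.sqrt(n)
    return mean(differences) / standard_error, n - 1


# Hypothetical accuracies from five matched runs (not the study's data).
with_selection = [0.92, 0.93, 0.91, 0.94, 0.92]
full_features = [0.90, 0.92, 0.90, 0.91, 0.91]

t_stat, dof = paired_t_statistic(with_selection, full_features)

# Two-sided critical value of the t distribution at alpha = 0.05 with 4 degrees of freedom.
CRITICAL_T_DF4 = 2.776
significant = abs(t_stat) > CRITICAL_T_DF4
print(f"t = {t_stat:.3f}, df = {dof}, significant at 0.05: {significant}")
```

The test is paired because each run evaluates both feature sets under the same conditions, so only the within-run differences enter the calculation.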
The CNN1D model was run five times, each with a distinct 70:30 split of the data, and its performance was stable and consistent across these iterations (Figure 2). The recorded accuracy values (± SD) for these iterations were as follows: 0.9247 ± 0.09, 0.9331 ± 0.09, 0.9289 ± 0.10, 0.8912 ± 0.03 and 0.9289 ± 0.04. The highest accuracy (0.9331) was achieved in the second iteration. This consistently high accuracy across multiple iterations underlines the robustness and reliability of the feature selection technique in tandem with the CNN1D model. Together, they provide a trustworthy tool for precisely predicting vascular access quality in hemodialysis patients.

Figure 2. Accuracy of the 1-dimensional convolutional neural network model across five distinct iterations (successive runs of the model using a distinct split of data from the whole set while maintaining the 70:30 ratio for training and test sets) in predicting vascular access quality in hemodialysis patients. Data presented as mean ± SD.
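The repeated-run procedure summarized above (a fresh 70:30 split for each run, followed by a summary of the accuracies) can be sketched as follows. This is an assumption-laden illustration: the seeds, the function names, and in particular `placeholder_scorer`, which stands in for training and testing an actual model, are hypothetical.

```python
import random
from statistics import mean, stdev


def split_70_30(n_samples, seed):
    """Shuffle the case indices and split them 70:30 into training and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = n_samples * 7 // 10
    return indices[:cut], indices[cut:]


def repeated_holdout(n_samples, train_and_score, seeds):
    """Score one fresh 70:30 split per seed and summarize as mean and SD."""
    scores = []
    for seed in seeds:
        train_idx, test_idx = split_70_30(n_samples, seed)
        scores.append(train_and_score(train_idx, test_idx))
    return scores, mean(scores), stdev(scores)


def placeholder_scorer(train_idx, test_idx):
    """Hypothetical stand-in: a real scorer would fit a model on train_idx and
    return its accuracy on test_idx. This one returns an arbitrary fraction."""
    return sum(i % 2 == 0 for i in test_idx) / len(test_idx)


# 795 cases, matching the size of the balanced dataset; five runs with different seeds.
train_idx, test_idx = split_70_30(795, seed=0)
scores, mean_score, sd_score = repeated_holdout(795, placeholder_scorer, seeds=range(5))
print(len(train_idx), len(test_idx))
print(mean_score, sd_score)
```

With 795 cases, this split yields 556 training and 239 test cases per run.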
To further evaluate the effectiveness of the feature selection method, a classification confusion matrix of the CNN1D model for both the full feature dataset and the dataset refined through feature selection was analyzed (Figure 3). The red-highlighted confusion matrix corresponding to the full feature dataset revealed specific misclassifications in predicting vascular access quality. Notably, two class 2 samples were misclassified as class 4 and 14 class 2 samples were misclassified as class 3, while 10 class 3 samples were misclassified as class 2. Misclassifications were also observed across other classes.

Figure 3. Comparison of confusion matrices for classifying vascular access quality between the full feature dataset (left, red-highlighted confusion matrix) and the feature selection dataset (right, blue-highlighted confusion matrix) using the 1-dimensional convolutional neural network model. Data comprise the test set (30% of the full balanced dataset of 795 cases).
In contrast, the blue-highlighted confusion matrix obtained with the dataset resulting from feature selection demonstrated improved classification performance. The number of misclassifications decreased, indicating a more accurate prediction of vascular access quality. The misclassifications observed in the full feature dataset, such as class 2 being misclassified as class 3, were reduced.
These data emphasize the value of the feature selection technique combined with the CNN1D model, resulting in more accurate predictions of vascular access quality in hemodialysis patients. However, further research is essential to investigate the generalizability of these findings to other datasets and models.
Discussion
In the present study, we have proposed a novel feature selection technique and its integration with deep learning models for improved analysis of PPG signals in hemodialysis patients. While the primary aim was to enhance model accuracy and performance, interpretability of the model’s predictions remains a critical aspect, especially in the medical field, and incorporation of the SelectKBest feature selection technique plays a significant role in enhancing interpretability of the deep learning model. By selecting the most informative features from PPG data, both the model’s accuracy and explainability were improved. Healthcare professionals benefit from this approach as it highlights the key features influencing the model’s predictions, aligning the outputs with clinical insights and expertise.
During model development, considerable emphasis was placed on addressing overfitting, a well-known issue in deep learning, where models show high accuracy on training data but underperform on previously unseen data. 41 To tackle this, a smaller subset of features identified by the SelectKBest technique was strategically utilized, effectively reducing the model’s complexity. This step is crucial as it minimizes the model’s tendency to learn irrelevant details from the training data, which can hinder its performance on new data. Furthermore, to strengthen the model against overfitting, we integrated dropout layers and regularization methods within the neural network architecture. Dropout layers work by randomly disabling some neurons during training, which prevents the model from becoming overly dependent on specific features of the training data. 42 Regularization acts as a safeguard, penalizing the model for excessive complexity and thus deterring it from fitting too closely to the training data. To verify the model’s applicability and reliability beyond the training set, the model was validated on a separate test dataset. This validation process was essential to confirm that the model can generalize its learning effectively and maintain robust performance across different scenarios. Sample size was calculated using Taro Yamane’s formula, a recognized method for determining a representative sample size from a known total population. This calculation was instrumental in ensuring that the study sample accurately reflects the broader hemodialysis patient population in Thailand, lending credibility and generalizability to the findings.
Despite these efforts, the limitations inherent in achieving full interpretability in complex deep learning models are acknowledged. While the SelectKBest technique enhances explainability, the intricate architecture of deep learning systems still poses challenges in providing comprehensive explanations for every prediction. This area remains a fertile ground for future research, particularly in developing deep learning models that strike an optimal balance between accuracy, complexity, and interpretability for medical applications. The present approach was designed to enhance the predictive accuracy of various deep learning models in assessing vascular access quality in hemodialysis patients. While the direct impact of this technique on patient outcomes and quality of life was not measured within this study, its potential to improve diagnostic precision may have significant implications for these aspects. Effective and accurate identification of vascular complications is key to advancing management strategies, potentially reducing adverse events and improving overall patient care. As such, the present research paves the way for future investigations that may explore how enhanced diagnostic capabilities might directly influence patient outcomes and quality of life in the context of hemodialysis treatment. This line of inquiry is crucial for a comprehensive understanding of the clinical benefits associated with precise and early diagnosis in hemodialysis therapy.
Conclusions
Within the continuously evolving realm of medical diagnostics, the innovative application of CNN1Ds to PPG sensor data heralds a significant advancement in the prediction of vascular access quality in hemodialysis patients. A cornerstone of the present research was the adoption of the SelectKBest feature selection technique. Its efficacy was demonstrated not only in adeptly reducing dataset dimensionality but also in amplifying the model’s predictive accuracy, thereby revealing the indispensable role of methodical feature selection in diagnostic model optimization.
The rigor of the present research methodology ensured a meticulous validation of the model, and the results offer insights into the nuanced interplay between deep learning architectures and physiological data, shedding light on the potential of this synergy in revolutionizing patient care. The high degree of prediction accuracy may act as a catalyst in early diagnosis and timely intervention, ultimately enhancing patient outcomes and quality of life.
The implications of the present findings span beyond immediate clinical applications. They resonate with the broader dialogue in the medical community regarding the integration of artificial intelligence in healthcare. The success of the present model stands testament to the transformative potential of deep learning, especially when paired with astute feature selection techniques, in shaping medical diagnostics. As such, this study not only contributes a novel diagnostic tool for hemodialysis patient care but also enriches the ongoing discourse on the symbiotic relationship between advanced computational techniques and medical science.
In summary, the holistic integration of CNN1D and SelectKBest in the present study accentuates the importance of marrying traditional medical diagnostics with advanced computational methodologies. This research, by setting new benchmarks, beckons further exploration and adoption of such techniques in the broader spectrum of healthcare applications.
Footnotes
Author contributions
Sarayut Julkaew, Thakerng Wongsirichot, Kasikrit Damkliang, and Pornpen Sangthawan contributed to conceptualization. Sarayut Julkaew and Thakerng Wongsirichot contributed to data validation. Sarayut Julkaew, Thakerng Wongsirichot, and Pornpen Sangthawan contributed to resources. Sarayut Julkaew, Thakerng Wongsirichot, and Pornpen Sangthawan contributed to data curation. Sarayut Julkaew and Thakerng Wongsirichot contributed to manuscript writing. Sarayut Julkaew, Thakerng Wongsirichot, Kasikrit Damkliang, and Pornpen Sangthawan contributed to visualization. Thakerng Wongsirichot was the supervisor and project administrator. All authors have read and approved the final manuscript.
Data accessibility statement
The datasets analyzed during the current study are available from the corresponding author upon reasonable request.
Declaration of conflicting interests
The authors declare that there is no conflict of interest.
Funding
This research was funded by the National Research Council of Thailand (NRCT), grant No. 65A115000069 and The Budget Revenue of Prince of Songkla University, grant No. SCI6402014S. This work was supported by the Digital Science for Economy, Society, Human Resources Innovative Development and Environment project funded by Reinventing Universities & Research Institutes No. 3674774, Ministry of Higher Education, Science, Research and Innovation, Thailand.
