Abstract
Objective
This study aimed to develop and evaluate deep learning models to improve the prediction of persistent smoking in patients with chronic obstructive pulmonary disease (COPD) by integrating behavioral and psychosocial variables with clinical data from a structured national dataset.
Methods
Three deep learning models and one machine learning model were developed and assessed using clinical, behavioral, and psychosocial data from 350 patients with COPD, including 51 current smokers. Data preprocessing involved imputation, variable transformation, and class weighting. Hyperparameter optimization was performed using the Optuna framework. Model performance was evaluated with repeated stratified K-fold cross-validation, and the macro F1 score was the primary metric. Shapley Additive Explanations (SHAPs) were applied to assess feature importance and improve interpretability.
Results
The Residual Neural Network achieved the highest performance, with a macro F1 score of .87 (95% confidence interval: .83–.89). SHAP analysis highlighted professional advice to quit, employment status, sputum symptoms lasting more than 3 months, perceived stress level, health check-up experience, and health literacy as key predictors of persistent smoking.
Conclusion
Incorporating behavioral and psychosocial data enabled the models to capture complex smoking patterns while maintaining interpretability. These findings emphasize the value of multidimensional data in identifying high-risk individuals and informing targeted smoking cessation strategies in COPD care. Future research should include synthesized behavioral variables often absent from large external datasets and validate model performance in more diverse populations.
Introduction
Recent advances in deep learning have demonstrated strong potential for analyzing medical data with high accuracy. 1 Building on this progress, deep learning-based predictive analytics are increasingly applied in nursing to improve workflow efficiency, enhance patient management and task allocation, 2 reduce nurse burnout, and improve quality of care. 3 Beyond workflow, deep learning models can also support patients directly by assessing adherence to self-management, a critical factor in chronic disease outcomes. 4 However, accurately predicting complex patient behaviors such as adherence remains a persistent challenge. 5
Our research team recently developed deep learning models to predict smoking status—a key component of self-management—in patients with chronic obstructive pulmonary disease (COPD) using datasets such as the Korean National Health and Nutrition Examination Survey (KNHANES). 6 KNHANES is designed to assess national health status and inform public health policy under the Ministry of Health and Welfare and the Korea Disease Control and Prevention Agency. 7 Despite its strengths, KNHANES lacks behavioral variables that provide deeper insight into health and nutritional conditions. These include psychosocial, clinical, and cognitive factors such as self-efficacy, sputum characteristics, and health literacy in COPD patients. While clinical factors may result from smoking, worsening clinical conditions can also influence smoking status, as perceiving a condition as serious often increases motivation to quit. 8 Combining clinical variables with behavioral and psychosocial data may therefore improve predictive accuracy by capturing both the consequences of smoking and the patient's ongoing health context. 9 Our previous work also emphasized the need to include such variables to strengthen model performance. 6 Since smoking cessation is closely tied to overall adherence to health-related behaviors, predicting persistent smoking after a COPD diagnosis offers valuable insights into patient self-management. 10
Selecting appropriate predictive models is equally important for optimizing performance. In our previous study, the Residual Neural Network (ResNN) outperformed both traditional machine learning and other deep learning models. 6 It is therefore necessary to compare its performance with other architectures designed to capture non-linear and high-dimensional interactions in tabular and healthcare datasets. 11 Including such models can ensure robust prediction of smoking behaviors among patients with COPD. 12
Although missing data in KNHANES is minimal, its potential impact on model performance should not be overlooked. 13 While some degree of missingness is inevitable in real-world datasets, it can reduce predictive accuracy and reliability. 14 To address this limitation, a prospective survey targeting patients with a confirmed COPD diagnosis who are currently undergoing treatment is recommended. 15 Such a strategy supports more comprehensive data collection, reduces bias from incomplete records, and improves both predictive performance and model generalizability. 16
Accordingly, this study evaluates whether incorporating key clinical variables can improve the prediction of smoking status in patients with COPD before integrating these variables with KNHANES data in future research. The study explores the use of clinically sourced datasets containing detailed behavioral and psychosocial information to address the limitations of large-scale national survey data and enhance model sensitivity to individual patient contexts. Ultimately, the goal is to develop a robust and generalizable deep learning model to support decision-making in identifying patients with COPD who are vulnerable to poor self-management, thereby contributing to improved patient outcomes.
Method
Study design
This study employed a cross-sectional design using data from a clinical setting.
Participants
Participants were adults aged 40 years and older diagnosed with COPD who visited the outpatient clinic at C University Hospital in Gwangju, Korea, between 26 December 2023 and 28 May 2024. A total of 575 patients were screened for eligibility, of whom 350 consented to participate and were included in the study (Figure 1).

Figure 1. Flow diagram of participants.
According to Riley et al. (2019), 17 a minimum sample size corresponding to up to five events per variable is recommended when developing a clinical prediction model. Because the structured questionnaire used in this study included 31 categories of variables, the appropriate sample size ranged from 93 (3 per variable category) to 155 (5 per variable category). However, the relatively small number of current smokers increased the risk of overfitting. To mitigate this risk, dropout was applied to enhance model stability and generalizability. 18 Among the four models used (eXtreme Gradient Boosting (XGBoost), ResNN, TabTransformer, and Feature Tokenizer (FT) Transformer), XGBoost inherently incorporates boosting and regularization through weight penalization, whereas the deep learning models required explicit dropout application. 19
Data selection and collection
In our previous study, 6 a total of 21 variables were selected to develop a smoking prediction model for patients with COPD, based on factors related to COPD self-management. 20 Of these, 20 were retained in the present study. The remaining variable, interpretation of lung function based on forced expiratory volume in 1 s (FEV1) or the FEV1/forced vital capacity (FVC) ratio, was excluded because all participants had already been clinically diagnosed with COPD.
In the current dataset, the “household composition” variable was replaced with “family/friend support for quitting.” To expand the dataset with additional behavioral factors, 23 new variables were added (Supplementary Table 1), resulting in a total of 43 variables across 31 categories. Among these, self-efficacy and health literacy were measured using validated instruments. Self-efficacy was assessed using a modified Korean version of the SCES-COPD (Self-Care Self-Efficacy Scale for COPD), validated by Choi and Yun. 21 This version originally consisted of seven items rated on a 5-point Likert scale (total score 0–100) and was adapted for respondent convenience. COPD-specific health literacy was measured with a 66-point instrument developed by Kim and Choi. 22 Multicollinearity among variables was assessed, with correlations ranging from .01 to .67.
The dependent variable was smoking status. Following the methodology of our previous study, which drew on a study using KNHANES data, 23 daily and occasional smokers were grouped as “smokers,” and ex-smokers and nonsmokers were classified as “nonsmokers.” For model development, smoking status was coded as a binary outcome: 1 = smoker, 0 = nonsmoker.
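The binary coding described above can be illustrated with a minimal sketch. The category labels used here ("daily", "occasional", "ex-smoker", "nonsmoker") are hypothetical placeholders, since the questionnaire's actual response codes are not reported in the text.

```python
# Hypothetical response labels; the study's actual questionnaire codes are not given.
SMOKER_CATEGORIES = {"daily", "occasional"}
NONSMOKER_CATEGORIES = {"ex-smoker", "nonsmoker"}


def encode_smoking_status(category: str) -> int:
    """Return 1 for smokers (daily or occasional) and 0 for nonsmokers."""
    key = category.strip().lower()
    if key in SMOKER_CATEGORIES:
        return 1
    if key in NONSMOKER_CATEGORIES:
        return 0
    # Fail loudly rather than silently assigning an unknown response to a class.
    raise ValueError(f"Unrecognized smoking category: {category!r}")


responses = ["daily", "nonsmoker", "occasional", "ex-smoker"]
labels = [encode_smoking_status(r) for r in responses]
```

Raising an error on unrecognized responses is a design choice of this sketch, intended to keep ambiguous answers out of either class.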
To ensure data reliability, two research assistants underwent standardized training before data collection. Training covered study procedures, ethical considerations, and question-and-answer techniques. For instance, when assessing sputum characteristics, participants providing multiple responses were instructed to report their most recent condition. Data were collected in a quiet room using a structured questionnaire administered by the trained assistants. Completed surveys were reviewed by a researcher for completeness. Clinical indices such as the most recent FEV1 and FEV1/FVC values were extracted from electronic medical records by the same assistants.
FEV1/FVC data were unavailable for 29 patients (8.3%) who had been diagnosed at other hospitals and had not undergone repeat testing at the study clinic.
Data preprocessing
To optimize the performance and reliability of the smoking prediction models, preprocessing included missing value imputation, normalization, categorical encoding, class imbalance handling, and splitting of the training and test datasets (Supplementary Figure 1).
A comparative analysis of imputation methods was conducted to address missing values in FEV1 and FEV1/FVC. The methods tested were simple imputation (mean/median replacement), Multiple Imputation by Chained Equations (MICE), Iterative Imputer, K-Nearest Neighbors (KNN) Imputer, and MissForest. These were evaluated using quantitative metrics (mean and standard deviation comparisons). For normalizing continuous variables, both Min–Max normalization and standardization (z-score transformation) were tested. For encoding categorical variables, both Label Encoding and One-Hot Encoding were evaluated. However, One-Hot Encoding could not be applied due to internal algorithm constraints in the TabTransformer and FT Transformer.
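The transformations compared above can be sketched in plain Python as follows. This is an illustrative sketch rather than the study's preprocessing code; the example values and the variable names (`fev1`, `employment`) are hypothetical.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Z-score transformation using the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def label_encode(categories):
    """Map each category to one integer code (one column per variable)."""
    mapping = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping


def one_hot_encode(categories):
    """Expand each category into one binary indicator column per level."""
    levels = sorted(set(categories))
    return [[1 if c == lv else 0 for lv in levels] for c in categories], levels


fev1 = [1.0, 2.0, 3.0]                               # hypothetical values
employment = ["employed", "unemployed", "employed"]  # hypothetical values

scaled = min_max_scale(fev1)
z_scores = standardize(fev1)
codes, mapping = label_encode(employment)
indicators, levels = one_hot_encode(employment)
```

Label encoding keeps one integer column per categorical variable, whereas One-Hot Encoding expands a variable into several indicator columns.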
Class imbalance was addressed by assigning higher weights to the minority class, consistent with a previous study. 24
Other methods, including Synthetic Minority Oversampling Technique, random undersampling, and deep learning-based oversampling, did not result in noticeable improvements compared with class weighting (Supplementary Table 2). Specifically, the positive class weight was defined as the ratio of nonsmoker to smoker labels and applied to the loss function using the pos_weight parameter in BCEWithLogitsLoss. This increased the penalty for misclassifying smokers, thereby improving the influence of the minority class and balancing the learning process. The dataset was randomly split into training and test sets in an 80:20 ratio using the
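The class-weighting scheme can be illustrated with a dependency-free sketch that mirrors the documented formula of PyTorch's BCEWithLogitsLoss with a pos_weight term. It is not the study's training code. The counts used (51 smokers among 350 patients) are the full-sample figures from the abstract and serve only as an illustration; in practice the ratio would presumably be computed on the training data.

```python
import math


def pos_weight_from_labels(labels):
    """Ratio of nonsmoker (0) to smoker (1) labels."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0:
        raise ValueError("No positive labels; pos_weight is undefined.")
    return n_neg / n_pos


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def weighted_bce_with_logits(logits, labels, pos_weight):
    """Mean of -[w * y * log s(x) + (1 - y) * log(1 - s(x))], s = sigmoid."""
    losses = []
    for x, y in zip(logits, labels):
        # log(1 - sigmoid(x)) equals log(sigmoid(-x)).
        losses.append(-(pos_weight * y * log_sigmoid(x)
                        + (1 - y) * log_sigmoid(-x)))
    return sum(losses) / len(losses)


labels = [1] * 51 + [0] * 299  # illustrative full-sample counts
pos_weight = pos_weight_from_labels(labels)

# The same uninformative logit (0.0) costs more for a smoker than a nonsmoker.
loss_smoker = weighted_bce_with_logits([0.0], [1], pos_weight)
loss_nonsmoker = weighted_bce_with_logits([0.0], [0], pos_weight)
```

With these counts the weight is 299/51, or roughly 5.9, so an error on a smoker contributes about six times as much to the loss as the same error on a nonsmoker.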
Model selection and development
Four models were evaluated to predict smoking status in patients with COPD: three deep learning models (ResNN, TabTransformer, and FT Transformer) and one machine learning model (XGBoost). These models were selected for their effectiveness in handling structured clinical data, including high-dimensional features, missing values, and class imbalance.
ResNN was prioritized based on its strong predictive performance in a previous study. 6 To broaden the range of approaches, TabTransformer and FT Transformer were also included, as both have shown success in capturing complex feature interactions in tabular datasets. 26 Transformer-based models are particularly effective for modeling contextual relationships between input variables, 27 and they perform well in scenarios with missing data, skewed distributions, and non-linear dependencies. 28 These architectures have also been applied successfully in healthcare prediction tasks such as modeling disease progression, estimating treatment response, and forecasting hospital readmissions. 29 XGBoost was included for its scalability and strong generalization capability, 30 as boosting algorithms were not evaluated in the previous study. It is also well suited for structured medical data with missing values and class imbalance. 31
Hyperparameter tuning was performed using the Optuna framework, 32 with Bayesian optimization across 100 iterations to identify the optimal combination of hyperparameters for each model.
Model validation and evaluation
Model performance was evaluated using fivefold cross-validation. Specifically, five validation runs with Repeated Stratified K-Fold were conducted to preserve the proportion of smoker cases, given their limited number, and to identify the model with the highest performance and the narrowest 95% confidence interval (CI). The macro F1 score was used as the primary evaluation metric. This score calculates the unweighted mean of F1 scores across all classes, treating each class equally regardless of frequency. Because the outcome variable was imbalanced, the macro F1 score was considered the most appropriate metric. 33
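To make the metric concrete, the following sketch computes the macro F1 score from first principles. The labels are a hypothetical toy example, not study data.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical toy labels: 8 nonsmokers (0) and 2 smokers (1).
y_true = [0] * 8 + [1] * 2
y_pred_model = [0] * 8 + [1, 0]  # finds one of the two smokers
y_pred_all_negative = [0] * 10   # never predicts a smoker

score_model = macro_f1(y_true, y_pred_model)
score_all_negative = macro_f1(y_true, y_pred_all_negative)
```

In this toy case the all-negative classifier is 80% accurate but has a macro F1 of about 0.44, because its F1 for the smoker class is zero. This is the sense in which the metric treats each class equally regardless of frequency.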
Interpretable artificial intelligence
This study applied SHapley Additive exPlanations (SHAPs) to interpret the output of the top-performing models and improve transparency. 34 SHAP assigns an importance value to each feature by estimating its marginal contribution to the model's prediction while accounting for feature interactions and dependencies. In SHAP plots, features are listed on the y-axis in order of importance. The x-axis distribution shows the magnitude of each feature's contribution, while color indicates the direction of effect: red for higher values and blue for lower values. For example, a cluster of red points on the right side of the plot indicates that higher feature values contribute to predicting smokers.
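The notion of marginal contribution can be illustrated by computing exact Shapley values for a small toy scoring function. This sketch enumerates all feature subsets, which is feasible only for a handful of features; the SHAP library relies on approximations of these quantities. The scoring function and its coefficients are hypothetical and are not taken from the fitted models.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating every subset of the other features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Shapley weight for a coalition of size k.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_risk_score(present):
    """Hypothetical additive score with one interaction term."""
    score = 0.0
    if "no_advice" in present:
        score += 0.3
    if "high_stress" in present:
        score += 0.1
    if "no_advice" in present and "high_stress" in present:
        score += 0.2  # interaction, split equally between the two features
    if "checkup" in present:
        score -= 0.1
    return score


features = ["no_advice", "high_stress", "checkup"]
phi = shapley_values(toy_risk_score, features)
```

The values sum to the difference between the score with all features present and the score with none, which is the additivity property that SHAP summary plots rely on.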
Ethical considerations
This study was approved by the Institutional Review Board of C National University Hospital.
Results
Participant characteristics
Table 1 presents the characteristics of the participants. The mean age was 73.3 ± 9.4 years, and 267 (76.3%) were male. Fifty-five participants (15.7%) had a college degree or higher. Most were married (97.7%), and 27 (7.7%) lived with grandchildren. Ninety-two (26.3%) had an occupation, and 100 (28.6%) were engaged in economic activity.
Table 1. Characteristics of participants.
Abbreviations: FEV1: forced expiratory volume in 1 s; FVC: forced vital capacity; QoL: quality of life; EQ-5D: EuroQol-5 Dimension; COPD: chronic obstructive pulmonary disease.
The mean FEV1 was 1.80 ± 0.68, and the mean FEV1/FVC ratio was 0.63 ± 0.15. The average duration of COPD was 9.49 ± 10.56 years, and 163 (46.6%) reported COPD-related hospitalization in the past year. A total of 143 (40.9%) had physician-diagnosed hypertension, 76 (21.7%) had diabetes, and 5 (1.4%) had lung cancer. Hypertension treatment was reported by 137 (39.1%), and 136 (38.9%) used antihypertensive medication. Diabetes treatment was reported by 70 (20.0%), and 67 (19.1%) used diabetes medication. Cough lasting ≥3 months was reported by 166 (47.4%), and 183 (52.3%) reported sputum for ≥3 months. Yellow sputum was reported by 43 (23.2%), and 51 (14.6%) experienced severe smoking withdrawal symptoms.
The mean quality of life score was 0.73 ± 0.30; self-efficacy was 6.30 ± 1.16; and health literacy was 41.20 ± 9.66. Support from family or friends to quit smoking was reported by 179 (51.1%), while 88 (25.1%) received professional advice to quit. A total of 167 (47.7%) reported little to no stress, and 278 (79.4%) reported no depressive symptoms lasting >2 consecutive weeks. Normal perceived health status was reported by 136 (38.9%).
The mean smoking amount was 17.88 ± 16.72 cigarettes per day, and the mean smoking duration was 24.29 ± 18.14 years. Alcohol consumption averaged 0.72 ± 2.24 bottles per week, and mean sleep duration was 6.45 ± 1.80 h per day. Fifty participants (14.3%) reported no walking days per week, and 244 (69.7%) reported no strength training. Activity limitation was reported by 92 (26.3%). Most participants (279, 79.7%) had received a health check-up. A total of 247 (70.6%) did not attempt body weight control in the past year, and 241 (68.9%) reported no change in body weight. Similarly, 273 (78.0%) reported no change in food intake. Influenza vaccination in the past year was reported by 310 (88.6%).
Data preprocessing
For missing value imputation, the MICE method was selected due to its strong theoretical foundation and superior performance in quantitative comparisons (Supplementary Table 3). Although MissForest demonstrated robust performance, it was excluded because of its high computational cost and limited clinical interpretability. The KNN Imputer generated the most realistic data distribution but showed lower predictive performance. Similarly, the Iterative Imputer exhibited stable distributional characteristics but was not selected because of its experimental nature and limited empirical support.
For variable transformation, the combination of standard normalization for continuous variables and label encoding for categorical variables yielded the most consistent results across all models. Under this configuration, the FT Transformer achieved the highest performance (macro F1 score .81, 95% CI .73–.88), followed by ResNN (.80, 95% CI .68–.91) and TabTransformer (.76, 95% CI .70–.82). XGBoost performed slightly lower, with a macro F1 score of .75 (95% CI .69–.82). Although One-Hot Encoding produced comparable results to label encoding in XGBoost, it was not applicable to transformer-based architectures. Therefore, standard normalization and label encoding were adopted as the final preprocessing methods, balancing compatibility and predictive performance across model types (Supplementary Table 4).
Model development
Table 2 presents the macro F1 scores of each model before and after hyperparameter tuning using fivefold cross-validation. ResNN showed the most notable improvement, increasing from .80 ± .09 to .85 ± .03. XGBoost, TabTransformer, and FT Transformer also improved slightly, but the gains were relatively modest.
Table 2. Parameter settings after tuning by model.
Abbreviations: ResNN: Residual Neural Network; FT: Feature Tokenizer; XGBoost: eXtreme Gradient Boosting.
Model validation and evaluation
Table 3 presents macro F1 scores with 95% CIs across two validation strategies. Overall, Repeated Stratified K-Fold produced slightly higher or comparable scores compared with baseline fivefold validation. Both ResNN (.87 ± .05, CI .83–.89) and FT Transformer (.87 ± .06, CI .80–.92) achieved strong performance under Repeated Stratified K-Fold. These findings suggest that both validation approaches are robust, with only minor variations depending on the model architecture. Aggregated confusion matrices from fivefold validation further illustrate classification performance (Supplementary Figure 2).
Table 3. Comparison of macro F1 scores (95% CI) across validation methods and models.
Abbreviations: ResNN: Residual Neural Network; FT: Feature Tokenizer; XGBoost: eXtreme Gradient Boosting; CI: confidence interval.
Feature importance
Figure 2(a) shows the SHAP plot for ResNN, which highlighted psychosocial and clinical predictors associated with persistent smoking. The most influential feature was professional advice to quit, followed by employment status, sputum symptoms lasting ≥3 months, health check-up experience, and perceived stress level. Patients who had not received cessation advice, were unemployed, had chronic sputum symptoms, or reported high stress levels were more likely to be classified as smokers. Conversely, those who had undergone regular health check-ups were more often predicted as nonsmokers.

Figure 2. SHAP summary plots of feature importance. (a) ResNN model. (b) FT Transformer model. Abbreviations: AMU: antihypertensive medication use; AST: average sleep time; BWCY: body weight control for a year; BWChY: body weight change over a year; CL3M: cough lasting ≥3 months; COPD: chronic obstructive pulmonary disease; CRH: COPD-related hospitalization; CUFI: change in usual food intake; DMU: diabetes medication use; DS2W: depressive symptoms lasting >2 weeks; EEA: economic engagement activity; FEV1: forced expiratory volume in 1 s; FVC: forced vital capacity; FFSQ: family/friend support for quitting; LWG: living with grandchildren; PAQ: professional advice to quit; PDH: physician-diagnosed hypertension; PDLC: physician-diagnosed lung cancer; PHS: perceived health status; PSL: perceived stress level; QoL: quality of life; SP3M: sputum for ≥3 months; STDW: strength training days per week; SWS-E: severe withdrawal symptoms (experienced); WDW: walking days per week; SHAP: Shapley Additive Explanation; ResNN: Residual Neural Network; FT: Feature Tokenizer.
Figure 2(b) shows the SHAP plot for the FT Transformer. Professional advice to quit smoking remained the most impactful feature, followed by smoking duration, family or friend support, hypertension treatment, and COPD-related hospitalization. In contrast to ResNN, the FT Transformer tended to classify patients who had received cessation support or social encouragement as smokers. Based on previous studies, 35–37 ResNN was considered explainable in terms of the causal direction of predictors for persistent smoking, whereas the FT Transformer appeared to overfit the data.
Discussion
This study developed an enhanced deep learning model to predict smoking status in patients with COPD by integrating behavioral and psychosocial variables into a clinical dataset. By expanding input features to include healthcare provider support, COPD-specific symptoms, and health literacy, the model better captured patterns of non-adherence to self-management associated with persistent smoking. In particular, professional advice to quit, sputum production lasting ≥3 months, and health literacy were newly incorporated and identified as influential predictors. SHAP analysis of the top-performing model consistently confirmed the importance of these factors in classifying smoking status. Regarding outcome measurement, although biochemical markers such as salivary cotinine provide the most accurate assessment, smoking status in this study was self-reported through a structured questionnaire. Self-reported smoking status shows high agreement with biomarkers such as cotinine levels, captures patients’ perception and disclosure of their smoking habits, and is more feasible in terms of cost and routine data collection. 38 Nevertheless, the potential for misclassification inherent in self-reported measures must be acknowledged.
One of the key findings was that ResNN achieved the highest macro F1 score with the narrowest 95% CI, consistent with our previous study, 6 indicating stable and superior performance in predicting persistent smoking in COPD. This strong performance may be attributed to the residual learning structure, which captures essential non-linear interactions among clinically meaningful variables while minimizing overfitting in relatively small and imbalanced datasets. 39 In addition, the present study implemented refined model architectures and context-rich clinical and behavioral data, which enabled more effective learning of complex relationships between predictors. This improved predictive performance and allowed for more clinically meaningful risk stratification.
The performance and SHAP analysis of ResNN also indicated that it was more suitable for predicting persistent smoking than the FT Transformer. One of the most influential predictors was whether patients received professional advice to quit smoking. Patients were more likely to be classified as smokers if they had not received cessation advice, were unemployed, had chronic sputum symptoms, or reported high stress levels. In contrast, the FT Transformer produced contradictory directional interpretations, placing greater predictive weight on features such as family and friend support for cessation, and unexpectedly predicting patients with support as smokers. This contradicts clinical evidence. For example, Cheung et al. (2021) showed that even brief physician advice significantly increased quit rates compared with no advice. 40 The discrepancy suggests that ResNN captured more clinically reasonable associations, whereas the FT Transformer may have overfit complex but less interpretable patterns. The unexpected prediction patterns in the FT Transformer likely reflect its higher model complexity, which can lead to overfitting in limited datasets despite regularization and repeated stratified K-fold validation. 41 ResNN, being structurally simpler, produced more stable and generalizable results. Accordingly, ResNN demonstrated both high predictive performance and clinically meaningful interpretability. Due to this balance, it appears particularly well suited for clinical applications where transparency and efficiency are essential. 42 However, achieving an optimal balance between model complexity and cost-effectiveness remains challenging, as increasing the number of input variables may limit practical applicability. 43
To address potential instability in performance evaluation due to the small and imbalanced dataset, we applied Repeated Stratified K-Fold cross-validation. 44 This method preserves class balance across training and validation sets and reduces variability from random sampling, yielding more reliable performance estimates. 45 By repeating the stratified K-Fold procedure multiple times, this approach improves consistency in evaluation outcomes. 46 However, it also increases computational time due to multiple model iterations. While this method enhances evaluation stability, its computational cost should be carefully considered in future applications involving larger datasets or more complex architectures.
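The splitting procedure can be sketched without external libraries as follows. This is an illustrative generator, not the implementation used in the study (scikit-learn provides a `RepeatedStratifiedKFold` class for this purpose). The choice of five repeats and the random seed are assumptions, and the 51/299 class counts are taken from the full sample for illustration.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=5, seed=0):
    """Yield (train_indices, validation_indices) pairs, stratified by label."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)  # a fresh shuffle for every repeat
            # Deal each class round-robin so every fold keeps the class ratio.
            for pos, idx in enumerate(shuffled):
                folds[pos % n_splits].append(idx)
        for k in range(n_splits):
            val = sorted(folds[k])
            train = sorted(i for j, fold in enumerate(folds)
                           if j != k for i in fold)
            yield train, val


labels = [1] * 51 + [0] * 299  # illustrative class counts
splits = list(repeated_stratified_kfold(labels))
smokers_per_fold = [sum(labels[i] for i in val) for _, val in splits]
```

Each validation fold here contains 10 or 11 smokers, so every fold's score is estimated from a similar number of minority cases. The cost is 25 model fits per configuration instead of 5.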
Our SHAP analysis further highlighted clinically interpretable predictors of persistent smoking, such as lack of professional cessation advice, chronic sputum symptoms, unemployment, and high stress levels. These findings are biologically and behaviorally plausible. 47 Patients who do not receive cessation advice from healthcare professionals may underestimate the health risks of smoking, reducing their likelihood of quitting. 48 Patients with chronic sputum symptoms lasting ≥3 months were more likely to be classified as smokers, consistent with research linking these symptoms to tobacco-induced airway inflammation. 49 Misinterpreting such symptoms may reduce motivation to quit, underscoring the need for education that links respiratory symptoms to smoking behavior. 50 Psychosocial stress has also been consistently associated with smoking relapse. 51 Patients reporting frequent or intense stress were more likely to be smokers, reinforcing the role of psychological distress in maintaining tobacco use. 52 Stress may serve as a barrier to cessation, highlighting the value of integrating stress management into quit strategies for COPD patients.
Another important predictor identified by the model was occupation, with unemployed patients more likely to be classified as persistent smokers. This aligns with research showing that unemployment is associated with higher smoking prevalence, fewer quit attempts, and increased relapse risk. 53,54 Unemployed patients with COPD may face greater psychosocial stress, reduced access to health information, and limited engagement with preventive healthcare, all of which contribute to sustained smoking behavior. Interestingly, patients who had undergone a recent health check-up were also more likely to be smokers in this study. This may reflect a false sense of reassurance from normal findings, reducing motivation to quit despite continued risk. 55 These findings underscore the need to integrate smoking cessation interventions into routine clinical care. 35 Taken together, the results support comprehensive, personalized cessation strategies targeting modifiable risk factors to improve smoking outcomes and support long-term disease management in COPD.
Limitations
Despite promising results, this study has several limitations. First, although Repeated Stratified K-Fold cross-validation improved the stability of performance estimates, it also imposed a moderate to high computational burden due to repeated model fitting. This method is commonly used in small datasets to reduce sampling variability and improve robustness, but it may be less practical for large-scale applications or real-time clinical settings where computational efficiency is critical. Second, the relatively small sample size may limit the generalizability of the findings. Although stratified cross-validation and dropout were applied to mitigate this issue, external validation with independent datasets will be needed to more robustly assess generalizability. Third, while behavioral and psychosocial variables significantly enhanced model performance, these factors may be misinterpreted as causal, are context-dependent, and may vary across populations, healthcare systems, and cultural settings. Caution is therefore warranted when applying this model in different contexts.
Clinical usability
This study highlights the potential clinical utility of deep learning models in identifying and managing smoking behavior among patients with COPD. By integrating behavioral, psychosocial, and routinely collected clinical data, the ResNN model provided accurate and interpretable predictions. Its reliance on commonly available variables and SHAP-based interpretability supports integration into clinical workflows and electronic health record systems, enabling timely and personalized smoking cessation interventions for high-risk patients.
Conclusions
This study demonstrated the effectiveness of an enhanced deep learning model in predicting smoking status among patients with COPD by integrating behavioral, psychosocial, and clinical variables. The inclusion of diverse features—particularly professional advice to quit, sputum symptoms, and health literacy—contributed to improved model performance and clinical relevance. Among the evaluated models, ResNN showed the most consistent and superior results, achieving the highest average macro F1 score. These findings provide empirical support for applying advanced deep learning models in predictive tasks involving clinical populations.
By incorporating behavioral and psychosocial data into a clinically feasible model, this research helps bridge the gap between artificial intelligence development and real-world healthcare applications. The performance and interpretability of the ResNN model represent a promising step toward integrating predictive analytics into personalized COPD care. These results underscore the importance of leveraging multidimensional data and selecting architectures suited to structured clinical datasets. The model's reliance on routinely collected variables also supports its feasibility for outpatient implementation. Future research should focus on external validation in larger and more diverse populations, real-world deployment, and prospective evaluation of clinical outcomes.
Supplemental Material
Supplemental material, sj-docx-1-dhj-10.1177_20552076251393380 for Enhancing deep learning models for predicting smoking Status using clinical data in patients with chronic obstructive pulmonary disease by Sehyun Cho, Hyeonseok Jin, Kyungbaek Kim, Sola Cho and Ja Yun Choi in DIGITAL HEALTH
Supplemental material, sj-docx-2-dhj-10.1177_20552076251393380 for Enhancing deep learning models for predicting smoking Status using clinical data in patients with chronic obstructive pulmonary disease by Sehyun Cho, Hyeonseok Jin, Kyungbaek Kim, Sola Cho and Ja Yun Choi in DIGITAL HEALTH
Footnotes
Acknowledgments
None.
Ethical approval
This study was approved by the Institutional Review Board of C National University Hospital.
Consent to participate
Informed consent was obtained from all individual participants included in the study.
Consent for publication
Consent for publication was obtained from all participants.
Author contributions
SC contributed to investigation, data curation, methodology, writing—original draft, review, and editing. HJ contributed to investigation, validation, visualization, writing—original draft, review, and editing. KK contributed to investigation, methodology, writing—original draft, review, and editing. SC contributed to data curation, writing—original draft, review, and editing. JYC contributed to conceptualization and/or methodology, funding acquisition, investigation, project administration and/or supervision, writing—original draft, review, and editing.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education [grant number NRF-2022R1A2C1010364]; Institute of Information & communications Technology Planning & Evaluation (IITP) under the Artificial Intelligence Convergence Innovation Human Resources Development (IITP-2023-RS-2023-00256629) grant funded by the Korea government (MSIT). This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2024-RS-2024-00437718) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability
Supplemental material
Supplemental material for this article is available online.
Disclaimers
Not applicable.
