Abstract
Objective
Conventional scale-based diagnostic approaches are increasingly insufficient for addressing the growing mental health challenges among adolescents. Leveraging advances in artificial intelligence, this study aims to develop an accurate, efficient, and scalable model for early identification of adolescent depression risk using large-scale census data, and to identify key daily life factors associated with mental health outcomes.
Methods
Data were obtained from the 2021 National Survey of Children's Health, including 50,892 adolescents and 463 variables. Based on prior literature, 60 relevant variables were selected. Three progressively structured hypotheses concerning the relationships between adolescent depression and developmental environments were proposed. Machine learning models, including decision trees, XGBoost, support vector machines, and neural networks, were applied to predict depression risk. Mediation analysis was conducted to examine the pathways through which living conditions influence mental health.
Results
The optimal model demonstrated strong predictive performance, achieving an accuracy of 0.85 and an AUC exceeding 0.87. Feature importance analysis identified several key predictors. Mediation analysis indicated that living conditions exerted a direct effect of 0.225 on mental health, while physical activity and diet quality partially mediated this relationship.
Conclusion
Living conditions are critical indicators for early identification of adolescent depression risk. The use of nationwide census data enables timely screening and targeted intervention. Improving dietary habits and increasing physical activity may serve as effective preventive strategies for adolescent mental health disorders.
Introduction
In today's fast-paced society, mental health disorders have emerged as a critical public health issue. Adolescents are particularly affected, experiencing elevated levels of emotional distress, with prevalence rates ranging from 15% to 32% and showing a persistent upward trend. 1 This trend underscores the urgency of addressing psychological challenges in this demographic. Adolescents’ psychological and physiological immaturity, combined with adverse environmental factors such as parental neglect, lack of emotional support, tense family dynamics, harsh parenting styles, and strained relationships with peers, can significantly contribute to behavioral problems.2–4 Addressing these underlying causes through early identification and targeted interventions is therefore essential not only for promoting adolescents’ holistic development and well-being but also for advancing broader societal health. 5 This remains a critical global challenge.
In current clinical practice, mental disorders are typically diagnosed using inquiry-based scales guided by the International Classification of Diseases, 10th Revision (ICD-10), which outlines criteria for depressive episodes and recurrent depressive disorders. 6 For diagnosing depression, commonly used tools include Zung's 1965 Self-Rating Depression Scale (SDS), which has demonstrated strong reliability and validity. 7 In addition to the SDS, the Mini-International Neuropsychiatric Interview (MINI) is another prominent instrument, designed as a brief structured interview for diagnosing mental disorders. 8 This tool was developed collaboratively by American and European researchers and aligns with both the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) and the ICD-10. 9 Furthermore, the Symptom Checklist-90 (SCL-90), created by Derogatis, consists of 90 items that evaluate mental health based on self-reported symptoms. 10 Although standardized scales offer a valuable basis for clinical assessment, most originate from specific cultural and regional contexts; even after translation and adaptation, they often exhibit limited applicability, especially among younger adolescents, for whom developmental and linguistic differences can compromise validity and usability. 11 Prediction frameworks that utilize statistical models, regression analysis, support vector machines (SVMs), 12 neural networks, and Bayesian networks have demonstrated effective performance in mental health contexts.13–16 Leveraging advances in diagnostic technology, researchers have adapted risk prediction models from general medicine to schizophrenia, with the goal of identifying individuals at high risk for psychosis and enhancing early intervention efforts. 17 Early ultra-high risk models provided a new strategy for the early identification and intervention of mental disorders. 18 However, these models have since been shown to have drawbacks such as a high false-positive rate and limited predictive ability. 19 In addition to the aforementioned statistical approaches, Hu et al. 20 explored the association between well-being and depression among college students from both genetic and childhood abuse perspectives, providing a transition from objective biological factors to subjective environmental influences in understanding adolescent depression and its relationship with developmental contexts. Furthermore, prediction models for adolescent depression can be constructed by evaluating psychosocial factors and measuring biological markers, such as plasma cortisol and dehydroepiandrosterone sulfate levels, which provide valuable insights into the mental health of adolescents. 21
However, reliance on these tools alone may lead to diagnostic inaccuracies, as assessments typically depend on adolescent self-reports or caregiver responses during clinician-administered interviews. Adolescents may struggle to articulate their emotional experiences due to limited insight or reluctance, while caregivers may lack sufficient understanding of the adolescent's internal state, resulting in biased or incomplete evaluations. 22 Research has established that depression arises from a combination of biological, psychological, social, and lifestyle factors. 23 Notably, experiences of childhood abuse are strongly linked to the onset of clinical mental disorders during adolescence. 24 Drawing on psychological, public health, and social ecological theories, this study examines how measurable aspects of adolescents’ daily environments, such as physical activity, diet, sleep, and family economic conditions, affect mental health, as analyzing these factors can provide valuable supplementary information for diagnosing depression and related mental health conditions.25,26 Unhealthy dietary patterns, particularly high intake of fast food and sugary drinks, are linked to greater depression risk in adolescents, while nutrient-rich diets show protective effects, underscoring the potential of dietary interventions in promoting mental health. 27 Key environmental and social factors, including socioeconomic status and parental mental health, play a critical role in child outcomes 28 and should be integrated into mental health risk prediction models. Based on this, we formulated three progressively deeper hypotheses at the outset of our study: (1) machine learning can be used to predict depression risk; (2) developmental factors are associated with adolescents’ risk of depression; and (3) the relationship between environmental factors and adolescent depression risk is mediated by specific intervening variables.
Adolescent mental health disorders are often difficult to diagnose due to limited communication skills and reluctance to express emotions, leading to underreporting and misinterpretation. Recent advances in artificial intelligence, with their capacity for consistent and objective pattern recognition, provide a promising alternative to reliance on subjective reports. To address this challenge, we developed a predictive model using data from the National Survey of Children's Health (NSCH) to examine how objective aspects of adolescents’ living environments relate to depression risk. The model not only identifies key contributing factors but also captures their interrelationships, offering insights into the underlying mechanisms. To model depression risk, we applied machine learning and deep learning techniques, demonstrating their effectiveness through experimental validation. Additionally, to investigate potential causal factors, we employed a mediating effect 29 model to examine both direct and indirect effects between depression and physiological factors, offering insights into the underlying mechanisms of adolescent depressive disorders. Based on our findings, we propose relevant recommendations for early intervention and policy development.
Materials and methods
Data information
This study utilizes official comprehensive screening data to ensure the reliability of all findings. The NSCH from the United States 30 provides extensive information on various intersecting aspects of children's lives, including physical and mental health, access to quality healthcare services, and the family, community, school, and social environments that influence children. The available database encompasses annual census data from 2016 to 2021, derived from revised surveys conducted by the U.S. Census Bureau via mail and online methods. We selected the 2021 survey data because it is the most recent revised release and the most comprehensive.
The 2021 NSCH database includes a total of 50,892 survey cases, each comprising 443 variables. These variables cover broad domains such as caregiver (respondent) characteristics, basic sociodemographic information of the adolescent, physical and mental health status (including current conditions and medical history), as well as family and community environments. Each domain contains multiple structured questionnaire items with predefined response options. A simplified description of the survey instrument is provided in Table A1 in the Appendix, while detailed variable definitions and specifications can be found in the 2021 NSCH Topical Variable List.
Guided by the SDS, ICD-10, ICD-11, 31 DSM-5, 32 SCL-90, and MINI, we selected corresponding or potential causes and outcomes relevant to this study. The potential causes encompass other psychological disorders, physical diseases, growth environments, and lifestyle habits, while the outcomes include mental health issues such as depression, anxiety, autism, hyperactivity, and Attention Deficit Disorders (ADD). Logically, identifying potential cause factors would help develop preventive and treatment strategies.
Based on the aforementioned guidelines and a review of historical research related to adolescent mental health, we have selected the key reference variables as follows: gender, 29 age, 30 family environment, 31 and physical activity,33–35 along with other influencing factors.36–39 Based on a review of the literature, we identified 60 variables with potential impacts on mental disorders for further investigation. Descriptive statistics and sample variable details are presented in Table A2 in the Appendix.
Sample inclusion and exclusion criteria
The original dataset covers minors aged 0–17 years. As this study focuses on adolescents, we restricted the sample to individuals aged 12–17 years. After excluding cases outside this age range, the final analytical sample comprised 17,539 participants. Among these, 2212 had been diagnosed with depression, 3491 with anxiety, and 2769 with ADD/Attention Deficit Hyperactivity Disorder (ADHD), with some overlap across diagnostic categories. Notably, over 80% of all reported mental health diagnoses in the 2021 NSCH dataset were concentrated within the 12–17 age group, which is consistent with general epidemiological trends.
Machine learning dataset
We used a binary indicator of depression diagnosis (K2Q32A) as the target outcome variable. In the mediation analysis, 12 variables reflecting socioeconomic status, physical activity levels, and dietary habits were selected as observed predictors. All cases with missing values on any of the included variables were excluded to ensure data completeness and model reliability. After applying the data exclusion criteria and removing records with missing values, a total of 12,856 samples remained for machine learning analysis. Among these, 1778 were positive cases of depression. Additionally, the dataset was split into training and testing sets in an 8:2 ratio.
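The preparation steps described above (age restriction, complete-case filtering, and an 8:2 split) can be sketched as follows. This is a minimal illustration on toy records, not the study's pipeline: the helper names, the random seed, the 0/1 recoding of the outcome, and the age column name `SC_AGE_YEARS` are assumptions made for the example.

```python
import random

def complete_cases(rows, columns):
    """Keep only rows that have no missing value (None) in any modelled column."""
    return [r for r in rows if all(r.get(c) is not None for c in columns)]

def train_test_split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Toy records standing in for survey rows (hypothetical values).
# The outcome K2Q32A is recoded here as 1 = depression, 0 = no depression.
rows = [
    {
        "SC_AGE_YEARS": 12 + i % 6,                   # assumed age column name
        "K2Q32A": 1 if i % 7 == 0 else 0,
        "PHYSACTIV": None if i % 10 == 0 else i % 4,  # every tenth value missing
    }
    for i in range(100)
]

teens = [r for r in rows if 12 <= r["SC_AGE_YEARS"] <= 17]   # adolescents only
clean = complete_cases(teens, ["K2Q32A", "PHYSACTIV"])       # listwise deletion
train, test = train_test_split_rows(clean, test_fraction=0.2, seed=42)

print(len(clean), len(train), len(test))
```

A stratified split, which preserves the share of positive cases in both subsets, is a common alternative when the outcome is as imbalanced as it is here; the text does not state which variant was used.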
Hypothesis
Hypothesis 1: Machine learning can predict adolescent depression risk using daily-life data.
We hypothesize that machine learning models trained on objective daily-life factors from census data, such as physical activity, sleep patterns, diet, and socioeconomic background, can effectively identify adolescents at risk of depression.
Hypothesis 2: Socioeconomic and environmental factors are associated with adolescent depression risk.
We propose that adolescents from lower socioeconomic or less supportive backgrounds are more prone to depressive symptoms, underscoring the strong influence of external living conditions on mental health—a foundation for our mediating effect model.
Hypothesis 3: The relationship between environmental factors and adolescent mental health is mediated by behavioral or psychological variables.
We further hypothesize that this association is not purely direct, but partially mediated by intermediate factors such as physical activity levels, dietary habits, or psychological capital (e.g. resilience, self-efficacy). This hypothesis motivates the two mediating variables included in the mediating effect model.
Method
This is an applied cross-sectional study using fixed-year census data, which does not permit tracking individual changes over time. We combine artificial intelligence prediction methods with mediation analysis theory to examine mental health associations. Prior to modeling, relevant variables were selected using Pearson correlation in SPSS 29.0, guided by diagnostic criteria from ICD-10 and ICD-11.
Prediction based on machine learning
For depression risk prediction, we implemented several algorithms in Python, including logistic regression, 40 decision trees, 41 XGBoost, 42 SVMs, and neural networks. 43 The dataset was first standardized, after which the models were developed and trained to establish a predictive framework. These models are capable of assessing the risk of mental disorders using general survey data, serving as effective tools for preliminary risk screening. In summary, the outcome variable in our prediction task is whether adolescents aged 12–17 have a history of depression, while the explanatory variables include other mental health conditions and objective daily-life factors derived from the corresponding samples. Consistent with methodological standards in predictive analytics, incomplete entries containing missing values were excluded to ensure the integrity and reliability of the model's input parameters.
Mediating effects statistical methods
Based on the experimental results and a review of related literature,44,45 we found a significant association between living conditions and adolescent mental health. To further investigate this relationship, we constructed a mediation model based on the methodological framework proposed by Frazier et al., 46 demonstrating that living conditions exert indirect effects on mental disorders through two key mediators: physical activity levels and dietary quality. These mediators jointly contribute to the observed impact on adolescent psychological well-being. The proposed mediation model is illustrated in Figure 1. The optimized calculation formulas implemented in the Python library for mediating effect analysis are as follows:
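For orientation, one conventional regression formulation of a two-mediator model with a path from diet to physical activity is given below. It is a textbook form stated under assumptions, not necessarily the optimized formulas used by the authors. The symbols are introduced here for illustration: $X$ denotes living conditions, $M_d$ dietary quality, $M_p$ physical activity, $Y$ depression, $i_k$ intercepts, and $e_k$ residuals.

```latex
\begin{align}
M_d &= i_1 + a_d X + e_1, \\
M_p &= i_2 + a_p X + d\, M_d + e_2, \\
Y   &= i_3 + c' X + b_p M_p + b_d M_d + e_3, \\
c   &= c' + a_p b_p + a_d b_d + a_d\, d\, b_p .
\end{align}
```

Here $c'$ is the direct effect, the three product terms are the indirect effects through physical activity, through diet, and through diet followed by physical activity, and $c$ is the total effect. This decomposition holds exactly for linear models estimated on the same sample.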

Receiver operating characteristic (ROC) curve.
Evaluation metrics
We employed distinct evaluation strategies for data screening and model assessment. In the initial variable selection, Pearson correlation coefficients (r) with p < 0.05 were used to identify factors significantly associated with depression.
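The screening rule above can be illustrated with a short sketch. It computes Pearson's r directly and, as a simplifying assumption suited only to large samples, approximates the two-sided p-value with a normal distribution rather than the exact t distribution that SPSS would use. The function names and toy variables are hypothetical.

```python
import math
from statistics import NormalDist

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def approx_p_value(r, n):
    """Two-sided p-value for r, using a large-sample normal approximation."""
    if abs(r) >= 1.0:
        return 0.0
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return 2 * (1 - NormalDist().cdf(abs(t)))

def screen(features, target, alpha=0.05):
    """Return {name: r} for features whose correlation with target has p < alpha."""
    kept = {}
    for name, values in features.items():
        r = pearson_r(values, target)
        if approx_p_value(r, len(target)) < alpha:
            kept[name] = r
    return kept

# Toy data: one feature built from the target, one constructed to be uncorrelated.
target = [i % 2 for i in range(200)]
features = {
    "related": [t + 0.1 * (i % 5) for i, t in enumerate(target)],
    "unrelated": [(i // 2) % 2 for i in range(200)],
}
kept = screen(features, target)
print(sorted(kept))
```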
For machine learning models, performance was evaluated on the held-out test set using multiple metrics. Accuracy (ACC) reflects the overall proportion of correctly classified cases. However, because the dataset is imbalanced, with far fewer depression cases than nondepression cases, we prioritized the area under the curve (AUC), which measures a model's ability to discriminate between classes across all decision thresholds and is insensitive to class distribution. We also reported the F1-score, the harmonic mean of precision and recall, which provides a balanced measure of performance on the minority (depressed) class.
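To make the three metrics concrete, the sketch below implements them from their definitions on a small hypothetical example. AUC is computed in its rank form, as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels, scores, and the 0.5 threshold are illustrative assumptions.

```python
def accuracy(y_true, y_pred):
    """Proportion of cases classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(y_true, scores, positive=1):
    """P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical imbalanced sample: 8 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.6, 0.25, 0.35, 0.7, 0.4]
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # assumed decision threshold
all_negative = [0] * len(y_true)                  # trivial majority-class model

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
print(accuracy(y_true, all_negative), f1_score(y_true, all_negative))
```

In this toy case the trivial all-negative model matches the thresholded model on accuracy (0.8) but has an F1-score of 0, which is the behavior that motivates reporting AUC and F1 alongside ACC.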
Results
Descriptive statistics of the sample
After data cleaning and preprocessing, a final dataset comprising 17,539 valid entries was selected for analyses, representing the effective portion of the collected data. Table 1 presents the detailed information of the data employed in our study. Among the individuals within this large sample, 1963 adolescents were identified as having depression. The distribution of depression scores was 1.11 ± 0.316 (with 2 indicating depression and 1 indicating noncases). Additionally, 3369 adolescents were identified with anxiety, with an anxiety score distribution of 1.19 ± 0.395. The study focused on individuals aged 12 to 17, with an average age of 14.68 ± 1.697 years.
Descriptive statistics of the dataset.
As shown in Table 1, the distribution of adolescents by gender and age group is approximately equal. Notably, the proportion of individuals who have experienced depression is slightly higher among females than males, which aligns with findings from previous literature. Additionally, the prevalence of depression increases progressively with age, consistent with prior research. These descriptive patterns agree with established epidemiological findings, which supports the suitability of the dataset for further analyses. For the subsequent risk prediction experiments, the dataset was refined by removing all entries with missing values. The cleaned data were then divided into a training set and a validation set, using a ratio of 5:1. This resulted in 10,111 entries in the training set and 2000 entries in the validation set. The distribution of depression and nondepression cases in the processed dataset is detailed in Table 2.
Data samples used for prediction.
Correlation analyses
In the intercorrelation assessment, depression diagnosis was treated as the dependent variable while all other factors were considered independent variables. The correlation coefficients are presented in Table 3. All the selected factors were included in the correlation matrix, categorized into four groups: factors related to psychological disorders, physiological issues, life-related problems, and supportive assistance. Due to space limitations, only the factors that showed significant correlations from each category are listed.
Correlation analyses of related factors.
Except for autocorrelation items, all other correlation tests showed p < 0.01.
Tests of intercorrelation coefficients 29 indicate that depression is significantly associated with conditions such as anxiety and autism, all of which fall within the category of psychological disorders. The study focused on general behavioral performance and external objective factors, revealing that behaviors or factors with a high correlation to depression include the following:
Physiological issues: associated with bodily pain and trauma, health-related issues, and headaches.
Life-related problems: related to emotional problems with parents, decreased work and study time due to health issues, interactions with individuals who have alcohol and drug abuse problems, shared bedtime routines, and frequency of family communication.
Subjective issues: related to difficulties in concentrating or making decisions, behavioral problems, and managing issues independently.
Prediction and diagnosis of depression
Based on the descriptive analyses presented, the samples from this study can be effectively utilized to identify external factors and daily behavior metrics for predicting depression. In this effort, we employ various models, including logistic regression, decision trees, SVMs, and multilayer neural networks, to enhance prediction capabilities. Through the creation of a comprehensive dataset and validation processes, we established that the factors identified in the section “Correlation analyses” are valuable for prediction, achieving a high level of accuracy in forecasting the disease using standard machine learning techniques. As illustrated in Table 4 and Figure 2, each model demonstrated a relatively consistent and objective predictive performance.

XGBoost feature importance.
Logistic regression predicting the presence of depression.
AUC: area under the curve.
The dataset used in this study was collected directly from the general public, resulting in a class imbalance due to the higher prevalence of nonclinical (nondisorder) cases. In such a setting, prediction models may achieve an ACC of 0.8 or higher simply by favoring the majority (negative) class, so a high ACC alone is not meaningful evidence of predictive ability. To provide a more reliable evaluation of model performance, we employed the receiver operating characteristic (ROC) curve and the associated metrics. Results showed that all models achieved an ACC above 0.85, while the AUC exceeded 0.87 for all models except the decision tree, indicating stronger predictive power in the other three methods. In terms of overall performance, the models ranked as follows: XGBoost, logistic regression, simple neural network, and decision tree. These findings suggest that models based on XGBoost, logistic regression, and simple neural networks are particularly well-suited for predicting psychological disorders using survey-based data. Moreover, in the XGBoost method, four features received importance scores above 0.1, indicating a strong contribution to depression prediction, as shown in Figure 3.
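The point about majority-class accuracy can be checked with simple arithmetic. The sketch below uses the counts reported in the section “Machine learning dataset” (12,856 complete cases, 1778 of them positive) and computes the ACC of a trivial model that labels every adolescent as nondepressed. The variable names are illustrative.

```python
n_total = 12_856      # complete cases reported for the machine learning dataset
n_positive = 1_778    # depression cases among them

# A model that always predicts "no depression" is right on every negative case.
baseline_accuracy = (n_total - n_positive) / n_total
positive_rate = n_positive / n_total

print(round(baseline_accuracy, 3))  # 0.862
print(round(positive_rate, 3))      # 0.138
```

Because this trivial baseline already reaches roughly 0.86, ACC values in the same range carry little information on their own, and threshold-independent measures such as AUC are the more informative basis for comparing models.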

Schematic diagram of the initial model.
Analysis of mediating effects on adolescent depression
The correlation matrix and subsequent analyses reveal a strong association between various psychological disorders. Notably, the diagnosis and treatment of depression correlate positively with anxiety disorders (r = 0.552, p < 0.001) and tic disorders (r = 0.364, p = 0.001), both significant at the p < 0.01 level. Building on these correlation results, the selected neural network prediction method can effectively illustrate the relationship between living standards and psychological disorders, using depression as a focal point to explore potential factors and mediating effects related to the development of depressive symptoms. Through these analyses, we establish statistically significant relationships among the factors examined in relation to depression. In this study, individuals with depression represented 11.2% of the total sample, which is consistent with the basic patterns observed in surveys of the corresponding age groups.
Mediating effects statistical study
Mediating effects were analyzed using SPSS and Python. Intercorrelations among variables were assessed, as correlation is a prerequisite for mediation testing. Model fit was evaluated via maximum likelihood estimation, and indirect effects were tested using a bootstrap method with p < 0.01. Consistent with prior literature, anxiety and history of psychological treatment were strongly associated with depression, whereas other associations were weaker, likely due to large data volume, irrelevant information increasing entropy, missing values, and dichotomization of variables, which reduced correlation strength. We conducted multiple sampling and framed our evaluations based on Root Mean Square Error of Approximation (RMSEA) and Root Mean Square Residual (RMR) principles. 47 We considered a correlation of 0.1 or higher as indicative of a relationship, and a correlation of 0.2 or higher as significantly related. These correlations are deemed effectively relevant.48,49
Mediating effect modeling in adolescent depression
As illustrated in Figure 4, we constructed the mediating effect model based on the analysis results presented in the section “Correlation analyses.” Model 1 establishes the relationship between living standards and depression. Model 2 introduces physical activity as a mediating variable. Building on model 2, model 3 incorporates both physical activity and dietary habits as dual mediating variables.

Adolescent lifestyle and depression model based on the mediating effects of physical activity and dietary levels.
Mediators of the impact of adolescents’ living standards on depression tendency
All potential mediators in this study were identified based on existing literature on factors associated with adolescent depression and further informed by empirical analyses of the variables available in the dataset. Specifically, we constructed three composite groups: living standards, physical activity levels, and dietary levels.
Correlation between adolescents’ living standards and depression diagnosis
Through factor analysis and dimensionality reduction of multiple experiences related to living standards (adverse childhood experiences, ACE series), we calculated the living standards component for adolescents. As shown in Table 5, the correlation results indicate a positive relationship between adolescents’ living standards and depressive symptoms (K2Q32), with a correlation coefficient of r = 0.225 and p = 0.001. The variables associated with living standards include: the frequency of difficulty maintaining basic expenses (ACE1), parental divorce or separation (ACE3), the death of a parent or guardian (ACE4), parental or guardian incarceration (ACE5), and living with individuals with tendencies toward alcohol or drug abuse (ACE9).
Correlation matrix of living standards variables.
ACE: adverse childhood experience. **Denotes statistical significance at p < 0.01.
Mediating effects of adolescent physical activity levels on depression tendency
The structural equation model, fitted by maximum likelihood estimation, showed good fit. However, because the data come from a public database, the available measures of physical activity and dietary habits are coarse. Consequently, we could not investigate specific factors such as exercise duration, types of physical activities, and individual exercise preferences. To address this limitation, we selected relevant exercise-related variables and applied dimensionality reduction to form a composite factor for physical activity levels. The specific physical activity variables included: participation in extracurricular physical education courses (K2Q30), involvement in sports clubs outside of school (K2Q31), frequency of participation in physical activities over the past year (K2Q33), and the number of days engaged in physical exercise in recent weeks (PHYSACTIV). All variables were significant at p < 0.01, demonstrating their suitability for inclusion in the physical activity factor. The path coefficient for the lifestyle-physical activity-depression model is given in Table 6.
Correlation matrix for the physical activity factor.
**Denotes statistical significance at p < 0.01.
The mediating effect of dietary levels on depression in adolescents
The dietary level factor was obtained through dimensionality reduction over several diet-related variables: food consumption levels over the past year (FOODIST), participation in food voucher or supplemental nutrition assistance programs (K11Q61), and utilization of free or low-cost breakfast or lunch provided by schools (K11Q62). In the dimensionality reduction analysis, all component variables had p-values below 0.01, making them suitable for integration into the dietary level variable. The path coefficients for the lifestyle-dietary level-depression model and the lifestyle-dietary level-physical activity-depression model are shown in Figure 1.
Discussion
Hypotheses and results analysis
In the “Hypothesis” section, we progressively proposed three hypotheses. Based on our experimental results, we now analyze and discuss the validity of each hypothesis.
Hypothesis 1: Machine learning methods can assist in predicting adolescent depression risk. The experimental results show that the models achieved relatively high performance in terms of both ACC and AUC, reaching levels that could be considered clinically informative. In particular, the XGBoost model performed best among all the tested models. Moreover, through variable importance analysis, we were able to interpret the key factors influencing model predictions. Therefore, we conclude that hypothesis 1 is supported by the results.
Hypothesis 2: Socioeconomic status influences the likelihood of adolescents developing mental health issues—in this case, depression. To test this hypothesis, we selected five variables from the census dataset representing socioeconomic status as independent variables, and used them to predict whether an adolescent had been diagnosed with depression. The overall effect size was found to be 0.225. While not extremely large, given the scale and geographic diversity of the census data, this represents a meaningful impact. Thus, we consider hypothesis 2 to be valid.
Hypothesis 3: Socioeconomic status affects depression risk through mediators such as physical activity and dietary habits. As shown in the section “Mediating effect modeling in adolescent depression,” socioeconomic status significantly influenced dietary quality, and to a lesser extent, physical activity levels—both of which align with common expectations: families with better living conditions tend to place greater emphasis on nutrition and are more likely to invest in sports-related activities. These two factors, in turn, were found to have an impact on mental health, although the effect of diet was relatively weaker. Based on these findings, we confirm that hypothesis 3 is also supported.
Taken together, the findings support all of our stated hypotheses. Lifestyle factors influence adolescent depression through both a direct pathway, which accounts for 66.4% of the total effect, and indirect pathways mediated by physical activity and dietary quality, which together constitute 33.6% of the total effect. Physical activity alone explains 15.6% of the total effect and represents 46.5% of the overall mediating effect.
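The decomposition reported above can be verified with a few lines of arithmetic. The sketch uses only the percentages stated in the text; the variable names are illustrative, and the share attributed to the remaining mediated pathways is derived here as a residual rather than reported by the authors.

```python
direct_share = 0.664            # direct pathway, share of total effect
indirect_share = 0.336          # all mediated pathways, share of total effect
physical_activity_share = 0.156 # physical activity alone, share of total effect

# Direct and indirect shares should exhaust the total effect.
total_share = direct_share + indirect_share

# Physical activity as a fraction of the overall mediating effect.
pa_fraction_of_mediation = physical_activity_share / indirect_share

# Residual share of the total effect carried by the remaining mediated pathways.
other_mediated_share = indirect_share - physical_activity_share

print(round(total_share, 3))
print(round(pa_fraction_of_mediation, 3))
print(round(other_mediated_share, 3))
```

The ratio 0.156 / 0.336 is about 0.464, which agrees with the reported 46.5% up to rounding of the published percentages.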
Although the estimated path coefficients are relatively small, this is consistent with known limitations of large-scale survey data, such as measurement error, variable dichotomization, and contextual heterogeneity. Nevertheless, the statistically significant and directionally consistent mediation effects support the plausibility of the proposed behavioral pathways. These findings indicate that lifestyle conditions influence mental health both directly and indirectly. Dietary quality, for example, may promote mental well-being not only on its own but also by encouraging greater physical activity, thereby amplifying its indirect effect on depression risk.
Insights and comparison with existing work
Current approaches to identifying adolescent depression largely depend on standardized psychological scales or clinician-administered interviews. These methods, while clinically validated, face practical limitations in large-scale or preventive settings. They require trained personnel, rely heavily on subjective self-disclosure, and are often influenced by transient emotional states or social desirability bias. In contrast, our study demonstrates that machine learning models can achieve robust predictive performance using only routinely collected, nonclinical survey variables. Specifically, models such as XGBoost and logistic regression attained an ACC of 0.85 and an AUC exceeding 0.87 in distinguishing adolescents with and without a depression diagnosis. The key predictors included not only psychological indicators such as anxiety and difficulty concentrating but also physical health complaints, family dynamics, and aspects of daily functioning, all of which are easily observable and reportable without specialized training.
This finding highlights a key contribution of our work: depression risk can be reliably flagged through passive, population-level data rather than active clinical probing. By moving beyond symptom-based self-reports to objective markers of daily living conditions, our approach enables early, nonintrusive screening at scale, particularly valuable in school, community, or digital environments where traditional tools face feasibility, stigma, or resource barriers.
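For readers less familiar with the two metrics quoted above, the following minimal sketch shows how accuracy (ACC) and the area under the ROC curve (AUC) can be computed from predicted probabilities. It uses a small made-up set of labels and scores, not the study data, and the function names are hypothetical. AUC is computed here as the proportion of positive-negative pairs in which the positive case receives the higher score, which is equivalent to the area under the ROC curve.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Share of cases whose thresholded prediction matches the true label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(int(p == t) for p, t in zip(preds, y_true))
    return correct / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for score_pos in pos:
        for score_neg in neg:
            if score_pos > score_neg:
                wins += 1.0
            elif score_pos == score_neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = depression diagnosis, 0 = no diagnosis.
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
y_prob = [0.10, 0.20, 0.60, 0.30, 0.80, 0.40, 0.05, 0.90]

acc = accuracy(y_true, y_prob)
auc = roc_auc(y_true, y_prob)
print(f"ACC = {acc:.2f}, AUC = {auc:.3f}")
```

Accuracy depends on the chosen threshold and on class balance, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two are usually reported together.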
Equally important, our work moves beyond black-box prediction by incorporating mediation analysis to uncover the pathways linking environmental adversity to mental health outcomes. While many machine learning studies stop at feature importance rankings, we explicitly model how socioeconomic and familial stressors such as financial strain, parental separation, or exposure to substance use translate into depressive symptoms. Our results show that these living conditions exert both a direct effect and a substantial indirect effect mediated through lifestyle behaviors. Specifically, physical activity levels and dietary quality jointly account for over one-third of the total association, highlighting their role as modifiable buffers against environmental risk.
This integration of predictive modeling with mediation analysis represents a methodological advance. It bridges the gap between data-driven pattern recognition and theory-driven psychological understanding. Unlike purely correlational studies, our framework offers interpretable candidate mechanisms; unlike purely mechanistic studies, it examines those mechanisms alongside a high-performing predictive system. Together, these contributions provide not only a tool for identification but also a roadmap for intervention, suggesting that promoting healthy behaviors may mitigate the mental health impact of adverse living conditions, even when those conditions themselves cannot be immediately changed.
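To make the direct and indirect pathways concrete, the sketch below estimates a two-mediator model with the product-of-coefficients approach on simulated data. It is an illustration under stated assumptions, not the estimation procedure used in this study: the mediators are treated as parallel (no path from diet to activity), all variables are continuous, the data and the true coefficients are invented, and every function and variable name is hypothetical. Significance testing, for example by bootstrapping the indirect effects, is omitted.

```python
import random


def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ols_slopes(y, predictors):
    """Least-squares slopes of y on the predictors (intercept absorbed by centering)."""
    yc = center(y)
    xs = [center(p) for p in predictors]
    k = len(xs)
    # Augmented normal equations [X'X | X'y].
    a = [[dot(xs[i], xs[j]) for j in range(k)] + [dot(xs[i], yc)] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [u - factor * v for u, v in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Simulated data with invented coefficients (not the study's estimates).
rng = random.Random(0)
n = 2000
living = [rng.gauss(0, 1) for _ in range(n)]                  # X: living conditions
activity = [0.5 * x + rng.gauss(0, 1) for x in living]        # M1: physical activity
diet = [0.4 * x + rng.gauss(0, 1) for x in living]            # M2: dietary quality
outcome = [0.3 * x + 0.2 * m1 + 0.25 * m2 + rng.gauss(0, 1)   # Y: mental health score
           for x, m1, m2 in zip(living, activity, diet)]

(a1,) = ols_slopes(activity, [living])                        # X -> M1
(a2,) = ols_slopes(diet, [living])                            # X -> M2
c_prime, b1, b2 = ols_slopes(outcome, [living, activity, diet])  # direct effect, M -> Y
(c_total,) = ols_slopes(outcome, [living])                    # total effect

indirect_activity = a1 * b1
indirect_diet = a2 * b2
direct_share = c_prime / c_total
indirect_share = (indirect_activity + indirect_diet) / c_total

print(f"direct share of total effect:   {direct_share:.1%}")
print(f"indirect share of total effect: {indirect_share:.1%}")
```

In a linear model fitted by least squares, the total effect equals the direct effect plus the sum of the mediator-specific products, so the two shares add up to one. Percentages such as "direct" and "mediated" proportions of a total effect are typically obtained from a decomposition of this kind.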
The interventional role of physical exercise and activity levels in adolescent mental disorders
The second section of this study introduced a mediation effect modeling approach, and the third section presented the results of applying this framework to the empirical data, followed by a brief analysis. These findings provide partial support for our initial hypothesis regarding the mediating effect involved in adolescent depression. Using factor analysis to reduce the multiple items within each dimension, we derived three descriptive variables: lifestyle, physical activity level, and dietary level. The mediation analysis shows that adolescent lifestyle is significantly associated with mental health, particularly depression, with a direct effect accounting for about two-thirds of the total association and the remaining one-third mediated jointly by physical activity and dietary quality. These findings are consistent with the evidence cited in the Introduction, which highlights physical activity and healthy dietary patterns as protective factors against adolescent depression.
The results also suggest that adverse living conditions may indirectly increase depression risk by limiting engagement in these health-promoting behaviors. Notably, physical activity accounts for nearly half of the total indirect effect, underscoring its potential as a key modifiable factor, particularly for adolescents from disadvantaged backgrounds.
Limitations and suggestions for future studies
This study has several limitations. The data were drawn from a fixed, publicly available U.S. census-type database. As a secondary source, it did not permit access to additional details, refinement of variables, or longitudinal follow-up. Consequently, some item descriptions lacked completeness and methodological rigor. The sample was drawn from the U.S. adolescent population as a whole, which enhances national representativeness but may limit applicability to specific regions or subgroups. Furthermore, the cross-sectional design, based on a single time point, precluded temporal analysis of symptom trajectories and made causal inference unfeasible. Many measures also relied on passively recorded items with limited response options, such as binary choices, which may reduce measurement sensitivity and weaken observed associations.
Future research should incorporate longitudinal data collection to track changes over time and enable more robust causal inference. Developing evaluative models using diverse time-series data would improve both accuracy and clinical relevance. Additionally, primary data collection with refined, multilevel items could address the limitations inherent in passive, pre-existing database records.
Conclusions
Establishing a census-based database plays a crucial role in supporting the physiological and psychological development of adolescents. Machine learning models built on these census data can effectively assess and predict mental health status, facilitating risk evaluation and early intervention. The mediation modeling, supported by qualitative and quantitative analyses, highlights how disparities in lifestyle are significantly associated with adolescent depression, and shows that both physical activity and dietary levels partially mediate this relationship. By improving adolescents’ lifestyle choices and effectively guiding their exercise and dietary behaviors, we can help prevent and alleviate depressive symptoms to a certain extent. This strategic focus not only addresses existing issues but also promotes overall well-being in the adolescent population.
Footnotes
Acknowledgments
We are grateful to all participants in our study.
Ethical considerations
This study is based on data from a publicly available database, which has been previously deidentified and approved for academic research purposes. The data were released by the U.S. Census Bureau under approval number CBDRB-FY21-POP001-0161. The U.S. Census Bureau reviewed this data product for unauthorized disclosure of confidential information and approved the disclosure avoidance practices applied to this release. As the data are publicly available and contain no personally identifiable information, this study was exempt from institutional review board approval. As no live experiments were conducted and the data used are anonymized, this research does not require additional ethical approval or informed consent.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by the Jiangxi Provincial Department of Education Teaching Reform Project: Research on Physical Education Curriculum and Teaching Models Based on the Current Situation of Physical Fitness among University Students in Jiangxi under the OBE Concept (Project No. JXJG-22-16-15).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Appendix
Selected variables and brief explanations.
| Category | Count | Codes | Explanation |
|---|---|---|---|
| Basic information | 10 | A1_AGE, A1_MENTHEALTH, A1_EMPLOYED, A1_PHYSHEALTH, A2_AGE, A2_EMPLOYED, SC_AGE, SC_SEX, SC_ENGLISH, SC_RACER | Basic information on the sampled adolescents and their guardians |
| Child health status | 15 | K2Q01, K2Q30A, K2Q31A, K2Q32A, K2Q33A, K2Q38A, K2Q40A, MEMORYCOND, STOMACH, TOOTHACHE, HEADACHE, PHYSICALPAIN, OVERWEIGHT, ALLERGIES, BREATHING | The health conditions (psychological and physiological) and disease history of teenagers |
| Family and community | 15 | ACE1, ACE3, ACE4, ACE5, ACE6, ACE7, ACE8, ACE9, K10Q12, K10Q13, K11Q60, K11Q61, K11Q62, FAMCOUNT, FAMILY_R | Factors related to the family's economic situation (including housing, dietary levels, etc.) and the community environment |
| Child development | 12 | K6Q15, K6Q20, K6Q60_R, K7Q02R_R, K7Q30, K7Q31, K7Q32, K7Q33, K7Q37, BEDTIME, PHYSACTIV, STARTSCHOOL, MAKEFRIEND | The development of teenagers’ comprehensive qualities such as education and sports |
| Healthcare utilization | 8 | K5Q40, K5Q41, DOCPRIVATE, NOTOPEN, HCCOVOTH, ALTHEALTH, ARRANGEHC, COVIDARRANGE | The situation of health security, such as medical insurance, regular medical treatment, physical examinations, etc. |
