Abstract

Socioeconomic status (SES) measures one’s access to social resources across various dimensions. Traditionally, studies on SES commonly use principal component analysis (PCA), a data-driven method, to condense these dimensions into components, typically selecting the first component to represent SES. However, PCA may lack specificity for particular outcomes. Decision tree analysis (DTA), a knowledge-driven approach that identifies outcome-specific dimensions, may address PCA’s weaknesses but might not comprehensively capture SES. This study hypothesized that combining DTA and PCA to create SES predictors could enhance predictive accuracy more than using PCA alone could. It also explored whether the DTA-PCA combination, incorporating only significant loading indicators (SLIs) of the first component, could simplify SES predictors without compromising predictive accuracy. The study analyzed 12 SES indicators from the Study of Mothers’ and Infants’ Life Events Affecting Oral Health (SMILE) birth cohort study, involving 2,182 children. Five SES composites were created: 1 solely from DTA-identified indicators and 2 pairs combining values from either the entire first PCA component or SLIs with and without DTA. These composites served as predictors for predicting dental caries in 5 predictive models. Model accuracy was evaluated using root mean squared error with 5-fold cross-validation. SES composites derived from the DTA-PCA combination demonstrated superior predictive accuracy compared with those from the PCA-only approach. By incorporating only SLIs, this hybrid method generated SES predictors that not only outperformed those using the entire first component but also demonstrated noninferiority relative to the DTA-only method. This approach offers a promising framework for developing SES composites to predict dental caries, potentially improving the precision of predictive models. 
In addition, this method offers a practical framework for creating composite predictors from multi-item measurements across various outcomes. For future research using this method, a 3-step process is recommended: (1) identify relevant items using DTA, (2) determine their weights through PCA, and (3) generate a composite using the SLIs.

Keywords

dental caries, social class, hybrid models, principal component analysis, decision tree analysis, cohort studies

Introduction

Socioeconomic status (SES), defined as one’s access to social and economic resources (Antonoplis 2023), is consistently reported as a macro-level factor that significantly influences nearly all health-related outcomes (Jones et al. 2019). SES studies typically involve multiple dimensional measurements, which are ultimately combined into an SES composite. Principal component analysis (PCA), a data-driven approach, is commonly used to reduce data dimensionality into a set of linearly uncorrelated principal components. These components are ordered by the amount of variance they capture, with the first component capturing the most information (Sarstedt and Mooi 2019; Gewers et al. 2021). Researchers often select the first component as the SES composite for further analysis.

However, PCA faces challenges in achieving specificity for predicting particular outcomes of interest (Vieira et al. 2022; Jaadi 2023), as different sets of indicators can yield different components (Antonoplis 2023). The selection of indicators for creating an SES composite remains controversial and lacks consensus (Long and Renbarger 2023). Therefore, it is crucial to seek alternative methods for developing SES composites that provide better predictive accuracy than PCA (Yost et al. 2001; Antonoplis 2023).

The recently introduced decision tree analysis (DTA) identifies SES indicators using a knowledge- and experience-based approach. DTA pinpoints relevant indicators by revealing the mechanisms linking SES to a specific outcome of interest (Antonoplis 2023). While this method addresses the specificity weakness of PCA, it advocates for the individual use of SES indicators, which may limit their comprehensive predictive ability (Long and Renbarger 2023).

This study hypothesized that the DTA-PCA combination could yield SES components with better predictive accuracy for caries experience at age 5 than using PCA only. In addition, it aimed to evaluate whether the DTA-PCA combination, incorporating only a subset of indicators that significantly load onto the first PCA component, demonstrates better accuracy than using the entire first component and noninferior predictive accuracy compared with DTA alone.

The Study of Mothers’ and Infants’ Life Events Affecting Oral Health (SMILE) provides 12 key SES indicators along with clinical examinations for decayed, missing, and filled surfaces (dmfs) at age 5, offering a valuable opportunity to address these hypotheses. The ultimate goal of this study is to identify strategies for creating a robust SES predictor for specific outcomes. The analyses involved (1) creating different SES composites using data informed by 3 approaches: DTA only, PCA only, and PCA with inputs from DTA; (2) evaluating the predictive accuracy of the created SES composites by applying predictive regression models; and (3) validating the predictive accuracy of these SES composites using K-fold cross-validation (Winoto and Roy 2023).

Methods

This study conformed to the STROBE guidelines.

SMILE

The SMILE study, funded by the National Health and Medical Research Council (NHMRC) project grant 1046219 (2013–2016), received ethical approval from the Southern Adelaide Clinical Human Research Ethics Committee (HREC), South Australian Women and Children Health Network HREC, and clinical governance clearance from the participating maternity hospitals (Do et al. 2014). SMILE recruited 2,182 mother-infant dyads from Adelaide’s 3 largest public hospitals between mid-2013 and mid-2014, exceeding the target sample of 1,700. Mothers consented within 48 hours postpartum, with incentives provided. More details of participant recruitment can be found elsewhere (Do et al. 2014). Follow-ups occurred at 3, 6, 12, 24, and 60 months, with participant responses detailed in Appendix Table 2. The study followed approved protocols for human subjects and did not involve animal samples.

SES indicators

At baseline, mothers or caregivers answered 21 SES-related questions adapted from validated items used in the Australian National Child Oral Health Study (Do et al. 2016). Eleven questions were grouped into 7 child/family indicators, while the remaining 10 formed 5 parental indicators, as detailed in Appendix Table 1. The final 12 SES indicators and their distributions are presented in Appendix Table 2.

Dental caries experience at age 5

Oral epidemiological examinations were conducted at 2 and 5 years of age to score for dmfs. The dmfs score was measured as count data, with each tooth surface that was decayed, missing, or filled receiving a score of 1. The total dmfs score is the sum of these scores across affected surfaces. This study used the dmfs score at age 5 for the analysis.

Study Design

This study was conducted in 3 stages: first, DTA and PCAs (with and without DTA inputs) were conducted to create SES components; second, 5 SES composites were developed from these components; and third, predictive models were fitted and evaluated with K-fold cross-validation to determine the best SES composite predictor for dmfs scores (Figure 1).

Figure 1.

Flow chart of study design.

Stage 1: Creating SES components

The 12 SES indicators from the SMILE study were analyzed and categorized using 3 different methods: DTA, PCA, and DTA and PCA combination (DTA-PCA) to create SES components.

DTA

DTA was undertaken with 4 main steps of pathway analysis including specifying dmfs at age 5 as the outcome, exploring the main pathways/mediators linking SES to dmfs, translating the identified pathways into directed acyclic graphs, and matching the 12 available SES indicators from the SMILE study to each of the main mediators. Indicators that matched well with the identified mediators were kept for creating the SES-DTA composite in stage 2 and served as inputs for PCA.

PCA with and without DTA inputs

Two groups of PCA were conducted: PCA with DTA and PCA without DTA. In PCA with DTA, only the indicators identified by DTA were entered into the PCA, while in PCA without DTA, all 12 SMILE indicators were used. Both PCAs followed the same 5 main steps (Jaadi 2023).

1. Standardize values of the 12 SES indicators.

2. Compute the covariance matrix of these 12 SES indicators.

3. Estimate eigenvalues to obtain a preliminary idea of how many components should be retained (eigenvalues ≥ 1).

4. Evaluate 3 criteria to decide the number of components to retain: the cumulative variance (≥0.5), the unexplained variance (≤0.5), and sampling adequacy (Kaiser-Meyer-Olkin ≥0.5).

5. Obtain the factor loadings of the indicators on the first component and identify which indicators load significantly on this component (factor loadings ≥ 0.3).
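
The 5 steps above can be sketched in a few lines of NumPy. This is an illustration on synthetic data, not the authors' code or the SMILE data: the function name `pca_first_component`, the simulated indicators, and the random seed are all hypothetical. Loadings are taken here as the first eigenvector scaled by the square root of its eigenvalue, which is an assumption; some software reports the unit-length eigenvector instead, and the source does not say which convention was used.

```python
import numpy as np

def pca_first_component(X, loading_cutoff=0.3):
    """Run the 5 PCA steps on an (n_samples, n_indicators) array."""
    X = np.asarray(X, dtype=float)
    # Step 1: standardize each indicator to mean 0 and SD 1.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 2: covariance of standardized data (equals the correlation matrix).
    C = np.cov(Z, rowvar=False)
    # Step 3: eigenvalues in decreasing order; count those >= 1.
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    n_retained = int(np.sum(eigvals >= 1.0))
    # Step 4: cumulative variance and the Kaiser-Meyer-Olkin (KMO) measure.
    cumulative_variance = np.cumsum(eigvals) / eigvals.sum()
    inv = np.linalg.inv(C)
    partial = -inv / np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    off_diag = ~np.eye(C.shape[0], dtype=bool)
    r2 = np.sum(C[off_diag] ** 2)        # squared correlations
    p2 = np.sum(partial[off_diag] ** 2)  # squared partial correlations
    kmo = r2 / (r2 + p2)
    # Step 5: loadings on the first component and the significant ones.
    loadings = eigvecs[:, 0] * np.sqrt(eigvals[0])
    if loadings.sum() < 0:               # eigenvector sign is arbitrary
        loadings = -loadings
    sli = np.flatnonzero(np.abs(loadings) >= loading_cutoff)
    return {
        "eigenvalues": eigvals,
        "n_retained": n_retained,
        "cumulative_variance": cumulative_variance,
        "kmo": kmo,
        "loadings": loadings,
        "sli": sli,
    }

# Hypothetical data: 4 indicators driven by one latent factor, 2 unrelated.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
related = latent + 0.5 * rng.normal(size=(500, 4))
unrelated = rng.normal(size=(500, 2))
X = np.hstack([related, unrelated])

result = pca_first_component(X)
print("loadings:", np.round(result["loadings"], 2))
print("significant loading indicators:", result["sli"])
```

In this toy setup the 4 related indicators load strongly on the first component, while the 2 unrelated ones fall below the 0.3 cutoff, which mirrors how the significant loading indicators are separated from the rest.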

Stage 2: SES composites

In this stage, 5 SES composites were created based on the SES components derived from the DTA, PCA, and combined DTA-PCA approaches. Specifically, 1 composite was created by summing the values of the indicators identified solely by DTA; 2 composites were created by summing all indicators entered into PCA (12 without DTA inputs, 10 with DTA inputs), weighted by their factor loadings on the first component; and 2 composites were created using only the significant loading indicators of the first component, again with and without DTA inputs.
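
As a rough sketch of how such composites can be computed, the function below forms a loading-weighted sum of standardized indicator values and optionally restricts it to a subset of indicators. The function name `ses_composite`, the toy values, and the loadings are hypothetical, and the use of standardized values for the unweighted DTA-only sum is an assumption not stated in the text.

```python
import numpy as np

def ses_composite(Z, loadings=None, keep=None):
    """Weighted sum of indicator values per participant.

    Z        : (n_samples, n_indicators) standardized indicator values
    loadings : first-component loadings; None means an unweighted sum
    keep     : indices of indicators to include; None means all
    """
    Z = np.asarray(Z, dtype=float)
    n_indicators = Z.shape[1]
    weights = np.ones(n_indicators) if loadings is None else np.asarray(loadings, dtype=float)
    if keep is not None:
        mask = np.zeros(n_indicators, dtype=bool)
        mask[np.asarray(keep)] = True
        weights = np.where(mask, weights, 0.0)  # drop excluded indicators
    return Z @ weights

# Hypothetical standardized values for 2 participants and 3 indicators.
Z = np.array([[1.0, -0.5, 0.2],
              [0.0, 1.0, -1.0]])
loadings = np.array([0.6, 0.4, 0.1])              # hypothetical loadings
significant = np.flatnonzero(np.abs(loadings) >= 0.3)

unweighted = ses_composite(Z)                      # plain sum of indicators
weighted_all = ses_composite(Z, loadings)          # all indicators, weighted
weighted_sli = ses_composite(Z, loadings, significant)  # SLIs only
```

The three calls correspond to the three kinds of composite described above: an unweighted sum, a sum over all indicators weighted by first-component loadings, and a weighted sum over significant loading indicators only.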

Stage 3: Predictive regression modeling, model evaluation, and validation

Predictive regression models

The 5 SES composites created in stage 2 were used as predictors in 5 predictive models in which dmfs at age 5 y served as the outcome. Given that 76.6% of dmfs scores were zero, zero-inflated regression models were applied. Sugar intake trajectories, plaque index at ages 2 and 5 y, and the number of dental visits in the first 5 y of a child’s life were controlled for in the models. Sugar intake was categorized into “low and moderate increase” and “high increase” based on group-based trajectory modelling (Ha et al. 2023). The plaque index, used as an indicator of oral hygiene (Toledo Reyes et al. 2023; Ugolini et al. 2023), was classified into 4 levels from 0 (no plaque) to 3 (abundance of plaque), and the number of dental visits was used as a cumulative count variable. Only participants with non-missing values for these study variables were included in the regression models.
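
For readers unfamiliar with zero-inflated models, the general form can be written out as follows. The text does not state whether the count part was Poisson or negative binomial, so the zero-inflated Poisson form below is an assumption chosen for illustration; the symbols $\pi_i$, $\mu_i$, $x_i$, $z_i$, $\beta$, and $\gamma$ are introduced here and do not appear in the source.

$$
P(Y_i = 0) = \pi_i + (1 - \pi_i)\, e^{-\mu_i}, \qquad
P(Y_i = k) = (1 - \pi_i)\, \frac{e^{-\mu_i} \mu_i^{k}}{k!}, \quad k = 1, 2, \dots
$$

$$
\operatorname{logit}(\pi_i) = z_i^{\top} \gamma, \qquad
\log \mu_i = x_i^{\top} \beta, \qquad
E[Y_i] = (1 - \pi_i)\, \mu_i
$$

Here $Y_i$ is the dmfs score of child $i$, $\pi_i$ is the probability of belonging to the excess-zero group, and $\mu_i$ is the mean of the count process. The covariate vectors $x_i$ and $z_i$ would contain the SES composite and, in adjusted models, the controlling variables.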

Model evaluation

Root mean squared error (RMSE) is a common metric to evaluate the accuracy of the predictive models due to its sensitivity to errors and its intuitive interpretation (Max Kuhn 2013). RMSE measures the average magnitude of prediction errors by calculating the square root of the mean squared error (MSE) between predicted and observed values. This approach gives more weight to larger errors, which helps in identifying models that may underperform on extreme values. RMSE is expressed in the same units as the response variable, making it easy to understand. A lower RMSE value indicates a better model fit and less prediction error. RMSE is particularly suited for continuous outcome variables and is commonly used in regression analyses (Chai and Draxler 2014). For this study, RMSE was estimated postmodeling to assess the predictive accuracy of SES predictors.

The equations of MSE and RMSE are as follows:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$$

where $n$ represents the number of observations, $y_i$ the observed value of the outcome, and $\hat{y}_i$ the predicted value of the outcome.
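
A minimal Python sketch of these two formulas follows. The function names and the small set of observed and predicted values are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the outcome."""
    return float(np.sqrt(mse(y_true, y_pred)))

# Hypothetical dmfs-like counts for 4 children.
observed = [0, 0, 3, 10]
predicted = [1, 0, 2, 6]

# Errors are -1, 0, 1, 4; squared errors 1, 0, 1, 16; their mean is 4.5.
example_mse = mse(observed, predicted)
example_rmse = rmse(observed, predicted)
```

The single error of 4 contributes 16 of the 18 squared-error units, which illustrates why RMSE weights large errors more heavily than small ones.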

K-fold cross-validations

K-fold cross-validation is a technique in which the entire data set is randomly divided into k folds of approximately equal size. Each fold is used once as the test set, while the remaining k − 1 folds are used for training. This approach helps to validate the reliability of predictive regression models by reducing the bias associated with a single train-test split and providing a more stable estimate of model performance through multiple train-test splits (Max Kuhn 2013).

In this study, the SMILE dataset was split into 5 folds, and RMSE values were calculated for each test set based on the training set parameters. The average RMSE values from the 5 test sets were compared. The model with the lowest average RMSE was deemed the most accurate for predicting dmfs at age 5.
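
The procedure can be sketched as below. To keep the example self-contained, an ordinary least-squares line stands in for the zero-inflated regression used in the study, which is a deliberate simplification; the function name `k_fold_rmse`, the simulated data, and the seeds are hypothetical.

```python
import numpy as np

def k_fold_rmse(x, y, k=5, seed=0):
    """Return the test-set RMSE of each of k folds.

    A least-squares linear model is a stand-in for the study's
    zero-inflated regression (an assumption for illustration).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)  # random, near-equal folds
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on the k - 1 training folds (intercept plus slope).
        A_train = np.column_stack([np.ones(len(train_idx)), x[train_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        # Predict the held-out fold using the training-set parameters.
        A_test = np.column_stack([np.ones(len(test_idx)), x[test_idx]])
        pred = A_test @ coef
        scores.append(np.sqrt(np.mean((y[test_idx] - pred) ** 2)))
    return np.array(scores)

# Hypothetical predictor and outcome with noise SD 0.5.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2.0 + 1.5 * x + rng.normal(scale=0.5, size=200)

fold_scores = k_fold_rmse(x, y, k=5)
mean_rmse = fold_scores.mean()  # the quantity compared across models
```

With 5 folds, each model is trained on about 80% of the complete cases, which is consistent with the training-set sizes reported alongside the cross-validated RMSE values.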

Results

DTA

Appendix Figure 2 illustrates the use of DTA to identify pathways linking SES to dmfs at age 5 y. Three primary mediators were identified between SES and dmfs including the disadvantages of the area where the child lives (Armfield 2007), the financial and health care constraints of the family (Kumar et al. 2014), and the family’s cultural and educational disparities (Pezo Lanfranco and Eggers 2012). Among 12 SES indicators from the SMILE study, 10 were found to closely contribute to these mediators and were thus retained for creating SES composites and serving as inputs for PCA.

PCA

The number of responses for each of the 12 SES indicators, along with their distributions (mean, standard deviation, minimum, and maximum values) before and after standardization, are presented in Appendix Table 5. The correlation matrix among the 12 indicators (Appendix Table 6) revealed significant correlations mainly among health insurance, income, occupation, education, English as a first language, attitude about family financial status, and the number of parents in the household.

PCA results, with and without DTA inputs, consistently indicated that 3-component models were more suitable than 2-component models. The eigenvalues of the first 3 components (Appendix Table 7) were all ≥1, and scree plots (Appendix Fig. 3) showed stabilization around an eigenvalue of 1. Model fit metrics (Appendix Table 8) also favored 3-component models, with cumulative variances meeting the >0.5 threshold better than those of 2-component models (PCA only: 0.58 vs. 0.46; DTA-PCA: 0.62 vs. 0.51). In addition, the unexplained variances for 3-component models met the <0.5 threshold better than those of 2-component models (PCA only: 0.42 vs. 0.54; DTA-PCA: 0.32 vs. 0.41). Consequently, only the first components of the 3-component PCA models were used for further analyses.

In PCA without DTA, 6 of the 12 indicators significantly loaded into the first component, while in PCA with DTA, 5 of the 10 indicators did so. Details of these indicators and their factor loadings are presented in Figure 2 and Appendix Table 9.

Figure 2.

Creating socioeconomic status (SES) components and composites (principal component analysis).

SES composite creations

Figure 2 visualizes the creation of 5 SES composites: SES-DTA, which is a composite of the 10 indicators identified by DTA; SES-all (PCA only), which sums 12 indicators weighted by their factor loadings from PCA; SES–significant loading indicator (SLI) (PCA only), a composite of 6 significant loading indicators weighted by PCA factor loadings; SES-all (DTA-PCA), which includes 10 indicators identified by DTA, weighted by factor loadings from DTA-PCA; and SES-SLI (DTA-PCA), a composite of 5 significant loading indicators weighted by DTA-PCA factor loadings. These 5 SES composites were used as predictors for fitting predictive regression models.

Predictive Regression Models

The distribution of dmfs at age 5 was highly skewed, with a skewness of 6.45 and kurtosis of 58.64. In total, 76.6% of the observations scored 0, while the mean dmfs was 1.34 with a standard deviation of 4.54 and a range from 0 to 60 (Table 1, Appendix Fig. 1 and Appendix Table 3). Five zero-inflated regression models were first fitted without controlling for confounders and then refitted with adjustment for them. Since participants had the right to decline participation at any round or stage of the survey or to refuse to answer any question, only complete cases for both SES information and clinical examination at age 5 were included in each of these 5 regressions.

Table 1.

Distribution of SES Predictors, Confounders, and Outcome of the Predictive Models.

Approach or variable	SES composite	n	Mean	SD	Min	Max
SES predictors
DTA only	SES-DTA	1,842	0.62	4.80	−15.80	11.17
PCA only	SES-all	1,832	1.92^a	1.73	−5.79	4.23
PCA only	SES-SLI	1,873	0.15	1.51	−4.42	3.63
DTA-PCA combination	SES-all	1,842	2.35^b	1.72	−5.84	3.73
DTA-PCA combination	SES-SLI	1,878	0.15	1.46	−4.19	3.23
Controlling variables
Trajectories of sugar intake (1, 2, and 5 y)		1,395	0.87	0.34	0	1
Plaque index at 2 y of age		1,041	0.40	0.55	0	2
Plaque index at 5 y of age		866	1.22	0.64	0	3
Number of dental visits in the first 5 y of life		928	2.61	1.69	1	8
Outcome
Decayed, missing, and filled surfaces (dmfs) score at age 5 y		830	1.34^c	4.54	0	60

DTA, decision tree analysis; PCA, principal component analysis; SD, standard deviation; SES, socioeconomic status; SES-all, SES of 12 indicators; SES-SLI, SES distilled from only 5 significant loading SES indicators.

^a ×10⁻⁹.

^b ×10⁻¹⁰.

^c Skewness: 6.45, kurtosis: 58.64, median: 0, 75th percentile: 0.

Of the 2,182 participants at baseline, 1,842 completed the SES indicators for the DTA-only composite, while 1,832 and 1,873 completed PCA-only indicators for SES-all and SES-SLI, respectively. For the DTA-PCA combination, 1,842 and 1,878 participants completed the SES-all and SES-SLI indicators, respectively (Table 1). Among the 830 participants clinically examined at age 5, 761, 758, 765, 761, and 767 had complete SES information for these 5 SES composites, respectively (Table 2 and Appendix Table 4). Comparing participants who remained at age 5 y with the baseline sample showed that those who remained had higher standardized SES composite scores (0.55 vs. 0.15; P < 0.001, t test) (Appendix Table 10).

Table 2.

Root Mean Square Errors of 5 Predictive Models.

SES Composite		Predictive Model
		Entire Data				K-Fold Cross-Validation
		Unadjusted		Adjusted		Unadjusted		Adjusted
		N	RMSE	N	RMSE	n	RMSE	n	RMSE
DTA only	SES-DTA	761	2.52	586	1.68	609	2.53	469	1.69
PCA only	SES-all	758	2.04	583	1.72	606	2.03	466	1.73
PCA only	SES-SLI	765	1.88	590	1.73	612	1.87	472	1.74
DTA-PCA combination	SES-all	761	1.97	586	1.70	609	1.94	469	1.72
DTA-PCA combination	SES-SLI	767	1.86	592	1.67	614	1.83	474	1.69

DTA, decision tree analysis; DTA-PCA, combination between DTA and PCA; N, sample size of the entire data set; n, sample size of the train set; PCA, principal component analysis; RMSE, root mean square error; SES, socioeconomic status; SLI, significant loading indicator.

The results showed that all 5 SES composites significantly predicted dmfs at age 5 (Fig. 3). However, the DTA-PCA combination consistently demonstrated better predictive accuracy, with lower RMSE values (Table 2). For SES-all, the RMSE was 1.97 versus 2.04 (unadjusted) and 1.70 versus 1.72 (adjusted). For SES-SLI, it was 1.86 versus 1.88 (unadjusted) and 1.67 versus 1.73 (adjusted). SES composites from DTA-PCA using only significant loading factors of the first component showed even better accuracy than those using the entire component. Notably, this composite outperformed SES-DTA in the unadjusted model (RMSE: 1.86 vs. 2.52) and was comparable in the adjusted model (RMSE: 1.67 vs. 1.68).
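
To put these differences on a common scale, the short sketch below computes the relative RMSE reduction of the DTA-PCA composites over their PCA-only counterparts, using the entire-data RMSE values from Table 2. The dictionary layout and variable names are hypothetical, and the percentages are a descriptive restatement of the table rather than an additional analysis from the study.

```python
# RMSE values from Table 2 (entire data): (PCA only, DTA-PCA combination).
rmse_pairs = {
    ("SES-all", "unadjusted"): (2.04, 1.97),
    ("SES-all", "adjusted"): (1.70 + 0.02, 1.70),  # 1.72 vs. 1.70
    ("SES-SLI", "unadjusted"): (1.88, 1.86),
    ("SES-SLI", "adjusted"): (1.73, 1.67),
}

# Relative reduction in RMSE (percent) when DTA inputs are added to PCA.
reduction_pct = {
    key: 100.0 * (pca_only - hybrid) / pca_only
    for key, (pca_only, hybrid) in rmse_pairs.items()
}

for (composite, model), pct in reduction_pct.items():
    print(f"{composite} ({model}): {pct:.1f}% lower RMSE with DTA-PCA")
```

All four reductions are positive but small in relative terms, so the comparison is best read alongside the cross-validated values rather than on its own.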

Figure 3.

Results of regression models using 5 SES composites to predict dmfs at the age of 5 y. DTA, decision tree analysis; DTA-PCA, combination between DTA and PCA; PCA, principal component analysis; SES, socioeconomic status; SLI, significant loading indicator.

K-fold cross-validations

Results of the regression models using the entire data set and those using split data (K-fold cross-validation) (Table 2, Appendix Tables 10 and 11) supported these findings, reinforcing that SES composites from the DTA-PCA combination exhibited superior predictive accuracy compared with PCA alone. RMSE values for SES-all and SES-SLI were consistently lower for DTA-PCA than for PCA only, in both unadjusted and adjusted models. In addition, DTA-PCA composites with significant loading indicators showed better predictive accuracy than those using the entire first component and noninferior accuracy compared with SES-DTA in the adjusted models.

Discussion

This study investigated effective strategies for creating robust SES predictors to forecast dental caries. Five SES composites were developed using different methods: DTA alone, PCA with and without DTA, and combinations including either the entire first component or only its significant loading indicators. The results indicated that the SES composites created from the DTA-PCA combination exhibited better predictive accuracy than those from the PCA-only method. Specifically, using only significant loading indicators from the DTA-PCA combination yielded better accuracy than using the entire first component and was comparable with the DTA-only model.

These findings support the hypothesis that combining a data-driven method and expert knowledge enhances the model accuracy, aligning with recent trends in predictive modeling (Hasidi et al. 2024). Incorporating only significant loading indicators maintains model accuracy while simplifying it, supporting the parsimony approach (Antonoplis 2023) and the recommendation for strategic item selection while employing multi-item SES measurements (Long and Renbarger 2023).

Traditional methods for creating SES composites struggle with weighting indicators (Office of Behavioral and Social Sciences Research Access 2024, https://obssr.od.nih.gov/sites/obssr/files/Measuring-Socioeconomic-Status.pdf). While PCA addresses this issue (Vieira et al. 2022), it often falls short in creating outcome-specific composites (Vyas and Kumaranayake 2006). Although DTA addresses PCA’s limitations by incorporating expert judgment, it may not capture the full variance of the study variables (Antonoplis 2023). This study demonstrates that combining DTA and PCA effectively tackles these challenges, providing a nuanced understanding of SES and its impact on dental caries (Vyas and Kumaranayake 2006; Max Kuhn 2013).

Previous SES research has typically used either data-driven (Gewers et al. 2021; Klosterman 2021) or knowledge-driven approaches (Antonoplis 2023). This study extends the field by showing that a hybrid approach offers superior predictive accuracy compared with the data-driven method alone, supporting the growing recognition of hybrid strategies in machine learning (Li and Chu 2023). It also reaffirms the importance of the “big three” SES measures (income, education, and occupation), which have been commonly reported over time (Shavers 2007; Majumder 2021) and were significant in both the PCA and DTA approaches.

The study’s strength lies in advocating for the hybrid DTA-PCA approach, which achieved comparable predictive accuracy to DTA alone but with fewer indicators. This combination leverages the knowledge-driven strengths of DTA while enhancing efficiency through PCA. In addition, k-fold cross-validation ensured robust model evaluation, mitigating overfitting and underfitting risks (Max Kuhn 2013). The consistent predictive accuracy observed in both standard and 5-fold cross-validated models underscores the effectiveness of this approach.

DTA, as a knowledge-driven strategy, is particularly suitable for studies such as ours, in which prior knowledge about relevant features is available (Antonoplis 2023). It incorporates this knowledge to create a pathway linking SES exposure to Early Childhood Caries (ECC) outcomes, ensuring only meaningful SES indicators within the pathway are included. In contrast, random forest, a purely data-driven method, relies on multiple randomly created decision trees to identify patterns and is better suited for cases lacking prior knowledge of relevant features (Klosterman 2021).

The use of only significant loading indicators for creating SES composites allows for efficient questionnaire design and provides guidance on essential indicators in contexts in which extensive data collection is challenging. Specifically, this study recommends including at least 5 key SES indicators (income, occupation, education, work, and health insurance) for predicting dental caries.

The study’s birth cohort design faced participant attrition, leading to differences between those who remained and those who dropped out. However, this likely had minimal impact, as low-SES mothers were oversampled at baseline (Do et al. 2014). Evidence supports this interpretation, as the dmfs score at age 5 y in SMILE (1.34, 95% confidence interval [CI]: 1.0–1.6) aligns with that of the South Australian population (1.40, 95% CI: 1.0–1.6), with comparable proportions of children affected (23.4%, 95% CI: 20.6–26.4 vs. 25.3%, 95% CI: 20.5–30.8) (Appendix Table 12) (Do et al. 2016). In addition, the remaining sample sizes used for modeling were verified with G*Power software (Faul et al. 2009) (Appendix Fig. 4), achieving 100% statistical power across all models (Appendix Tables 13–16).

Future research should adopt a 3-step process to develop composite predictors from multidimensional measurements: first, identify relevant indicators using DTA; second, determine their weights with PCA; and third, generate a composite predictor using significant loading items. In addition, the model, developed in a low-caries population, should be validated in high-risk groups for generalizability.

Conclusion

The DTA-PCA combination, using significant loading indicators, offers a robust method for creating accurate SES composites to predict dental caries. This approach achieved performance similar to the decision tree model with fewer indicators. It enhances SES measurement precision and provides a practical framework for developing composite predictors from multi-item measures.

Author Contributions

A.T.M. Dao, contributed to conception, design, data acquisition, analysis, and interpretation, drafted and critically revised the manuscript; L.G. Do, N. Stormon, H.V. Nguyen, D.H. Ha, contributed to conception, design, data interpretation, critically revised the manuscript. All authors gave final approval and agree to be accountable for all aspects of the work.

Supplemental Material

Supplemental material, sj-docx-1-jdr-10.1177_00220345251324494 for Enhancing Socioeconomic Status Prediction for Cavities: A Hybrid Method by A.T.M. Dao, L.G. Do, N. Stormon, H.V. Nguyen and D.H. Ha in Journal of Dental Research

Footnotes

Acknowledgements

We would like to express our gratitude to the research team of the SMILE for their efforts in securing the study grant and coordinating and implementing the surveys. We also thank the SMILE participants for providing information and taking part in clinical examinations. In addition, we would like to thank the Oral Health Centre (OHC) at the University of Queensland for their technical support of the present manuscript.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study used data from the Study of Mothers’ and Infants’ Life Events Affecting Oral Health (SMILE), which was conducted under Project Grant APP1161581 supported by the NHMRC. The authors received no financial support for authorship and/or publication of this article.

A supplemental appendix to this article is available online.

References

Antonoplis. 2023. Studying socioeconomic status: conceptual problems and an alternative path forward. Perspect Psychol Sci. 18(2):275–292.

Armfield. 2007. Socioeconomic inequalities in child oral health: a comparison of discrete and composite area-based measures. J Public Health Dent. 67(2):119–125.

Chai, Draxler RR. 2014. Root mean square error (RMSE) or mean absolute error (MAE)? – Arguments against avoiding RMSE in the literature. Geosci Model Dev. 7:1247–1250.

Do, Luzzi, Roberts-Thomson, Chrisopoulos, Armfield, Spencer AJ. 2016. In: Do, Spencer, editors. Oral health of Australian children: the National Child Oral Health Study 2012–14. Adelaide: University of Adelaide Press. Chapter 5, p. 288–305.

Do, Scott, Thomson, Stamm, Rugg-Gunn, Levy, Wong, Devenish, Spencer. 2014. Common risk factor approach to address socioeconomic inequality in the oral health of preschool children – a prospective cohort study. BMC Public Health. 14:429. doi:10.1186/1471-2458-14-429

Faul, Erdfelder, Buchner, Lang A-G. 2009. Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses. Behav Res Methods. 41(4):1149–1160.

Gewers, Ferreira, De Arruda, Silva, Comin, Amancio, Costa LDF. 2021. Principal component analysis: a natural approach to data exploration. ACM Comput Surv. 54(4):1–34.

Ha, Nguyen, Bell, Devenish-Coleman, Golley, Thomson, Manton, Leary, Scott, Spencer, et al. 2023. Trajectories of child free sugars intake and dental caries – a population-based birth cohort study. J Dent. 134:104559. doi:10.1016/j.jdent.2023.104559

Hasidi, Abdelwahed, El Alaoui-Chrifi, Qazdar, Bourzeix, Benzakour, Bendaouia, Dahhassi. 2024. Data-driven and model-driven approaches in predictive modelling for operational efficiency: mining industry use case. In: Mosbah, Kechadi, Bellatreche, Gargouri, editors. Model and data engineering. Proceedings of the 12th International Conference, MEDI 2023; Sousse, Tunisia; 2023 November 2–4. Cham (Switzerland): Springer. p. 116–127.

Jaadi. 2023. Principal component analysis (PCA): a step-by-step explanation. Chicago (IL): Built In; [accessed 2025 Feb 12]. https://builtin.com/data-science/step-step-explanation-principal-component-analysis.

Jones JRA, Berney, Connolly, Waterland, Denehy, Griffith, Puthucheary. 2019. Socioeconomic position and health outcomes following critical illness: a systematic review. Crit Care Med. 47(6):e512–e521. doi:10.1097/ccm.0000000000003727

Klosterman. 2021. Data science projects with Python. Birmingham (UK): Packt Publishing.

Kumar, Kroon, Lalloo. 2014. A systematic review of the impact of parental socio-economic status and home environment characteristics on children’s oral health related quality of life. Health Qual Life Outcomes. 12:41. doi:10.1186/1477-7525-12-41

Li, Chu. 2023. Machine learning for causal inference. Cham (Switzerland): Springer.

Long, Renbarger. 2023. Persistence of poverty: how measures of socioeconomic status have changed over time. Educ Res. 52(3):144–154.

Majumder. 2021. Socioeconomic status scales: revised Kuppuswamy, BG Prasad, and Udai Pareekh’s scale updated for 2021. J Family Med Prim Care. 10(11):3964–3967.

Max Kuhn. 2013. Applied predictive modeling. New York (NY): Springer.

Pezo Lanfranco, Eggers. 2012. Caries through time: an anthropological overview. In: Li M-Y, editor. Contemporary approach to dental caries. Rijeka (Croatia): IntechOpen. p. 3–34.

Sarstedt, Mooi. 2019. Principal component and factor analysis. In: A concise guide to market research. Heidelberg (Germany): Springer. p. 257–299.

Shavers. 2007. Measurement of socioeconomic status in health disparities research. J Natl Med Assoc. 99(9):1013–1023.

Toledo Reyes, Knorst, Ortiz, Brondani, Emmanuelli, Saraiva Guedes, Mendes, Ardenghi. 2023. Early childhood predictors for dental caries: a machine learning approach. J Dent Res. 102(9):999–1006.

Ugolini, Porro, Carli, Agostino, Silvestrini-Biavati, Riccomagno. 2023. Probabilistic graphical modelling of early childhood caries development. PLoS One. 18(10):e0293221. doi:10.1371/journal.pone.0293221

Vieira WdC, Neto JAF, Roque, da Rocha. 2022. Using principal component analysis to build socioeconomic status indices. Int J Stat Appl. 12(3):77–82.

Vyas, Kumaranayake. 2006. Constructing socio-economic status indices: how to use principal components analysis. Health Policy Plan. 21(6):459–468.

Winoto, Roy AFV. 2023. Model of predicting the rating of bridge conditions in Indonesia with regression and k-fold cross validation. Int J Sustain Constr Eng Technol. 14(1):249–259.

Yost, Perkins, Cohen, Morris, Wright. 2001. Socioeconomic status and breast cancer incidence in California for different race/ethnic groups. Cancer Causes Control. 12(8):703–711.
