Abstract
This study investigates the factors contributing to early school dropout in vocational and technical high schools in Turkey, using machine learning techniques to analyze a dataset of personal, socio-economic, familial, and academic variables. The data were collected via a detailed survey administered to students at one of the largest vocational and technical high schools in Istanbul, capturing 35 features (factors) relevant to dropout. Various classifiers, including Decision Trees and Random Forest, were employed to identify at-risk students with high accuracy. The Decision Tree model, combined with the Synthetic Minority Over-sampling Technique (SMOTE), produced the best results for identifying potential dropouts, indicating its usefulness in educational settings where early intervention is critical. Feature importance analysis reveals that parental education levels, family structure, and financial hardship are significant predictors of dropout likelihood. Despite the study’s limitations, such as a small dataset and some zero-filled feature columns, the results underscore the importance of data-driven approaches in developing targeted interventions to reduce dropout rates. This research not only enhances the understanding of dropout in Turkish vocational education but also provides practical insights for policymakers and educators seeking to improve student retention through early and informed interventions. The findings highlight the potential of machine learning to strengthen educational support systems, with the aim of helping every student succeed.
Keywords
Introduction
As industries change and advance across the world, the need for specialized skills and knowledge continues to grow. This makes preparing young people for the workforce essential and positions vocational and technical high schools as a vital part of that process. However, simply having a skilled workforce is not enough. As the Organization for Economic Co-operation and Development (OECD, 2023) points out, these skills must be put to good use in the labor market to truly drive long-term economic growth. Vocational and technical education plays a crucial role in this process, not just by teaching students industry-specific knowledge, but by helping them gain real-world experience so they can apply what they learn effectively. Despite its importance, one major challenge remains: a high number of students leave these schools before completing their education, especially in Turkey. This issue has serious long-term consequences. When students drop out early, their career opportunities become limited, and the overall economy feels the impact as well. Addressing this challenge is essential, not just for the individuals affected, but for the strength and future of the workforce as a whole.
School dropout, which means quitting school because of failure or negative circumstances, happens at every educational level (Ogresta et al., 2021). The school dropout rate is regarded as one of the main determinants of a country’s level of education and a source of future problems (Ozer & Perc, 2020). School dropout, which is an important educational problem for developed and developing countries alike, can waste the investments the state makes in individuals and therefore lead to economic, cultural, and social problems (Kartal & Ballı, 2020). This situation concerns the school, the family, and public authorities as much as the student (Behr et al., 2020). Despite the allure of practical skills and a direct pathway to professional fulfillment, vocational and technical schools in Turkey face significant dropout rates (Korumaz & Ekşioğlu, 2022). A significant factor behind this decline in student retention can be traced back to past policy changes. In the 1998 to 1999 academic year, a coefficient system was introduced, making it harder for vocational school graduates to enter university (Sönmez, 2010). The “coefficient system” used in Turkey’s university admissions from 1999 to 2011 was a weighted scoring method that adjusted students’ placement scores based on the type of high school they graduated from and whether their intended university major aligned with their high school field. Students received a higher coefficient (c) if they applied to a program related to their high school specialization, and a lower one if the program was unrelated, which affected their final score (Y-ÖSS). Additionally, vocational and religious high school graduates could receive extra points (via a second coefficient, d) when applying to corresponding university programs. The aim was to guide students toward fields aligned with their prior education, but the system generated controversy for creating inequalities and limiting access to certain programs for students from specialized schools (Doğan & Yuret, 2015).
This policy, although later revoked, left a lasting impact, shaping how vocational education is perceived even today. Efforts have been made to address this issue. Programs such as the 2023 Education Vision (Ministry of National Education [MoNE], 2022) aimed to improve the situation, yet dropout rates remain a major concern. This suggests that while progress has been made, further action is still needed to ensure that students in vocational schools are supported and encouraged to complete their education.
While dropout rates have been extensively studied in general high schools, vocational education remains underexplored, especially in Turkey. Existing studies primarily focus on academic performance and economic hardship as predictors of dropout, but vocational students face a distinct set of challenges, including family instability, financial difficulties, and peer influences (Banaag et al., 2024). This research expands on prior studies by incorporating a broad spectrum of personal, socio-economic, familial, and academic factors to provide a more comprehensive understanding of dropout risk. Unlike conventional research that relies on statistical analysis, this study employs machine learning techniques to uncover complex, nonlinear relationships between these factors, as recommended by recent predictive modeling studies (Basnet et al., 2023; Niyogisubizo et al., 2022).
Using machine learning (ML) to predict school dropouts is a powerful method that improves accuracy and provides useful information for preventing students from leaving school. Early studies by Tan and Shao (2015) demonstrated the efficacy of ML methods in e-learning environments to predict dropout behavior, highlighting the need for diverse data samples for accurate models. Similarly, Mduma et al. (2019) emphasized the necessity of including school-level datasets in developing countries to address dropout effectively. Advanced predictive models have also been developed, as seen in the work of Fernández-García et al. (2021), who created a real-life ML model for predicting university dropout at different stages, showcasing the practicality of ML in higher education. Rodríguez et al. (2023) provided a comprehensive ML framework for predicting school dropout in Chile, focusing on the structural problems of dropout, and offering a methodology for designing and evaluating these models. Specific applications of ML techniques have further illustrated their benefits. Freitas et al. (2020) integrated IoT and ML to predict school dropout based on socioeconomic data, facilitating timely interventions. Lee and Chung (2019) developed a dropout early warning system using supervised learning, achieving significant improvements in prediction accuracy with extensive educational datasets. Key insights into feature importance have been provided by Colak Oz et al. (2023), who emphasized the role of socioeconomic and behavioral data in accurate dropout predictions. Sansone (2019) highlighted the importance of feature selection and model specificity in enhancing ML’s utility beyond traditional early warning indicators. Comparative analyses have shown that ML models often outperform traditional methods. 
Nagy and Molontay (2018) found that advanced ML algorithms, including deep learning, predicted dropout based on secondary school performance more accurately than traditional regression models. Selim and Rezk (2023) applied ML techniques in Egypt, showcasing the adaptability of these models across different socio-economic contexts and the importance of localized data. Dalipi et al. (2018) reviewed ML techniques for predicting MOOC dropouts, identifying challenges and future research directions to enhance prediction effectiveness in online learning environments. Collectively, these studies reveal that ML provides targeted actions to keep students in school and improve their education. By using advanced technology with educational data, schools can proactively address dropout rates, making the educational environment more supportive and responsive.
This study enhances the understanding of early school dropout in Turkish vocational and technical high schools by using machine learning (ML). Complementing prior research, this study encompasses a broader spectrum of personal, socio-economic, familial, and academic variables to offer a more comprehensive understanding of dropout. Specifically, it seeks to explore the key socio-economic, familial, and academic factors that contribute to early school dropout in vocational and technical high schools in Turkey. Additionally, it investigates the extent to which machine learning models can accurately predict students at risk of dropping out based on these factors and aims to identify which predictive features have the most significant impact on determining dropout risk. By addressing these research questions, this study contributes to the ongoing conversation about how predictive analysis can be effectively used in education, particularly in Turkish vocational education. The integration of innovation, theoretical frameworks, and a dedicated focus on Turkish vocational education presents promising steps toward addressing dropout rates. This approach enriches ongoing academic discussions and offers practical methods to enhance educational outcomes.
Literature Review
The Turkish Education System and Vocational & Technical Education
Turkey’s education system has undergone multiple reforms, one of the most significant being the introduction of the 4+4+4 education system in 2012. This system divides schooling into three phases: 4 years of primary education, 4 years of lower secondary education (middle school), and 4 years of upper secondary education (high school, including both general and vocational-technical tracks). The reform was introduced to make schooling more flexible and to boost participation, especially in vocational and technical education (MoNE, 2012). Its main goal was to create a system in which students could progress more easily and gain practical skills that align with job market demands. However, vocational education in Turkey still faces significant challenges. High dropout rates remain a major issue, and many students struggle within the system (Korumaz & Ekşioğlu, 2022).
Vocational and technical high schools in Turkey aim to equip students with practical skills for the labor market and serve as an alternative to general academic high schools. These schools focus on specialized training in fields such as health sciences, woodwork, and information technologies, providing students with direct career pathways (Önan & Yıldırımer, 2023). However, despite their intended role in workforce preparation, various structural and socio-economic challenges continue to contribute to high dropout rates. Dropout rates in Turkey’s vocational education sector are notably higher than in general high schools. The highest dropout rate, 34.4%, occurs in the 9th grade of vocational and technical high schools, indicating that students struggle with the transition into the vocational track (Güngör, 2019). One of the most significant issues affecting vocational students is the rigid vocational tracking system, in which students are placed into these schools based on middle school performance and personal choices (Polat, 2014). Many students feel that vocational education restricts their opportunities, making them more likely to lose motivation and leave school before graduation (Taş et al., 2013). Another key factor influencing dropout rates is the socio-economic background of vocational students. A significant portion of these students come from lower-income families, making financial hardship a major barrier to continuing education (Önan & Yıldırımer, 2023). Many are forced to work while studying, which can create difficulties in balancing school and employment. Research has shown that students struggling with financial difficulties often experience academic challenges, increased absenteeism, and reduced engagement with coursework, ultimately leading to a higher risk of dropping out (Ayaz & Karacan Özdemir, 2023).
Moreover, limited access to financial support programs is an issue, as vocational students often receive fewer scholarship and aid opportunities than their peers in general high schools. Although vocational education reforms have been implemented in Turkey to improve retention, challenges remain. Initiatives such as the “1,000 Schools in Vocational Education” project aimed to address inequalities and improve educational conditions for vocational students (Özer, 2021). However, changing the structure of the system alone has not been enough to solve the dropout problem.
The Turkish MoNE has implemented several policies to improve vocational education and reduce dropout rates. The 2023 Education Vision established a comprehensive framework aimed at strengthening vocational education by aligning curricula with industry demands, improving teacher training, and expanding apprenticeship opportunities (MoNE, 2023). Additionally, scholarships and financial aid programs have been introduced to support economically disadvantaged students, ensuring that financial hardship does not force them to drop out. Another crucial intervention includes mentorship and career counseling programs, which have been initiated in select schools to enhance student motivation, engagement, and long-term academic persistence. These efforts aim to make vocational education more sustainable and aligned with both student needs and labor market expectations.
Determinants of School Dropout
School dropout in vocational and technical high schools is a complex issue shaped by various economic, social, academic, and psychological factors. Studies across different contexts reveal that dropout rates are strongly linked to socio-economic conditions, student engagement, and institutional support.
Socio-economic status is one of the strongest predictors of dropout risk (Finkenauer et al., 2023). Students from lower-income backgrounds often face financial hardships, the need to work while studying, and unstable home environments that interfere with their ability to focus on education (Eranıl, 2024). In Turkey, vocational students disproportionately come from lower socio-economic status backgrounds, making them more vulnerable to dropout risks than their peers in general academic high schools. Parental education levels also significantly influence dropout probability, as students whose parents have lower educational attainment lack role models for academic persistence (Singh & Alhulail, 2022). Furthermore, family structure and stability play a crucial role, with students from single-parent homes, foster care, or child protection facilities showing higher dropout tendencies (Mohammad Faisal et al., 2023).
Poor academic performance is a key determinant of dropout, with research showing that students who struggle academically, receive low grades, and frequently miss school are at the highest risk (Holtmann & Solga, 2023). Tinto’s Student Integration Model suggests that students who fail to develop strong academic and social connections within their school environment are more likely to disengage (Tinto, 1993). The rigid tracking system in vocational education further limits student mobility, reducing opportunities to switch to general high schools if they experience difficulties.
Recent studies indicate that mental health challenges, chronic illnesses, and adverse childhood experiences significantly contribute to dropout risks (Ramírez Labbé et al., 2022). Students exposed to domestic violence, family members with criminal records, or substance dependence experience higher rates of school disengagement. Additionally, peer influences play a crucial role in dropout behavior. Research shows that students who associate with risky peer groups are more likely to prioritize social belonging over academic success, further increasing dropout probability (Tinto, 1993).
Theoretical Framework
Understanding the early dropout issue in vocational and technical high schools requires a strong theoretical foundation. Various educational, sociological, and psychological theories help explain the mixture of personal, familial, socio-economic, and academic factors influencing student retention.
Social capital, as defined by Bourdieu (1986), refers to the advantages people gain through their social connections and relationships, which can play a key role in shaping their educational opportunities and overall life outcomes. Students from families with higher social capital tend to have better educational support, parental involvement, and access to institutional resources, reducing the probability of dropping out. On the other hand, students from disadvantaged backgrounds may experience limited parental involvement, financial hardship, and fragmented family structures, which raise the chance of early dropout. Recent studies highlight the role of social capital in student retention (Almeida et al., 2021). Students whose parents have lower levels of education (mother or father with at most primary school education) tend to have fewer educational role models and lack guidance on academic pathways, increasing their dropout risk. Additionally, students living with only one parent, with grandparents, or in child protection facilities often experience weaker family support networks, further increasing their risk of dropping out of school (Mohammad Faisal et al., 2023). These disadvantages align with Bourdieu’s assertion that unequal distribution of social capital leads to disparities in educational outcomes. Moreover, family instability and adversity, such as parents living separately, being divorced, deceased, or suffering from chronic or mental illnesses, reduce the protective influence of social capital, which is crucial for academic persistence.
Tinto (1993) suggests that students are more likely to persist in their education when they feel academically and socially connected to their school. If students fail to establish meaningful connections with their peers and educators, they are more likely to drop out. This theory is particularly relevant to vocational students, as they often enter high school with pre-existing disadvantages that make integration challenging. Academic struggles, such as low performance and continuous absenteeism, directly correlate with weak academic integration. Tinto’s model suggests that students with poor academic records feel alienated from the learning environment, leading to lower engagement and eventual dropout. Similarly, being part of a risky peer group may further disrupt academic engagement, as students prioritize social belonging over school performance. The role of economic barriers in academic integration is also critical. Students from families experiencing financial hardship or working while studying often struggle to balance school commitments with external responsibilities, which diminishes their engagement with school life. Recent research by Kaçar (2024) also shows that financial instability reduces students’ ability to participate in school activities and leads to an increased likelihood of dropout.
Bronfenbrenner (1979) points out that a person’s development is influenced by different layers of their environment, from close family interactions to larger societal factors. When it comes to school dropout, these layers (micro, meso, and macro) can either help a student stay in school or push them out. At the microsystem level, relationships within the family and school play a major role in whether a student continues their education. For instance, dealing with domestic violence, having family members with criminal records, or living in a home with substance abuse can create a stressful environment that makes it difficult to focus on school. Ramírez Labbé et al. (2022) found that these types of childhood experiences considerably influence mental and physical health. At the mesosystem level, how schools respond to at-risk students is key. Providing counseling and educational support can help reduce dropout risks, but the impact depends on how well these programs are designed and whether schools have enough resources to sustain them. Schools with solid intervention programs do a much better job of keeping students engaged, especially those from disadvantaged backgrounds. At the macrosystem level, larger social and economic factors come into play. The way vocational education is structured in Turkey, where students are often placed in these schools early and have limited opportunities to switch to general high schools, can lead to a sense of being “stuck” with fewer career options. This rigid system makes it harder for vocational students to stay motivated.
Resilience theory (Masten, 2001) suggests that students facing significant socio-economic and familial challenges can still succeed academically if they have access to protective factors. These include school support, mentorship, and personal coping strategies. Bryan et al. (2020) emphasize that students who receive strong institutional support and mentoring exhibit higher resilience, even in the face of financial or familial adversity. In the current study, students under educational or counseling measures represent cases where interventions attempt to build resilience. Additionally, students diagnosed as gifted or receiving special education support may exhibit unique resilience pathways, depending on the effectiveness of tailored educational programs.
Method
This section describes the steps for analyzing student dropout rates (Figure 1). The Data Collection section details how we gathered data directly from the students of a vocational and technical high school using a survey. The Data Preprocessing section then describes how the data were prepared and represented appropriately, including the handling of missing data. Next, the Classifier Training section explains how we apply the Synthetic Minority Over-sampling Technique (SMOTE) only to the training data to overcome class imbalance and train classifiers to predict whether a student might drop out, based on the patterns learned from the data. Finally, the Explaining Model Outcomes section presents the techniques used to make the results of the predictive models explainable, so that the insights can be interpreted and acted upon.

Figure 1. Predictive modeling process to analyze student dropout rates.
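To illustrate what oversampling only the training data means in practice, the following is a minimal sketch of the interpolation idea behind SMOTE, written in plain Python. It is not the implementation used in this study, which is not specified here; the function names, the toy feature vectors, and the choice of k = 3 neighbours are assumptions made for illustration. In practice a library implementation would typically be used instead.

```python
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (simplified SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

def oversample_training_set(X_train, y_train, minority_label=1, k=3, seed=0):
    """Balance the TRAINING data only; the test data must stay untouched."""
    minority = [x for x, y in zip(X_train, y_train) if y == minority_label]
    n_new = (len(y_train) - len(minority)) - len(minority)
    if n_new <= 0 or len(minority) < 2:
        return list(X_train), list(y_train)
    new_points = smote_like(minority, n_new, k, seed)
    return X_train + new_points, y_train + [minority_label] * n_new

# Hypothetical yes/no features (1 = yes, 0 = no); not data from the study.
X_train = [[0, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0],
           [0, 0, 0], [0, 1, 1], [0, 0, 0], [1, 0, 0],   # 8 non-dropouts
           [1, 1, 1], [1, 1, 0], [1, 0, 1]]              # 3 dropouts
y_train = [0] * 8 + [1] * 3
X_test = [[0, 0, 0], [1, 1, 1]]  # held out before any oversampling
y_test = [0, 1]

X_bal, y_bal = oversample_training_set(X_train, y_train)
print(y_bal.count(0), y_bal.count(1))  # both classes now have 8 training rows
```

Because the synthetic rows are interpolated, yes/no features take fractional values between 0 and 1 in this sketch. Keeping the test rows out of the oversampling step means the reported metrics reflect only real students.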
Data Collection
This study took place during the 2023 to 2024 school year at one of the biggest, oldest, and best-known vocational and technical high schools in Istanbul. It was chosen because it is a significant representative of the vocational and technical schools in Turkey. The school specializes in electronics and computer technologies, making it a good place to explore the factors linked to student dropout. Although 1,381,441 students attend vocational and technical high schools in Turkey, the research was limited to one school because of the sensitivity of student data. Data collection was made possible with the help of the school’s guidance counselor, who ensured that all student information remained private and confidential. Since the study looks at dropout risks related to personal, socio-economic, and academic factors, strict ethical rules were followed. Privacy policies and administrative restrictions made it difficult to expand the research to other schools.
Students attending vocational and technical high schools in Turkey often come from middle to lower socio-economic backgrounds and face various challenges that increase their risk of early dropout. Many experience financial hardship, family instability, and academic struggles, all of which contribute to disengagement from school. Some students work while studying, and balancing work with school commitments remains a significant challenge for them. Academically, many students enter vocational high schools with low prior performance, which can further lead to disengagement from the education system. Family-related difficulties, such as single-parent households, foster care arrangements, and unstable home environments, also negatively affect their academic persistence. Another critical factor influencing student motivation is the limited post-graduation opportunities available to vocational high school graduates. While some continue their education, the majority transition directly into the workforce, often due to barriers in accessing higher education. Despite recent policy efforts to enhance vocational education pathways, many students still perceive vocational education as a restricted academic route, which reduces their commitment to school and increases dropout likelihood. Table 1 provides a clearer picture of how widespread these challenges are, showing the number of students affected by each risk factor in our sample.
Table 1. Features of the Risk Maps and Their English Translations, With the Number of Students Affected by Each Feature.
To systematically capture these risk factors, a structured survey was conducted in collaboration with class teachers and school counselors. This survey, developed under the MoNE framework, was originally designed to assess student risks and provide intervention strategies for guidance teachers. The dataset was compiled through this detailed survey, administered at the beginning of the academic year, incorporating 35 variables related to personal, socio-economic, familial, and academic conditions. The survey questions included yes/no and categorical responses, covering:
Family structure (e.g., single-parent household, parental education level)
Socio-economic conditions (e.g., financial hardship, employment while studying)
Health-related issues (e.g., chronic illnesses, mental health conditions)
Academic performance (e.g., absenteeism, achievement levels)
Social environment (e.g., exposure to domestic violence, involvement in risky peer groups)
Surveys were administered in the first semester, allowing for early identification of at-risk students. They were distributed by class teachers and completed under the supervision of guidance counselors, ensuring accuracy. After data collection, students who had dropped out by the end of the first semester were identified through the “Okul Terk” (School Dropout) variable, allowing researchers to examine the underlying risk factors contributing to their disengagement.
This study involved human participants through a structured survey administered in collaboration with school staff. Ethical approval was obtained from the Kadir Has University Ethics Committee, with administrative permission from the participating school. Participation was voluntary, and informed consent was obtained from all students. For students under the age of 18, the survey was conducted with the knowledge and support of class teachers and guidance counselors, following school procedures. All responses were anonymous, and no personally identifiable information was collected. Data was stored securely and accessed only by authorized researchers. These procedures ensured that participants’ rights, confidentiality, and well-being were fully protected.
Data Preprocessing
The risk maps include crucial insights from various areas such as family background (for example, parents’ education levels and family structure), personal challenges, and academic performance. Central to our analysis is the target variable, “Okul Terk” (School Dropout), a binary indicator marking whether a student dropped out (1) or not (0). This measure is crucial for creating predictive models that aim to pinpoint students who might drop out. We then merged the risk maps of all classes into one dataset. It is also important to note that in our dataset the dropout rate is notably higher in the 9th grade than in the 10th, 11th, and 12th grades, which results in data imbalance. This pattern emerges from the educational policy that does not allow students to fail a grade until high school. Consequently, many students entering the 9th grade lack fundamental skills in language, mathematics, and other core subjects. This educational gap significantly challenges these students, leading to a higher incidence of dropout at this critical transition point into high school. Therefore, to avoid further data imbalance, we focused only on the 9th grade, where dropout rates are far higher.
In our survey, the data were collected by class teachers using yes/no questions. The dataset primarily includes responses to these straightforward questions, which simplifies the handling of missing data. Despite this, some entries were incomplete for various reasons, such as non-responses or data entry omissions. To address missing data, we employed an imputation technique suitable for binary categorical data. Because the dataset includes only a few missing entries, they were imputed using the mode of each feature, so that the most common response within the dataset was substituted for each missing entry. Mode imputation is quick and easy, but it can underestimate the variability of a variable (leaving one category overrepresented) if there are too many missing entries (Van Buuren, 2018).
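As an illustration of mode imputation for yes/no data, the following minimal sketch fills each missing entry with the most common observed response for that feature. The feature names and values are hypothetical and are not taken from the study’s dataset; the sketch also assumes every feature has at least one observed value.

```python
from collections import Counter

def impute_mode(rows, features):
    """Replace None entries with the most common observed value per feature."""
    modes = {}
    for f in features:
        observed = [r[f] for r in rows if r[f] is not None]
        # Assumes at least one observed value; ties resolve to the first seen.
        modes[f] = Counter(observed).most_common(1)[0][0]
    filled = [
        {f: (r[f] if r[f] is not None else modes[f]) for f in features}
        for r in rows
    ]
    return filled, modes

# Hypothetical responses: 1 = yes, 0 = no, None = missing.
rows = [
    {"financial_hardship": 1, "works_while_studying": 0},
    {"financial_hardship": 0, "works_while_studying": None},
    {"financial_hardship": None, "works_while_studying": 0},
    {"financial_hardship": 0, "works_while_studying": 1},
]
features = ["financial_hardship", "works_while_studying"]

filled, modes = impute_mode(rows, features)
print(modes)   # most common response per feature
print(filled)  # rows with missing entries filled in
```

With only a few gaps, as described above, the filled-in values barely change each feature’s distribution; with many gaps, the majority response would become overrepresented.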
Classifier Training
In the present study, several well-known algorithms recognized for their performance in classification tasks were selected because their use in dropout prediction analysis is documented in the literature. These include Decision Trees, Logistic Regression, Random Forest, AdaBoost, Gradient Boosting, Naive Bayes, Support Vector Machines (SVC), Stochastic Gradient Descent (SGD) Classifier, K-Nearest Neighbors, and Neural Networks.
The Decision Tree classifier, as described by James et al. (2013), builds a model based on making sequential decisions using criteria like entropy and information gain from the training data. This model provides an intuitive visualization of the decision-making process but is susceptible to overfitting if not properly pruned. Logistic Regression, outlined by Hosmer et al. (2013), models the probabilities of binary outcomes using a logistic function, which is particularly useful for its interpretability and the ease of applying regularization techniques to avoid overfitting. The ensemble methods, such as Random Forest and AdaBoost, as explained by Breiman (2001), and Freund and Schapire (1997) respectively, combine multiple weak learners to form a strong prediction model. Random Forest improves classification accuracy and robustness by averaging multiple deep decision trees trained on different parts of the dataset. AdaBoost focuses on difficult cases by adjusting the weights of incorrectly classified instances, which increases the model’s sensitivity to outlier data. Further, XGBoost, an advanced implementation of gradient boosting described by Chen and Guestrin (2016), optimizes traditional boosting techniques by integrating regularization parameters to control overfitting, making it highly effective for a wide range of predictive tasks. Our approach also included models such as the SGD Classifier, which optimizes linear models with a stochastic gradient descent learning method, and the K-Nearest Neighbors algorithm, which classifies data points based on the majority vote of their nearest neighbors, providing simple yet effective insights as supported by Altman (1992). The Naive Bayes classifier, a probabilistic model based on Bayes’ theorem, is recognized for its efficiency and good performance with a large dataset, despite the assumption of independence between predictors. 
Lastly, we incorporated a Neural Network using Keras, designed to model complex patterns and relationships in data through deep learning techniques, as outlined by Goodfellow et al. (2016).
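The entropy and information gain criteria mentioned above for Decision Trees can be illustrated with a minimal sketch. The toy labels and the two yes/no features (`works` and `siblings`) are hypothetical and are not taken from the survey data; the sketch only shows how a tree would compare candidate questions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy reduction obtained by splitting on a yes/no feature (1 = yes, 0 = no)."""
    yes = [y for x, y in zip(feature, labels) if x == 1]
    no = [y for x, y in zip(feature, labels) if x == 0]
    total = len(labels)
    weighted = (len(yes) / total) * entropy(yes) + (len(no) / total) * entropy(no)
    return entropy(labels) - weighted

# Hypothetical toy data: 1 = dropout, 0 = non-dropout
labels = [1, 1, 1, 0, 0, 0, 0, 0]
works = [1, 1, 1, 0, 0, 0, 0, 0]     # separates the classes perfectly
siblings = [1, 0, 1, 0, 1, 0, 1, 0]  # only weakly related to the label

for name, column in [("works", works), ("siblings", siblings)]:
    print(name, round(information_gain(column, labels), 3))
```

A tree built with this criterion would ask the `works` question first, because it removes all uncertainty about the label in this toy sample.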
Our dataset is imbalanced: it comprises 220 samples, of which 25 are dropouts and 195 are non-dropouts. Because of the small sample size, we decided not to use cross-validation. To address the imbalance and improve the training process, we combined our models with the Synthetic Minority Over-sampling Technique (SMOTE), which balances the class distribution and provides a more effective basis for training and validating the classifiers. This tailored method allows for a better analysis of the factors contributing to student dropouts.
The class imbalance also led us to choose precision and recall as our main metrics for evaluating the models. Precision (given by Equation 1) measures the proportion of students identified by the model as likely to drop out who do end up dropping out, that is, the number of true positives (correct dropout predictions) divided by the sum of true positives and false positives (incorrect dropout predictions). Recall (given by Equation 2) measures the model’s ability to find all the potential dropouts: it compares the number of true positives to the total number of students who dropped out, whether the model caught them or not (true positives plus false negatives). Together, these metrics capture two crucial aspects of a model: precision shows how trustworthy its dropout predictions are, and recall shows how good it is at catching all the cases it needs to catch.
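As a worked illustration of these two metrics, the sketch below computes precision, recall, and the F1-score from confusion-matrix counts. The counts (6 true positives, 19 false positives, 1 false negative) are illustrative assumptions: the text does not report a confusion matrix, and these values were chosen only because they reproduce the minority-class scores reported later for the Decision Tree with SMOTE.

```python
def precision(tp, fp):
    """Share of predicted dropouts that really dropped out: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Share of actual dropouts that the model caught: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Illustrative counts (assumed, not reported in the study)
tp, fp, fn = 6, 19, 1

p = precision(tp, fp)
r = recall(tp, fn)
print(f"precision={p:.3f} recall={r:.3f} f1={f1_score(p, r):.3f}")
```

With these counts, one missed dropout out of seven gives a high recall, while nineteen false alarms among twenty-five flagged students give a low precision.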
Model Explanation
While achieving strong predictive results is crucial, it is equally important to understand the reasons behind student dropouts to help improve retention rates at schools. In this study, after determining the most effective model, we focused on digging deeper into the factors influencing these dropouts. We found that directly interpreting the feature importances provided by the Random Forest model was intuitive and informative. This approach allowed us to clearly see which variables most strongly influenced the likelihood of a student dropping out, providing actionable insights that could be used to prevent future dropouts.
The RandomForestClassifier, a widely recognized machine learning model, plays a crucial role in identifying and quantifying the factors contributing to school dropout rates among vocational and technical education students. As Breiman explains, a RandomForestClassifier comprises an ensemble of decision trees, each tasked with evaluating the same problem independently. Each decision tree in the ensemble makes predictions based on a subset of data features, and the final decision is determined by aggregating the most common outcome across all trees, a method known as majority voting. The construction of each decision tree follows a simple yet powerful approach: it splits the data by asking a series of yes/no questions about the features, navigating through the data points until it reaches a decision. The random selection of features and data points for each tree helps prevent overfitting, enhancing the model’s ability to generalize from the training data to unseen data effectively. This characteristic makes Random Forest particularly adept at handling classification problems with complex and high-dimensional datasets.
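The mechanism described above (bootstrap samples, a random choice of features, and majority voting) can be sketched in a few lines. This is a deliberately simplified illustration, not the scikit-learn implementation: each "tree" is a hypothetical one-question stump, and the toy data are constructed so that every feature separates the classes.

```python
import random
from collections import Counter

def majority_vote(votes):
    """Return the most common class label among the votes."""
    return Counter(votes).most_common(1)[0][0]

def fit_stump(rows, labels, feature):
    """One yes/no question: predict the majority label on each side of `feature`."""
    sides = {0: [], 1: []}
    for row, y in zip(rows, labels):
        sides[row[feature]].append(y)
    default = majority_vote(labels)  # used when one side is empty in the sample
    answer = {v: (majority_vote(ys) if ys else default) for v, ys in sides.items()}
    return lambda row: answer[row[feature]]

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))           # random feature for this tree
        trees.append(fit_stump([rows[i] for i in idx],
                               [labels[i] for i in idx], feature))
    return trees

def predict(trees, row):
    """Aggregate the individual tree predictions by majority voting."""
    return majority_vote([tree(row) for tree in trees])

# Hypothetical toy data: two yes/no features, 1 = dropout, 0 = non-dropout
rows = [[1, 1], [1, 1], [1, 1], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

forest = fit_forest(rows, labels)
print(predict(forest, [1, 1]), predict(forest, [0, 0]))
```

Because every tree sees a different resample of the students, a single unlucky sample has little influence on the aggregated vote, which is the intuition behind the reduced overfitting.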
The Random Forest model was trained using a dataset encompassing 35 diverse features, reflecting socio-economic, familial, and academic variables that potentially influence student dropout rates. The importance of each feature in predicting dropout was quantified using the “feature_importances_” attribute of RandomForestClassifier. This metric evaluates the weight of each feature in the decision-making process, providing insights into how various factors impact dropout probabilities (Louppe et al., 2013). Through this analytical approach, we can discern which features hold the most predictive power, thereby aiding in the development of targeted interventions to reduce dropout rates. The method’s effectiveness in educational settings confirms its suitability for exploring complex issues like student retention.
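In scikit-learn, `feature_importances_` is an impurity-based measure: the normalized total reduction of the split criterion attributable to each feature. The sketch below illustrates the idea for a single split per feature using Gini impurity. It is a simplification under stated assumptions (a real forest accumulates these reductions over every node and every tree), and the feature names and values are hypothetical.

```python
def gini(labels):
    """Gini impurity of binary labels (1 = dropout, 0 = non-dropout)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def impurity_decrease(feature, labels):
    """Weighted Gini reduction from splitting once on a yes/no feature."""
    yes = [y for x, y in zip(feature, labels) if x == 1]
    no = [y for x, y in zip(feature, labels) if x == 0]
    n = len(labels)
    return gini(labels) - (len(yes) / n) * gini(yes) - (len(no) / n) * gini(no)

def normalized_importances(columns, labels):
    """Scale the raw decreases so that they sum to 1."""
    raw = {name: impurity_decrease(col, labels) for name, col in columns.items()}
    total = sum(raw.values())
    return {name: (value / total if total else 0.0) for name, value in raw.items()}

# Hypothetical toy data
labels = [1, 1, 0, 0, 0, 0]
columns = {
    "father_primary_school": [1, 1, 0, 0, 0, 0],  # matches the label exactly
    "parental_divorce": [1, 0, 1, 0, 0, 0],       # weakly informative
    "seasonal_worker_family": [0, 0, 0, 0, 0, 0], # no variation at all
}

importances = normalized_importances(columns, labels)
for name, score in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

A column with no variation cannot reduce impurity and therefore receives an importance of zero, regardless of how relevant the underlying factor may be in reality.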
Results
Initially, we compiled a dataset by collecting responses from a survey conducted in the school, where class teachers asked students a series of yes/no questions. This primary data, drawn directly from classroom interactions, was then systematically organized and enhanced with additional contextual information to bolster our predictive model.
SMOTE and Classification
After creating the dataset, we trained classifiers on it to predict dropouts. Our first step was to apply the machine learning algorithms directly to the data without addressing the class imbalance.
From the results (Table 2), it is evident that Logistic Regression, SVC, and the SGD Classifier achieved high precision (0.840) in predicting students who would not drop out (class 0) but failed to identify any students at risk of dropping out (class 1), with a recall of 0.000. These models are therefore strongly biased towards the majority class and capture none of the minority class instances. Random Forest and Gradient Boosting, along with the Decision Tree model, showed high precision and perfect recall for the majority class but also performed poorly on the minority class, with a recall of only 0.142 for class 1, which suggests that they are significantly affected by class imbalance. AdaBoost fared no better than the linear models: it reached high precision for the majority class (0.840) but identified no at-risk students (class 1 recall of 0.000), indicating a high sensitivity to class imbalance. K-Nearest Neighbors (KNN) presented a different pattern. Its precision (0.826) and recall (0.513) for the majority class were lower than those of the other models, but it identified some at-risk students, with a class 1 recall of 0.428. KNN may therefore be less affected by class imbalance, although it still struggles with overall accuracy. The Naive Bayes classifier had the highest precision for the majority class (0.916) but a very low recall (0.297). For the minority class, it achieved a precision of 0.187 and a recall of 0.857, resulting in an F1-score of 0.307. Naive Bayes thus detects most minority class instances but misclassifies many non-dropouts as dropouts. Neural Networks followed a trend similar to the tree-based and ensemble models, with high precision and recall for the majority class and poor performance on the minority class.
Table 2. Results of Applying the Classification Algorithms Directly to the Dataset.
After recognizing the need to capture the minority class, we applied a technique known as SMOTE (Synthetic Minority Over-sampling Technique). SMOTE helps to balance the dataset by creating synthetic samples of the minority class (Figure 2). This approach improved the models’ ability to identify at-risk students, as evidenced by an increase in recall across all models.

Figure 2. Data set before and after SMOTE.
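The way SMOTE creates synthetic minority samples can be sketched as follows: pick a minority sample, pick one of its nearest minority neighbors, and place a new point at a random position on the line between them. This is a minimal illustration under stated assumptions, not the implementation used in the study (which is not specified in the text); in practice a library such as imbalanced-learn would normally be used. The two-dimensional points are hypothetical.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Generate `n_new` synthetic points by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # k nearest minority neighbors of `base` by squared Euclidean distance
        neighbours = sorted(
            others,
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position between base (0) and neighbour (1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-class samples with two numeric features
minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [2.0, 1.0], [4.0, 2.0]]

synthetic = smote(minority, n_new=10)
print(len(minority) + len(synthetic), "minority samples after oversampling")
```

Two caveats apply to this sketch. Interpolation produces fractional values, so yes/no survey features would need rounding or a variant designed for categorical data. Oversampling is also usually applied to the training portion only, so that synthetic points do not leak into the evaluation set.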
Among the models tested, the Decision Tree model with SMOTE delivered the most reliable results for the minority class, with a precision of 0.240 and a recall of 0.857 (Table 3). Despite the difficulty of balancing recall and precision, this model offered a workable trade-off: it identified most true dropouts, at the cost of a substantial number of false alerts. This suggests that Decision Tree models, which are simpler and more interpretable than more complex algorithms, can be particularly effective in scenarios where the data is both complex and imbalanced.
Table 3. Results of Applying the Classification Algorithms After Using SMOTE on the Dataset.
Explaining Model Results and Visualization
After training our classifiers to predict student dropouts, a crucial step was to explain the underlying factors contributing to these predictions. We analyzed feature importance, which reveals how strongly each feature influences the model’s predictions. For instance, factors like “Parental Education Level” and “Attendance Rate” might show strong associations with dropout rates. To make these insights clear and actionable, we visualized the importance of each feature in a chart that ranks and displays these attributes. This visualization helps educational stakeholders quickly grasp which factors are most critical, guiding focused interventions and resource allocation. By highlighting key predictors of dropout, the analysis provides a foundation for targeted educational strategies, enhancing efforts to reduce dropout rates effectively. The results obtained from the Decision Tree Model provide a detailed insight into the relative importance of various features for predicting school dropouts in vocational and technical education, as shown in Figure 3.
Parental Education Level: The feature “Father with at most primary school education” has the highest importance score (0.148), indicating a significant influence on dropout likelihood. A lower educational attainment by the father may affect the socio-economic status of the family and the educational support available at home. The feature “Mother with at most primary school education” is also highly significant, with an importance score of 0.125, reflecting similar socio-economic and supportive implications.
Family Structure and Dynamics: The feature “Being an only child in the family” has an importance score of 0.111, indicating that the unique challenges only children face, such as higher parental expectations and fewer peer interactions at home, can affect their educational persistence. Similarly, the feature “Having five or more siblings” has a score of 0.072, suggesting that students from larger families might struggle with divided parental attention and fewer resources, impacting their school performance. Additionally, “Parental divorce” has an importance score of 0.091, showing that family instability can negatively affect a student’s ability to stay in school, due to emotional and financial stress.
Socio-economic and Health Challenges: With a score of 0.049, financial difficulties are a considerable factor, affecting a student’s resources for education. Students who work alongside their studies show a significant risk of dropout (score of 0.092), potentially due to time management difficulties and fatigue. A score of 0.048 for students’ ongoing health issues highlights the difficulty of managing them along with school requirements.
Educational Performance and Behaviors: Low academic performance with a score of 0.043 directly affects a student’s motivation and confidence, highlighting the critical role of academic support.
Extended Family Health Issues: Chronic illness in the family with a score of 0.026 and Mental health issues in the family with a score of 0.001 have lower importance scores but are still relevant, indicating the broader environment’s impact on student stability and persistence in school.
Neglect and Protective Measures: Living under foster care, with a score of 0.017, and experiencing domestic violence, with a score of 0.013, have relatively low importance scores. This suggests that these factors might not be influential in this study’s context or that the dataset is not large enough to fully capture their impact.
Least Influential Features: Other features, such as having deceased parents, living only with grandparents, and being a child of seasonal workers, show zero or negligible importance in this model. This suggests that these factors might not be influential in the context of this study, or the dataset may not have enough variation in these categories to determine their impact.

Figure 3. Feature importance.
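To compare the groups of predictors discussed above, the individual scores quoted in the text can be summed per group. The sketch below uses only the scores reported in this section; the grouping follows the headings used above, and the shortened feature labels are ours. It assumes that importances are normalized to sum to 1, as in scikit-learn, so the remainder corresponds to features whose scores are not quoted individually.

```python
# Importance scores as quoted in the text, grouped by the headings above
reported = {
    "Parental education": {
        "Father: at most primary school": 0.148,
        "Mother: at most primary school": 0.125,
    },
    "Family structure and dynamics": {
        "Only child": 0.111,
        "Parental divorce": 0.091,
        "Five or more siblings": 0.072,
    },
    "Socio-economic and health": {
        "Working while studying": 0.092,
        "Financial difficulties": 0.049,
        "Ongoing health issues": 0.048,
    },
    "Educational performance": {"Low academic performance": 0.043},
    "Extended family health": {
        "Chronic illness in family": 0.026,
        "Mental health issues in family": 0.001,
    },
    "Neglect and protective measures": {
        "Foster care": 0.017,
        "Domestic violence": 0.013,
    },
}

totals = {group: round(sum(scores.values()), 3) for group, scores in reported.items()}
ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
covered = round(sum(totals.values()), 3)

for group, total in ranked:
    print(f"{group}: {total:.3f}")
print(f"Quoted features cover {covered:.3f} of the total importance")
```

Summed this way, family structure (0.274) and parental education (0.273) carry almost identical weight, followed by socio-economic and health challenges (0.189); the quoted features together account for 0.836 of the total.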
Discussion
The analysis of different models for predicting student dropouts in vocational and technical high schools showed mixed results, highlighting the challenges and complexities involved in accurately identifying at-risk students. Critically examining the results, particularly focusing on precision and recall metrics for both classes—students predicted to drop out (“Class 1”) and those not (“Class 0”)—provides valuable insight about the effectiveness of each model. Among the machine learning models tested, the Decision Tree classifier demonstrated the highest recall for identifying students at risk of dropping out (Class 1), achieving a recall of 0.857 alongside a moderate F1-score of 0.375. This suggests that the model was highly effective at capturing true dropout cases, making it a useful tool for early warning systems in vocational schools. However, the precision was relatively low, indicating a high false-positive rate. This trade-off suggests that while many at-risk students are identified, some students may be misclassified as dropouts when they are not. This aligns with the findings of Rodríguez et al. (2023), who also observed a trade-off between recall and precision in dropout prediction models, particularly in unbalanced datasets where dropout cases are the minority class. The Random Forest model, while slightly less effective in recall compared to the Decision Tree, provided better precision, suggesting a more balanced classification approach. This finding is consistent with studies such as Colak Oz et al. (2023) and Freitas et al. (2020), which demonstrate the efficiency of ensemble learning methods in dropout prediction. Conversely, models like the Naive Bayes and Gradient Boosting showed a balance between precision and recall, though with varying degrees of success. For “Class 0,” these models maintained a relatively high precision, suggesting that when they predict a student will not drop out, they are often correct. 
This is particularly crucial for efficiently allocating limited intervention resources, ensuring that those most likely to benefit receive the necessary support. Furthermore, the application of SMOTE noticeably improved recall figures across most models for “Class 1,” highlighting its efficacy in addressing the imbalance problem inherent in dropout prediction datasets. However, this improvement in recall often came with a reduction in precision, underscoring the persistent challenge of managing the trade-off between capturing as many true dropout cases as possible (high recall) and maintaining the accuracy of these predictions (high precision). In comparison, related studies such as Colak Oz et al. (2023) and Freitas et al. (2020) report high overall precision and recall, likely without class-specific metrics. Similarly, Kirana et al. (2024), Masabo et al. (2023), Sansone (2019), and Selim and Rezk (2023) present high general metrics but do not provide separate results for minority and majority classes. On the other hand, Rodríguez et al. (2023) applied LightGBM for dropout prediction in Chilean schools and provided separated results for both minority and majority classes. Their study demonstrated that LightGBM models offered high recall and precision, particularly for the minority class. Rodríguez et al. (2023) achieved high recall for class 1 using LightGBM, with recall rates of 0.881 for public schools and 0.877 for private voucher schools, and precision rates of 0.448 and 0.457, respectively. Our Decision Tree model achieved a similar recall rate of 0.857, indicating good performance in identifying dropouts. However, our precision for class 1 (0.240) is lower compared to the LightGBM model, suggesting that while we accurately identified a significant portion of dropouts, there is a higher rate of false positives. It is important to note that our data set includes 220 samples, with 25 in class 1 (dropouts). 
The relatively small size of the minority class and the diverse set of 35 features in our dataset may affect the results. This diverse feature set could contribute to the differences in precision and recall compared to the larger datasets used by Rodríguez et al. (2023). The detailed performance breakdown for both classes highlights the challenge of false positives in minority class prediction and emphasizes the importance of balancing recall and precision. This approach supports effective early identification of at-risk students, aligning with advanced strategies in the field while underlining the need for class-specific evaluation to address the unique challenges of dropout prediction.
The feature importance analysis showed that parental education level, family structure and dynamics, socio-economic and health challenges, educational performance and behaviors, extended family health issues, and neglect and protective measures were the most significant predictors of student dropout. These findings align with theoretical perspectives such as Bourdieu’s Social Capital Theory, Tinto’s Student Integration Model, Bronfenbrenner’s Ecological Systems Theory, and Resilience Theory, confirming the necessity of intervention strategies.
From the Social Capital Theory (Bourdieu) perspective, parental education level and financial hardship play a major role in shaping students’ academic engagement. Students from lower-educated families often lack access to cultural and educational capital, limiting their ability to navigate the education system effectively. This aligns with Singh and Alhulail (2022), who found that parental education directly impacts students’ academic aspirations and persistence. Furthermore, Jimerson et al. (2000) and Battin-Pearson et al. (2000) emphasize that financial instability often forces students to prioritize work over school, leading to disengagement. Our findings show that students who work while studying face a significant dropout risk, which further supports the argument that economic constraints and limited social capital can push students out of the education system.
Tinto’s Student Integration Model explains dropout risk through the lack of academic and social integration in school life. Our findings indicate that students with fewer peer interactions at home (e.g., being an only child) or those who struggle with divided parental attention (e.g., having five or more siblings) may experience challenges in social belonging, which can negatively impact school engagement. Dağlı and Can (2023) highlight that peer disengagement and absenteeism are strong indicators of dropout risks, which is consistent with our finding that students exposed to risky peer groups or those who exhibit chronic absenteeism are at higher risk.
At a broader level, Bronfenbrenner’s Ecological Systems Theory provides a framework to understand how family dynamics and external influences shape dropout risk. Our findings show that family instability, including parental divorce and being under foster care, negatively impacts students’ educational persistence. Petre et al. (2024) found that students lacking familial social capital are more likely to disengage from education, reinforcing the role of the home environment in shaping student success. Additionally, chronic illness or mental health issues within the family emerged as relevant dropout predictors, indicating that external stressors beyond the student’s control can influence academic stability.
Finally, Resilience Theory (Masten, 2001) highlights why some students persist in education despite facing adversity. This theory suggests that students with strong personal, social, and institutional support systems are more likely to overcome challenges and stay in school. While factors such as financial hardship, risky peer group exposure, and absenteeism increase dropout risk, students who develop coping mechanisms or have external support (e.g., school counseling, mentorship programs) may have better educational outcomes. This aligns with findings from Holtmann and Solga (2023), who emphasize the importance of career alignment and structured student support in vocational education.
Based on the findings of this study, several policy actions should be considered to reduce dropout rates in vocational and technical high schools. Targeted interventions can improve student retention by addressing key risk factors such as financial instability, academic struggles, and limited career flexibility. The following recommendations offer practical solutions:
Early Warning Systems for At-Risk Students
Machine learning models, like the ones used in this study, can serve as early warning systems to help schools identify students at risk of dropping out. Schools should implement predictive analytics dashboards to monitor absenteeism, academic performance, and socio-economic stressors in real time. Similar models have been applied in countries like Chile, where LightGBM-based prediction systems have improved student retention (Rodríguez et al., 2023).
Financial Aid and Work-Study Programs
Many vocational students struggle with financial hardship and the need to balance work and study, which increases their likelihood of dropping out. Expanding financial assistance, paid internships, and flexible work-study options can ease this burden. Countries with integrated vocational financing programs, such as Germany’s apprenticeship model, have seen reduced dropout rates (Beckmann et al., 2023).
Parental Engagement and Family Support Initiatives
Parental education level has been identified as a major dropout predictor. Schools should implement structured parental involvement programs, offering workshops and guidance to help parents support their children’s education. Studies have shown that strong parent-school collaboration improves student retention (Raftery et al., 2012).
Career Pathway Flexibility and University Access
One of the leading causes of dropout in vocational schools is the limited ability to transition to general education or higher education programs. Allowing credit transfer mechanisms and introducing dual-diploma pathways would give vocational students greater flexibility in their academic and career trajectories.
Psychosocial and Counseling Support Services
Given that many vocational students come from unstable family environments or face mental health challenges, on-campus counseling services and peer mentorship programs should be expanded.
By implementing these evidence-based interventions, policymakers and educators can take proactive measures to reduce dropout rates in vocational education. Ensuring that students receive the necessary financial, academic, and social support will not only enhance student retention but also contribute to a more resilient and skilled workforce.
Conclusion
This study examines early school dropout in vocational and technical high schools from a data-driven perspective, using machine learning to identify the students most at risk and to point out the key factors contributing to the issue. The feature importance analysis shows that socio-economic background, family structure, and academic performance are among the strongest predictors of dropout. These findings align with established theoretical frameworks (Bourdieu’s Social Capital Theory, Tinto’s Student Integration Model, and Bronfenbrenner’s Ecological Systems Theory) and demonstrate that dropout is not simply an individual decision but is shaped by social capital, academic integration, and broader environmental influences. These patterns point to a need for systematic interventions that address both the individual and the structural factors contributing to student disengagement.
From a methodological standpoint, this study highlights the effectiveness of machine learning in predicting dropout risks. Among the models tested, the Decision Tree classifier with SMOTE showed the highest recall (0.857), alongside a moderate F1-score of 0.375, for the dropout class (“Class 1”), making it a powerful tool for early warning systems. However, its relatively low precision means that while many at-risk students were correctly identified, a considerable number of false positives occurred, reinforcing the need for further model refinement.
Students from less-educated families face a higher risk of dropping out due to limited academic resources, guidance, and support, reinforcing the need for parental involvement programs. Financial hardship is another key factor, as many vocational students must work while studying, leading to absenteeism and disengagement. Expanding financial aid and work-study opportunities can help alleviate this burden. Additionally, the lack of academic flexibility in vocational education restricts upward mobility, causing frustration and increasing dropout rates. More adaptable pathways between vocational and general education could improve retention.
The integration of predictive analytics in dropout prevention is still in its early stages, but this study demonstrates its real-world potential. If machine learning-based dropout tracking were integrated into national education databases, such as Turkey’s e-Okul system, schools could digitally monitor risk factors and intervene before students disengage completely. This would automate intervention efforts, helping educators detect and support struggling students before they reach the point of no return.
Despite its contributions, this study has several limitations. Due to administrative and privacy constraints, the research was conducted at a single vocational school with a relatively small sample (220 students, with only 25 dropout cases). While the findings provide valuable insights, they should be seen as a pilot study rather than a fully generalizable conclusion. Future research should expand to multiple schools across different regions to improve model robustness and applicability. Additionally, the study primarily focused on socio-economic and academic risk factors, without incorporating behavioral or psychological variables. Factors such as student motivation, mental health, and peer influences play a crucial role in dropout risk and should be included in future models. Integrating qualitative data—such as teacher observations, counselor reports, and student self-assessments—could help build more holistic predictive models. Another key limitation is the cross-sectional nature of the study, which captures dropout risks at a single point in time rather than tracking longitudinal trends. Future research should adopt long-term monitoring approaches to evaluate the effectiveness of different intervention strategies over time.
To make real change, everyone involved in education—policymakers, school leaders, and vocational training institutions—needs to work together. Expanding financial aid, making academic pathways more flexible, and using AI-driven early warning systems can help keep students on track. Schools should also focus on involving parents more and offering better support through counseling and mentorship. At the same time, vocational programs need to stay relevant to industry needs, keeping students engaged and motivated. With a more adaptable and student-centered approach, we can create a system where fewer students fall through the cracks and more have the opportunity to succeed.
Footnotes
Ethical Considerations
Ethical approval was obtained from the Kadir Has University Ethics Committee, with administrative permission from the participating school.
Consent to Participate
Informed consent was obtained from all student participants, and all data was anonymized to protect participant privacy.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
