Sage Journals: Discover world-class research

Abstract

This research presents an approach for modeling students’ study strategies and academic performance in higher education. The models use key learning attributes, including intrinsic motivation, extrinsic motivation, autonomy, relatedness, competence, and self-esteem. Five machine learning models were implemented, trained, evaluated, and tested with data from 924 university students. The comparative analysis reveals that tree-based models, particularly random forest and decision trees, outperform the other models, with random forest achieving a prediction accuracy of 94.9% for study strategies. The models built in this research can be used to predict student study strategies and performance, which can support targeted interventions for improving learning progress. The findings emphasize the importance of incorporating strategies that address diverse motivation dimensions in online educational systems, as doing so can increase student engagement and promote continuous learning. The findings also highlight the potential of modeling these attributes collectively to personalize and adapt the learning process.

Keywords

classification models; learning strategy; predictive modeling; regression models; student academic performance; student motivation; supervised machine learning

Introduction

In recent years, the use of eLearning systems by both learning institutions and individuals for teaching and learning has become increasingly popular. Various courses in different programs and disciplines are made available to students through the systems. Despite the widespread use of the systems, issues of low engagement and high attrition rates are common among students learning with them. Thus, improving students’ learning experience with the systems by identifying their challenges and adapting appropriate learning interventions to address them is an important research field.

Students’ learning experience could be improved by identifying and understanding those at risk of failing or dropping out of a university or course and providing timely interventions targeting their needs. To monitor students’ learning experience and identify those at risk, modeling their learning behaviors to predict whether they will succeed or fail in their studies is a well-established research area in student modeling. Research has shown that a variety of factors, including motivation for learning, self-esteem, study strategies, basic psychological needs, and cognitive ability, affect students’ learning behaviors and progress. The goal of predictive modeling is to automatically identify students’ needs and respond with appropriate learning interventions to enhance learning. Most predictive models in higher education seek to identify differences in students’ learning in order to provide adaptive learning based on students’ needs.

Increasing evidence suggests that students’ motivation for learning has a direct impact on their learning behavior and performance. Moreover, research has found that student motivation changes during the learning process depending on the context (Du Boulay & Del Soldato, 2016). As a result, various studies have explored ways of modeling and quantifying students’ motivation in order to manage and improve it dynamically (Orji & Vassileva, 2022), and improving student motivation has attracted researchers’ attention as a tool for promoting student learning and progress. The impact of student motivation on learning is usually assessed based on the dimensions of motivation theories in education, such as self-determination (Ryan & Deci, 2000), self-efficacy (Bandura, 1977), and self-esteem (Rosenberg, 2015). While there is an increased understanding of how each motivational dimension affects learning individually, there is still a gap in understanding how they affect learning progress in combination. Thus, there is a need to investigate the combined influence of motivation variables (we will also call them “dimensions”) on student study strategies and academic performance. This will help eLearning system designers determine whether it is necessary to implement strategies that target multiple motivation dimensions, and it will also provide researchers with evidence on possible synergistic relationships among the key learning attributes.

In order to gain valuable insights into the influence of various motivation dimensions (intrinsic, extrinsic, autonomy, relatedness, competence, and self-esteem) on study strategies and academic performance, we have carefully selected five supervised machine learning (ML) algorithms that are commonly used in performing regression and classification tasks (Marbouti et al., 2016). These algorithms, namely Random Forest (RF), K-Nearest Neighbors (KNNs), Decision Tree (DT), Linear/Logistic Regression (LR), and Support Vector Machine (SVM), were employed to develop predictive models for study strategies and academic performance. The selection of these algorithms is based on their widespread usage and effectiveness in modeling students’ characteristics. Below, we present a concise overview of each algorithm:

Random Forest

RF is an ensemble learning technique that combines multiple DTs to generate predictions. It is particularly useful when dealing with complex datasets and high-dimensional feature spaces. In the context of predicting study strategies and academic performance, RF can handle both categorical and continuous variables, making it suitable for analyzing the various factors influencing student learning. It can capture interactions between different predictors and provide insights into feature importance.
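As a minimal sketch of how RF combines multiple DTs, assuming scikit-learn’s `RandomForestClassifier` and hypothetical synthetic data (not the study’s dataset), the forest’s class probabilities are the average of its individual trees’ probabilities, and the fitted forest exposes one importance score per feature:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical synthetic data standing in for six learner attributes
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Each fitted tree is stored in forest.estimators_; averaging their predicted
# class probabilities reproduces the forest's own predict_proba output
tree_avg = np.mean([tree.predict_proba(X) for tree in forest.estimators_], axis=0)
forest_proba = forest.predict_proba(X)

print(np.allclose(tree_avg, forest_proba))   # True
print(forest.feature_importances_.round(3))  # one importance score per feature
```

Averaging probabilities (soft voting) rather than counting hard votes is the design scikit-learn uses for its forests; other implementations may use majority voting instead.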

K-Nearest Neighbors

KNN is a simple yet powerful algorithm that makes predictions based on the similarity between data points. In this case, a KNN model can be used to predict study strategies and academic performance by finding the K students most similar to a given student (the nearest neighbors) in terms of their intrinsic motivation, extrinsic motivation, autonomy, relatedness, competence, and self-esteem. For classification, the prediction is the majority class (e.g., deep or surface study strategy) among the K nearest neighbors; for regression, such as predicting an academic performance score, it is typically the average of the neighbors’ values.
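The majority-vote rule described above can be sketched in a few lines of NumPy. The two features, the six student profiles, and the function name `knn_predict` are hypothetical, chosen only to illustrate the mechanics; a real model would use all the attributes listed above, ideally standardized so that no single scale dominates the distance.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Return the majority label among the k training points nearest to x_query."""
    distances = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distance to each student
    nearest = np.argsort(distances)[:k]                    # indices of the k closest students
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical profiles: [intrinsic motivation, extrinsic motivation]
X_train = np.array([[6.0, 5.5], [5.8, 5.9], [6.2, 5.0],
                    [3.0, 4.0], [2.8, 3.5], [3.2, 4.2]])
y_train = np.array(["deep", "deep", "deep", "surface", "surface", "surface"])

print(knn_predict(X_train, y_train, np.array([5.9, 5.4])))  # deep
print(knn_predict(X_train, y_train, np.array([3.1, 3.9])))  # surface
```

For regression, the vote would be replaced by the mean of the neighbors’ target values.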

Decision Tree

A DT is a tree-like model in which each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a prediction (a class label for classification or a value for regression). DTs are suitable for both classification and regression tasks. In this context, a DT can be trained to predict study strategies and academic performance based on the provided input features. The tree is built by recursively partitioning the data on the input features, choosing at each node the split that most reduces impurity (e.g., Gini impurity for classification or squared error for regression).
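To illustrate how a split is chosen, the sketch below scores candidate thresholds on a single feature by the weighted Gini impurity of the two child nodes and keeps the lowest. It is a simplification under stated assumptions: the helper names and toy data are hypothetical, and scikit-learn’s trees, for example, evaluate midpoints between sorted values and search over all features at each node.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """Return (threshold, weighted impurity) of the best split x <= t on one feature."""
    best_t, best_score = None, np.inf
    for t in np.unique(x)[:-1]:  # the largest value would leave the right child empty
        left, right = y[x <= t], y[x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Hypothetical feature (e.g., a motivation score) and binary labels
x = np.array([2.0, 2.5, 3.0, 5.0, 5.5, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])
t, score = best_split(x, y)  # threshold 3.0 separates the classes with impurity 0.0
print(t, score)
```

Growing a full tree repeats this search recursively on each child node until a stopping rule (such as maximum depth or minimum samples per leaf) is met.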

Linear/Logistic Regression

Linear regression is a supervised learning technique used for regression tasks, whereas logistic regression is used for classification tasks. Both relate the input features to the target variable through a linear combination of the features: linear regression predicts the target directly from this combination, while logistic regression passes it through the logistic (sigmoid) function to estimate class probabilities. In this study, linear regression is used to predict academic performance and logistic regression to predict study strategies. Each model estimates a coefficient for every input feature, which represents the strength and direction of that feature’s influence on the target variable.
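In standard notation (the symbols are ours, not the original authors’), with features x1 … xp and coefficients β0 … βp, the two models can be written as follows; logistic regression then assigns class 1 when the estimated probability is at least 0.5, the common default threshold.

```latex
\hat{y} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j
\qquad \text{(linear regression)}

P(y = 1 \mid \mathbf{x}) = \sigma\!\left(\beta_0 + \sum_{j=1}^{p} \beta_j x_j\right),
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}
\qquad \text{(logistic regression)}
```

In the logistic model, βj is the change in the log-odds of class 1 per unit increase in xj, which is why its sign conveys the direction of the feature’s influence.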

Support Vector Machine

SVM is a versatile algorithm that can be used for both classification and regression tasks. For classification, SVM finds the hyperplane in the feature space that separates the classes with the largest margin; for regression (support vector regression), it fits a function that keeps the training targets within a specified tolerance of the predictions wherever possible. In predicting study strategies and academic performance, an SVM model can be trained on the input features to find the optimal hyperplane that maximizes the margin between different study strategies or academic performance levels.
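A minimal sketch of the margin idea, assuming scikit-learn’s `SVC` with a linear kernel and four hypothetical, linearly separable points: a very large `C` approximates a hard margin, and for a linear SVM the margin width equals 2 / ||w||, where w is the fitted weight vector (`coef_`).

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 2-D data: class 0 at x1 = 0, class 1 at x1 = 3
X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]])
y = np.array([0, 0, 1, 1])

svm = SVC(kernel="linear", C=1e6).fit(X, y)  # large C: (almost) no margin violations allowed

w = svm.coef_[0]
margin = 2.0 / np.linalg.norm(w)  # distance between the two supporting hyperplanes
print(round(margin, 3))           # close to 3.0: the separating line sits at x1 = 1.5
print(svm.predict([[1.0, 0.5], [2.0, 0.5]]))  # [0 1]
```

With real, overlapping data a smaller `C` (and often a nonlinear kernel) is typically chosen by cross-validation, trading margin width against training errors.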

The models built on these algorithms were applied to a dataset of 924 university students from Chile, published by Orsini et al. (2018). The best-performing regressor and classifier for students’ academic performance and study strategies were identified using the test dataset. The results of this study reveal the impact of the aforementioned student attributes on study strategies and academic performance, which can help in implementing appropriate interventions to enhance students’ learning progress and retention rates. They also inform online educational system designers about the role of incorporating strategies that address various motivation dimensions in promoting student learning.

This study answers the following research questions:

RQ1. Can intrinsic and extrinsic motivation, autonomy, relatedness, competence, study strategies (deep and surface), and self-esteem be utilized to identify at-risk students?

RQ2. How do intrinsic and extrinsic motivation, autonomy, relatedness, competence, and self-esteem influence students’ study strategies?

RQ3. Among the five ML algorithms investigated, which regressor/classifier is the most successful at predicting student academic performance/study strategies?

These three questions are important as they address key aspects related to study strategies and academic performance, providing valuable insights and contributing to the existing knowledge in the field. RQ1 explores the possibility of using various factors related to motivation, study strategies, and self-esteem to identify students who may be at risk academically. The question aims to understand whether these specific factors can serve as indicators or predictors of students who may require additional support or intervention to improve their academic performance. The rationale behind this question is grounded in the belief that students’ motivation, study strategies, and self-esteem play crucial roles in their academic success. Thus, it is essential to further investigate the comparative impact of these factors to better understand their role in influencing academic performance. Understanding the roles of these factors will help educational stakeholders gain insights into effective teaching strategies and interventions that can foster positive motivation, study skills, and self-esteem, thus improving overall student success and well-being. The answer to this question can contribute to the development of early warning systems and interventions that can help at-risk students succeed academically.

RQ2 is crucial for gaining insights into the factors that shape students’ study behaviors and strategies. By investigating how these factors influence study strategies, educators and researchers can gain deeper insights into the underlying processes that drive students’ approach to learning. Understanding the impact of intrinsic and extrinsic motivation, autonomy, relatedness, competence, and self-esteem on study strategies helps in identifying effective interventions and teaching methods that can enhance students’ engagement, motivation, and learning outcomes. This knowledge contributes to the development of evidence-based strategies to optimize students’ study approaches and ultimately improve academic performance.

For RQ3, identifying the most successful regressor/classifier among the investigated algorithms provides valuable guidance for building effective predictive models in the context of study strategies and academic performance. Different algorithms have different strengths and limitations, and their performance can vary depending on the dataset and task at hand. Determining the best algorithm helps researchers and practitioners choose the most effective approach for predicting study strategies and academic performance. This knowledge aids in the development of accurate and reliable predictive models that can be used for early identification of at-risk students so that personalized interventions can be applied to support them.

Overall, the answers to these questions will contribute to a deeper understanding of the factors influencing study strategies and academic performance. They will provide actionable insights for educators, policymakers, and researchers to design interventions, support systems, and educational strategies that promote student success and well-being.

Related Work

Student Motivation

One theory predominantly used in exploring student motivation for learning is self-determination theory (SDT) (Ryan & Deci, 2000). SDT recognizes the role of intrinsic and extrinsic motivation in helping people achieve specific goals. Its authors hypothesized that fulfilling three psychological needs (competence, autonomy, and relatedness) plays a crucial role in promoting intrinsic motivation, self-regulation, and mental well-being, and that failing to meet these needs leads to a decline in motivation and overall well-being. This theory has been widely applied in various educational settings to explain how the desire for personal growth and the pursuit of individual goals influence students’ learning behaviors.

According to the theory, feelings of autonomy, competence, and relatedness influence students’ learning experiences, which in turn affect their progress and performance. Research has highlighted the importance of these factors in enhancing students’ learning skills: as students acquire skills and master tasks relevant to their specific goals, they are more likely to take actions that lead to goal attainment. For example, Feri et al. (2016) applied multiple regression analysis to student data collected through validated questionnaires measuring autonomous and controlled (extrinsic) motivation, and evaluated academic performance using a 100-item multiple-choice test. Their findings indicate that even a small 1% rise in students’ autonomous motivation was linked to a substantial 15.2% improvement in academic performance. Chen et al. (2010) found that addressing students’ needs for autonomy, competence, and relatedness is likely to improve student engagement, course satisfaction, and performance in an online educational system. Thus, students’ social context, their skills for a specific task, and their ability to direct their actions to complete the task directly influence their learning performance. Furthermore, a number of studies (Areepattamannil et al., 2011; Vansteenkiste et al., 2004) have demonstrated that intrinsic motivation affects students’ learning experience and academic performance. Students with high intrinsic motivation sustain their participation in academic activities longer than those with low intrinsic motivation, which helps them achieve the required learning objectives. On the other hand, extrinsically motivated individuals perform activities because of the external rewards they will gain from completing them: the desire to perform an activity is driven by compulsion, rewards, or punishments rather than by the pleasure and satisfaction obtained from accomplishing it. Extrinsic motivation is associated with a high level of willpower and engagement, which can help at the initial stage of a task; as the learning process deepens, it may transform into intrinsic motivation that sustains high-quality learning and creativity (Gopalan et al., 2017). Thus, both intrinsic and extrinsic motivation help students face learning challenges, understand processes, and master and apply learned skills in real circumstances.

Predictive Modeling and Academic Performance

The measurement of academic performance is considered crucial in evaluating students’ progress and success in their learning endeavors. It serves as an indicator of their skills, comprehension of concepts, knowledge, and ability to achieve predetermined learning objectives (Tuckman, 1975). Academic performance plays a significant role in assessing the extent to which educational institutions, educators, and students are fulfilling their short-term and long-term learning goals. This assessment provides valuable insights for educators, aiding in decision-making processes such as identifying struggling students who may be at risk of failing and allocating appropriate counseling or limited tutoring resources (Sundar, 2013). It also enables the adaptation of suitable learning interventions to support students. Researchers investigate the perceived connections between academic performance and other learner attributes to identify factors influencing it and explore ways to enhance students’ performance. This understanding assists educators in refining their teaching methods.

Numerous studies employing different algorithms have been conducted to examine the impact of various learner attributes on students’ academic performance. For example, Chen et al. (2012) used psychosocial factors, coursework grades, and learning log data from an advanced programming course in a higher education institution in predicting students’ final grades. According to the researchers’ findings, coursework grades were identified as the most influential factor in determining academic performance, followed by the total number of learning materials downloaded from the learning system. Another study focused on predicting academic performance using variables such as time on task, the total number of logins to a learning system, average assessment grades, and the percentage of learning activities accessed. This study revealed that the average assessment grade was the most significant contributing variable, followed by time on task (Orji & Vassileva, 2020). The authors emphasized that such prediction models can be utilized to identify students at risk of failing a course, allowing online educational systems to automatically implement appropriate interventions that may involve both internal and external motivators.

In a separate study conducted by Jamjoom et al. (2021), models were developed to predict students’ performance in an introductory computer programming course with the aim of identifying at-risk students. The attributes used in these models were based on students’ self-efficacy and included cumulative high school grades, quizzes, midterm exams, practical evaluations, and final exam grades. The study indicated that DT models and SVM classifiers achieved the highest performance. Furthermore, it revealed a strong correlation between student self-efficacy and practical evaluation grades, highlighting the importance of self-efficacy in programming courses.

Marbouti et al. (2016) compared seven prediction models to identify at-risk students, using attributes such as quizzes, attendance, homework, team participation, project milestones, mathematical modeling activity tasks, and exams. The Naïve Bayes classifier and an Ensemble model demonstrated the best results in their analysis. Additionally, attributes like race, family income, and university entry mode were employed in predicting student academic performance (Aziz et al., 2014). The study employed Naïve Bayes, Rule-Based, and DT classification methods to determine the best model for predicting academic performance based on these attributes. The results indicated that race was the most influential variable, followed by family income, and that the Rule-Based and DT models outperformed Naïve Bayes.

Some studies in the literature utilized predictive modeling to identify at-risk students and provide interventions to assist them. For example, Greer et al. (2015) employed predictive modeling to identify at-risk students and offered personalized learning supports and resources tailored to their individual needs. Significant improvement in student performance was reported as a result. Similarly, Essa and Ayad (2012) utilized predictive models and segmentation techniques to identify at-risk students, employing data visualization to gain diagnostic insights. The studies revealed that the use of predictive modeling combined with adaptive interventions can have significant impacts on student learning experience and performance. Current research trends in predictive modeling have shown that it can provide useful information about specific attributes or variables impacting the students’ learning progress which can help online educational systems designers and educators in enhancing teaching and learning in higher education.

Although previous studies investigated several predictive models of student academic performance using various learning attributes, there is a need to explore the broad effect of autonomy, competence, relatedness, intrinsic motivation, extrinsic motivation, self-esteem, study strategies, and demographic attributes on student performance in higher education using supervised ML techniques. While previous research has explored these factors individually, this study expands on the existing knowledge by considering multiple factors simultaneously and leveraging ML techniques. By examining the collective impact of these factors, this research sheds light on their combined influence on study strategies and academic performance. This holistic approach helps to uncover nuanced relationships and interactions between these factors, study strategies, and academic performance. It enhances understanding of how these factors work together to shape student behaviors and performance, providing valuable insights for educators, researchers, and policymakers to improve educational practices, interventions, and support systems.

Furthermore, this study utilizes ML algorithms to build predictive models. This approach offers a data-driven perspective, enabling the identification of patterns, correlations, and predictive power of the selected factors. By employing ML algorithms, the research explores the potential for more accurate and reliable predictions of student study strategies and academic performance based on these factors.

Moreover, limited research has examined the influence of the motivational dimensions explored in this study on two prominent study strategies: deep and surface study strategies. Also, few studies have shown that the motivational dimensions and the study strategies used in this study have an impact on academic performance. The predictors in our study were assessed using instruments grounded in well-established theories of self-determination, self-esteem, and study strategies, which supports the applicability of our model across diverse learning environments.

Method

To determine the impact of the various motivation dimensions and demographic attributes on the study strategies and academic performance of higher education students, we employed well-known ML techniques in the steps summarized below.

We preprocessed the dataset to prepare it for analysis and applied data balancing techniques.

We developed five supervised ML regressors and classifiers for predicting students’ academic performance and study strategies.

We split our dataset into training and test sets. We trained and evaluated our regression models for academic performance prediction and classification models for study strategies prediction using 10-fold and 5-fold cross-validation, respectively. We used the test sets to determine the performance of the models.

We compared the performance of the models built to determine the best-performing regressor/classifier.
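The splitting and cross-validation steps can be sketched with scikit-learn as below, using the split ratios and fold counts reported in the Regression and Classification Experiments section. The synthetic data, the choice of DT and RF, stratified splitting, and the random seeds are assumptions for illustration only; the study’s actual features and tuning are not reproduced here.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, mean_absolute_error
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Regression: 70%/30% split, 10-fold cross-validation on the training set (synthetic data)
X_r, y_r = make_regression(n_samples=924, n_features=9, noise=10.0, random_state=0)
Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(X_r, y_r, test_size=0.30, random_state=0)
reg = DecisionTreeRegressor(random_state=0)
cv_mae = -cross_val_score(reg, Xr_tr, yr_tr, scoring="neg_mean_absolute_error",
                          cv=KFold(n_splits=10, shuffle=True, random_state=0))
test_mae = mean_absolute_error(yr_te, reg.fit(Xr_tr, yr_tr).predict(Xr_te))

# Classification: 80%/20% split, 5-fold stratified cross-validation (synthetic data)
X_c, y_c = make_classification(n_samples=924, n_features=9, random_state=0)
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(X_c, y_c, test_size=0.20, stratify=y_c, random_state=0)
clf = RandomForestClassifier(random_state=0)
cv_acc = cross_val_score(clf, Xc_tr, yc_tr, scoring="accuracy",
                         cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
test_acc = accuracy_score(yc_te, clf.fit(Xc_tr, yc_tr).predict(Xc_te))

print(f"CV MAE {cv_mae.mean():.2f}, test MAE {test_mae:.2f}")
print(f"CV accuracy {cv_acc.mean():.3f}, test accuracy {test_acc:.3f}")
```

Cross-validation is run only on the training portion, so the held-out test set stays untouched until the final comparison.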

Data Description and Preprocessing

According to SDT, the quality and dynamics of people’s behavior are influenced by various forms and sources of motivation, and these variations affect behavioral consequences, such as persistence and performance, as well as the experiences that accompany them. “SDT therefore explicitly differentiates the concept in order to consider the varied effects of different types of motivation on such relevant outcomes” (Ryan & Deci, 2017). Thus, scales grounded in SDT, such as the academic motivation scale and the basic psychological needs satisfaction scale, are commonly used to assess different sources of motivation and basic psychological needs in education.

The anonymized dataset used for this research was shared by Orsini et al. (2018) on the Harvard Dataverse website, and the data use agreement requires that the original researchers be credited through citation. The researchers obtained ethics approval from the University of San Sebastian, Chile, and recruited the university’s dental students from years 1 to 6 to participate voluntarily in the study. A paper-and-pencil questionnaire was administered to the recruited students, who were told that the researchers wanted to understand their motivation for attending university and how it affects other educational variables. The dataset contains data from 924 university students on intrinsic motivation, extrinsic motivation, and amotivation, acquired using the academic motivation scale (Vallerand et al., 1992, 1993). The academic motivation scale (AMS) is a 7-point Likert scale (ranging from 1 (strongly disagree) to 7 (strongly agree)) for assessing three types of motivation: intrinsic motivation, extrinsic motivation, and amotivation. The AMS consists of 28 items divided into seven subscales, each measuring a different aspect of students’ academic motivation: amotivation (4 items); three intrinsic motivation subscales, namely intrinsic motivation to know (4 items), toward accomplishment (4 items), and to experience stimulation (4 items); and three extrinsic motivation subscales, namely external regulation (4 items), introjected regulation (4 items), and identified regulation (4 items).

The second instrument employed for data collection was the basic psychological needs satisfaction scale (Domínguez et al., 2010), which was used to gather information on students’ autonomy, relatedness, and competence. It is a 5-point Likert scale, ranging from 1 (strongly disagree) to 5 (strongly agree), and consists of 15 items: 5 measuring autonomy, 5 measuring relatedness, and 5 measuring competence. The third instrument, used to obtain students’ self-esteem data, was the academic self-esteem scale (Rosenberg, 2006). This scale comprises 10 items designed to evaluate positive and negative feelings about oneself, and respondents indicated their responses on a 4-point Likert scale ranging from strongly agree to strongly disagree. Furthermore, the study process questionnaire (Biggs et al., 2001) was used to obtain data on students’ study strategies, specifically deep and surface strategies. The questionnaire consists of 10 items in two subscales of 5 items each, measuring deep and surface strategies, respectively. Participants responded on a 5-point Likert scale ranging from never to always. Other data in the dataset include students’ concurrent academic performance, obtained from the university’s administrative department, and their demographic information (year of study, gender, and age). The scales have been subjected to various psychometric analyses, such as reliability analysis, confirmatory factor analysis, and construct validity assessment, to establish their reliability and validity, and they are widely used instruments in education for assessing their respective constructs.

For preprocessing, the items of each scale that form intrinsic motivation, extrinsic motivation, autonomy, relatedness, competence, self-esteem, and deep and surface study strategies were combined to provide an overall estimate of each feature. We performed exploratory analysis to describe the distribution of the dataset. The descriptive statistics of the dataset are presented in Table 1, and Figures 1, 2, and 3 show the distribution of students by their demographic information. Moreover, we provided ground truth labels for the classification of study strategies based on students’ self-reports. The most commonly used methods for providing ground truth labels for supervised ML classification are self-reports, human observation, and trained raters.
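One common way to combine scale items into a single feature is to average each respondent’s item responses within a subscale. The study does not state whether items were averaged or summed, so the mean below is an assumption, and the item column names and mapping are hypothetical placeholders rather than names from the dataset.

```python
import pandas as pd

# Hypothetical item-level responses for two students (column names are illustrative)
responses = pd.DataFrame({
    "im_know_1": [6, 4], "im_know_2": [7, 3],  # intrinsic motivation items
    "em_ext_1": [5, 6], "em_ext_2": [4, 6],    # extrinsic motivation items
})

# Map each composite feature to the items it is built from (hypothetical mapping)
subscales = {
    "intrinsic_motivation": ["im_know_1", "im_know_2"],
    "extrinsic_motivation": ["em_ext_1", "em_ext_2"],
}

# Composite score = mean of the subscale's items for each student (row)
features = pd.DataFrame({name: responses[cols].mean(axis=1) for name, cols in subscales.items()})
print(features)
```

Negatively worded items, such as some of those in the Rosenberg self-esteem scale, are usually reverse-scored before averaging.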

Figure 1.

Students’ distribution according to the year of study.

Figure 2.

Students’ distribution based on gender.

Figure 3.

Students’ distribution based on age.

Table 1.

Descriptive Statistics of the Dataset.

Features	Mean	SD	Min	Max
Intrinsic motivation	4.98	0.61	2.17	6.58
Extrinsic motivation	5.25	0.75	2.42	7.00
Autonomy	5.01	0.89	2.00	6.25
Relatedness	4.46	0.90	1.50	6.25
Competence	4.77	0.84	2.25	6.25
Self-esteem	4.17	0.17	1.75	7.00
Deep study strategies	4.11	0.72	2.00	6.25
Surface study strategies	3.32	0.79	1.50	6.25
Study year	3.24	1.48	1	6
Age	22.83	3.36	18.00	44.00
Academic performance	4.72	0.54	2.92	6.40

Regression and Classification Experiments

Preprocessing of the dataset and the prediction experiments in this study were performed using Python and the scikit-learn library. We implemented and compared RF, Linear/Logistic Regression (LR), SVM, DT, and KNN for both the regression problem (student academic performance) and the classification problem (study strategies). The dataset was split into training and test sets in the ratio of 70%:30% for the regression experiment and 80%:20% for the classification experiment. We examined several options for training and evaluating the regressors and classifiers on the training sets; the best results were obtained using 10-fold cross-validation for regression and 5-fold cross-validation for classification. For the classification task, 729 of the 924 students in the dataset were labeled as using the deep study strategy (majority class), whereas 195 students were labeled as using the surface study strategy (minority class). The dataset for predicting students’ study strategies is therefore unbalanced. Training a model on unbalanced data can bias it toward the majority class, which can lead some ML algorithms to ignore the minority class. Because most ML algorithms assume a balanced class distribution or an equal cost of misclassification, unbalanced class distributions affect the learning process significantly (He & Garcia, 2009). As a result, a classification model could achieve high overall accuracy while correctly predicting only the samples in the majority class. Research indicates that resampling is an effective way of addressing unbalanced classes (Branco et al., 2016); class balancing techniques therefore help ensure that the resulting models can identify all classes, not only the majority class. Random oversampling is one of the most common methods researchers have applied to handle unbalanced classes (Branco et al., 2016).
To address this imbalance, we applied random oversampling (Seiffert et al., 2010) using the resampling utility in the scikit-learn library. Random oversampling randomly duplicates samples in the minority class until it has as many samples as the majority class. The balanced dataset obtained after resampling was used for prediction. Table 2 shows the class sizes before and after oversampling. To reduce the risk of overfitting, we trained and tuned the models using cross-validation.
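A minimal sketch of random oversampling with scikit-learn’s `resample` utility, using the class sizes from Table 2 (729 deep, 195 surface) with random placeholder features; the helper name `random_oversample` is hypothetical. One design choice worth noting: this sketch splits the data first and oversamples only the training portion, a common precaution so that duplicated minority samples cannot appear in both the training and test sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = rng.normal(size=(924, 6))        # placeholder features, not the study's data
y = np.array([0] * 729 + [1] * 195)  # 0 = deep (majority), 1 = surface (minority)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.20, stratify=y, random_state=0)

def random_oversample(X, y, minority=1, random_state=0):
    """Duplicate randomly chosen minority rows until both classes are the same size (binary labels)."""
    X_maj, X_min = X[y != minority], X[y == minority]
    X_min_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=random_state)
    X_bal = np.vstack([X_maj, X_min_up])
    y_bal = np.concatenate([y[y != minority], np.full(len(X_min_up), minority)])
    return X_bal, y_bal

X_bal, y_bal = random_oversample(X_tr, y_tr)
print(np.bincount(y_tr), "->", np.bincount(y_bal))  # minority count raised to match majority
```

Calling `random_oversample(X, y)` on the full arrays instead yields 729 samples per class, matching the balanced column of Table 2.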

Table 2.

Balanced Dataset for Study Strategies Prediction.

Label	Original dataset	Balanced dataset (using oversampling)
Deep study strategy	729	729
Surface study strategy	195	729

Results

Various evaluation metrics have been used to assess the performance of regression models (Botchkarev, 2019). These metrics assess how closely predicted outcomes match the actual values; mean absolute error (MAE) and root mean squared error are among those most frequently used. In this research, we compared the accuracy of the regression models using MAE, which can be interpreted directly as the average absolute difference between the actual and predicted values. A lower MAE indicates a more accurate model. For the classification problem, we applied four frequently used evaluation metrics: accuracy, precision, recall, and F1-score (Japkowicz & Shah, 2011). The performance of the regression and classification models was compared on the test dataset using these metrics.
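The metrics above can be computed with `sklearn.metrics`, as in this sketch on small hypothetical predictions (not the study’s outputs). For binary labels, scikit-learn’s precision, recall, and F1 score the positive class by default; passing `average="macro"` averages the per-class scores instead.

```python
from sklearn.metrics import (accuracy_score, f1_score, mean_absolute_error,
                             precision_score, recall_score)

# Hypothetical regression outputs (e.g., academic performance scores)
y_true_reg = [4.5, 5.0, 3.8, 4.9]
y_pred_reg = [4.7, 4.6, 4.0, 5.0]
mae = mean_absolute_error(y_true_reg, y_pred_reg)  # mean of 0.2, 0.4, 0.2, 0.1 = 0.225

# Hypothetical classification outputs (1 = surface, 0 = deep)
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1]
scores = {
    "accuracy": accuracy_score(y_true, y_pred),    # 4 of 6 correct
    "precision": precision_score(y_true, y_pred),  # TP / (TP + FP) for class 1
    "recall": recall_score(y_true, y_pred),        # TP / (TP + FN) for class 1
    "f1": f1_score(y_true, y_pred),                # harmonic mean of precision and recall
}
print(round(mae, 3), {k: round(v, 3) for k, v in scores.items()})
```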

The results of testing the five ML models are shown in Tables 3 and 4. Table 3 shows the MAE of each regression model: DT achieved the lowest MAE (0.3777) and outperformed the other models, while KNN produced the least accurate result (0.4242). Table 4 reports accuracy, precision, recall, F1, and confusion-matrix counts for each classifier. Among the classifiers, RF achieved the best accuracy, precision, recall, and F1, followed by DT, while LR produced the least accurate result. These results suggest that the features used in this research are sufficient to predict students' study strategies and academic performance.

Table 3.

Regressors’ Performance for Academic Performance Prediction.

Regressors	MAE
RF	0.3913
LR	0.4003
SVM	0.4026
DT	0.3777
KNN	0.4242

Note. MAE = mean absolute error; RF = random forest; LR = linear regression; SVM = support vector machine; DT = decision tree; KNN = K-nearest neighbor. Lower MAE indicates better performance; the best result is DT (0.3777).

Table 4.

Classifiers’ Performance for Study Strategies Prediction.

Metrics	RF	LR	SVM	DT	KNN
Accuracy	0.949	0.589	0.599	0.880	0.685
Precision	0.949	0.569	0.632	0.892	0.686
Recall	0.950	0.570	0.611	0.885	0.687
F1	0.949	0.568	0.587	0.880	0.685
True positive	139	85	62	122	101
False positive	18	69	92	32	53
True negative	127	81	113	135	99
False negative	8	57	25	3	39

Note. RF = random forest; LR = logistic regression; SVM = support vector machine; DT = decision tree; KNN = K-nearest neighbor. RF achieves the best accuracy, precision, recall, and F1 values.

Insights on Features’ Significance

Because the tree-based models (RF and DT) performed better than the linear models, we did not fit a linear regression and examine the signs of its significant coefficients (beta weights) to estimate the effect of each predictor. Instead, we examined feature importance, which is commonly used with tree-based ML algorithms to help explain the relationship between each predictor and the criterion variable in a predictive model. Feature importance indicates the relative contribution, or influence, of each predictor in the model's decision-making process: a higher score means the predictor plays a more significant role in determining the outcome, while a lower score indicates a relatively weaker influence. As can be seen in Figures 4 and 5, all the features contribute to predicting academic performance and study strategies. In Figure 4, the strongest predictor of academic performance is year of study, followed by intrinsic and extrinsic motivation. The figure also indicates that age, autonomy, relatedness, competence, deep and surface study strategies, and self-esteem contribute positively to the academic performance model. Gender shows a negligible relationship with academic performance, with a close-to-zero feature importance score; this means that gender may have little relevance or predictive power in the model. For study strategies prediction (see Figure 5), intrinsic and extrinsic motivation play the most significant role, as revealed by their having the highest feature importance scores. Age, autonomy, relatedness, competence, and self-esteem also contribute positively to predicting study strategies.
The close-to-zero feature importance score for gender indicates that it has minimal impact on study strategies.
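How such scores can be obtained is illustrated below with permutation importance, a model-agnostic relative of the impurity-based `feature_importances_` attribute that scikit-learn's tree models expose. Which variant produced Figures 4 and 5 is not stated, so treat this as an illustrative sketch: the toy model, the data, and the helper names are hypothetical. Each feature column is shuffled in turn, and the resulting drop in accuracy is averaged over several repeats. Both kinds of score measure how strongly a model relies on a feature rather than the sign of its association with the outcome. scikit-learn offers this procedure as `sklearn.inspection.permutation_importance`.

```python
import random

def accuracy(model, X, y):
    """Share of rows where the model's prediction equals the true label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical toy model: predicts "deep" when feature 0 (say, an intrinsic
# motivation score) exceeds 3 and never reads feature 1.
def toy_model(row):
    return "deep" if row[0] > 3 else "surface"

X = [[5, 1], [4, 2], [4.5, 1], [5, 2], [1, 1], [2, 2], [2.5, 1], [1.5, 2]]
y = ["deep"] * 4 + ["surface"] * 4
scores = permutation_importance(toy_model, X, y)
print(scores)
```

Here feature 1 receives a score of exactly 0.0 because the toy model never reads it, mirroring how a close-to-zero score (such as gender's) signals little predictive contribution.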

Figure 4.

Feature importance for academic performance prediction.

Figure 5.

Feature importance for study strategy prediction.

In the study strategies and academic performance models, the high positive feature importance of intrinsic and extrinsic motivation suggests that students who exhibit higher levels of both are more likely to perform well or adopt effective study strategies. Given the significance of intrinsic and extrinsic motivation in the models, the findings suggest that educational interventions and support systems should foster both types of motivation to promote positive outcomes. Creating engaging learning environments and promoting students' autonomy, curiosity, and sense of competence can enhance their intrinsic motivation. Additionally, providing appropriate external rewards, recognition, or incentives can further bolster their motivation and performance.

Deep study strategy refers to approaches that involve meaningful engagement with learning material to understand concepts, make connections, and critically analyze information. Surface study strategy, on the other hand, typically involves superficial approaches such as memorization or rote learning without deeper understanding. In the academic performance model, the positive feature importance of both deep and surface study strategies suggests that students who employ both in learning are more likely to perform well. The model's results underscore the value of considering deep and surface study strategies as complementary components of a holistic study approach. Encouraging students to use a combination of these strategies, while emphasizing understanding and meaningful learning, can lead to better academic performance. In particular, students who adapt their study strategies to the nature of the task or content, employing deep strategies when understanding and critical thinking are required and surface strategies for information retention, are predicted to have better outcomes.

Given the significance of deep and surface study strategies in the model, the findings suggest the importance of teaching and fostering a range of study skills. Educators can emphasize the development of the deep study strategy to promote comprehension, critical thinking, and meaningful engagement with the subject matter. At the same time, teaching students effective surface-level strategies, such as mnemonic techniques or summarization, can aid in retaining information and improving performance in certain contexts.

Discussion

The present study aimed to explore the predictive factors of study strategies and academic performance, focusing on dimensions such as intrinsic and extrinsic motivation, autonomy, relatedness, competence, and self-esteem. Additionally, the study examined the performance of five popular ML algorithms (RF, KNN, DT, Linear/LR, and SVM) in predicting study strategies and academic performance. The findings provide valuable insights into the relationships between these factors and their impact on students' study strategies and academic performance. In line with our RQ1 and RQ2, the findings demonstrate the feasibility of using these learning attributes to predict student study strategies and academic performance. The findings also revealed that these attributes are important to learning success, as they yielded good prediction accuracy and showed positive relationships with student study strategies and academic performance. For RQ3, the best prediction model for detecting at-risk students based on academic performance is DT, followed by RF. The best classification model for study strategies is RF, followed by DT; both achieved much better accuracy than the other models. Accurate prediction of at-risk students is crucial for early intervention and support. By using the DT and RF models, educational institutions can identify students who may be at risk of academic challenges. Early identification allows timely, targeted interventions to be implemented, including additional tutoring, mentoring programs, counseling services, or personalized support systems. Such interventions can help address academic difficulties promptly, improve student outcomes, and prevent potential negative consequences.

The results of this study revealed that both intrinsic and extrinsic motivation, as well as study strategies (deep and surface), contributed positively to predicting academic performance. This finding suggests that the combination of intrinsic and extrinsic motivation plays a critical role in shaping students’ academic success. Students who possess higher levels of both types of motivation are more likely to demonstrate effective study strategies and achieve better academic outcomes. This aligns with previous research highlighting the importance of motivation in driving engagement and achievement in educational settings (Wu, 2019).

Furthermore, the identification of deep and surface study strategies as having positive feature importance highlights their significance in predicting academic performance. This finding suggests that students who employ a balanced approach, utilizing both deep and surface strategies depending on the task, are more likely to excel academically. The ability to adapt study strategies based on the nature of the content and task may facilitate better understanding, critical thinking, and information retention, leading to improved academic performance. These findings underscore the importance of teaching and promoting a range of study skills to support students in developing effective study strategies. Educators and policymakers can leverage these findings to make informed decisions about curriculum design, instructional practices, and the allocation of resources to optimize study strategies and improve overall academic performance.

The use of ML algorithms, particularly RF and DT, provided robust predictive models for study strategies and academic performance. These algorithms effectively captured complex relationships between the predictors and the criterion variables. By employing them, this study produced effective prediction models and feature importance rankings, which clarify the relative importance of different factors in predicting academic performance and study strategies. Based on the models' predictions, personalized support can be adapted to individual students' changing needs, varying the degree of support according to each student's learning context and competency level to provide real-time support driven by artificial intelligence.

The practical significance of these findings lies in their potential to inform educational interventions and support systems. Identifying influential factors and understanding their relationships with study strategies and academic performance can guide educators and policymakers in designing targeted interventions. Based on the findings, incorporating the design strategies shown in Figure 6 into online educational systems can influence how committed students are to engaging with their learning resources and, consequently, their learning outcomes. These strategies include external motivators; discussion forums that allow students to connect and share knowledge with others (relatedness); autonomy, by allowing students to control their goals; competence, by providing the skills needed to achieve specific goals; and appealing instructional design that attracts students. By fostering intrinsic and extrinsic motivation, promoting a balanced approach to study strategies, and addressing the unique needs of students with varying levels of self-esteem and competence, educational institutions can enhance students' academic success, retention rates, and overall well-being.

Figure 6.

Implication of this study's results.

Selecting relevant and useful learner attributes for effective prediction of student academic performance has led to the development of various predictive models, which have been applied in interventions that helped to reduce students' failure rates (Essa & Ayad, 2012; Greer et al., 2015). For instance, course-based predictive models have been developed using regression techniques such as linear regression, with variables such as the number of discussion posts, the total number of quizzes completed, lesson views, reports, and current and previous grades. Because these models are fitted on course-specific variables, their generalizability may be limited (Macfadyen & Dawson, 2010); they may be effective only when applied to courses with comparatively consistent structures. Moreover, their prediction results may not provide further information that practitioners could interpret to design useful interventions, which limits the benefits institutions can derive from predictive modeling of student success. In contrast, building models with generic features, as shown in this study, could produce models suitable for supporting the diverse needs of higher education institutions and online educational systems, while allowing them to take full advantage of predictive analytics in applying interventions that help reduce students' failure rates.

Limitation

The models created in this research depend on data collected using self-reports. We acknowledge that self-report responses have both strengths (in terms of validity and reliability) and biases. Our data on study strategies considered only deep and surface strategies; strategic learning was not considered. Although we have shown that the models built in this research can predict study strategies and academic performance with good accuracy on the available data, we did not investigate whether deep learning or other ensemble approaches would further improve model performance. Future research could investigate these approaches and incorporate a broader range of predictors to provide a more comprehensive understanding of the factors influencing study strategies and academic performance.

Conclusion

We applied a supervised ML approach to understand the predictive ability of several psychological attributes with respect to student study strategies and academic performance. Specifically, we implemented five ML regression and classification models and compared their performance to determine the collective impact of the attributes on study strategies and academic performance. This study developed effective models that can predict students' study strategies and academic performance using generic attributes, which implies that the models can be used across a variety of higher education courses to determine whether a student will graduate. In contrast, some course-specific prediction models may be difficult to generalize to other courses. Based on the results of our models, motivational and other learner attributes provided good accuracy in predicting study strategies and academic performance. The best-performing regressor in this study (DT) has an MAE of 0.3777 in predicting academic performance, while the best-performing classifier (RF) has an F1 score of 0.949 (94.9%). The results imply that addressing varied student needs by incorporating into educational systems design strategies that improve the learning attributes investigated in this research will help students make better progress in their learning. Educational administrators can apply the models generated in this study to identify students' study strategies and students at risk of dropping out of a course or of higher education, in order to provide necessary support and interventions.

In order to enhance the effectiveness of future eLearning systems, designers should incorporate suitable motivational support by employing diverse design strategies and leveraging ML models within the applications. These measures will enable the provision of automatic adaptive support for students.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Natural Sciences and Engineering Research Council of Canada through the Vanier Canada Graduate Scholarship CGV-175722 and Discovery Grant program (grant number CGV-175722, RGPIN-2021-03521).

ORCID iD

Fidelia A. Orji

Author biographies

Fidelia A. Orji is currently pursuing a PhD degree with the Department of Computer Science, University of Saskatchewan. She is interested in solving problems associated with online educational systems by investigating learner-centred approaches for improving the systems to motivate learners in achieving the desired learning objectives. Her research involves designing and developing adaptive and persuasive systems to promote learners’ motivation and engagement to achieve desired learning objectives. She has over 7 years of industrial experience as a Software Engineer. She has published over 15 peer-reviewed research papers in this area and served as a reviewer for international conferences and journals. Her research area is at the intersection of Artificial Intelligence, Data Analytics, User Modelling, Adaptive Learning, Persuasive Technology, and Human-computer Interaction. She is also a Vanier graduate scholar.

Julita Vassileva BSc (1984), MSc (1986), PhD (1992), Bulgarian Academy of Sciences, Federal Armed Forces University Munich, University of Saskatchewan, Editor in Chief: Frontiers in AI - Human Learning and Behaviour Change, Co-Editor: PeerJ: Computer Science (HCI), Editorial Board: User Modeling and User-Adapted Interaction (Springer), ACM Transactions on Social Computing, International Journal of AI in Education (Springer). Distinguished Researcher Award (University of Saskatchewan, 2021), Distinguished Graduate Supervision Award (University of Saskatchewan, 2014), over 250 peer-reviewed papers, H-index: 55, Research Areas: personalization, user modelling, persuasive technology, AI in Education, social computing, trust and reputation mechanisms, ethical AI. Member of IEEE and ACM.

References

Areepattamannil, Freeman, J. G., & Klinger, D. A. (2011). Intrinsic motivation, extrinsic motivation, and academic achievement among Indian adolescents in Canada and India. Social Psychology of Education, 14(3), 427–439. https://doi.org/10.1007/s11218-011-9155-1

Aziz, A. A., Hafieza, & Ahmad (2014). First semester computer science students' academic performances analysis by using data mining classification algorithms. International Conference on Artificial Intelligence and Computer Science (AICS 2014), 100–109. http://worldconferences.net

Bandura (1977). Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review, 84(2), 191–215. https://doi.org/10.1037/0033-295X.84.2.191

Biggs, Kember, & Leung, D. Y. P. (2001). The revised two factor study process questionnaire: R-SPQ-2F. British Journal of Educational Psychology, 71(1), 133–149. https://doi.org/10.1348/000709901158433

Botchkarev (2019). A new typology design of performance metrics to measure errors in machine learning regression algorithms. Interdisciplinary Journal of Information, Knowledge, and Management, 14, 45–76. https://doi.org/10.28945/4184

Branco, Torgo, & Ribeiro, R. P. (2016). A survey of predictive modeling on imbalanced domains. ACM Computing Surveys, 49(2), 1–50. https://doi.org/10.1145/2907070

Chen, K. C., Jang, S. J., & Branch, R. M. (2010). Autonomy, affiliation, and ability: Relative salience of factors that influence online learner motivation and learning outcomes. Knowledge Management & E-Learning: An International Journal, 2(1), 30–50. https://doi.org/10.34105/j.kmel.2010.02.004

Chen, Y. Y., Mohd Taib, & Che Nordin, C. S. (2012). Determinants of student performance in advanced programming course. International Conference for Internet Technology and Secured Transactions, ICITST 2012, 304–307. https://ieeexplore.ieee.org/document/6470965

Domínguez, Martín, Martín-Albo, Núñez, J. L., & León (2010). Translation and validation of the Spanish version of the “echelle de satisfaction des besoins psychologiques” in the sports context. The Spanish Journal of Psychology, 13(2), 1010–1020. https://doi.org/10.1017/S1138741600002651

Du Boulay & Del Soldato (2016). Implementation of motivational tactics in tutoring systems: 20 years on. International Journal of Artificial Intelligence in Education, 26(1), 170–182. https://doi.org/10.1007/s40593-015-0052-1

Essa & Ayad (2012). Improving student success using predictive models and data visualisations. Research in Learning Technology, 20(SUPPL), 58–70. https://doi.org/10.3402/RLT.V20I0.19191

Feri, Soemantri, & Jusuf (2016). The relationship between autonomous motivation and autonomy support in medical students' academic achievement. International Journal of Medical Education, 7, 417–423. https://doi.org/10.5116/IJME.5843.1097

Gopalan, Bakar, J. A. A., Zulkifli, A. N., Alwi, & Mat, R. C. (2017). A review of the motivation theories in learning. AIP Conference Proceedings, 1891, 40002. https://doi.org/10.1063/1.5005376

Greer, Frost, Banow, Thompson, Kuleza, Wilson, & Koehn (2015). The student advice recommender agent: SARA. Proceedings of PALE 2015: Workshop on Personalization Approaches in Learning Environments in Conjunction with the International Conference User Modeling, Adaptation, and Personalization UMAP, 2015, p. 1388.

He, H., & Garcia, E. A. (2009). Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9), 1263–1284. https://doi.org/10.1109/TKDE.2008.239

Jamjoom, M. M., Alabdulkareem, E. A., Hadjouni, Karim, F. K., & Qarh, M. A. (2021). Early prediction for at-risk students in an introductory programming course based on student self-efficacy. Informatica, 45(6). https://doi.org/10.31449/INF.V45I6.3528

Japkowicz & Shah (2011). Evaluating learning algorithms: A classification perspective. Cambridge University Press. https://doi.org/10.1017/CBO9780511921803

Macfadyen, L. P., & Dawson (2010). Mining LMS data to develop an “early warning system” for educators: A proof of concept. Computers & Education, 54(2), 588–599. https://doi.org/10.1016/J.COMPEDU.2009.09.008

Marbouti, Diefes-Dux, H. A., & Madhavan (2016). Models for early prediction of at-risk students in a course using standards-based grading. Computers & Education, 103, 1–15. https://doi.org/10.1016/J.COMPEDU.2016.09.005

Orji & Vassileva (2020). Using machine learning to explore the relation between student engagement and student performance. Proceedings of the International Conference on Information Visualisation, 2020-September, 480–485. https://doi.org/10.1109/IV51561.2020.00083

Orji, F. A., & Vassileva (2022). Automatic modeling of student characteristics with interaction and physiological data using machine learning: A review. Frontiers in Artificial Intelligence, 5. https://doi.org/10.3389/frai.2022.1015660

Orsini, C. A., Binnie, V. I., & Tricio, J. A. (2018). Motivational profiles and their relationships with basic psychological needs, academic performance, study strategies, self-esteem, and vitality in dental students in Chile. Journal of Educational Evaluation for Health Professions, 15, 11. https://doi.org/10.3352/jeehp.2018.15.11

Rosenberg (2006). Rosenberg Self-Esteem Scale. Www.Apa.Org/Obesity-Guideline/Rosenberg-Self-Esteem.Pdf. https://doi.org/10.32388/bcazmm

Rosenberg (2015). Society and the adolescent self-image. Princeton University Press, 1989.

Ryan, R. M., & Deci, E. L. (2017). Self-determination theory: Basic psychological needs in motivation, development, and wellness. Guilford Press. https://doi.org/10.1521/978.14625/28806

Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. American Psychologist, 55(1), 68–78. https://doi.org/10.1037/0003-066X.55.1.68

Seiffert, Khoshgoftaar, T. M., Van Hulse, & Napolitano (2010). RUSBoost: A hybrid approach to alleviating class imbalance. IEEE Transactions on Systems, Man, and Cybernetics. Part A: Systems and Humans, 40(1), 185–197. https://doi.org/10.1109/TSMCA.2009.2029559

Sundar, P. V. P. (2013). A comparative study for predicting student's academic performance using Bayesian network classifiers. IOSR Journal of Engineering, 03(02), 37–42. https://doi.org/10.9790/3021-03213742

Tuckman, H. P. (1975). Teacher effectiveness and student performance. The Journal of Economic Education, 7(1), 34–39. https://doi.org/10.1080/00220485.1975.10845419

Vallerand, R. J., Pelletier, L. G., Blais, M. R., Brière, N. M., Senécal, C. B., & Vallières, É. F. (1993). Academic motivation scale (ams-c 28) college version. Educational and Psychological Measurement, 52(53), 1992–1993. https://doi.org/10.1037/t25718-000

Vallerand, R. J., Pelletier, L. G., Blais, M. R., Briere, N. M., Senecal, & Vallieres, E. F. (1992). The academic motivation scale: A measure of intrinsic, extrinsic, and amotivation in education. Educational and Psychological Measurement, 52(4), 1003–1017. https://doi.org/10.1177/0013164492052004025

Vansteenkiste, Simons, Lens, Sheldon, K. M., & Deci, E. L. (2004). Motivating learning, performance, and persistence: The synergistic effects of intrinsic goal contents and autonomy-supportive contexts. Journal of Personality and Social Psychology, 87(2), 246–260. https://doi.org/10.1037/0022-3514.87.2.246

Wu (2019). Academic motivation, engagement, and achievement among college students. College Student Journal, 53(1), 99–112.

Modeling the Impact of Motivation Factors on Students’ Study Strategies and Performance Using Machine Learning
