Abstract
This study develops an optimized model for predicting Champions League qualification using machine learning and SHAP (SHapley Additive exPlanations). The data comprise 14 match statistics for 300 team-season observations from 15 English Premier League seasons, 2009–2010 through 2023–2024. Four algorithms were combined with six data balancing techniques, yielding 24 candidate models. The hyperparameters of the top-performing models were then tuned and the optimized models re-evaluated. Finally, SHAP was applied to assess variable importance in the optimized models. The results are as follows. First, the SMOTE XGBoost model performed best, followed by the SMOTE-Tomek Gradient Boosting Trees model. Second, the most important variable in the SMOTE XGBoost model was Shots OT pg (shots on target per game), followed by Six Yard Box. These results can help predict team performance in advance and improve operational efficiency, and can serve as foundational data for teams aiming for long-term success.
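To illustrate the data balancing step, the following is a minimal sketch of the interpolation idea behind SMOTE, the technique paired with the best-performing model. It is not the authors' implementation. The function name `smote`, the two-feature rows, and all numeric values are hypothetical, and the 4-versus-16 class split assumes that roughly four of twenty teams qualify in a season.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base row, excluding the row itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class (qualified teams); both features and all values are made up.
qualified = [(6.1, 0.9), (5.8, 1.1), (6.4, 1.0), (5.5, 0.8)]
not_qualified_count = 16  # assumed majority-class size for one season

new_rows = smote(qualified, n_new=not_qualified_count - len(qualified))
balanced_minority = qualified + new_rows
print(len(balanced_minority), "minority rows vs", not_qualified_count, "majority rows")
```

Each synthetic row lies on a line segment between two real minority rows, so oversampling adds variety without leaving the region the minority class already occupies. In a real pipeline, oversampling should be applied to the training split only, so that synthetic rows never leak into evaluation data.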
