Sage Journals: Discover world-class research

Abstract

The Sustainable Development Goals (SDGs), established by the United Nations in 2015, offer a comprehensive framework for addressing major global challenges, including poverty, inequality and environmental degradation, with the overarching aim of achieving prosperity and well-being for all by 2030. Reliable predictions of SDG indicators are crucial for proactive policy-making and for optimizing resource allocation, ensuring that interventions are targeted to the areas of greatest need. This paper introduces a two-step process for constructing machine learning models to forecast SDG indicators. In the first step, we apply a shape-based clustering method to group countries with similar underlying characteristics, thereby forming more homogeneous clusters for analysis. In the second step, machine learning models based on XGBoost and long short-term memory (LSTM) networks are trained for each cluster, tailored to the specific characteristics of the countries within it. For comparison, models are also trained on the full, unclustered dataset. We apply this approach to SDG indicator 9.2.1, which tracks manufacturing value added per capita. Our results show that the cluster-specific machine learning models consistently outperform traditional time series forecasting methods such as ARIMA and Holt’s damped trend model, underscoring the potential of this method to improve the accuracy of SDG forecasting. Furthermore, we use the machine learning-based forecasts to conduct an outlook assessment of SDG 9.2.1, which reveals that the majority of countries remain significantly off track to meet the 2030 target, emphasizing the urgent need for more targeted and timely policy interventions.
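
The first step of the pipeline, grouping countries by the shape of their trajectories rather than by their level, can be illustrated with a minimal sketch. The abstract does not specify the distance measure or the clustering algorithm, so this sketch assumes one common shape-based combination: z-normalization of each series, dynamic time warping (DTW) as the distance, and k-medoids on the resulting distance matrix. The function names (`z_normalize`, `dtw_distance`, `k_medoids`) are hypothetical, and the country series are made-up illustrative values, not SDG 9.2.1 data.

```python
import math


def z_normalize(series):
    """Rescale a series to mean 0 and standard deviation 1 so clustering compares shape, not level."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n  # a flat series carries no shape information
    return [(x - mean) / std for x in series]


def dtw_distance(a, b, window=None):
    """Dynamic time warping distance, with an optional Sakoe-Chiba band of half-width `window`."""
    n, m = len(a), len(b)
    w = max(n, m) if window is None else max(window, abs(n - m))
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            step = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def k_medoids(dist, k, max_iter=100):
    """Alternating (Voronoi-iteration) k-medoids on a precomputed distance matrix."""
    n = len(dist)
    # Deterministic start: the most central series, then farthest-first picks.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(max(candidates, key=lambda i: min(dist[i][m] for m in medoids)))

    def assign(meds):
        # Each series joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties out
                new_medoids.append(medoids[c])
                continue
            # The new medoid minimizes total distance to the other members.
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(medoids), medoids


# Made-up annual series (illustrative only, not SDG 9.2.1 data).
countries = {
    "country_A": [100, 110, 121, 133, 146, 161],
    "country_B": [2000, 2150, 2300, 2480, 2650, 2840],
    "country_C": [500, 480, 455, 430, 400, 380],
    "country_D": [50, 47, 45, 42, 38, 35],
}
names = list(countries)
shapes = [z_normalize(countries[name]) for name in names]
dist = [[dtw_distance(a, b) for b in shapes] for a in shapes]
labels, medoids = k_medoids(dist, k=2)
clusters = dict(zip(names, labels))
print(clusters)
```

Because each series is z-normalized first, the two rising series (`country_A` and `country_B`) land in the same cluster despite a roughly twentyfold difference in level, and the two falling series form the other cluster. In the second step, a separate forecasting model (XGBoost or LSTM in the paper) would then be fit to each cluster's members.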

Keywords

forecasting; short panel data; clustering; machine learning; SDGs

Forecasting short panel data using shape-based clustering and machine learning: Insights from an SDG indicator
