A Bayesian optimized LSboost framework for energy usage intensity prediction in a university office building

Abstract

Accurately predicting Energy Use Intensity (EUI) has become increasingly important in efforts to enhance energy efficiency and sustainability in buildings. This study aims at comparing the performance of three machine learning approaches, namely, Baseline Ensemble, Auto Hyperparameter Optimized Ensemble and Bayesian Optimized Ensemble, using real world sensor data collected from four zones of a university building in Thailand via a Building Energy Management System. The models are tested on both un-normalized and min-max normalized datasets to examine how data preprocessing influences prediction accuracy, error reduction, and training efficiency. The results of the study demonstrate that normalization improves prediction precision by significantly reducing mean absolute error and mean squared error values, although it has a limited effect on R² values. Among the three approaches, the Bayesian Optimized model trained on normalized data provides the most accurate and stable results while maintaining reasonable training times. These results highlight the practical value of integrating normalization and automated tuning when designing building energy models. The proposed Least Squares Boosting (LSBoost) – Bayesian Optimization model in the study offers a reliable and adaptable tool for forecasting Energy Use Intensity, with potential applications in real-time control, diagnostics, and long-term energy planning.

Practical application

This study presents a robust, data-driven framework for accurately forecasting Energy Use Intensity (EUI) in real-world building operations using Bayesian-optimized LSBoost models. By integrating indoor sensor data, external weather variables, and advanced machine learning, the proposed method supports energy managers, building operators, and HVAC control engineers in enhancing predictive maintenance, operational efficiency, and real-time energy management. The approach is particularly suited for smart building systems and retrofitting strategies that require scalable, accurate, and resource-efficient energy modeling under variable occupancy and environmental conditions.

Keywords

Energy use intensity prediction, LSBoost, Bayesian optimization, building energy modeling, machine learning

Introduction

Research background and motivation

Building operations account for 30% of the world’s final energy consumption and 26% of global carbon emissions, which makes improving energy efficiency in buildings a key measure in addressing climate change.¹ Accordingly, the International Energy Agency (IEA) has recognized the building sector as one of the most economical areas for cutting energy use and has issued 25 policy recommendations aimed at improving energy efficiency and decreasing CO₂ emissions.² It is therefore no surprise that three of the 17 United Nations Sustainable Development Goals (SDGs), namely SDG 7: Affordable and Clean Energy, SDG 11: Sustainable Cities and Communities, and SDG 12: Responsible Consumption and Production, are directly related to the promotion of energy efficiency and sustainability in the building sector.^3,4 In the literature, predicting building energy consumption is presented as a fundamental strategy for conserving energy and supporting decisions that reduce overall energy use.^5,6

Building energy performance refers to the efficiency with which a structure consumes energy, and one of its most widely recognized indicators is Energy Use Intensity (EUI). Defined as the total annual energy consumption per unit of gross floor area (kWh/m²/year), EUI is widely adopted as a standardized metric for evaluating and comparing the energy efficiency of buildings across different types and sizes.⁷ EUI is not only an internal performance metric but also plays a significant role in ensuring regulatory compliance and achieving recognized sustainability standards. Increasingly mandated by jurisdictions and referenced in benchmarking systems, it serves as a foundational metric for retrofit planning, design optimization, and policy compliance.⁸ Accurate EUI data enable building owners and managers to benchmark performance, identify improvement areas, and evaluate the effectiveness of energy saving initiatives.^9–12 Predicting EUI involves accounting for a range of building specific and operational variables. Energy performance is influenced not only by heating, ventilation and air conditioning (HVAC), lighting, and plug loads, but also by broader environmental and behavioral factors, including user demographics, financial conditions, building design features, local climate, and operational energy characteristics, all of which interactively affect overall building energy consumption.¹³ Temperature in particular is recognized as a reliable predictor of HVAC demand across various building types, with heating and cooling loads correlating strongly with indoor-outdoor thermal differences.^14,15 Solar radiation and humidity further modulate thermal loads, the former significantly influencing lighting energy demand as well. Thus, their inclusion in data sets enhances predictive accuracy.^15–17

Energy performance is shaped by a complex interplay of physical, mechanical, meteorological, and behavioral systems.¹⁸ Occupancy, in particular, plays a pivotal role in influencing both thermal and nonthermal loads. Lighting and plug loads are especially sensitive to occupant presence and behavior. Studies suggest that occupancy schedules drive lighting usage patterns, while occupant driven activities account for a significant portion of plug load consumption in office buildings.^19,20 Additionally, the development of occupant related schedules contributes to modeling precision by aligning simulations with real world behavioral patterns.^21,22 However, modeling these effects remains challenging due to the unpredictable and interdependent nature of usage patterns.^21,23 Discrepancies between predicted and measured energy use persist due to factors such as inaccurate modeling assumptions, diversity of end use behaviors and variations in occupancy. Uncertainties in material properties and construction practices can also contribute to these discrepancies.^24,25 Moreover, energy models employed during the design phase often fail to reflect actual operational performance once the building is in use.²⁶ This gap between values underlines the importance of using multiple, data rich predictors and accounting for temporal patterns through daily load profiles and correlated environmental data.^17,19,20,23 Data variability and the lack of sufficiently detailed measurement data further complicate reliable modeling efforts.^27,28 In practice, measured energy consumption can differ substantially from predicted values, sometimes by several multiples.²⁶

To summarize, accurate prediction of EUI remains challenging due to the influence of various factors such as climatic variation, occupant behavior, building typologies, data availability, and the natural variability in building energy performance.²⁶ These challenges underline the limitations of conventional modeling techniques and emphasize the need for more advanced, data-driven approaches to bridge the building energy performance gap. In this context, interpretability and accountability of machine learning models are not only highly desirable but also increasingly demanded, as they enhance transparency and trust in data-driven decision-making within the field of building energy consumption.²⁹

Ciampi et al.³⁰ propose analyzing prediction techniques across three categories: conventional, artificial intelligence (AI)-based, and hybrid models. Their study emphasizes that while conventional models are easy to implement, their performance tends to decrease in the presence of nonlinear relationships. In contrast, AI-based models address such complexities more effectively, although at the cost of increased computational complexity and greater data requirements. Similarly, hybrid modeling approaches may achieve superior predictive performance but necessitate more sophisticated methodologies.^30–33 Given these tradeoffs, the characteristics of the available dataset and the compatibility between the data and the selected prediction method are critical factors in enhancing overall prediction accuracy. Selecting an appropriate method through a comparative analysis of multiple approaches can therefore provide a significant advantage in achieving more reliable predictions. These considerations are particularly relevant for the energy performance prediction of buildings, as traditional methods can be time consuming and require extensive manual analysis, whereas machine learning based models are distinguished by their ability to learn more complex relationships. Machine learning models (in Ciampi et al.’s classification, AI-based and hybrid models) thus offer a powerful alternative, enabling dynamic energy management through the efficient processing of large datasets.^34,35

This study investigates the prediction of Energy Use Intensity (EUI) using Regression Ensemble Trees trained with the Least-Squares Boosting (LSBoost) algorithm. It specifically explores the impact of three different optimization strategies—namely, non-optimized, auto hyperparameter optimized, and Bayesian optimized LSBoost ensembles—on model performance.

The primary contribution of this study lies in addressing a notable research gap: while ensemble methods and boosting algorithms have been increasingly applied to building energy prediction, the integration of Bayesian optimization with LSBoost for EUI prediction has not been systematically explored. By benchmarking this approach against traditional and automated hyperparameter tuning methods, this work contributes new empirical evidence on its performance advantages and generalization capabilities. Furthermore, the study offers insights into how feature normalization affects model performance when dealing with mixed-scale building datasets, providing guidance for future data-driven energy modeling frameworks.

To this end, the study is guided by the following research questions:

(1) Can Bayesian optimization enhance the predictive accuracy of LSBoost ensembles for EUI prediction?

(2) How does feature normalization influence model performance under different optimization schemes?

The investigation uses a publicly available dataset—CU-BEMS—containing energy consumption and environmental measurements from the Chamchuri five building at Chulalongkorn University in Bangkok, Thailand.³⁶ The findings provide insights into the impact of feature scaling and model optimization on EUI prediction, offering guidance for data-driven energy performance assessments in similar commercial building types.

Literature review

Machine learning (ML) is a widely adopted and powerful tool for prediction tasks across various domains, primarily due to its capacity to model complex, nonlinear relationships in data.^34,37,38 For building energy performance prediction, a range of ML algorithms are commonly used, including linear regression (LR), neural networks (NN), decision trees (DT), random forests (RF), K-nearest neighbors (KNN), gradient boosting (GB), and support vector regression (SVR).^38–43 Deep learning models such as recurrent neural network (RNN), long short-term memory (LSTM), gated recurrent unit (GRU) are also applied for energy forecasting purposes.^44–47

Traditional regression models, such as linear regression, require assumptions about the underlying data distribution, which limits their flexibility. In contrast, machine learning based regression approaches can learn more complex and nonlinear relationships without these constraints, making them more suitable for energy modeling tasks.⁴⁸ Each algorithm offers unique strengths: decision trees generate interpretable, rule based predictions by partitioning data based on informative features; neural networks, inspired by biological neurons, capture highly complex patterns through interconnected layers; and linear regression remains valuable when relationships among variables are predominantly linear. Among these approaches, tree-based methods, such as decision trees, random forests, and gradient boosting, are particularly notable for their robustness to heterogeneous data and their balance between predictive accuracy and interpretability. Numerous studies have confirmed their effectiveness in energy forecasting applications.^38–42

Ensemble learning is a modeling framework that integrates multiple individual models to produce superior predictions by utilizing their collective strengths.⁴⁹ It mitigates the weaknesses of individual models, reducing both variance and bias, and ultimately improving generalization to unseen data.⁵⁰ Ensemble methods are typically categorized into bagging, boosting, and stacking.^51,52 Bagging, exemplified by random forests, reduces variance by averaging predictions across multiple models trained on different data subsets. Boosting methods, such as Gradient Boosting, Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Least Squares Boosting (LSBoost), focus on sequentially correcting the errors of weak learners, thereby reducing bias. Stacking, on the other hand, combines the predictions of several base models by training a meta-learner to generate the final prediction. Each strategy employs a distinct mechanism for model aggregation, contributing to prediction accuracy, especially in complex tasks like building energy forecasting.^51,52

A prominent method within ensemble learning is the application of ensemble trees, particularly through bagging techniques. Ensemble Bagging Trees (EBT) enhance prediction stability by training several decision tree models on slightly varied versions of the original dataset, created through random sampling. Their individual predictions are then combined to produce a more reliable and balanced final output.^37,53 Compared to conventional decision tree methods such as Classification and Regression Trees (CART), EBT offers superior accuracy and reduced variance, making it well suited for applications requiring real-time fault detection and system monitoring.^37,54

Boosting algorithms such as Gradient Boosting Decision Trees (GBDT), XGBoost, and Categorical Boosting (CatBoost) have also been widely adopted to iteratively refine model performance by emphasizing errors in previous iterations.^53,55,56 Among these, LSBoost has been particularly applied to minimize squared error loss. Although LSBoost has shown promise, its specific applications in building energy prediction remain underexplored compared to more commonly used frameworks such as XGBoost and LightGBM.⁵³ LSBoost improves its performance by building each new model to learn from the mistakes of the ones before it, gradually making the predictions more accurate with each step. Its simplicity and effectiveness in handling regression tasks make it a suitable candidate for energy forecasting. Despite LSBoost’s potential, its performance can be sensitive to hyperparameter settings, including the number of learners, learning rate, and maximum tree depth. Therefore, hyperparameter optimization plays a critical role in achieving optimal model performance. Automated hyperparameter tuning methods such as Grid Search, Random Search, Genetic Algorithms (GA), and Particle Swarm Optimization (PSO) are commonly employed to reduce manual intervention while improving model generalization and accuracy.^38,43,53,54

Bayesian models rely on Bayes’ conditional probability theory to solve classification and regression problems. These include approaches such as Naïve Bayes, Gaussian Naïve Bayes, Bayesian Networks (BNs), and Bayesian Belief Networks. While these methods are widely used in other predictive domains, they remain relatively underutilized in energy consumption forecasting.³⁰ Notable applications of BNs include Huang et al.,⁵⁷ who applied BNs for cooling load prediction and observed comparable performance with reduced computational time compared to SVM and ANN; Soares Geraldi et al.,⁵⁸ who investigated energy prediction in schools with a focus on model design; and O’Neill et al.,⁵⁹ who explored building energy performance prediction while addressing discretization and uncertainty analysis. More recently, Ciampi et al.³⁰ used Bayesian Networks to forecast industrial HVAC energy consumption. These studies highlight the broader potential of Bayesian approaches for energy modeling, although their application is still limited compared to tree-based and boosting methods. Bayesian optimization, in particular, has emerged as a robust technique for tuning hyperparameters by balancing exploration and exploitation in the search space. It builds a probabilistic model of the objective function and uses it to select the most promising hyperparameters for evaluation. In the context of LSBoost, Bayesian optimization can significantly enhance performance by identifying optimal parameter combinations without exhaustive search.⁵³
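
The explore/exploit loop described above can be illustrated with a minimal, dependency-free sketch. Everything specific in it is an assumption made for illustration and is not taken from the study: a Gaussian process surrogate with a squared-exponential kernel, the expected improvement acquisition function, a single hyperparameter searched over a fixed grid, and a toy objective (`toy_validation_loss`) standing in for a cross-validated LSBoost loss.

```python
import math

def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel between two scalar inputs (assumed surrogate kernel)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def gp_posterior(xs, ys, x_new, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_star, solve(K, ys)))
    var = rbf(x_new, x_new) - sum(k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 1e-12))

def expected_improvement(mean, std, best):
    """Expected improvement over the best observed value (minimization)."""
    z = (best - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mean) * cdf + std * pdf

def bayes_opt(objective, candidates, initial, n_iter=8):
    """Evaluate the candidate with the highest expected improvement, n_iter times."""
    xs = list(initial)
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        pool = [c for c in candidates if c not in xs]
        if not pool:
            break
        mean_y = sum(ys) / len(ys)
        centered = [y - mean_y for y in ys]  # center so the zero-mean prior is sensible
        best = min(centered)

        def score(c):
            m, s = gp_posterior(xs, centered, c)
            return expected_improvement(m, s, best)

        nxt = max(pool, key=score)
        xs.append(nxt)
        ys.append(objective(nxt))
    i = min(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i], xs

def toy_validation_loss(learning_rate):
    """Hypothetical stand-in for a cross-validated loss; not from the study."""
    return (learning_rate - 0.3) ** 2 + 0.01

grid = [i / 20 for i in range(21)]
best_lr, best_loss, evaluated = bayes_opt(
    toy_validation_loss, grid, [grid[0], grid[10], grid[20]]
)
```

In practice the objective would train and cross-validate an ensemble for each proposed hyperparameter combination, which is why keeping the number of evaluations small matters more than the cost of the surrogate itself.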

Across the reviewed studies, ensemble learning consistently emerges as a promising approach for improving the prediction accuracy, robustness, and scalability of building energy models. While challenges such as increased computational complexity remain, the integration of ensemble methods with optimization algorithms provides a flexible and powerful framework for addressing the complexities of building energy data.^37,38,53–55

In summary, the choice of machine learning technique for building energy prediction should be informed by the complexity of the dataset, the need for interpretability, and the required prediction accuracy. Ensemble learning approaches, especially those incorporating LSBoost with advanced hyperparameter tuning techniques, offer a compelling solution for achieving more reliable and precise energy consumption predictions.

Methodology

Data characterization and temporal patterns

The dataset used in this study is the publicly available CU-BEMS dataset, which contains detailed operational data for a seven-story, 11,700 m² office building, the Chamchuri five building of Chulalongkorn University in Bangkok, Thailand. It includes power consumption measured separately for individual air conditioners, lighting loads, and plug loads (kW), together with indoor environmental measurements, namely temperature (°C), relative humidity (%), and ambient illuminance level (lux).⁶⁰ CU-BEMS is an advanced Building Energy Management System (BEMS) developed by the Smart Grid Research Unit team in partnership with the University of Tokyo and both local and international industry collaborators, and funded by Thailand’s Ministry of Energy through the Energy Conservation Fund.⁶¹ The entire dataset can be accessed through figshare.³⁶ The ML models were trained on the available Floor 7 data: lighting, plug, and air conditioning power consumption, measured illuminance levels, temperature, and relative humidity. Floor 7 was selected because it is among the floors with the least missing data and because, as the topmost floor, it is more exposed to weather conditions than the lower floors. In the original dataset, each floor is divided into zones, and Floor 7 has five; the zone configuration of each floor is explained in detail in (Pipattanasomporn et al. 2020a). Power consumption (kW) was recorded at one-minute intervals for a total of seven AC units (four in Zone 1 and one each in Zones 2, 4, and 5) and for the lighting and plug systems of the individual zones, while the four sensors installed in each zone recorded the aforementioned indoor conditions on Floor 7. Among the five zones, Zone 3 is a circulation area that includes the staircases; it contains no sensors and has therefore not been included in the scope of this study.

While the dataset covers the period from July 1, 2018 to December 31, 2019, the sensors measuring illuminance, temperature, and humidity underwent maintenance from September 15, 2018 to March 5, 2019. The data used in this study therefore run from March 6, 2019 to December 31, 2019. In addition, outdoor weather data were gathered from the Reliable Prognosis database⁶² for the same period. The weather station that was closest to Chulalongkorn University and had the highest frequency of data collection was selected for this purpose. A final addition to the data was made from the Photovoltaic Geographical Information System (PVGIS).⁶³ The total number of predictors is 51: 8 from the sine-cosine encoding of the time variables (month, day, hour, minute), 29 from the original consumption and indoor condition measurements for all zones of Floor 7, and 14 from the outdoor weather conditions in the Reliable Prognosis and PVGIS databases. Table 1 summarizes the complete list of variables used as model predictors and the target in the study.

Table 1.

Predictors and the target of the study.

Category	Variable	Unit	Role
Temporal (encoded)	Month (sin, cos)	—	Predictor
	Day (sin, cos)	—	Predictor
	Hour (sin, cos)	—	Predictor
	Minute (sin, cos)	—	Predictor
Building – Zone 1	Indoor air temperature	°C	Predictor
	Relative humidity	%	Predictor
	Ceiling illuminance level	lux	Predictor
	Individual AC loads (AC1-AC4)	kW	Predictor
	Lighting load	kW	Predictor
	Plug load	kW	Predictor
Building – Zone 2	Indoor air temperature	°C	Predictor
	Relative humidity	%	Predictor
	Ceiling illuminance level	lux	Predictor
	AC load	kW	Predictor
	Lighting load	kW	Predictor
	Plug load	kW	Predictor
Building – Zone 3	Lighting load	kW	Predictor
	Plug load	kW	Predictor
Building – Zone 4	Indoor air temperature	°C	Predictor
	Relative humidity	%	Predictor
	Ceiling illuminance level	lux	Predictor
	AC load	kW	Predictor
	Lighting load	kW	Predictor
	Plug load	kW	Predictor
Building – Zone 5	Indoor air temperature	°C	Predictor
	Relative humidity	%	Predictor
	Ceiling illuminance level	lux	Predictor
	AC load	kW	Predictor
	Lighting load	kW	Predictor
	Plug load	kW	Predictor
Outdoor weather	Outdoor temperature	°C	Predictor
	Atmospheric pressure at station level	atm	Predictor
	Atmospheric pressure reduced to sea level	atm	Predictor
	Relative humidity	%	Predictor
	Mean wind speed	m/s	Predictor
	Maximum gust value	m/s	Predictor
	Dew point temperature	°C	Predictor
Solar/irradiance data	Beam (direct) irradiance on the inclined plane	W/m²	Predictor
	Diffuse irradiance on the inclined plane	W/m²	Predictor
	Reflected irradiance on the inclined plane	W/m²	Predictor
	Sun height	degrees	Predictor
	2-m air temperature	°C	Predictor
	10-m total wind speed	m/s	Predictor
	Global horizontal irradiance (GHI)	W/m²	Predictor
Target variable	Energy use intensity (EUI)	kWh/m²	Target
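
The eight encoded temporal predictors in Table 1 can be illustrated with a short sketch. The periods used here (12 months, 31 days, 24 hours, 60 minutes) and the zero-based shift of month and day are assumptions for illustration; the study does not state the exact periods applied.

```python
import math

def cyclical_encode(value, period):
    """Map a periodic quantity onto the unit circle as a (sin, cos) pair."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def encode_timestamp(month, day, hour, minute):
    """Return the 8 temporal predictors for one timestamp.

    Assumed periods: 12 months, 31 days, 24 hours, 60 minutes.
    Month and day are shifted to start at 0 so each cycle begins at angle 0.
    """
    features = []
    for value, period in ((month - 1, 12), (day - 1, 31), (hour, 24), (minute, 60)):
        features.extend(cyclical_encode(value, period))
    return features

# 23:59 and 00:00 are one minute apart, and their minute encodings are
# neighbors on the circle, unlike the raw values 59 and 0.
late_evening = encode_timestamp(3, 21, 23, 59)
midnight = encode_timestamp(3, 22, 0, 0)
```

Using both sine and cosine is what makes the mapping unambiguous: either component alone assigns the same value to two different points of the cycle.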

As an initial step, the data were analyzed to understand the general nature of the inputs. The raw sensor data are recorded as power measurements (kW) at one-minute resolution. As the building is a university office building, the occupancy profile is quite distinct: the building is used extensively on weekdays, sometimes on Saturdays, and almost never on Sundays. Weekdays exhibit clear peaks during typical working hours, while Saturday and Sunday profiles show reduced and more variable usage, particularly for plug loads and lighting. This separation offers a more detailed view of occupancy driven loads and supports the inclusion of temporal features in model development. Because annual EUI represents cumulative energy consumption, it may mask temporal variations in building performance, such as those associated with occupancy patterns or seasonal heating and cooling demands. As a result, complementary EUI metrics evaluated at shorter time scales, ranging from monthly and weekly to daily or hourly, are increasingly recognized as valuable for real-time energy performance assessment.⁶⁴ Therefore, for a better understanding of the utilization profile, the air conditioning (AC), lighting, and plug power consumption values, as well as the minute based incremental EUI contributions illustrating diurnal electricity consumption patterns, have been plotted for the equinox and solstice days of 2019 as representative days of the seasonal profiles. The corresponding plots are given in Figures 1–4 for March 21, June 21, September 23, and December 20, respectively. As December 21 was a Saturday, December 20, which is a workday, has been plotted instead so that the seasonal changes can be compared more clearly. While the analysis included data from both workdays and weekends, only weekdays are presented in this manuscript due to space constraints.

Figure 1.

Power consumption for (a) AC; (b) lighting; (c) plug; and (d) incremental EUI on March 21, 2019.

Figure 2.

Power consumption for (a) AC; (b) lighting; (c) plug; and (d) incremental EUI on June 21, 2019.

Figure 3.

Power consumption for (a) AC; (b) lighting; (c) plug; and (d) incremental EUI on September 23, 2019.

Figure 4.

Power consumption for (a) AC; (b) lighting; (c) plug; and (d) incremental EUI on December 20, 2019.

As can be seen from the figures, the three major end-use loads of the building follow patterns that differ clearly from one another but are similar across the different dates of the year. Judging by the AC consumption, daily utilization regularly begins at approximately 8:00 AM, reaches full capacity during regular work hours, drops to nearly zero during lunchtime, and ends around 5:00 PM. However, the lighting consumption shows that the building is actually occupied from much earlier in the day until late in the evening, from almost 5:00 AM until 9:00 PM. Although air conditioning, lighting, and plug loads exhibit distinct consumption profiles, each category demonstrates a consistent internal pattern across seasonally representative days throughout the year. This indicates that, despite their structural differences, each end use maintains seasonal consistency in its operation, while remaining clearly distinguishable from the others.

The highest power demand, as would be expected, is from the AC units, with Zone 1 making the largest contribution, reaching approximately 40 kW during the year, followed by Zones 2 and 4. The lowest AC power demand is in Zone 5, which is reasonable as this zone is the smallest and has the fewest windows, and is therefore the least exposed to sunlight during the day. The largest contributor to the lighting load is Zone 5, probably for the same reason: the fewest windows and the smallest façade area translate into the least daylight on the work plane, resulting in higher use of lighting even though the zone’s area is smaller than that of the other zones of Floor 7. For lighting, Zone 4 closely follows Zone 5, followed by Zone 2, and the lowest power demand is in Zone 1, the zone with the largest façade area. Finally, for plug loads, it is difficult to make a clear distinction between the zones, as their profiles are intertwined and show seasonal differences, as seen in Figures 1(c), 2(c), 3(c) and 4(c).

Given that AC systems constitute the largest portion of the total energy consumption, the EUI profile across the day is strongly correlated with the AC load. For example, on March 21 both the AC consumption profile and the corresponding minute based EUI values (Figure 1(a) and (d)) show parallel fluctuations, with similar highs and lows across the daily cycle. However, a closer examination of zonal differences shows that the lighting and plug loads also play a significant role in shaping the EUI by increasing its amplitude. For instance, while Figure 1(a) shows that Zone 1 has the highest total AC load on March 21, the EUI is highest in Zone 4 (Figure 1(d)), followed closely by Zone 5, the two zones with the highest lighting load (Figure 1(b)). The lighting load is not the only contributor to the amplitude: Figure 1(b) and (c), which present the lighting and plug loads on March 21, respectively, clearly indicate that Zones 4 and 5 exhibit markedly higher values in both categories on the same day. These findings underscore the need to account for diverse end use contributions when interpreting normalized energy metrics like EUI, particularly in mixed use or variably occupied buildings. These observations are further taken into account in the machine learning stage, through the comparison of normalized and un-normalized data.

The machine learning approach

In this study, the predictive capabilities of the Regression Ensemble Trees framework are explored through the application of the LSBoost algorithm for EUI forecasting. LSBoost has been particularly chosen due to its balance between predictive accuracy, interpretability, and computational efficiency, making it a practical choice for modeling EUI under real world data constraints. Three modeling strategies are considered: a baseline LSBoost model without hyperparameter optimization, an LSBoost model with automated hyperparameter tuning, and an LSBoost model with hyperparameter tuning using Bayesian techniques.

During the data preprocessing stage, which was carried out using MATLAB, the first step involved computing the EUI value. The original dataset included individual power consumption values for air conditioning, lighting, and plug loads; these were aggregated and normalized to generate the final EUI value used as the target variable. The original time series data were then decomposed into year, month, day, hour, and minute components. To preserve the cyclical nature of temporal features, these attributes were transformed using sine and cosine encoding, enabling the model to capture periodic consumption patterns effectively.
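
As a rough illustration of how one-minute power readings can be turned into an EUI value, the sketch below converts power (kW) to energy (kWh) over the sampling interval and divides by floor area. The exact aggregation and the floor areas used in the study are not given in the text, so the `floor_area_m2` argument and the example numbers are hypothetical.

```python
def incremental_eui(ac_kw, lighting_kw, plug_kw, floor_area_m2, minutes_per_sample=1.0):
    """EUI contribution of one sample in kWh/m².

    Power (kW) held over one sampling interval is converted to energy (kWh)
    and divided by the floor area served (an assumed normalization).
    """
    total_kw = ac_kw + lighting_kw + plug_kw
    energy_kwh = total_kw * minutes_per_sample / 60.0
    return energy_kwh / floor_area_m2

def cumulative_eui(samples, floor_area_m2, minutes_per_sample=1.0):
    """Sum the per-sample contributions of (ac_kw, lighting_kw, plug_kw) tuples."""
    return sum(
        incremental_eui(ac, light, plug, floor_area_m2, minutes_per_sample)
        for ac, light, plug in samples
    )

# Hypothetical zone: 60 kW total load held for one hour over 100 m².
one_hour = [(30.0, 6.0, 24.0)] * 60
hourly_eui = cumulative_eui(one_hour, floor_area_m2=100.0)
```

Summing minute-level contributions in this way yields the shorter-time-scale EUI values discussed earlier, and the same sum over a full year recovers the conventional annual figure.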

The final dataset was assembled by integrating time encoded variables, indoor sensor data, and external meteorological/irradiance inputs, followed by missing value imputation via nearest neighbor interpolation. While the data of Floor 7 had one of the lowest amounts of missing data, an intervention was still necessary to complete the dataset. This comprehensive preprocessing ensured consistency and readiness of the input data for subsequent training.
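
Nearest neighbor imputation of a single sensor series can be sketched as follows. The handling of ties (a gap sample equally distant from an earlier and a later valid reading) is an assumption here; the study does not describe how such cases were resolved.

```python
import math

def fill_nearest(series):
    """Replace missing samples (None or NaN) with the nearest valid sample.

    Ties between an earlier and a later neighbor go to the earlier one;
    that tie rule is an assumption of this sketch.
    """
    def is_missing(x):
        return x is None or (isinstance(x, float) and math.isnan(x))

    valid = [i for i, x in enumerate(series) if not is_missing(x)]
    if not valid:
        raise ValueError("series has no valid samples to copy from")

    filled = list(series)
    for i, x in enumerate(series):
        if is_missing(x):
            # Closest valid index by distance, then by position for ties.
            nearest = min(valid, key=lambda j: (abs(j - i), j))
            filled[i] = series[nearest]
    return filled

temperature = [25.1, None, None, 26.0, float("nan")]
completed = fill_nearest(temperature)
```

This version scans all valid indices for every gap, which is fine for a sketch but would be replaced by a single forward and backward pass for a series with hundreds of thousands of samples.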

The regression model was trained using Least-Squares Boosting (LSBoost), an ensemble approach that integrates multiple shallow decision trees in sequence to gradually reduce squared prediction errors. To ensure strong generalization performance and minimize potential bias caused by data partitioning, a 10-fold cross-validation scheme was applied to the dataset with 432,501 observations and 51 predictors. During validation, zone-specific predictions were generated through fold-wise inference on unseen data, allowing for a comprehensive and reliable assessment of model accuracy. The generalization metrics, expressed as the Coefficient of Determination (R²), the Mean Squared Error (MSE) and the Mean Absolute Error (MAE) across learning cycles, were computed to ensure convergence. To ensure the reliability of the trained models, prediction performance plots were examined. These visual diagnostics showed a consistent alignment between predicted and observed values across all validation folds. Overall, the ensemble model demonstrated stable and accurate generalization performance, supporting its suitability for practical applications in energy forecasting of real-world buildings.
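
The sequential error-correcting idea behind least-squares boosting can be sketched in a few lines. This is a simplified illustration, not the implementation used in the study: each weak learner is a single-split stump rather than a deeper regression tree, and the names `fit_lsboost`, `n_learners`, and `learning_rate` are chosen for this example.

```python
def fit_stump(X, residuals):
    """Find the single-split stump that minimizes squared error on the residuals."""
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[f] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[f] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, f, threshold, left_mean, right_mean)
    if best is None:  # every feature is constant: fall back to the mean
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]

def predict_stump(stump, row):
    feature, threshold, left_mean, right_mean = stump
    return left_mean if row[feature] <= threshold else right_mean

def fit_lsboost(X, y, n_learners=50, learning_rate=0.1):
    """Fit each new stump to the residuals left by the ensemble so far."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_learners):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, row)
                       for p, row in zip(predictions, X)]
    return base, learning_rate, stumps

def predict_lsboost(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)

# Toy data: a step in the target at x = 5.
X_toy = [[float(x)] for x in range(10)]
y_toy = [0.0 if x < 5 else 10.0 for x in range(10)]
model = fit_lsboost(X_toy, y_toy)
```

The number of learners, the learning rate, and the tree depth (fixed at one split here) are the hyperparameters that the automated and Bayesian tuning strategies would search over.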

The CU-BEMS data are un-normalized values collected directly from the sensors within the building. Because gradient boosted regression trees were chosen as the main machine learning method for this study, the initial approach was to work with the un-normalized data: decision tree models are generally considered robust and typically do not require extensive normalization. However, closer analysis of this specific dataset showed that the predictors selected for the study differ significantly in amplitude. As explained earlier, although the target variable, EUI, tends to follow a pattern similar to that of AC consumption, its overall amplitude is influenced collectively by all consumption predictors. It was therefore hypothesized that normalization might help ensure a balanced contribution from each predictor. Based on this reasoning, the three models were developed for each zone using both un-normalized and normalized predictor values. The normalized values were obtained by applying min-max scaling to all numeric input features, giving uniform ranges across the predictors. Each model was trained and validated in MATLAB on both the un-normalized and normalized datasets to assess the impact of normalization on predictive accuracy and computational efficiency.
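Min-max scaling of one predictor column can be sketched as follows in Python. The function names are illustrative, and the handling of a constant column (mapped to zeros) is an assumption, since the source does not describe that case.

```python
def minmax_fit(column):
    """Return the (min, max) pair that defines the scaling of one predictor."""
    return min(column), max(column)

def minmax_transform(column, lo, hi):
    """Scale values so that lo maps to 0 and hi maps to 1."""
    span = hi - lo
    if span == 0:
        # Constant column: assumed to map to 0 to avoid division by zero.
        return [0.0 for _ in column]
    return [(v - lo) / span for v in column]

# Example: a hypothetical power column and a hypothetical temperature column
# end up on the same [0, 1] range despite very different raw amplitudes.
power_kw = [10.0, 20.0, 30.0]
temperature_c = [24.0, 25.0, 26.0]
scaled_power = minmax_transform(power_kw, *minmax_fit(power_kw))
scaled_temp = minmax_transform(temperature_c, *minmax_fit(temperature_c))
```

One design choice worth noting: if the minimum and maximum are taken from the training folds only and then applied to the held-out fold, validation values can fall slightly outside [0, 1], but no information leaks from the validation data into preprocessing.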

Results

Table 2 gives the R², MSE, and MAE values as well as the training durations obtained in this study, where PN stands for Post-Normalization. Figure 5 shows the observed vs. predicted values for all zones. As can be seen from the table, training times for the Baseline Ensemble models were not recorded, as their computation completed almost instantaneously. Because these models have minimal complexity and no optimization routine, their training durations were negligible and not deemed meaningful for comparative analysis.
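For reference, the three accuracy metrics reported in Table 2 follow their standard definitions, which can be written as a short Python sketch (function names are illustrative; the study computed them in MATLAB).

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

R² is dimensionless, whereas MAE is expressed in the units of the target variable and MSE in those units squared.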

Table 2.

R², MSE, MAE and training duration values for all zones using Baseline Ensemble, Auto Hyperparameter Optimization and Bayesian Optimization on un-normalized and normalized (PN) data.

Zone	Model	R²	R² (PN)	MSE	MSE (PN)	MAE	MAE (PN)	Time (s)	Time (PN) (s)
1	Baseline ensemble	0.95420	0.95390	0.0103000	0.0054700	0.047100	0.034400	—	—
	Auto optimization	0.99090	0.97570	0.0020500	0.0028800	0.017300	0.026400	24506.00	282.00
	Bayesian optimization	0.98890	0.99140	0.0025000	0.0010200	0.017000	0.011300	18339.00	1059.00
2	Baseline ensemble	0.96699	0.96699	0.0068700	0.0013300	0.041580	0.018310	—	—
	Auto optimization	0.99472	0.98589	0.0011000	0.0005700	0.013060	0.008430	1105.00	546.00
	Bayesian optimization	0.99474	0.96923	0.0010900	0.0012400	0.013310	0.015080	1182.00	478.00
4	Baseline ensemble	0.96040	0.96056	0.0100270	0.0016912	0.045625	0.018879	—	—
	Auto optimization	0.99121	0.99047	0.0022240	0.0004088	0.019773	0.008428	1121.86	1121.86
	Bayesian optimization	0.99187	0.99200	0.0020580	0.0003433	0.019180	0.007925	1093.00	1147.00
5	Baseline ensemble	0.89593	0.89576	0.0144490	0.0309710	0.064772	0.003300	—	—
	Auto optimization	0.95329	0.95240	0.0064855	0.0014894	0.037972	0.017992	24506.00	24506.00
	Bayesian optimization	0.95256	0.95366	0.0065863	0.0014501	0.038326	0.018008	1151.00	1143.00

Figure 5.

Observed vs. predicted values for all zones using Baseline Ensemble (BE), Auto Hyperparameter Optimization (AutoHO) and Bayesian Optimization (BO) on un-normalized (Un ND) and normalized data (ND).

Across all zones, the Baseline Ensemble model showed minimal changes in R² after normalization, consistent with the scale invariance generally expected of tree-based models. However, MAE and MSE decreased substantially in most cases, suggesting that normalization may reduce prediction noise without affecting overall fit. This trend was most notable in Zone 4 for both MSE and MAE, and in Zone 5 for MAE, whereas the MSE for Zone 5 increased. The advantage of normalization was improved prediction precision without added model complexity, while its limitation was that the model did not benefit in terms of explained variance (R²).

With Auto Hyperparameter Optimization, normalization substantially reduced computation time in Zones 1 and 2 (for example, from 24,506 s to 282 s in Zone 1), plausibly by smoothing the hyperparameter search space, but it was also accompanied by a moderate drop in R² in all zones. This indicates that, although the optimizer converged faster on the homogenized input scales, it occasionally settled on parameter sets that were not optimal for achieving the highest accuracy. On the other hand, normalized data led to marked reductions in MAE and MSE, especially in Zones 4 and 5, although both errors rose slightly in Zone 1. The advantages of normalization for Auto Hyperparameter Optimization were therefore shorter or unchanged training times (the longest un-normalized runs took approximately 6.8 h) and, in most zones, improved error metrics.

The final approach, Bayesian Optimization, achieved the best overall balance of high accuracy and low error, especially when trained on normalized data. It yielded the lowest MSE and MAE in most zones, and R² either remained high or improved slightly (e.g., Zone 5: R² increased from 0.9526 to 0.9537); Zone 2 was the exception, with R² falling from 0.99474 to 0.96923 after normalization. For this method, normalization offered superior predictive performance with stable or improved R² in most zones, while its main limitation is the computational cost implied by its training times. Nevertheless, this method appears to be the most suitable for applications demanding both accuracy and efficiency.
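To make the mechanism of Bayesian optimization concrete, the following Python sketch tunes a single hyperparameter scaled to [0, 1] with a Gaussian-process surrogate and the expected-improvement criterion. It is a generic textbook-style illustration, not the study's MATLAB procedure: the kernel, its length scale, the jitter term, the candidate grid, and all function names are assumptions, and `objective` stands in for a cross-validated loss.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel with unit variance (assumed length scale)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def gp_posterior(xs, zs, xq, jitter=1e-4):
    """Zero-mean GP posterior mean and standard deviation at query point xq."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    kq = [rbf(a, xq) for a in xs]
    w = solve(K, kq)  # K^-1 k(xq); K is symmetric
    mu = sum(wi * zi for wi, zi in zip(w, zs))
    var = max(1.0 - sum(wi * ki for wi, ki in zip(w, kq)), 1e-12)
    return mu, math.sqrt(var)

def expected_improvement(mu, sigma, best):
    """Expected improvement for a minimization problem."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf

def bayes_opt(objective, n_init=3, n_iter=12, grid_size=101):
    """Minimize objective on [0, 1]; return the best (x, y) evaluated."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    step = (grid_size - 1) // (n_init - 1)
    xs = [grid[i * step] for i in range(n_init)]  # evenly spaced start
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        mean = sum(ys) / len(ys)
        std = math.sqrt(sum((y - mean) ** 2 for y in ys) / len(ys)) or 1.0
        zs = [(y - mean) / std for y in ys]  # standardize observed losses
        best = min(zs)
        candidates = [g for g in grid if g not in xs]

        def ei(g):
            mu, sigma = gp_posterior(xs, zs, g)
            return expected_improvement(mu, sigma, best)

        x_next = max(candidates, key=ei)
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]
```

A call such as `bayes_opt(lambda x: (x - 0.3) ** 2)` evaluates the objective 15 times in total. The practical point is that each new trial is chosen using everything learned from earlier trials, which matters when a single evaluation means training and cross-validating an ensemble.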

Discussion

An overall look at the findings shows that normalization reduced MAE and MSE across all models, with the most significant improvements observed for Bayesian Optimization. This supports its utility in stabilizing the learning process and enhancing model generalization. Bayesian Optimization emerged as the most effective approach overall, offering a favorable tradeoff between accuracy, training time, and robustness. While normalization improved the computational efficiency of Auto Optimization, it sometimes led to reduced predictive accuracy. Adjusting parameter search bounds or applying adaptive rescaling techniques could help address this issue. Finally, the Baseline Ensemble is a suitable low-complexity alternative, especially where interpretability and moderate precision are sufficient. The findings suggest that in real-time or embedded systems (e.g., HVAC and lighting control, building automation), normalized data combined with Bayesian Optimization provides a robust solution that balances performance and resource efficiency.

With the Baseline Ensemble model, as expected for a tree-based algorithm, R² remained nearly constant after normalization, underscoring its scale invariance. However, notable improvements were observed in MSE and MAE. In Zone 4, the MSE dropped from 0.010027 to 0.0016912, an almost six-fold reduction, while the MAE dropped from 0.045625 to 0.018879. In Zone 5, on the other hand, the MSE increased from 0.014449 to 0.030971 while the MAE showed a clear drop from 0.064772 to 0.0033. Normalization thus proved beneficial in all zones except Zone 5. This outcome was somewhat unexpected, as normalization is typically considered unnecessary for tree-based models. These findings suggest that although decision trees are theoretically insensitive to the scale of input features, normalization can still enhance their performance by promoting more balanced decision boundaries and reducing the impact of extreme prediction errors.

With the second approach, Auto Hyperparameter Optimization, normalizing the data reduced training times considerably in some zones by making it easier for the algorithm to navigate the range of possible hyperparameter values. This led to faster convergence and significantly lower MSE and MAE, with reductions of roughly 50% in several cases (e.g., the MSE for Zone 2 dropped from 0.00110 to 0.00057, and the MAE for Zone 5 dropped from 0.037972 to 0.017992), but R² values also declined moderately in some zones. This tradeoff indicates that hyperparameter configurations optimized for un-normalized data may no longer be optimal when normalized data are used.
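The relative changes quoted in this discussion can be checked with a few lines of arithmetic. The Python sketch below uses values copied from Table 2; the helper names are illustrative.

```python
def relative_reduction(before, after):
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100.0

def fold_change(before, after):
    """How many times smaller `after` is than `before`."""
    return before / after

# Auto Optimization, Zone 2, MSE: 0.00110 -> 0.00057
zone2_mse_drop = relative_reduction(0.00110, 0.00057)    # about 48.2 %

# Auto Optimization, Zone 5, MAE: 0.037972 -> 0.017992
zone5_mae_drop = relative_reduction(0.037972, 0.017992)  # about 52.6 %

# Baseline Ensemble, Zone 4, MSE: 0.010027 -> 0.0016912
zone4_mse_fold = fold_change(0.010027, 0.0016912)        # about 5.9-fold
```

Expressing each change relative to its un-normalized value makes results comparable across zones whose absolute error levels differ.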

The third approach, Bayesian Optimization, generally outperformed the previous two, especially when trained on normalized data. It achieved the lowest MAE and MSE in most zones, with stable or slightly improved R² values (e.g., in Zone 5, R² increased from 0.9526 to 0.9537). Its training time of approximately 20 min per run remains acceptable for applications requiring both accuracy and robustness.

To summarize, among the three modeling strategies, Bayesian Optimization delivered the most accurate and stable predictions overall, with notably low MAE and MSE values. Its performance on normalized data supports its suitability for applications requiring both high precision and acceptable computational cost. Auto Hyperparameter Optimization, while beneficial in reducing training time, occasionally showed slight declines in R², indicating sensitivity to feature scaling in parameter selection. The Baseline Ensemble model, despite its simplicity, also proved to be a robust low-complexity option. The results underline that model selection should be guided by a balance between prediction performance, computational demand, and the operational needs of the application.

Conclusion

This study investigates the predictive performance of three Boosted Regression Tree models, Baseline Ensemble, Auto Hyperparameter Optimized, and Bayesian Optimized, for estimating EUI in four distinct building zones of a university building. The study contributes to the literature by highlighting the practical benefits of Bayesian optimization in improving EUI prediction accuracy, a technique that has so far seen limited application in energy modeling frameworks. The models were tested using both un-normalized and min-max normalized datasets, allowing for a comparative analysis of data preprocessing strategies and modeling approaches in terms of accuracy, error reduction, and computational efficiency. Across all zones and models, normalization consistently resulted in substantial improvements in error metrics such as MAE and MSE, while changes in R² values remained minimal. This indicates that while normalization does not always affect the proportion of explained variance, it significantly enhances the precision of predictions. These findings directly respond to the two research questions posed in the introduction, confirming the value of Bayesian optimization and normalization in improving EUI prediction performance.

Looking at the results of the study, several recommendations and contributions can be listed:

(1) To the best of our knowledge, this study represents a unique example of estimating EUI using a boosted regression tree approach, a technique that has rarely been applied in this context.

(2) Normalization is recommended when feature scales differ by a factor of 2-3 or more, especially when input features include both continuous data and time-based features (e.g., sine-cosine encoding).

(3) For real time systems such as AC or lighting control, the use of Bayesian Optimization on normalized data presents the best balance between predictive performance and computational demand.

(4) Auto Optimization offers a valuable middle ground but requires careful consideration of hyperparameter sensitivity post normalization, particularly for high dimensional datasets or when computational resources are limited.

(5) Baseline Ensemble is an effective low-complexity model, suitable for interpretable, lightweight applications, though it may underperform in tasks where high accuracy is needed.

To conclude, normalization should not be treated solely as a preprocessing step but as a structural design choice influencing model performance. Its interaction with algorithm behavior is complex, particularly for models sensitive to hyperparameter settings. As such, data normalization, model architecture, and optimization strategies must be considered jointly in EUI forecasting frameworks. This integrative approach establishes the foundation for more reliable energy modeling systems, with direct implications for smart building operations, energy management platforms, and sustainability driven planning. The proposed LSBoost-Bayesian method presents a practical and adaptable approach for improving EUI predictions based on real world building data.

In addition to its methodological contributions, the study also highlights several ways in which these models can be applied in real-world building management scenarios. The proposed ML framework offers building managers a practical, data-driven alternative to conventional energy modeling methods. Unlike traditional physics-based models, which often require detailed knowledge of building materials, systems, and occupancy schedules, the LSBoost models developed in this study operate directly on existing sensor data and external weather inputs, resources typically available through modern Building Management System (BMS) infrastructures.

For example, the Bayesian-optimized LSBoost model can be integrated into real-time energy dashboards to forecast daily or hourly EUI trends, enabling facility managers to detect anomalies such as unexpected energy spikes due to faulty equipment or irregular occupancy patterns. Zone based EUI predictions also help identify areas of high energy intensity, supporting targeted actions like schedule adjustments, occupancy-aware controls, or maintenance planning. Additionally, the framework allows for faster performance analysis, which is particularly valuable in operational settings that require timely diagnostics and decision making. While not a replacement for physics-based simulation tools, the proposed models can serve as complementary tools by offering empirical baselines that help prioritize deeper investigations or control interventions.

Footnotes

Lale Erdem Atılgan

Duygu Bayram Kara

Ethical considerations

Ethical approval is not required for this manuscript.

Author contributions

Lale Erdem Atılgan: Conceptualization, Methodology, Writing - Original Draft, Visualization, Supervision.

Duygu Bayram Kara: Methodology, Software, Writing - Original Draft.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

This study used the CU-BEMS dataset, which is publicly available at https://sgrudata.github.io/. Additional weather data were sourced from the Reliable Prognosis website and are available at https://rp5.ru/Weather_archive_in_Bangkok,_Suvarnabhumi_(airport). Solar irradiance data were obtained from the Joint Research Centre's Photovoltaic Geographical Information System (PVGIS), as listed in the references.

References

1. International Energy Agency. Developing a global energy efficiency workforce in the buildings sector. IEA, 2024. https://www.iea.org/reports/developing-a-global-energy-efficiency-workforce-in-the-buildings-sector

2. Park, Yoon, et al. Analysis of a building energy efficiency certification system in Korea. Sustainability 2015; 7: 16086–16107. https://doi.org/10.3390/su71215804

3. United Nations Department of Economic and Social Affairs. Sustainable development goals. Sustainable Development Goals. https://sdgs.un.org/goals

4. Taheri, Norouzijajarm, Athar. Technical, economic, and environmental assessment of energy audit and optimization in the mechanical room of an office building. Energy Sustain Dev 2025; 87: 101717. https://doi.org/10.1016/j.esd.2025.101717

5. Olu-Ajayi, Alaka, Sulaimon, et al. Building energy consumption prediction for residential buildings using deep learning and other machine learning techniques. J Build Eng 2022; 45: 103406. https://doi.org/10.1016/j.jobe.2021.103406

6. Ali, Bano, Shamsi, et al. Urban building energy performance prediction and retrofit analysis using data-driven machine learning approach. Energy Build 2024; 303: 113768. https://doi.org/10.1016/j.enbuild.2023.113768

7. Richman. A high level method to disaggregate electricity for cluster-metered buildings. Energy Build 2016; 111: 351–368. https://doi.org/10.1016/j.enbuild.2015.11.019

8. Samadi, Fattahi. Energy use intensity disaggregation in institutional buildings – a data analytics approach. Energy Build 2021; 235: 110730. https://doi.org/10.1016/j.enbuild.2021.110730

9. Jayakeerti, Nakkeeran, Aravindh, et al. Predicting an energy use intensity and cost of residential energy-efficient buildings using various parameters: ANN analysis. Asian J Civ Eng 2023; 24: 3345–3361. https://doi.org/10.1007/s42107-023-00717-y

10. Feng, Chen, Yang, et al. The impact of building morphology on energy use intensity of high-rise residential clusters: a case study of Hangzhou, China. Build 2024; 14: 2245. https://doi.org/10.3390/buildings14072245

11. Lee, Lim, Hwang, et al. Development of building benchmarking index for improving gross-floor-area-based energy use intensity. Energy Build 2025; 328: 115103. https://doi.org/10.1016/j.enbuild.2024.115103

12. Veiga, Veloso, Melo, et al. Application of machine learning to estimate building energy use intensities. Energy Build 2021; 249: 111219. https://doi.org/10.1016/j.enbuild.2021.111219

13. Chaturvedi, Kumar, Toftum, et al. Exploring influential parameters affecting residential building energy use: advancing energy efficiency through machine learning. Clean Technol Environ Policy 2025; 27: 6975–6996. https://doi.org/10.1007/s10098-025-03301-x

14. Hitchin. Heating and cooling load frequency distributions implied by monthly building energy use models. Build Serv Eng Res Technol 2012; 35: 53–68. https://doi.org/10.1177/0143624412467195

15. Durmayaz, Kadoǧlu, Şen. An application of the degree-hours method to estimate the residential heating energy requirement and fuel consumption in Istanbul. Energy 2000; 25: 1245–1256. https://doi.org/10.1016/s0360-5442(00)00040-2

16. Meng, Mourshed, Wei. Going beyond the mean: distributional degree-day base temperatures for building energy analytics using change point quantile regression. IEEE Access 2018; 6: 39532–39540. https://doi.org/10.1109/access.2018.2852478

17. Lindelöf. Bayesian estimation of a building’s base temperature for the calculation of heating degree-days. Energy Build 2017; 134: 154–161. https://doi.org/10.1016/j.enbuild.2016.10.038

18. Papadopoulos, Kontokosta. Grading buildings on energy performance using city benchmarking data. Appl Energy 2019; 233–234: 244–253. https://doi.org/10.1016/j.apenergy.2018.10.053

19. Delzendeh, Lee, et al. The impact of occupants’ behaviours on building energy analysis: a research review. Renew Sustain Energy Rev 2017; 80: 1061–1071. https://doi.org/10.1016/j.rser.2017.05.264

20. Zhou, Yan, Hong, et al. Data analysis and stochastic modeling of lighting energy use in large office buildings in China. Energy Build 2015; 86: 275–287. https://doi.org/10.1016/j.enbuild.2014.09.071

21. HVAC terminal hourly end-use disaggregation in commercial buildings with Fourier series model. Energy Build 2015; 97: 33–46. https://doi.org/10.1016/j.enbuild.2015.03.048

22. Shuangyu, Wenbin, Yupeng, et al. The impact of deep learning–based equipment usage detection on building energy demand estimation. Build Serv Eng Res Technol 2021; 42: 545–557. https://doi.org/10.1177/01436244211034737

23. Gandhi, Brager. Commercial office plug load energy consumption trends and the role of occupant behavior. Energy Build 2016; 125: 1–8. https://doi.org/10.1016/j.enbuild.2016.04.057

24. Elhadad, Orban. A sensitivity analysis for thermal performance of building envelope design parameters. Sustainability 2021; 13: 14018. https://doi.org/10.3390/su132414018

25. Tian. Towards advanced uncertainty and sensitivity analysis of building energy performance using machine learning techniques. J Build Perform Simul 2024; 17: 655–662. https://doi.org/10.1080/19401493.2024.2387071

26. International Partnership for Energy Efficiency Cooperation (IPEEC) Building Energy Efficiency Taskgroup. The Building Energy Performance Gap: An International Review. OECD/IPEEC, 2019. https://www.energy.gov.au/sites/default/files/the_building_energy_performance_gap-an_international_review-december_2019.pdf

27. Han, Wang, Zhang. An approach to data acquisition for urban building energy modeling using a Gaussian mixture model and expectation-maximization algorithm. Buildings; 11: 30. https://doi.org/10.3390/buildings11010030

28. Mariam, Ahmad, Shimaa, et al. Empowering smart cities with digital twins of buildings: applications and implementation considerations of data-driven energy modelling in building management. Build Serv Eng Res Technol 2024; 45: 475–498. https://doi.org/10.1177/01436244241239290

29. Wei H-L. Modelling short-term appliance energy use with interpretable machine learning: a system identification approach. Arabian J Sci Eng 2023; 48: 15667–15678. https://doi.org/10.1007/s13369-023-08084-1

30. Ciampi, Rega, Diallo TML, et al. Energy consumption prediction of industrial HVAC systems using Bayesian Networks. Energy Build 2024; 309: 114039. https://doi.org/10.1016/j.enbuild.2024.114039

31. Mat Daut, Hassan, Abdullah, et al. Building electrical energy consumption forecasting analysis using conventional and artificial intelligence methods: a review. Renew Sustain Energy Rev 2017; 70: 1108–1118. https://doi.org/10.1016/j.rser.2016.12.015

32. Kim, Suh. Heating and cooling energy consumption prediction model for high-rise apartment buildings considering design parameters. Energy Sustain Dev 2021; 61: 1–14. https://doi.org/10.1016/j.esd.2021.01.001

33. Faiz, Sajid, Ali, et al. Energy modeling and predictive control of environmental quality for building energy management using machine learning. Energy Sustain Dev 2023; 74: 381–395. https://doi.org/10.1016/j.esd.2023.04.017

34. Ngo, Pham, Truong TTH, et al. An ensemble machine learning model for enhancing the prediction accuracy of energy consumption in buildings. Arabian J Sci Eng 2022; 47: 4105–4117. https://doi.org/10.1007/s13369-021-05927-7

35. Kayacı Çodur. Ensemble machine learning approaches for prediction of Türkiye’s energy demand. Energies 2022; 17: 410. https://doi.org/10.3390/en17010074

36. Pipattanasomporn, Chitalia, Songsiri, et al. CU-BEMS, smart building electricity consumption and indoor environmental sensor datasets. Sci Data 2020; 7(241): 1–14. https://doi.org/10.1038/s41597-020-00582-3

37. Wang, Srinivasan. A novel ensemble learning approach to support building energy use prediction. Energy Build 2018; 159: 109–122. https://doi.org/10.1016/j.enbuild.2017.10.085

38. Amasyali, El-Gohary. A review of data-driven building energy consumption prediction studies. Renew Sustain Energy Rev 2018; 81: 1192–1205. https://doi.org/10.1016/j.rser.2017.04.095

39. Pan, Zhang. Data-driven estimation of building energy consumption with multi-source heterogeneous data. Appl Energy 2020; 268: 114965. https://doi.org/10.1016/j.apenergy.2020.114965

40. Ali, Shamsi, Hoare, et al. Review of urban building energy modeling (UBEM) approaches, methods and tools using qualitative and quantitative analysis. Energy Build 2021; 246: 111073. https://doi.org/10.1016/j.enbuild.2021.111073

41. Sun, Haghighat, Fung BCM. A review of the-state-of-the-art in data-driven approaches for building energy prediction. Energy Build 2020; 221: 110022. https://doi.org/10.1016/j.enbuild.2020.110022

42. Chen, Guo, Chen, et al. Physical energy and data-driven models in building energy prediction: a review. Energy Rep 2022; 8: 2656–2671. https://doi.org/10.1016/j.egyr.2022.01.162

43. Chakraborty, Elzarka. Advanced machine learning techniques for building performance simulation: a comparative analysis. J Build Perform Simul 2019; 12: 193–207. https://doi.org/10.1080/19401493.2018.1498538

44. Qureshi, Arbab, Rehman. Deep learning-based forecasting of electricity consumption. Sci Rep 2024; 14: 6489. https://doi.org/10.1038/s41598-024-56602-4

45. Ramos PVB, Villela, Silva, et al. Residential energy consumption forecasting using deep learning models. Appl Energy 2023; 350: 121705. https://doi.org/10.1016/j.apenergy.2023.121705

46. Mawson, Hughes. Deep learning techniques for energy forecasting and condition monitoring in the manufacturing sector. Energy Build 2020; 217: 109966. https://doi.org/10.1016/j.enbuild.2020.109966

47. Ucar, Kaygusuz. Short-term energy consumption forecasting analysis using different optimization and activation functions with deep learning models. Applied Sciences 2025; 15: 6839. https://doi.org/10.3390/app15126839

48. Gao, Zhang, Wang, et al. Research on building energy consumption prediction based on hybrid GRU neural network. Arabian J Sci Eng 2025; online first. https://doi.org/10.1007/s13369-025-10151-8

49. Ali, Jayaraman, Mayyas, et al. Machine learning as a surrogate to building performance simulation: predicting energy consumption under different operational settings. Energy Build 2023; 286: 112940. https://doi.org/10.1016/j.enbuild.2023.112940

50. Moon, Maqsood, et al. Advancing ensemble learning techniques for residential building electricity consumption forecasting: insight from explainable artificial intelligence. PLoS One 2024; 19: e0307654. https://doi.org/10.1371/journal.pone.0307654

51. Dietterich TG. Ensemble Methods in Machine Learning. In: Multiple Classifier Systems. MCS 2000. Lecture Notes in Computer Science, 2000; 1857. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45014-9_1

52. Zhou Z-H. Ensemble methods: Foundations and Algorithms. 2nd ed. CRC Press, 2025. https://doi.org/10.1201/9781003587774

53. Wang, Hong, Huang, et al. A comprehensive review and future research directions of ensemble learning models for predicting building energy consumption. Energy Build 2025; 335: 115589. https://doi.org/10.1016/j.enbuild.2025.115589

54. Ardabili, Abdolalizadeh, Makó, et al. Systematic review of deep learning and machine learning for building energy. Front Energy Res 2022; 10: 786027. https://doi.org/10.3389/fenrg.2022.786027

55. Wang, Srinivasan. A review of artificial intelligence based building energy use prediction: contrasting the capabilities of single and ensemble prediction models. Renew Sustain Energy Rev 2017; 75: 796–808. https://doi.org/10.1016/j.rser.2016.10.079

56. Papadopoulos, Azar, Woon W-L, et al. Evaluation of tree-based ensemble learning algorithms for building energy performance estimation. J Build Perform Simul 2018; 11: 322–332. https://doi.org/10.1080/19401493.2017.1354919

57. Huang, Zuo, Sohn. A Bayesian network model for predicting cooling load of commercial buildings. Build Simulat 2018; 11: 87–101. https://doi.org/10.1007/s12273-017-0382-z

58. Soares Geraldi, Bavaresco, Ghisi. Bayesian Network for Predicting Energy Consumption in Schools in Florianópolis – Brazil. Proceedings of the 16th IBPSA International Conference, Rome, Italy, Sept. 2-4, 2019.

59. O’Neill. Development of a probabilistic graphical model for predicting building energy performance. Appl Energy 2016; 164: 650–658. https://doi.org/10.1016/j.apenergy.2015.12.015

60. Pipattanasomporn, Chitalia, Songsiri, et al. CU-BEMS, smart building electricity consumption and indoor environmental sensor datasets. Sci Data 2020; 7: 241. https://doi.org/10.1038/s41597-020-00582-3

61. Chulalongkorn University – Faculty of Engineering. CU-BEMS: Chulalongkorn University Building Energy Management System. 2024. https://ee.eng.chula.ac.th/cu-bems/

62. Reliable prognosis. https://rp5.ru/Weather_archive_in_Bangkok,_Suvarnabhumi_(airport),_METAR

63. Joint Research Centre, European Commission. Photovoltaic Geographical Information System (PVGIS) 2024. https://ec.europa.eu/jrc/en/pvgis

64. Melo, Carrilho da Graça, Oliveira Panão MJN. A review of annual, monthly, and hourly electricity use in buildings. Energy Build 2023; 293: 113201. https://doi.org/10.1016/j.enbuild.2023.113201