
Abstract

Cooling systems provide a safe thermal environment for the reliable operation of IT equipment in data centers (DCs), but they consume a significant amount of energy. To achieve energy savings in cooling system control under dynamic thermal distribution in DCs, this paper proposes a multi-setpoint cooling control approach based on deep reinforcement learning (DRL). Firstly, a thermal model based on the XGBoost algorithm is constructed to precisely evaluate the thermal distribution in the rack room and guide real-time cooling control. Secondly, a multi-setpoint cooling control approach based on the deep Q-network algorithm (DQN-MSP) is designed to finely regulate the supply air temperature of each air conditioner by capturing thermal fluctuations, ensuring the dynamic balance of cooling supply and demand. Finally, we adopt the extended CloudSimPy simulation tool and the real workload trace of the PlanetLab system to evaluate the effectiveness and performance of the proposed approach. The simulation results show that the proposed control solution reduces the cooling energy consumption by over 2.4% by raising the average supply air temperature of the air conditioners while satisfying the thermal constraints.

Keywords

Data center; deep reinforcement learning; thermal modeling; cooling control; energy saving

Introduction

With the widespread use of emerging applications such as natural language processing (NLP), image recognition, and 5G communications, the computing power demand of information systems in various industries is increasing.¹ DCs provide resource services such as data computing, storage, and networking for information systems and have become one of the critical infrastructures of the Internet era.² Consequently, the severe energy consumption and carbon emission problems brought about by the large-scale construction of data centers have attracted widespread attention. In 2020, a study published in Science reported that the annual energy consumption of global cloud data centers is about 205 terawatt-hours (TWh), accounting for about 1% of the world's electricity consumption, and will maintain steady growth in the next few years.³ In addition, with the proposal of the strategic goal of Carbon Peak and Carbon Neutrality, promoting the construction of green DCs plays an essential role in realizing the dual carbon goal.⁴

DCs generally include core components such as IT, cooling, power supply, distribution, and lighting systems, whose energy consumption is distributed as shown in Figure 1. The cooling system is one of the core components indispensable for the stable operation of data centers, providing a thermal environment for the safe operation of IT equipment in the rack room. However, it accounts for up to 30%–40% of the total energy consumption of a DC.⁵ Therefore, managers must adopt more effective energy-saving technologies to reduce cooling system energy consumption and enhance the overall energy efficiency of DCs.

Figure 1.

Energy consumption distribution of data centers.

One of the keys to efficient cooling management in DCs is the rapid and accurate prediction of the thermal distribution in the rack room, which makes it possible to maintain a dynamic balance between heat dissipation demand and cooling supply.⁶ However, the complex infrastructure layout, dynamically changing IT thermal load, and airflow in the rack room pose a significant challenge to constructing temperature models. Traditional approaches for thermal modeling of data centers include computational fluid dynamics (CFD) models and simplified physical models.⁷ CFD-based temperature models can accurately simulate and evaluate the thermal distribution in the rack room. However, this method has high computational overhead and a complex modeling process, which makes it unsuitable for real-time thermal management. In contrast, simplified physics-based models consider heat transfer, fluid, and thermodynamic principles and can complete temperature prediction quickly, but they perform poorly in terms of accuracy.⁸ With the development of machine learning (ML) and Internet of Things (IoT) technologies, data center thermal modeling methods have gradually evolved from traditional CFD simulation and simplified physical modeling to data-driven methods.⁹ In particular, artificial neural networks (ANNs), with their powerful nonlinear fitting capability, are increasingly used for the thermal modeling of data centers.^10–12 The work¹³ developed a variety of thermal prediction models based on machine learning, including ANN, Gaussian process regression (GPR), and linear regression (LR) models. The authors used these thermal models to guide task placement and scheduling on a heterogeneous system, saving 17% in cooling power consumption while maintaining the quality of service (QoS). In summary, data-driven thermal models provide a better trade-off between modeling complexity, accuracy, and computational expense, and they are better suited to guide cooling control in the complex thermal environments of data centers.

DRL combines the strengths of deep learning and reinforcement learning to solve complex decision-making tasks. Since DeepMind applied the DQN algorithm to Atari games and beat human players,¹⁴ improved and novel DRL algorithms have continued to be developed. For example, deep deterministic policy gradient (DDPG)¹⁵ and proximal policy optimization (PPO),¹⁶ both based on the actor-critic structure, have been proposed to make DRL algorithms more stable and converge faster. Furthermore, multi-agent deep deterministic policy gradient (MADDPG) was designed to solve multi-agent collaboration and competition problems.¹⁷ These research advances have led to a wide range of applications of DRL in various fields, including gaming, robot control, autonomous driving, and financial trading. As DRL performs well in solving control problems of complex systems, it has been gradually adopted to solve DC cooling control problems in recent years.^18,19 For example, the work¹⁸ uses a reinforcement learning algorithm to control the fan speed and chilled water flow rate of the rack room's air handling unit (AHU). Furthermore, the work¹⁹ uses a large amount of monitoring data (IT load and weather information) from the data center to train a DRL-based agent to learn the control strategy of the water chilling unit. All these efforts have achieved good energy savings, but most existing cooling control schemes use a coarse-grained control strategy that sets global cooling parameters. This strategy usually sets the cooling supply based on the peak temperature to avoid hot spots, which leads to some racks being over-cooled. In addition, the thermal impact of each air conditioner on different racks depends on the relative position, blower air speed, and floor ventilation rate.¹⁹ Therefore, a coarse-grained control strategy creates a mismatch between local cooling supply and demand, which makes fine-grained thermal management difficult and results in low cooling energy efficiency.

To address this issue, this work proposes a DRL-based cooling control method for the dynamic control of cooling systems in complex thermal environments. The main contributions are as follows.

We compare the thermal prediction performance of five ML models and select the best-performing XGBoost-based temperature prediction model to predict the thermal distribution of the rack room for guiding real-time cooling control.

We design a DQN-based multi-setpoint cooling control method to solve the control problem of multiple CRACs in DCs and effectively reduce cooling energy consumption.

Extensive simulation experiments verify that the proposed DQN-based cooling control model can capture the thermal load changes of each region in the rack room and adjust the air supply temperature of each air conditioner in a fine-grained manner to improve the cooling efficiency.

The remaining parts of this paper are structured as follows. The first section reviews related work on thermal modeling and cooling system control. The second section introduces the system model and optimization objectives. The third section presents the design of the state space, action space, and reward function of the DQN-based cooling control algorithm. The fourth section presents the simulation experiments and result analysis, and the final section gives the conclusions and outlook.

Related works

Data center thermal modeling

The temperature distribution in data centers exhibits complex and variable dynamic characteristics due to the dynamic load of IT equipment, airflow circulation, and building layout. Existing thermal modeling of data centers can be classified into simplified physical models,²⁰ computational fluid dynamics (CFD) based models,²¹ and data-driven models.⁸

The most classical simplified physical model is the abstract thermal recirculation model for air-cooled data centers proposed in work.²⁰ This model constructs a thermal disturbance matrix to represent the weights of the interactions between multiple nodes in the rack room. However, this thermal model can only evaluate steady-state temperature profiles and ignores the time-varying nature of temperature. Subsequent work²² improved the thermal recirculation model by considering how temperature varies with time and constructing a transient temperature model to predict the temperature distribution at the next time step. In addition, CFD-based simulation modeling approaches have been extensively applied to the thermal modeling of data centers. The work²³ provides an extensive survey and analysis of studies related to CFD/HT-based thermal modeling methods. CFD-based thermal models can generally offer a complete and accurate thermal field. However, the complex modeling process and huge computational overhead make them difficult to use for real-time temperature assessment and cooling control in data centers. Therefore, data-driven models have been developed as a promising thermal modeling method and are broadly adopted for the thermal management of data centers. The work²⁴ constructs a thermal model based on proper orthogonal decomposition (POD) to fit the complex nonlinear relationship between the thermal load of nodes and the inlet temperature. Furthermore, work¹² developed an ANN-based thermal model for data centers to predict temperature and airflow distribution in rack rooms. The prediction results of this model are in high agreement with the CFD simulation results, which further validates that the neural network-based thermal model has acceptable performance and is suitable for real-time management of cooling systems. Moreover, the latest research work¹⁰ demonstrates the applicability of data-driven thermal models for guiding the thermal management of data centers through numerous simulation experiments.

Cooling control system

Most existing works on data center cooling control optimize the cooling parameters at the global level. For example, the work¹⁸ proposed a model-based reinforcement learning algorithm to regulate data center cooling parameters. More specifically, the temperature and airflow rate in the rack room are regulated by controlling the air handling unit's fan speed and chilled water flow rate. Moreover, the work¹⁹ formulated the cooling control strategy as an energy minimization problem with thermal constraints and proposed an energy-aware cooling control algorithm (CCA) based on the actor-critic framework to solve it. Specifically, CCA is an end-to-end offline algorithm that uses historical monitoring data from data centers to train DRL-based agents to learn and improve chiller control policies. Furthermore, the work²⁵ constructs a constrained DRL agent to learn the control strategy for the supply air temperature and air speed of air conditioners, considering the thermal constraints on temperature and relative humidity (RH) in the data center. This strategy adaptively controls the amount of return hot air mixed with fresh outside air to minimize cooling energy consumption. Although these global cooling control methods can reduce cooling energy consumption while satisfying thermal constraints, they have some limitations. First, the coarse-grained control method requires large-scale adjustment of equipment parameters, which may increase the complexity of the cooling control problem. Second, the global control method focuses on the overall temperature, which makes it difficult to accurately regulate the temperature in local areas. Therefore, this work proposes a DQN-based cooling control method to achieve fine-grained thermal management and improve cooling energy efficiency in DCs.

System modeling and optimization goals

The cooling control model proposed in this work (Figure 2) mainly includes a cloud environment and a DRL agent. The core components of an air-cooled data center include IT equipment that provides computing services and a cooling system that maintains the thermal environment of the computer room. We model the key components of the data center, including the power model of the IT equipment, the cooling system, and the temperature model of the rack room, to build a high-fidelity simulation environment. Assume that the IT system of a data center consists of N racks with servers, which can be represented as Racks = {Rack₁, …, Rack_N}, with individual racks indexed by n, where 1 ≤ n ≤ N. Moreover, the cooling system consists of M computer room air conditioners (CRACs), which can be expressed as CRACs = {CRAC₁, …, CRAC_M}, indexed by m, where 1 ≤ m ≤ M. The DRL-based cooling controller interacts with the cloud environment through operational actions, obtains feedback rewards and the updated state, and continuously explores control optimization strategies by trial and error.^26–28

Figure 2.

Data center cooling control model.

Data-driven thermal models

The data-driven model analyzes system data to capture complex correlations between input-output variables from massive amounts of data. The modeling approach takes data as the central basis without focusing too much on the physical meaning behind the relationship.⁹ A flowchart of the data-driven thermal modeling approach is shown in Figure 3. First, multiple sensor data (temperature, humidity, pressure, flow rate) under natural operating conditions or simulation data from CFD models are collected to construct the dataset. Subsequently, the data set is used to train the data-driven-based thermal model until it meets the prediction accuracy requirements. Finally, real-time temperature and airflow prediction is performed based on the trained thermal model.

Figure 3.

Flowchart for data-driven thermal modelling.

CFD model

This work uses the 6SigmaDCX commercial simulation software²⁹ to construct a CFD model of a rack room (Figure 4), which is parameterized according to the IT equipment parameters and building layout of a small data center. The rack room is equipped with an air conditioning system that sends cold air from the raised floor and returns hot air to the room. Four rows of racks are symmetrically arranged in the rack room, with 20 racks available for servers. An open cold channel is installed in the middle of each rack row, and four precision air conditioners are used to provide cooling capacity. The power density of servers in the rack room is set to 1.5 kW/m² according to the recommendations of ASHRAE.³⁰ Moreover, three sensors are deployed at different heights at each rack inlet to collect temperature data, as shown in Figure 5. The specific hardware configuration and building parameters are shown in Table 1.

Figure 4.

CFD model of data center.

Figure 5.

Schematic of temperature sensor deployment.

Table 1.

Parameters of CFD model.

Parameters	Values	Parameters	Values
Room scale (m)	10 × 7.5 × 4.5	Number of PDUs	2
Floor height (m)	0.65	Air supply volume of CRACs (m³/s)	9
Number of racks	20	Floor ventilation rate (%)	50
Number of servers	240	Number of grid cells	9.8 × 10⁵
Number of CRACs	4	Turbulence model	k-ε

Data driven thermal modeling

The thermal distribution in the data center depends on the cooling system supply and the heat generation of the IT equipment. The experiments in this work use the supply air temperature T_sup, the blower air speed FS_crac of the CRAC, and the running power P_rack of racks as independent variables, and the rack inlet temperature T_{Rack_inlet} as the dependent variable. Thus, the data-driven thermal model can be expressed as,

T_{Rack\_inlet} = f(T_{sup}, FS_{crac}, P_{rack}),

(1)

where f represents the non-linear relationship between the input and output variables and is commonly fitted by regression models. Furthermore, to achieve efficient sampling, the Latin hypercube sampling (LHS) method³¹ is adopted to ensure that the independent variables are uniformly distributed in the multidimensional parameter space. Based on this method, 1000 sets of parameters are randomly generated. Each set includes 28 parameters, covering the supply air temperature T_sup (18°C–30°C), the blower air speed FS_crac (20%–100%), and the operating power consumption P_rack (6–10 kW). Finally, the CFD model was used to carry out the numerical calculation for each parameter set, and the simulation data were exported and pre-processed to form the dataset. The dataset consists of 1000 samples denoted as (T_sup, FS_crac, P_rack, T_{Rack_inlet}) and is split into training and test sets in a ratio of 8:2.
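The sampling step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: it assumes that the 28 parameters are 4 supply air temperatures, 4 blower air speeds, and 20 rack powers, and the names `latin_hypercube`, `scale_sample`, and `BOUNDS` are hypothetical.

```python
import random


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims with one point per stratum in each dimension."""
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each of the n_samples equal-width strata.
        column = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(column)  # decouple the strata across dimensions
        columns.append(column)
    # Transpose from per-dimension columns to per-sample rows.
    return [list(row) for row in zip(*columns)]


# Assumed layout: 4 x T_sup (°C), 4 x FS_crac (%), 20 x P_rack (kW).
BOUNDS = [(18.0, 30.0)] * 4 + [(20.0, 100.0)] * 4 + [(6.0, 10.0)] * 20


def scale_sample(unit_row):
    """Map a point from the unit hypercube to the physical parameter ranges."""
    return [lo + u * (hi - lo) for u, (lo, hi) in zip(unit_row, BOUNDS)]


rng = random.Random(0)
samples = [scale_sample(row) for row in latin_hypercube(1000, len(BOUNDS), rng)]

# 8:2 split; the rows are already in random order because every column was shuffled.
split = int(0.8 * len(samples))
train, test = samples[:split], samples[split:]
```

Each row of `samples` would then be fed to the CFD model to obtain the corresponding rack inlet temperatures.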

Model training and performance

To validate and compare the performance of existing data-driven models for data center temperature prediction, five widely used ML models were selected for the experiments, namely the lasso regression model, random forest regression model, support vector regression (SVR) model, gradient boosting model (XGBoost),³² and artificial neural network (ANN). Subsequently, the constructed dataset was split into training and test sets in an 8:2 ratio to train and test each model. The simulation experiments used the mean absolute percentage error (MAPE), root mean square error (RMSE), and coefficient of determination (R²) as the metrics for evaluating the models, which are defined as,

MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\hat{T}_{i} - T_{i}}{T_{i}} \right| \times 100\%,

(2)

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{T}_{i} - T_{i})^{2}},

(3)

R^{2} = 1 - \frac{\sum_{i=1}^{n} (T_{i} - \hat{T}_{i})^{2}}{\sum_{i=1}^{n} (T_{i} - \bar{T})^{2}},

(4)

where $\hat{T}_{i}$ is the predicted value, $T_{i}$ is the observed value, $\bar{T}$ is the mean of the observed values, and n denotes the sample size. In addition, 10-fold cross-validation is used in the training process to avoid over-fitting. Table 2 shows that the MAPE and RMSE values of the XGBoost model are 2.73% and 1.01°C, respectively, which are the smallest among the compared models and meet the demand for prediction accuracy. Moreover, its R² equals 0.87, the closest to 1.0, indicating a high degree of model fit. Therefore, this work uses the XGBoost-based temperature prediction model to guide the cooling control of DCs. Note that the hyper-parameters of the ML models are shown in Appendix Table A1.

Table 2.

Prediction error metrics of different models.

Model	MAPE (%)	RMSE (°C)	R²
Lasso	4.5	1.46	0.64
Random Forest	3.7	1.24	0.77
SVR	3.41	1.23	0.76
XGBoost	2.73	1.01	0.87
ANN	3.2	1.19	0.83
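The error metrics in equations (2)–(4) can be written as plain Python functions. The sketch below is illustrative only; the sample temperatures are made up and do not come from the dataset used in this work.

```python
import math


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent, equation (2)."""
    return 100.0 * sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error, equation (3)."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination, equation (4)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up inlet temperatures in °C, for illustration only.
observed = [20.0, 25.0, 30.0]
predicted = [21.0, 25.0, 29.0]
print(mape(observed, predicted), rmse(observed, predicted), r2(observed, predicted))
```

Note that MAPE divides by the observed value, so it is only meaningful when the observed temperatures are well away from zero, which holds for inlet temperatures expressed in °C in the range considered here.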

Power model

Computing system

Numerous servers with different hardware configurations are deployed in the rack room. The power consumption Pⁱ(t) of server i at moment t can be divided into static power consumption Pⁱ_static(t) and dynamic power consumption Pⁱ_dynamic(t); the server index i is omitted below for brevity. Static power consumption P_static(t) is the base power consumption of the server under no load, which is usually a constant. Moreover, there is a complex relationship between the dynamic power consumption P_dynamic(t) and the computational resource utilization U(t) of the server. The work³³ states that there exists an optimal computational resource utilization U_opt (close to 70%) for most servers. When U(t) ≤ U_opt, the dynamic power consumption P_dynamic(t) grows linearly with the computational resource utilization U(t). Conversely, when U(t) > U_opt, P_dynamic(t) grows nonlinearly and rapidly with U(t). Thus, the dynamic power consumption P_dynamic(t) can be expressed as,

P_{dynamic}(t) = \begin{cases} \alpha U(t), & U(t) \leq U_{opt} \\ \alpha U(t) + \beta (U(t) - U_{opt})^{2}, & U(t) > U_{opt} \end{cases},

(5)

where the constant coefficients α and β are set to 0.5 and 10, respectively, and the optimal utilization U_opt is set to 0.7. The total power consumption of the IT system at time t, P_IT(t), is the sum of the power consumption of all I servers in the rack room, denoted as,

P_{IT}(t) = \sum_{i=1}^{I} P^{i}(t).

(6)
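A minimal Python sketch of equations (5) and (6) is given below. The coefficients are those stated in the text; the static power value and the assumption that all servers share it are hypothetical, since the text does not specify them, and no physical unit is implied for the result.

```python
ALPHA = 0.5   # linear coefficient, from the text
BETA = 10.0   # quadratic coefficient, from the text
U_OPT = 0.7   # optimal utilization, from the text


def dynamic_power(u):
    """Dynamic power for a utilization u in [0, 1], following equation (5)."""
    power = ALPHA * u
    if u > U_OPT:
        # Above the optimal utilization a quadratic term is added.
        power += BETA * (u - U_OPT) ** 2
    return power


def server_power(u, p_static):
    """Static plus dynamic power of one server."""
    return p_static + dynamic_power(u)


def it_power(utilizations, p_static):
    """Total IT power, equation (6), assuming the same static power for every server."""
    return sum(server_power(u, p_static) for u in utilizations)


# Hypothetical static power of 0.2 per server, for illustration only.
print(it_power([0.5, 0.7, 0.9], p_static=0.2))
```

The two branches agree at U(t) = U_opt, so the modeled power is continuous in the utilization.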

Cooling system

The computer room air conditioner (CRAC) is the primary energy-consuming device of the cooling system and accounts for most of the cooling overhead, so this work considers it the main optimization target for energy saving. The cooling efficiency of a CRAC can be measured by the ratio between the power consumption of the IT system and that of the cooling system, called the coefficient of performance (CoP). The higher the CoP value, the higher the cooling efficiency. It can be expressed as,

CoP = \frac{P_{IT}}{P_{cooling}},

(7)

where P_IT and P_cooling denote the total power consumption of the IT system and the cooling system, respectively. Additionally, the study³⁴ showed that CoP is positively correlated with the cold air supply temperature T_sup. Here, the CoP measured by HP Labs is

CoP(T_{sup}) = 0.0068 T_{sup}^{2} + 0.0008 T_{sup} + 0.458.

(8)

From equation (8), it can be seen that increasing the supply air temperature T_sup of the CRAC improves the cooling system’s efficiency.
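The effect of the supply air temperature on cooling power can be checked numerically with the CoP model of equation (8) and the definition in equation (7). The sketch below is illustrative; the 100 kW IT load is an arbitrary example value.

```python
def cop(t_sup):
    """CoP model of equation (8); t_sup is the supply air temperature in °C."""
    return 0.0068 * t_sup ** 2 + 0.0008 * t_sup + 0.458


def cooling_power(p_it, t_sup):
    """Cooling power implied by equation (7): P_cooling = P_IT / CoP."""
    return p_it / cop(t_sup)


# Arbitrary example: a 100 kW IT load at three supply air temperatures.
for t in (18.0, 24.0, 30.0):
    print(f"T_sup = {t:.1f} °C, CoP = {cop(t):.3f}, P_cooling = {cooling_power(100.0, t):.1f} kW")
```

Under this model the CoP rises from about 2.68 at 18°C to about 6.60 at 30°C, so the same IT load needs less than half the cooling power at the higher setpoint.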

Optimization objective

The optimization objective of the cooling management problem studied in this paper is to minimize the total energy consumption of the data center while satisfying the thermal constraints. Therefore, the optimization objective is defined as,

\text{Minimize} \quad P_{total} = P_{IT} + P_{cooling} = \left(1 + \frac{1}{CoP(T_{sup})}\right) P_{IT},

(9)

\text{s.t.} \quad U^{i}(t) \leq U_{\max},

(10)

T_{inlet}^{i} \leq T_{redline},

(11)

T_{sup\_min} \leq T_{sup} \leq T_{sup\_max},

(12)

where constraint (10) indicates that the computational resource utilization of the server cannot exceed the maximum utilization; constraint (11) indicates that the inlet temperature of the server is guaranteed to be lower than the red line temperature (32°C)³⁰; and constraint (12) indicates the range of values for the supply air temperature of the air conditioner.
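The objective (9) and the constraints (10)–(12) can be expressed as a small evaluation routine. This is a sketch under stated assumptions: the red line temperature is taken from the text, while the supply temperature bounds (borrowed from the sampling range of 18°C–30°C) and the maximum utilization of 1.0 are assumptions made for illustration.

```python
T_REDLINE = 32.0                     # red line inlet temperature in °C, from the text
T_SUP_MIN, T_SUP_MAX = 18.0, 30.0    # assumed bounds, taken from the sampling range
U_MAX = 1.0                          # assumed maximum utilization


def cop(t_sup):
    """CoP model of equation (8)."""
    return 0.0068 * t_sup ** 2 + 0.0008 * t_sup + 0.458


def total_power(p_it, t_sup):
    """Objective (9): P_total = (1 + 1 / CoP(T_sup)) * P_IT."""
    return (1.0 + 1.0 / cop(t_sup)) * p_it


def is_feasible(utilizations, inlet_temps, t_sup):
    """Check constraints (10), (11), and (12) for one operating point."""
    return (
        all(u <= U_MAX for u in utilizations)
        and all(t <= T_REDLINE for t in inlet_temps)
        and T_SUP_MIN <= t_sup <= T_SUP_MAX
    )


print(total_power(100.0, 25.0), is_feasible([0.6, 0.8], [27.5, 31.0], 25.0))
```

Because P_total decreases monotonically with T_sup under this CoP model, the thermal constraint (11) is what prevents the controller from simply choosing the highest setpoint.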

DQN-based cooling control model

The thermal profile of a data center results from multiple uncertainties coupled with the load on IT equipment and the organization of the cooling airflow. The cooling system needs to evaluate the complex thermal profile in the rack room in real time to make control responses. In this process, the decisions made by the controller at each moment depend only on the system's current state and are independent of previous states. Thus, this sequential decision process has the Markov property. Therefore, this work models the data center cooling control optimization problem as a Markov decision process (MDP), denoted as (S, A, R, P). S denotes the set of environmental states, and s_t ∈ S denotes the state of the agent at time t. A denotes the set of actions, and a_t ∈ A denotes the action taken at time t. R is the reward function, which depends on the optimization objective, and r_t = R(s_t, a_t) denotes the immediate reward that the agent receives for executing action a_t in state s_t. P is the state transition function, and p(s_{t+1} | s_t, a_t) denotes the probability that taking action a_t in state s_t leads to the next state s_{t+1}. The goal of the reinforcement learning agent is to find the optimal policy π that maximizes the expected cumulative discounted reward over the time steps t = 0, …, T of each episode, denoted as,

R(\pi) = \sum_{t=0}^{T} \gamma^{t} r_{t},

(13)

where the discount factor γ ∈ [0, 1] is given to weigh the effect of future rewards on cumulative rewards.
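For a recorded episode, the sum in equation (13) can be evaluated as follows. The reward values are made up for illustration, and γ = 0.9 is used only as an example.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over one episode, equation (13)."""
    total = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Three made-up rewards: 1 + 0.9 * 1 + 0.81 * 1 = 2.71
print(discounted_return([1.0, 1.0, 1.0], gamma=0.9))
```

A smaller γ makes the agent favor immediate rewards, while γ close to 1 weights the whole episode almost evenly.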

Subsequently, the classical deep reinforcement learning model DQN¹⁴ is adopted to solve this MDP problem. A schematic diagram of the DQN model is shown in Figure 6. The DQN agent learns the global control optimization strategy by continuously interacting with the cloud environment to obtain the corresponding reward.¹⁹ More specifically, the DQN model uses a deep neural network to extract features of the complex state space, followed by a Q-learning algorithm to evaluate and select actions. The DQN model contains two neural networks with the same structure: the online network and the target network. Q(s, a; θ) denotes the output of the online network, which is used to evaluate the value of the current state-action pair; max_{a′} Q(s′, a′; θ^∼) denotes the maximum Q value output by the target network. Thus, the target Q value can be calculated as,

Y = r + \gamma \max_{a'} Q(s', a'; \tilde{\theta}),

(14)

Subsequently, the mean square error of the current Q value and the target Q value is taken to define the loss function, denoted as,

L(\theta) = E_{(s, a, r, s')} \left[ (Y - Q(s, a; \theta))^{2} \right],

(15)

The gradient of the loss function L(θ) with respect to the parameters θ is, up to a constant factor,

\nabla_{\theta} L(\theta) = E_{(s, a, r, s')} \left[ (Y - Q(s, a; \theta)) \nabla_{\theta} Q(s, a; \theta) \right],

(16)

Subsequently, based on this gradient, stochastic gradient descent (SGD) is used to update the parameters θ of the online network. To enhance the stability of the algorithm, the target network adopts a delayed update method, which copies the parameters of the online network to the target network every C steps. The state space, action space, and reward function of the DQN-based cooling control model are given below.

Figure 6.

DQN model framework.
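The update rule in equations (14)–(16) and the delayed target update can be traced by hand on a toy problem. The sketch below replaces the deep neural network with a small lookup table, which is a deliberate simplification and not the network used in this work; the learning rate, the value of C, and the transitions are hypothetical.

```python
import copy

GAMMA = 0.9      # discount factor
LR = 0.1         # toy learning rate (assumption); the constant factor of the gradient is folded in
SYNC_EVERY = 2   # C: copy the online parameters to the target every C updates (assumption)


def td_target(reward, next_state, q_target):
    """Equation (14): Y = r + gamma * max_a' Q(s', a'; target parameters)."""
    return reward + GAMMA * max(q_target[next_state])


def dqn_update(q_online, q_target, transition):
    """One SGD step on the squared error (Y - Q(s, a))**2 for a single sample."""
    s, a, r, s_next = transition
    td_error = td_target(r, s_next, q_target) - q_online[s][a]
    # For a table, the derivative of Q(s, a) with respect to its own entry is 1,
    # so the descent step moves the entry towards the target by LR * td_error.
    q_online[s][a] += LR * td_error
    return td_error ** 2


q_online = [[0.0, 0.0], [0.0, 0.0]]   # 2 states x 2 actions
q_target = copy.deepcopy(q_online)

# Made-up transitions (s, a, r, s').
transitions = [(0, 1, 1.0, 1), (1, 0, 2.0, 0), (0, 1, 1.0, 1)]
for step, transition in enumerate(transitions, start=1):
    loss = dqn_update(q_online, q_target, transition)
    if step % SYNC_EVERY == 0:
        q_target = copy.deepcopy(q_online)   # delayed target update
```

In the third update the target is computed from the synchronized target table (1 + 0.9 × 0.2 = 1.18), which illustrates why the online estimates only influence the targets after each synchronization.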

State space

In this work, the inlet temperature Tⁱ_inlet of each rack, the operating power consumption Pⁱ_rack of each rack, the supply air temperature T^j_sup of each air conditioner, and its blower air speed FS^j_crac are used as the state space of the rack room environment, which is expressed as S_t = {Tⁱ_inlet, Pⁱ_rack, T^j_sup, FS^j_crac}.
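One possible way to assemble such a state is to concatenate the per-rack and per-CRAC readings into a flat vector. The sketch below assumes 20 racks and 4 CRACs and a single inlet temperature per rack (for example, an aggregate of the three sensors); both the aggregation and the function name `build_state` are assumptions, not details given in the text.

```python
def build_state(inlet_temps, rack_powers, supply_temps, fan_speeds):
    """Concatenate per-rack and per-CRAC readings into one flat state vector."""
    if len(inlet_temps) != len(rack_powers):
        raise ValueError("need one inlet temperature and one power value per rack")
    if len(supply_temps) != len(fan_speeds):
        raise ValueError("need one supply temperature and one fan speed per CRAC")
    return list(inlet_temps) + list(rack_powers) + list(supply_temps) + list(fan_speeds)


# Made-up readings for 20 racks and 4 CRACs.
state = build_state(
    inlet_temps=[24.0] * 20,   # °C
    rack_powers=[8.0] * 20,    # kW
    supply_temps=[22.0] * 4,   # °C
    fan_speeds=[60.0] * 4,     # % of maximum blower speed
)
print(len(state))
```

With these assumptions the state has 2 × 20 + 2 × 4 = 48 components; in practice the components would usually be normalized before being passed to the network.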

Action space

The cooling controller is responsible for adjusting the supply air temperature of the cooling system in real time based on the heat distribution in the rack room. Assume that there are four precision air conditioners with initial temperature T_initial in the rack room. The available operation for each air conditioner is denoted as a ∈ {+, −}, where “+” and “−” mean increasing or decreasing the supply air temperature by 0.5°C, respectively. Therefore, for a set of four air conditioners, an action combination can be expressed as, for example, A₁ = {a₁, a₂, a₃, a₄}, giving a total of 2⁴ = 16 action combinations. In addition, an empty action combination A₁₇ is added to represent the operation of maintaining the original parameter settings, so the action space is represented as Action = {A₁, A₂, …, A₁₇}.
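The 17 discrete actions can be enumerated directly. In the sketch below, the ordering of the actions and the clipping of the setpoints to the range 18°C–30°C are assumptions made for illustration; the text only fixes the step size and the number of actions.

```python
from itertools import product

STEP = 0.5                           # °C per adjustment, from the text
N_CRACS = 4
T_SUP_MIN, T_SUP_MAX = 18.0, 30.0    # assumed setpoint bounds

# 2**4 = 16 combinations of +/- steps, followed by one "hold" action.
ACTIONS = [tuple(sign * STEP for sign in signs) for signs in product((+1, -1), repeat=N_CRACS)]
ACTIONS.append((0.0,) * N_CRACS)


def apply_action(setpoints, action_index):
    """Return the new setpoints after applying one action, clipped to the allowed range."""
    deltas = ACTIONS[action_index]
    return [min(T_SUP_MAX, max(T_SUP_MIN, t + d)) for t, d in zip(setpoints, deltas)]


print(len(ACTIONS))
print(apply_action([22.0, 22.0, 22.0, 22.0], 0))
```

Encoding each combination as one discrete action keeps the problem compatible with DQN, at the cost of an action space that grows exponentially with the number of air conditioners.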

Reward function

A properly designed reward function will help the agent learn the desired strategy faster and better toward the optimization goal. Therefore, for the optimization objective of minimizing the cooling energy consumption under the thermal constraint, the reward function can be designed as,

R(t) = \frac{1}{M} \sum_{m=1}^{M} T_{sup}^{m}(t) - C \cdot HS(t),

(17)

where T^m_sup(t) denotes the supply air temperature of the m-th CRAC at time t, M is the number of CRACs, HS(t) denotes the number of hot spots at time t, and C denotes a constant penalty coefficient. Specifically, the higher the supply temperature of the CRACs, the lower the cooling power, so the mean supply temperature of all CRACs is used as the reward at time t. Conversely, to enforce the thermal constraints, the number of hot spots is used as a penalty. This reward function aims to reduce the cooling energy consumption by increasing the supply air temperature of the air conditioners as much as possible while giving a corresponding penalty for violating the thermal constraint.
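Equation (17) can be sketched as follows, reading the first term as the average supply air temperature over the CRACs. A hot spot is taken here to be any rack inlet temperature above the 32°C red line, and the penalty coefficient of 10 is a hypothetical value, since the text does not state C.

```python
T_REDLINE = 32.0   # red line inlet temperature in °C, from the text
PENALTY = 10.0     # hypothetical value of the constant C


def count_hot_spots(inlet_temps):
    """Number of rack inlets whose temperature exceeds the red line."""
    return sum(1 for t in inlet_temps if t > T_REDLINE)


def reward(supply_temps, inlet_temps):
    """Mean supply air temperature minus a penalty per hot spot, equation (17)."""
    mean_supply = sum(supply_temps) / len(supply_temps)
    return mean_supply - PENALTY * count_hot_spots(inlet_temps)


# Made-up readings: mean supply temperature 25 °C and one hot spot.
print(reward([24.0, 24.0, 26.0, 26.0], [30.0, 31.0, 33.0]))
```

The size of C relative to the 0.5°C setpoint step determines how strongly the agent avoids hot spots: with a large C, a single violation outweighs any gain from raising the setpoints.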

Agent training process

Here, it is assumed that the data center cooling system triggers a cooling control operation every 10 min and that each 24 h period forms one episode, that is, 144 control steps per episode. The DQN-agent training process is as follows.

DQN-agent training process

1. Initialize the online network Q(θ) and the target network Q^∼(θ^∼)
2. Initialize the cloud environment Env and obtain the initial state s_t
3. For each episode:
4.   For t = 1 to T do:
5.     Agent adopts the ε-greedy strategy to select the action a_t.
6.     Env takes action a_t, returns the new state s_{t+1}, and calculates the reward r_t.
7.     Agent stores the sample (s, a, r, s′) into the memory pool.
8.   End For
9.   If the number of samples exceeds the threshold:
10.    Randomly select a mini-batch of samples from the memory pool D(M) to train the online network Q(θ).
11.    Define the loss function:

L(\theta) = E_{(s, a, r, s') \sim D(M)} \left[ \left( r + \gamma \max_{a'} Q(s', a'; \tilde{\theta}) - Q(s, a; \theta) \right)^{2} \right]

12.    Use stochastic gradient descent to update the online network Q(θ) with the gradient (up to a constant factor):

\nabla_{\theta} L(\theta) = E_{(s, a, r, s') \sim D(M)} \left[ \left( r + \gamma \max_{a'} Q(s', a'; \tilde{\theta}) - Q(s, a; \theta) \right) \nabla_{\theta} Q(s, a; \theta) \right]

13.    Update the target network every C steps, Q^∼ = Q.
14. End For
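The memory pool used in the procedure above can be sketched with a bounded queue. The capacity and mini-batch size follow the hyper-parameters reported for the DQN model; the class name `ReplayMemory`, the use of the mini-batch size as the sampling threshold, and the placeholder transitions are assumptions made for illustration.

```python
import random
from collections import deque

MEMORY_SIZE = 3000                   # memory size of the DQN model
BATCH_SIZE = 32                      # mini-batch size of the DQN model
STEPS_PER_EPISODE = 24 * 60 // 10    # one control action every 10 min over 24 h


class ReplayMemory:
    """Fixed-capacity store of (s, a, r, s') samples with uniform random sampling."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)   # the oldest samples are dropped first
        self.rng = random.Random(seed)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def ready(self, threshold):
        return len(self.buffer) >= threshold

    def sample(self, batch_size):
        return self.rng.sample(list(self.buffer), batch_size)


memory = ReplayMemory(MEMORY_SIZE)
for t in range(STEPS_PER_EPISODE):
    # Placeholder transition; a real run would query the simulated environment here.
    memory.store(t, 0, 0.0, t + 1)

batch = memory.sample(BATCH_SIZE) if memory.ready(BATCH_SIZE) else []
print(STEPS_PER_EPISODE, len(memory.buffer), len(batch))
```

Sampling uniformly from the pool breaks the temporal correlation between consecutive control steps, which is the usual motivation for experience replay in DQN.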

Simulation experiment and result analysis

Experiment settings

The 6SigmaDCX simulation software²⁹ was adopted to build a CFD simulation model of a small data center (Figure 4), which serves as a platform for conducting repeatable validation experiments. Additionally, the simulation experiments use real workload traces from the PlanetLab system³⁵ to simulate the variation of IT load in the data center. Moreover, we extend the open-source cloud simulation tool CloudSimPy³⁶ with temperature models and power consumption models of the computing and cooling systems. All source code for the experiments is written in Python and run on a laptop with a Core i7-5700HQ CPU (3.5 GHz) and 12 GB RAM.

The experiment assumes that there are two cooling setpoint modes for the CRACs in the rack room: the single setpoint (SSP) mode and the multiple setpoint (MSP) mode. Specifically, the SSP mode means that all CRACs share the same cooling setpoint, while the MSP mode is a fine-grained cooling control mode that sets a different cooling setpoint for each CRAC. In addition, two well-performing DRL algorithms, proximal policy optimization (PPO)¹⁶ and deep deterministic policy gradient (DDPG),¹⁵ are chosen as benchmark control algorithms. Unlike the target-online network structure of DQN, the PPO and DDPG algorithms are policy gradient algorithms based on the actor-critic structure, where the actor is the policy neural network that selects the action and the critic is the neural network that evaluates the value of the action. Besides, the PPO agent adopts the same policy π for action selection during training and testing, while the DQN and DDPG agents usually adopt an ε-greedy action selection strategy during training and the greedy strategy $a^{*} = \arg\max_{a} Q^{*}(s, a)$ during testing or practical use.

Therefore, to validate the effectiveness and performance of the proposed DQN-MSP cooling control method, we set up four cooling control methods in MSP and SSP modes, namely DQN-MSP, PPO-MSP, DDPG-MSP, and DQN-SSP. Considering that DRL algorithms are sensitive to hyper-parameters, the key hyper-parameters of the models used in the experiments are described as follows. The learning rates of the actor and critic networks for PPO and DDPG are 0.01 and 0.02, respectively. The hyper-parameter ε of PPO determines the clipping range of the probability ratio and is set to 0.2. Moreover, the hyper-parameters of the DQN model are listed in Table 3.

Table 3.

The hyper-parameters of DQN model.

Parameters	Values	Parameters	Values
Training episode	300	The initial value of ε	0.4
Learning rate α	0.001	The maximum value of ε	0.95
Discount factor γ	0.90	ε-greedy increment	0.002
Memory size	3000	Frequency of replacement	100
Mini-batch	32	Hidden layers	2
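To make the role of the Table 3 values concrete, the sketch below collects them in a configuration object and derives the ε schedule and target-network replacement points. The field and function names are illustrative, and two points are assumptions not stated in the text: that ε is incremented once per learning update, and that "Frequency of replacement" refers to copying the online network into the target network every 100 updates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DQNConfig:
    """Values copied from Table 3; field names are illustrative."""
    training_episodes: int = 300
    learning_rate: float = 0.001
    discount_factor: float = 0.90
    memory_size: int = 3000
    mini_batch: int = 32
    epsilon_initial: float = 0.4
    epsilon_max: float = 0.95
    epsilon_increment: float = 0.002
    target_replace_frequency: int = 100
    hidden_layers: int = 2


def epsilon_at(step, cfg):
    """Epsilon after `step` learning updates (assumed: one increment per update)."""
    return min(cfg.epsilon_max, cfg.epsilon_initial + step * cfg.epsilon_increment)


def should_replace_target(step, cfg):
    """True when the target network would be synchronized (assumed meaning)."""
    return step > 0 and step % cfg.target_replace_frequency == 0


cfg = DQNConfig()
# First update count at which epsilon has reached its maximum value.
steps_to_max = next(s for s in range(10_000) if epsilon_at(s, cfg) >= cfg.epsilon_max)
```

With these values, ε climbs from 0.4 to its cap of 0.95 after about 275 increments, so under the stated assumption most of training runs at the maximum ε.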

Experimental results and analysis

Figure 7 shows the variation of the total reward and the total energy consumption of DQN-MSP during training. To make the relationship between the two curves easier to observe, both quantities are normalized. The total reward first increases rapidly and then converges gradually, while the total energy consumption decreases rapidly and then converges gradually, and the two curves are clearly synchronized. This suggests that the designed reward function drives the agent to learn a control strategy aligned with the optimization goal of reducing energy consumption.

Figure 7.

Normalized total reward and total energy.
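The text does not state which normalization is applied to the two curves. One common choice, shown here purely as an assumption, is min-max scaling to the interval [0, 1], which lets a rising reward curve and a falling energy curve share one axis. The function name and the sample values are hypothetical and are not the paper's data.

```python
def min_max_normalize(values):
    """Scale a sequence to [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical per-episode totals, for illustration only.
total_reward = [-120.0, -80.0, -45.0, -30.0, -28.0]
total_energy = [950.0, 900.0, 860.0, 845.0, 842.0]

reward_norm = min_max_normalize(total_reward)
energy_norm = min_max_normalize(total_energy)
```

After scaling, the reward series runs from 0 up to 1 and the energy series from 1 down to 0, which makes their synchronized convergence easy to compare visually.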

Figure 8 shows that, as the number of training episodes increases, the total reward of all four DRL-based cooling control methods increases and eventually converges. DQN-SSP converges fastest but reaches the smallest converged value. The reason is that DQN-SSP uses a coarse-grained control strategy: its action space, and hence its exploration space, is smaller than that of the MSP methods, so it converges faster but obtains less reward. Moreover, after 200 training episodes, the converged reward of the proposed DQN-MSP exceeds that of all benchmarks.

Figure 8.

Normalized reward curves for control methods.
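The gap in action-space size between the two modes can be illustrated by enumerating joint actions. This is a sketch under assumptions: the discretization (five candidate supply temperatures) and the number of CRACs (four) are hypothetical, and the paper's actual action encoding may differ.

```python
from itertools import product


def ssp_actions(setpoints):
    """SSP: one shared setpoint for all CRACs, so one action per setpoint level."""
    return [(t,) for t in setpoints]


def msp_actions(setpoints, n_cracs):
    """MSP: every combination of per-CRAC setpoints is a distinct joint action."""
    return list(product(setpoints, repeat=n_cracs))


# Hypothetical discretization: 5 candidate supply temperatures (deg C), 4 CRACs.
levels = [18, 20, 22, 24, 26]
n_ssp = len(ssp_actions(levels))
n_msp = len(msp_actions(levels, n_cracs=4))
```

With these illustrative numbers the SSP agent chooses among 5 actions while the MSP agent chooses among 5^4 = 625, which is consistent with faster convergence for SSP and with the exponential growth in action space noted as a limitation in the Conclusion.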

Figure 9 compares the total reward and energy consumption of the different control methods. The proposed DQN-MSP algorithm learns a control strategy with lower energy consumption than the other benchmarks: compared with DQN-SSP, PPO-MSP, and DDPG-MSP, it saves 5.7%, 2.4%, and 4.2% of energy consumption, respectively.

Figure 9.

Total reward and energy for control methods.
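The savings quoted above are relative to each benchmark's own energy consumption. A minimal sketch of that calculation follows; the function name and the energy totals are hypothetical and serve only to illustrate the arithmetic, not to reproduce the reported results.

```python
def energy_saving_percent(baseline_energy, proposed_energy):
    """Relative saving of the proposed method with respect to a baseline, in percent."""
    if baseline_energy <= 0:
        raise ValueError("baseline_energy must be positive")
    return 100.0 * (baseline_energy - proposed_energy) / baseline_energy


# Hypothetical energy totals in kWh, chosen only to illustrate the arithmetic.
saving = energy_saving_percent(baseline_energy=200.0, proposed_energy=190.0)
```

Because each percentage uses a different baseline, the three reported figures are not additive; each one answers how much less energy DQN-MSP uses than that specific benchmark.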

Figure 10 shows the trace of the total IT power over 24 h in the data center together with the average supply air temperature under the various cooling control methods. The DRL-based cooling control methods capture the variations in the power consumption trace and regulate the CRAC supply temperature accordingly. In general, the supply temperature of the methods in MSP mode (DQN-MSP, PPO-MSP, DDPG-MSP) is slightly higher than that in SSP mode. The reason is that SSP mode requires a lower global cooling setpoint to meet the thermal constraints of every zone, especially the high heat-generating zones. In contrast, MSP mode allows each CRAC to regulate its supply air temperature according to the thermal variations in its own coverage zone. As a result, the supply air temperature of the associated CRAC can be raised for areas and periods with low heat generation, thus reducing cooling energy consumption. In addition, Figure 11 shows the supply temperature distribution of the four cooling control methods. The supply temperature of the proposed DQN-MSP is higher than that of the other benchmarks, which also means lower cooling energy consumption.

Figure 10.

Total IT power consumption and average supply air temperature.

Figure 11.

Supply temperature distribution for four control methods.
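The reasoning above, that a single global setpoint is dictated by the hottest zone while per-CRAC setpoints can be raised elsewhere, can be sketched numerically. Everything in this example is an assumption made for illustration: the zone loads and temperature limits are hypothetical, each zone is assumed to be served by exactly one CRAC, and the CoP curve is the one reported by Moore et al.³⁴, which may or may not match the cooling power model used in this paper.

```python
def cop(supply_temp_c):
    """CoP curve reported by Moore et al. for a chilled-water CRAC (assumed here)."""
    return 0.0068 * supply_temp_c ** 2 + 0.0008 * supply_temp_c + 0.458


def cooling_power(it_power_kw, supply_temp_c):
    """Cooling power needed to remove the IT heat at a given supply temperature."""
    return it_power_kw / cop(supply_temp_c)


# Hypothetical zones: IT heat load (kW) and the highest supply temperature (deg C)
# that still keeps that zone's rack inlets below the red line.
zones = [
    {"it_kw": 40.0, "max_supply_c": 18.0},  # high heat-generating zone
    {"it_kw": 15.0, "max_supply_c": 24.0},
    {"it_kw": 10.0, "max_supply_c": 26.0},
]

# SSP: one global setpoint, limited by the most demanding zone.
ssp_setpoint = min(z["max_supply_c"] for z in zones)
ssp_power = sum(cooling_power(z["it_kw"], ssp_setpoint) for z in zones)

# MSP: each CRAC follows the limit of its own coverage zone.
msp_power = sum(cooling_power(z["it_kw"], z["max_supply_c"]) for z in zones)
msp_mean_setpoint = sum(z["max_supply_c"] for z in zones) / len(zones)
```

Because the assumed CoP increases with supply temperature, the MSP configuration has both a higher mean setpoint and a lower total cooling power than the SSP configuration in this toy setting.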

Comparing the average rack inlet temperature distribution under DQN-SSP (Figure 12(a)) and DQN-MSP (Figure 12(b)), the temperature distribution with the fine-grained control strategy (DQN-MSP) is more uniform and closer to the red-line temperature (32°C). In summary, the fine-grained cooling control strategy regulates the cooling parameters of multiple air conditioners differentially according to the temperature changes in each region, which reduces both the temperature gradient and the cooling energy consumption more effectively.

Figure 12.

Average rack inlet temperature: (a) DQN-SSP and (b) DQN-MSP.

Conclusion

To address the cooling control energy-saving problem in DCs, this work first constructs an XGBoost-based temperature prediction model to quickly and accurately evaluate the temperature distribution in rack rooms. Then, guided by this thermal model, a DRL-based cooling control method is proposed, which reduces cooling energy consumption by 2.4%–5.7% relative to the benchmark methods by increasing the average supply air temperature of the air conditioners without violating thermal constraints. The proposed DQN-based cooling control model captures the thermal load changes of each region in the rack room and adjusts the supply air temperature of each air conditioner in a fine-grained manner to improve cooling efficiency.

However, this work also has some limitations. For example, as the number of cooling units increases, the action space of the single-agent controller grows exponentially, which can cause training to fail to converge or yield poor optimization results. Therefore, future work will consider a multi-agent cooperative control framework³⁷ to achieve asynchronous control of multiple cooling units. This approach aims to ensure the global balance of cooling supply and demand while accounting for thermal fluctuations in each local area.

Footnotes

Appendix

DCs Data centers

DRL Deep reinforcement learning

DQN Deep Q-network

PPO Proximal policy optimization

DDPG Deep deterministic policy gradient

MADDPG Multi-agent deep deterministic policy gradient

QoS Quality of service

NLP Natural language processing

CFD Computational fluid dynamics

LHS Latin hypercube sampling

ML Machine learning

ANN Artificial neural networks

GPR Gaussian process regression

LR Linear regression

AHU Air handling unit

POD Proper orthogonal decomposition

RH Relative humidity

XGBoost Extreme gradient boosting

CRAC Computer room air conditioner

CoP Coefficient of performance

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work presented in this paper was supported by the National Natural Science Foundation of China (62273109, 61772145, 61672174); the Key Realm R&D Program of Guangdong Province (2021B0707010003); the Guangdong Basic and Applied Basic Research Foundation (2021A1515012252, 2020A1515010727, 2022A1515012022); the Key Field Special Project of Department of Education of Guangdong Province (2020ZDZX3053); the Guangdong Province Ordinary Universities Characteristic Innovation Project (2019KTSCX108); and the Maoming Science and Technology Project (210429094551175, mmkj2020008, mmkj2020033). Weipeng Guo and Delong Cui are the corresponding authors.

ORCID iD

Jianpeng Lin

References

1. Gill, Buyya. A taxonomy and future directions for sustainable cloud computing. ACM Comput Surv 2019; 51: 1–33.

2. Vafamehr, Khodayar. Energy-aware cloud computing. Electricity J 2018; 31(2): 40–49.

3. Masanet, Shehabi, Lei, et al. Recalibrating global data center energy-use estimates. Science 2020; 367(6481): 984–986.

4. Cheng, Liu, Lin, et al. A survey of energy-saving technologies in cloud data centers. J Supercomput 2021; 77: 13385–13420.

5. Zhang, Meng, Hong, et al. A survey on data center cooling systems: technology, power consumption modeling and control strategy optimization. J Syst Archit 2021; 119: 102253.

6. Yuan, Zhou, Pan, et al. Phase change cooling in data centers: a review. Energy Build 2021; 236: 110764.

7. Athavale, Yoda, Joshi. Thermal modeling of data centers for control and energy usage optimization. Adv Heat Transfer 2018; 50: 123–186.

8. Ilager, Ramamohanarao, Buyya. Thermal prediction for efficient energy management of clouds using machine learning. IEEE Trans Parallel Distrib Syst 2021; 32(5): 1044–1056.

9. Athavale, Yoda, Joshi. Comparison of data driven modeling approaches for temperature prediction in data centers. Int J Heat Mass Transfer 2019; 135: 1039–1052.

10. Lin. Thermal prediction for air-cooled data center using data driven-based model. Appl Thermal Eng 2022; 217: 119207.

11. Lloyd, Rebow. Data driven prediction model (DDPM) for server inlet temperature prediction in raised-floor data centers. In: 2018 17th IEEE intersociety conference on thermal and thermomechanical phenomena in electronic systems (ITherm), San Diego, CA, USA, 2018, pp. 716–725.

12. Athavale, Joshi, Yoda. Artificial neural network based prediction of temperature and flow profile in data centers. In: 2018 17th IEEE intersociety conference on thermal and thermomechanical phenomena in electronic systems (ITherm), San Diego, CA, USA, 2018, pp. 871–880.

13. Zhang, Guliani, Ogrenci-Memik, et al. Machine learning-based temperature prediction for runtime thermal management across system components. IEEE Trans Parallel Distrib Syst 2018; 29: 405–419.

14. Mnih, Kavukcuoglu, Silver, et al. Human-level control through deep reinforcement learning. Nature 2015; 518: 529–533.

15. Lillicrap, Hunt, Pritzel, et al. Continuous control with deep reinforcement learning. ICLR. arXiv preprint arXiv:1509.02971, 2016.

16. Schulman, Wolski, Dhariwal, et al. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.

17. Ryan, Aviv, et al. Multi-agent actor-critic for mixed cooperative-competitive environments. In: Proceedings of the 31st international conference on neural information processing systems (NIPS’17), Red Hook, NY, USA, 2017, pp. 6382–6393. Curran Associates Inc.

18. Lazic, Boutilier. Data center cooling using model-predictive control. In: Proceedings of the 32nd international conference on neural information processing systems, Montreal, QC, Canada, 2018, vol. 8, pp. 3818–3827. ACM.

19. Wen, Tao, et al. Transforming cooling optimization for green data center via deep reinforcement learning. IEEE Trans Cybern 2019; 50(5): 2002–2013.

20. Tang, Mukherjee, Gupta SKS, et al. Sensor-based fast thermal evaluation model for energy efficient high-performance datacenters. In: 2006 Fourth international conference on intelligent sensing and information processing, 2006, pp. 203–208.

21. Patankar. Airflow and cooling in a data center. J Heat Transfer 2010; 132(7): 271–291.

22. Sun, Stolf, Pierson J-M. Spatio-temporal thermal-aware scheduling for homogeneous high-performance computing datacenters. Futur Gener Comput Syst 2017; 71: 157–170.

23. Fulpagare, Bhargav. Advances in data center thermal management. Renewable Sustainable Energy Rev 2015; 43: 981–996.

24. Ghosh, Joshi. Rapid temperature predictions in data centers using multi-parameter proper orthogonal decomposition. Numer Heat Transf A Appl 2014; 66(1): 41–63.

25. Wang, Liu, et al. Deep reinforcement learning for tropical air free-cooled data center control. ACM Trans Sens Netw 2021; 17(3): 1–28.

26. Chen X-L, Cao, et al. Ensemble network architecture for deep reinforcement learning. Math Probl Eng 2018; 2018: 2129393.

27. Zhou, Wang, Wen, et al. Joint IT-facility optimization for green data centers via deep reinforcement learning. IEEE Network 2021; 35(6): 255–262.

28. Lin, Cui, Peng, et al. A two-stage framework for the multi-user multi-data center job scheduling and resource allocation. IEEE Access 2020; 8: 197863–197874.

29. 6Sigmaroom: https://www.futurefacilities.com/products/6sigmaroom/, 2023.

30. ASHRAE. Thermal guidelines for data processing environments. 4th ed, 2015. Atlanta, GA, USA: ASHRAE Technical Committee (TC) 9.9.

31. Stein. Large sample properties of simulations using Latin hypercube sampling. Technometrics 1987; 29(2): 143–151.

32. Chen, Guestrin. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, New York, NY, 2016, pp. 785–794.

33. Peng, Lin, Cui, et al. A multi-objective trade-off framework for cloud resource scheduling based on the deep Q-network algorithm. Cluster Comput 2020; 23(4): 2753–2767.

34. Moore, Chase, Ranganathan, et al. Making scheduling “Cool”: temperature-aware workload placement in data centers. In: Proceedings of the 2005 Usenix Annual Technical Conference, Anaheim, CA, USA, 2005, pp. 61–75. Usenix.

35. PlanetLab: https://planetlab.cs.princeton.edu/about.html, 2023.

36. CloudSimPy: https://github.com/FengcunLi/CloudSimPy, 2023.

37. Ladosz, Weng, Kim, et al. Exploration in deep reinforcement learning: a survey. Inf Fusion 2022; 85: 1–22.

A multi-setpoint cooling control approach for air-cooled data centers using the deep Q-network algorithm
