Abstract
To mitigate losses caused by emergency resource shortages, this paper investigates a multi-period resource allocation problem. By recognizing the interdependence among affected areas, demand points are modeled as a system. A systemic loss metric model is developed to quantify the cascading impacts arising from these interdependencies. Then, an emergency resource allocation model is constructed to minimize the loss and maximize the fairness. To solve the proposed model efficiently, a novel hybrid algorithm (ACO-DQN) that integrates ant colony optimization (ACO) with a deep Q-network (DQN) is designed. To enhance the convergence and stability of the algorithm, the pheromone mechanism of ACO is employed to dynamically guide the exploration process and to adjust the Q-value update strategy. Numerical experiments demonstrate that, compared to the DQN, the proposed ACO-DQN shows significant advantages in terms of solution quality, convergence speed, and robustness. Finally, a case study based on the Wenchuan earthquake shows that considering interdependence enables decision-makers to better balance efficiency and fairness when resources are constrained. The findings provide important decision support for improving overall resilience and post-disaster recovery.
Keywords
1. Introduction
Frequent global emergencies, including earthquakes, floods, and public health incidents, severely disrupt socioeconomic systems and endanger lives due to their inherent destructiveness and uncertainty. Against this backdrop, humanitarian emergency logistics emerges as the lifeline of relief operations and determines the effectiveness of disaster response. 1 Efficiently and fairly allocating limited resources among dispersed demand points is a central and persistent challenge in humanitarian emergency logistics. Therefore, developing models for optimizing emergency supply allocation is of significant theoretical and practical urgency.
The emergency resource allocation problem is characterized by inherent complexities such as dynamic demand, multi-stakeholder coordination, and resource constraints. Research methodologies in this field have continually evolved to address these challenges. Early research predominantly relied on operations research methods such as stochastic programming 2 and robust optimization 3 to model parameter uncertainties. As problem scales expanded, heuristic and meta-heuristic algorithms (e.g., genetic algorithms, simulated annealing) were applied to obtain feasible solutions for complex instances. 4 More recently, deep reinforcement learning (DRL) has emerged for dynamic and sequential decision-making, given its capability to learn adaptive policies through interaction with complex environments. In particular, Deep Q-Networks (DQN) and their variants have gained preliminary attention in resource allocation due to their structural stability. 5 Building on this, some studies have further integrated DQN with heuristic algorithms (e.g., NSGA-II) to tackle emergency resource distribution problems.6,7 Therefore, the integration of DQN with heuristic algorithms presents a promising direction for solving complex emergency resource allocation problems.
The impact of emergency supply shortages is severe, yet existing metrics for quantifying this impact are often oversimplified. A common approach models the penalty for shortages as a linear cost function, calculated simply as the product of the shortage quantity and a fixed unit penalty.8,9 To achieve greater realism, subsequent research incorporated the time dimension, developing deprivation cost models that account for both the amount and the duration of shortages.10,11 However, these existing loss metrics fail to capture cascading loss in interdependent regions. In interdependent regions, a resource shortage at a critical node can trigger cascading failures through supply chains and mobility networks, imposing significant secondary losses on interconnected points.12,13 Consequently, allocation models relying solely on localized loss achieve only local optimality, failing to enhance overall system resilience. Therefore, a comprehensive loss assessment should incorporate the interdependent perspective.
As a powerful sequential decision-making framework, deep reinforcement learning has shown promise in optimizing emergency resource allocation.5,14 However, its application to large-scale, complex disaster scenarios is often constrained by two interrelated challenges: low learning efficiency and non-adaptive exploration. Standard approaches, which rely on a single learning paradigm and fixed strategies like ε-greedy, struggle to fully utilize past experience and adapt their exploration schemes. This frequently leads to suboptimal convergence and fails to guarantee robust performance in real-world emergency operations.
To bridge these gaps, this paper investigates multi-period emergency resource allocation by explicitly modeling regional interdependencies. First, a loss metric model is developed to quantify the cascading impact of resource shortages arising from these interdependencies. Subsequently, the allocation problem is formulated as a sequential decision-making process. To solve this model efficiently and robustly, a novel hybrid ACO-DQN algorithm is proposed. The main contributions of this work are summarized as follows:
(1) A loss metric model incorporating regional interdependencies is proposed. To address the existing models' oversight of cascading effects, this work develops a loss function that explicitly accounts for interdependencies among affected areas. This shifts the assessment from evaluating isolated points to analyzing the affected areas as an integrated system. Consequently, the model quantifies the cascading losses, enabling resource allocation strategies that optimize the overall recovery of all disaster-affected areas.
(2) A hybrid ACO-DQN algorithm with enhanced learning efficiency and adaptive exploration is proposed. 1) A hybrid learning paradigm integrating value learning and experience accumulation is developed. To address low learning efficiency, we integrate the temporal-difference (TD) error from DQN with the pheromone mechanism from ACO. This integration allows for the assessment and prioritization of high-quality historical experiences. The outputs of these two learning strategies are then adaptively weighted, enabling more efficient and stable policy learning. 2) A pheromone-driven adaptive exploration strategy is proposed. To overcome the limitations of fixed exploration strategies, we design a mechanism that uses pheromone information to dynamically modulate exploration intensity. This strategy progresses through three distinct phases: promoting broad coverage in the initial stage, shifting to targeted exploration guided by accumulated pheromone, and finally converging on refined exploitation. This maintains a balance between exploration and exploitation throughout the learning process.
The remainder of this paper is structured as follows: Section 2 presents a literature review. Section 3 constructs the loss metric model and the emergency resource allocation model. Section 4 details the design of the ACO-DQN algorithm. Section 5 designs simulation experiments to comparatively analyze the performance of the proposed model and algorithm. Section 6 concludes the paper and outlines directions for future research.
2. Literature review
2.1. Modeling the loss of emergency resource shortages
Emergency resource shortages frequently arise from uncertainties inherent in disaster demand, transportation, and supply. Beamon proposed a fundamental distinction between humanitarian relief chains and commercial supply chains, stating that while commercial supply chains focus on profit maximization, humanitarian relief chains aim to minimize loss of life and alleviate human suffering. 15 Therefore, quantifying and incorporating the loss associated with emergency resource shortages is a critical issue in emergency resource allocation.
To measure losses resulting from emergency resource shortages, linear penalty cost functions have been commonly adopted in the literature. In such an approach, the incurred loss is quantified as the product of the shortage quantity and a predetermined unit penalty coefficient. For instance, Balcik et al. employed a linear penalty model in their formulation of last-mile distribution planning. 16 Similarly, Rawls and Turnquist incorporated a linear shortage penalty within the objective function of their emergency supply pre-positioning model. 9 Extending this line of work, Ahmadi et al. also applied a linear penalty structure to quantify shortage losses under conditions of network disruption. 17
Recognizing that human suffering intensifies with the duration of unmet needs, recent studies have incorporated the time dimension into the measurement of shortage-related losses. A pivotal development in this regard is the concept of deprivation costs, introduced by Holguín-Veras et al. 18 to quantify the physiological and psychological suffering caused by delayed access to relief supplies. This concept was subsequently established as a theoretical cornerstone of humanitarian logistics 18 and has been empirically calibrated using post-disaster data. 19 As a result, deprivation cost functions have been widely integrated into optimization models for emergency logistics, guiding resource allocation and distribution decisions in recent research.11,20,21
To further mitigate the losses arising from emergency resource shortages, fairness has been increasingly integrated into allocation models as a critical objective. Tzeng et al. measured fairness by maximizing the minimum satisfaction rate across affected areas. 22 Balcik and Beamon emphasized the balance between maximizing total demand fulfillment and maintaining distributional fairness. 16 Other studies have modeled the trade-off between efficiency and fairness, either by quantifying unmet demand through penalty costs 23 or by embedding the Gini coefficient within the fairness objective. 24
Despite these methodological advances, a fundamental limitation persists across linear penalty, deprivation cost, and fairness-aware models. These approaches typically treat losses at individual demand points as independent and additive, relying on a conventional assumption of localized impact that confines the consequences of a shortage to the directly affected area. However, empirical evidence consistently shows that disruptions are inherently propagative, rendering such simplifying assumptions increasingly untenable. This is particularly critical given the interconnected nature of modern supply chain and socio-economic networks, where risks are prone to cascading effects. 25 For instance, research on flood disasters 12 and urban rainstorms 13 demonstrates significant economic and logistical spillovers propagated through regional linkages. From a systemic risk perspective, studies have drawn analogies to epidemic models, revealing threshold behaviors in risk diffusion. 26 Despite this growing recognition of cascade effects, regional interdependencies and the resulting cascading losses remain notably absent from the loss functions used in emergency resource allocation models.
2.2. Deep reinforcement learning for resource allocation problems
Deep Reinforcement Learning (DRL), with its capacity for sequential decision-making and adaptation in complex, uncertain environments, has emerged as a promising paradigm for resource allocation problems.
2.2.1. Applications in emergency management
In emergency management, DRL has been applied to a range of resource allocation and logistics routing problems. Fan et al. designed a DQN-based method for emergency supply allocation, demonstrating superior solution quality and reduced computational time compared to conventional optimization methods. 5 For public health crises, Zeng et al. established a DRL-based dispatch model for the allocation of medical supplies. 27 Lei et al. developed a DRL support system for multi-hazard response, reporting significant gains in operational efficiency and resource utilization. 28 Beyond pure allocation, DRL has also been employed for integrated logistics problems. Peng et al. formulated the truck-drone collaborative routing problem in humanitarian logistics as a Markov game and solved it using a multi-agent deep reinforcement learning algorithm enhanced with prioritized experience replay and invalid action masking. 29 Gao et al. employed deep reinforcement learning to optimize an anticipatory routing, acceptance, and postponement policy for the multi-period dynamic vehicle routing problem with stochastic requests, demonstrating that DRL effectively enhances emergency supply distribution under dynamic and uncertain conditions. 30 Wang et al. proposed an adversarial deep reinforcement learning framework (RESA) that models multi-period emergency supply allocation under demand uncertainty as a two-player zero-sum Markov game, and showed that their RESA-PPO algorithm, which combines combinatorial action representation and reward clipping, significantly outperforms heuristic and standard RL methods. 31
2.2.2. Hybrid DRL-heuristic algorithms
To enhance the solution efficiency and stability of DRL in complex optimization contexts, a growing line of research focuses on integrating DRL with heuristic and metaheuristic algorithms. These hybrid frameworks aim to combine the adaptive learning capability of DRL with the robust search mechanisms of traditional heuristics.
Wu et al. proposed a weight-aware deep reinforcement learning (WADRL) approach to solve the multi-objective vehicle routing problem with time windows. This method uses the DRL model to address the entire multi-objective optimization problem, and then employs the non-dominated sorting genetic algorithm-II (NSGA-II) to further optimize the solutions generated by WADRL, thereby mitigating the limitations of each individual method. 14 For the emergency resource allocation problem, Wu et al. combined DRL with a genetic algorithm for maritime search and rescue resource allocation. Their algorithm was able to provide stable optimal solutions within 300 seconds, meeting the timeliness requirements of emergency response. 32
Beyond genetic algorithms, heuristic algorithms such as Particle Swarm Optimization (PSO), Simulated Annealing (SA), and Differential Evolution (DE) have been successfully integrated with DRL. Pradhan et al. proposed a deep reinforcement learning with particle swarm optimization (DRPO) algorithm, which utilizes PSO to avoid unnecessary searches in the deep deterministic policy gradient (DDPG) method. 33 Kosanoglu et al. designed a hybrid method combining a Double DQN (DDQN) agent with Simulated Annealing (SA). In each episode of their proposed algorithm, the best solution found by DRL is passed to SA as an initial solution, while the best solution from SA is passed back to DRL as an initial state. 34 Li et al. developed an adaptive multi-objective differential evolution algorithm based on DRL, where DRL serves as a controller integrated into the multi-objective differential evolution algorithm, enabling adaptive selection of mutation operators and parameters according to different search domains. 35
DRL has gained prominence in resource allocation due to its proficiency in sequential decision-making under uncertainty, demonstrating considerable promise in the domain of emergency logistics. To further improve its solution quality and stability, a growing body of research has developed hybrid frameworks that integrate DRL with heuristic or metaheuristic algorithms (e.g., PSO, SA). However, these integrations largely maintain a modular separation, in which DRL functions as a high-level orchestrator for parameter adaptation, while the embedded heuristic executes the core search process. This design overlooks the potential of embedding the learning mechanisms of heuristics (e.g., the pheromone-based feedback in ACO) directly into the evolution or value estimation processes of DRL. Consequently, a deeper algorithmic hybridization remains underexplored.
2.3. Gap analysis
Two key research gaps remain in the existing literature.
First, current loss models fail to capture the cascading losses resulting from regional interdependencies in emergency resource allocation. Most loss metrics, including the deprivation cost model, 18 are designed for independent demand points and do not account for the propagation of shortages through economic, logistical, or social linkages. Consequently, allocation strategies derived from such models may be locally optimal but systemically inefficient. While fairness considerations have been studied, 23 they are typically static and not integrated with a loss function that explicitly embeds network centrality. In contrast, our proposed loss metric directly incorporates interdependency factors into a sigmoid-based loss function. This formulation translates regional systemic importance into a nonlinear, threshold-sensitive loss that penalizes shortages in high-centrality regions more severely, thereby internalizing cascade effects.
Second, while hybrid DRL-heuristic frameworks have emerged, a deep integration remains absent. Existing approaches (e.g., NSGA-II-DQN, PSO-DRL) typically employ heuristics for action filtering, population initialization, or separate policy shaping, but the heuristic principles are not embedded into the core learning mechanics of the DRL agent. In our ACO-DQN algorithm, the pheromone mechanism directly modulates the Q-value update and the pheromone concentration is updated using episodic rewards, creating a closed-loop interaction between the long-term memory of ACO and the temporal-difference learning of DQN. This is fundamentally different from modular or loosely coupled hybrids, as the heuristic feedback becomes an integral part of the value function approximation process.
Therefore, this study makes two distinct contributions. First, a novel loss metric model is developed that considers the cascade effect in interdependent regions. This transforms the allocation problem from optimizing local performance to maximizing global system benefit. Second, the ACO-DQN algorithm is proposed as an efficient solver for the proposed model. This algorithm achieves a deeper integration by embedding the pheromone-based feedback mechanism of ACO directly into the experience learning and exploration process of the DQN. This integration aims to achieve superior convergence and policy robustness in the complex emergency resource allocation problem.
3. Model development
3.1. Problem description
Efficient and equitable multi-period allocation of emergency resources is crucial for effective disaster response. This paper investigates a multi-period emergency resource allocation problem within a system comprising a central distribution center (DC) and multiple interdependent demand points (DPs), as illustrated in Figure 1. The objective is to determine optimal allocation plans from the DC to each DP over a finite planning horizon.
The multi-period emergency resource allocation problem.
3.1.1. Model assumptions
The proposed model is based on the following assumptions:
(1) The total demand for emergency resources at each demand point over the entire planning horizon is known and deterministic. In reality, post-disaster demand is subject to considerable uncertainty. However, the deterministic assumption is widely adopted in the emergency logistics literature 18,23 for two reasons. First, it provides a tractable baseline model that focuses on the core trade-offs among cascading losses, fairness, and capacity constraints without the additional complexity of stochasticity. Second, in practice, relief agencies typically produce point estimates of total demand based on rapid needs assessments (e.g., affected population multiplied by per-capita consumption rates). The proposed model can be applied directly using such estimates. The deterministic assumption therefore does not undermine the model's practical relevance.
(2) The total quantity of resources dispatched from the distribution center in any single period cannot exceed its available capacity.
(3) Resources allocated at the beginning of period
(4) Demand points are interdependent. The shortage at one point may propagate and generate cascading effects on other points.
3.2. Loss model considering regional interdependency
Most existing models for emergency resource allocation quantify loss based solely on local resource shortages, largely neglecting the interdependencies among regions formed through economic, social, and logistical ties. Consequently, these models fail to capture the cascading effects of disaster losses across interconnected systems. To address this, this paper introduces the regional interdependency into the loss metric model to characterize the cascade effect of shortage.
3.2.1. Interdependency network
Let
3.2.2. Sigmoid-based loss function
To capture the nonlinear escalation of loss once a shortage exceeds a critical threshold, we adopt an S-shaped (sigmoid) function, following established disaster impact studies.36–38 The interdependency factor
Relationship between the shortage and the loss.
For a region with high
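To make the S-shaped behaviour concrete, the following is a minimal sketch of a logistic loss curve. The functional form, the parameter names `turning_point` and `steepness`, and the way the hypothetical interdependency factor `kappa` amplifies the loss are all illustrative assumptions, not the calibrated formulation of this paper.

```python
import math


def sigmoid_loss(shortage_ratio, kappa, turning_point=0.5, steepness=10.0):
    """Illustrative S-shaped loss for one demand point.

    shortage_ratio: unmet demand divided by total demand, in [0, 1].
    kappa: hypothetical interdependency factor in [0, 1]; larger values
           amplify the loss to mimic cascading effects (assumption).
    """
    # Logistic curve: loss stays low below the turning point and rises
    # sharply once the shortage ratio passes it.
    base = 1.0 / (1.0 + math.exp(-steepness * (shortage_ratio - turning_point)))
    return (1.0 + kappa) * base


mild = sigmoid_loss(0.2, kappa=0.6)      # shortage below the turning point
severe = sigmoid_loss(0.8, kappa=0.6)    # shortage above the turning point
isolated = sigmoid_loss(0.8, kappa=0.0)  # same shortage, no interdependency
```

Under these assumptions, the same shortage ratio produces a larger loss at a highly interdependent point than at an isolated one, which is the qualitative property the loss model is meant to capture.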
3.3. Model formulation
3.3.1. Parameter definitions
3.3.2. Decision variables
The objective of this study is to minimize the total loss of the demand system and maximize the fairness among regions. The loss is calculated by the proposed loss model. Fairness is measured by range-based disparity (the difference between the maximum and minimum cumulative loss across regions), which is a special case of min-max fairness. The min-max (Rawlsian) criterion 39 prioritizes the welfare of the worst-off group in a distribution. This choice directly penalizes the difference between the region with the highest cumulative loss and the one with the lowest cumulative loss, thereby avoiding extreme deprivation. The proposed model is formulated as follows.
When
Equation (2) represents the objective function, indicating that the model aims to minimize the loss while maximizing fairness. The first term represents the total accumulated loss across all demand points over the entire planning horizon. A smaller total loss indicates that resources are more effectively distributed to mitigate the adverse consequences of shortages. The second term quantifies the range of cumulative losses among the demand points. In particular, it computes the difference between the highest total loss experienced by any region and the lowest total loss among all regions. A smaller value implies that the allocation is fairer across regions. Equation (3) ensures that the total quantity of resources allocated from the distribution center to all demand points in each period does not exceed its capacity. Equations (4)–(6) describe the state transition constraints for the emergency resource shortage. Equation (7) imposes the non-negative constraint on the allocated quantity of emergency resources.
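The structure described above (a total-loss term plus a range-based fairness term, subject to a per-period capacity limit) can be sketched as follows. The function names, the `fairness_weight` parameter, and the nested-list data layout are hypothetical choices made for illustration; the per-period losses are taken as given rather than computed from the loss model.

```python
def objective(losses, fairness_weight):
    """Total loss plus weighted range of cumulative losses.

    losses[i][t]: loss of demand point i in period t (from any loss model).
    fairness_weight: hypothetical weight on the fairness term.
    """
    cumulative = [sum(per_period) for per_period in losses]
    total_loss = sum(cumulative)
    fairness_gap = max(cumulative) - min(cumulative)
    return total_loss + fairness_weight * fairness_gap


def capacity_ok(allocation, capacity):
    """Check the per-period capacity and non-negativity constraints.

    allocation[i][t]: quantity sent to demand point i in period t.
    """
    periods = len(allocation[0])
    non_negative = all(q >= 0 for row in allocation for q in row)
    within_capacity = all(
        sum(row[t] for row in allocation) <= capacity for t in range(periods)
    )
    return non_negative and within_capacity


example_losses = [[1.0, 2.0], [0.5, 0.5]]   # two demand points, two periods
value = objective(example_losses, fairness_weight=0.5)
feasible = capacity_ok([[3, 2], [1, 4]], capacity=5)
```

In this toy instance the cumulative losses are 3.0 and 1.0, so the objective is 4.0 + 0.5 × 2.0, and the allocation is infeasible because period 2 dispatches 6 units against a capacity of 5.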
4. Algorithm design
To solve the multi-period interdependent resource allocation problem formulated in Section 3, a hybrid ACO-DQN algorithm is proposed. This section is structured as follows. In Section 4.1, the problem is formally defined as a sequential decision-making process within a Markov decision process (MDP) framework, specifying the state, action, and reward functions. Section 4.2 discusses the limitations of applying a standard DQN directly to this problem and outlines two corresponding improvement strategies, which are motivated by ACO. The complete algorithmic steps and training procedure are provided in the Appendix.
4.1. Formulation as a Markov decision process
This section formulates the multi-period resource allocation model as a Markov Decision Process within a DQN framework. In this framework, the distribution center acts as an agent that interacts with an environment comprising the demand points. At each decision period
To guide the agent in generating effective actions, we define four distinct resource allocation strategies, each corresponding to a specific weight vector
Action1 (Urgency Priority):
This framework transforms the complex multi-period resource allocation problem into a decision-making task that can be trained through deep reinforcement learning. Although the MDP formulation could theoretically be solved by dynamic programming or mixed-integer programming for small
4.2. Design of the ACO-DQN algorithm
When a standard DQN is applied to the multi-period allocation problem, two main challenges arise: inefficient use of experience and undirected exploration. To address these issues, the pheromone mechanism of Ant Colony Optimization (ACO) is integrated into DQN.
4.2.1. Pheromone-guided experience utilization
In standard DQN, transitions are sampled uniformly from the replay buffer, and the varying learning value of different experiences is ignored. To bias learning towards historically successful actions, the Q-value update is augmented with a pheromone term.
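The idea of a pheromone-augmented update can be illustrated with a small tabular stand-in for the Q-network. This is only a sketch under stated assumptions: the class name `PheromoneQ`, the evaporation rate `rho`, the bonus weight `beta`, and the specific way the pheromone share scales the TD step are hypothetical, and a table is used here in place of a neural network to keep the example self-contained.

```python
from collections import defaultdict


class PheromoneQ:
    """Tabular Q-learning whose step size is modulated by pheromone trails."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, rho=0.1, beta=0.5):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.rho, self.beta = alpha, gamma, rho, beta
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.tau = defaultdict(lambda: [1.0] * n_actions)  # pheromone trails

    def update(self, s, a, r, s_next, done):
        """One TD update; returns the TD error."""
        target = r if done else r + self.gamma * max(self.q[s_next])
        td_error = target - self.q[s][a]
        # Actions holding a larger share of pheromone in state s take a
        # larger learning step (illustrative coupling, an assumption).
        share = self.tau[s][a] / sum(self.tau[s])
        self.q[s][a] += self.alpha * (1.0 + self.beta * share) * td_error
        return td_error

    def deposit(self, trajectory, episode_reward):
        """Evaporate all trails, then reinforce the visited (s, a) pairs."""
        for trail in self.tau.values():
            for a in range(self.n_actions):
                trail[a] *= 1.0 - self.rho
        for s, a in trajectory:
            self.tau[s][a] += max(episode_reward, 0.0)


agent = PheromoneQ(n_actions=2)
td = agent.update("s0", 0, 1.0, "s1", done=True)
agent.deposit([("s0", 0)], episode_reward=2.0)
```

The deposit step uses the episodic reward, so state-action pairs that appear in high-reward episodes accumulate pheromone and subsequently receive larger updates, which is one simple way to bias learning towards historically successful actions.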
4.2.2. Adaptive exploration
The standard ε-greedy exploration strategy linearly decays randomness in a predetermined manner, which lacks adaptability and may lead to inefficient exploration behavior in complex decision spaces. A three-phase adaptive exploration mechanism is introduced, in which the balance between pheromone guidance and Q-value guidance is dynamically adjusted.
1) Early stage: Exploration is strongly guided by accumulated pheromone trails, enabling rapid bootstrap from high-quality historical experience and reducing dependence on purely random initial exploration.
2) Middle stage: A balanced mix of pheromone and Q-value guidance maintains a robust trade-off between exploration and exploitation.
3) Late stage: Exploration becomes strongly Q-value-driven, allowing the policy to finely converge towards the optimum predicted by the mature value network.
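The three stages above can be sketched as a schedule that shifts weight from pheromone guidance to Q-value guidance as training progresses. The phase boundaries (one third and two thirds of training), the weights 0.8, 0.5 and 0.1, the residual `epsilon`, and the min-max normalization are illustrative assumptions rather than the settings used in the experiments.

```python
import random


def pheromone_weight(progress):
    """Weight on pheromone guidance for a training progress in [0, 1]."""
    if progress < 1 / 3:   # early stage: pheromone-led
        return 0.8
    if progress < 2 / 3:   # middle stage: balanced
        return 0.5
    return 0.1             # late stage: Q-value-led


def _normalize(values):
    """Min-max scale to [0, 1]; all zeros if the values are identical."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def select_action(q_values, pheromone, progress, epsilon=0.05, rng=random):
    """Pick the action maximizing a blend of pheromone and Q-value scores."""
    if rng.random() < epsilon:  # small residual random exploration
        return rng.randrange(len(q_values))
    w = pheromone_weight(progress)
    q_n, p_n = _normalize(q_values), _normalize(pheromone)
    scores = [w * p + (1 - w) * q for p, q in zip(p_n, q_n)]
    return max(range(len(scores)), key=scores.__getitem__)


# Action 0 has the higher Q-value, action 1 the stronger pheromone trail.
early = select_action([1.0, 0.0], [0.0, 1.0], progress=0.1, epsilon=0.0)
late = select_action([1.0, 0.0], [0.0, 1.0], progress=0.9, epsilon=0.0)
```

With these settings the agent follows the pheromone trail early in training and the Q-value estimate late in training, even though the inputs are identical in both calls.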
The flowchart of the main innovations of the algorithm is shown below (Figure 3).
Innovations of the algorithm.
The complete training procedure of the ACO-DQN algorithm is outlined in Appendix 2. The complexity analysis of the proposed algorithm is provided in Appendix 3.
5. Numerical experiments
This section conducts comprehensive numerical experiments to validate the effectiveness of the proposed ACO-DQN algorithm and the proposed emergency allocation model. The experiments consist of two parts: (1) performance assessment on randomly generated instances of varying scales, and (2) a case study based on actual emergency resource allocation data.
5.1. Algorithm performance evaluation
To evaluate the performance of the proposed ACO-DQN algorithm, we design four groups of randomly generated test instances with increasing complexity. The detailed parameter settings are as follows: Group 1:
This section evaluates the performance of the proposed ACO-DQN algorithm against DQN and DRL-GA on the four groups described above. The following subsections analyze solution quality, stability, convergence speed, and computational efficiency.
5.1.1. Solution quality and stability analysis
Average objective value of three algorithms (mean ± standard deviation).
From Table 1, ACO-DQN achieves slightly lower objective values than DQN in three groups, with improvements ranging from 0.01 to 0.04. Although the differences are marginal, ACO-DQN consistently matches or marginally outperforms DQN. Furthermore, the standard deviations of ACO-DQN are generally smaller than those of DQN, indicating slightly better stability. In contrast, DRL-GA yields significantly higher objective values and larger standard deviations, demonstrating its inferior performance and robustness.
To gain deeper insight into the distribution of the results, Figure 4 presents boxplots of the objective values for the three algorithms on each group, based on the 50 instances. The boxplots show the median (central mark), interquartile range (box edges), and outliers (points beyond the whiskers).
From the boxplots in Figure 4, ACO-DQN consistently exhibits the lowest median and interquartile range across all four groups, indicating both superior solution quality and higher stability compared to DQN and DRL-GA. DQN yields slightly higher medians and somewhat wider distributions, while DRL-GA shows markedly elevated medians, larger spreads, and several outliers, especially in Groups 3 and 4, confirming its inferiority and instability. Overall, the boxplot analysis reinforces the conclusion that ACO-DQN outperforms the other two algorithms in terms of both central tendency and robustness.
Distribution of objective values for three algorithms.
5.1.2. Convergence speed and computational efficiency analysis
Convergence speed and computational efficiency comparison.
As shown in Table 2, ACO-DQN consistently converges faster than DQN across all groups, with improvements ranging from 2.0% to 4.2%. The average reduction in convergence episodes is 3.3%. Moreover, ACO-DQN requires slightly less CPU time per instance. These results demonstrate that the pheromone-guided mechanism effectively accelerates the learning process. Therefore, ACO-DQN is more suitable for time-critical emergency response scenarios where rapid decision-making is essential.
Based on the two parts of analysis, ACO-DQN demonstrates clear superiority. It not only matches DQN in solution quality and stability while outperforming DRL-GA by a large margin, but also converges faster with slightly lower CPU time. Therefore, ACO-DQN is a more efficient, stable, and reliable algorithm for multi-period emergency resource allocation, particularly in time-critical disaster response.
5.2. Sensitivity analysis of key parameters
To examine the impact of key parameters on model performance, we conduct sensitivity analysis using Group 3 instances (
5.2.1. Effect of the sigmoid turning point
The parameter
Sensitivity of performance metrics to
5.2.2. Effect of the fairness weight
The parameter
Sensitivity of performance metrics to
5.2.3. Effect of distribution center capacity
The per-period capacity is scaled by factors of 0.6, 0.8, 1.0, 1.2, and 1.4 relative to the default value
Sensitivity of performance metrics to capacity factor.
5.2.4. Robustness to perturbations
Unlike the global parameters
Sensitivity to
The above analysis confirms that the model behaves as expected and that the default parameter values
5.3. Case study
The proposed model is applied to a real-world scenario, the 2008 Wenchuan earthquake. Four severely affected regions are selected for analysis: Dujiangyan, Wenchuan, Beichuan, and Qingchuan. First, the allocation of prefabricated housing, which is a critical emergency resource for post-disaster shelter, is examined. Then, the generalizability of the model is tested by applying it to a different resource type, i.e., disinfectants for epidemic prevention. Finally, the adaptability of the proposed model to various emergency resources and disaster contexts is discussed.
5.3.1. Prefabricated housing allocation
The demand for prefabricated housing in each region is estimated using the formula:
To measure regional interdependency, this study adopts road freight turnover and road passenger turnover as indicators. Freight turnover reflects the strength of supply chain networks for raw materials and goods, while passenger turnover captures socioeconomic linkages such as labor and business flows. Together, they form the basis for regional interdependency. According to the Sichuan Statistical Yearbook 2008, the road passenger turnovers for the four regions are: 122256, 60353, 6096, and 10979 (10,000 person·km). The corresponding road freight turnovers are: 5908, 120431, 3859, and 2792 (10,000 ton·km).
To eliminate scale differences and integrate both dimensions, the raw data are normalized, and a weighted sum approach is used to construct a composite interdependency index:
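As a minimal sketch of such a composite index, the turnover figures quoted above can be normalized and combined as follows. The sum-normalization, the equal weights (`w_passenger=0.5`), and the assumption that the figures are listed in the order Dujiangyan, Wenchuan, Beichuan, Qingchuan are all illustrative choices, so the resulting values need not match those used in the case study.

```python
# Road passenger turnover (10,000 person-km) and road freight turnover
# (10,000 ton-km); region order is assumed to follow the text.
passenger = {"Dujiangyan": 122256, "Wenchuan": 60353,
             "Beichuan": 6096, "Qingchuan": 10979}
freight = {"Dujiangyan": 5908, "Wenchuan": 120431,
           "Beichuan": 3859, "Qingchuan": 2792}


def composite_index(passenger, freight, w_passenger=0.5):
    """Weighted sum of sum-normalized passenger and freight turnover."""
    p_total = sum(passenger.values())
    f_total = sum(freight.values())
    return {
        region: w_passenger * passenger[region] / p_total
        + (1 - w_passenger) * freight[region] / f_total
        for region in passenger
    }


index = composite_index(passenger, freight)
ranking = sorted(index, key=index.get, reverse=True)
```

Under these assumptions Wenchuan ranks highest and Dujiangyan second, with Qingchuan and Beichuan far lower, which agrees with the qualitative description of the regions in the discussion of the allocation plan.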
The resulting total objective value is 16.7058, comprising a total loss of 15.2007 and a fairness term of 1.5051. The losses attributed to each region are: 1.7553, 5.5179, 3.9173, and 4.0102, respectively. The emergency resource allocation plan across the six periods is presented below.
Emergency resource allocation plan for prefabricated housing.
The model first prioritizes Dujiangyan, whose demand is fully satisfied within the first two periods. This priority stems from Dujiangyan’s high interdependency coupled with its moderate absolute demand. Addressing its shortage early effectively mitigates potential cascading losses, thereby accelerating the overall recovery of affected regions.
Subsequently, the model adopts a phased approach to allocate resources to the remaining regions. Although Wenchuan has the highest interdependency, its resource demand is an order of magnitude larger. To prevent overallocation to a single high-priority node at the expense of systemic fairness, the model gradually increases Wenchuan’s allocation share over successive periods. This results in a final satisfaction rate of 87.45% by Period 6, slightly lower than those of Beichuan (94.01%) and Qingchuan (91.98%).
This outcome illustrates the balance between efficiency and fairness of the proposed model. Beichuan and Qingchuan, despite their low interdependency, achieve high satisfaction rates because their smaller absolute demand can be met without severely compromising allocations to the critical, high-demand region Wenchuan. Thus, the model avoids the extremes of purely efficiency-driven or purely fairness-driven allocation.
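The figures reported in this subsection can be cross-checked with a few lines of arithmetic. The sketch below uses only the values quoted in the text and treats the objective as the plain sum of the loss and fairness terms, as the reported decomposition implies. The satisfaction spread and the loss shares are descriptive indicators added for illustration; they are not the fairness term used in the model.

```python
# Values reported in the text for the prefabricated housing case.
regions = ["Dujiangyan", "Wenchuan", "Beichuan", "Qingchuan"]
regional_loss = [1.7553, 5.5179, 3.9173, 4.0102]
fairness_term = 1.5051

# Total loss is the sum of regional losses; the objective adds the fairness term.
total_loss = sum(regional_loss)
objective = total_loss + fairness_term

# Final satisfaction rates (%) at Period 6; Dujiangyan is fully satisfied.
satisfaction = dict(zip(regions, [100.0, 87.45, 94.01, 91.98]))

# Illustrative indicators (not the model's fairness term):
# gap between the best- and worst-served regions, and each region's loss share.
spread = max(satisfaction.values()) - min(satisfaction.values())
loss_share = {r: loss / total_loss for r, loss in zip(regions, regional_loss)}

print(f"total loss = {total_loss:.4f}, objective = {objective:.4f}")
print(f"satisfaction spread = {spread:.2f} percentage points")
for r in regions:
    print(f"{r}: loss share = {loss_share[r]:.1%}")
```

The check reproduces the reported total loss (15.2007) and objective value (16.7058), and shows that Wenchuan accounts for the largest share of the remaining loss.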
5.2.2. Disinfectant allocation
To demonstrate the model’s applicability to a different category of emergency resources, we apply the same framework to disinfectants (e.g., 84 disinfectant solution, bleaching powder), which are essential for post-disaster epidemic prevention. The demand for disinfectants is estimated based on the standard practice of large-scale environmental disinfection after earthquakes. The required amount of disinfectant is approximately proportional to the affected population. A commonly used ratio in disaster logistics is 10 liters of disinfectant concentrate per affected person for the initial six-week response period. Thus, the demand for region
The sigmoid turning point
Emergency resource allocation plan for disinfectant.
Table 8 presents the allocation plan. The total objective value is 16.0756, comprising a total loss of 14.8235 and a fairness term of 1.2521, both slightly lower than those for prefabricated housing (15.2007 and 1.5051, respectively). This reduction is primarily driven by the smaller
The allocation pattern differs notably from the prefabricated housing case. Dujiangyan (interdependency 0.5245, demand 74,570 L) is fully satisfied by the end of Period. Wenchuan (interdependency 0.7469, demand 584,540 L) receives gradually increasing allocations and reaches full satisfaction by Period 5. Beichuan and Qingchuan, which have very low interdependency (0.0410 and 0.0565), are largely deferred: at Period 6 their satisfaction rates reach 61.8% and 92.6%, compared with 94.0% and 92.0% for prefabricated housing. This indicates that when urgency is high, the model prioritizes high-interdependency regions even more aggressively, at the expense of low-interdependency areas with large demands. Overall, the plan satisfies about 91.8% of total disinfectant demand, confirming that the model adapts effectively to resource criticality by recalibrating
5.2.3. Generalizability of the proposed model
The case studies above demonstrate that the proposed model is not limited to a specific resource type. Whether shelter materials (prefabricated housing) or epidemic prevention supplies (disinfectants) are allocated, the same framework, which includes the interdependency network
Therefore, the proposed model can be readily extended to other emergency resources, such as food, drinking water, medical supplies, and fuel, by recalculating demands based on population or other relevant indicators and by adjusting
5.3. Managerial implications
The proposed model and algorithm offer several actionable insights for emergency managers and policymakers involved in multi-period resource allocation under regional interdependencies.
(1) Priority setting based on interdependency. The interdependency factor
(2) Balancing efficiency and fairness. The fairness weight
(3) Adjusting urgency via the sigmoid turning point. The parameter
(4) Capacity planning. The sensitivity analysis reveals that increasing capacity beyond a certain point yields diminishing returns. Managers should prioritize ensuring a minimum adequate capacity before investing in further expansion. When capacity is severely constrained, the model automatically favors high-interdependency regions, which may be an acceptable trade-off in extreme scarcity.
(5) Algorithm selection for real-time deployment. The ACO-DQN algorithm converges 2.0% to 4.2% faster than standard DQN while maintaining the same solution quality, and its inference time is below 0.5 seconds per decision period. This makes it suitable for time-critical emergency response where rapid re-planning is required as new demand information arrives.
6. Conclusion
This paper investigates a multi-period emergency resource allocation problem in post-disaster settings with interdependent regions. First, a loss metric model is developed to capture the cascading effects of resource shortages across interdependent regions. By modeling demand points as a system, the proposed framework provides a more realistic quantification of losses caused by shortages. Unlike deprivation cost models that assume independent regions, our loss function explicitly incorporates interdependency factors to reflect cascading dynamics. Second, based on the loss model, a multi-period optimization model is formulated to minimize loss and maximize fairness. Third, to solve the proposed model efficiently, a novel hybrid algorithm (ACO-DQN) is designed, which integrates the pheromone-guided mechanism of ACO into DQN. This integration enables dynamic guidance of the exploration process and enhances the Q-value update strategy. In contrast to existing heuristic-DRL hybrids that use heuristics only for action filtering or population initialization, our ACO-DQN directly embeds the pheromone signal into the Q-value update, creating a closed-loop interaction between heuristic memory and value learning. Numerical experiments confirm that ACO-DQN outperforms the standard DQN in terms of both computational efficiency and solution stability. Furthermore, a case study based on the 2008 Wenchuan earthquake illustrates that the proposed model enables decision-makers to balance efficiency and fairness under resource constraints.
While the proposed algorithm shows significant advantages on small-scale problems, its advantages in large-scale and complex scenarios are less pronounced. Therefore, future research will focus on exploring more efficient optimization algorithms suitable for large-scale emergency resource allocation problems.
Supplemental material
Supplemental material for Multi-period emergency resource allocation problem with a hybrid ant colony optimization and deep Q-network algorithm by Jingke Zhou and Yingzhen Chen in Science Progress.
Footnotes
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant No. 72374067.
Author contributions
Zhou Jingke (First Author): Conceptualization, Methodology, Software, Validation, Investigation, Data curation, Writing-original draft, Writing-review & editing, Visualization. Chen Yingzhen (Corresponding Author): Conceptualization, Methodology, Supervision, Project administration, Funding acquisition, Writing-original draft, Writing-review & editing.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was funded by the National Natural Science Foundation of China (Grant No. 72374067).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The data used in this study are derived from publicly available sources (e.g., Sichuan Statistical Yearbook 2008) and simulated instances generated by the authors. The simulation data supporting the findings are available from the corresponding author upon reasonable request.
