Abstract
Meeting the growing global electricity demand in remote and off-grid regions requires cost-effective and reliable power solutions that overcome the intermittency of renewable energy sources. This paper presents a comprehensive techno-economic optimization framework for the design and operation of off-grid hybrid renewable energy systems (HRES). The systems integrate photovoltaic (PV), wind turbine, biomass generator, and diesel backup units with a dual-chemistry hybrid battery energy storage system (HBESS) that combines lithium-ion and nickel–iron batteries. A detailed mathematical modeling approach captures the nonlinear dynamics, stochastic renewable behavior, battery degradation, and temperature-adjusted component efficiencies. The system is formulated as a multi-objective mixed-integer nonlinear programming problem. It minimizes life cycle cost (LCC), levelized cost of energy (LCOE), and CO2 emissions while satisfying reliability constraints such as a loss of power supply probability (LPSP) below 0.01. To solve the optimization problem, five metaheuristic algorithms are implemented and benchmarked together with a Deep Q-Network (DQN)-based reinforcement learning energy management strategy. The five algorithms are Particle Swarm Optimization (PSO), Genetic Algorithms (GA), Grey Wolf Optimizer (GWO), Differential Evolution (DE), and the Salp Swarm Algorithm (SSA). The proposed DQN-based controller outperforms conventional rule-based and static dispatch methods by maintaining more stable battery state-of-charge (SOC) profiles, reducing degradation, and enabling intelligent real-time decision-making. Simulation results based on realistic meteorological and demand profiles show that the integrated DQN and HBESS strategy outperforms baseline systems. It reduces total LCC by over 20%, CO2 emissions by up to 30%, and battery degradation costs by over 10%.
SSA achieves the fastest convergence and the highest-quality Pareto-optimal solutions among all metaheuristics evaluated. Sensitivity analysis identifies diesel price and interest rate as the most influential parameters on LCOE, while load shifting through aggressive demand-side management further reduces battery usage, operating costs, and emissions. The proposed framework not only addresses key challenges in off-grid microgrid design but also provides a scalable and robust pathway for sustainable rural electrification using hybrid storage and intelligent control.
Keywords
Introduction
The growing global energy demand and climate goals are driving a rapid transition to renewable power. According to the International Energy Agency, world energy consumption grew at an above-average pace in 2024, with electricity demand rising nearly twice as fast as total demand (https://www.iea.org/reports/global-energy-review-2025). This surge is fueled by higher loads for cooling, industry, transport electrification, and data centers (IEA, 2025). Meeting this demand sustainably requires ramping up low-carbon generation. Many governments have pledged net-zero emissions by 2050, which implies dramatically higher shares of solar, wind, and other renewables in power grids (Dong et al., 2022; Pablo-Romero et al., 2023; Safi et al., 2021). As Pinthurat et al. (2024) explain, rising GHG emissions have motivated the adoption of renewables to achieve a more stable, sustainable energy landscape. Likewise, Güven and Samy (2022) note that decades of economic growth have inevitably accelerated the worldwide transition to renewable energy use to meet fast-growing demand. The energy transition and growing load requirements make hybrid systems of multiple renewables and storage a strategic necessity.
Remote and off-grid regions especially benefit from hybrid renewable energy systems (HRES). Over 1.6 billion people lack electricity access, 80% of them in rural Asia and Africa (Mohammed et al., 2014). Extending centralized grids to such areas is often impractical or too costly (Fara et al., 2006). Hybrid microgrids, which combine solar, wind, biomass, or hydro with energy storage, offer an affordable alternative. Integrating complementary resources can substantially mitigate the variability of any single source. On an isolated DC or AC bus, a photovoltaic (PV) array, wind turbines, and batteries can jointly supply power when the grid is unavailable (Valenciaga and Puleston, 2005). Afgan and Carvalho (2008) show that HRES can deliver substantial cost savings compared to single-source systems. Indeed, techno-economic analyses find that hybrid plants, such as solar–biomass and PV–wind–diesel systems, often outperform diesel gensets and standalone renewables in life-cycle cost, reliability, and emissions (Infield, 1994). Thus, HRES are increasingly viewed as a key solution for rural electrification and small-scale microgrids.
Integrating variable renewables with storage in off-grid microgrids poses major technical and economic challenges. The fundamental issue is intermittency: solar, wind, and biomass outputs fluctuate unpredictably, making demand–supply balance difficult (Olatomiwa et al., 2016; Ramesh and Saini, 2020). To compensate, diverse sources and energy storage must be added, but this greatly increases system complexity and cost. The intermittent nature of most renewable resources and increased capital expenses drive the design of HRES with storage and complementary sources to achieve a reliable, cost-effective supply (Güven and Samy, 2022). Olatomiwa et al. (2016) stress that incorporating multiple renewables, backup generators, and storage is essential to overcome intermittency, yet these additional design considerations raise the overall cost. In practice, off-grid microgrids often must include diesel generators or other backup to meet peak loads. However, reliance on diesel is costly: rural generators incur high fuel, maintenance, and logistics expenses (Elhadidy, 2002; Protogeropoulos et al., 1997). Standalone PV systems are usually far from economic compared to fossil backup, whereas PV–wind or PV–wind–diesel hybrids can supply off-grid loads more economically. In all cases, intelligent energy management and control are needed to dispatch sources and storage optimally, but combinatorial complexity and weather uncertainty make this very challenging (Mohammed et al., 2014).
Many studies formulate optimization problems for HRES component sizing and dispatch, using tools like HOMER or custom models (Ashetehe et al., 2024; Khan et al., 2025). A wealth of renewable–storage planning approaches exists, including deterministic and stochastic programming, classical techniques, and a wide range of metaheuristic algorithms. Khan et al. (2025) provide a comprehensive review showing that combining RES with storage improves reliability and reduces cost. They emphasize, however, that further study of optimization methods, metaheuristic algorithm strategies, system components, and design constraints is needed. Modern techniques overwhelmingly favor metaheuristic soft computing for HRES optimization. Traditional optimization struggles to cope with the many uncertainties and nonlinearities, so over the last decade soft computing techniques based on metaheuristic algorithms have been widely employed for hybrid system sizing. Indeed, recent literature abounds with GA, PSO, DE, GWO, FA, ABC, CS, ACO, and other metaheuristic optimizers applied to microgrid design (Modu et al., 2023; Ramli et al., 2018). Approximately 25% of microgrid optimization papers use PSO, about 10% use GA, and about 5% use GWO, reflecting the dominance of population-based algorithms (https://bohrium.dp.tech/paper/arxiv/c3dc08f71192a53a505c5f76b65e14857c12ab1b697662c7f70d383ed7056eae).
Hybridization of algorithms is also common. Modu et al. (2023) report hybrid algorithms that combine two or more techniques to exploit their complementary strengths. For example, an enhanced DE + PSO hybrid has been used to size off-grid HRES with PV, wind, diesel, and battery components. Other works fuse swarm and evolutionary methods, such as water-cycle + moth-flame, for similar mixed-resource problems (Singh et al., 2023). Such hybrids often outperform single methods in convergence speed and solution quality, especially for multi-objective or highly constrained HRES sizing problems.
On the energy management side, many recent studies propose rule-based, optimization-based, or AI-based EMS for HRES. Reviews note that advanced EMS and optimization improve reliability and efficiency but are often evaluated only on idealized simulations (Mekhilef et al., 2013). There remains a need for an EMS that coordinates multiple sources and storage to ensure uninterrupted supply while minimizing cost. Others have applied machine learning or reinforcement learning to microgrid control. For example, Pinthurat et al. (2024) explore RL for smart-home renewable energy management, but adoption of RL in off-grid microgrid scheduling remains limited. While metaheuristic-based EMS techniques can optimize dispatch and resiliency, they often suffer from local minima and may not guarantee global optimality (Akter et al., 2024). Nonetheless, the trend is clear. Integrating storage (batteries, hydrogen, supercapacitors, and so on) with renewables and using metaheuristic algorithms for sizing and dispatch has shown significant promise in many case studies.
Khazali et al. (2024) study a microgrid supplying EV chargers with a hybrid battery energy storage system (HBESS) composed of Li-ion, lead-acid, and second-life EV batteries (https://www.mdpi.com/1996-1073/17/15/3631). They find that a properly sized HBESS can reduce overall system cost by over 20% compared to a pure Li-ion bank. By moderating degradation, it can also extend the useful life of the lead-acid units (Safi et al., 2021; Shaaban et al., 2019; Zhou et al., 2022). Researchers emphasize that hybridizing battery chemistries can yield better cost–life trade-offs. For example, Li-ion batteries, while more expensive, suffer less capacity fade than lead-acid batteries (Dhundhara et al., 2018). Other studies have proposed combining batteries with hydrogen or supercapacitors for large microgrids (Khazali et al., 2024; Modu et al., 2023). It is common to optimize battery charge/discharge schedules, depth-of-discharge limits, and capacity factors to account for aging and replacement costs in the objective function.
Literature review
Recent advances in smart energy management have demonstrated the potential of integrating distributed energy resources (DERs) and flexible appliance scheduling using bio-inspired optimization strategies (Sahoo et al., 2023a). For instance, the Cheetah Optimization Algorithm was applied to coordinate appliance-level dispatch and DER scheduling in mixed-use microgrids, showcasing significant cost and energy efficiency improvements (Thirumalai et al., 2025). Expanding this, the Improved Lyrebird Optimization technique has proven effective for multi-microgrid sectionalization and distributed generation scheduling, enabling cost-efficient islanded or semi-autonomous operation in decentralized grids (Nagarajan et al., 2025). Furthermore, combining price-elastic demand response models with swarm intelligence algorithms such as Greedy Rat Swarm Optimization has led to more economically and environmentally balanced microgrid dispatch frameworks (Singh et al., 2025a). In the domain of AI-integrated hybrid optimization, quantum particle swarm optimization (QPSO) was used to co-optimize cost and emission trade-offs in grid-connected microgrids, demonstrating strong convergence and robustness under uncertainty (Paul et al., 2025). On the infrastructure planning side, Agajie et al. (2025) compared PV-battery versus PV-fuel cell systems in academic laboratories, highlighting how techno-economic analyses can inform sustainability strategies for institutional energy systems. Similar investigations in rural India revealed the viability of off-grid hybrid systems by applying sensitivity-driven cost modeling (Kumar et al., 2025), while others proposed robust power-sharing schemes for autonomous microgrids leveraging hybrid energy sources like PV, wind, and biomass (Anitha et al., 2025). Integrating V2G (Vehicle-to-Grid) technology into smart villages was assessed by Nadimuthu et al. (2024), who found significant potential for renewable-dominant microgrids in rural India. 
For real-time control, Selvaraj et al. (2024) employed the Crow Search Algorithm to schedule DERs, improving microgrid resilience and reducing operational cost. Load frequency stability in islanded urban microgrids was improved using 1PD-3DOF-PID controllers alongside mobile EV storage, further highlighting the importance of control in high-penetration renewable environments (Davoudkhani et al., 2024a). In Douala, Cameroon, a regional techno-economic microgrid assessment revealed hybrid architectures as the most efficient choice for urban sustainability (Molu et al., 2024). On the multi-objective optimization front, advanced sine-cosine chaotic algorithms have been tailored to optimize real-time scheduling in microgrids with fluctuating generation (Karthik et al., 2024). Newer nature-inspired strategies, such as mountaineering team-based optimization, have been introduced for frequency control in isolated renewable-powered microgrids (Davoudkhani et al., 2024b). More recently, crystal structure-based metaheuristics have been utilized to schedule energy and EV loads under uncertainty, achieving improved grid reliability and flexibility (Rajagopalan et al., 2024). Parallel efforts in machine learning-based forecasting of microgrid demand and generation are also maturing, enabling more accurate load shaping and power dispatch (Singh et al., 2024b). Complementary studies have reviewed classical control schemes for microgrids (Kumar et al., 2023). Others have explored power quality enhancement through novel water-wave inspired optimization (Choudhury et al., 2023) and scaled-conjugate neural networks (Sahoo et al., 2023a). The control and stability of AC/DC hybrid microgrids are also being improved using robust fuzzy-based or multi-layer AI controllers (Abraham et al., 2023; Khosravi et al., 2023). Intelligent optimization of AC–DC conversion and droop control has emerged as a powerful technique for minimizing system cost while managing hybrid energy flows (Prasad et al., 2022).
The role of EVs in demand-side flexibility has been reviewed extensively, with a focus on control, modeling, and market integration challenges (Mohanty et al., 2022). Other works model grid-connected microgrids’ sensitivity to hybrid configurations and explore load-voltage coordination for improved energy balancing (Dashtdar et al., 2022b; Sharma et al., 2022). Optimized microgrid operation using combined heat and power systems and hybrid storage continues to gain traction, as shown in work by Abdalla et al. (2021). Design of smart residential EMS frameworks with reinforcement learning or fuzzy logic-based control further illustrates the growing role of intelligent optimization (Dashtdar et al., 2022a). Smart residential demand-side management (DSM) models now incorporate EV charging coordination and advanced optimization algorithms to balance user comfort, energy cost, and grid stability (Panda et al., 2025). The advent of blockchain in microgrid interoperability and demand response coordination is also shaping next-generation control frameworks (Singh et al., 2025b). Integration of AI and blockchain for EV charging networks and smart grids is emerging as a secure, scalable solution for managing distributed loads and transactions (Singh et al., 2024b). Additionally, Cheetah-inspired optimization has been used to solve dynamic economic dispatch problems in integrated renewable systems (Nagarajan et al., 2024). Comprehensive surveys have consolidated insights on DSM strategies and market design frameworks to facilitate renewable penetration and load flexibility (Panda et al., 2023). Reviews on residential DSM models affirm the increasing interest in consumer-centric, AI-supported load-shaping strategies (Panda et al., 2022).
On the control side, advanced sliding mode observers have enabled real-time maximum power point tracking in PV–battery systems (Dunna et al., 2024), and dipper-throated optimization has been used to fine-tune IoT-driven smart grid predictors (Alkanhel et al., 2024). Finally, recent reviews on blockchain-based energy systems (Ullah et al., 2024), arithmetic Harris Hawks optimization for DSM (Manzoor et al., 2023), and integrated techno-economic benchmarking of hybrid systems (Güven et al., 2025d) further validate the relevance of bio-inspired and AI-integrated optimization techniques in managing uncertainty and complexity in modern microgrids.
Green energy integration and policy context
The global shift toward renewable energy is being accelerated by climate commitments, carbon reduction targets, and the rising cost of fossil-fuel dependency. Policy frameworks worldwide emphasize the integration of green energy to achieve net-zero emissions and reduce greenhouse gas footprints. Recent reviews highlight that renewable–storage hybrids are critical for reliable, resilient, and environmentally sustainable power systems (Barakat et al., 2026). Furthermore, energy transition pathways increasingly stress the dual need for economic viability and policy-driven incentives, particularly in decentralized energy models (Güven and Rizk-Allah, 2025).
Advanced studies examine socio-economic and policy dimensions, including local adoption barriers and system-wide optimization for carbon-neutral grids (El-Khozondar et al., 2025; Güven and Rizk-Allah, 2025). Beyond macro-level frameworks, hybridized renewable energy systems (HRES) have emerged as key enablers of electrification in remote and underserved regions (El-Khozondar and El-batta, 2022). Sustainable integration requires both technical optimization and policy mechanisms to ensure adoption at scale, especially in developing countries where grid extension remains costly and impractical.
In addition to conventional hybrid battery configurations, recent research has emphasized the role of hydrogen storage and supercapacitors as complementary technologies within hybrid energy systems. Hydrogen-based storage provides long-term energy balancing capabilities, making it particularly suitable for addressing seasonal variability and ensuring resilience against extended renewable intermittency. In contrast, supercapacitors offer very high power density and rapid charge–discharge cycling, enabling them to smooth short-term fluctuations and enhance transient stability. Comparative studies highlight that hybridizing batteries with hydrogen storage can extend system autonomy and reduce reliance on oversized battery banks. Coupling batteries with supercapacitors, in turn, improves efficiency in handling peak loads and short bursts of demand. Such hybrid architectures have been shown to improve both reliability and lifecycle performance of renewable-integrated systems (Imbayah et al., 2024; Khaleel et al., 2024). The present study focuses on hybrid battery systems because of their commercial maturity and proven cost-effectiveness in off-grid applications. Hydrogen and supercapacitor integration nonetheless remains a promising avenue for future work, particularly in large-scale or high-resilience energy systems.
Energy system optimization approaches
Energy system optimization has evolved from deterministic to probabilistic and, more recently, to AI-driven approaches (Sahoo et al., 2023b; Singh et al., 2024a). Deterministic models rely on average load and resource conditions, often underestimating stochastic variability (Güven and Poyraz, 2021). Stochastic programming addresses some of these gaps by incorporating uncertainty into load and generation profiles (Güven and Mete, 2021). However, these techniques become computationally expensive when applied to large-scale hybrid systems.
In contrast, metaheuristic algorithms have gained prominence due to their robustness in handling multi-objective, nonlinear, and mixed-integer problems. Algorithms such as Particle Swarm Optimization (PSO), Genetic Algorithms (GA), Grey Wolf Optimizer (GWO), and Differential Evolution (DE) have been widely applied in HRES planning and dispatch (Güven and Samy, 2022; Güven et al., 2022). Comparative studies confirm that metaheuristics outperform deterministic methods in convergence speed, scalability, and adaptability under uncertain renewable outputs (Guven et al., 2024).
Hybrid metaheuristic approaches, which combine multiple algorithms, show further improvements in avoiding local minima and capturing global optima (Güven et al., 2023). For instance, PSO–GA and DE–PSO hybrids demonstrate superior performance in cost minimization and reliability enhancement for microgrids (Güven and Yücel, 2023). Emerging work emphasizes the need for adaptive parameter tuning to balance exploration and exploitation, thereby enhancing optimization performance (Güven, 2024a).
Hybrid battery energy storage systems (HBESS)
Storage technologies form the backbone of hybrid energy systems, ensuring supply–demand balance, reliability, and resilience. Traditional reliance on single-chemistry batteries, such as lead–acid or lithium-ion, often results in trade-offs between cost, efficiency, and degradation. HBESS integrate complementary chemistries to optimize lifecycle costs and performance. For instance, Ni–Fe batteries offer robustness and long cycle life, while Li-ion provides high energy density; their combination enables both short- and long-term stability (Güven, 2025).
Recent works highlight that HBESS can reduce total system cost by over 20% compared to single-chemistry setups, while extending overall system lifetime (Güven et al., 2023a; Nassar et al., 2025). Similarly, second-life EV batteries combined with conventional storage can reduce waste and improve sustainability outcomes (Nassar et al., 2024). Comparative frameworks demonstrate that integrating hydrogen storage or supercapacitors alongside batteries further enhances flexibility for large-scale microgrids (Güven and Mengi, 2023; Güven and Yücel, 2025).
In addition to technical benefits, HBESS also align with circular economy and sustainability goals by reusing existing storage resources (El-Khozondar et al., 2023, 2024). Nonetheless, gaps remain in modeling degradation mechanisms, cycle-aging impacts, and long-term operational strategies.
Artificial intelligence and reinforcement learning
Artificial intelligence (AI), particularly reinforcement learning (RL), is transforming energy system management. RL provides adaptive, real-time decision-making capabilities that outperform rule-based and deterministic approaches. Studies have demonstrated its applicability in microgrid dispatch, demand-side management (DSM), and energy storage scheduling (Güven et al., 2025b).
Recent works applying deep RL highlight improvements in system resilience, cost-effectiveness, and renewable utilization (Güven et al., 2024b, 2025c). For example, hybrid learning-based RL controllers effectively balance conflicting objectives in microgrid scheduling, achieving superior flexibility compared to metaheuristic-only approaches (Güven et al., 2025a). Furthermore, RL has been integrated with metaheuristic optimization to dynamically adjust control strategies, showing strong results in stochastic environments (Imbayah et al., 2025). However, despite promising results, real-world deployments remain scarce due to computational complexity, training requirements, and integration challenges.
Sustainability and emerging directions
Sustainability has become central to evaluating energy systems. Beyond minimizing cost, recent studies assess hybrid systems based on life-cycle cost analysis, CO2 reduction potential, and resilience indices (Al-Najjar et al., 2020). Research also highlights the role of advanced materials and eco-friendly design strategies in improving the long-term performance of energy infrastructures.
Emerging work emphasizes multi-objective optimization frameworks that jointly consider economics, technical performance, and sustainability. For instance, multi-criteria approaches applied to hybrid microgrids address both resilience and emissions while ensuring affordability (Güven, 2024b; Güven et al., 2024a). By combining advanced optimization and policy alignment, these studies set the foundation for next-generation resilient and sustainable energy systems.
Research gaps
Despite these advancements, several limitations persist. Many works rely on simplified tools, omit degradation effects, or underutilize real-time energy management. While HBESS and AI methods demonstrate strong potential, practical deployment and resilience-focused optimization remain underexplored. This motivates the present study, which integrates advanced metaheuristics, hybrid battery modeling, and adaptive control strategies into a unified techno-economic optimization framework. The specific gaps include:
Limited exploration of hybrid storage architectures: few studies optimize multiple battery chemistries, or hybrids of batteries and other storage, in off-grid HRES; most assume a single battery type or a generic HBESS.
Deterministic models: many approaches use fixed demand and average conditions, often via HOMER or equivalent tools, without fully modeling stochastic renewable output and load variability.
Single-objective focus: most sizing problems minimize only cost and levelized cost of energy (LCOE), without jointly considering reliability, emissions, and other metrics; multi-objective and risk-aware planning (e.g., worst-case scenarios and resilience indices) is relatively scarce.
Reliance on simplified simulations: several studies use commercial tools and offline optimizers that may not capture dynamic control aspects, and few works integrate real-time EMS strategies such as model predictive control and RL with optimization, especially under uncertainty.
Incomplete accounting of degradation and costs: battery aging and replacement costs are often omitted, and auxiliary costs such as inverter losses, maintenance, and power quality issues are usually simplified.
To overcome these limitations, this study develops a comprehensive techno-economic optimization framework for designing an off-grid HRES integrated with an HBESS. The proposed system architecture includes PV, wind turbine, biomass generator, and diesel generator units. Two complementary battery types, lithium-ion (Li-ion) and nickel–iron (Ni–Fe), are configured to enhance system performance, extend battery lifetime, and reduce lifecycle costs. To accurately capture the dynamic behavior and aging effects of storage components, detailed mathematical models are formulated for each energy generation and storage unit. These models include temperature-adjusted PV output, Weibull-distributed wind energy estimation, calorific biomass conversion, and state-of-charge (SOC)-dependent battery degradation mechanisms. The models are further integrated into an hourly dispatch framework that balances demand and generation under variable renewable output.
To solve the resulting nonlinear, multi-dimensional, mixed-integer optimization problem, we apply and benchmark a suite of metaheuristic algorithms, including PSO, GA, GWO, and hybrid variants. These algorithms are enhanced with adaptive parameter tuning to improve convergence and avoid local minima. The objective of optimization is to minimize the total life cycle cost (LCC) while satisfying load demand, storage constraints, and supply reliability criteria such as loss of power supply probability (LPSP). The entire system is simulated using realistic meteorological and demand datasets over a one-year horizon. The effectiveness of various HBESS configurations is evaluated in terms of net present cost (NPC), levelized cost of energy (LCOE), reliability indices, and CO2 emission metrics. Sensitivity analyses are also conducted to examine the impact of cost fluctuations and resource variability. The key contributions are:
Degradation-aware multi-chemistry storage: unlike conventional dual-battery studies that primarily address charge–discharge scheduling or short-term cost reduction, this work integrates multi-chemistry hybrid batteries with explicit degradation modeling and lifecycle cost assessment, ensuring long-term operational sustainability.
Detailed component models: cost, efficiency, and degradation models are incorporated for each storage component, including the Li-ion and Ni–Fe batteries, and for all generation units.
Dual-layer optimization: in contrast to existing AI-based EMS approaches that rely on reinforcement learning or heuristic optimization in isolation, this study introduces a dual-layer strategy that combines RL with adaptive metaheuristics, improving robustness and adaptability under uncertain renewable generation.
Integrated framework: by integrating degradation-aware storage modeling with advanced optimization and control, the proposed framework enhances resilience, reliability, and sustainability in off-grid HRES beyond what is achieved by traditional dual-battery or single-method EMS designs.
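The adaptive parameter tuning of the metaheuristic solvers can be illustrated with a minimal PSO sketch. It assumes a linearly decreasing inertia weight, which is one common adaptive scheme and not necessarily the authors' exact rule. It also assumes box bounds on the decision variables (for example, component sizes) and textbook default coefficients. The function name `pso` and all defaults are hypothetical.

```python
import random

def pso(objective, lb, ub, n_particles=20, iters=200, c1=2.0, c2=2.0,
        w_max=0.9, w_min=0.4, seed=0):
    """Minimise `objective` over the box [lb, ub] (hypothetical sketch).

    The inertia weight decreases linearly from w_max to w_min, shifting the
    swarm from exploration to exploitation as iterations progress.
    """
    rng = random.Random(seed)
    dim = len(lb)
    vmax = [0.2 * (u - l) for l, u in zip(lb, ub)]  # velocity clamp per dimension
    x = [[rng.uniform(l, u) for l, u in zip(lb, ub)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [xi[:] for xi in x]
    pbest_f = [objective(xi) for xi in x]
    g = min(range(n_particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    for t in range(iters):
        w = w_max - (w_max - w_min) * t / max(iters - 1, 1)  # adaptive inertia
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])   # cognitive pull
                           + c2 * r2 * (gbest[d] - x[i][d]))     # social pull
                v[i][d] = max(-vmax[d], min(vmax[d], v[i][d]))
                x[i][d] = max(lb[d], min(ub[d], x[i][d] + v[i][d]))  # stay in bounds
            f = objective(x[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = x[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = x[i][:], f
    return gbest, gbest_f
```

In a sizing study, `objective` would typically be the LCC plus a penalty term such as `1e6 * max(0, lpsp - 0.01)` to enforce the reliability constraint. This penalty form is also an assumption.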
The remainder of this paper is organized as follows. The Methodology section describes the mathematical modeling of the hybrid energy system, the formulation of the optimization problem, and the implementation of advanced metaheuristic algorithms. The Case Study section details the site-specific parameters, input data, and scenario configurations used to evaluate system performance. Finally, the Conclusion section summarizes the key findings, highlights the practical implications of the results, and discusses potential directions for future research.
Methodology
The mathematical model presented herein rigorously describes the dynamic behavior and techno-economic performance of an integrated hybrid renewable energy system (IHRES) configured with hybrid battery energy storage. Each subsystem, including energy generation, storage, load balancing, cost estimation, and intelligent dispatch, is represented by a combination of well-established and newly formulated equations. These models capture real-world resource variability, component degradation, and control strategy complexity. They form the foundation for the subsequent optimization and reinforcement learning-based control schemes, as shown in Figure 1.

Research framework.
Renewable energy system modeling
PV power generation
The power output of a PV panel depends on its efficiency, area, and the solar irradiance received. To capture the impact of temperature on PV efficiency, this model includes a correction factor based on cell temperature.
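A minimal sketch of a temperature-corrected PV model follows. It assumes the widely used linear derating form η = η_ref[1 − β(T_c − T_ref)], with cell temperature estimated from the nominal operating cell temperature (NOCT). The paper's exact correction is not reproduced here, and all default parameter values are hypothetical.

```python
def pv_power_kw(g_wm2, t_amb_c, area_m2, eta_ref=0.18, beta=0.0045,
                t_ref_c=25.0, noct_c=45.0):
    """Temperature-corrected PV output in kW (hypothetical default parameters).

    g_wm2   : plane-of-array irradiance (W/m^2)
    t_amb_c : ambient temperature (deg C)
    """
    # Cell temperature via the NOCT approximation (NOCT defined at 800 W/m^2, 20 deg C)
    t_cell = t_amb_c + (noct_c - 20.0) / 800.0 * g_wm2
    # Linear efficiency derating relative to the reference (STC) temperature
    eta = eta_ref * (1.0 - beta * (t_cell - t_ref_c))
    return max(eta, 0.0) * area_m2 * g_wm2 / 1000.0  # W -> kW
```

With 1000 W/m², 25 °C ambient, and a 1.6 m² panel, the cell runs at 56.25 °C and the output drops from 0.288 kW (the uncorrected value) to 0.2475 kW. This shows why the correction matters in hot climates.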
Wind power generation
Wind energy modeling begins with characterizing the wind speed distribution using a Weibull probability density function. The shape and scale parameters define the wind profile over time. This statistical model is the basis for estimating expected turbine output under stochastic wind conditions.
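The sketch below combines the Weibull density with a turbine power curve to estimate expected output, E[P] = ∫ P(v) f(v) dv. It uses the standard two-parameter Weibull pdf. The cubic rise between cut-in and rated speed is one common power-curve form and is an assumption, as are the cut-in, rated, and cut-out speeds.

```python
import math

def weibull_pdf(v, k, c):
    """Weibull density of wind speed v (m/s) with shape k and scale c (m/s)."""
    if v < 0:
        return 0.0
    return (k / c) * (v / c) ** (k - 1) * math.exp(-((v / c) ** k))

def turbine_power_kw(v, p_rated_kw, v_ci=3.0, v_r=12.0, v_co=25.0):
    """Piecewise power curve; cubic rise between cut-in and rated speed (assumed)."""
    if v < v_ci or v >= v_co:
        return 0.0                      # below cut-in or shut down above cut-out
    if v >= v_r:
        return p_rated_kw               # rated region
    return p_rated_kw * (v**3 - v_ci**3) / (v_r**3 - v_ci**3)

def expected_power_kw(k, c, p_rated_kw, v_max=40.0, n=4000):
    """Expected output: integral of P(v) * f(v), using the trapezoidal rule."""
    dv = v_max / n
    total = 0.0
    for i in range(n + 1):
        v = i * dv
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * turbine_power_kw(v, p_rated_kw) * weibull_pdf(v, k, c)
    return total * dv
```

Dividing `expected_power_kw(k, c, p_rated)` by `p_rated` gives an estimate of the turbine's capacity factor at a site with Weibull parameters (k, c).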
Biomass generator power
Biomass power output is modeled as a product of the generator's thermal efficiency, biomass calorific value, and feedstock flow rate. The conversion of bioenergy into usable electrical energy is captured based on fuel quality and availability.
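A minimal sketch of the product form described above, P = η · CV · ṁ, with the unit conversion made explicit (1 kWh = 3.6 MJ). Capping the output at a nameplate rating is an added assumption, and the example values are hypothetical.

```python
def biomass_power_kw(feed_kg_per_h, cv_mj_per_kg, eta_gen, p_rated_kw=None):
    """Electrical output of a biomass generator (kW).

    feed_kg_per_h : feedstock flow rate (kg/h)
    cv_mj_per_kg  : calorific value of the feedstock (MJ/kg)
    eta_gen       : overall fuel-to-electricity efficiency (fraction)
    """
    # Thermal input in MJ/h; dividing by 3.6 converts MJ/h to kW
    p = eta_gen * cv_mj_per_kg * feed_kg_per_h / 3.6
    if p_rated_kw is not None:
        p = min(p, p_rated_kw)  # assumed: output limited by nameplate rating
    return p
```

For example, 50 kg/h of feedstock at 18 MJ/kg and 25% efficiency yields 62.5 kW.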
Diesel generator power
The diesel generator's fuel consumption is approximated by a second-order polynomial function of output power. This reflects real-world performance, where efficiency varies non-linearly with the load factor.
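The second-order fuel curve can be written as F(P) = aP² + bP + c. The sketch below evaluates it and the resulting specific fuel consumption F/P. This makes the part-load penalty visible, since the constant term c dominates at low load. The coefficients used in the example are hypothetical and not taken from the paper.

```python
def diesel_fuel_lph(p_kw, a, b, c, running=True):
    """Fuel use (L/h) as a second-order polynomial of output power p_kw."""
    if not running:
        return 0.0              # an idle, switched-off unit burns no fuel
    return a * p_kw**2 + b * p_kw + c

def specific_fuel_l_per_kwh(p_kw, a, b, c):
    """Fuel per unit of energy delivered (L/kWh); requires p_kw > 0."""
    return diesel_fuel_lph(p_kw, a, b, c) / p_kw
```

With a = 5e-5, b = 0.25, and c = 1.0, the unit burns 13.625 L/h at 50 kW (0.2725 L/kWh) but 0.3505 L/kWh at 10 kW. This is why dispatch strategies try to avoid running diesel units at light load.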
Hybrid battery modeling
The hybrid battery bank, comprising lithium-ion (Li-ion) and nickel–iron (Ni–Fe) units, is modeled to capture dynamic charging, discharging, and degradation processes. Mathematical expressions detail the evolution of SOC, voltage characteristics, depth of discharge (DoD), and degradation-induced cost. This level of modeling ensures that battery behavior under different dispatch scenarios is realistically represented. The SOC of each battery evolves according to the charging and discharging power flows. The SOC update equations enforce energy conservation within the storage system and account for charge and discharge efficiencies.
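The SOC bookkeeping for one battery string can be sketched as follows. The update is SOC(t+1) = SOC(t) + η_ch·P_ch·Δt/E for charging and SOC(t+1) = SOC(t) − P_dis·Δt/(η_dis·E) for discharging, clipped to [SOC_min, SOC_max]. The wear-cost function is a swapped-in, throughput-based simplification (replacement cost times the fraction of lifetime throughput used). It is not the paper's SOC-dependent degradation model. Class names and parameter values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BatteryUnit:
    """One battery string (e.g. Li-ion or Ni-Fe); parameters are hypothetical."""
    capacity_kwh: float
    soc: float                  # state of charge as a fraction of capacity
    eta_ch: float               # charging efficiency
    eta_dis: float              # discharging efficiency
    soc_min: float
    soc_max: float
    discharged_kwh: float = 0.0 # cumulative terminal energy delivered

    def charge(self, p_kw: float, dt_h: float = 1.0) -> float:
        """Charge at up to p_kw; returns the power actually accepted."""
        headroom_kwh = max(0.0, (self.soc_max - self.soc) * self.capacity_kwh)
        p = max(0.0, min(p_kw, headroom_kwh / (self.eta_ch * dt_h)))
        self.soc += self.eta_ch * p * dt_h / self.capacity_kwh
        return p

    def discharge(self, p_kw: float, dt_h: float = 1.0) -> float:
        """Discharge at up to p_kw; returns the power actually delivered."""
        stored_kwh = max(0.0, (self.soc - self.soc_min) * self.capacity_kwh)
        p = max(0.0, min(p_kw, stored_kwh * self.eta_dis / dt_h))
        self.soc -= p * dt_h / (self.eta_dis * self.capacity_kwh)
        self.discharged_kwh += p * dt_h
        return p

    def depth_of_discharge(self) -> float:
        return 1.0 - self.soc

def degradation_cost(unit: BatteryUnit, replacement_cost: float,
                     cycle_life: float, dod_rated: float) -> float:
    """Assumed throughput wear model: lifetime throughput = cycles * DoD * capacity."""
    lifetime_kwh = cycle_life * dod_rated * unit.capacity_kwh
    return replacement_cost * unit.discharged_kwh / lifetime_kwh
```

Giving the Li-ion and Ni–Fe instances different efficiencies, SOC windows, and cycle lives is what lets a dispatcher trade efficiency against wear across the two chemistries.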
Load balance and energy flow
A fundamental constraint of the microgrid is that supply must meet demand at each time step t: the sum of generation and battery discharge must equal total consumption, including water pumping and curtailment.
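The balance can be checked per time step as a residual that should be zero for a feasible dispatch. The sketch below adds an unmet-load slack term (an assumption, not part of the stated constraint) because that slack is what the LPSP reliability metric aggregates: LPSP is the total unmet energy divided by the total energy demanded. All numbers are hypothetical.

```python
def balance_residual(p_gen_kw, p_dis_kw, p_load_kw, p_pump_kw, p_ch_kw, p_curt_kw, p_unmet_kw=0.0):
    """Supply minus demand at one time step; a feasible dispatch gives 0.

    p_gen_kw: outputs of PV, wind, biomass and diesel; p_dis_kw / p_ch_kw: per-battery flows.
    p_unmet_kw is an assumed slack term for load that cannot be served.
    """
    supply = sum(p_gen_kw) + sum(p_dis_kw) + p_unmet_kw
    demand = p_load_kw + p_pump_kw + sum(p_ch_kw) + p_curt_kw
    return supply - demand


def lpsp(unmet_kwh, demand_kwh):
    """Loss of power supply probability: total unmet energy over total demanded energy."""
    return sum(unmet_kwh) / sum(demand_kwh)


# Hypothetical hour: 120 kW generated; 100 kW load, 10 kW pumping, 5 kW charging, 5 kW curtailed
print(balance_residual([60.0, 30.0, 20.0, 10.0], [0.0, 0.0], 100.0, 10.0, [5.0, 0.0], 5.0))  # 0.0
print(lpsp([0.0, 2.0, 0.0], [100.0, 100.0, 50.0]))  # 0.008
```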
Objective optimization
The LCC is the total cost incurred over the system's lifetime, discounted to present value. It includes capital, operational, fuel, and replacement expenses, and it serves as the economic objective in the optimization framework.
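A minimal sketch of the discounting and of the usual link between LCC and LCOE through the capital recovery factor, CRF = r(1 + r)^N / ((1 + r)^N − 1). It assumes constant annual O&M and fuel costs, replacement costs indexed by year, and no salvage value; the example inputs are hypothetical and are not the study's cost data.

```python
def present_value(amount: float, rate: float, year: int) -> float:
    return amount / (1.0 + rate) ** year


def life_cycle_cost(capex, annual_om, annual_fuel, replacements, rate, years):
    """LCC = capital + discounted O&M, fuel and replacement costs over the project life.

    replacements maps year -> replacement cost (e.g. battery swaps); salvage value is ignored.
    """
    recurring = sum(present_value(annual_om + annual_fuel, rate, t) for t in range(1, years + 1))
    replacement = sum(present_value(cost, rate, t) for t, cost in replacements.items() if t <= years)
    return capex + recurring + replacement


def capital_recovery_factor(rate: float, years: int) -> float:
    if rate == 0.0:
        return 1.0 / years
    growth = (1.0 + rate) ** years
    return rate * growth / (growth - 1.0)


def lcoe(lcc: float, annual_energy_kwh: float, rate: float, years: int) -> float:
    """Annualize the LCC with the CRF and divide by the energy served each year."""
    return lcc * capital_recovery_factor(rate, years) / annual_energy_kwh


lcc = life_cycle_cost(capex=500_000.0, annual_om=10_000.0, annual_fuel=15_000.0,
                      replacements={10: 60_000.0}, rate=0.08, years=20)
print(f"LCC = {lcc:,.0f} USD, LCOE = {lcoe(lcc, 400_000.0, 0.08, 20):.3f} USD/kWh")
```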
Reinforcement learning (DQN)
The Deep Q-Network (DQN) agent observes the system through a state vector comprising the battery SOCs, generation, load, and time. These inputs allow the neural network to learn an optimal policy across diverse scenarios.
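One way to assemble this observation is sketched below: two SOCs, generation, load, and time of day give the five-dimensional input listed in Table 3. The min-max scaling by hypothetical maximum generation and load values and the hour/24 time encoding are assumptions; the ε-greedy selector only picks an action index, since the meaning of each of the five discrete actions is not specified here.

```python
import random


def _clip01(x: float) -> float:
    return min(max(x, 0.0), 1.0)


def build_state(soc_li, soc_nife, p_gen_kw, p_load_kw, hour, p_gen_max_kw, p_load_max_kw):
    """Five-element state scaled to [0, 1]: [SOC_Li, SOC_NiFe, generation, load, time of day]."""
    return [
        _clip01(soc_li),
        _clip01(soc_nife),
        _clip01(p_gen_kw / p_gen_max_kw),
        _clip01(p_load_kw / p_load_max_kw),
        (hour % 24) / 24.0,
    ]


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action index, otherwise the argmax of q_values."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


state = build_state(0.6, 0.4, 250.0, 480.0, hour=18, p_gen_max_kw=500.0, p_load_max_kw=480.0)
print(state)  # [0.6, 0.4, 0.5, 1.0, 0.75]
```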
Sensitivity analysis
To test robustness, input parameters such as costs and fuel price are perturbed one at a time around their baseline values.
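A minimal sketch of a univariate (one-at-a-time) perturbation, assuming ±20% steps and a normalized central-difference index S_i = [Y(+δ) − Y(−δ)] / (2δ·Y_base), so that S_i = 1 means a 1% change in the input produces roughly a 1% change in the output. The toy LCOE model and its parameter values are hypothetical stand-ins for the full microgrid model.

```python
def oat_sensitivity(model, base, delta=0.2):
    """One-at-a-time sensitivity indices S_i = [Y(+delta) - Y(-delta)] / (2 * delta * Y_base)."""
    y_base = model(base)
    indices = {}
    for name, value in base.items():
        up = dict(base, **{name: value * (1.0 + delta)})    # only this parameter moves
        down = dict(base, **{name: value * (1.0 - delta)})
        indices[name] = (model(up) - model(down)) / (2.0 * delta * y_base)
    return indices


def toy_lcoe(p):
    """Hypothetical LCOE model (USD/kWh): annualized capital plus fuel, divided by energy."""
    annualized_capex = p["pv_capex"] * p["crf"]
    fuel_cost = p["diesel_price"] * p["fuel_l"]
    return (annualized_capex + fuel_cost) / p["energy_kwh"]


base = {"pv_capex": 300_000.0, "crf": 0.1, "diesel_price": 1.2, "fuel_l": 20_000.0, "energy_kwh": 400_000.0}
for name, s in sorted(oat_sensitivity(toy_lcoe, base).items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>13s}: {s:+.3f}")
```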
Metaheuristic algorithm hyperparameters and configuration settings.
Table 2 consolidates all the essential parameters used in the proposed microgrid optimization framework. It brings together technical specifications of renewable generation units (PV, wind, biomass, and diesel generators), hybrid battery system characteristics (Li-ion and Ni‒Fe technologies), as well as economic assumptions such as capital costs, O&M costs, replacement costs, and discount rate. In addition, system-level constraints related to reliability, renewable energy penetration, and DSM scenarios are summarized. Presenting these parameters in a single table improves readability, ensures transparency, and facilitates reproducibility of the study by clearly outlining the baseline assumptions and optimization boundaries.
Summary of key technical, economic, and system parameters adopted in the microgrid optimization model.
Table 3 summarizes the architecture and training parameters of the DQN employed in this study. The network consists of a five-dimensional input layer representing the state variables, two fully connected hidden layers with 128 and 64 neurons respectively using ReLU activation to capture nonlinear dynamics, and an output layer with five nodes corresponding to the discrete action space. The training was performed using the Adam optimizer with a learning rate of 0.001 and a discount factor of 0.95 to balance short- and long-term rewards. An ε-greedy strategy was adopted to ensure exploration during early training, with ε decaying gradually from 1.0 to 0.01. To stabilize learning, a replay memory buffer of 10,000 experiences with a batch size of 64 was used, and a target network was updated every 200 steps. Training was carried out for 5000 episodes using the PyTorch framework, with convergence observed in terms of stable reward progression and Q-value approximation.
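The moving parts listed in Table 3 can be sketched without a deep-learning framework: the 5-128-64-5 ReLU network, the ε schedule, the 10,000-transition replay buffer sampled in batches of 64, the target network synchronized every 200 steps, and the bootstrapped target y = r + γ(1 − done)·max Q_target(s′, a′) with γ = 0.95. This NumPy version is an illustrative assumption, not the authors' PyTorch implementation: it omits the Adam gradient step, and the linear ε decay over the first 4000 episodes is a hypothetical choice, since the text states only that ε decays gradually from 1.0 to 0.01.

```python
import random
from collections import deque

import numpy as np

STATE_DIM, N_ACTIONS = 5, 5          # from Table 3
HIDDEN = (128, 64)                   # two ReLU hidden layers
GAMMA = 0.95                         # discount factor
BUFFER_SIZE, BATCH_SIZE = 10_000, 64
TARGET_SYNC_STEPS = 200              # copy online -> target every 200 steps
EPISODES, EPS_START, EPS_END = 5000, 1.0, 0.01


def init_qnet(rng):
    """He-initialized weights for a 5-128-64-5 multilayer perceptron."""
    sizes = (STATE_DIM, *HIDDEN, N_ACTIONS)
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def q_values(params, states):
    """Forward pass: ReLU on hidden layers, linear output with one Q-value per action."""
    h = np.atleast_2d(np.asarray(states, dtype=float))
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)
    return h


def sync_target(online):
    """Fresh copy of the online weights, used as the (frozen) target network."""
    return [(w.copy(), b.copy()) for w, b in online]


def epsilon(episode, decay_episodes=4000):
    """Linear decay from EPS_START to EPS_END over decay_episodes (assumed), then constant."""
    frac = min(episode / decay_episodes, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)


def td_targets(target, rewards, next_states, dones):
    """y = r + gamma * (1 - done) * max_a' Q_target(s', a')."""
    next_q = q_values(target, next_states).max(axis=1)
    return rewards + GAMMA * (1.0 - dones) * next_q


class ReplayBuffer:
    def __init__(self, capacity=BUFFER_SIZE):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, float(done)))

    def sample(self, batch_size=BATCH_SIZE, rng=random):
        idx = rng.sample(range(len(self.buffer)), batch_size)
        s, a, r, s2, d = zip(*(self.buffer[i] for i in idx))
        return np.array(s), np.array(a), np.array(r), np.array(s2), np.array(d)


rng = np.random.default_rng(0)
online = init_qnet(rng)
target = sync_target(online)
buffer = ReplayBuffer()
for t in range(100):  # dummy transitions, for illustration only
    buffer.push(rng.random(STATE_DIM), int(rng.integers(N_ACTIONS)), float(rng.normal()),
                rng.random(STATE_DIM), t % 24 == 23)
states, actions, rewards, next_states, dones = buffer.sample(rng=random.Random(0))
targets = td_targets(target, rewards, next_states, dones)  # regression targets for Q(s, a)
```

In a full training loop, the online network would be regressed toward `targets` at the sampled actions with Adam (learning rate 0.001), and `sync_target` would be called every `TARGET_SYNC_STEPS` steps.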
DQN architecture and training parameters.
Case study
The case study is based on experimental battery datasets under different representative operating temperatures. These datasets, derived from laboratory characterization, capture key resistance and capacitance parameters as a function of SOC. Unlike meteorological case studies tied to a specific geographic region, this study focuses on the battery performance domain, making the findings broadly applicable to diverse microgrid and renewable integration contexts. The baseline microgrid system shown in Figure 2 operates under a conventional rule-based control strategy. Renewable power from the PV array and wind turbine is first directed to the energy system bus and given priority in meeting the critical load. A single lithium-ion battery provides storage, managed by a rule-based controller that enforces SOC limits between

Baseline microgrid system configuration with rule-based control.
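The baseline rule-based logic described for Figure 2 (renewables serve the load first, the single Li-ion battery absorbs surpluses and covers shortfalls within SOC limits) can be sketched as a single-step dispatch function. Because the SOC limits in the text are truncated, `soc_min` and `soc_max` are placeholder parameters whose defaults (0.2 and 0.9) are assumptions, as are the efficiencies; any deficit the battery cannot cover is simply returned, to be met by backup generation or counted as unmet load.

```python
def rule_based_step(p_ren_kw, p_load_kw, soc, capacity_kwh,
                    soc_min=0.2, soc_max=0.9, eta_ch=0.95, eta_dis=0.95, dt_h=1.0):
    """One step of a renewable-first, battery-second dispatch rule (single Li-ion bank)."""
    net_kw = p_ren_kw - p_load_kw  # renewables serve the load first
    p_ch = p_dis = curtailed = deficit = 0.0
    if net_kw >= 0.0:
        # Surplus: charge up to soc_max, curtail whatever the battery cannot absorb
        headroom_kwh = max(soc_max - soc, 0.0) * capacity_kwh
        p_ch = min(net_kw, headroom_kwh / (eta_ch * dt_h))
        curtailed = net_kw - p_ch
        soc += eta_ch * p_ch * dt_h / capacity_kwh
    else:
        # Shortfall: discharge down to soc_min, report what remains uncovered
        available_kwh = max(soc - soc_min, 0.0) * capacity_kwh
        p_dis = min(-net_kw, available_kwh * eta_dis / dt_h)
        deficit = -net_kw - p_dis
        soc -= p_dis * dt_h / (eta_dis * capacity_kwh)
    return {"soc": soc, "p_ch": p_ch, "p_dis": p_dis, "curtailed": curtailed, "deficit": deficit}


print(rule_based_step(0.0, 50.0, soc=0.3, capacity_kwh=100.0))
```

A fixed rule like this reacts only to the current SOC and net load, which is the behavior the DQN controller is later compared against.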
Figure 3 shows the hourly energy balance of an IHRES under three DSM scenarios: Base, Moderate DSM, and Aggressive DSM. In the Base scenario, peak PV output coincides with the midday load demand, resulting in significant battery charging. During hours of low PV output, such as early morning and evening, battery discharge compensates for the energy shortfall. Biomass and wind provide consistent support throughout the day, while diesel usage remains minimal, highlighting the system's ability to rely primarily on renewables. With Moderate DSM, a temporal load-shifting mechanism is applied, effectively flattening the load curve

Energy balance analysis.
Figure 4 shows the system's performance under an extreme-event stress test combining calm wind, cloudy solar conditions, and a mid-episode storm outage. The electrical load maintains a strong daily cycle of around 400–480 kW. Renewable generation from wind and solar is visibly suppressed relative to typical variability, with wind output frequently near zero and solar output attenuated by heavy cloud cover. During the storm period (days 5–6), both wind and solar generation are drastically curtailed, highlighting the severity of the stress scenario. To maintain supply, diesel generation ramps up significantly during renewable deficits, demonstrating its role as a backup resource. Meanwhile, the battery SOC fluctuates between 35% and 90% but remains above the reserve threshold, indicating that the energy management system enforces a stricter SOC floor under stress. Overall, despite prolonged renewable scarcity and a storm-induced outage, the system preserves reliability by coordinating storage and diesel dispatch.

Dispatch and SOC during a calm and cloudy stress scenario with storm outage, showing diesel backup and SOC reserves maintain reliability under prolonged renewable scarcity.
Figure 5 shows the representative load scenarios used to evaluate the robustness of the proposed EMS. Figure 5(a) presents high-demand event profiles, including heatwave, cold snap, and festival/recovery days. These scenarios introduce significant deviations from normal demand, such as sharp midday and evening peaks during heatwaves, extended high evening loads during cold snaps, and atypical multi-peak structures on festival days. The DSM variants (Base, Moderate, Aggressive) progressively reduce peak magnitude by shifting load into PV-rich midday hours, thereby alleviating stress on evening demand. Figure 5(b) compares weekday and weekend patterns: weekday loads exhibit stronger evening peaks associated with the return from work, while weekend profiles show flatter daytime consumption. Again, DSM reshapes the profiles by reducing evening peaks and distributing demand more evenly across the day. Together, these scenarios show the range of demand conditions under which the EMS is tested and the effectiveness of DSM in mitigating peak stresses and enhancing operational flexibility.

Representative load scenarios under DSM: (a) high-demand events and (b) weekly patterns.
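The load-shifting idea behind the Moderate and Aggressive DSM variants can be illustrated with a simple energy-preserving shift: a fraction of demand in the evening peak hours is moved evenly into PV-rich midday hours. The hour windows, the flat 300 kW profile with a 480 kW evening peak, and the shift fractions (0.10 and 0.25) are hypothetical choices, not the values used to generate Figure 5.

```python
def shift_load(profile_kw, shift_fraction, from_hours=range(18, 23), to_hours=range(10, 16)):
    """Move shift_fraction of the load in from_hours evenly into to_hours; daily energy is kept."""
    shifted = list(profile_kw)
    moved_kwh = 0.0
    for h in from_hours:
        delta = shift_fraction * shifted[h]
        shifted[h] -= delta
        moved_kwh += delta
    for h in to_hours:
        shifted[h] += moved_kwh / len(to_hours)
    return shifted


# Hypothetical hourly profile: 300 kW baseline with a 480 kW evening peak (18:00-22:59)
base = [480.0 if 18 <= h <= 22 else 300.0 for h in range(24)]
for label, frac in (("Base", 0.0), ("Moderate DSM", 0.10), ("Aggressive DSM", 0.25)):
    p = shift_load(base, frac)
    print(f"{label:15s} peak {max(p):6.1f} kW, daily energy {sum(p):7.1f} kWh")
```

With one-hour steps, kW and kWh per step coincide, so the unchanged daily total confirms that demand is moved in time rather than shed.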
Figure 6 shows the SOC profiles of Li-ion and Ni‒Fe batteries over one year, comparing a conventional static dispatch strategy with a DQN-based RL approach. In the upper subplot, the Li-ion battery SOC under static dispatch exhibits aggressive oscillations, regularly approaching both bounds of the allowable SOC range (0.2 to 0.9). This behavior leads to frequent deep cycling, which is known to accelerate capacity fade and increase the battery degradation cost

SOC profiles of Li-ion and Ni‒Fe batteries.
Figure 7 shows the convergence trajectories of four metaheuristic optimization algorithms (SSA, GA, PSO, and GWO), based on the normalized objective function value

Convergence curves of optimization algorithms.
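To make the SSA entry in the convergence comparison concrete, the sketch below implements the standard Salp Swarm Algorithm update rules on a simple test function. Leader salps take random steps around the best solution found so far (the food source) with a coefficient c1 = 2·exp(−(4l/L)²) that shrinks over iterations, shifting the search from exploration to exploitation; follower salps move to the midpoint between themselves and the salp ahead in the chain. Treating the first half of the chain as leaders follows a common reference implementation, and the objective, bounds, population size, and iteration count are assumptions. The recorded best-so-far history is the kind of quantity plotted as a convergence curve.

```python
import math
import random


def salp_swarm(objective, lb, ub, n_salps=30, max_iter=200, seed=0):
    """Minimize objective over the box [lb, ub]; returns (best_x, best_f, best-so-far history)."""
    rng = random.Random(seed)
    dim = len(lb)
    salps = [[rng.uniform(lb[j], ub[j]) for j in range(dim)] for _ in range(n_salps)]
    fitness = [objective(s) for s in salps]
    best = min(range(n_salps), key=fitness.__getitem__)
    food, food_fit = list(salps[best]), fitness[best]  # best solution found so far
    history = []
    for it in range(1, max_iter + 1):
        c1 = 2.0 * math.exp(-((4.0 * it / max_iter) ** 2))  # exploration -> exploitation
        for i in range(n_salps):
            if i < n_salps // 2:
                # Leaders: random step around the food source, shrinking with c1
                for j in range(dim):
                    c2, c3 = rng.random(), rng.random()
                    step = c1 * ((ub[j] - lb[j]) * c2 + lb[j])
                    salps[i][j] = food[j] + step if c3 >= 0.5 else food[j] - step
            else:
                # Followers: move to the midpoint with the salp ahead in the chain
                salps[i] = [(a + b) / 2.0 for a, b in zip(salps[i], salps[i - 1])]
            salps[i] = [min(max(x, lb[j]), ub[j]) for j, x in enumerate(salps[i])]
        for s in salps:
            f = objective(s)
            if f < food_fit:
                food, food_fit = list(s), f
        history.append(food_fit)
    return food, food_fit, history


def sphere(x):
    return sum(v * v for v in x)


best, best_fit, history = salp_swarm(sphere, lb=[-5.0] * 3, ub=[5.0] * 3)
print(f"best objective {best_fit:.2e} after {len(history)} iterations")
```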
Figure 8 shows the multi-objective optimization results in the objective space defined by LCOE, CO2 emissions, and LPSP. The Pareto results reveal clear trade-offs between economic and environmental objectives. Solutions that minimize LCOE tend to rely more heavily on dispatchable backup sources, which can lead to higher CO2 emissions. Conversely, solutions emphasizing emission reduction often require larger renewable generation and storage capacities, which increase capital investment and slightly raise LCOE. The Pareto fronts illustrate these dynamics: SSA-generated solutions cluster toward balanced outcomes with relatively low LCOE (0.32–0.35 USD/kWh) and minimal LPSP (<0.01), while GA and PSO solutions show greater dispersion and, in some cases, higher emissions (Makhzom et al., 2023). Similarly, the LCOE–emission relationship indicates that deeper renewable penetration not only reduces emissions but also stabilizes long-term system costs by distributing expenditures across a larger share of clean energy generation. These results underscore the practical value of multi-objective optimization: decision-makers can select solutions along the Pareto frontier depending on whether cost minimization, emission reduction, or system reliability is prioritized. This aligns with recent studies that emphasize Pareto-based energy planning as a tool for balancing affordability, sustainability, and resilience in renewable-dominated systems.

Multi-objective optimization.
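The Pareto fronts rest on the dominance relation for minimized objectives: one design dominates another if it is no worse in LCOE, CO2, and LPSP and strictly better in at least one. A minimal O(n²) filter is sketched below; the four candidate designs are hypothetical and are not results from Figure 8.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Indices of non-dominated points (all objectives minimized), pairwise O(n^2) check."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]


# Hypothetical candidate designs: (LCOE in USD/kWh, CO2 in t/yr, LPSP)
designs = [(0.32, 10.0, 0.005), (0.35, 8.0, 0.004), (0.40, 12.0, 0.009), (0.33, 9.0, 0.006)]
print(pareto_front(designs))  # [0, 1, 3]
```

Design 2 is discarded because design 0 is better in all three objectives; the remaining three each win on at least one objective, so choosing among them requires the decision-maker's priorities.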
Table 4 compares the scalability characteristics of the investigated algorithms. The results indicate that population-based heuristics such as SSA and PSO converge rapidly on small- to medium-scale problems and have relatively low computational overhead. However, their computational burden increases as the number of decision variables and the problem dimensionality grow, which limits their efficiency for larger microgrids or extended horizons. GA shows moderate convergence but the steepest increase in execution time with problem size, making it the least efficient option for high-dimensional planning problems. In contrast, DQN emerges as the most scalable framework: although it requires significant training effort and computational resources upfront, its post-training performance is superior across multiple dimensions. Specifically, DQN exhibits very high convergence reliability, manageable growth in execution time, and excellent real-time suitability, as decision-making during inference is nearly instantaneous and independent of microgrid size. This distinction underscores a key trade-off: while classical heuristics may be adequate for short-term or localized optimization tasks, reinforcement learning-based methods such as DQN provide a more robust and future-ready solution for large-scale, long-horizon microgrid applications.
Comparative scalability assessment of optimization algorithms.
Figure 9 shows the impact of increasing system complexity on the solution quality of the tested metaheuristic optimizers. The left panel shows performance trends as the number of microgrid clusters is scaled up to three times the baseline configuration, while the right panel presents results for planning horizons extended to three years. For larger cluster sizes, SSA exhibits a steady increase in cost relative to the baseline, indicating that its solution quality deteriorates as the system expands. PSO is more sensitive, with a sharp decline at twofold scaling followed by partial recovery at threefold scaling, suggesting instability in larger search spaces. In contrast, GA maintains nearly constant solution quality across all cluster scales, reflecting greater robustness. When the horizon is extended, SSA degrades consistently, while PSO maintains quality up to two years but drops significantly at three years. GA again proves the most stable, with solution values remaining close to the baseline throughout. These results highlight a trade-off between exploration ability and computational stability: SSA and PSO may perform better in smaller systems but face challenges in long-horizon or large-scale optimization, whereas GA scales more consistently at the cost of slower convergence in some cases.

Normalized solution quality of SSA, PSO, and GA under scalability tests: (a) variation with cluster scale (× baseline) and (b) variation with planning horizon (years).
Figure 10 shows the execution time scalability of the metaheuristic optimizers when applied to increasingly complex microgrid configurations. In the left panel, execution times are plotted against cluster scaling, while the right panel shows runtime growth as the planning horizon is extended to three years. For cluster scaling, SSA and PSO exhibit a relatively smooth and near-linear increase in runtime, reflecting predictable growth in computational demand as the search space expands. GA, however, shows a more irregular trend, with execution time peaking sharply at twofold scaling before decreasing at threefold scaling, which may be attributed to premature convergence or reduced exploration effort at higher scales. When considering horizon length, all algorithms show a clear upward trend, with SSA and PSO experiencing almost linear growth in runtime as simulation length increases, reaching close to 4 s at a 3-year horizon. GA, in contrast, demonstrates a flatter curve beyond two years, suggesting greater efficiency in handling long temporal horizons, albeit with the earlier noted risks of solution quality variability. These results confirm that SSA and PSO maintain consistency in runtime scaling but at higher computational costs, while GA offers efficiency advantages for extended horizons but may trade off robustness in larger system sizes.

Execution time scalability of SSA, PSO, and GA under extended problem sizes: (a) execution time versus cluster scale (× baseline) and (b) execution time versus planning horizon (years).
Figure 11 compares five optimization algorithms (GA, PSO, SSA, GWO, and DQN) across three key normalized performance metrics, namely Execution Time, Solution Stability (

Execution time and stability comparison.
Figure 12 shows a Q-value heatmap representing the action policy of a DQN agent in an HBESS. At low SOC (< 0.4), charging dominates across nearly all load levels, reflecting the DQN's learned priority to preserve battery reserve and avoid deep cycling. At high SOC (> 0.8) and low-to-moderate loads (< 1.5 kW), by contrast, discharging becomes prevalent, making stored energy available and reducing diesel use. The idle action is scattered and appears primarily at mid-range SOC (∼0.5–0.7), indicating an energy equilibrium in which supply meets demand without battery involvement. This policy reflects control logic shaped by long-term rewards, including minimizing degradation cost, emissions, and LCOE while ensuring energy sufficiency. Overall, the heatmap indicates that the DQN agent has learned a state-aware energy management strategy that balances charging and discharging in response to both system condition and external demand.

Q-value heatmap (this figure was generated by the authors using Python 3.11.4 with Matplotlib v3.7.1 and Seaborn v0.12.2.).
Figure 13 shows the normalized Q-value distribution

Q-value heatmap for charging (this figure was generated by the authors using Python 3.11.4 with Matplotlib v3.7.1 and Seaborn v0.12.2.).
Figure 14 shows the Q-value map for the discharging action, where each cell encodes the normalized Q-value

Q-value heatmap for discharging (this figure was generated by the authors using Python 3.11.4 with Matplotlib v3.7.1 and Seaborn v0.12.2.).
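Heatmaps like those in Figures 12–14 are typically obtained by evaluating a trained Q-network on a grid of SOC and load values. The sketch below covers only the post-processing step, assuming the Q-values are already stored in an array of shape (SOC bins, load bins, actions); the action labels and the per-action min-max normalization are assumptions, since the exact normalization used for Figures 13 and 14 is not stated here. The resulting 2-D arrays can then be passed to a plotting call such as `seaborn.heatmap`.

```python
import numpy as np

ACTIONS = ("charge", "discharge", "idle")  # hypothetical labels for a subset of actions


def greedy_policy_map(q):
    """q has shape (n_soc, n_load, n_actions); returns the index of the best action per cell."""
    return np.argmax(q, axis=-1)


def normalized_action_map(q, action):
    """Min-max normalize one action's Q-values over the grid to [0, 1] for plotting."""
    m = q[..., action]
    span = m.max() - m.min()
    return np.zeros_like(m) if span == 0 else (m - m.min()) / span


rng = np.random.default_rng(0)
q_grid = rng.normal(size=(10, 8, len(ACTIONS)))  # stand-in for network outputs on a grid
policy = greedy_policy_map(q_grid)               # which action wins in each (SOC, load) cell
charge_map = normalized_action_map(q_grid, ACTIONS.index("charge"))
```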
Figure 15 quantifies the sensitivity of the LCOE to key system parameters using a univariate perturbation method. The most sensitive parameter is diesel price, where a

Sensitivity of LCOE to key parameters.
Figure 16 shows the sensitivity of the LPSP to ±20% variations in key techno-economic parameters. The base-case LPSP is 0.008, indicating a highly reliable system. Changes in diesel price and interest rate have negligible influence, with sensitivity indices of 0.031 and 0.000, respectively. By contrast, PV capital cost shows a strong negative sensitivity of

Sensitivity of LPSP to ±20% variations in key system parameters.
Figure 17 shows the sensitivity of battery lifetime to ±20% variations in system parameters. The base-case battery life is 7.6 years. Diesel price and interest rate have negligible influence, with sensitivity indices of −0.033 and 0.000, respectively. PV capital cost exhibits a positive effect of

Sensitivity of battery lifetime to ±20% variations in key system parameters.
Figure 18 shows the correlation matrix of four energy sources: PV, wind, biomass, and diesel. The values in the matrix are pairwise correlation coefficients, which range from −1 to 1. The diagonal elements, which compare each source with itself, are all 1 by definition. The correlation between PV and wind is −0.002353, a very weak negative correlation, indicating a negligible inverse relationship between them. Biomass and diesel show a correlation of 0.01264, a very weak positive correlation, meaning that their outputs have only a slight tendency to move in the same direction. The color scale, from cyan (low correlation) to magenta (high correlation), visualizes the strength of these relationships. Because no strong correlations are observed, the sources fluctuate largely independently of one another, which supports their complementary use in a hybrid energy system.

Renewable resource complementarity.
Figure 19 shows the seasonal profile of renewable resources across the four seasons of the year. For each season, energy generation from the four sources is plotted for day and night periods, with day shown in red and night in blue. During spring and summer, PV generation peaks in the midday hours at over 300 kWh, reflecting high solar irradiance. Wind energy is variable, with higher production in the afternoon, especially in spring and summer, when it reaches approximately 150 kWh. Biomass provides a stable contribution across the day, generally between 50 and 100 kWh. Diesel generation remains minimal in all seasons, typically below 50 kWh, indicating limited use during the day. In autumn and winter, PV output decreases significantly due to shorter daylight hours, while wind generation becomes more prominent, especially in winter, peaking at about 200 kWh. Biomass and diesel contributions remain consistent across seasons, providing a steady energy supply when the variable renewable sources (PV and wind) are insufficient.

Seasonal profile of renewable resources.
Figure 20 shows a radar chart comparing the impacts of key system parameters on LCOE and CO2 emissions. The axes represent Battery Cost, Interest Rate, PV Efficiency, DSM Level, and Diesel Price, with each point indicating the normalized impact of each parameter on the respective objective. Diesel price exhibits the largest influence on both LCOE and CO2, with LCOE impact peaking at 0.6 and CO2 impact at 0.55. Battery cost and interest rate also show significant impacts on LCOE but have a lower effect on CO2 emissions, with values approaching 0.3 and 0.2, respectively. The DSM level and PV efficiency contribute the least to both LCOE and CO2 impacts, registering values close to 0 in both metrics.

Parameter impact on LCOE and CO2.
Figure 21 compares the performance of two EMSs, DQN and Rule-Based, using a radar chart. The chart covers five key metrics: Degradation (Battery Cost), Runtime (Seconds), Emissions (CO2), Cost Reduction (%), and Reliability (LPSP). For battery degradation cost, the DQN performs significantly better, with a value of approximately 0.2 compared with about 0.8 for the Rule-Based EMS. Similarly, the DQN's CO2 emissions are notably lower, at approximately 0.2 versus around 0.6 for the Rule-Based EMS. For cost reduction, DQN achieves about 0.8, whereas the Rule-Based EMS achieves around 0.4. DQN also shows better reliability, with an LPSP closer to 0.1 compared with about 0.5 for the Rule-Based EMS. However, the Rule-Based EMS has a shorter runtime, closer to 0.1 s, compared with around 0.6 s for DQN.

Comparison of DQN versus Rule-Based EMS Performance.
Figure 22 shows the impact of increasing the number of village clusters on both LCOE and CO2 emissions. As the number of clusters rises from 1 to 10, the LCOE decreases from approximately 1.8 USD/kWh to 1.3 USD/kWh, reflecting economies of scale: the system's fixed costs, such as capital and operational expenditures, are distributed over a larger energy output, lowering the per-unit energy cost. Simultaneously, CO2 emissions decrease from around 0.36 to 0.3 kg/kWh, driven by higher renewable energy penetration as more clusters are integrated. This reduction in emissions reflects the positive environmental impact of expanding renewable capacity and reducing reliance on fossil fuels, demonstrating the benefits of system aggregation for both economic and environmental performance.

LCOE and CO2 emissions versus number of village clusters.
Figure 23 shows the degradation timeline of the HBESS, covering both capacity fade and degradation costs over a 10-year period. Li-ion capacity fade increases linearly over time, from around 1% to about 6% by year 10, whereas Ni‒Fe capacity fade increases at a higher rate, reaching about 7% by year 10. The figure also tracks the degradation cost of each battery type: Li-ion degradation costs increase steadily from around 100 USD to over 700 USD by year 10, while Ni‒Fe degradation costs follow a similar trend but start higher and grow slightly faster, reaching almost 750 USD by year 10.

Degradation timeline for Li-ion versus Ni‒Fe.
Figure 24 shows the LCOE surface as a function of Li-ion and Ni‒Fe battery sizes. LCOE increases with both battery sizes, rising non-linearly as the capacity of either battery grows. At smaller battery sizes (∼50 kWh), the LCOE is relatively low, but it increases sharply once battery sizes exceed 100 kWh. Economies of scale exist at smaller battery sizes, whereas larger capacities yield diminishing returns in cost-efficiency. The color bar shows that LCOE ranges from 1.0 to 5.5 USD/kWh, with the highest values occurring for the largest Li-ion and Ni‒Fe configurations. The surface thus captures the trade-off between battery size and the overall cost of energy and indicates an optimal region in which the combination of Li-ion and Ni‒Fe sizes minimizes cost. The non-linear relationship implies that increasing battery capacity without optimizing the mix may raise LCOE, highlighting the importance of carefully weighing the types and sizes of energy storage used in hybrid energy systems.

LCOE surface versus battery sizes.
To quantify the robustness of the optimized hybrid microgrid, a Monte Carlo simulation with 1000 realizations was conducted. As shown in Figure 25, the results exhibit narrow distributions for the economic indicators and an acceptable spread for the environmental and reliability metrics. The LCOE has a mean of 0.0555 USD/kWh with a 95% confidence interval (CI) of [0.0553–0.0557] USD/kWh, indicating that economic performance remains stable despite uncertainties. Similarly, the LCC centers around 0.908 million USD, with a 95% CI of [0.907–0.910] million USD, indicating low financial risk. For environmental performance, annual CO2 emissions show a wider spread, ranging from 0.2 to 1.0 tons/year with an average of 0.42 tons/year, reflecting the influence of diesel generator usage during unfavorable renewable conditions. In terms of reliability, the LPSP distribution remains tightly clustered below 0.005, with a mean of 0.0015, implying that even under extreme variability the probability of unmet load remains below 0.5%. The Monte Carlo analysis thus indicates that the dual-battery EMS strategy provides robust economic viability, environmental sustainability, and high reliability under uncertainty.

Monte Carlo distributions of LCOE, LCC, CO2 emissions, and reliability for 1000 stochastic realizations.
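A "95% interval" from a Monte Carlo run can mean two different things: a confidence interval for the mean, which narrows as 1/√n, or the 2.5th–97.5th percentile range, which describes the spread of individual realizations. The sketch below computes both from 1000 samples of a hypothetical, simplified LCOE model; `toy_lcoe` and its input distributions are assumptions, not the study's microgrid model.

```python
import math
import random
import statistics


def toy_lcoe(diesel_price_usd_per_l: float, solar_factor: float) -> float:
    """Hypothetical LCOE model: fixed share plus a fuel share that grows when solar is weak."""
    return 0.04 + 0.01 * diesel_price_usd_per_l / solar_factor


def monte_carlo(n: int = 1000, seed: int = 42) -> dict:
    rng = random.Random(seed)
    samples = [
        toy_lcoe(max(rng.gauss(1.2, 0.12), 0.0), rng.uniform(0.9, 1.1))  # assumed input distributions
        for _ in range(n)
    ]
    mean = statistics.fmean(samples)
    half_width = 1.96 * statistics.stdev(samples) / math.sqrt(n)  # normal-approximation CI of the mean
    cuts = statistics.quantiles(samples, n=40)  # cut points every 2.5%
    return {
        "mean": mean,
        "ci95_mean": (mean - half_width, mean + half_width),
        "p2.5_p97.5": (cuts[0], cuts[-1]),  # range containing 95% of individual realizations
        "samples": samples,
    }


result = monte_carlo()
print(f"mean {result['mean']:.4f}, 95% CI of mean {result['ci95_mean']}, 95% range {result['p2.5_p97.5']}")
```

Reporting which of the two intervals is meant makes it clear whether a narrow interval reflects low variability in outcomes or simply a large number of realizations.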
Figure 26 shows the battery degradation cost heatmap as a function of Li-ion and Ni‒Fe battery sizes. The color gradient, ranging from black to yellow, represents the costs associated with battery degradation, which depend on battery cycling and DoD. As the Li-ion battery size increases along the x-axis and the Ni‒Fe battery size increases along the y-axis, the degradation cost rises steadily, especially in the top-right corner of the heatmap, where the largest battery configurations are located. The highest degradation costs occur at the upper end of the size range (Li-ion ∼160 kWh and Ni‒Fe ∼160 kWh), where the degradation cost exceeds 930 USD. This suggests that increasing battery size without a balanced sizing approach may significantly increase long-term maintenance and replacement costs due to higher cycling rates and deeper discharges. The lower-end configurations (Li-ion ∼40 kWh, Ni‒Fe ∼40 kWh) show lower degradation costs (∼840–870 USD), indicating that smaller batteries with more frequent cycling may experience less wear because less capacity is utilized per cycle.

Battery degradation cost heatmap (this figure was generated by the authors using Python 3.11.4 with Matplotlib v3.7.1 and Seaborn v0.12.2.).
Figure 27 compares battery discharge and total operating cost across three load profiles: Base Load, Moderate DSM, and Aggressive DSM. As DSM becomes more aggressive, both battery discharge and operating costs decrease. In the Base Load scenario, the battery discharges approximately 2.275 kWh at a total operating cost of 0.114 USD. Under Moderate DSM, battery discharge falls to 0.915 kWh and the total cost to 0.046 USD, indicating that shifting load to off-peak hours significantly reduces battery usage and operating costs. The Aggressive DSM scenario further reduces battery discharge to 0.202 kWh and the total operating cost to 0.010 USD, demonstrating the effectiveness of aggressive load shifting in optimizing battery usage and system costs. This consistent reduction in both battery discharge and cost emphasizes the potential of DSM strategies to reduce battery degradation and overall system costs, improving the financial and operational efficiency of hybrid energy systems.

Battery usage versus operating cost.
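The values reported for Figure 27 imply an almost constant operating cost per kWh of battery discharge, which explains why cost falls in step with discharge across the three scenarios. The short calculation below uses only the figures quoted in the text.

```python
# (battery discharge in kWh, total operating cost in USD) as reported for Figure 27
scenarios = {
    "Base Load": (2.275, 0.114),
    "Moderate DSM": (0.915, 0.046),
    "Aggressive DSM": (0.202, 0.010),
}
base_kwh = scenarios["Base Load"][0]
unit_costs = {}
for name, (kwh, usd) in scenarios.items():
    unit_costs[name] = usd / kwh
    print(f"{name:15s} {usd / kwh:.3f} USD/kWh, discharge {1 - kwh / base_kwh:.0%} below base")
```

All three ratios round to about 0.050 USD/kWh, so in this figure operating cost is effectively proportional to battery throughput: the roughly 60% and 91% reductions in discharge translate almost one-for-one into cost reductions.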
Figure 28 shows the relationship between battery discharge and CO2 emissions under three DSM strategies: Base Load, Moderate DSM, and Aggressive DSM. As the DSM strategy becomes more aggressive, both battery discharge and CO2 emissions decrease significantly. In the Base Load scenario, the system discharges approximately 2.5 kWh from the battery, resulting in 12 kg of CO2 emissions. Under Moderate DSM, battery discharge drops to 1.5 kWh and CO2 emissions fall to 7 kg, reflecting the efficiency gains achieved by shifting load to periods of lower demand. Finally, in the Aggressive DSM scenario, battery discharge falls further to 0.25 kWh and CO2 emissions to 2 kg, demonstrating the effectiveness of more aggressive load shifting in reducing both battery usage and emissions. This joint decline shows how DSM strategies, particularly aggressive ones, optimize energy consumption, reduce reliance on battery discharge, and, as a result, lower CO2 emissions, making the energy system more environmentally sustainable and cost-effective.

Battery usage versus emissions across DSM strategies.
Figure 29 compares the CO2 emission intensity of three battery dispatch strategies (LF, CC, and DQN). The chart shows the CO2 emissions of each strategy, with DQN showing the lowest emissions at

Conclusion
This study presents a robust techno-economic framework for optimizing the design and operation of off-grid HRES integrated with an HBESS and a reinforcement learning-based energy management strategy using DQN. The system architecture incorporates multiple energy sources, including PV, wind turbine, biomass generator, and diesel generator, complemented by Li-ion and Ni‒Fe batteries. By combining metaheuristic optimization algorithms (SSA, PSO, GA, and GWO) with the DQN controller, the model optimizes both component sizing and operational scheduling. The results indicate that the DQN-based energy management system reduces the total LCC by more than 20% compared with conventional strategies while maintaining high reliability, with an LPSP below 0.01. The DQN controller enhances the system's ability to adapt to variations in resource availability and load, improving both operational efficiency and environmental sustainability. Sensitivity analyses indicate that the model is resilient to variations in key parameters, such as fuel costs and resource availability, with diesel price having the most significant effect on LCOE. By optimizing the use of renewable energy sources, the proposed system also reduces CO2 emissions by up to 30% under varying operational conditions. These findings emphasize the practical benefits of combining hybrid storage with intelligent reinforcement learning to improve the performance and sustainability of off-grid microgrids. Future work will focus on refining the DQN strategy through hardware-in-the-loop validation and on exploring more advanced DRL methods for real-time deployment.
Future work
While the present study establishes a robust techno-economic framework for HRES integrated with hybrid battery energy storage and reinforcement learning-based control, several avenues remain open for further research:
Hardware-in-the-loop validation and pilot deployment: Future work will validate the proposed framework on hardware-in-the-loop platforms and real-world pilot-scale microgrids. This will allow testing under realistic conditions, including inverter derating in high-temperature environments, generator run-time constraints, and long-term battery self-discharge.
Computational scalability: Applying the optimization framework to larger microgrid topologies, extended horizons, and multi-year datasets will help evaluate computational efficiency at scale. Parallelized and distributed implementations will also be explored to enhance convergence for real-time operation.
Advanced AI and hybrid learning approaches: More advanced deep reinforcement learning (DRL) methods, such as actor-critic architectures, proximal policy optimization (PPO), and multi-agent reinforcement learning (MARL), will be investigated. Coupling DRL with physics-informed neural networks (PINNs) could further improve interpretability and robustness by embedding physical laws into learning.
Multi-objective optimization: Expanding beyond cost minimization, future studies will incorporate sustainability metrics, resilience indices, and risk-aware planning to balance economics, technical performance, and environmental impacts under uncertainty.
Emerging storage technologies and circular economy: Future frameworks will consider second-life EV batteries, supercapacitors, and hydrogen storage alongside hybrid battery systems. These technologies not only enhance flexibility and lifetime but also align with circular economy principles and policy-driven sustainability targets.
Footnotes
Author contributions
Wajid Khan, Feng Renhai, and Abdul Aziz contributed to conceptualization, methodology, software, visualization, investigation, and writing‒original draft preparation. Muhammad Zain Yousaf, Zhi Cai, and Muhammad Umair Iqbal contributed to data curation, validation, supervision, resources, and writing‒review and editing. Jiang Wang, Mustafa Abdullah, and Mebratu Sintie Geremew contributed to project administration, supervision, resources, and writing‒review and editing.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.
