Sage Journals: Discover world-class research

Abstract

Meeting the growing global electricity demand in remote and off-grid regions requires cost-effective and reliable power solutions that overcome the intermittency of renewable energy sources. This paper presents a comprehensive techno-economic optimization framework for the design and operation of off-grid hybrid renewable energy systems (HRES) integrating photovoltaic (PV), wind turbine, biomass generator, and diesel backup units with a dual-chemistry hybrid battery energy storage system (HBESS) that combines lithium-ion and nickel–iron batteries. A detailed mathematical modeling approach captures nonlinear dynamics, stochastic renewable behavior, battery degradation, and temperature-adjusted component efficiencies. The system is formulated as a multi-objective mixed-integer nonlinear programming problem that minimizes life cycle cost (LCC), levelized cost of energy (LCOE), and CO₂ emissions while satisfying reliability constraints such as a loss of power supply probability (LPSP) below 0.01. To solve the optimization problem, five metaheuristic algorithms, namely Particle Swarm Optimization (PSO), Genetic Algorithms (GA), Grey Wolf Optimizer (GWO), Differential Evolution (DE), and the Salp Swarm Algorithm (SSA), are implemented and benchmarked alongside a Deep Q-Network (DQN)-based reinforcement learning energy management strategy. The proposed DQN-based controller outperforms conventional rule-based and static dispatch methods by maintaining more stable battery state-of-charge (SOC) profiles, reducing degradation, and enabling intelligent real-time decision-making. Simulation results based on realistic meteorological and demand profiles show that the integrated DQN and HBESS strategy reduces total LCC by over 20%, CO₂ emissions by up to 30%, and battery degradation costs by over 10% compared with baseline systems. 
The Salp Swarm Algorithm (SSA) achieves the fastest convergence and the highest-quality Pareto-optimal solutions among all metaheuristics evaluated. Sensitivity analysis identifies diesel price and interest rate as the parameters with the greatest influence on LCOE, while load shifting through aggressive demand-side management further reduces battery usage, operating costs, and emissions. The proposed framework not only addresses key challenges in off-grid microgrid design but also provides a scalable and robust pathway for sustainable rural electrification using hybrid storage and intelligent control.

Keywords

Deep reinforcement learning; demand side management; dual-battery storage; energy management system; hybrid renewable energy system; metaheuristic optimization; microgrid control; off-grid electrification

Introduction

The growing global energy demand and climate goals are driving a rapid transition to renewable power. According to the International Energy Agency, world energy consumption grew at an above-average pace in 2024, with electricity demand rising nearly twice as fast as total energy demand (https://www.iea.org/reports/global-energy-review-2025). This surge is driven by higher loads for cooling, industry, transport electrification, and data centers (IEA, 2025). Meeting this demand sustainably requires ramping up low-carbon generation. Many governments have pledged net-zero emissions by 2050, which implies dramatically higher shares of solar, wind, and other renewables in power grids (Dong et al., 2022; Pablo-Romero et al., 2023; Safi et al., 2021). As Pinthurat et al. (2024) explain, rising GHG emissions have motivated the adoption of renewables to achieve a more stable and sustainable energy landscape. Likewise, Güven and Samy (2022) note that decades of economic growth have accelerated the worldwide transition to renewable energy to meet fast-growing demand. The energy transition and growing load requirements make hybrid systems combining multiple renewables and storage a strategic necessity.

Remote and off-grid regions especially benefit from hybrid renewable energy systems (HRES). Over 1.6 billion people lack electricity access, 80% of them in rural Asia and Africa (Mohammed et al., 2014). Extending centralized grids is often impractical or too costly in such areas (Fara et al., 2006). Hybrid microgrids that combine solar, wind, biomass, or hydro with energy storage offer an affordable alternative. Integrating complementary resources can mitigate the variability of any single source: on an isolated DC or AC bus, a photovoltaic (PV) array, wind turbines, and batteries can jointly supply power when the grid is unavailable (Valenciaga and Puleston, 2005). Afgan and Carvalho (2008) show that HRES can deliver substantial cost savings compared with single-source systems. Indeed, techno-economic analyses find that hybrid power plants, such as solar–biomass and PV–wind–diesel hybrids, often outperform diesel gensets and standalone renewables in life-cycle cost, reliability, and emissions (Infield, 1994). Thus, HRES are increasingly viewed as a key solution for rural electrification and small-scale microgrids.

Integrating variable renewables with storage in off-grid microgrids poses major technical and economic challenges. The fundamental issue is intermittency: solar and wind outputs fluctuate unpredictably, and biomass feedstock availability can also vary, making demand–supply balance difficult (Olatomiwa et al., 2016; Ramesh and Saini, 2020). To compensate, diverse sources and energy storage must be added, but this greatly increases system complexity and cost. The intermittent nature of most renewable resources and the increased capital expenses drive the design of HRES with storage and complementary sources to achieve a reliable, cost-effective supply (Güven and Samy, 2022). Olatomiwa et al. (2016) stress that incorporating multiple renewables, backup generators, and storage is essential to overcome intermittency, yet these additional design considerations raise the overall cost. In practice, off-grid microgrids often must include diesel generators or other backup units to meet peak loads. However, reliance on diesel is costly: rural generators incur high fuel, maintenance, and logistics expenses (Elhadidy, 2002; Protogeropoulos et al., 1997). Standalone PV systems are often uneconomic compared with fossil-fuel backup, whereas PV–wind or PV–wind–diesel hybrids can supply off-grid loads more economically. In all cases, intelligent energy management and control are needed to dispatch sources and storage optimally, but combinatorial complexity and uncertain weather make this very challenging (Mohammed et al., 2014).

Many studies formulate optimization problems for HRES component sizing and dispatch, using tools such as HOMER or custom models (Ashetehe et al., 2024; Khan et al., 2025). A wealth of renewable–storage planning approaches exists, including deterministic and stochastic programming, classical techniques, and a wide range of metaheuristic algorithms. Khan et al. (2025) provide a comprehensive review showing that combining RES with storage improves reliability and reduces cost, but they emphasize that further study of optimization methods, metaheuristic algorithm strategies, system components, and design constraints is needed. Recent work strongly favors metaheuristic soft computing for HRES optimization: because traditional optimization struggles to cope with the many uncertainties and nonlinearities involved, soft computing techniques based on metaheuristic algorithms have been widely employed for hybrid system sizing over the last decade. Indeed, recent literature abounds with GA, PSO, DE, GWO, FA, ABC, CS, ACO, and other metaheuristic optimizers applied to microgrid design (Modu et al., 2023; Ramli et al., 2018). Roughly 25% of microgrid optimization papers use PSO, about 10% use GA, and about 5% use GWO, reflecting the dominance of population-based algorithms (https://bohrium.dp.tech/paper/arxiv/c3dc08f71192a53a505c5f76b65e14857c12ab1b697662c7f70d383ed7056eae).

Hybridization of algorithms is also common. Modu et al. (2023) report hybrid algorithms that combine two or more techniques to exploit their complementary strengths. For example, an enhanced DE–PSO hybrid has been used to size off-grid HRES with PV, wind, diesel, and battery components. Other works fuse swarm and evolutionary methods, such as water-cycle and moth-flame optimization, for similar mixed-resource problems (Singh et al., 2023). Such hybrids often outperform single methods in convergence speed and solution quality, especially for multi-objective or highly constrained HRES sizing problems.

On the energy management side, many recent studies propose rule-based, optimization-based, or AI-based EMS for HRES. Reviews note that advanced EMS and optimization improve reliability and efficiency but are often evaluated only in idealized simulations (Mekhilef et al., 2013). There is a clear need for EMS that coordinate multiple sources and storage to ensure uninterrupted supply while minimizing cost. Other studies have applied machine learning or reinforcement learning to microgrid control; for example, Pinthurat et al. (2024) explore RL for smart-home renewable energy management, but adoption of RL in off-grid microgrid scheduling remains limited. While metaheuristic-based EMS techniques can optimize dispatch and resiliency, they can become trapped in local minima and do not guarantee global optimality (Akter et al., 2024). Nonetheless, the trend is clear: integrating storage (batteries, hydrogen, supercapacitors, and so on) with renewables and using metaheuristic algorithms for sizing and dispatch has shown significant promise in many case studies.

Khazali et al. (2024) studied a microgrid supplying EV chargers with a hybrid battery energy storage system (HBESS) composed of Li-ion, lead-acid, and second-life EV batteries (https://www.mdpi.com/1996-1073/17/15/3631). They find that a properly sized HBESS can reduce overall system cost by over 20% compared with a pure Li-ion bank and, by moderating degradation, extend the useful life of lead-acid units (Safi et al., 2021; Shaaban et al., 2019; Zhou et al., 2022). Researchers emphasize that hybridizing battery chemistries can yield better cost–life trade-offs: for example, Li-ion batteries, while more expensive, suffer less capacity fade than lead-acid batteries (Dhundhara et al., 2018). Other studies have proposed combining batteries with hydrogen or supercapacitors for large microgrids (Khazali et al., 2024; Modu et al., 2023). It is common to optimize battery charge/discharge schedules, depth-of-discharge limits, and capacity factors to account for aging and replacement costs in the objective function.

Literature review

Recent advances in smart energy management have demonstrated the potential of integrating distributed energy resources (DERs) and flexible appliance scheduling using bio-inspired optimization strategies (Sahoo et al., 2023a). For instance, the Cheetah Optimization Algorithm was applied to coordinate appliance-level dispatch and DER scheduling in mixed-use microgrids, showing significant cost and energy efficiency improvements (Thirumalai et al., 2025). Building on this, the Improved Lyrebird Optimization technique has proven effective for multi-microgrid sectionalization and distributed generation scheduling, enabling cost-efficient islanded or semi-autonomous operation in decentralized grids (Nagarajan et al., 2025). Furthermore, combining price-elastic demand response models with swarm intelligence algorithms such as Greedy Rat Swarm Optimization has led to more economically and environmentally balanced microgrid dispatch frameworks (Singh et al., 2025a). In the domain of AI-integrated hybrid optimization, quantum particle swarm optimization (QPSO) was used to co-optimize cost and emission trade-offs in grid-connected microgrids, demonstrating strong convergence and robustness under uncertainty (Paul et al., 2025). On the infrastructure planning side, Agajie et al. (2025) compared PV–battery and PV–fuel cell systems in academic laboratories, highlighting how techno-economic analyses can inform sustainability strategies for institutional energy systems. Similar investigations in rural India revealed the viability of off-grid hybrid systems by applying sensitivity-driven cost modeling (Kumar et al., 2025), while others proposed robust power-sharing schemes for autonomous microgrids leveraging hybrid energy sources such as PV, wind, and biomass (Anitha et al., 2025). Nadimuthu et al. (2024) assessed the integration of vehicle-to-grid (V2G) technology into smart villages and found significant potential for renewable-dominant microgrids in rural India. 
For real-time control, Selvaraj et al. (2024) employed the Crow Search Algorithm to schedule DERs, improving microgrid resilience and operational cost. Load frequency stability in islanded urban microgrids was improved using 1PD-3DOF-PID controllers alongside mobile EV storage, further highlighting the importance of control in high-penetration renewable environments (Davoudkhani et al., 2024a). In Douala, Cameroon, a regional techno-economic microgrid assessment identified hybrid architectures as the most efficient choice for urban sustainability (Molu et al., 2024). On the multi-objective optimization front, advanced sine-cosine chaotic algorithms have been tailored to optimize real-time scheduling in microgrids with fluctuating generation (Karthik et al., 2024), while newer nature-inspired strategies such as mountaineering team-based optimization were introduced for frequency control in isolated renewable-powered microgrids (Davoudkhani et al., 2024b). More recently, crystal structure-based metaheuristics have been used to schedule energy and EV loads under uncertainty, improving grid reliability and flexibility (Rajagopalan et al., 2024). Parallel efforts in machine learning-based forecasting of microgrid demand and generation are also maturing, enabling more accurate load shaping and power dispatch (Singh et al., 2024b). Complementary studies have reviewed classical control schemes for microgrids (Kumar et al., 2023) and explored power quality enhancement through novel water-wave inspired optimization (Choudhury et al., 2023) and scaled-conjugate neural networks (Sahoo et al., 2023a). The control and stability of AC/DC hybrid microgrids are also being improved using robust fuzzy-based or multi-layer AI controllers (Abraham et al., 2023; Khosravi et al., 2023). Intelligent optimization of AC–DC conversion and droop control has emerged as a powerful technique for minimizing system cost while managing hybrid energy flows (Prasad et al., 2022). 
The role of EVs in demand-side flexibility has been reviewed extensively, with a focus on control, modeling, and market integration challenges (Mohanty et al., 2022). Other works model the sensitivity of grid-connected microgrids to hybrid configurations and explore load–voltage coordination for improved energy balancing (Dashtdar et al., 2022b; Sharma et al., 2022). Optimized microgrid operation using combined heat and power systems and hybrid storage continues to gain traction, as shown by Abdalla et al. (2021). Smart residential EMS frameworks with reinforcement learning or fuzzy logic-based control further illustrate the growing role of intelligent optimization (Dashtdar et al., 2022a). Smart residential demand-side management (DSM) models now incorporate EV charging coordination and advanced optimization algorithms to balance user comfort, energy cost, and grid stability (Panda et al., 2025). The advent of blockchain in microgrid interoperability and demand response coordination is also shaping next-generation control frameworks (Singh et al., 2025b). Integration of AI and blockchain for EV charging networks and smart grids is emerging as a secure, scalable solution for managing distributed loads and transactions (Singh et al., 2024b). Additionally, Cheetah-inspired optimization has been used to solve dynamic economic dispatch problems in integrated renewable systems (Nagarajan et al., 2024). Comprehensive surveys have consolidated insights on DSM strategies and market design frameworks to facilitate renewable penetration and load flexibility (Panda et al., 2023). Reviews of residential DSM models affirm the increasing interest in consumer-centric, AI-supported load-shaping strategies (Panda et al., 2022). 
On the control side, advanced sliding mode observers have enabled real-time maximum power point tracking in PV–battery systems (Dunna et al., 2024), and dipper-throated optimization has been used to fine-tune IoT-driven smart grid predictors (Alkanhel et al., 2024). Finally, recent reviews on blockchain-based energy systems (Ullah et al., 2024), arithmetic Harris Hawks optimization for DSM (Manzoor et al., 2023), and integrated techno-economic benchmarking of hybrid systems (Güven et al., 2025d) further validate the relevance of bio-inspired and AI-integrated optimization techniques in managing uncertainty and complexity in modern microgrids.

Green energy integration and policy context

The global shift toward renewable energy is being accelerated by climate commitments, carbon reduction targets, and the rising cost of fossil-fuel dependency. Policy frameworks worldwide emphasize the integration of green energy to achieve net-zero emissions and reduce greenhouse gas footprints. Recent reviews highlight that renewable–storage hybrids are critical for reliable, resilient, and environmentally sustainable power systems (Barakat et al., 2026). Furthermore, energy transition pathways increasingly stress the dual need for economic viability and policy-driven incentives, particularly in decentralized energy models (Güven and Rizk-Allah, 2025).

Advanced studies examine socio-economic and policy dimensions, including local adoption barriers and system-wide optimization for carbon-neutral grids (El-Khozondar et al., 2025; Güven and Rizk-Allah, 2025). Beyond macro-level frameworks, hybridized renewable energy systems (HRES) have emerged as key enablers of electrification in remote and underserved regions (El-Khozondar and El-batta, 2022). Sustainable integration requires both technical optimization and policy mechanisms to ensure adoption at scale, especially in developing countries where grid extension remains costly and impractical.

In addition to conventional hybrid battery configurations, recent research has emphasized the role of hydrogen storage and supercapacitors as complementary technologies within hybrid energy systems. Hydrogen-based storage provides long-term energy balancing, making it particularly suitable for addressing seasonal variability and ensuring resilience against extended periods of low renewable output. In contrast, supercapacitors offer very high power density and rapid charge–discharge cycling, enabling them to smooth short-term fluctuations and enhance transient stability. Comparative studies highlight that hybridizing batteries with hydrogen storage can extend system autonomy and reduce reliance on oversized battery banks, while coupling batteries with supercapacitors improves efficiency in handling peak loads and short bursts of demand. Such hybrid architectures have been shown to improve both the reliability and the lifecycle performance of renewable-integrated systems (Imbayah et al., 2024; Khaleel et al., 2024). While the present study focuses on hybrid battery systems because of their commercial maturity and proven cost-effectiveness in off-grid applications, hydrogen and supercapacitor integration remains a promising avenue for future work, particularly in large-scale or high-resilience energy systems.

Energy system optimization approaches

Energy system optimization has evolved from deterministic to probabilistic and, more recently, to AI-driven approaches (Sahoo et al., 2023b; Singh et al., 2024a). Deterministic models rely on average load and resource conditions, often underestimating stochastic variability (Güven and Poyraz, 2021). Stochastic programming addresses some of these gaps by incorporating uncertainty into load and generation profiles (Güven and Mete, 2021). However, these techniques become computationally expensive when applied to large-scale hybrid systems.

In contrast, metaheuristic algorithms have gained prominence due to their robustness in handling multi-objective, nonlinear, and mixed-integer problems. Algorithms such as Particle Swarm Optimization (PSO), Genetic Algorithms (GA), Grey Wolf Optimizer (GWO), and Differential Evolution (DE) have been widely applied in HRES planning and dispatch (Güven and Samy, 2022; Güven et al., 2022). Comparative studies confirm that metaheuristics outperform deterministic methods in convergence speed, scalability, and adaptability under uncertain renewable outputs (Guven et al., 2024).

Hybrid metaheuristic approaches, which combine multiple algorithms, show further improvements in avoiding local minima and capturing global optima (Güven et al., 2023). For instance, PSO–GA and DE–PSO hybrids demonstrate superior performance in cost minimization and reliability enhancement for microgrids (Güven and Yücel, 2023). Emerging work emphasizes the need for adaptive parameter tuning to balance exploration and exploitation, thereby enhancing optimization performance (Güven, 2024a).
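As an illustration of what adaptive parameter tuning can look like, the sketch below implements a global-best PSO whose inertia weight decreases linearly from w_max to w_min over the run: a large weight early on favors exploration, and a small weight later favors exploitation. This is one widely used scheme, chosen here only as an example; it is not necessarily the tuning used in this study, and the function name `pso_minimize` and all default values are assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 30,
    n_iters: int = 200,
    w_max: float = 0.9,
    w_min: float = 0.4,
    c1: float = 1.5,
    c2: float = 1.5,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Global-best PSO with a linearly decreasing inertia weight (illustrative)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for it in range(n_iters):
        # Adaptive inertia: w_max at the first iteration, w_min at the last.
        w = w_max - (w_max - w_min) * it / max(n_iters - 1, 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Clamp to the feasible box (e.g., non-negative component sizes).
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Usage on a simple test function (sphere, minimum 0 at the origin).
best, val = pso_minimize(lambda x: sum(xi * xi for xi in x), [(-5.0, 5.0)] * 2)
print(best, val)
```

In an HRES sizing problem, the decision vector would hold component sizes (for example PV area or battery capacity) and the objective would add penalty terms for violated reliability constraints; that coupling is an assumption made for illustration, not the formulation of this paper.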

Hybrid battery energy storage systems (HBESS)

Storage technologies form the backbone of hybrid energy systems, ensuring supply–demand balance, reliability, and resilience. Traditional reliance on single-chemistry batteries, such as lead–acid or lithium-ion, often results in trade-offs between cost, efficiency, and degradation. HBESS integrate complementary chemistries to optimize lifecycle costs and performance. For instance, Ni–Fe batteries offer robustness and long cycle life, while Li-ion provides high energy density; their combination enables both short- and long-term stability (Güven, 2025).

Recent works highlight that HBESS can reduce total system cost by over 20% compared to single-chemistry setups, while extending overall system lifetime (Güven et al., 2023a; Nassar et al., 2025). Similarly, second-life EV batteries combined with conventional storage can reduce waste and improve sustainability outcomes (Nassar et al., 2024). Comparative frameworks demonstrate that integrating hydrogen storage or supercapacitors alongside batteries further enhances flexibility for large-scale microgrids (Güven and Mengi, 2023; Güven and Yücel, 2025).

In addition to technical benefits, HBESS also align with circular economy and sustainability goals by reusing existing storage resources (El-Khozondar et al., 2023, 2024). Nonetheless, gaps remain in modeling degradation mechanisms, cycle-aging impacts, and long-term operational strategies.

Artificial intelligence and reinforcement learning

Artificial intelligence (AI), particularly reinforcement learning (RL), is transforming energy system management. RL provides adaptive, real-time decision-making capabilities that can outperform rule-based and deterministic approaches. Studies have demonstrated its applicability in microgrid dispatch, demand-side management (DSM), and energy storage scheduling (Güven et al., 2025b).

Recent works applying deep RL highlight improvements in system resilience, cost-effectiveness, and renewable utilization (Güven et al., 2024b, 2025c). For example, hybrid learning-based RL controllers effectively balance conflicting objectives in microgrid scheduling, achieving superior flexibility compared to metaheuristic-only approaches (Güven et al., 2025a). Furthermore, RL has been integrated with metaheuristic optimization to dynamically adjust control strategies, showing strong results in stochastic environments (Imbayah et al., 2025). However, despite promising results, real-world deployments remain scarce due to computational complexity, training requirements, and integration challenges.

Sustainability and emerging directions

Sustainability has become central to evaluating energy systems. Beyond minimizing cost, recent studies assess hybrid systems based on life-cycle cost analysis, CO₂ reduction potential, and resilience indices (Al-Najjar et al., 2020). Research also highlights the role of advanced materials and eco-friendly design strategies in improving the long-term performance of energy infrastructures.

Emerging work emphasizes multi-objective optimization frameworks that jointly consider economics, technical performance, and sustainability. For instance, multi-criteria approaches applied to hybrid microgrids address both resilience and emissions while ensuring affordability (Güven, 2024b; Güven et al., 2024a). By combining advanced optimization and policy alignment, these studies set the foundation for next-generation resilient and sustainable energy systems.

Research gaps

Despite these advances, several limitations persist. Many works rely on simplified tools, omit degradation effects, or underuse real-time energy management. While HBESS and AI methods show strong potential, practical deployment and resilience-focused optimization remain underexplored. This motivates the present study, which integrates advanced metaheuristics, hybrid battery modeling, and adaptive control strategies into a unified techno-economic optimization framework. The specific gaps are:

Limited exploration of hybrid storage architectures: Few studies jointly optimize multiple battery chemistries, or combinations of batteries with other storage technologies, in off-grid HRES. Most assume a single battery type or a generic HBESS.

Deterministic models: Many approaches use fixed demand and average conditions, often via HOMER or equivalent tools, without fully modeling stochastic renewable output and load variability.

Single-objective focus: Most sizing problems minimize cost or levelized cost of energy (LCOE) alone, without jointly considering reliability, emissions, and other metrics. Multi-objective and risk-aware planning (e.g., worst-case scenarios and resilience indices) remains relatively scarce.

Reliance on simplified simulations: Several studies use commercial tools and offline optimizers that may not capture dynamic control aspects. Few works integrate real-time EMS strategies, such as model predictive control and RL, with optimization, especially under uncertainty.

Incomplete accounting of degradation and costs: Battery aging and replacement costs are often omitted. Similarly, auxiliary costs such as inverter losses, maintenance and power quality issues are usually simplified.

To overcome these limitations, this study develops a comprehensive techno-economic optimization framework for designing an off-grid HRES integrated with an HBESS. The proposed system architecture includes PV, a wind turbine, a biomass generator, a diesel generator, and two complementary battery types, lithium-ion (Li-ion) and nickel–iron (Ni–Fe), configured to enhance system performance, extend battery lifetime, and reduce lifecycle costs. To accurately capture the dynamic behavior and aging effects of the storage components, detailed mathematical models are formulated for each generation and storage unit. These include temperature-adjusted PV output, Weibull-distributed wind energy estimation, calorific biomass conversion, and state-of-charge (SOC)-dependent battery degradation mechanisms. The models are integrated into an hourly dispatch framework that balances demand and generation under variable renewable output.
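To make the idea of an hourly dispatch framework concrete, here is a minimal sketch of a generic rule-based priority order: renewables serve the load first, surplus charges the battery, a deficit is covered by the battery and then by the diesel generator, and anything left is unmet load. It is a simplified baseline of the kind RL controllers are usually compared against, not the framework of this paper; the class and function names, the single aggregated battery, and the default efficiencies and SOC limits are all hypothetical, and the Li-ion/Ni–Fe split, biomass dispatch, and degradation are deliberately omitted.

```python
from dataclasses import dataclass


@dataclass
class Battery:
    """Single aggregated battery (hypothetical; no Li-ion/Ni-Fe split, no aging)."""
    capacity_kwh: float
    soc: float = 0.5        # state of charge as a fraction of capacity
    soc_min: float = 0.2
    soc_max: float = 1.0
    eta_ch: float = 0.95    # charging efficiency
    eta_dis: float = 0.95   # discharging efficiency


@dataclass
class HourResult:
    battery_kw: float   # > 0 discharging, < 0 charging
    diesel_kw: float
    unmet_kw: float
    dumped_kw: float


def dispatch_hour(renewable_kw: float, load_kw: float, batt: Battery,
                  diesel_max_kw: float, dt_h: float = 1.0) -> HourResult:
    """Priority order: renewables, then battery, then diesel; leftover deficit is unmet."""
    net_kw = renewable_kw - load_kw
    if net_kw >= 0.0:
        # Surplus: charge up to soc_max and dump whatever does not fit.
        room_kwh = max(0.0, (batt.soc_max - batt.soc) * batt.capacity_kwh)
        charge_kw = min(net_kw, room_kwh / (batt.eta_ch * dt_h))
        batt.soc += charge_kw * batt.eta_ch * dt_h / batt.capacity_kwh
        return HourResult(-charge_kw, 0.0, 0.0, net_kw - charge_kw)

    deficit_kw = -net_kw
    # Deficit: discharge down to soc_min, accounting for discharge losses.
    avail_kwh = max(0.0, (batt.soc - batt.soc_min) * batt.capacity_kwh)
    discharge_kw = min(deficit_kw, avail_kwh * batt.eta_dis / dt_h)
    batt.soc -= discharge_kw * dt_h / (batt.eta_dis * batt.capacity_kwh)
    deficit_kw -= discharge_kw
    # Diesel runs only when renewables and storage cannot cover the load.
    diesel_kw = min(deficit_kw, diesel_max_kw)
    return HourResult(discharge_kw, diesel_kw, deficit_kw - diesel_kw, 0.0)


# Three illustrative hours: surplus, battery plus diesel, then unmet load.
batt = Battery(capacity_kwh=10.0)
for ren, load in [(6.0, 4.0), (0.0, 5.0), (0.0, 5.0)]:
    print(dispatch_hour(ren, load, batt, diesel_max_kw=3.0), round(batt.soc, 3))
```

Summing `unmet_kw * dt_h` over a year of such calls gives the unmet energy needed for a reliability index such as LPSP.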

To solve the resulting nonlinear, multi-dimensional, mixed-integer optimization problem, we apply and benchmark a suite of metaheuristic algorithms, including PSO, GA, GWO, and hybrid variants, enhanced with adaptive parameter tuning to improve convergence and avoid local minima. The primary optimization objective is to minimize the total life cycle cost (LCC) while satisfying the load demand, storage constraints, and supply reliability criteria such as the loss of power supply probability (LPSP). The system is simulated over a one-year horizon using realistic meteorological and demand datasets. The HBESS configurations are evaluated in terms of net present cost (NPC), LCOE, reliability indices, and CO₂ emissions, and sensitivity analyses examine the impact of cost fluctuations and resource variability. The key contributions are:

Unlike conventional dual-battery studies that primarily address charge–discharge scheduling or short-term cost reduction, this work integrates multi-chemistry hybrid batteries with explicit degradation modeling and lifecycle cost assessment, ensuring long-term operational sustainability.

Detailed cost, efficiency, and degradation models are incorporated for each storage component, including the Li-ion and Ni–Fe batteries, and for all generation units.

In contrast to existing AI-based EMS approaches that rely on reinforcement learning or heuristic optimization in isolation, this study introduces a dual-layer optimization strategy that combines RL with adaptive metaheuristics, improving robustness and adaptability under uncertain renewable generation.

By integrating degradation-aware storage modeling with advanced optimization and control, the proposed framework enhances resilience, reliability, and sustainability in off-grid HRES beyond what is achieved by traditional dual-battery or single-method EMS designs.
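The evaluation metrics named earlier (LPSP, NPC, and LCOE) have widely used standard definitions: LPSP is total unmet energy divided by total demand, and LCOE is the NPC annualized with the capital recovery factor (CRF) and divided by the annual energy served. The sketch below implements only those textbook forms as an assumption for illustration; the paper's own formulations may include additional terms (for example replacement and degradation costs inside the NPC), and the function names are hypothetical.

```python
from typing import Sequence


def lpsp(unmet_kwh: Sequence[float], demand_kwh: Sequence[float]) -> float:
    """Loss of power supply probability: total unmet energy / total demand."""
    if len(unmet_kwh) != len(demand_kwh):
        raise ValueError("series must have the same length")
    total_demand = sum(demand_kwh)
    if total_demand <= 0.0:
        raise ValueError("total demand must be positive")
    return sum(unmet_kwh) / total_demand


def crf(i: float, n_years: int) -> float:
    """Capital recovery factor: converts a present value into equal annual payments."""
    if i == 0.0:
        return 1.0 / n_years
    g = (1.0 + i) ** n_years
    return i * g / (g - 1.0)


def lcoe(npc: float, i: float, n_years: int, annual_served_kwh: float) -> float:
    """Annualized total cost (NPC x CRF) divided by the annual energy served."""
    return npc * crf(i, n_years) / annual_served_kwh


# Illustrative numbers only, not results from this study.
print(round(crf(0.08, 20), 5))                           # 0.10185
print(round(lcoe(100_000.0, 0.08, 20, 50_000.0), 4))     # 0.2037
print(lpsp([0.0, 1.0, 0.0], [10.0, 10.0, 10.0]) < 0.01)  # False
```

The LPSP < 0.01 reliability target stated in the abstract would enter an optimizer as a constraint on `lpsp(...)`, typically through a penalty term.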

The remainder of this paper is organized as follows. The Methodology section presents the mathematical modeling of the hybrid energy system, the formulation of the optimization problem, and the implementation of the metaheuristic algorithms. The Case Study section details the site-specific parameters, input data, and scenario configurations used to evaluate system performance. Finally, the Conclusion section summarizes the key findings, highlights the practical implications of the results, and discusses directions for future research.

Methodology

The mathematical model presented here rigorously describes the dynamic behavior and techno-economic performance of an integrated hybrid renewable energy system (IHRES) configured with hybrid battery energy storage. Each subsystem, including energy generation, storage, load balancing, cost estimation, and intelligent dispatch, is represented by well-established and novel equations. These models capture real-world resource variability, component degradation, and control-strategy complexity, forming the foundation for the subsequent optimization and reinforcement learning-based control schemes, as shown in Figure 1.

Figure 1.

Research framework.

Renewable energy system modeling

PV power generation

The power output of a PV panel depends on its efficiency, its area, and the incident solar irradiance. To capture the effect of temperature on PV efficiency, the model includes a correction factor based on cell temperature:

P_{PV}(t) = \eta_{PV} \cdot A_{PV} \cdot G(t) \cdot \left[1 - \beta_{T} \left(T_{c}(t) - T_{ref}\right)\right]

(1)

where P_{PV}(t) is the power output of the PV array at time t, η_{PV} is the PV conversion efficiency at the reference temperature, A_{PV} is the total PV panel area, G(t) is the solar irradiance, T_{c}(t) is the cell temperature, T_{ref} is the reference cell temperature, and β_{T} is the temperature coefficient that adjusts PV efficiency for temperature effects. Because the actual cell temperature strongly affects power conversion efficiency, it is estimated from ambient conditions and solar irradiance using the nominal operating cell temperature (NOCT):

T_{c}(t) = T_{a}(t) + \left(\frac{\mathrm{NOCT} - 20}{800}\right) \cdot G(t)

(2)

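where T_{c}(t) is the PV cell temperature (°C) at time t, T_{a}(t) is the ambient temperature (°C), and G(t) is the solar irradiance (W/m²). NOCT is the cell temperature measured under nominal conditions of 800 W/m² irradiance and 20 °C ambient temperature, which is the origin of the constants 800 and 20 in Eq. (2).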
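Equations (1) and (2) chain together: the NOCT model gives the cell temperature, which then derates the PV output. The minimal sketch below evaluates both for one time step. The parameter values (18% efficiency, β_T = 0.004 per °C, T_ref = 25 °C, NOCT = 45 °C) are illustrative assumptions, not values from this study.

```python
def cell_temperature(t_amb_c: float, g_wm2: float, noct_c: float = 45.0) -> float:
    """Eq. (2): NOCT-based cell temperature in deg C.

    NOCT is defined at 800 W/m^2 irradiance and 20 deg C ambient, hence the
    constants 20 and 800.
    """
    return t_amb_c + (noct_c - 20.0) / 800.0 * g_wm2


def pv_power_w(
    g_wm2: float,
    t_amb_c: float,
    area_m2: float,
    eta_pv: float,
    beta_t: float = 0.004,  # per deg C; illustrative value, not from the paper
    t_ref_c: float = 25.0,  # reference cell temperature (assumed 25 deg C)
    noct_c: float = 45.0,   # illustrative NOCT, not from the paper
) -> float:
    """Eq. (1): temperature-corrected PV output in watts."""
    t_c = cell_temperature(t_amb_c, g_wm2, noct_c)
    return eta_pv * area_m2 * g_wm2 * (1.0 - beta_t * (t_c - t_ref_c))


# Example: 10 m^2 of 18%-efficient modules at 800 W/m^2 and 20 deg C ambient.
print(round(cell_temperature(20.0, 800.0), 2))        # 45.0
print(round(pv_power_w(800.0, 20.0, 10.0, 0.18), 1))  # 1324.8
```

With these numbers the cell runs 20 °C above the reference temperature, so the output is derated by 8% relative to the uncorrected 1440 W.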

Wind power generation

Wind energy modeling begins by characterizing the wind speed distribution with a Weibull probability density function, whose shape and scale parameters define the wind profile over time. This statistical model is the basis for estimating the expected turbine output under stochastic wind conditions:

f(v) = \frac{k}{c} \left(\frac{v}{c}\right)^{k - 1} \exp\left[-\left(\frac{v}{c}\right)^{k}\right]

(3)

here, $v$ is the wind speed, $k$ is the shape parameter, and $c$ is the scale parameter of the Weibull distribution. To estimate the energy harnessed from the wind, the turbine power curve $P_{WT}(v)$ is integrated over the Weibull distribution:

E_{WT} = \int_{v_{\text{cut-in}}}^{v_{\text{cut-out}}} P_{WT}(v) \, f(v) \, dv

(4)

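here, $E_{WT}$ is the expected wind turbine output, integrated over the wind speeds between the cut-in and cut-out speeds. Strictly, the integral yields the expected (mean) power; multiplying it by the length of the period considered gives the expected energy.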
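The integral in Eq. (4) has no closed form for a general power curve, so it is usually evaluated numerically. The sketch below applies the trapezoidal rule to the Weibull density of Eq. (3). The power curve (cubic rise from cut-in to an assumed rated speed of 12 m/s, then a constant, assumed rated output of 100 kW up to cut-out) is a hypothetical stand-in for a manufacturer curve; only the 3/25 m/s cut-in/cut-out speeds come from Table 2, and the function names are hypothetical.

```python
import math


def weibull_pdf(v: float, k: float, c: float) -> float:
    """Eq. (3): Weibull density of wind speed v (m/s) with shape k and scale c."""
    if v < 0.0:
        return 0.0
    return (k / c) * (v / c) ** (k - 1) * math.exp(-((v / c) ** k))


def power_curve_kw(v: float, p_rated_kw: float = 100.0, v_in: float = 3.0,
                   v_rated: float = 12.0, v_out: float = 25.0) -> float:
    """Hypothetical power curve: cubic ramp to rated power, zero outside [v_in, v_out)."""
    if v < v_in or v >= v_out:
        return 0.0
    if v < v_rated:
        return p_rated_kw * (v**3 - v_in**3) / (v_rated**3 - v_in**3)
    return p_rated_kw


def expected_power_kw(k: float, c: float, v_in: float = 3.0, v_out: float = 25.0,
                      n: int = 2000) -> float:
    """Eq. (4) by the trapezoidal rule: mean turbine output (kW) for a Weibull(k, c) regime."""
    h = (v_out - v_in) / n
    total = 0.0
    for i in range(n + 1):
        v = v_in + i * h
        weight = 0.5 if i in (0, n) else 1.0  # trapezoidal end-point weights
        total += weight * power_curve_kw(v, v_in=v_in, v_out=v_out) * weibull_pdf(v, k, c)
    return total * h


mean_kw = expected_power_kw(k=2.0, c=8.0)
annual_energy_kwh = mean_kw * 8760.0  # expected power times hours per year
```

Multiplying the mean power by 8760 h converts the expectation into an annual energy figure, which is the quantity that enters the energy balance and cost calculations.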

Biomass generator power

Biomass power output is modeled as a product of the generator's thermal efficiency, biomass calorific value, and feedstock flow rate. The conversion of bioenergy into usable electrical energy is captured based on fuel quality and availability.

P_{B M G} (t) = η_{B M G} \cdot Q_{b i o} \cdot m_{b i o} (t)

(5)

here, $P_{BMG}(t)$ is the power output of the biomass generator at time t, $η_{BMG}$ is its conversion efficiency, $Q_{bio}$ is the calorific value of the biomass, and $m_{bio}(t)$ is the biomass mass flow rate at time t.

Diesel generator power

The diesel generator's fuel consumption is approximated by a second-order polynomial function of output power. This reflects real-world performance, where efficiency varies non-linearly with the load factor.

F_{D G} (t) = a \cdot P_{D G} (t) + b \cdot P_{D G} {(t)}^{2}

(6)

here, $F_{DG}(t)$ is the fuel consumption of the diesel generator at time t, modeled as a quadratic function of the generator output $P_{DG}(t)$ with fuel-curve coefficients $a$ and $b$. To maintain the demand-supply balance, the diesel generator is activated only when renewable and stored energy cannot meet the load. This rule governs the dispatch schedule and ensures efficient integration of the backup unit:

P_{DG}(t) = \max\left(0, P_{load}(t) - P_{RE}(t) - P_{bat, dis}(t)\right)

(7)

here, $P_{DG}(t)$ is the power output of the diesel generator, determined by the load left unmet after accounting for renewable generation $P_{RE}(t)$ and battery discharge $P_{bat,dis}(t)$. The total renewable generation $P_{RE}(t)$ is the sum of the outputs of all renewable sources (solar, wind, and biomass):

P_{R E} (t) = P_{P V} (t) + P_{W T} (t) + P_{B M G} (t)

(8)
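Eqs. (6)–(8) combine into a simple per-step dispatch rule. In the sketch below, the fuel-curve coefficients a = 0.25 L/kWh and b = 1e-4 L/(kW²·h) are placeholders (the study does not report them), and the function names are hypothetical.

```python
def renewable_power_kw(p_pv: float, p_wt: float, p_bmg: float) -> float:
    """Eq. (8): total renewable generation."""
    return p_pv + p_wt + p_bmg


def diesel_dispatch_kw(p_load: float, p_re: float, p_bat_dis: float) -> float:
    """Eq. (7): the diesel unit covers only the residual load."""
    return max(0.0, p_load - p_re - p_bat_dis)


def diesel_fuel_l_per_h(p_dg_kw: float, a: float = 0.25, b: float = 1e-4) -> float:
    """Eq. (6): quadratic fuel curve; a and b are placeholder coefficients."""
    return a * p_dg_kw + b * p_dg_kw**2


p_re = renewable_power_kw(p_pv=30.0, p_wt=20.0, p_bmg=10.0)
p_dg = diesel_dispatch_kw(p_load=100.0, p_re=p_re, p_bat_dis=15.0)  # 25 kW residual
fuel = diesel_fuel_l_per_h(p_dg)
```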

Hybrid battery modeling

The hybrid battery bank, comprising lithium-ion and nickel-iron units, is modeled to capture dynamic charging, discharging, and degradation processes. The mathematical expressions describe the evolution of the SOC, the voltage characteristics, the depth of discharge (DoD), and degradation-induced cost, so that battery behavior under different dispatch scenarios is realistically represented. The SOC of each battery b evolves according to its charging and discharging power flows; the following update enforces energy conservation within the storage system and accounts for charge and discharge efficiencies:

S O C_{b} (t + 1) = S O C_{b} (t) + \frac{η_{b, c} P_{c h, b} (t) - \frac{P_{d i s, b} (t)}{η_{b, d}}}{C_{b}} Δ t

(9)

here, $P_{ch,b}(t)$ and $P_{dis,b}(t)$ are the charging and discharging powers of battery b, $η_{b,c}$ is the charging efficiency, $η_{b,d}$ is the discharging efficiency, $C_{b}$ is the capacity, and $Δt$ is the time step. Battery voltage is a function of SOC, internal resistance, and current; this relation is needed to estimate power output limits and to support voltage-based control strategies:

V_{b} (t) = V_{0} - k \cdot (1 - S O C_{b} (t)) + R \cdot I_{b} (t)

(10)

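here, the battery voltage $V_{b}(t)$ depends on the nominal voltage $V_{0}$, the SOC, the internal resistance $R$, and the current $I_{b}(t)$ (positive during charging), with $k$ a voltage coefficient. Battery wear has two components: calendar ageing, which depends on temperature and on the SOC at which the cell is held, and cycle ageing, which depends on the DoD of each cycle, i.e., the difference between successive SOC extrema. Calendar fade is modeled with an Arrhenius-type rate: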

\frac{d S O H_{cal}}{d t} = - k_{0} \exp (- \frac{E_{a}}{R T (t)}) [1 + α_{SOC} {(S O C (t) - S O C^{⋆})}^{2}]

(11)

here, $SOH_{cal}$ is the state of health attributed to calendar fade, $k_{0}$ is the pre-exponential factor, $E_{a}$ is the activation energy (J/mol), $R$ is the universal gas constant (not the internal resistance of Eq. (10)), $T(t)$ is the cell temperature (K), $SOC(t)$ is the instantaneous SOC, $SOC^{⋆}$ is the least-stress SOC (typically around 50%), and $α_{SOC}$ captures the sensitivity to SOC deviation. The degradation from charge-discharge cycling is evaluated using rainflow counting and Miner's cumulative damage rule. For each identified cycle i:

\begin{aligned} D_{cyc} &= \sum_{i} \frac{n_{i}}{N(DoD_{i}, T_{i}, C_{i})} \\ ΔQ_{cyc} &= Q_{nom} \cdot D_{cyc} \end{aligned}

(12)

here, $n_{i}$ is the number of cycles of depth $DoD_{i}$, $T_{i}$ is the operating temperature, $C_{i}$ is the C-rate, $N(\cdot)$ is the cycle life under the specified conditions, and $Q_{nom}$ is the nominal capacity. The functional form of $N$ is:

\ln N = a_{0} + a_{1} \ln (D o D) + a_{2} D o D + a_{3} (\frac{1}{T}) + a_{4} \ln (C)

(13)

here, $a_{0}, a_{1}, a_{2}, a_{3}, a_{4}$ are empirically fitted constants obtained from manufacturer data. The charge control decision $u_{b}(t)$ is chosen to minimize the degradation cost $C_{deg,b}(t)$ while accounting for the resulting change in SOC. Summing the discharge contributions of all batteries gives the total stored energy released into the system, which keeps the power-flow model consistent across the hybrid storage units:

P_{b a t, d i s} (t) = \sum_{b} P_{d i s, b} (t)

(14)

The total battery discharge power $P_{bat,dis}(t)$ is the sum of all battery discharges at time t. During charging, the net energy stored is the sum of all battery inputs adjusted for charging efficiency:

E_{b a t} (t) = \sum_{b} η_{b, c} \cdot P_{c h, b} (t) \cdot Δ t

(15)

Here, $E_{bat}(t)$ is the total energy stored during time step t. Ohmic losses due to internal battery resistance reduce the effective power available; this quadratic loss model is needed for accurate energy-flow and battery-heating analysis:

P_{l o s s, b} (t) = I_{b} {(t)}^{2} \cdot R_{b}

(16)

The losses due to the internal resistance $R_{b}$ of battery b are proportional to the square of the current $I_{b}(t)$.
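The cycle-ageing part of the model (Eqs. (12) and (13)) can be sketched as follows: a three-point rainflow counter in the style of ASTM E1049 extracts cycle depths from an SOC trace, and Miner's rule sums the damage using the cycle-life fit of Eq. (13). The coefficients a₀–a₄ are placeholders to be fitted to manufacturer data, temperature and C-rate are assumed constant across cycles for brevity, and all function names are hypothetical.

```python
import math


def turning_points(series):
    """Reduce a signal to its local extrema (peaks and valleys), dropping plateaus."""
    pts = [series[0]]
    for x in series[1:]:
        if len(pts) >= 2 and (pts[-1] - pts[-2]) * (x - pts[-1]) >= 0:
            pts[-1] = x  # still moving in the same direction: extend the current run
        elif x != pts[-1]:
            pts.append(x)
    return pts


def rainflow_cycles(series):
    """Three-point rainflow counting; returns (range, count) pairs with count 0.5 or 1.0."""
    stack, cycles = [], []
    for point in turning_points(series):
        stack.append(point)
        while len(stack) >= 3:
            x = abs(stack[-1] - stack[-2])
            y = abs(stack[-2] - stack[-3])
            if x < y:
                break
            if len(stack) == 3:
                cycles.append((y, 0.5))  # range includes the start of the record: half cycle
                stack.pop(0)
            else:
                cycles.append((y, 1.0))  # closed full cycle: remove its two points
                last = stack.pop()
                stack.pop()
                stack.pop()
                stack.append(last)
    # The remaining residue is counted as half cycles.
    cycles.extend((abs(b - a), 0.5) for a, b in zip(stack, stack[1:]))
    return cycles


def cycle_life(dod, temp_k, c_rate, coeffs):
    """Eq. (13): ln N = a0 + a1 ln(DoD) + a2 DoD + a3 / T + a4 ln(C); coeffs are placeholders."""
    a0, a1, a2, a3, a4 = coeffs
    return math.exp(a0 + a1 * math.log(dod) + a2 * dod + a3 / temp_k + a4 * math.log(c_rate))


def miner_damage(soc_trace, temp_k, c_rate, coeffs, q_nom_kwh):
    """Eq. (12): cumulative damage D_cyc and capacity loss Q_nom * D_cyc (constant T and C assumed)."""
    damage = sum(n / cycle_life(dod, temp_k, c_rate, coeffs)
                 for dod, n in rainflow_cycles(soc_trace) if dod > 0)
    return damage, q_nom_kwh * damage
```

Counting half cycles separately matters for battery SOC traces, which rarely return exactly to their starting value within a simulation window.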

Load balance and energy flow

A fundamental constraint of the microgrid is that supply must meet demand at every time step t. The sum of generation and battery discharge must therefore equal total consumption, including battery charging, the reverse osmosis (RO) water-treatment load, and losses:

P_{R E} (t) + P_{D G} (t) + P_{b a t, d i s} (t) = P_{l o a d} (t) + P_{b a t, c h} (t) + P_{R O} (t) + P_{l o s s} (t)

(17)

The energy generated by the renewable sources, the diesel generator, and the batteries must match the load demand, battery charging, RO power consumption, and losses. When the available supply falls short of demand, the resulting unmet load is:

P_{unmet}(t) = \max\left(0, P_{load}(t) - P_{supplied}(t)\right)

(18)

The unmet load is the positive part of the difference between the total load demand and the power supplied by all sources, $P_{supplied}(t)$. The LPSP quantifies system reliability as the fraction of total demand that remains unmet:

L P S P = \frac{\sum_{t = 1}^{T} P_{u n m e t} (t)}{\sum_{t = 1}^{T} P_{l o a d} (t)}

(19)

The LPSP can be interpreted as the probability that the power supply is insufficient to meet the load demand. Reverse osmosis units require a specific amount of energy per unit volume of water purified, so the RO power consumption is a function of the water production rate:

P_{R O} (t) = Q_{R O} (t) \cdot E_{R O}

(20)

The power required by the RO system is determined by the water flow rate $Q_{RO}(t)$ and the energy required per unit of water $E_{RO}$. Using the RO efficiency and flow rate, the water output at time t is:

W_{p r o d} (t) = Q_{R O} (t) \cdot η_{R O}

(21)

The amount of water produced is the product of the flow rate and the efficiency of the reverse osmosis process.
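The reliability metrics of Eqs. (18), (19), and (39) reduce to a few lines over hourly series. The sketch below is illustrative: the series values are invented for the example, and the function name is hypothetical.

```python
def reliability_metrics(load_kw, supplied_kw):
    """Eqs. (18), (19), (39): hourly unmet load, LPSP, and energy reliability index."""
    if len(load_kw) != len(supplied_kw):
        raise ValueError("load and supply series must have the same length")
    unmet = [max(0.0, p_l - p_s) for p_l, p_s in zip(load_kw, supplied_kw)]
    lpsp = sum(unmet) / sum(load_kw)
    return unmet, lpsp, 1.0 - lpsp


unmet, lpsp, e_rel = reliability_metrics([100.0, 100.0, 100.0, 100.0],
                                         [100.0, 90.0, 100.0, 80.0])
meets_target = lpsp < 0.01  # reliability constraint from Table 2
```

Because the LPSP is an energy ratio, a few large shortfalls and many small ones can yield the same value; inspecting the hourly `unmet` series alongside the scalar index helps distinguish the two.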

Objective optimization

The LCC is the total cost incurred over the system lifetime, discounted to present value. It includes capital, operation and maintenance (O&M), fuel, and replacement expenses and serves as the economic objective of the optimization framework:

LCC = C_{cap} + \sum_{t = 1}^{T} \frac{C_{O\&M}(t) + C_{fuel}(t) + C_{rep}(t)}{(1 + r)^{t}}

(22)

The total cost is the sum of the capital expenditure and the discounted operation and maintenance, fuel, and battery replacement costs, where r is the discount rate and t indexes the years of the project lifetime. The LCOE evaluates the average cost per unit of electricity delivered; dividing the total cost by the discounted energy output enables direct comparison with utility tariffs and other systems:

L C O E = \frac{L C C}{\sum_{t = 1}^{T} \frac{E_{d e l i v e r e d} (t)}{{(1 + r)}^{t}}}

(23)

The LCOE is the cost of producing one unit of electricity, averaged over the system lifetime. Diesel generator operation produces CO₂ emissions proportional to the energy generated and an emission factor. To quantify the environmental impact and support low-carbon design, emissions are computed as:

C O_{2} = \sum_{t = 1}^{T} ϕ_{D G} \cdot P_{D G} (t) \cdot Δ t

(24)

The total CO₂ emissions are calculated by multiplying the diesel generator energy output $P_{DG}(t)Δt$ by the emission factor $ϕ_{DG}$. The decision variables of the numerical optimization are:

x = [A_{P V}, A_{W T}, C_{L i}, C_{N i F e}, P_{B M G}, D G_{s i z e}, r_{s p l i t}]

(25)

The decision vector comprises the PV area, the wind turbine swept area, the Li-ion and Ni‒Fe battery capacities, the biomass generator rating, the diesel generator size, and the split ratio $r_{split}$. Because the study has three simultaneous objectives, the optimizer seeks a Pareto front of trade-off solutions balancing cost, emissions, and energy affordability:

min_{x} [f_{1} (x), f_{2} (x), f_{3} (x)] = [L C C, L C O E, C O_{2}]

(26)

Battery safety and longevity require the SOC to be maintained within prescribed bounds; this constraint is enforced throughout the simulation horizon for both battery types:

S O C_{\min} \leq S O C_{b} (t) \leq S O C_{m a x}

(27)

The SOC of each battery must lie within predefined bounds. Each power source and converter has a rated capacity, and the following constraint prevents overloads and guides correct sizing during optimization:

P_{flow, i} \leq P_{rated, i}, \quad \forall i \in \{PV, WT, INV, BMG, DG\}

(28)

The power generated by each source must not exceed its rated capacity. A policy-driven constraint is that at least 70% of total energy must come from renewables. This drives sustainability and aligns the model with emission reduction targets:

R_{share} = \frac{\sum_{t} P_{RE}(t)}{\sum_{t} P_{load}(t)} \geq 70\%

(29)
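The economic and environmental objectives of Eqs. (22)–(24) and the renewable-share constraint of Eq. (29) can be evaluated as below, assuming annual cash flows and delivered energies indexed t = 1, …, T years. The numbers in the example are illustrative only, and the function names are hypothetical.

```python
def present_value(values, r):
    """Discount a list of yearly values (year 1, 2, ...) at rate r."""
    return sum(v / (1.0 + r) ** t for t, v in enumerate(values, start=1))


def lcc(c_cap, c_om, c_fuel, c_rep, r):
    """Eq. (22): capital cost plus discounted O&M, fuel, and replacement costs."""
    yearly = [o + f + p for o, f, p in zip(c_om, c_fuel, c_rep)]
    return c_cap + present_value(yearly, r)


def lcoe(lcc_value, e_delivered, r):
    """Eq. (23): LCC divided by the discounted delivered energy."""
    return lcc_value / present_value(e_delivered, r)


def co2_kg(p_dg_kw, phi_kg_per_kwh, dt_h=1.0):
    """Eq. (24): emissions from diesel energy output and emission factor phi."""
    return sum(phi_kg_per_kwh * p * dt_h for p in p_dg_kw)


def renewable_share(p_re, p_load):
    """Eq. (29): renewable energy over load energy (constraint: >= 0.70)."""
    return sum(p_re) / sum(p_load)


total = lcc(1000.0, c_om=[50.0, 60.5], c_fuel=[40.0, 48.4], c_rep=[20.0, 12.1], r=0.10)
cost_per_kwh = lcoe(total, e_delivered=[1100.0, 1210.0], r=0.10)
```

Discounting the delivered energy in the LCOE denominator, as Eq. (23) does, keeps the metric consistent with the discounted costs in the numerator.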

Reinforcement learning (DQN)

The Deep Q-Network (DQN) agent observes the system via a state vector that includes SOCs, generation, load, and time. These inputs allow the neural network to learn an optimal policy under diverse scenarios:

s_{t} = [S O C_{L i}, S O C_{N i F e}, P_{R E} (t), P_{l o a d} (t), t]

(30)

The state vector contains the SOC of each battery, the renewable generation, the load, and the time step. The permissible actions, namely battery charge/discharge decisions and diesel dispatch, are:

a_{t} \in \{\text{charge}_{Li}, \text{discharge}_{Li}, \text{charge}_{NiFe}, \text{discharge}_{NiFe}, \text{run}_{DG}\}

(31)

The action space defines the actions available to the agent, such as charging or discharging either battery or running the diesel generator. DQN training uses the Bellman equation to update the expected return of each state-action pair, combining the immediate reward with discounted future estimates to converge toward an optimal policy:

Q(s_{t}, a_{t}) \leftarrow Q(s_{t}, a_{t}) + α \left[ r_{t} + γ \max_{a'} Q(s_{t + 1}, a') - Q(s_{t}, a_{t}) \right]

(32)

Here, α is the learning rate and γ is the discount factor. The agent is rewarded for minimizing operating costs, unmet load, and battery degradation; this multi-term reward encourages behavior that balances economy, reliability, and lifespan:

R_{t} = - α C_{op} (t) - β P_{unmet} (t) - γ \sum_{b} C_{\deg, b} (t)

(33)

The reward penalizes high operating costs, unmet load, and battery degradation; in Eq. (33), α, β, and γ are non-negative weighting coefficients, distinct from the learning rate and discount factor of Eq. (32). The loss function measures the difference between the predicted and target Q-values, and minimizing it improves the accuracy of the agent's decisions over time:

L(θ) = \mathbb{E}_{D} \left[ \left( r + γ \max_{a'} Q(s', a'; θ^{-}) - Q(s, a; θ) \right)^{2} \right]

(34)

The expectation is taken over mini-batches sampled from the experience replay buffer $D$, and $θ^{-}$ denotes the parameters of a periodically updated target network. The Q-network weights are updated by gradient descent on this loss, which enables the network to learn complex patterns in the system dynamics:

θ \leftarrow θ - η \nabla_{θ} L (θ)

(35)

The Q-network parameters are updated using gradient descent.
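The update rules of Eqs. (32) and (33) can be illustrated with a tabular stand-in for the Q-network, using the action set of Eq. (31), an ε-greedy policy, and a replay buffer of the size listed in Table 3. This is a simplification under stated assumptions: the study approximates Q with a neural network over continuous states and a target network, whereas the sketch assumes a discretized state index and no target network; the reward weights (1, 10, 1) and all names are hypothetical.

```python
import random
from collections import deque

ACTIONS = ["charge_Li", "discharge_Li", "charge_NiFe", "discharge_NiFe", "run_DG"]  # Eq. (31)


def reward(c_op, p_unmet, c_deg_per_battery, w_op=1.0, w_unmet=10.0, w_deg=1.0):
    """Eq. (33): weighted penalty on operating cost, unmet load, and degradation (weights assumed)."""
    return -w_op * c_op - w_unmet * p_unmet - w_deg * sum(c_deg_per_battery)


def q_update(q, s, a, r, s_next, lr=0.1, gamma=0.95):
    """Eq. (32): move Q(s, a) toward the TD target r + gamma * max_a' Q(s', a')."""
    target = r + gamma * max(q[s_next])
    q[s][a] += lr * (target - q[s][a])
    return q[s][a]


def epsilon_greedy(q_row, eps, rng):
    """Explore with probability eps, otherwise pick the action with the largest Q-value."""
    if rng.random() < eps:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


class ReplayBuffer:
    """Fixed-size experience replay (capacity 10,000 as in Table 3)."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size, rng):
        return rng.sample(list(self.buffer), batch_size)


rng = random.Random(0)
q = {0: [0.0] * len(ACTIONS), 1: [1.0] * len(ACTIONS)}  # two hypothetical discrete states
r = reward(c_op=2.0, p_unmet=1.0, c_deg_per_battery=[0.5, 0.5])  # -13.0
buf = ReplayBuffer()
buf.push(0, 2, r, 1)
for s, a, r_s, s_next in buf.sample(1, rng):
    q_update(q, s, a, r_s, s_next)
```

In the full DQN, the table lookup `q[s]` is replaced by a forward pass of the network, and the in-place update becomes the gradient step of Eq. (35) on the loss of Eq. (34).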

Sensitivity analysis

To test robustness, input parameters such as costs and fuel price are perturbed as:

p_{i}^{n e w} = p_{i}^{b a s e} \cdot (1 + δ)

(36)

Each parameter is perturbed by a small factor $δ$ to assess its impact on system performance. The sensitivity index, which measures the relative impact of a parameter change on the objective function, is:

S_{i} = \frac{f (p_{i} + δ) - f (p_{i})}{f (p_{i})} \cdot \frac{p_{i}}{δ}

(37)

The sensitivity index measures the effect of parameter changes on the system output. Writing $p_{\sim i}$ for the set of all parameters except $p_{i}$, the total-order Sobol index quantifies how much each uncertain parameter, including its interactions with the others, contributes to the total output variance:

S_{Ti} = 1 - \frac{\mathrm{Var}_{p_{\sim i}}\left(\mathbb{E}[y \mid p_{\sim i}]\right)}{\mathrm{Var}(y)}

(38)
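Eqs. (36)–(38) can be sketched as follows. With the relative perturbation of Eq. (36), the absolute step in Eq. (37) is δp_i, so the index becomes an elasticity. The total-order Sobol index of Eq. (38) is estimated here with the Jansen Monte Carlo estimator, a standard technique swapped in because the study does not name its estimator, with inputs assumed to be independent and uniform on [0, 1]. The toy objective functions and all names are hypothetical.

```python
import random


def oat_elasticity(f, params, name, delta=0.01):
    """Eqs. (36)-(37): one-at-a-time relative sensitivity of f to parameter `name`."""
    base = f(params)
    perturbed = dict(params)
    perturbed[name] = params[name] * (1.0 + delta)  # Eq. (36)
    return (f(perturbed) - base) / base / delta     # Eq. (37) with step delta * p_i


def sobol_total_jansen(f, dims, n=20_000, seed=0):
    """Eq. (38) via the Jansen estimator; inputs assumed i.i.d. uniform on [0, 1]."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(dims)] for _ in range(n)]
    b = [[rng.random() for _ in range(dims)] for _ in range(n)]
    f_a = [f(x) for x in a]
    f_all = f_a + [f(x) for x in b]
    mean = sum(f_all) / len(f_all)
    var = sum((y - mean) ** 2 for y in f_all) / (len(f_all) - 1)
    indices = []
    for i in range(dims):
        acc = 0.0
        for row_a, row_b, y_a in zip(a, b, f_a):
            mixed = list(row_a)
            mixed[i] = row_b[i]  # resample p_i only, keep p_~i fixed
            acc += (y_a - f(mixed)) ** 2
        indices.append(acc / (2 * n) / var)
    return indices


s_quad = oat_elasticity(lambda p: p["x"] ** 2, {"x": 3.0}, "x")   # 2 + delta for f = x^2
s_t = sobol_total_jansen(lambda x: x[0] + 2.0 * x[1], dims=2)     # exact values: [0.2, 0.8]
```

The one-at-a-time index is cheap but local; the Sobol index requires many more model evaluations but also captures interactions between uncertain parameters such as diesel price and interest rate.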

The reliability metric captures the percentage of demand that was successfully served. A higher value indicates better system availability:

E_{rel} = 1 - LPSP

(39)

The energy reliability index quantifies the system's ability to meet load demand. Autonomy is defined as:

A = \frac{E_{R E}}{E_{R E} + E_{D G}}

(40)

The system autonomy is the ratio of the renewable energy contribution to the total energy supplied. To capture minimum run-time behavior, binary variables representing start-up and shut-down decisions are introduced, leading to the following constraint:

P_{g}^{\min} u_{g, t} \leq P_{g, t} \leq P_{g}^{\max} u_{g, t}

(41)

here, $u_{g,t}$ denotes the on/off status of generator g, and $P_{g}^{\min}$ and $P_{g}^{\max}$ are its minimum and maximum output limits. The unit status evolves with the start-up and shut-down indicators $v_{g,t}$ and $w_{g,t}$:

u_{g, t} - u_{g, t - 1} = v_{g, t} - w_{g, t}

(42)

To prevent frequent cycling, minimum up-time $M_{g}^{↑}$ and down-time $M_{g}^{↓}$ requirements are imposed:

\begin{aligned} \sum_{τ = t}^{t + M_{g}^{↑} - 1} u_{g, τ} &\geq M_{g}^{↑} v_{g, t} \\ \sum_{τ = t}^{t + M_{g}^{↓} - 1} (1 - u_{g, τ}) &\geq M_{g}^{↓} w_{g, t} \end{aligned}

(43)

Furthermore, the generator output must respect the ramp-down and ramp-up limits $R_{g}^{↓}$ and $R_{g}^{↑}$:

- R_{g}^{↓} \leq P_{g, t} - P_{g, t - 1} \leq R_{g}^{↑}

(44)
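To make Eqs. (41)–(44) concrete, the sketch below checks a candidate on/off schedule and output profile of a single generator against the capacity, minimum up/down-time, and ramp constraints. Start-up and shut-down indicators are derived from Eq. (42) with the unit assumed to be off before t = 0, windows that run past the end of the horizon are assumed to be truncated, and the function name is hypothetical.

```python
def check_generator_schedule(u, p, p_min, p_max, m_up, m_down, r_up, r_down):
    """Return (constraint, t) pairs violated by on/off schedule u and output p (kW)."""
    horizon = len(u)
    violations = []
    for t in range(horizon):
        if not p_min * u[t] <= p[t] <= p_max * u[t]:                  # Eq. (41)
            violations.append(("capacity", t))
        prev = u[t - 1] if t > 0 else 0                                 # assumed off before t = 0
        start, stop = max(0, u[t] - prev), max(0, prev - u[t])          # v_{g,t}, w_{g,t}, Eq. (42)
        if start and sum(u[t:t + m_up]) < min(m_up, horizon - t):       # Eq. (43), up-time
            violations.append(("min_up", t))
        if stop and sum(1 - x for x in u[t:t + m_down]) < min(m_down, horizon - t):  # down-time
            violations.append(("min_down", t))
        if t > 0 and not -r_down <= p[t] - p[t - 1] <= r_up:            # Eq. (44)
            violations.append(("ramp", t))
    return violations
```

A metaheuristic can use such a checker as a penalty term, whereas a mixed-integer solver would impose the same relations directly as linear constraints on the binary variables.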

While generator dynamics are addressed above, the battery model also requires refinement. To improve realism, the SOC update includes a self-discharge (leakage) rate $σ_{b}$, together with the capacity $C_{b}$ and the efficiencies $η_{b,c}$ and $η_{b,d}$:

{SOC}_{b, t + 1} = (1 - σ_{b} Δ t) {SOC}_{b, t} + \frac{η_{b, c} P_{b, t}^{ch} Δ t}{C_{b}} - \frac{P_{b, t}^{dis} Δ t}{η_{b, d} C_{b}}

(45)
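A sketch of the refined SOC update of Eq. (45), which reduces to Eq. (9) when σ_b = 0, with the bounds of Eq. (27) checked at every step. The SOC limits are those of Table 2; the one-way efficiencies are assumptions obtained as the square root of representative round-trip values (about 0.90 for Li-ion and 0.80 for Ni‒Fe), and the names are hypothetical.

```python
def soc_next(soc, p_ch_kw, p_dis_kw, cap_kwh, eta_c, eta_d, sigma_per_h=0.0, dt_h=1.0,
             soc_min=0.0, soc_max=1.0):
    """Eq. (45) with the bounds of Eq. (27); raises if the step would violate them."""
    if p_ch_kw < 0 or p_dis_kw < 0 or (p_ch_kw > 0 and p_dis_kw > 0):
        raise ValueError("charge and discharge powers must be non-negative and not simultaneous")
    nxt = ((1.0 - sigma_per_h * dt_h) * soc          # self-discharge (leakage)
           + eta_c * p_ch_kw * dt_h / cap_kwh         # stored charging energy
           - p_dis_kw * dt_h / (eta_d * cap_kwh))     # energy drawn to deliver p_dis
    if not soc_min - 1e-12 <= nxt <= soc_max + 1e-12:
        raise ValueError(f"SOC {nxt:.4f} outside [{soc_min}, {soc_max}]")
    return nxt


# SOC limits from Table 2; one-way efficiencies are assumed (sqrt of round-trip values).
LI_ION = dict(eta_c=0.95, eta_d=0.95, soc_min=0.2, soc_max=0.9)
NI_FE = dict(eta_c=0.89, eta_d=0.89, soc_min=0.3, soc_max=0.8)
```

Raising an error rather than silently clipping makes infeasible dispatch decisions visible; a controller would instead reduce the requested power so that the bound is met exactly.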

Inverters often reduce their output capacity at high ambient temperatures. This effect is represented by a derating coefficient $α_{inv}$, a threshold temperature $T_{thr}$, and the ambient temperature $T_{amb}(t)$:

P_{inv}^{\max} (t) = P_{inv}^{rated} [1 - α_{inv} \max (0, T_{amb} (t) - T_{thr})]

(46)

The total AC output from all connected sources must then remain within this temperature-adjusted limit:

\sum_{i \in \text{AC sources}} P_{i, t}^{AC} \leq P_{inv}^{\max}(t)

(47)
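Eqs. (46) and (47) can be sketched as a derated limit followed by a curtailment step that also yields the spilled power used in the extended reward of Eq. (51). The derating coefficient (1% per °C) and the 40 °C threshold are assumed values, the proportional curtailment rule is a hypothetical choice, and the function names are hypothetical.

```python
def inverter_limit_kw(p_rated_kw, t_amb_c, alpha_inv=0.01, t_thr_c=40.0):
    """Eq. (46): temperature-derated inverter capacity (alpha_inv and T_thr are assumed)."""
    return max(0.0, p_rated_kw * (1.0 - alpha_inv * max(0.0, t_amb_c - t_thr_c)))


def enforce_ac_limit(ac_outputs_kw, p_rated_kw, t_amb_c, **derating):
    """Eq. (47): scale AC injections down proportionally; return them with the spilled power."""
    limit = inverter_limit_kw(p_rated_kw, t_amb_c, **derating)
    total = sum(ac_outputs_kw)
    if total <= limit:
        return list(ac_outputs_kw), 0.0
    scale = limit / total  # hypothetical proportional curtailment rule
    return [p * scale for p in ac_outputs_kw], total - limit
```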

In addition to derating, scheduled maintenance can temporarily reduce system availability. This is modeled using a binary availability parameter $A_{i,t}$, which restricts the maximum output:

0 \leq P_{i, t} \leq A_{i, t} P_{i, t}^{\max}

(48)

To ensure that each unit observes its required maintenance downtime $D_{i}$, the availability is further constrained as:

\sum_{τ = t}^{t + D_{i} - 1} (1 - A_{i, τ}) = D_{i} \forall i

(49)

Since these additional constraints influence dispatch and decision-making, they must also be embedded into the energy management system. This requires expanding the state vector to include ambient temperature, inverter derating, availability, and generator timers:

s_{t} = [{SOC}_{L i, t}, {SOC}_{N i F e, t}, P_{R E} (t), P_{load} (t), t, T_{amb} (t), P_{inv}^{\max} (t), A_{i, t}, τ_{g, t}^{↑ / ↓}]

(50)

The reward function is extended with penalties for generator start-up costs $C_{g}^{start}$, inverter curtailment, and unmet demand, alongside the original operating and degradation costs:

R_{t} = - α C_{op} (t) - β P_{unmet} (t) - γ C_{\deg} (t) - δ \sum_{g} C_{g}^{start} v_{g, t} - ζ {Spill}_{t}^{inv}

(51)

Table 1 summarizes the hyperparameters and configuration settings of the four metaheuristic algorithms (PSO, GWO, GA, and SSA) and of the DQN agent. It lists essential hyperparameters such as population size, number of iterations, and algorithm-specific constants such as the inertia weight and learning rate. It also reports the parameter adaptation strategies (linearly decreasing inertia for PSO, dynamic inertia for SSA), the initialization methods (random uniform for most algorithms, Latin hypercube sampling for SSA), and distinctive operational features such as the social hierarchy in GWO and experience replay in DQN. This breakdown supports the reproducibility and fairness of the benchmarking and facilitates future comparisons.

Table 1.

Metaheuristic algorithm hyperparameters and configuration settings.

Algorithm	Population size	Iterations	Key parameters	Parameter adaptation	Initialization strategy
PSO	50	200	ω = 0.7, c₁ = 1.5, c₂ = 1.5	Linearly decreasing inertia	Random uniform
GWO	30	150	Control parameter a decreased linearly from 2 to 0; α–β–δ hierarchy	Linear control parameter a	Random uniform
GA	40	250	Crossover rate = 0.8, Mutation rate = 0.02	Fixed	Elite + Random selection
SSA	30	200	inertia = 1	Dynamic inertia adjustment	Latin hypercube sampling
DQN	N/A	500 episodes	Learning rate = 0.001, γ = 0.95, ε-decay = 0.995	Experience replay with target network	Random policy initialization
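Because SSA gave the best results in this study, a compact sketch of the Salp Swarm Algorithm may help readers reproduce the benchmark. It follows the standard leader/follower update with the coefficient c₁ = 2exp(−(4l/L)²), treats the first half of the chain as leaders, and uses random-uniform initialization; it does not include the dynamic inertia adjustment or the Latin hypercube initialization listed in Table 1, and the sphere test function is a hypothetical stand-in for the LCC, LCOE, and CO₂ objectives.

```python
import math
import random


def ssa_minimize(f, lb, ub, n_salps=30, iters=200, seed=0):
    """Salp Swarm Algorithm (standard form) for box-constrained minimization."""
    rng = random.Random(seed)
    dim = len(lb)
    salps = [[rng.uniform(lb[j], ub[j]) for j in range(dim)] for _ in range(n_salps)]
    fitness = [f(s) for s in salps]
    best = min(range(n_salps), key=fitness.__getitem__)
    food, food_fit = list(salps[best]), fitness[best]  # best solution found so far
    for it in range(1, iters + 1):
        c1 = 2.0 * math.exp(-((4.0 * it / iters) ** 2))  # shifts from exploration to exploitation
        for i in range(n_salps):
            if i < n_salps // 2:
                # Leaders sample around the food source with a shrinking step.
                for j in range(dim):
                    step = c1 * ((ub[j] - lb[j]) * rng.random() + lb[j])
                    salps[i][j] = food[j] + step if rng.random() < 0.5 else food[j] - step
            else:
                # Followers move to the midpoint between themselves and the salp ahead.
                salps[i] = [(x + y) / 2.0 for x, y in zip(salps[i], salps[i - 1])]
            salps[i] = [min(max(x, lb[j]), ub[j]) for j, x in enumerate(salps[i])]
        for s in salps:
            fit = f(s)
            if fit < food_fit:
                food, food_fit = list(s), fit
    return food, food_fit


best_x, best_f = ssa_minimize(lambda x: sum(v * v for v in x), [-5.0] * 5, [5.0] * 5)
```

In a sizing study, `f` would simulate the system for a candidate decision vector x of Eq. (25) and return a scalarized or penalized objective, so each function evaluation is far more expensive than in this toy example.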

Table 2 consolidates the essential parameters of the proposed microgrid optimization framework. It brings together the technical specifications of the generation units (PV, wind, biomass, and diesel generators), the characteristics of the hybrid battery system (Li-ion and Ni‒Fe technologies), and economic assumptions such as capital, O&M, and replacement costs and the discount rate. System-level constraints related to reliability, renewable energy penetration, and DSM scenarios are also summarized. Presenting these parameters in a single table improves readability, ensures transparency, and facilitates reproducibility by clearly stating the baseline assumptions and optimization boundaries.

Table 2.

Summary of key technical, economic, and system parameters adopted in the microgrid optimization model.

Category	Parameter	Symbol/unit	Value/range
Photovoltaic	PV module efficiency	$η_{PV}$ (%)	∼15–20
	PV panel area	$A_{PV}$ (m²)	Decision variable
	Nominal operating cell temp.	NOCT (°C)	Given by datasheet
	Solar irradiance	$G(t)$ (W/m²)	Hourly dataset
Wind turbine	Swept area	$A_{WT}$ (m²)	Decision variable
	Air density	$ρ$ (kg/m³)	1.225
	Weibull shape parameter	$k$ (–)	Site-specific
	Weibull scale parameter	$c$ (m/s)	Site-specific
	Cut-in/cut-out speed	$v_{\text{cut-in}}$, $v_{\text{cut-out}}$ (m/s)	3/25
Biomass generator	Efficiency	$η_{BMG}$ (%)	25–35
	Biomass calorific value	$Q_{bio}$ (MJ/kg)	15–18
	Mass flow of biomass	$m_{bio}$ (kg/s)	Site-specific
Diesel generator	Fuel consumption	$F_{DG}$ (L/kWh)	Quadratic model
	Rated capacity	$P_{DG}$ (kW)	Decision variable
	Diesel fuel price	– (USD/L)	Sensitivity study
Hybrid battery (HBESS)	Battery types	–	Li-ion and Ni‒Fe
	Capacity	$C_{b}$ (kWh)	Decision variable
	SOC limits	$SOC_{min}$, $SOC_{max}$	0.2–0.9 (Li-ion), 0.3–0.8 (Ni‒Fe)
	Round-trip efficiency	$η$ (%)	90–95 (Li-ion), ∼80 (Ni‒Fe)
	Cycle life	$N_{cycle}$	3000–5000 (Li-ion), >10,000 (Ni‒Fe)
	Replacement cost	$C_{rep}$ (USD/kWh)	Case study assumption
Economic parameters	Discount rate	$r$ (%)	8–10
	O&M costs	$C_{O\&M}$	Technology-specific
	Capital costs	$\hat{C}$	Technology-specific
System constraints	Reliability	LPSP	<0.01
	Renewable share	$R_{share}$ (%)	≥70
	Simulation horizon	–	1 year (hourly resolution)
	Demand-side management	DSM scenarios	Base/moderate/aggressive

Table 3 summarizes the architecture and training parameters of the DQN employed in this study. The network consists of a five-dimensional input layer representing the state variables, two fully connected hidden layers with 128 and 64 neurons respectively using ReLU activation to capture nonlinear dynamics, and an output layer with five nodes corresponding to the discrete action space. The training was performed using the Adam optimizer with a learning rate of 0.001 and a discount factor of 0.95 to balance short- and long-term rewards. An ε-greedy strategy was adopted to ensure exploration during early training, with ε decaying gradually from 1.0 to 0.01. To stabilize learning, a replay memory buffer of 10,000 experiences with a batch size of 64 was used, and a target network was updated every 200 steps. Training was carried out for 5000 episodes using the PyTorch framework, with convergence observed in terms of stable reward progression and Q-value approximation.

Table 3.

DQN architecture and training parameters.

Component	Configuration
Input layer	5 nodes (SOC of Li-ion, SOC of Ni‒Fe, renewable generation, load demand, time step)
Hidden layer 1	128 neurons, fully connected, ReLU activation
Hidden layer 2	64 neurons, fully connected, ReLU activation
Output layer	5 nodes (Q-values for actions: charge/discharge Li-ion, charge/discharge Ni‒Fe, diesel dispatch)
Optimizer	Adam, learning rate = 0.001
Discount factor (γ)	0.95
Exploration strategy	ε-greedy, ε decays from 1.0 → 0.01 over training episodes
Replay memory	10,000 experiences, batch size = 64
Target network	Updated every 200 steps
Training episodes	5000 episodes until convergence
Framework	PyTorch (Python)
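The exploration schedule can be checked with a short calculation. Assuming that the multiplicative decay factor 0.995 from Table 1 is applied once per episode, starting from ε = 1.0 with a floor of 0.01 as in Table 3, the sketch below computes how many episodes pass before exploration reaches the floor; the function names are hypothetical.

```python
def epsilon_at(episode, eps_start=1.0, decay=0.995, eps_min=0.01):
    """Multiplicative epsilon decay with a floor (decay assumed to be applied once per episode)."""
    return max(eps_min, eps_start * decay**episode)


def episodes_to_floor(eps_start=1.0, decay=0.995, eps_min=0.01):
    """Number of decay steps needed before epsilon falls to the floor."""
    eps, episode = eps_start, 0
    while eps > eps_min:
        eps *= decay
        episode += 1
    return episode
```

Under these assumptions the floor is reached after roughly 900 episodes, so most of the 5000 training episodes in Table 3 would run with nearly greedy action selection.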

Case study

The case study is based on experimental battery datasets recorded at several representative operating temperatures. These datasets, derived from laboratory characterization, capture key resistance and capacitance parameters as a function of SOC. Unlike meteorological case studies tied to a specific geographic region, this study focuses on the battery performance domain, which makes the findings broadly applicable to diverse microgrid and renewable integration contexts. The baseline microgrid system shown in Figure 2 operates under a conventional rule-based control strategy. Renewable power from the PV array and wind turbine is first directed to the energy system bus and given priority in meeting the critical load. A single lithium-ion battery provides storage and is managed by a rule-based controller that enforces SOC limits of 20–90%, prioritizes charging from renewables, and permits discharge only when the SOC is above 30%. If renewable generation and battery discharge are insufficient, the diesel generator supplies the remaining demand in load-following mode, operating at no less than 30% of its rated capacity. When renewable output exceeds the combined demand and charging capability, the excess power is curtailed.

Figure 2.

Baseline microgrid system configuration with rule-based control.
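The baseline rule-based controller can be expressed as a single dispatch step. The SOC limits (20–90%), the 30% discharge threshold, and the 30% diesel minimum load come from the description above; the 0.95 charge and discharge efficiencies, the reading of the threshold as a condition for starting discharge (after which the SOC may fall to 20%), and the treatment of diesel output forced by the minimum-load rule (dumped rather than stored) are assumptions, and the function name is hypothetical.

```python
def rule_based_step(p_re, p_load, soc, cap_kwh, p_dg_rated, dt_h=1.0,
                    eta_c=0.95, eta_d=0.95, soc_min=0.2, soc_max=0.9,
                    soc_dis_threshold=0.3, dg_min_frac=0.3):
    """One step of the baseline controller; powers in kW, energy in kWh (efficiencies assumed)."""
    out = dict(p_ch=0.0, p_dis=0.0, p_dg=0.0, curtailed=0.0, dumped=0.0, unmet=0.0)
    net = p_re - p_load
    if net >= 0:
        # Surplus renewables charge the battery up to soc_max; the rest is curtailed.
        headroom_kw = max(0.0, (soc_max - soc) * cap_kwh / (eta_c * dt_h))
        out["p_ch"] = min(net, headroom_kw)
        out["curtailed"] = net - out["p_ch"]
    else:
        deficit = -net
        if soc > soc_dis_threshold:
            # Discharge is allowed only above the threshold and never below soc_min.
            available_kw = max(0.0, (soc - soc_min) * cap_kwh * eta_d / dt_h)
            out["p_dis"] = min(deficit, available_kw)
            deficit -= out["p_dis"]
        if deficit > 0:
            # Load-following diesel, but never below its minimum loading (assumed dumped excess).
            p_dg = min(max(deficit, dg_min_frac * p_dg_rated), p_dg_rated)
            out["p_dg"] = p_dg
            out["dumped"] = max(0.0, p_dg - deficit)
            out["unmet"] = max(0.0, deficit - p_dg)
    soc_new = soc + (eta_c * out["p_ch"] - out["p_dis"] / eta_d) * dt_h / cap_kwh
    return out, soc_new
```

Running this step over an hourly year produces the static-dispatch reference against which the DQN controller's SOC profiles and costs can be compared.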

Figure 3 shows the hourly energy balance of the IHRES under three DSM scenarios: Base, Moderate DSM, and Aggressive DSM. In the Base scenario, the peak PV output coincides with the midday load, resulting in significant battery charging. During hours of low PV output, such as early morning and evening, battery discharge compensates for the shortfall. Biomass and wind provide steady support throughout the day, while diesel usage remains minimal, highlighting the system's ability to rely primarily on renewables. Under Moderate DSM, temporal load shifting flattens the load curve $L(t)$ by moving non-critical loads to periods of higher renewable generation $G_{RE}(t) = P_{PV}(t) + P_{Wind}(t) + P_{Biomass}(t)$, thus reducing the mismatch $ΔP(t) = L(t) - G_{RE}(t)$. This reduces reliance on storage and improves the alignment between generation and consumption. The Aggressive DSM scenario strengthens this alignment further by reshaping the load to closely follow the renewable supply curve. Battery charging during the PV peak becomes deeper and the discharge window narrows, reducing energy losses and battery cycling stress. The result is a lower net power deviation $\sum_{t = 1}^{24} |ΔP(t)|$ and higher system efficiency. The LPSP is expected to be minimal because the system operates close to energy autonomy, $α = \frac{\sum P_{RE}(t)}{\sum L(t)} \approx 1$.

Figure 3.

Energy balance analysis.
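The mismatch metric and autonomy ratio used in this discussion are straightforward to compute. The sketch also includes a simple greedy load-shifting rule so that the effect of DSM on Σ|ΔP(t)| can be reproduced on any profile; the rule (move up to a fraction `flex` of each deficit hour's load into the hours with the largest renewable surplus) is a hypothetical illustration, not the DSM scheme used in the study, and all names are hypothetical.

```python
def mismatch(load, gen):
    """Net power deviation: sum over t of |L(t) - G_RE(t)|."""
    return sum(abs(x - g) for x, g in zip(load, gen))


def autonomy(load, gen):
    """Energy autonomy alpha = sum G_RE / sum L."""
    return sum(gen) / sum(load)


def shift_load(load, gen, flex=0.2):
    """Hypothetical greedy, energy-conserving shift of flexible load from deficit to surplus hours."""
    shifted = list(load)
    hours = range(len(load))
    deficit_hours = [t for t in hours if load[t] > gen[t]]
    surplus_hours = sorted((t for t in hours if gen[t] > load[t]),
                           key=lambda t: shifted[t] - gen[t])  # largest surplus first
    for t in deficit_hours:
        movable = min(flex * load[t], load[t] - gen[t])
        for s in surplus_hours:
            if movable <= 0:
                break
            room = gen[s] - shifted[s]
            if room <= 0:
                continue
            moved = min(room, movable)
            shifted[s] += moved
            shifted[t] -= moved
            movable -= moved
    return shifted
```

Because the shift conserves total energy, the autonomy ratio is unchanged while the mismatch decreases, which mirrors the distinction drawn above between better alignment and greater renewable supply.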

Figure 4 shows the system performance under an extreme-event stress test combining calm wind, cloudy solar conditions, and a mid-episode storm outage. The electrical load maintains a strong daily cycle between about 400 and 480 kW. Renewable generation from wind and solar is visibly suppressed compared with typical variability, with wind output frequently near zero and solar output attenuated by heavy cloud cover. During the storm period (days 5–6), both wind and solar generation are drastically curtailed, highlighting the severity of the scenario. To maintain supply, diesel generation ramps up significantly during renewable deficits, demonstrating its role as a backup resource. Meanwhile, the battery SOC fluctuates between 35% and 90% but remains above the reserve threshold, indicating that the energy management system enforces a stricter SOC floor under stress. Overall, despite prolonged renewable scarcity and a storm-induced outage, the system preserves reliability by coordinating storage and diesel dispatch.

Figure 4.

Dispatch and SOC during a calm and cloudy stress scenario with storm outage, showing diesel backup and SOC reserves maintain reliability under prolonged renewable scarcity.

Figure 5 shows the representative load scenarios used to evaluate the robustness of the proposed EMS. Figure 5(a) presents high-demand event profiles, including heatwave, cold-snap, and festival/recovery days. These scenarios introduce significant deviations from normal demand, such as sharp midday and evening peaks during heatwaves, extended high evening loads during cold snaps, and atypical multi-peak structures on festival days. The DSM variants (Base, Moderate, Aggressive) progressively reduce the peak magnitude by shifting load into PV-rich midday hours, thereby alleviating evening demand stress. Figure 5(b) compares weekday and weekend patterns: weekday loads exhibit stronger evening peaks associated with the return from work, while weekend profiles show flatter daytime consumption. Again, DSM reshapes the profiles by reducing evening peaks and distributing demand more evenly across the day. Together, the two panels illustrate the range of demand conditions under which the EMS is tested and the effectiveness of DSM in mitigating peak stress and enhancing operational flexibility.

Figure 5.

Representative load scenarios under DSM: (a) high-demand events and (b) weekly patterns.

Figure 6 shows the SOC profiles of the Li-ion and Ni‒Fe batteries over one year, comparing conventional static dispatch with the DQN-based RL approach. In the upper subplot, the Li-ion SOC under static dispatch oscillates aggressively, regularly approaching both the upper and lower bounds of the allowable SOC range (0.2 to 0.9). This behavior leads to frequent deep cycling, which is known to accelerate capacity fade and increase the battery degradation cost, $C_{\deg} \propto N_{cycles}$. The DQN-controlled dispatch produces smoother SOC modulation centered within a narrower operating window. This reduces the average DoD, prolonging battery life and reducing replacement frequency. The lower subplot shows similar trends for the Ni‒Fe battery. The static strategy produces wide SOC fluctuations, including frequent excursions below 30% SOC. This is problematic for Ni‒Fe chemistry, which is sensitive to electrolyte imbalance and efficiency loss at deep discharge levels. The DQN-managed profile, in contrast, operates the battery more conservatively, keeping the SOC between approximately 40% and 80%, which maximizes round-trip efficiency and minimizes wear.

Figure 6.

SOC profiles of the Li-ion and Ni‒Fe batteries.

Figure 7 shows the convergence trajectories of four metaheuristic algorithms (SSA, GA, PSO, and GWO) in terms of the normalized objective value $f_{n}^{(k)} = f^{(k)}/f^{(0)} \in [0, 1]$. SSA converges fastest, reaching $f_{n}^{(100)} = 0.036$, closely followed by GWO with $f_{n}^{(100)} = 0.042$. GA and PSO converge more slowly, settling at $f_{n}^{(100)} = 0.051$ and $f_{n}^{(100)} = 0.059$, respectively. SSA also leads in the early iterations, reaching $f_{n}^{(20)} < 0.48$, compared with $f_{n}^{(20)} < 0.51$ for GWO, $f_{n}^{(20)} < 0.56$ for GA, and $f_{n}^{(20)} < 0.61$ for PSO. The average convergence rate $r = (f^{(0)} - f^{(k)})/k$ is highest for SSA over $k = 1, \dots, 50$, indicating superior exploitation capability. Beyond iteration 75, all algorithms show stable convergence with negligible oscillations, suggesting that each has settled near an optimum.

Figure 7.

Convergence curves of optimization algorithms.

Figure 8 shows the multi-objective optimization results in terms of LCOE, CO₂ emissions, and LPSP. The Pareto results reveal clear trade-offs between economic and environmental objectives. Solutions that minimize LCOE tend to rely more heavily on dispatchable backup sources, which can lead to higher CO₂ emissions. Conversely, solutions emphasizing emission reduction often require larger renewable generation and storage capacities, which increase capital investment and slightly raise LCOE. The Pareto fronts illustrate these dynamics: SSA-generated solutions cluster toward balanced outcomes with relatively low LCOE (0.32–0.35 USD/kWh) and minimal LPSP (<0.01), while GA and PSO solutions show greater dispersion and, in some cases, higher emissions (Makhzom et al., 2023). Similarly, the LCOE-emission relationship indicates that deeper renewable penetration not only reduces emissions but also stabilizes long-term system costs by distributing expenditures across a larger share of clean energy generation. These results underscore the practical value of multi-objective optimization: decision-makers can select solutions along the Pareto frontier depending on whether cost minimization, emission reduction, or system reliability is prioritized. This is consistent with recent studies that emphasize Pareto-based energy planning as a tool for balancing affordability, sustainability, and resilience in renewable-dominated systems.

Figure 8.

Multi-objective optimization.
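Selecting candidates along the Pareto frontier relies on a non-dominance test. The sketch below filters solutions evaluated on the three minimization objectives of Figure 8 (LCOE, CO₂, LPSP); the sample designs are made up for illustration, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the non-dominated points using a simple quadratic-time filter."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# (LCOE in USD/kWh, CO2 in t/yr, LPSP) for four hypothetical designs
designs = [(0.32, 40.0, 0.008), (0.35, 25.0, 0.005), (0.36, 30.0, 0.009), (0.40, 45.0, 0.012)]
front = pareto_front(designs)
```

Here the third design is dominated by the second and the fourth by the first, so only the first two remain as genuine trade-off options between cost and emissions.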

Table 4 compares the scalability characteristics of the investigated algorithms. Population-based heuristics such as SSA and PSO converge rapidly on small- to medium-scale problems and have relatively low training overhead. However, their computational burden grows as the number of decision variables and the problem dimensionality increase, which limits their efficiency for larger microgrids or extended horizons. GA converges moderately but shows a steep increase in execution time with problem size, making it the least efficient option for high-dimensional planning problems. In contrast, DQN emerges as the most scalable framework: although it requires significant training effort and computational resources upfront, its post-training performance is superior across multiple dimensions. Specifically, DQN exhibits very high convergence reliability, manageable growth in execution time, and excellent real-time suitability, because decision-making during inference is nearly instantaneous and independent of microgrid size. This distinction underscores a key trade-off: classical heuristics may be adequate for short-term or localized optimization tasks, whereas reinforcement learning-based methods such as DQN offer a more robust and future-ready solution for large-scale, long-horizon microgrid applications.

Table 4.

Comparative scalability assessment of optimization algorithms.

Algorithm	Convergence speed	Execution time growth	Convergence reliability	Training overhead	Real-time suitability	Scalability in large systems
SSA	Fast	Moderate increase	High	Low	Medium	Limited beyond medium scale
PSO	Fast	High increase	High	Low	Medium	Moderate scalability
GA	Moderate	High increase	Medium	Moderate	Low	Less efficient at large scale
DQN	Slow	Manageable increase	Excellent	High	Excellent	Excellent once trained

Figure 9 shows the impact of increasing system complexity on the solution quality of the tested metaheuristic optimizers. The left panel shows performance trends when the number of microgrid clusters is scaled up to three times the baseline configuration, while the right panel presents results for planning horizons extended up to three years. For larger cluster sizes, SSA exhibits a steady increase in relative cost compared with the baseline, indicating that its solution quality deteriorates as the system expands. PSO is more sensitive, with a sharp decline at twofold scaling followed by a partial recovery at threefold scaling, suggesting instability in larger search spaces. In contrast, GA maintains near-constant solution quality across all cluster scales, reflecting greater robustness. When the horizon is extended, SSA shows consistent performance degradation, while PSO maintains quality up to two years but drops significantly at three years. GA again proves the most stable, with solution values remaining close to the baseline throughout. These results highlight the trade-off between exploration ability and computational stability: SSA and PSO may perform better in smaller systems but face challenges in long-horizon or larger-scale optimization, whereas GA scales more consistently, at the cost of slower convergence in some cases.

Figure 9.

Normalized solution quality of SSA, PSO, and GA under scalability tests: (a) variation with cluster scale (× baseline) and (b) variation with planning horizon (years).

Figure 10 shows the execution time scalability of the metaheuristic optimizers on increasingly complex microgrid configurations. The left panel plots execution time against cluster scaling, while the right panel shows runtime growth as the planning horizon is extended to three years. For cluster scaling, SSA and PSO exhibit a relatively smooth, near-linear increase in runtime, reflecting predictable growth in computational demand as the search space expands. GA, however, shows a more irregular trend: its execution time peaks sharply at twofold scaling and then decreases at threefold scaling, which may be attributable to premature convergence or reduced exploration effort at higher scales. With respect to horizon length, all algorithms show a clear upward trend, with SSA and PSO exhibiting almost linear runtime growth as simulation length increases and reaching close to 4 s at a 3-year horizon. GA, in contrast, shows a flatter curve beyond two years, suggesting greater efficiency in handling long temporal horizons, albeit with the risk of solution-quality variability noted earlier. These results indicate that SSA and PSO scale consistently in runtime but at higher computational cost, whereas GA offers efficiency advantages for extended horizons but may trade off robustness at larger system sizes.
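Runtime scaling curves of this kind are produced by timing the same optimizer on progressively larger problems. The sketch below assumes the standard single-leader Salp Swarm Algorithm update (the leader moves around the best solution found so far, the "food source", with coefficient c1 = 2·exp(−(4l/L)²); each follower averages its position with its predecessor) and uses a toy sphere objective whose dimension stands in for cluster scale. The objective, bounds, population size, and the 24-variable baseline are hypothetical and are not the paper's microgrid model.

```python
import time

import numpy as np


def ssa_minimize(f, dim, lb, ub, n_salps=30, max_iter=200, seed=0):
    """Minimal Salp Swarm Algorithm for box-constrained minimization (scalar bounds)."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lb, ub, size=(n_salps, dim))
    fitness = np.apply_along_axis(f, 1, pos)
    best = pos[np.argmin(fitness)].copy()   # food source F
    best_val = float(fitness.min())
    for l in range(1, max_iter + 1):
        c1 = 2.0 * np.exp(-((4.0 * l / max_iter) ** 2))   # shrinks: exploration -> exploitation
        for i in range(n_salps):
            if i == 0:
                # leader: random step around the food source
                c2, c3 = rng.random(dim), rng.random(dim)
                step = c1 * ((ub - lb) * c2 + lb)
                pos[i] = np.where(c3 >= 0.5, best + step, best - step)
            else:
                # follower: move halfway toward the preceding salp in the chain
                pos[i] = 0.5 * (pos[i] + pos[i - 1])
        pos = np.clip(pos, lb, ub)
        fitness = np.apply_along_axis(f, 1, pos)
        if fitness.min() < best_val:
            best_val = float(fitness.min())
            best = pos[np.argmin(fitness)].copy()
    return best, best_val


def sphere(x):
    """Toy objective standing in for the microgrid cost function."""
    return float(np.sum(x ** 2))


base_dim = 24  # hypothetical baseline: 24 decision variables per cluster
for scale in (1, 2, 3):
    t0 = time.perf_counter()
    _, val = ssa_minimize(sphere, base_dim * scale, -5.0, 5.0)
    elapsed = time.perf_counter() - t0
    print(f"scale x{scale}: best={val:.3e}, time={elapsed:.2f} s")
```

Dividing each `val` by the value obtained at scale ×1 gives the kind of relative-cost measure plotted for solution quality; the timings themselves depend on hardware, so only their growth trend is meaningful.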

Figure 10.

Execution time scalability of SSA, PSO, and GA under extended problem sizes: (a) execution time versus cluster scale (× baseline) and (b) execution time versus planning horizon (years).

Figure 11 compares five optimization algorithms (GA, PSO, SSA, GWO, and DQN) across three key normalized performance metrics: execution time, solution stability (σ), and constraint violations. Each metric is derived from a consistent test environment to assess trade-offs in speed, robustness, and feasibility. In terms of computational cost, SSA delivers the fastest execution time (75 s), followed by PSO (90 s) and GWO (100 s), while GA and DQN are slower at 120 s and 130 s, respectively. DQN compensates with superior solution stability, exhibiting the lowest standard deviation (σ = 0.005) and hence minimal variation across repeated runs. In contrast, GA shows the highest variability (σ = 0.015), reflecting less consistent convergence behavior. Regarding feasibility, DQN and SSA both achieve zero constraint violations, indicating full adherence to system limits such as power balance and storage bounds, whereas GA incurs three violations, GWO two, and PSO one, signaling reduced reliability.
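The two robustness metrics in Figure 11 can be computed from repeated runs. The sketch below assumes that solution stability σ is the sample standard deviation of the best objective value across independent runs, and that a constraint violation is counted for each time step in which the power-balance residual exceeds a tolerance or the SOC leaves its bounds. The tolerance, SOC limits, and the example data are hypothetical.

```python
import numpy as np


def solution_stability(best_values):
    """Sample standard deviation of the best objective value across independent runs."""
    return float(np.std(np.asarray(best_values, dtype=float), ddof=1))


def count_violations(supply, demand, soc, soc_min=0.2, soc_max=1.0, tol=1e-3):
    """Count time steps that break power balance or the SOC limits.

    supply, demand: energy per time step (kWh); soc: state of charge (fraction).
    A step that violates both conditions is counted once.
    """
    supply = np.asarray(supply, dtype=float)
    demand = np.asarray(demand, dtype=float)
    soc = np.asarray(soc, dtype=float)
    balance_bad = np.abs(supply - demand) > tol
    soc_bad = (soc < soc_min - tol) | (soc > soc_max + tol)
    return int(np.count_nonzero(balance_bad | soc_bad))


# Hypothetical best LCOE values (USD/kWh) from four independent runs
sigma = solution_stability([0.331, 0.335, 0.329, 0.334])
# Hypothetical three-hour schedule: hour 2 breaches SOC_min, hour 3 is short 0.5 kWh
n_viol = count_violations([5.0, 5.0, 4.5], [5.0, 5.0, 5.0], [0.5, 0.15, 0.9])
print(sigma, n_viol)
```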

Figure 11.

Execution time and stability comparison.

Figure 12 shows a Q-value heatmap representing the action policy of the DQN agent in the HBESS. At low SOC (< 0.4), charging is dominant across nearly all load levels, reflecting the agent's learned priority to preserve battery reserve and avoid deep cycling. At high SOC (> 0.8) and low to moderate loads (< 1.5 kW), discharging becomes prevalent, making stored energy available and reducing diesel use. The idle action is scattered and appears primarily at mid-range SOC (about 0.5–0.7), indicating an energy equilibrium in which supply meets demand without battery involvement. The resulting policy reflects control logic shaped by long-term rewards, including minimizing degradation cost, emissions, and LCOE while ensuring energy sufficiency. The heatmap thus indicates that the DQN agent has learned state-aware energy management strategies that balance charging and discharging in response to both the system condition and external demand.
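A policy map of this kind is obtained by taking, for every discretized (SOC, load) cell, the action with the highest Q-value. A minimal sketch, assuming the trained agent's Q-values have been exported to an array of shape (n_SOC, n_load, 3) with the action order charge, discharge, idle; the action ordering, the grid resolution, and the random placeholder Q-values are all hypothetical.

```python
import numpy as np

ACTIONS = ("charge", "discharge", "idle")   # assumed action ordering

soc_grid = np.linspace(0.2, 1.0, 9)         # SOC bins (fraction)
load_grid = np.linspace(0.1, 2.0, 20)       # load bins (kW)


def greedy_policy(q_values):
    """Integer action index per (SOC, load) cell: argmax over the action axis."""
    return np.argmax(np.asarray(q_values, dtype=float), axis=-1)


def action_for(policy, soc, load):
    """Look up the greedy action for a continuous state via the nearest grid cell."""
    i = int(np.argmin(np.abs(soc_grid - soc)))
    j = int(np.argmin(np.abs(load_grid - load)))
    return ACTIONS[policy[i, j]]


# Placeholder Q-table; in practice it would come from evaluating the trained network on the grid
rng = np.random.default_rng(42)
q_table = rng.random((soc_grid.size, load_grid.size, len(ACTIONS)))
policy = greedy_policy(q_table)
share = {name: float(np.mean(policy == k)) for k, name in enumerate(ACTIONS)}
print(share)
```

The `share` dictionary gives the fraction of the state grid assigned to each action, which is a compact way to compare policy maps across training runs.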

Figure 12.

Q-value heatmap (this figure was generated by the authors using Python 3.11.4 with Matplotlib v3.7.1 and Seaborn v0.12.2.).

Figure 13 shows the normalized Q-value distribution $Q(s, a_{\text{charge}}) \in [0, 1]$ for the charging action over the state space defined by $\mathrm{SOC} \in [0.2, 1.0]$ and load demand $\in [0.1, 2.0]$ kW. The color intensity represents the magnitude of the Q-value associated with charging, where red indicates high reward potential ($Q > 0.9$) and dark blue signifies low utility ($Q < 0.1$). High Q-values are concentrated in the upper-left region (SOC > 0.8, load < 0.6 kW), suggesting that the agent learns to prioritize charging when demand is low and battery capacity is available. Conversely, for SOC < 0.4 and load > 1.5 kW, Q-values remain below 0.2, reflecting a learned penalty for charging under high-load conditions. The optimal policy follows from $\pi^{*}(s) = \arg\max_{a} Q(s, a)$, and the heatmap visualizes the scalar field $Q(s, a_{\text{charge}})$ over the discretized state grid $s = (\mathrm{SOC}, \mathrm{Load})$. The sharp Q-value gradients in the mid-SOC zone (0.5–0.7) show the agent's sensitivity to marginal cost–benefit transitions during battery scheduling.
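The normalization $Q(s, a_{\text{charge}}) \in [0, 1]$ is most simply read as a min-max rescaling of one action's Q-values over the state grid; that reading is an assumption, since the exact scheme is not stated. Under that assumption, the heatmap values can be produced as follows (the raw Q-values are hypothetical).

```python
import numpy as np


def minmax_normalize(q_slice):
    """Rescale one action's Q-values over the (SOC, load) grid to [0, 1]."""
    q_slice = np.asarray(q_slice, dtype=float)
    q_min, q_max = q_slice.min(), q_slice.max()
    if np.isclose(q_max, q_min):
        return np.zeros_like(q_slice)   # flat surface: no cell is preferred
    return (q_slice - q_min) / (q_max - q_min)


# Hypothetical raw Q-values for the charge action on a 2 x 2 (SOC, load) grid
raw_charge = np.array([[-3.0, -1.0],
                       [0.5, 2.0]])
norm_charge = minmax_normalize(raw_charge)
high_utility = norm_charge > 0.9        # cells that would be drawn in red
print(norm_charge)
```

Because each action is rescaled separately, normalized maps for charging and discharging are not directly comparable cell by cell; the greedy action $\arg\max_a Q(s, a)$ should therefore be taken on the raw Q-values.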

Figure 13.

Q-value heatmap for charging (this figure was generated by the authors using Python 3.11.4 with Matplotlib v3.7.1 and Seaborn v0.12.2.).

Figure 14 shows the Q-value map for the discharging action, where each cell encodes the normalized Q-value $Q(s, a_{\text{discharge}}) \in [0, 1]$ across a discretized state space defined by $\mathrm{SOC} \in [0.2, 1.0]$ and load demand $\in [0.1, 2.0]$ kW. High Q-values (0.8–1.0, shown in red) are concentrated in regions with moderate to high load demand (≥ 1.2 kW) and SOC > 0.5, indicating that discharging is optimal when sufficient energy is stored and demand is high. In contrast, Q-values fall below 0.2 when both SOC and demand are low, suggesting a high cost or risk of discharging under such conditions. Pronounced Q-value gradients are observed along the demand axis at SOC levels near 0.6–0.8, showing that the utility of the discharge action increases nonlinearly with rising load.

Figure 14.

Q-value heatmap for discharging (this figure was generated by the authors using Python 3.11.4 with Matplotlib v3.7.1 and Seaborn v0.12.2.).

Figure 15 quantifies the sensitivity of the LCOE to key system parameters using a univariate (one-at-a-time) perturbation method. The most sensitive parameter is diesel price: a $\pm 10\%$ change leads to a maximum $\Delta\mathrm{LCOE} \approx \pm 0.062$ USD/kWh, the largest normalized sensitivity coefficient among the parameters considered. The interest rate induces a $\Delta\mathrm{LCOE}$ of $\pm 0.033$ USD/kWh, while PV capital cost contributes $\pm 0.022$ USD/kWh. The battery-related parameters, degradation rate and replacement cost, yield symmetric impacts of $\Delta\mathrm{LCOE} = \pm 0.017$ and $\pm 0.018$ USD/kWh, respectively. Wind maintenance cost has the least impact, with $\Delta\mathrm{LCOE} < 0.005$ USD/kWh. Diesel price and interest rate raise the numerator of the LCOE (the annualized system cost) without a proportional gain in the generated energy $E_{\mathrm{gen}}$ in the denominator, which leads to LCOE escalation. This sensitivity analysis shows that financial (interest rate) and fossil-fuel (diesel) variables dominate LCOE uncertainty, guiding targeted cost-reduction strategies in HRES planning.
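The numerator effect described above can be made concrete with a simplified annualized LCOE, $\mathrm{LCOE} = (\mathrm{CRF}\cdot C_{\mathrm{cap}} + C_{\mathrm{OM}} + C_{\mathrm{fuel}}) / E_{\mathrm{gen}}$, with capital recovery factor $\mathrm{CRF} = i(1+i)^n / ((1+i)^n - 1)$. The sketch below assumes this form (replacement and degradation terms are omitted), perturbs one parameter at a time by ±10%, and reports the normalized coefficient as an elasticity, $(\Delta\mathrm{LCOE}/\mathrm{LCOE}_0)/(\Delta p/p)$, which is one common definition and an assumption here. All numerical inputs are hypothetical placeholders, not values from this study.

```python
def crf(i, n):
    """Capital recovery factor for interest rate i and lifetime n years."""
    return i * (1 + i) ** n / ((1 + i) ** n - 1)


def lcoe(p):
    """Simplified LCOE (USD/kWh) from a dict of hypothetical parameters."""
    annual_cost = (crf(p["interest"], p["years"]) * p["capex"]
                   + p["om"]
                   + p["diesel_price"] * p["diesel_litres"])
    return annual_cost / p["e_gen"]


base = {
    "interest": 0.08, "years": 20,
    "capex": 250_000.0,          # USD
    "om": 5_000.0,               # USD/yr
    "diesel_price": 1.2,         # USD/L
    "diesel_litres": 20_000.0,   # L/yr
    "e_gen": 150_000.0,          # kWh/yr
}


def oat_sensitivity(params, key, rel_step=0.10):
    """One-at-a-time perturbation: (dLCOE at -step, dLCOE at +step, elasticity)."""
    lc0 = lcoe(params)
    lo = dict(params, **{key: params[key] * (1 - rel_step)})
    hi = dict(params, **{key: params[key] * (1 + rel_step)})
    d_lo, d_hi = lcoe(lo) - lc0, lcoe(hi) - lc0
    elasticity = (d_hi - d_lo) / lc0 / (2 * rel_step)
    return d_lo, d_hi, elasticity


for key in ("diesel_price", "interest", "capex"):
    print(key, oat_sensitivity(base, key))
```

Because fuel cost enters the numerator linearly, the diesel-price elasticity equals the fuel share of annual cost; the interest rate acts nonlinearly through the CRF, so its ±10% responses are not exactly symmetric.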

Figure 15.

Sensitivity of LCOE to key parameters.

Figure 16 shows the sensitivity of the LPSP to ±20% variations in key techno-economic parameters. The base-case LPSP is 0.008, indicating a highly reliable system. Changes in diesel price and interest rate have negligible influence, with sensitivity indices of 0.031 and 0.000, respectively. By contrast, PV capital cost shows a strong negative sensitivity ($S_i = -0.562$), where a 20% reduction in PV cost decreases LPSP from 0.0089 to 0.0071, highlighting the importance of PV affordability for reliability. DSM level exerts the largest impact ($S_i = -0.750$), with higher DSM penetration reducing LPSP from 0.0092 to 0.0068. Wind speed variations also play a significant role ($S_i = -0.344$), improving LPSP from 0.0086 to 0.0075 when wind availability increases by 20%. Battery replacement cost and the maximum depth of discharge ($\mathrm{DoD}_{\max}$) exert moderate effects, with sensitivity indices of −0.062 and 0.187, respectively. These results show that renewable resource parameters and DSM influence supply adequacy far more strongly than purely financial variables.

Figure 16.

Sensitivity of LPSP to ±20% variations in key system parameters.

Figure 17 shows the sensitivity of battery lifetime to ±20% variations in system parameters. The base-case battery life is 7.6 years. Diesel price and interest rate have negligible influence, with sensitivity indices of −0.033 and 0.000, respectively. PV capital cost has a positive effect ($S_i = 0.197$): a 20% reduction extends battery life from 7.3 to 7.9 years owing to reduced cycling stress from higher PV penetration. Battery replacement cost has a moderate negative impact ($S_i = -0.263$), shortening life from 7.9 to 7.1 years when costs increase. DSM level exerts a significant positive influence ($S_i = 0.428$), with greater DSM penetration extending life from 6.9 to 8.2 years, reflecting a reduced storage burden. Wind speed has a modest positive effect ($S_i = 0.197$), increasing lifetime from 7.2 to 7.8 years. The strongest influence comes from the maximum depth of discharge ($\mathrm{DoD}_{\max}$), which shows a marked negative sensitivity ($S_i = -0.658$): deeper cycling reduces lifetime from 8.6 to 6.6 years, underscoring the importance of DoD management in sustaining battery health.
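Several of the indices quoted for Figures 16 and 17 match a central-difference elasticity, $S_i = \dfrac{Y(+20\%) - Y(-20\%)}{Y_0 \cdot 0.4}$, where $Y$ is the output (LPSP or battery life) and $Y_0$ its base value. This definition is inferred from the reported numbers rather than stated in the text. The check below recomputes four of the indices from the endpoint values given above.

```python
def central_sensitivity(y_minus, y_plus, y_base, rel_step=0.20):
    """Central-difference sensitivity index S_i = (Y+ - Y-) / (Y0 * 2 * step)."""
    return (y_plus - y_minus) / (y_base * 2 * rel_step)


# Endpoint values taken from the text (base LPSP 0.008, base battery life 7.6 years)
s_dsm_lpsp = central_sensitivity(0.0092, 0.0068, 0.008)    # reported S_i = -0.750
s_wind_lpsp = central_sensitivity(0.0086, 0.0075, 0.008)   # reported S_i = -0.344
s_dod_life = central_sensitivity(8.6, 6.6, 7.6)            # reported S_i = -0.658
s_dsm_life = central_sensitivity(6.9, 8.2, 7.6)            # reported S_i = 0.428
print(round(s_dsm_lpsp, 3), round(s_wind_lpsp, 3), round(s_dod_life, 3), round(s_dsm_life, 3))
```

With this definition, a negative $S_i$ means that increasing the parameter lowers the output, which is why higher DSM penetration (lower LPSP) and deeper $\mathrm{DoD}_{\max}$ (shorter life) both carry negative signs in their respective figures.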

Figure 17.

Sensitivity of battery lifetime to ±20% variations in key system parameters.

Figure 18 shows the correlation matrix of four energy sources: PV, wind, biomass, and diesel. Each entry is the correlation coefficient between a pair of sources, ranging from −1 to 1. The diagonal elements, which compare each source with itself, are all 1 by definition. The correlation between PV and wind is −0.002353, a very weak negative correlation, meaning that changes in one have a negligible inverse relationship with the other. Biomass and diesel show a correlation of 0.01264, a very weak positive correlation, meaning that their outputs have only a slight tendency to move in the same direction. The color scale, from cyan (low correlation) to magenta (high correlation), visualizes the strength of these relationships. Overall, the matrix shows that the sources behave almost independently of one another, with no strong correlations observed, which supports using them together in a hybrid energy system.
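A matrix like Figure 18 is the pairwise Pearson correlation of the sources' output time series. A minimal sketch using synthetic placeholder series (the daily PV shape, Weibull wind, near-constant biomass, and exponential diesel draws below are hypothetical stand-ins, not the study's data):

```python
import numpy as np

SOURCES = ("PV", "Wind", "Biomass", "Diesel")

rng = np.random.default_rng(7)
hours = 8760
# One synthetic hourly output series (kW) per source, stacked as rows
series = np.vstack([
    np.clip(np.sin(np.linspace(0, 2 * np.pi * 365, hours)), 0, None) * 300,  # daily PV-like cycle
    rng.weibull(2.0, hours) * 50,                    # Weibull-distributed wind-like output
    np.full(hours, 75.0) + rng.normal(0, 5, hours),  # near-constant biomass
    rng.exponential(10.0, hours),                    # sporadic diesel use
])

corr = np.corrcoef(series)   # rows are variables, so the result is 4 x 4
for name, row in zip(SOURCES, corr):
    print(f"{name:8s}", np.round(row, 4))
```

Near-zero off-diagonal values indicate that the sources fluctuate largely independently; a clearly negative coefficient, for example between PV and wind, would be the stronger statistical signature of complementarity.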

Figure 18.

Renewable resource complementarity.

Figure 19 shows the seasonal profile of the energy resources across the four seasons of the year. For each season, generation from the four sources is plotted for day and night periods, with day shown in red and night in blue. In spring and summer, PV generation peaks in the midday hours at over 300 kWh, reflecting the high solar irradiance in these seasons. Wind generation is variable, with higher production in the afternoon, especially in spring and summer, when it reaches approximately 150 kWh. Biomass provides a stable contribution across the day, generally between 50 and 100 kWh. Diesel generation remains minimal in all seasons, typically below 50 kWh, indicating limited use during the day. In autumn and winter, PV output decreases significantly because of shorter daylight hours, while wind generation becomes more prominent, especially in winter, peaking at about 200 kWh. Biomass and diesel contributions remain consistent across seasons, providing a steady supply when the variable sources (PV and wind) are insufficient.

Figure 19.

Seasonal profile of renewable resources.

Figure 20 shows a radar chart comparing the impacts of key system parameters on LCOE and CO₂ emissions. The axes represent Battery Cost, Interest Rate, PV Efficiency, DSM Level, and Diesel Price, with each point indicating the normalized impact of each parameter on the respective objective. Diesel price exhibits the largest influence on both LCOE and CO₂, with LCOE impact peaking at 0.6 and CO₂ impact at 0.55. Battery cost and interest rate also show significant impacts on LCOE but have a lower effect on CO₂ emissions, with values approaching 0.3 and 0.2, respectively. The DSM level and PV efficiency contribute the least to both LCOE and CO₂ impacts, registering values close to 0 in both metrics.

Figure 20.

Parameter impact on LCOE and CO₂.

Figure 21 compares the performance of two EMSs, the DQN-based EMS and a rule-based EMS, using a radar chart. The chart measures five key metrics: degradation (battery cost), runtime (seconds), emissions (CO₂), cost reduction (%), and reliability (LPSP). For battery degradation cost, DQN performs significantly better, with a value of approximately 0.2 compared with about 0.8 for the rule-based EMS. Similarly, CO₂ emissions for DQN are notably lower, approaching 0.2, versus around 0.6 for the rule-based EMS. For cost reduction, DQN achieves about 0.8, whereas the rule-based EMS reaches around 0.4. DQN also shows better reliability, with LPSP closer to 0.1, compared with about 0.5 for the rule-based EMS. However, the rule-based EMS has a shorter runtime, closer to 0.1 s, compared with about 0.6 s for DQN.

Figure 21.

Comparison of DQN versus Rule-Based EMS Performance.

Figure 22 shows the impact of increasing the number of village clusters on both LCOE and CO₂ emissions. As the number of clusters rises from 1 to 10, the LCOE decreases from approximately 1.8 USD/kWh to 1.3 USD/kWh, reflecting economies of scale: the system's fixed capital and operational expenditures are spread over a larger energy output, lowering the per-unit energy cost. At the same time, CO₂ emissions decrease from around 0.36 to 0.30 kg/kWh, driven by higher renewable energy penetration as more clusters are integrated. The reduction in emissions reflects the environmental benefit of expanding renewable capacity and reducing reliance on fossil fuels, and together these trends show the economic and environmental benefits of system aggregation.

Figure 22.

LCOE and CO₂ emissions versus number of village clusters.

Figure 23 shows the degradation timeline for the HBESS in terms of both capacity fade and degradation cost over a 10-year period. Capacity fade for the Li-ion battery increases linearly over time, from around 1% to about 6% by year 10, while capacity fade for the Ni‒Fe battery increases at a higher rate, reaching about 7% by year 10. The figure also shows the degradation cost for each battery type: Li-ion degradation costs increase steadily from around 100 USD to over 700 USD by year 10, while Ni‒Fe degradation costs follow a similar trend but start higher and grow at a slightly steeper rate, reaching almost 750 USD by year 10.

Figure 23.

Degradation timeline for Li-ion versus Ni‒Fe.

Figure 24 shows the LCOE surface as a function of Li-ion and Ni‒Fe battery sizes. LCOE increases non-linearly with the size of either battery. At smaller battery sizes (about 50 kWh), the LCOE is relatively low, but it rises sharply once battery sizes exceed 100 kWh. Economies of scale exist at small battery sizes, but larger capacities yield diminishing returns in cost-efficiency. The color bar shows LCOE values ranging from 1.0 to 5.5 USD/kWh, with the highest values occurring for the largest Li-ion and Ni‒Fe configurations. The surface thus captures the trade-off between battery size and the overall cost of energy and indicates an optimal region in which the combination of Li-ion and Ni‒Fe sizes minimizes cost. The non-linear relationship signifies that increasing battery capacity without optimizing the mix may raise LCOE, highlighting the importance of carefully weighing the types and sizes of energy storage used in hybrid energy systems.
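Locating the low-cost region of such a surface amounts to evaluating LCOE on a grid of (Li-ion, Ni‒Fe) sizes and taking the minimum. The sketch below uses a hypothetical convex stand-in for the LCOE model, purely to illustrate the grid mechanics; in the actual framework, each grid point would require a full system simulation.

```python
import numpy as np


def toy_lcoe(li_kwh, nife_kwh):
    """Hypothetical convex stand-in for the simulated LCOE (USD/kWh)."""
    return 1.0 + ((li_kwh - 60.0) / 100.0) ** 2 + ((nife_kwh - 40.0) / 100.0) ** 2


li_sizes = np.arange(20, 201, 20)     # kWh: 20, 40, ..., 200
nife_sizes = np.arange(20, 201, 20)   # kWh
LI, NIFE = np.meshgrid(li_sizes, nife_sizes, indexing="ij")
surface = toy_lcoe(LI, NIFE)          # shape (len(li_sizes), len(nife_sizes))

i, j = np.unravel_index(np.argmin(surface), surface.shape)
best = (int(li_sizes[i]), int(nife_sizes[j]))
print(best, float(surface[i, j]))
```

Using `indexing="ij"` keeps the first array axis aligned with Li-ion size, so the index pair from `np.unravel_index` maps directly back to the two size vectors.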

Figure 24.

LCOE surface versus Battery Sizes.

To quantify the robustness of the optimized hybrid microgrid, a Monte Carlo simulation with 1000 realizations was conducted. As shown in Figure 25, the results exhibit narrow distributions for the economic indicators and an acceptable spread for the environmental and reliability metrics. The LCOE has a mean of 0.0555 USD/kWh with a 95% confidence interval (CI) of 0.0553–0.0557 USD/kWh, indicating that economic performance remains stable despite uncertainties. Similarly, the LCC centers around 0.908 million USD, with a 95% CI of 0.907–0.910 million USD, indicating low financial risk. Annual CO₂ emissions show a wider spread, ranging from 0.2 to 1.0 tons/year with an average of 0.42 tons/year, reflecting the influence of diesel generator usage during unfavorable renewable conditions. In terms of reliability, the LPSP distribution remains tightly clustered below 0.005, with a mean of 0.0015. Thus, even under extreme variability, the probability of unmet load remains below 0.5%, ensuring system resilience. The Monte Carlo analysis confirms that the dual-battery EMS strategy provides robust economic viability, environmental sustainability, and high reliability under uncertainty.
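A Monte Carlo robustness check of this kind samples the uncertain inputs, re-evaluates the metric for each realization, and summarizes the resulting distribution. The sketch below assumes normally distributed diesel price and interest rate and a uniformly distributed annual energy yield feeding a simplified LCOE expression; the distributions and cost figures are hypothetical. It computes two different intervals: the 2.5th–97.5th percentile range of the realizations, and a normal-approximation 95% confidence interval for the mean, which is much narrower.

```python
import numpy as np

rng = np.random.default_rng(2024)
N = 1000  # number of stochastic realizations, as in the text

# Hypothetical uncertain inputs
diesel_price = rng.normal(1.2, 0.1, N)        # USD/L
interest = rng.normal(0.08, 0.005, N)         # fraction
e_gen = rng.uniform(140_000, 160_000, N)      # kWh/yr

years, capex, om, litres = 20, 250_000.0, 5_000.0, 20_000.0
crf = interest * (1 + interest) ** years / ((1 + interest) ** years - 1)
lcoe = (crf * capex + om + diesel_price * litres) / e_gen   # USD/kWh, one per realization

mean = lcoe.mean()
p_lo, p_hi = np.percentile(lcoe, [2.5, 97.5])          # spread of individual outcomes
half = 1.96 * lcoe.std(ddof=1) / np.sqrt(N)             # half-width of the CI of the mean
print(f"mean={mean:.4f}  95% range=[{p_lo:.4f}, {p_hi:.4f}]  "
      f"95% CI of mean=[{mean - half:.4f}, {mean + half:.4f}]")
```

Reporting which of the two intervals is meant matters: the percentile range describes the risk of an individual outcome, while the CI of the mean only describes how precisely 1000 samples pin down the average.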

Figure 25.

Monte Carlo distributions of LCOE, LCC, CO₂ emissions, and reliability for 1000 stochastic realizations.

Figure 26 shows the battery degradation cost heatmap as a function of Li-ion and Ni‒Fe battery sizes. The color gradient, from black to yellow, represents the cost of battery degradation, which depends on battery cycling and DoD. As the Li-ion battery size increases along the x-axis and the Ni‒Fe battery size increases along the y-axis, the degradation cost rises steadily, particularly toward the top-right corner of the heatmap, where the largest configurations are located. The highest degradation costs occur at the upper end of the size range (Li-ion and Ni‒Fe sizes of about 160 kWh), where the degradation cost exceeds 930 USD. This suggests that increasing battery size without a balanced approach may significantly increase long-term maintenance and replacement costs owing to higher cycling rates and deeper discharges. The lower-end configurations (Li-ion and Ni‒Fe sizes of about 40 kWh) show lower degradation costs of about 840–870 USD, indicating that smaller batteries, despite more frequent cycling, may experience less wear because less capacity is utilized per cycle.
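Degradation cost as a function of cycling and DoD is often modeled with a power-law cycle-life curve, $N(\mathrm{DoD}) = N_{100}\,\mathrm{DoD}^{-k}$, charging each cycle the fraction $1/N(\mathrm{DoD})$ of the bank's replacement cost. The sketch below assumes that form; the cycle-life parameters ($N_{100}$ = 3000, $k$ = 1.2) and the replacement cost are hypothetical placeholders, not the values behind Figure 26.

```python
def cycles_to_failure(dod, n_full=3000.0, k=1.2):
    """Cycle life at depth of discharge dod (0 < dod <= 1), assumed power-law fit."""
    if not 0.0 < dod <= 1.0:
        raise ValueError("dod must be in (0, 1]")
    return n_full * dod ** (-k)


def cycle_degradation_cost(capacity_kwh, dod, replacement_usd_per_kwh, **curve):
    """Cost of one cycle = bank replacement cost / cycle life at this DoD."""
    return replacement_usd_per_kwh * capacity_kwh / cycles_to_failure(dod, **curve)


# Hypothetical comparison: the same 20 kWh daily throughput served by two bank sizes
for cap in (40.0, 160.0):
    dod = 20.0 / cap
    cost = cycle_degradation_cost(cap, dod, replacement_usd_per_kwh=300.0)
    print(f"{cap:.0f} kWh bank, DoD={dod:.3f}: {cost:.3f} USD/cycle")
```

For a fixed energy throughput $E$, this model gives a per-cycle cost proportional to $E^k \cdot \mathrm{capacity}^{1-k}$, so whether larger banks cost more or less per cycle hinges on the fitted exponent: with $k < 1$ cost grows with bank size, with $k > 1$ shallower cycling more than offsets the larger replacement value.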

Figure 26.

Battery degradation cost heatmap (this figure was generated by the authors using Python 3.11.4 with Matplotlib v3.7.1 and Seaborn v0.12.2.).

Figure 27 compares battery discharge and total operating cost across three load profiles: base load, moderate DSM, and aggressive DSM. As DSM becomes more aggressive, both battery discharge and operating cost decrease. In the base-load scenario, the battery discharges approximately 2.275 kWh, with a total operating cost of 0.114 USD. Under moderate DSM, battery discharge falls to 0.915 kWh and the total cost to 0.046 USD, indicating that shifting load to off-peak hours significantly reduces battery usage and operating costs. Aggressive DSM further reduces battery discharge to 0.202 kWh and the total operating cost to 0.010 USD, demonstrating the effectiveness of aggressive load shifting in optimizing battery usage and system costs. This consistent reduction in both battery discharge and cost emphasizes the potential of DSM strategies to reduce battery degradation and overall system costs, improving the financial and operational efficiency of hybrid energy systems.
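The mechanism behind these DSM results, moving flexible demand into hours with renewable surplus so that the battery has less deficit to cover, can be illustrated with a simple load-shifting rule. The sketch below is a hypothetical DSM model (a fixed flexible fraction of load in PV-deficit hours is moved into PV-surplus hours, in proportion to the surplus), not the DSM scheme used in the study; it ignores SOC limits and conversion losses, and the 24-hour profiles are placeholders.

```python
import numpy as np


def apply_dsm(load, pv, flex_fraction):
    """Shift a fraction of load out of PV-deficit hours into PV-surplus hours.

    Shifted energy is spread over surplus hours in proportion to their surplus,
    so total daily demand is conserved.
    """
    load = np.asarray(load, dtype=float).copy()
    pv = np.asarray(pv, dtype=float)
    deficit = load > pv
    shifted = flex_fraction * load[deficit]
    surplus = np.clip(pv - load, 0.0, None)
    if shifted.sum() == 0 or surplus.sum() == 0:
        return load                      # nothing to move, or nowhere to move it
    load[deficit] -= shifted
    load += shifted.sum() * surplus / surplus.sum()
    return load


def battery_discharge(load, pv):
    """Energy the battery must supply (kWh), ignoring SOC limits and losses."""
    return float(np.clip(np.asarray(load) - np.asarray(pv), 0.0, None).sum())


# Hypothetical 24-h profiles in kW with 1-h steps
pv = np.zeros(24)
pv[8:16] = 5.0
base_load = np.full(24, 2.0)
base_load[18:22] = 4.0                   # evening peak

for name, f in (("Base load", 0.0), ("Moderate DSM", 0.1), ("Aggressive DSM", 0.3)):
    shifted_load = apply_dsm(base_load, pv, f)
    print(f"{name}: battery discharge = {battery_discharge(shifted_load, pv):.1f} kWh")
```

Multiplying the discharged energy by a per-kWh cycling cost turns the same calculation into an operating-cost comparison across the three DSM levels.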

Figure 27.

Battery usage versus operating cost.

Figure 28 shows the relationship between battery discharge and CO₂ emissions under three DSM strategies: base load, moderate DSM, and aggressive DSM. As DSM becomes more aggressive, both battery discharge and CO₂ emissions decrease significantly. In the base-load scenario, the system discharges approximately 2.5 kWh from the battery, resulting in 12 kg of CO₂ emissions. Under moderate DSM, battery discharge drops to 1.5 kWh and CO₂ emissions fall to 7 kg, reflecting the efficiency gains achieved by shifting load to periods of lower demand. Finally, under aggressive DSM, battery discharge is further reduced to 0.25 kWh and CO₂ emissions drop to 2 kg, demonstrating the effectiveness of more aggressive load shifting in reducing both battery usage and emissions. This joint decrease shows how DSM strategies, particularly aggressive ones, optimize energy consumption, reduce reliance on battery discharge, and, as a result, minimize CO₂ emissions, making the energy system more environmentally sustainable and cost-effective.

Figure 28.

Battery usage versus emissions across DSM strategies.

Figure 29 compares the CO₂ emission intensity of three battery dispatch strategies: load following (LF), cycle charging (CC), and DQN. DQN yields the lowest emission intensity at 0.170 kg/kWh, followed by CC at 0.238 kg/kWh, while LF has the highest at 0.280 kg/kWh. By using RL to optimize battery dispatch, DQN substantially reduces emissions compared with the conventional LF and CC methods, which exhibit higher emission intensities because they use renewable energy less efficiently. This highlights the potential of DQN-based optimization to deliver substantial CO₂ reductions in HRES. The trend shows that intelligent dispatch strategies such as DQN contribute to the decarbonization of energy systems by minimizing reliance on fossil fuels and reducing the carbon footprint.
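LF and CC differ in what the diesel generator does once renewables and storage fall short. Under load following, it produces only the residual load. Under cycle charging, it runs at rated output and uses the surplus to recharge the battery. The sketch below computes the resulting CO₂ intensity (kg CO₂ per kWh served) for both rules on a toy profile. The linear fuel-curve coefficients, the diesel emission factor, the battery size, and the CC setpoint are assumed placeholder values, battery losses are ignored, and a DQN policy would replace these fixed rules with a learned one.

```python
import numpy as np

# Assumed constants (placeholders, not values from this study)
FUEL_INTERCEPT = 0.08145   # L/h per kW of rated generator capacity
FUEL_SLOPE = 0.246         # L/h per kW of generator output
CO2_PER_LITRE = 2.68       # kg CO2 per litre of diesel


def emission_intensity(load, renew, strategy, gen_kw=5.0, batt_kwh=10.0,
                       soc0=0.5, soc_min=0.2, cc_setpoint=0.8):
    """Hourly dispatch under load following ("LF") or cycle charging ("CC").

    Returns kg CO2 per kWh of load served. One-hour steps are assumed, so
    kW and kWh are interchangeable; battery losses are ignored.
    """
    if strategy not in ("LF", "CC"):
        raise ValueError(f"unknown strategy {strategy!r}")
    energy, e_min = soc0 * batt_kwh, soc_min * batt_kwh
    co2 = served = 0.0
    for l, r in zip(load, renew):
        net = l - r
        if net <= 0:
            energy = min(batt_kwh, energy - net)   # surplus charges battery; rest curtailed
            served += l
            continue
        from_batt = min(net, max(0.0, energy - e_min))   # battery covers the deficit first
        energy -= from_batt
        residual = net - from_batt
        if residual > 0:
            to_load = min(residual, gen_kw)
            if strategy == "LF":
                gen_out = to_load                  # generator follows the residual load
            else:
                gen_out = gen_kw                   # CC: generator runs at rated output...
                headroom = max(0.0, cc_setpoint * batt_kwh - energy)
                energy += min(gen_out - to_load, headroom)   # ...and the surplus recharges the battery
            co2 += (FUEL_INTERCEPT * gen_kw + FUEL_SLOPE * gen_out) * CO2_PER_LITRE
            served += l - (residual - to_load)     # any remainder is unmet load
        else:
            served += l
    return co2 / served if served > 0 else 0.0


rng = np.random.default_rng(3)
load = 2.0 + rng.random(48) * 3.0                                       # hypothetical load (kW)
renew = np.tile(np.r_[np.zeros(8), np.full(8, 6.0), np.zeros(8)], 2)    # hypothetical PV-like supply
for s in ("LF", "CC"):
    print(s, round(emission_intensity(load, renew, s), 4), "kg/kWh")
```

Which fixed rule emits less depends on the profile and on how often the CC surplus is later used rather than curtailed, which is precisely the kind of state-dependent decision a learned dispatch policy can exploit.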

Figure 29.

CO₂ emission intensity comparison.

Conclusion

This study presents a robust techno-economic framework for optimizing the design and operation of off-grid HRES integrated with an HBESS and a reinforcement learning-based energy management strategy using DQN. The system architecture incorporates multiple energy sources, namely PV, wind turbine, biomass generator, and diesel generator, complemented by Li-ion and Ni‒Fe batteries. By applying metaheuristic optimization algorithms (SSA, PSO, GA, and GWO) together with DQN, the model optimizes both component sizing and operational scheduling. The results indicate that the DQN-based energy management system reduces the total LCC by more than 20% compared with conventional strategies while maintaining high reliability, with an LPSP below 0.01. The integration of DQN enhances the system's ability to adapt to variations in resource availability and load, improving both operational efficiency and environmental sustainability. Sensitivity analyses confirm that the model is resilient to variations in key parameters such as fuel costs and resource availability, with diesel price having the most significant effect on LCOE. The proposed system also reduces CO₂ emissions by up to 30% under varying operational conditions by optimizing the use of renewable energy sources. These findings emphasize the practical benefits of combining hybrid storage with intelligent reinforcement learning to improve the performance and sustainability of off-grid microgrids. Future work will focus on refining the DQN strategy through hardware-in-the-loop validation and exploring deep reinforcement learning (DRL) for real-time deployment.

Future work

While the present study establishes a robust techno-economic framework for HRES integrated with hybrid battery energy storage and reinforcement learning-based control, several avenues remain open for further research:

Hardware-in-the-loop validation and pilot deployment: Future work will focus on validating the proposed framework using hardware-in-the-loop platforms and real-world pilot-scale microgrids. This will allow testing under realistic conditions, including inverter derating in high-temperature environments, generator run-time constraints, and long-term battery self-discharge.

Computational scalability: Applying the optimization framework to larger microgrid topologies, extended horizons, and multi-year datasets will help evaluate computational efficiency at scale. Parallelized and distributed implementations will also be explored to enhance convergence for real-time operation.

Advanced AI and hybrid learning approaches: The integration of more advanced deep reinforcement learning (DRL) methods such as actor-critic architectures, proximal policy optimization (PPO), and multi-agent reinforcement learning (MARL) will be investigated. Coupling DRL with physics-informed neural networks (PINNs) could further improve interpretability and robustness by embedding physical laws into learning.

Multi-objective optimization: Expanding beyond cost minimization, future studies will incorporate sustainability metrics, resilience indices, and risk-aware planning to balance economics, technical performance, and environmental impacts under uncertainty.

Emerging storage technologies and circular economy: Future frameworks will consider second-life EV batteries, supercapacitors, and hydrogen storage alongside hybrid battery systems. These not only enhance flexibility and lifetime but also align with circular economy principles and policy-driven sustainability targets.

Footnotes

ORCID iDs

Wajid Khan

Mebratu Sintie Geremew

Author contributions

Wajid Khan, Feng Renhai, and Abdul Aziz contributed to conceptualization, methodology, software, visualization, investigation, and writing‒original draft preparation. Muhammad Zain Yousaf, Zhi Cai, and Muhammad Umair Iqbal contributed to data curation, validation, supervision, resources, and writing‒review and editing. Jiang Wang, Mustafa Abdullah, and Mebratu Sintie Geremew contributed to project administration, supervision, resources, and writing‒review and editing.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.


References

Abdalla

Nazir

Tiezhu

, et al. (2021) Optimized economic operation of microgrid: Combined cooling and heating power and hybrid energy storage systems. Journal of Energy Resources Technology 143(7): 070906.

Abraham

Chandrasekar

Rajamanickam

, et al. (2023) Fuzzy-based efficient control of DC microgrid configuration for PV-energized EV charging station. Energies 16(6): 2753.

Afgan

Carvalho

(2008) Sustainability assessment of a hybrid energy system. Energy Policy 36(8): 2903–2910.

Agajie

Ibrahim

Amoussou

, et al. (2025) Comparative techno-economic analysis of grid-connected solar PV-battery and PV-fuel cell systems for educational institutions sustainable academic laboratories. Discover Sustainability 6(1): 1–31.

Akter

Zafir

Dana

, et al. (2024) A review on microgrid optimization with meta-heuristic techniques: Scopes, trends and recommendation. Energy Strategy Reviews 51: 101298.

Alkanhel

El-Kenawy

ESM

Eid

, et al. (2024) Optimizing IoT-driven smart grid stability prediction with dipper throated optimization algorithm for gradient boosting hyperparameters. Energy Reports 12: 305–320.

Al-Najjar

Pfeifer

Al Afif

, et al. (2020) Estimated view of renewable resources as a sustainable electrical energy source, case study. Designs 4(3): 32.

Anitha

Suchitra

Uthra

, et al. (2025) Power sharing in an autonomous microgrid with hybrid energy sources. In: Energy Conversion Systems-Based Artificial Intelligence. Singapore: Springer, 157–178.

Ashetehe

Shewarega

Bantyirga

, et al. (2024) Optimal design of off-grid hybrid system using a new zebra optimization and stochastic load profile. Scientific Reports 14(1): 29255.

10.

Barakat

Guven

Abdelaziz

, et al. (2026) A comprehensive review of electric vehicles and sustainable urban mobility in the Middle East and North Africa. Renewable and Sustainable Energy Reviews 225: 116154.

11.

Choudhury

Varghese

Mohanty

, et al. (2023) Energy management and power quality improvement of microgrid system through modified water wave optimization. Energy Reports 9: 6020–6041.

12.

Dashtdar

Bajaj

Hosseinimoghadam

SMS

(2022a) Design of optimal energy management system in a residential microgrid based on smart control. Smart Science 10(1): 25–39.

13.

Dashtdar

Nazir

Hosseinimoghadam

SMS

, et al. (2022b) Improving the sharing of active and reactive power of the islanded microgrid based on load voltage control. Smart Science 10(2): 142–157.

14.

Davoudkhani

Zare

Abdelaziz

, et al. (2024a) Robust load-frequency control of islanded urban microgrid using 1PD-3DOF-PID controller including mobile EV energy storage. Scientific Reports 14(1): 13962.

15.

Davoudkhani

Zare

Shenava

SJS

, et al. (2024b) Maiden application of mountaineering team-based optimization algorithm optimized 1PD-PI controller for load frequency control in islanded microgrid with renewable energy sources. Scientific Reports 14(1): 22851.

16.

Dhundhara

Verma

Williams

(2018) Techno-economic analysis of the lithium-ion and lead-acid battery in microgrid systems. Energy Conversion and Management 177: 122–142.

17. Dong, Zhu, et al. (2022) How green technology innovation affects carbon emission efficiency: Evidence from developed countries proposing carbon neutrality targets. Environmental Science and Pollution Research 29(24): 35780–35799.

18. Dunna, Chandra KPB, Rout, et al. (2024) Super-twisting MPPT control for grid-connected PV/battery system using higher order sliding mode observer. Scientific Reports 14(1): 16597.

19. Elhadidy (2002) Performance evaluation of hybrid (wind/solar/diesel) power systems. Renewable Energy 26(3): 401–413.

20. El-Khozondar, El-batta (2022) Solar energy implementation at the household level: Gaza strip case study. Energy, Sustainability and Society 12(1): 17.

21. El-Khozondar, El-batta, El-Khozondar, et al. (2023) Standalone hybrid PV/wind/diesel-electric generator system for a COVID-19 quarantine center. Environmental Progress & Sustainable Energy 42(3): e14049.

22. El-Khozondar, EL-Khozondar, Nassar, et al. (2025) Technical-economical-environmental assessment of grid-connected hybrid renewable energy power system for Gaza Strip-Palestine. Engineering Science and Technology, an International Journal 69: 102120.

23. El-Khozondar, Mtair, Qoffa, et al. (2024) A smart energy monitoring system using ESP32 microcontroller. e-Prime-Advances in Electrical Engineering, Electronics and Energy 9: 100666.

24. Fara, Finta, Iancu, et al. (2006) Products of HYPOS-DILETR project: Distance learning courses in design and operation of Hybrid Power Systems. In: 2006 IEEE International Conference on Automation, Quality and Testing, Robotics, 25–28 May 2006, Cluj-Napoca, Romania.

25. Güven (2024a) Heuristic techniques and evolutionary algorithms in microgrid optimization problems. In: Pandey, Padmanaban, Tripathi, et al. (eds) Microgrid. Boca Raton, FL: CRC Press, 260–301.

26. Güven (2024b) Integrating electric vehicles into hybrid microgrids: A stochastic approach to future-ready renewable energy solutions and management. Energy 303: 131968.

27. Güven (2025) A novel hybrid salp swarm Kepler optimization for optimal sizing and energy management of renewable microgrids with EV integration. Energy 137696.

28. Guven, Abdelaziz, Samy, et al. (2024) Optimizing energy dynamics: A comprehensive analysis of hybrid energy storage systems integrating battery banks and supercapacitors. Energy Conversion and Management 312: 118560.

29. Güven, Ateş, Alotaibi, et al. (2025a) Sustainable hybrid systems for electric vehicle charging infrastructures in regional applications. Scientific Reports 15(1): 4199.

30. Güven, Barakat, Samy (2024a) Optimal design and cost analysis of a hybrid renewable energy system for a small hotel based on the Arctic puffin optimization algorithm. In: 2024 25th International Middle East Power System Conference (MEPCON), 17–19 December 2024, Cairo, Egypt.

31. Güven, Hassan, Kamel (2025b) Optimization of a hybrid microgrid for a small hotel using renewable energy and EV charging with a quadratic interpolation beluga whale algorithm. Neural Computing and Applications 37(5): 3973–4008.

32. Güven, Kamel, Hassan (2025c) Optimization of grid-connected photovoltaic/wind/battery/supercapacitor systems using a hybrid artificial gorilla troops optimizer with a quadratic interpolation algorithm. Neural Computing and Applications 37(4): 2497–2535.

33. Güven, Mengi OÖ (2023) Assessing metaheuristic algorithms in determining dimensions of hybrid energy systems for isolated rural environments: Exploring renewable energy systems with hydrogen storage features. Journal of Cleaner Production 428: 139339.

34. Güven, Mengi OÖ, Bajaj, et al. (2025d) Integrated techno-economic optimization and metaheuristic benchmarking of grid-connected hybrid renewable energy systems using real-world load and climate data. e-Prime-Advances in Electrical Engineering, Electronics and Energy 101099.

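35. Güven, Mete (2021) Balıkesir İli Erdek İlçesi İçin Bağımsız Hibrit Enerji Sisteminin Fizibilite Çalışması Ve Ekonomik Analizi [Feasibility study and economic analysis of a stand-alone hybrid energy system for Erdek District, Balıkesir Province]. Konya Journal of Engineering Sciences 9(4): 1063–1076.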

36. Güven, Poyraz (2021) Feasibility study and techno-economic analysis of stand-alone hybrid energy system for Muğla Province Köyceğiz. Karadeniz Fen Bilimleri Dergisi 11(1): 70–85.

37. Güven, Rizk-Allah (2025) Optimal configuration framework of hybrid renewable energy technologies-based hydrogen energy storage system assessment using enhanced artificial rabbit algorithm. Energy 326: 135408.

38. Güven, Samy (2022) Performance analysis of autonomous green energy system based on multi and hybrid metaheuristic optimization approaches. Energy Conversion and Management 269: 116058.

39. Güven, Türkmen, Aşıklı, et al. (2023a) Investigating the effects of different types of battery impacts in energy storage systems on standalone hybrid renewable energy systems. Karadeniz Fen Bilimleri Dergisi 13(3): 943–964.

40. Güven, Yörükeren, Mengi OÖ (2024b) Multi-objective optimization and sustainable design: A performance comparison of metaheuristic algorithms used for on-grid and off-grid hybrid energy systems. Neural Computing and Applications 36(13): 7559–7594.

41. Güven, Yörükeren, Samy (2022) Design optimization of a stand-alone green energy system of university campus based on Jaya-Harmony Search and Ant Colony optimization algorithms approaches. Energy 253: 124089.

42. Güven, Yörükeren, Tag-Eldin, et al. (2023b) Multi-objective optimization of an islanded green energy system utilizing sophisticated hybrid metaheuristic approach. IEEE Access 11: 103044–103068.

43. Güven, Yücel (2023) Application of HOMER in assessing and controlling renewable energy-based hybrid EV charging stations across major Turkish cities. International Journal of Energy Studies 8(4): 747–780.

44. Güven, Yücel (2025) Sustainable energy integration and optimization in microgrids: Enhancing efficiency with electric vehicle charging solutions. Electrical Engineering 107(2): 1541–1573.

45. IEA (2025) Global Energy Review 2025. Paris: IEA. Available at: https://www.iea.org/reports/global-energy-review-2025. Licence: CC BY 4.0.

46. Imbayah, El-Khozondar, Hasan MHM, et al. (2025) Modeling and simulation control of MPPT in solar array for green hydrogen production. Solar Energy and Sustainable Development Journal 14(1): 379–393.

47. Imbayah, Hasan, El-Khozondare, et al. (2024) Review paper on green hydrogen production, storage, and utilization techniques in Libya. Journal of Solar Energy and Sustainable Development 13(1): 1–21.

48. Infield (1994) Wind diesel design and the role of short term flywheel energy storage. Renewable Energy 5(1–4): 618–625.

49. Karthik, Rajagopalan, Bajaj, et al. (2024) Chaotic self-adaptive sine cosine multi-objective optimization algorithm to solve microgrid optimal energy scheduling problems. Scientific Reports 14(1): 18997.

50. Khaleel, Yusupov, Guneser, et al. (2024) Towards hydrogen sector investments for achieving sustainable electricity generation. Journal of Solar Energy and Sustainable Development 13(1): 71–96.

51. Khan, Minai, Godi, et al. (2025) Optimal sizing, techno-economic feasibility and reliability analysis of hybrid renewable energy system: A systematic review of energy storage systems’ integration. IEEE Access 13: 59198–59226.

52. Khazali, Al-Wreikat, Fraser, et al. (2024) Planning a hybrid battery energy storage system for supplying electric vehicle charging station microgrids. Energies 17(15): 3631.

53. Khosravi, Baghbanzadeh, Oubelaid, et al. (2023) A novel control approach to improve the stability of hybrid AC/DC microgrids. Applied Energy 344: 121261.

54. Kumar, Alluraiah, Gopi, et al. (2025) Techno-economic optimization and sensitivity analysis of off-grid hybrid renewable energy systems: A case study for sustainable energy solutions in rural India. Results in Engineering 25: 103674.

55. Kumar, Yashaswini, Sharma, et al. (2023, March) Microgrid systems with classical primary control techniques—a review. In: International Conference on Renewable Power. Singapore: Springer Nature Singapore, 75–83.

56. Makhzom, Eshdok, Nassar, et al. (2023) Estimation of CO₂ emission factor for power industry sector in Libya. In: 2023 8th International Engineering Conference on Renewable Energy & Sustainability (ieCRES), 08–09 May 2023, Gaza, State of Palestine.

57. Manzoor, Judge, Islam, et al. (2023) AHHO: Arithmetic Harris Hawks optimization algorithm for demand side management in smart grids. Discover Internet of Things 3(1): 3.

58. Mekhilef, Faramarzi, Saidur, et al. (2013) The application of solar technologies for sustainable development of agricultural sector. Renewable and Sustainable Energy Reviews 18: 583–594.

59. Modu, Abdullah, Bukar, et al. (2023) A systematic review of hybrid renewable energy systems with hydrogen storage: Sizing, optimization, and energy management strategy. International Journal of Hydrogen Energy 48(97): 38354–38373.

60. Mohammed, Mustafa, Bashir (2014) Hybrid renewable energy systems for off-grid electric power: Review of substantial issues. Renewable and Sustainable Energy Reviews 35: 527–539.

61. Mohanty, Panda, Parida, et al. (2022) Demand side management of electric vehicles in smart grids: A survey on strategies, challenges, modeling, and optimization. Energy Reports 8: 12466–12490.

62. Molu RJJ, Naoussi SRD, Bajaj, et al. (2024) A techno-economic perspective on efficient hybrid renewable energy solutions in Douala, Cameroon’s grid-connected systems. Scientific Reports 14(1): 13590.

63. Nadimuthu LPR, Victor, Bajaj, et al. (2024) Feasibility of renewable energy microgrids with vehicle-to-grid technology for smart villages: A case study from India. Results in Engineering 24: 103474.

64. Nagarajan, Rajagopalan, Bajaj, et al. (2024) Optimizing dynamic economic dispatch through an enhanced cheetah-inspired algorithm for integrated renewable energy and demand-side management. Scientific Reports 14(1): 3091.

65. Nagarajan, Rajagopalan, Bajaj, et al. (2025) Improved lyrebird optimization for multi microgrid sectionalizing and cost efficient scheduling of distributed generation. Scientific Reports 15(1): 17345.

66. Nassar, El-khozondar, Ahmed, et al. (2024) A new design for a built-in hybrid energy system, parabolic dish solar concentrator and bioenergy (PDSC/BG): A case study–Libya. Journal of Cleaner Production 441: 140944.

67. Nassar, El-Khozondar, Fakher (2025) The role of hybrid renewable energy systems in covering power shortages in public electricity grid: An economic, environmental and technical optimization analysis. Journal of Energy Storage 108: 115224.

68. Olatomiwa, Mekhilef, Ismail, et al. (2016) Energy management strategies in hybrid renewable energy systems: A review. Renewable and Sustainable Energy Reviews 62: 821–835.

69. Pablo-Romero MdP, Sánchez-Braza, González-Jara (2023) Economic growth and global warming effects on electricity consumption in Spain: A sectoral study. Environmental Science and Pollution Research 30(15): 43096–43112.

70. Panda, Mohanty, Rout, et al. (2022) Residential demand side management model, optimization and future perspective: A review. Energy Reports 8: 3727–3766.

71. Panda, Mohanty, Rout, et al. (2023) A comprehensive review on demand side management and market design for renewable energy support and integration. Energy Reports 10: 2228–2250.

72. Panda, Samanta, Sahoo, et al. (2025) Comprehensive framework for smart residential demand side management with electric vehicle integration and advanced optimization techniques. Scientific Reports 15(1): 9948.

73. Paul, Jyothi, Kumar, et al. (2025) Optimizing sustainable energy management in grid connected microgrids using quantum particle swarm optimization for cost and emission reduction. Scientific Reports 15(1): 5843.

74. Pinthurat, Surinkaew, Hredzak (2024) An overview of reinforcement learning-based approaches for smart home energy management systems with energy storages. Renewable and Sustainable Energy Reviews 202: 114648.

75. Prasad, Devakirubakaran, Muthubalaji, et al. (2022) Power management in hybrid ANFIS PID based AC–DC microgrids with EHO based cost optimized droop control strategy. Energy Reports 8: 15081–15094.

76. Protogeropoulos, Brinkworth, Marshall (1997) Sizing and techno-economical optimization for hybrid solar photovoltaic/wind power systems with battery storage. International Journal of Energy Research 21(6): 465–479.

77. Rajagopalan, Nagarajan, Bajaj, et al. (2024) Multi-objective energy management in a renewable and EV-integrated microgrid using an iterative map-based self-adaptive crystal structure algorithm. Scientific Reports 14(1): 15652.

78. Ramesh, Saini (2020) Dispatch strategies based performance analysis of a hybrid renewable energy system for a remote rural area in India. Journal of Cleaner Production 259: 120697.

79. Ramli, Bouchekara, Alghamdi (2018) Optimal sizing of PV/wind/diesel hybrid microgrid system using multi-objective self-adaptive differential evolution algorithm. Renewable Energy 121: 400–411.

80. Safi, Chen, Wahab, et al. (2021) Does environmental taxes achieve the carbon neutrality target of G7 economies? Evaluating the importance of environmental R&D. Journal of Environmental Management 293: 112908.

81. Sahoo, Choudhury, Rathore, et al. (2023a) A novel prairie dog-based meta-heuristic optimization algorithm for improved control, better transient response, and power quality enhancement of hybrid microgrids. Sensors 23(13): 5973.

82. Sahoo, Choudhury, Rathore, et al. (2023b) Scaled conjugate-artificial neural network-based novel framework for enhancing the power quality of grid-tied microgrid systems. Alexandria Engineering Journal 80: 520–541.

83. Selvaraj, Rajangam, Vishnuram, et al. (2024) Optimal power scheduling in real-time distribution systems using crow search algorithm for enhanced microgrid performance. Scientific Reports 14(1): 30982.

84. Shaaban, Mohamed, Ismail, et al. (2019) Joint planning of smart EV charging stations and DGs in eco-friendly remote hybrid microgrids. IEEE Transactions on Smart Grid 10(5): 5819–5830.

85. Sharma, Sood, Sharma, et al. (2022) Modeling and sensitivity analysis of grid-connected hybrid green microgrid system. Ain Shams Engineering Journal 13(4): 101679.

86. Singh, Dey, Bajaj, et al. (2025a) Advanced microgrid optimization using price-elastic demand response and greedy rat swarm optimization for economic and environmental efficiency. Scientific Reports 15(1): 2261.

87. Singh, Kumar, Bajaj, et al. (2024a) Machine learning-based energy management and power forecasting in grid-connected microgrids with multiple distributed energy sources. Scientific Reports 14(1): 19207.

88. Singh, Kumar, Bajaj, et al. (2025b) A blockchain consortium-based framework to enhance interoperability, standardization, and secure demand response management in smart grid applications. Results in Engineering 106056.

89. Singh, Kumar, Madhavi, et al. (2024b) Optimizing demand response and load balancing in smart EV charging networks using AI integrated blockchain framework. Scientific Reports 14(1): 31768.

90. Singh, Pandit, Srivastava (2023) Multi-objective optimal sizing of hybrid micro-grid system using an integrated intelligent technique. Energy 269: 126756.

91. Thirumalai, Suresh, Hemalatha, et al. (2025) Cheetah optimization-based smart energy management for appliance scheduling and DER integration in residential and commercial-industrial grids. Energy Conversion and Management X: 101192.

92. Ullah, Al-Rahmi, Alblehai, et al. (2024) Blockchain-powered grids: Paving the way for a sustainable and efficient future. Heliyon 10(10): e31592.

93. Valenciaga, Puleston (2005) Supervisor control for a stand-alone hybrid generation system using wind and photovoltaic energy. IEEE Transactions on Energy Conversion 20(2): 398–405.

94. Zhou, Liu, et al. (2022) Bi-level framework for microgrid capacity planning under dynamic wireless charging of electric vehicles. International Journal of Electrical Power & Energy Systems 141: 108204.

Deep reinforcement learning-based energy management for design and control of off-grid renewable microgrids with dual-battery storage
