Abstract

Achieving accurate trajectory tracking in an Active Denial System (ADS) remains a complex and pressing task, particularly for real-time tracking-and-pointing operations. We begin by examining the stringent accuracy demands of such systems and put forward an adaptive control strategy that integrates Integral Sliding Mode Control (ISMC) with an improved Double Deep Q-Network (AdaDDQN). The proposed method employs a high-dimensional state-space framework coupled with a context-aware reward mechanism, enabling the controller to fine-tune its parameters online. This hybrid approach harnesses the learning flexibility of Reinforcement Learning (RL) while preserving the inherent stability of Sliding Mode Control (SMC). Experimental evaluations show that the method improves tracking precision by over 80% compared with traditional sliding mode schemes and keeps the tracking error consistently within 8 mrad. It also demonstrates strong resilience against mechanical friction, external interference, and model uncertainties, ensuring both accuracy and robustness under real-world operational constraints.

Keywords

active denial system, integral sliding mode control, reinforcement learning, Double Deep Q-network, trajectory tracking

Introduction

The Active Denial System (ADS) represents a class of non-lethal directed-energy technologies that project microwave radiation toward a target using a continuous-wave transmitter operating in the 94–95 GHz range (∼0.1 THz), delivering output power between 10 and 100 kW.^1–3 This radiation travels at the speed of light and induces a thermal effect roughly 0.4 mm beneath the surface of human skin—sufficient to stimulate pain receptors within seconds and trigger a strong withdrawal response. Notably, once the exposure ceases or the individual moves beyond the beam’s reach, the sensation dissipates rapidly. Due to its controllable physiological impact without lasting harm, ADS offers a compelling solution for crowd dispersion, conflict de-escalation, and minimizing collateral injury.⁴

In parallel with ADS technology, the development of precise tracking and pointing platforms has become increasingly critical. These servo control systems are essential for ensuring that the microwave beam consistently targets the intended area with minimal deviation, especially under dynamic conditions. Among control strategies, robust control and Sliding Mode Control (SMC) are frequently favored due to their resilience against model uncertainties and external disturbances.^5–7 More recently, Deep Reinforcement Learning (DRL) has emerged as a promising paradigm, offering real-time adaptability and enhanced perception in high-dimensional environments.^8–11 Algorithms like Deep Q-Network (DQN) and Double Deep Q-Network (DDQN) have addressed stability and policy evaluation biases, paving the way for intelligent, self-adjusting control frameworks. In this context, combining the adaptability of DRL with the reliability of SMC is gaining traction as a promising direction for high-precision tracking applications.^12–18

In addition to these developments, several recent works have explored advanced control strategies for complex dynamic systems. Regarding intelligent decision-making in networked environments, Cho et al.¹⁹ proposed a resilient algorithm combining multi-agent reinforcement learning (MARL) with message-passing mechanisms to ensure robust operation of UAVs under unreliable network conditions. While such distributed learning frameworks demonstrate superior resilience, high-precision tracking tasks for servo systems often require more focused compensation for local dynamic nonlinearities. To address specific actuator nonlinearities such as dead-zones, Yuan et al.²⁰ developed a time-scale decoupled adaptive prescribed-time control scheme. Their method uses fuzzy logic to compensate for unknown dead-zones without prior knowledge, ensuring convergence within a user-specified time. Similarly, addressing the constraints of communication bandwidth and system security, Zhang et al.²¹ introduced a periodic event-triggering adaptive control strategy capable of mitigating actuator attacks and exogenous disturbances in networked uncertain nonlinear systems. Although these state-of-the-art methods have achieved remarkable success in their respective domains (network resilience, dead-zone compensation, and security), controlling the Active Denial System (ADS) presents unique challenges, particularly regarding constant unmodeled loads and the trade-off between rapid convergence and steady-state precision.

Therefore, applying such techniques in real-world ADS scenarios remains challenging. Specific control performance indicators must be identified based on the unique characteristics of microwave-based systems. A meaningful control framework must not only achieve high accuracy but also operate robustly in the face of chattering, mismatched disturbances, and hardware-induced constraints. These include friction, backlash, input saturation, quantization noise, and sampling delay, all of which influence closed-loop dynamics and complicate controller design.

Moreover, while RL introduces adaptability, it also raises new concerns: function approximation errors, generalization gaps, and violations of Lyapunov stability can compromise safety. Although methods like Constrained Policy Optimization and Control Barrier Functions offer theoretical safeguards, they are often difficult to implement under strict real-time or high-disturbance conditions.

To address these challenges, this study first establishes quantitative control accuracy requirements—specifically, ensuring that the ADS tracking error remains within 10 mrad. On this basis, we propose a novel adaptive control scheme that integrates an improved DDQN (AdaDDQN) into the structure of Integral Sliding Mode Control (ISMC). The approach features a multi-dimensional state-space formulation combined with an intelligent reward mechanism, allowing for online parameter adjustment and real-time learning during tracking tasks.

At the core of the method lies a redesigned integral sliding variable, which includes cumulative error compensation and decomposes the control torque into equivalent, switching, and adaptive components. RL determines the angular velocity increments, enabling the controller to respond swiftly to rapid trajectory changes while retaining SMC’s characteristic robustness. To address chattering, the system incorporates diversity metrics, sliding variable history, and torque transformation statistics, which enhance smoothness perception and suppress excessive torque fluctuation.

Experimental validations confirm that the proposed method outperforms traditional SMC, achieving an improvement of over 80% in tracking accuracy and maintaining errors consistently within 8 mrad. Even in the presence of structural friction and environmental disturbances, the system demonstrates stable, high-precision performance.

The primary contributions of this work include:

Development of a smoothness-enhanced adaptive control framework that integrates the adaptive decision-making of AdaDDQN with the theoretical stability of Integral Sliding Mode Control (ISMC). This hybrid framework handles both unmodeled dynamics and transient disturbances.

Design of an RL decision-making module based on AdaDDQN, capable of adaptively optimizing angular velocity adjustments and key control parameters in a multi-dimensional state space, enabling robust tracking under changing inertial and disturbance conditions.

Implementation and validation of the proposed strategy on a physical ADS tracking-and-pointing platform. Experiments show that it achieves the best performance metrics among all tested methods, maintaining the tracking error consistently within 8 mrad and satisfying the stringent beam coverage requirements.

Precision analysis and system modeling

Precision analysis

Given the operational demands of microwave-based ADS, achieving high-precision tracking and pointing is essential to ensure continuous, effective energy projection while minimizing unintended damage. To formalize the control objectives of such systems, this section derives quantitative tracking and pointing accuracy criteria by analyzing both the diffraction behavior of 94 GHz millimeter waves and a probabilistic hit model.

As a starting point, we consider the inherent beam divergence characteristics of the system. With the operating frequency set at 94 GHz and the antenna aperture $D_{w}$ specified as 0.2 m, the beam divergence angle $θ_{w}$ can be estimated using the Rayleigh diffraction criterion. For wavelength $λ_{w}$ and beam quality factor $β_{w}$ , the divergence angle is given by the following expression:

θ_{w} = \frac{1.22 λ_{w} β_{w}}{D_{w}}

(1)

By inserting the given parameters into the Rayleigh-based formula, the resulting theoretical beam divergence is calculated to be approximately 19.5 mrad. At an operational distance of 250 m—a typical engagement range for ADS—this divergence directly limits the physical spread of the microwave beam.
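The estimate can be checked numerically. The following is a minimal sketch that assumes $λ_{w} = c / f$ and a beam quality factor $β_{w} = 1$; the text does not state $β_{w}$, but this value is consistent with the quoted result.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def beam_divergence(freq_hz, aperture_m, beam_quality=1.0):
    """Rayleigh-criterion divergence angle of equation (1), in radians."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    return 1.22 * wavelength * beam_quality / aperture_m


# beam_quality=1.0 is an assumption, not a value given in the text.
theta_w = beam_divergence(94e9, 0.2)
print(f"{theta_w * 1e3:.1f} mrad")
```

With these inputs the function evaluates to roughly 19.5 mrad.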

To establish an accuracy threshold grounded in stable aiming probability, it is essential to distinguish between two primary error sources: static pointing error (a systematic offset) and dynamic tracking error (a random fluctuation). The latter typically appears as line-of-sight jitter with respect to the moving target and is commonly modeled using a Gaussian distribution. For the microwave beam to consistently maintain coverage over the target, this jitter must remain within a controllable range. Leveraging the statistical properties of Gaussian noise, the stable aiming probability $P$ can be expressed as a function of the beam diameter $D$ and the root-mean-square tracking error (RMSE) $σ$ (i.e. $Δ θ$ ) through the following integral relationship:

P = \int_{Δ θ_{0}}^{\infty} F (Δ θ) d Δ θ = 1 - P' = 1 - \exp [- \frac{{(k D / 2)}^{2}}{2 σ^{2}}]

(2)

Here, the hit probability $F$ can be expressed using the probability characteristics of a Gaussian distribution as follows:

F (θ_{e}) = \frac{1}{σ \sqrt{2 π}} e^{- \frac{{(θ_{e} - \bar{x_{p}})}^{2}}{2 σ^{2}}}

(3)

where $θ_{e}$ represents the aiming error. As equation (3) indicates, the aiming error follows a normal distribution, in which $\bar{x_{p}}$ denotes the mean error and $σ$ the root mean square (RMS) error.

The hit probability also satisfies the condition:

\int_{0}^{\infty} F (θ_{e}) d θ_{e} = 1

(4)

Letting $Δ θ$ represent the precision requirement, and the RMS error be defined as $σ = Δ θ L_{w}$ , the stable aiming probability $P$ can be expressed by the equation:

P = 1 - \exp [- \frac{{(1.22 k β_{w} λ_{w})}^{2}}{2 Δ θ^{2} D_{w}^{2}}]

(5)
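Equation (5) is straightforward to evaluate for candidate accuracy requirements. The sketch below treats the coefficient $k$ and the beam quality factor $β_{w}$ as inputs, because the text does not give their values; the choice $k = β_{w} = 1$ is a placeholder that only illustrates the trend and is not expected to reproduce the figures reported in this section.

```python
import math


def stable_aiming_probability(delta_theta, k, beam_quality, wavelength, aperture):
    """Equation (5): stable aiming probability for accuracy delta_theta (rad)."""
    ratio = 1.22 * k * beam_quality * wavelength / (delta_theta * aperture)
    return 1.0 - math.exp(-0.5 * ratio ** 2)


wavelength = 299_792_458.0 / 94e9  # wavelength of the 94 GHz carrier, in metres
for dtheta_mrad in (20.0, 10.0, 5.0):
    # k = 1.0 and beam_quality = 1.0 are placeholder assumptions.
    p = stable_aiming_probability(dtheta_mrad * 1e-3, 1.0, 1.0, wavelength, 0.2)
    print(f"delta_theta = {dtheta_mrad:4.1f} mrad -> P = {p:.4f}")
```

Tightening the accuracy requirement (smaller $Δ θ$) increases $P$, which is the relationship the design threshold relies on.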

Figure 1 presents a schematic representation of the microwave beam diameter and the associated accuracy requirements, while Figure 2 illustrates how the ratio between beam divergence and line-of-sight jitter influences the tracking precision at a distance of 250 m. The analysis suggests that to achieve stable and high-precision tracking, the jitter of the microwave spot must remain within half of the beam diameter. Under this constraint, the system’s allowable tracking error must be tightly bounded. Simulation results indicate that when the tracking accuracy satisfies $Δ θ \leq 10$ mrad, the probability of the irradiation spot remaining fully within this limit (i.e. not exceeding half the beam diameter) is approximately 99.87%. This high level of confidence ensures that the target remains effectively “locked” within the beam path, preventing unintended escape.

Figure 1.

Beam divergence and tracking accuracy constraints.

Figure 2.

Effect of beam diameter and tracking accuracy constraints at 250 m.

In light of these findings, and considering the nonlinear disturbances introduced by platform motion at the 250 m operating range, the tracking-and-pointing system is required to meet a maximum RMSE of $Δ θ \leq 10$ mrad. This threshold serves as a design benchmark for the control strategy and parameter tuning throughout the rest of this work.

System dynamics model and control objectives

To effectively manage the tracking-and-pointing control task for an ADS (Figure 3), we begin by establishing a dynamic model that captures the essential coupling behaviors of the system. The tracking platform is represented as a rigid-body mechanism with two degrees of freedom (2-DOF). Let the joint-angle vector be defined as $q = {[θ, β]}^{T}$ , where $θ$ represents the azimuth angle and $β$ represents the elevation angle. The coupled dynamic equations governing the motion of the system are formulated as follows:

M (q) \overset{\cdot\cdot}{q} + C (q, \overset{\cdot}{q}) \overset{\cdot}{q} + G (q) + D \overset{\cdot}{q} + τ_{d} = τ

(6)

Figure 3.

Dynamic model of two-axis tracking and pointing platform.

Figure 3 illustrates the structural and functional configuration of the tracking-and-pointing platform within ADS. The azimuth axis rotates around the vertical vector to control the horizontal heading, providing panoramic field coverage, while the pitch axis rotates around the horizontal vector to adjust the vertical tilt of the payload.

Based on the system’s physical properties, the corresponding mathematical formulations for its key dynamic components are defined as follows:

The positive-definite inertia matrix $M (q)$ characterizes the system’s mass distribution and satisfies $m_{1} I < M (q) < m_{2} I$ .

M (q) = [\begin{matrix} I_{θ} + m L^{2} \sin^{2} β & 0 \\ 0 & I_{β} + m L^{2} \end{matrix}]

(7)

The coupling effects during rotational motion are described as follows:

C (q, \overset{\cdot}{q}) = [\begin{matrix} 0 & - m L^{2} \overset{\cdot}{β} \sin β \cos β \\ m L^{2} \overset{\cdot}{θ} \sin β \cos β & 0 \end{matrix}]

(8)

The gravitational effect $G (q)$ primarily influences the elevation axis, which can be expressed as follows:

G (q) = [\begin{matrix} 0 \\ m g L \cos β \end{matrix}]

(9)

The damping matrix $D$ and disturbance vector $τ_{d}$ represent the system’s inherent damping and external non-matching disturbances, respectively.
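As an illustration, equations (6) to (9) can be solved for the joint accelerations. The sketch below uses the physical parameters listed in Table 1 and assumes a diagonal damping matrix $D = diag (d_{θ}, d_{β})$; the function name and argument layout are hypothetical.

```python
import math

M_LOAD, L_ARM, G_ACC = 35.0, 0.8, 9.8   # Table 1: m (kg), L (m), g (m/s^2)
I_AZ, I_EL = 8.0, 12.5                  # Table 1: I_theta, I_beta (kg m^2)
D_AZ, D_EL = 0.2, 0.3                   # Table 1: d_theta, d_beta (N m s/rad)


def forward_dynamics(q, dq, tau, tau_d=(0.0, 0.0)):
    """Solve equation (6) for the (azimuth, elevation) accelerations."""
    _, beta = q
    dtheta, dbeta = dq
    ml2 = M_LOAD * L_ARM ** 2
    sc = math.sin(beta) * math.cos(beta)
    m11 = I_AZ + ml2 * math.sin(beta) ** 2            # equation (7)
    m22 = I_EL + ml2
    c_az = -ml2 * dbeta * sc * dbeta                  # row 1 of C(q, dq) dq, eq. (8)
    c_el = ml2 * dtheta * sc * dtheta                 # row 2 of C(q, dq) dq
    g_el = M_LOAD * G_ACC * L_ARM * math.cos(beta)    # equation (9)
    rhs_az = tau[0] - c_az - D_AZ * dtheta - tau_d[0]
    rhs_el = tau[1] - c_el - g_el - D_EL * dbeta - tau_d[1]
    return rhs_az / m11, rhs_el / m22                 # M(q) is diagonal
```

Because $M (q)$ is diagonal in this model, no matrix inversion is needed; each axis is divided by its own inertia term.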

The control objective is to ensure that the system meets the prescribed RMSE criteria by driving the actual state vector $q$ to closely follow the desired trajectory $q_{d}$ . The system tracking error is defined as follows:

e (t) = q_{d} (t) - q (t)

(10)

RMSE can be obtained as follows:

RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {| | e_{i} | |}^{2}}

(11)
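A minimal sketch of equation (11) follows, with each error sample stored as an (azimuth, elevation) pair in radians. The sample values are made up for illustration.

```python
import math


def rmse(errors):
    """Equation (11): root-mean-square of the error-vector norms."""
    total = sum(e_az ** 2 + e_el ** 2 for e_az, e_el in errors)
    return math.sqrt(total / len(errors))


samples = [(0.003, -0.004), (0.0, 0.005), (-0.004, 0.003)]  # rad, illustrative only
meets_requirement = rmse(samples) <= 0.010                  # 10 mrad threshold
print(meets_requirement)
```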

ISMC and improved RL-based adaptive optimization strategy

Construction of integral sliding variable

The ISMC is constructed by first defining a suitable sliding variable, which shapes the system’s motion to achieve the desired dynamic response. This design helps eliminate steady-state error while enhancing transient performance. The sliding variable is formulated as follows:

s (t) = \overset{\cdot}{e} (t) + ρ e (t) + K \int_{0}^{t} e (τ) d τ - s_{0}

(12)

where $e$ represents the tracking error, consisting of the elevation and azimuth angle components; $ρ$ and $K$ are positive definite diagonal gain matrices; $s_{0}$ is the initial compensation term defined by the initial system states:

s_{0} = \overset{\cdot}{e} (0) + ρ e (0)

(13)

By substituting $t = 0$ into equation (12), it is evident that:

s (0) = \overset{\cdot}{e} (0) + ρ e (0) + K \cdot 0 - [\overset{\cdot}{e} (0) + ρ e (0)] = 0

(14)

Remark: By setting $s (0) = 0$ , the system starts directly on the sliding manifold. This eliminates the reaching phase, thereby guaranteeing robustness against matched uncertainties and disturbances throughout the entire control process. Because $s_{0}$ is constant, the time derivative of the sliding variable is:

\overset{\cdot}{s} = \overset{\cdot\cdot}{e} + ρ \overset{\cdot}{e} + Ke

(15)
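In discrete time, equations (12) and (13) can be evaluated per axis as sketched below. The class name, the rectangle-rule integration, and the gain values in the test are assumptions made for illustration.

```python
class IntegralSlidingVariable:
    """Single-axis, discrete-time evaluation of equations (12) and (13)."""

    def __init__(self, rho, k_int, e0, de0):
        self.rho = rho
        self.k_int = k_int
        self.s0 = de0 + rho * e0   # equation (13): initial compensation term
        self.integral = 0.0        # running approximation of the integral of e

    def step(self, e, de, dt):
        """Return s for the current sample, then advance the integral of e."""
        s = de + self.rho * e + self.k_int * self.integral - self.s0
        self.integral += e * dt    # rectangle rule (an assumed discretization)
        return s
```

Calling `step` with the initial error and error rate returns zero, which mirrors the property $s (0) = 0$ shown in equation (14).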

Based on the integral sliding variable, the control law is formulated to drive the system states to the sliding manifold within finite time and to ensure sustained sliding motion thereafter.

The control input τ is composed of two main components: the nominal control, which reflects the known system dynamics, and the switching control, designed to counteract external disturbances and uncertainties.

τ = M (q) u + C (q, \overset{\cdot}{q}) \overset{\cdot}{q} + G (q) + D \overset{\cdot}{q}

(16)

The auxiliary control component $u$ is defined by the following expression:

u = {\overset{\cdot\cdot}{q}}_{d} + ρ \overset{\cdot}{e} + Ke + K_{s} sat (s / ϕ)

(17)

where ${\overset{\cdot\cdot}{q}}_{d}$ denotes the desired acceleration vector; $K_{s} = diag (K_{s 1}, K_{s 2})$ represents the switching gain matrix, which determines the disturbance-rejection capability; $sat (\cdot)$ represents the saturation function; and $ϕ$ represents the boundary-layer thickness. Together, the latter two mitigate the inherent chattering of sliding-mode control.
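For a single axis, the saturation function and the auxiliary control of equation (17) can be sketched as follows. The unit-slope saturation is the standard boundary-layer form and is assumed here; the argument names are hypothetical.

```python
def sat(x):
    """Saturation: identity on [-1, 1], clipped to -1 or +1 outside."""
    return max(-1.0, min(1.0, x))


def auxiliary_control(ddq_d, e, de, s, rho, k_int, k_s, phi):
    """Equation (17) for one axis; phi is the boundary-layer thickness."""
    return ddq_d + rho * de + k_int * e + k_s * sat(s / phi)
```

Inside the boundary layer ($| s | < ϕ$) the switching term scales linearly with $s$, while outside it contributes the full gain $K_{s}$ with the sign of $s$.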

Substituting the control law (equation (16)) into the system dynamics (equation (6)) yields:

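M (q) \overset{\cdot\cdot}{q} + C (q, \overset{\cdot}{q}) \overset{\cdot}{q} + G (q) + D \overset{\cdot}{q} + τ_{d} = M (q) u + C (q, \overset{\cdot}{q}) \overset{\cdot}{q} + G (q) + D \overset{\cdot}{q}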

(18)

Since the nonlinear terms are compensated by the control law and the inertia matrix $M (q)$ is invertible, the closed-loop dynamics simplifies to:

\overset{\cdot\cdot}{q} = u - M^{- 1} (q) τ_{d}

(19)

Substituting the expression for $u$ (equation (17)) and using the error definition (equation (10)), the error dynamics is obtained as:

\overset{\cdot\cdot}{e} = - ρ \overset{\cdot}{e} - Ke - K_{s} sat (\frac{s}{ϕ}) + M^{- 1} (q) τ_{d}

(20)

Combining this with the derivative of the sliding variable from equation (15), the decoupled sliding mode dynamics is derived as:

\overset{\cdot}{s} = - K_{s} sat (\frac{s}{ϕ}) + δ (t)

(21)

where $δ (t) = M^{- 1} (q) τ_{d}$ represents the transformed matched disturbance.

Adaptive optimization strategy via improved RL

Traditional sliding-mode control often depends on empirically tuned parameters, which limits its adaptability in the face of rapidly changing or uncertain environments. To address this limitation, we integrate AdaDDQN into the control framework, enabling online optimization of key parameters.

The parameter tuning process is reformulated as a Markov Decision Process (MDP), characterized by the standard five-tuple M = (S, A, P, R, γ):

State S represents the system’s current dynamic characteristics, capturing essential information for decision-making.

Action A represents the incremental adjustment applied to the control parameters.

Reward R represents the evaluative feedback that guides the learning agent toward optimal policy selection.

To improve the agent’s sensitivity to trajectory complexity and control smoothness, we design a high-dimensional state space that embeds both diversity metrics and predictive information about the system’s future behavior.

s_{t} = [q, \overset{\cdot}{q}, diversity_{D}, diversity_{F}, FI]

(22)

The following features are carefully designed to capture critical aspects of system dynamics and trajectory complexity:

State-space diversity ( $diversity_{D}$ ) captures how widely the current state samples are distributed within the system’s state space.

diversity_{D} = \frac{2}{n (n - 1)} \sum_{i = 1}^{n} \sum_{j = 1}^{n} euclidean (s_{t, i}, s_{t, j})

(23)

Target-space diversity ( $diversity_{F}$ ) captures the structural complexity of the normalized target trajectory Obj.

diversity_{F} = \frac{2}{n (n - 1)} \sum_{i = 1}^{n} \sum_{j = 1}^{n} euclidean (Obj_{i}, Obj_{j})

(24)

where Obj represents the normalized target vector, which can be determined as follows:

Obj = \frac{s_{t} - \min (s_{t})}{\max (s_{t}) - \min (s_{t})}

(25)
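The diversity metrics of equations (23) and (24) and the normalization of equation (25) can be sketched as below. Two assumptions are made: each unordered pair of samples is counted once, which is the reading under which the factor $2 / (n (n - 1))$ yields a mean pairwise distance, and a zero-range input is mapped to zeros to avoid division by zero.

```python
import math
from itertools import combinations


def min_max_normalize(vec):
    """Equation (25); returns zeros when all components are equal (assumed guard)."""
    lo, hi = min(vec), max(vec)
    if hi == lo:
        return [0.0] * len(vec)
    return [(v - lo) / (hi - lo) for v in vec]


def diversity(samples):
    """Mean pairwise Euclidean distance over unordered pairs of samples."""
    n = len(samples)
    if n < 2:
        return 0.0
    total = sum(math.dist(a, b) for a, b in combinations(samples, 2))
    return 2.0 * total / (n * (n - 1))
```

The same `diversity` function serves both metrics: it is applied to raw state samples for $diversity_{D}$ and to normalized target vectors for $diversity_{F}$.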

By jointly analyzing the dispersion of state samples and the complexity of the normalized target trajectory, the system gains improved resilience to distributional shifts. This, in turn, enhances the generalization capability of the policy across varying control scenarios.

The feature vector includes future-informed indicators, collectively denoted as FI, which capture predictive information about target motion and disturbance trends. The FI component is structured into three elements: $FI_{1}$ , $FI_{2}$ , and $FI_{3}$ .

$FI_{1}$ (cumulative error) reflects the system’s historical control performance:

FI_{1} = \int e dt \approx \sum_{i = 0}^{k} e_{i} Δ t

(26)

$FI_{2}$ (mean of the sliding variable) reflects the recent convergence status of the sliding variable:

FI_{2} = \frac{1}{10} \sum_{i = k - 9}^{k} | s_{i} |

(27)

$FI_{3}$ (torque rate-of-change) reflects the smoothness of the control output:

FI_{3} = \frac{| | τ_{k} - τ_{k - 1} | |}{Δ t}

(28)
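A running, single-axis evaluation of equations (26) to (28) might look like the sketch below. The class name is hypothetical, and one detail is assumed: before ten samples are available, the mean in equation (27) is taken over the samples seen so far.

```python
from collections import deque


class FutureInformedFeatures:
    """Running single-axis evaluation of equations (26) to (28)."""

    def __init__(self, dt, window=10):
        self.dt = dt
        self.cum_error = 0.0                 # FI_1: accumulated error
        self.s_hist = deque(maxlen=window)   # most recent |s| values
        self.prev_tau = None

    def update(self, e, s, tau):
        """Return (FI_1, FI_2, FI_3) after ingesting one new sample."""
        self.cum_error += e * self.dt
        self.s_hist.append(abs(s))
        fi2 = sum(self.s_hist) / len(self.s_hist)
        if self.prev_tau is None:
            fi3 = 0.0                        # no previous torque on the first call
        else:
            fi3 = abs(tau - self.prev_tau) / self.dt
        self.prev_tau = tau
        return self.cum_error, fi2, fi3
```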

The action space is structured to enable online adaptation of the control parameters, so that the agent can respond effectively to changes in dynamic and uncertain environments:

a_{t} = {[Δ ρ_{1}, Δ ρ_{2}, Δ K_{1}, Δ K_{2}, Δ K_{s 1}, Δ K_{s 2}]}^{T}

(29)

The adjustable parameters include the sliding-variable coefficients, integral gains, and switching gains, all defined separately for the azimuth and elevation control channels.

For the switching gain adjustment ( $Δ K_{s}$ ): The action range is defined as $[- 25.0, 25.0]$ N·m. This wide range is critical for the RL agent to instantly boost the control gain from its nominal value to a high saturation level when a shock disturbance is detected.

For the sliding variable parameter adjustment ( $Δ ρ$ ): The action range is set to $[- 10.0, 10.0]$ . This allows the agent to significantly modulate the system’s bandwidth in real-time, increasing stiffness during fast maneuvers and reducing it during steady states to minimize noise sensitivity.

For the integral gain adjustment ( $Δ K$ ): The integral gain is kept relatively stable with a narrow adjustment range ( $[- 0.1, 0.1]$ in practice) to preserve the steady-state error elimination capability without introducing integral windup oscillations.
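The ranges above can be enforced by clipping each increment before it is applied. The following is a minimal single-axis sketch; the dictionary keys and function name are hypothetical, and the sketch does not enforce any lower bound on the resulting gains.

```python
# Symmetric increment limits taken from the ranges quoted above.
ACTION_LIMITS = {"rho": 10.0, "k_int": 0.1, "k_s": 25.0}


def apply_action(params, action):
    """Add clipped increments to the current gains of one axis.

    params and action are dicts keyed by "rho", "k_int" and "k_s";
    a key missing from action leaves that gain unchanged.
    """
    updated = {}
    for name, value in params.items():
        limit = ACTION_LIMITS[name]
        step = max(-limit, min(limit, action.get(name, 0.0)))
        updated[name] = value + step
    return updated
```

A requested increment outside the allowed range is reduced to the nearest limit, so a single decision step cannot move a gain further than its range permits.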

To improve both learning stability and decision quality, the proposed AdaDDQN network leverages a Dueling DDQN architecture. This structure incorporates two key mechanisms:

DDQN introduces a separate target network to compute the estimated Q-values. By decoupling action selection from value evaluation, this approach effectively reduces the overestimation bias often encountered in standard DQN methods.

L (θ) = E [{(R + γ Q (s_{t + 1}, \arg\max_{a_{t + 1}} Q (s_{t + 1}, a_{t + 1}; θ); θ^{*}) - Q (s_{t}, a_{t}; θ))}^{2}]

(30)
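The decoupling of action selection from value evaluation can be illustrated with plain lists of Q-values. In this sketch the online network (parameters $θ$) selects the next action and the target network (parameters $θ^{*}$) evaluates it; the `done` flag for terminal transitions is an addition not discussed in the text.

```python
def double_dqn_target(reward, gamma, q_online_next, q_target_next, done=False):
    """TD target: the online net picks the action, the target net scores it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best]


def td_loss(target, q_value):
    """Squared TD error for a single transition."""
    return (target - q_value) ** 2
```

For example, with online estimates `[0.2, 0.8, 0.5]` the second action is selected, and its value is read from the target network's estimates rather than from the maximum of those estimates.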

The dueling network separates the Q-value into a state-value $V$ and an advantage function $A$ , allowing the model to better distinguish between the importance of states and the relative utility of actions:

Q (s_{t}, a_{t}) = V (s_{t}) + (A (s_{t}, a_{t}) - \frac{1}{| A |} \sum_{a'} A (s_{t}, a'))

(31)
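The aggregation in equation (31) amounts to centering the advantages before adding the state value, as in this short sketch (the function name is hypothetical):

```python
def dueling_q_values(state_value, advantages):
    """Equation (31): Q(s, a) = V(s) + (A(s, a) - mean over actions of A)."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + (adv - mean_adv) for adv in advantages]
```

Subtracting the mean advantage makes the decomposition identifiable: the average of the resulting Q-values over actions equals $V (s_{t})$.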

The reward function is constructed to penalize angular tracking errors while simultaneously incorporating constraints on control torque magnitude and variations. The mathematical formulation of the reward is as follows:

r = 10 e^{- 10 | | e | |} - δ_{1} | | τ | | - δ_{2} | | Δ τ | |

(32)

Specifically, the first term in the reward function applies an exponential penalty to the tracking error, encouraging precise alignment with the target trajectory. The second term serves as a constraint on the control energy, limiting excessive torque output. The third term discourages large control adjustments by penalizing the magnitude of changes in torque. This adjustment penalty $| | Δ τ | |$ is formulated as follows:

| | Δ τ | | = Δ ρ_{p}^{2} + Δ K_{p}^{2} + Δ ρ_{a}^{2} + Δ K_{a}^{2}

(33)

where $Δ ρ_{p}$ and $Δ ρ_{a}$ represent the variations of the sliding-surface error-term coefficients for the azimuth and elevation channels, respectively; while $Δ K_{p}$ and $Δ K_{a}$ represent the variations of the integral-term coefficients for azimuth and elevation, respectively. Overall, the penalty structure comprehensively incorporates azimuth and elevation tracking errors, sliding-surface deviations, integral terms, and control effort. In the RL process, the exponential error term ensures large deviations are rapidly penalized, encouraging fast convergence. Meanwhile, constraints on torque magnitude prevent excessive control energy, and smoothing the parameter updates helps reduce oscillations that would otherwise arise from abrupt changes.
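A minimal sketch of the reward in equation (32), using the adjustment penalty of equation (33), is given below. The weights $δ_{1}$ and $δ_{2}$ are not specified in the text, so the default values here are placeholders.

```python
import math


def reward(e, tau, param_steps, delta1=0.01, delta2=0.1):
    """Equation (32) with the adjustment penalty of equation (33).

    e, tau: two-element tracking error (rad) and control torque (N m).
    param_steps: the four parameter increments entering equation (33).
    delta1, delta2: placeholder weights; the text does not give their values.
    """
    tracking = 10.0 * math.exp(-10.0 * math.hypot(*e))
    effort = delta1 * math.hypot(*tau)
    adjustment = delta2 * sum(step ** 2 for step in param_steps)
    return tracking - effort - adjustment
```

The reward peaks at 10 when the error, torque, and parameter increments are all zero, and decays exponentially as the error norm grows.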

Stability analysis and ultimate boundedness

To rigorously verify the stability of the proposed control scheme under the influence of the RL-optimized parameters and the saturation function, we first establish the necessary assumptions and then derive the stability theorem based on Lyapunov theory.

Assumption 1 (Bounded Disturbance): The transformed disturbance term $δ (t)$ is bounded, satisfying $| δ (t) | \leq Δ_{\max}$ . Given that the inertia matrix $M (q)$ is positive definite and bounded, this assumption is consistent with the physical constraints of the gimbal system.

Assumption 2 (RL Safety Constraint): Although the switching gain $K_{s}$ is dynamically adjusted by the AdaDDQN agent, its action space is strictly constrained to ensure stability. Specifically, the minimum eigenvalue of the switching gain satisfies the lower bound condition: $λ_{\min} (K_{s} (t)) \geq Δ_{\max} + η$ , where $η > 0$ is a positive safety margin.

Theorem 1 (Uniform Ultimate Boundedness): Consider the gimbal dynamics described by equation (6) under the control law of equations (16) and (17), which uses the saturation function $sat (s / ϕ)$ . Under Assumptions 1 and 2, the closed-loop system is Uniformly Ultimately Bounded (UUB). Both the sliding variable $s$ and the tracking error $e$ converge to a compact residual set centered at the origin.

To formally establish this result, we construct the following Lyapunov candidate function:

V = \frac{1}{2} s^{T} s

(34)

Differentiating $V$ with respect to time and substituting the closed-loop dynamics, we obtain:

\overset{\cdot}{V} = s^{T} \overset{\cdot}{s} = s^{T} δ (t) - s^{T} K_{s} sat (\frac{s}{ϕ})

(35)

Utilizing the skew-symmetric property $s^{T} (\overset{\cdot}{M} - 2 C) s = 0$ and substituting the control law $τ = M {\overset{\cdot\cdot}{q}}_{eq} - K_{s} sat (s / ϕ)$ , the derivative simplifies to:

\overset{\cdot}{V} = s^{T} τ_{d} - s^{T} K_{s} sat (\frac{s}{ϕ})

(36)

We analyze the stability in two regions based on the boundary layer thickness $ϕ$ :

Case 1: Outside the boundary layer ( $| s | \geq ϕ$ ). In this region, the saturation function operates as a sign function, that is, $sat (s / ϕ) = sgn (s)$ . Thus:

\overset{\cdot}{V} = s^{T} δ (t) - s^{T} K_{s} sgn (s) \leq | s | | δ (t) | - λ_{\min} (K_{s}) | s | \leq | s | Δ_{\max} - λ_{\min} (K_{s}) | s |

(37)

By Assumption 2, since $λ_{\min} (K_{s}) \geq Δ_{\max} + η$ , it follows that:

\overset{\cdot}{V} \leq - η | s | < 0

(38)

This ensures that any system trajectory starting outside the boundary layer will reach the boundary layer region in finite time.

Case 2: Inside the boundary layer ( $| s | < ϕ$ ). In this region, the saturation function behaves linearly, that is, $sat (s / ϕ) = s / ϕ$ . The derivative becomes:

\overset{\cdot}{V} = s^{T} δ (t) - s^{T} K_{s} \frac{s}{ϕ} \leq | s | Δ_{\max} - \frac{λ_{\min} (K_{s})}{ϕ} | s |^{2}

(39)

The bound on $\overset{\cdot}{V}$ is negative whenever $| s | Δ_{\max} < \frac{λ_{\min} (K_{s})}{ϕ} | s |^{2}$ , that is, whenever:

| s | > \frac{Δ_{\max} ϕ}{λ_{\min} (K_{s})}

(40)

Consequently, the sliding variable $s$ does not converge to zero but to an ultimate bound set $Ω_{s}$ :

Ω_{s} = \{ s : | s | \leq \frac{Δ_{\max} ϕ}{λ_{\min} (K_{s})} \}

(41)

In the steady state, the dynamics of the sliding variable are dominated by $\overset{\cdot}{e} + ρ e \approx s$ . This can be viewed as a stable first-order linear filter with input $s$ and state matrix $- ρ$ (since $ρ$ is positive definite). The ultimate bound of the tracking error $| e |$ is derived as:

| e |_{bound} \leq \frac{| s |_{bound}}{λ_{\min} (ρ)} = \frac{Δ_{\max} ϕ}{λ_{\min} (ρ) λ_{\min} (K_{s})}

(42)

Equation (42) provides a theoretical upper bound on the steady-state tracking error. It shows that the error bound is proportional to the disturbance magnitude $Δ_{\max}$ and boundary layer thickness $ϕ$ , and inversely proportional to the control gains $K_{s}$ and $ρ$ . The proposed AdaDDQN method is theoretically grounded in this relationship: by dynamically increasing the gains $K_{s}$ and $ρ$ (within stability constraints) and optimizing $ϕ$ , the RL agent reduces the theoretical error bound $| e |_{bound}$ , thereby achieving high-precision tracking performance while mitigating chattering.
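The ultimate bound of equation (41) can be checked numerically in the scalar case. The sketch below integrates equation (21) with forward Euler under a constant worst-case disturbance; the gain, boundary-layer, and disturbance values are illustrative only and satisfy Assumption 2.

```python
def sat(x):
    """Saturation: identity on [-1, 1], clipped outside."""
    return max(-1.0, min(1.0, x))


def simulate_sliding_variable(s0, k_s, phi, delta, dt=1e-3, steps=3000):
    """Forward-Euler integration of equation (21) with constant disturbance."""
    s = s0
    for _ in range(steps):
        s += dt * (-k_s * sat(s / phi) + delta)
    return s


k_s, phi, delta_max = 5.0, 0.1, 1.0        # illustrative values only
s_final = simulate_sliding_variable(1.0, k_s, phi, delta_max)
bound = delta_max * phi / k_s              # radius of the set in equation (41)
print(abs(s_final) <= bound + 1e-9)
```

Starting outside the boundary layer, the trajectory first decreases at a constant rate (Case 1) and then settles at the edge of $Ω_{s}$ rather than at zero (Case 2), which is the behavior the analysis predicts for a persistent disturbance.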

Simulation and experiment results

To evaluate the performance of the proposed adaptive ISMC, which integrates AdaDDQN, a comprehensive validation framework is established. The experimental setup includes both numerical simulations and physical hardware testing, aimed at examining the system’s tracking accuracy, response speed, and robustness in the presence of nonlinear friction, external disturbances, and structural uncertainties.

A simulation model of the two-DOF gimbal platform is constructed based on its physical configuration. The system dynamics—including the inertia matrix $M (q)$ , Coriolis matrix $C (q, \overset{\cdot}{q})$ , and gravitational term $G (q)$ —are derived using real-world mechanical parameters. The key configuration values are summarized in Table 1.

Table 1.

Physical parameters of two-axis tracking and pointing platform.

Parameter	Physical meaning	Value
$m$	Load mass	35 kg
$L$	Distance to the rotation axis	0.8 m
$I_{θ}$	Azimuth axis moment of inertia	8 $kg \cdot m^{2}$
$I_{β}$	Pitch axis moment of inertia	12.5 $kg \cdot m^{2}$
$d_{θ}$	Azimuth axis damping	0.2 $N \cdot m \cdot s / rad$
$d_{β}$	Pitch axis damping	0.3 $N \cdot m \cdot s / rad$
$g$	Acceleration of gravity	9.8 $m / s^{2}$

To closely replicate real-world operating conditions, the simulation environment incorporates nonlinear friction and stochastic disturbances.

Nonlinear friction modeling: A Stribeck friction model is employed to capture the key characteristics of mechanical friction, including breakaway static friction, Coulomb friction, and viscous damping. The parameters are configured as follows: maximum breakaway torque $T_{break} = 3.25 N \cdot m$ , Coulomb torque $T_{c} = 2.5 N \cdot m$ , and viscous coefficient $f = 1.2 \times 10^{- 3}$ .

Random torque disturbances: To emulate torque ripple inherent in motor drive systems, a zero-mean random noise is applied with an amplitude of 4% of the instantaneous motor output torque ( $0.04 T_{m}$ ).
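The two disturbance sources can be sketched as follows. The friction law used here is a common exponential Stribeck form and is an assumption, as are the Stribeck velocity `OMEGA_S` and the uniform distribution of the ripple; the text specifies only the torque levels, the viscous coefficient, and the 4% amplitude.

```python
import math
import random

T_BREAK, T_COULOMB, VISCOUS = 3.25, 2.5, 1.2e-3   # values quoted above
OMEGA_S = 0.05   # Stribeck velocity (rad/s): placeholder, not given in the text


def stribeck_torque(omega):
    """Friction torque magnitude and sign for angular velocity omega (rad/s)."""
    if omega == 0.0:
        return 0.0   # stiction at rest is not modeled in this sketch
    decay = math.exp(-(omega / OMEGA_S) ** 2)
    magnitude = T_COULOMB + (T_BREAK - T_COULOMB) * decay
    return math.copysign(magnitude, omega) + VISCOUS * omega


def torque_ripple(motor_torque, rng=random):
    """Zero-mean ripple bounded by 4% of the instantaneous motor torque."""
    return rng.uniform(-0.04, 0.04) * motor_torque
```

Near zero speed the friction torque approaches the breakaway level, and at higher speeds it falls toward the Coulomb level plus the viscous term.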

The simulation is conducted over an extended duration to assess the long-term performance of the proposed controller. Initial conditions include an azimuth angle offset of 0.1 rad and an elevation angle offset of −0.05 rad, with both angular velocities initialized to zero. A composite reference trajectory is applied to evaluate the system’s tracking accuracy and dynamic response under varying conditions. Figure 4 illustrates the azimuth tracking performance, where the desired trajectory is shown as a red dashed line, and the actual response as a blue solid line. During the first 1–2 s, a noticeable discrepancy is observed due to the initial misalignment and the RL agent not yet having optimized the control parameters. As learning progresses, the agent continuously refines the control strategy, leading to a gradual reduction in tracking error. After a short transient period, the actual trajectory closely aligns with the reference, demonstrating high-precision tracking. A similar behavior is observed in Figure 5, which presents the elevation (pitch) tracking results. After an initial adjustment phase, the tracking error rapidly diminishes and remains negligible throughout the remainder of the simulation, confirming the method’s robust dynamic performance across both motion axes.

Figure 4. Azimuth tracking analysis results.

Figure 5. Pitch tracking analysis results.

Figure 6 presents the tracking error curves for both azimuth and elevation channels, shown in blue and red, respectively. Immediately following system activation, both error signals exhibit rapid decay. Once the initial transient subsides, the tracking errors remain consistently within the target threshold of 10 mrad. Minor oscillations are observed, primarily due to external disturbances and the system’s inherent nonlinearities. However, thanks to the synergistic effect of integral sliding-mode control and adaptive RL, these fluctuations are effectively suppressed. The RL agent’s continuous parameter tuning ensures ongoing error convergence and robust performance over time.

Figure 6. Tracking errors analysis results.

Figures 7–11 illustrate the internal dynamics and adaptive behavior of the control system. Figure 7 shows the output torque response, reflecting how the system modulates control effort under dynamic conditions. Figure 8 presents the evolution of the sliding variable, which rapidly converges toward zero, indicating the effectiveness of the sliding-mode strategy in stabilizing the trajectory. Figure 9 shows the evolution of the adaptive parameter $ρ$. Figure 10 displays the RL agent’s training performance, where the upward trend of the cumulative reward demonstrates progressive policy optimization and improved control quality. Figure 11 illustrates the changes in two adaptive gains, both of which influence the behavior of the saturation function embedded in the sliding-mode controller; these gains directly affect the system’s convergence speed and play a key role in suppressing chattering during the tracking process.

Figure 7. Control torque analysis results.

Figure 8. Sliding variable analysis results.

Figure 9. $ρ$ analysis results.

Figure 10. Reward analysis results.

Figure 11. Adaptive adjustment of $K_{s}$ analysis results.

Relative to conventional ISMC, the proposed approach demonstrates significantly enhanced performance across several key dimensions.

In Figure 12, the blue curve represents the proposed RL-enhanced control method, the yellow dashed line shows the desired trajectory, and the red curve corresponds to the conventional ISMC baseline. The proposed method achieves high tracking precision, with minimal deviation from the reference across the full trajectory. While the ISMC baseline performs adequately in azimuth tracking, it exhibits noticeable oscillations during the initial transient phase and slightly lower steady-state accuracy. Figure 13 reveals a more pronounced difference in the elevation channel: the ISMC baseline suffers from an observable response delay, whereas the RL-improved controller maintains close adherence to the target trajectory. This performance gap highlights the advantage of online adaptation; the RL component dynamically adjusts control parameters, enabling rapid convergence to near-optimal configurations without requiring manual tuning. In contrast, traditional ISMC relies on fixed or heuristically selected parameters, which may require iterative adjustment to suit different dynamics. The absence of delay in azimuth tracking for the ISMC baseline can be attributed to a favorable initial parameter setting, but this does not generalize across axes or tasks. Overall, the results demonstrate that the proposed RL-ISMC framework consistently delivers better tracking performance, improved stability, and faster convergence compared to the conventional ISMC approach.

Figure 12. Azimuth tracking analysis results.

Figure 13. Pitch tracking analysis results.

Another simulation is designed to rigorously evaluate the contribution of each component (RL adaptation and integral sliding mode) under challenging, realistic conditions. The study compares six control strategies to isolate the effects of the integral term and the RL-based adaptation: (i) AdaDDQN-ISMC, the complete framework combining Integral Sliding Mode Control with adaptive RL tuning of the gains ( $ρ, K, K_{s}$ ), featuring high base gains and strong integral action; (ii) DDQN-ISMC, ISMC with fixed, pre-tuned gains, which tests the necessity of online adaptation by removing the RL agent’s dynamic tuning; (iii) SMC, standard Sliding Mode Control without the integral term, which tests the necessity of integral action for eliminating steady-state errors; (iv) AdaDDQN-SMC, adaptive RL applied to standard SMC (no integral), which tests whether RL alone can compensate for the lack of an integral term; (v) ISMC-NoRL, a non-adaptive ISMC baseline with standard gains, serving as a traditional robust control benchmark; and (vi) PID, a classic PID controller for broad performance comparison.

A continuous torque bias of approximately 8 N·m was applied to simulate unmodeled gravitational moments and constant friction, and a sudden shock disturbance of 5 N·m was introduced from $t = 5$ s to $t = 7$ s to evaluate the system’s reaction to abrupt external forces. Six control strategies were compared as shown in Figure.
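The disturbance profile described above can be written as a single function of time. This is a sketch under stated assumptions: the shock is taken to be a rectangular pulse acting in the same direction as the bias, which the text does not specify, and the function name is illustrative.

```python
BIAS_NM = 8.0        # continuous torque bias, N·m (from the text)
SHOCK_NM = 5.0       # shock amplitude, N·m (from the text)
SHOCK_START_S = 5.0  # shock onset, s (from the text)
SHOCK_END_S = 7.0    # shock end, s (from the text)


def disturbance_torque(t: float) -> float:
    """Total injected disturbance torque (N·m) at simulation time t (s).

    ASSUMPTION: the shock is a rectangular pulse with the same sign as
    the bias; the text gives only its magnitude and time window.
    """
    shock = SHOCK_NM if SHOCK_START_S <= t <= SHOCK_END_S else 0.0
    return BIAS_NM + shock


if __name__ == "__main__":
    for t in (0.0, 4.9, 5.0, 6.0, 7.0, 7.1):
        print(f"t={t:4.1f} s -> disturbance={disturbance_torque(t):.1f} N·m")
```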

Figures 14 and 15 illustrate the tracking trajectories for the pitch and azimuth angles, respectively. A comparative analysis demonstrates that the proposed method exhibits the best performance in eliminating errors induced by the 8 N·m bias. Furthermore, it maintains superior tracking stability even when a shock disturbance is introduced at $t = 5$ s. In contrast, the fixed-gain DDQN-ISMC and ISMC-NoRL methods display significant deviation peaks due to insufficient gain stiffness, whereas the proposed AdaDDQN-ISMC effectively mitigates this by rapidly adjusting its switching gain, $K_{s}$ .

Figure 14. Pitch tracking analysis with ablation: (a) pitch position tracking, (b) error of pitch tracking, (c) torque for pitch tracking and (d) $K_{s}$ for pitch tracking.

Figure 15. Azimuth tracking analysis with ablation: (a) azimuth position tracking, (b) error of azimuth tracking, (c) torque for azimuth tracking and (d) $K_{s}$ for azimuth tracking.

Notably, the comparison between AdaDDQN-ISMC and AdaDDQN-SMC highlights the critical role of the integral term. Methods lacking an integral sliding variable (specifically SMC and AdaDDQN-SMC) failed to eliminate the steady-state error caused by the 8 N·m bias. Although the RL agent attempted to compensate by increasing the gain, the AdaDDQN-SMC method still exhibited a persistent non-zero mean error. Conversely, both the proposed AdaDDQN-ISMC and the fixed ISMC methods successfully converged to a near-zero steady-state error, confirming that integral action is indispensable for handling constant unmodeled loads.
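The role of the integral term can be reproduced on a toy problem. The sketch below is not the controller from this study: it assumes a single axis with unit inertia, a regulation task (zero reference), a boundary-layer saturation in place of the sign function, and hypothetical gains `lam`, `k_i`, `k_s`, and `phi`. It compares a plain sliding variable with an integral one under the same 8 N·m constant bias.

```python
def sat(x: float) -> float:
    """Saturation to [-1, 1], used as a boundary-layer switching function."""
    return max(-1.0, min(1.0, x))


def steady_state_error(use_integral: bool, bias: float = 8.0,
                       steps: int = 10_000, dt: float = 1e-3) -> float:
    """Simulate a unit-inertia axis under a constant torque bias.

    Returns the tracking error (rad) after steps*dt seconds.
    All gains are HYPOTHETICAL values chosen for illustration only.
    """
    lam = 5.0                           # slope of the sliding variable
    k_i = 6.0 if use_integral else 0.0  # integral weight (0 -> plain SMC)
    k_s, phi = 20.0, 0.5                # switching gain, boundary-layer width
    e, e_dot, z = 0.1, 0.0, 0.0         # error, error rate, integral of error
    for _ in range(steps):
        s = e_dot + lam * e + k_i * z   # (integral) sliding variable
        # Equivalent-control term plus saturated switching term.
        u = -(lam * e_dot + k_i * e) - k_s * sat(s / phi)
        accel = u + bias                # unit inertia: e_ddot = u + d
        z += e * dt
        e += e_dot * dt
        e_dot += accel * dt
    return e


if __name__ == "__main__":
    print(f"SMC  steady-state error: {steady_state_error(False):.4f} rad")
    print(f"ISMC steady-state error: {steady_state_error(True):.4f} rad")
```

Inside the boundary layer the switching term must hold a constant value to cancel the bias, so the sliding variable settles at `phi * bias / k_s`. Without the integral state this offset appears directly as a position error of `phi * bias / (k_s * lam)`; with it, the integral absorbs the offset and the error decays to zero.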

To evaluate the proposed control method under real-world conditions, a physical trajectory-tracking experiment was conducted using a two-axis gimbal platform (Figure 16). A predefined target trajectory was issued to the system, and the gimbal executed the motion commands while onboard sensors continuously recorded its response. The reference trajectory and real-time sensor data were processed by the RL-improved ISMC algorithm, which computed the appropriate control signals for each actuator. These control outputs were then applied to drive the gimbal in real time. Once the motion was completed, the recorded position feedback was compared with the original command trajectory to assess the tracking performance of the physical system.

Figure 16. Experimental setup for control validation: (a) control box interior, (b) gimbal joint, (c) control box and host computer and (d) gimbal assembly.

The gimbal control system is composed of three primary components: the mechanical gimbal structure, the gimbal control box, and a host computer functioning as the upper-level controller. Within the control box, several integrated modules are responsible for core functions, including an embedded processor, a pulse signal generator, a data acquisition card, and a communication interface. The pulse generator translates control instructions into electrical pulse signals that directly drive the gimbal motors. The acquisition card continuously samples the state signals from onboard sensors and structures the data for processing. Acting as the central processing unit, the embedded motherboard receives both sensor data and control commands via the communication module. It processes incoming signals from the host, packages sensor data collected by the acquisition card, and transmits it back to the host. Simultaneously, it converts the host-generated control commands into motor control signals via the pulse generator. At the upper level, the host computer synthesizes real-time sensor feedback, the user-defined target trajectory, and predefined environmental disturbances, applying the proposed RL-based control algorithm to compute control outputs, which are then dispatched to the control box. The AdaDDQN agent is first trained in a high-fidelity simulation environment to learn the optimal policy for adjusting control parameters. The trained policy is then deployed to the physical gimbal system. During this phase, the agent operates in real-time inference mode.
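Real-time inference with a trained value network reduces to picking the highest-valued action and applying it to the controller gains. The sketch below illustrates that step only. The action table, the increment sizes, and the gain limits are hypothetical, since the text states that the agent adjusts $ρ$, $K$, and $K_{s}$ but does not list the action set here; the Q-values are passed in as a plain list standing in for the network output.

```python
from typing import Dict, Sequence, Tuple

GAIN_NAMES = ("rho", "K", "K_s")

# HYPOTHETICAL action table: each discrete action nudges one gain up or down.
ACTIONS: Tuple[Tuple[float, float, float], ...] = (
    (0.0, 0.0, 0.0),    # keep gains unchanged
    (+0.1, 0.0, 0.0), (-0.1, 0.0, 0.0),
    (0.0, +0.5, 0.0), (0.0, -0.5, 0.0),
    (0.0, 0.0, +1.0), (0.0, 0.0, -1.0),
)

# ASSUMED safety limits; clipping keeps any action from leaving them.
BOUNDS: Dict[str, Tuple[float, float]] = {
    "rho": (0.1, 5.0), "K": (1.0, 50.0), "K_s": (1.0, 40.0),
}


def greedy_gain_update(q_values: Sequence[float],
                       gains: Dict[str, float]) -> Dict[str, float]:
    """Apply the greedy (argmax-Q) action to the gains, clipped to BOUNDS."""
    action = max(range(len(ACTIONS)), key=lambda a: q_values[a])
    updated = {}
    for name, delta in zip(GAIN_NAMES, ACTIONS[action]):
        low, high = BOUNDS[name]
        updated[name] = min(high, max(low, gains[name] + delta))
    return updated


if __name__ == "__main__":
    gains = {"rho": 1.0, "K": 10.0, "K_s": 39.5}
    q = [0.1, 0.0, 0.0, 0.2, 0.0, 0.9, 0.0]  # stand-in for network output
    print(greedy_gain_update(q, gains))
```

Clipping the gains to fixed limits is one simple way to keep a learned policy inside a range for which the sliding-mode stability conditions have been checked.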

During runtime, the host continuously uses the current system state and reference trajectory as inputs to generate the next-step control command. These commands are transmitted to the control box, which actuates the motors accordingly. Sensor modules then acquire the resulting gimbal motion in real time and return the measurements to the host. By comparing the feedback data with the reference trajectory, the tracking error is calculated to evaluate performance. Unlike simulation conditions, the physical setup introduces additional challenges such as stronger mechanical vibrations and unmodeled uncertainties, which can influence system behavior. The following section presents the experimental tracking results obtained under these real-world operating conditions.

In the physical testing environment, the gimbal was subjected to additional unmodeled factors including stronger mechanical vibrations and environmental disturbances not present in the simulation setup.

The results indicate that the maximum tracking error in elevation remained below 0.01°, while the maximum error in azimuth was under 0.03°, as can be seen from Figures 17 and 18. Despite the increased complexity and unpredictability of the real-world conditions, the system maintained stable operation without any signs of divergence or significant oscillation. These findings confirm the proposed algorithm’s strong robustness and its effective adaptability to hardware platforms.
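Because the simulation results are quoted in milliradians and the hardware results in degrees, a short conversion makes the two directly comparable. The helper name below is illustrative; the 10 mrad threshold is the accuracy target quoted earlier in this section.

```python
import math

THRESHOLD_MRAD = 10.0  # accuracy threshold quoted in the text


def deg_to_mrad(angle_deg: float) -> float:
    """Convert an angle from degrees to milliradians."""
    return math.radians(angle_deg) * 1e3


if __name__ == "__main__":
    for err_deg in (0.01, 0.03):
        err_mrad = deg_to_mrad(err_deg)
        margin = THRESHOLD_MRAD / err_mrad
        print(f"{err_deg}° = {err_mrad:.2f} mrad "
              f"(about {margin:.0f}x below the {THRESHOLD_MRAD:.0f} mrad threshold)")
```

The measured errors of 0.01° and 0.03° correspond to roughly 0.17 mrad and 0.52 mrad.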

Figure 17. Pitch tracking analysis results.

Figure 18. Azimuth tracking analysis results.

Discussions

The core innovation of this study lies in integrating the adaptability of RL with the robustness guarantees of SMC. Traditional SMC approaches often rely on fixed switching gains sized for worst-case disturbances, which can lead to excessive chattering. In contrast, the proposed method leverages an AdaDDQN agent to dynamically adjust the switching gain $K_{s}$ and the sliding variable parameters $ρ$ and $K$ based on real-time system observations such as error integrals and torque rate changes. Experimental results show that $K_{s}$ adapts proportionally to disturbance intensity, enabling the system to maintain a flexible trade-off between rapid convergence and smooth control. To mitigate phase lag, which is common in dynamic target tracking, a multi-dimensional state space is introduced, incorporating diversity metrics and future-information (FI) parameters, which improve the controller’s perception of trajectory complexity and system smoothness. The RL agent then predicts angular-velocity increments to preemptively counteract delays. Notably, an azimuth-axis tracking error below 0.52 mrad (0.03°) validates the method’s effectiveness in addressing mechanical latency. The algorithm further benefits from a Dueling DDQN architecture, whose double-Q update decouples action selection from value estimation, thereby enhancing stability and reducing the value overestimation that is common in high-dimensional continuous state spaces.
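The decoupling of action selection from value estimation can be illustrated with the standard Double DQN bootstrap target. This is the generic textbook update, not the AdaDDQN variant itself, whose modifications are not reproduced here; the Q-value lists stand in for the outputs of an online and a target network.

```python
from typing import Sequence


def dqn_target(reward: float, done: bool,
               q_target_next: Sequence[float], gamma: float = 0.99) -> float:
    """Vanilla DQN target: the target network both selects and evaluates."""
    bootstrap = 0.0 if done else gamma * max(q_target_next)
    return reward + bootstrap


def double_dqn_target(reward: float, done: bool,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float],
                      gamma: float = 0.99) -> float:
    """Double DQN target: online net selects the action, target net scores it."""
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    bootstrap = 0.0 if done else gamma * q_target_next[best_action]
    return reward + bootstrap


if __name__ == "__main__":
    q_online = [1.0, 3.0, 2.0]   # online network prefers action 1
    q_target = [0.5, 1.5, 4.0]   # target network overrates action 2
    print("DQN target:       ", dqn_target(1.0, False, q_target, gamma=0.9))
    print("Double DQN target:", double_dqn_target(1.0, False, q_online, q_target, gamma=0.9))
```

In this example the vanilla target bootstraps from the largest target-network value, whereas the Double DQN target uses the value of the action chosen by the online network, which gives a smaller and less optimistic estimate.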

Compared with conventional ISMC and the foundational design requirements, the proposed AdaDDQN-ISMC delivers measurable performance gains, validated through ablation analysis. Firstly, regarding steady-state robustness and precision, the ablation study confirms the indispensability of the integral term. Under a continuous torque bias (simulating unmodeled gravity and friction), the conventional SMC and the ablated AdaDDQN-SMC variants exhibited persistent steady-state errors, as they lacked the error-accumulation mechanism to counteract the constant load. In contrast, the proposed method leverages the integral sliding surface to drive the bias-induced steady-state error to near zero. Consequently, the system consistently maintains tracking errors below 8 mrad, comfortably meeting the 10 mrad accuracy threshold required for effective beam coverage. Measured results further highlight this precision, showing an elevation error of only 0.01° ( $\approx 0.17$ mrad), which is far below the theoretical design limit. Secondly, in terms of dynamic response and shock rejection, the RL-based adaptive mechanism proves superior to fixed-gain strategies. During the sinusoidal tracking tests, traditional ISMC suffered from noticeable phase lag and insufficient stiffness against the sudden shock disturbance. The fixed-gain ablation baseline (DDQN-ISMC) displayed significant deviation peaks during these transient events. Conversely, the proposed algorithm used the AdaDDQN agent to raise the switching gain $K_{s}$ as soon as the shock was detected, keeping the output closely aligned with the reference signal throughout the trajectory. This “on-demand” gain adjustment resulted in an improvement of more than 80% in tracking precision during discontinuous trajectory scenarios. Thirdly, concerning control smoothness and chattering suppression, the proposed method resolves the trade-off inherent in conventional sliding mode control. While traditional SMC requires high static gains to suppress nonlinear frictional disturbances (e.g. the Stribeck effect), leading to steady-state oscillations, the proposed architecture uses the integral term to handle static loads, allowing the switching gain to remain low during steady states. By further incorporating a torque rate penalty in the reward function, the algorithm produces smoother torque profiles and effectively mitigates chattering, demonstrating superior disturbance rejection without compromising mechanical integrity.
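The effect of a torque rate penalty can be seen in a minimal reward sketch. The function below is purely illustrative and is not the reward used in this study: the two-term structure, the weights `w_err` and `w_rate`, and the use of absolute values are all assumptions made to show how abrupt torque changes lower the reward at equal tracking error.

```python
def tracking_reward(error_rad: float, torque_now: float, torque_prev: float,
                    dt: float, w_err: float = 1.0, w_rate: float = 1e-3) -> float:
    """Negative cost combining tracking error and torque rate of change.

    HYPOTHETICAL form and weights, for illustration only.
    """
    torque_rate = (torque_now - torque_prev) / dt  # N·m/s
    return -(w_err * abs(error_rad) + w_rate * abs(torque_rate))


if __name__ == "__main__":
    # Same 1 mrad error, smooth torque versus a 1 N·m jump within 10 ms.
    smooth = tracking_reward(0.001, 1.0, 1.0, dt=0.01)
    abrupt = tracking_reward(0.001, 2.0, 1.0, dt=0.01)
    print(f"smooth torque reward: {smooth:.4f}")
    print(f"abrupt torque reward: {abrupt:.4f}")
```

Under such a reward, a policy that reaches the same error with smaller torque steps scores higher, which biases learning toward lower switching gains whenever the disturbance level permits.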

Despite promising results in both simulation and hardware testing, several limitations remain. Firstly, the current evaluation is limited to the gimbal’s own nonlinear dynamics; in real-world deployment, platform-induced disturbances such as translational motion and carrier rocking (e.g. on vehicles or ships) introduce complex coupling effects. Future work will extend testing to high-DOF motion platforms to validate disturbance suppression under carrier-coupled dynamics. Secondly, while the simulation environment incorporates measured mechanical parameters, real-world factors such as aerodynamic drag and temperature-sensitive friction remain difficult to model accurately. Enhancing sim-to-real transfer through meta-RL or domain randomization is an important direction to improve generalization in unstructured environments. Lastly, the DRL model incurs non-negligible computational costs, which may challenge real-time deployment in resource-constrained embedded systems. Future research will explore network pruning, model compression, and hardware-aware policy acceleration to enable lightweight deployment in compact Active Denial devices.

Conclusions

The proposed method improves the tracking and pointing performance of microwave ADS. Experimental results indicate that the tracking accuracy increases by more than 80% compared with conventional SMC, with dynamic errors reliably constrained within 8 mrad. Hardware tests on a two-DOF gimbal platform show that the elevation and azimuth tracking errors remain below 0.01° and 0.03°, respectively, well within the threshold required to maintain beam jitter within half of the spot diameter. As validated by the ablation studies, the integral sliding term effectively eliminates steady-state bias, while the RL agent, using a torque-rate penalized reward mechanism, dynamically adjusts gains to mitigate phase lag and suppress chattering. According to the accuracy model, this level of precision corresponds to a target-lock probability of approximately 99.87% at an operational range of 250 m. By enabling online decision-making through RL, the approach effectively mitigates issues such as phase lag and torque chattering that commonly affect traditional controllers. This fusion of adaptive learning and robust control delivers reliable performance in both simulated and physical environments, offering a technically viable path toward enhancing the effectiveness of non-lethal directed energy systems.

Future work will focus on extending the validation of the proposed approach under more complex operational conditions. In particular, when mounted on mobile carriers such as vehicles or ships, the gimbal system experiences translational motion and rotational disturbances that introduce nonlinear coupling through kinematic interactions. To address this, further experiments will be conducted on dynamic platforms to assess the algorithm’s ability to reject carrier-induced disturbances. In addition, considering the computational load of DRL in embedded systems, we plan to investigate model compression techniques and edge computing frameworks for real-time deployment. Additional efforts will be made to examine the influence of unstructured environmental factors—including extreme temperatures, wind loading, and mechanical vibration—on control performance, with the goal of improving the system’s generalization ability and responsiveness in field applications.

Footnotes

ORCID iD

Zongzheng Sun

Ethical considerations

This study is based on theoretical analysis and/or published literature, and no ethical approval was required.

Consent to participate

This study did not involve human participants, and thus no informed consent to participate was required.

Consent for publication

This study does not contain any individual person’s data in any form, and therefore consent for publication is not applicable.

Author contributions

Zongzheng Sun: conceptualization, data curation, formal analysis, investigation, methodology, software, writing–original draft, writing–review and editing.

Xinjian Niu, Jianwei Liu, Yinghui Liu: supervision, validation, project administration, funding acquisition, review and editing. Jin Han: validation, formal analysis, review and editing.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Key Laboratory of Science and Technology on Vacuum Electronics.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

The datasets used and analysed during the current study are available from the corresponding author on reasonable request, and all key data are already presented in the main text of the article.

References

1. Sun, Niu, Jia, et al. An improved cubature Kalman filter state prediction method based on the design of Active Denial Tracking System. Rev Sci Instrum 2024; 95(8): 083703.

2. Niu, Lei, Liu, et al. A study on 94 GHz low-voltage, low-current Gyrotron. IEEE Trans Electron Devices 2013; 60(11): 3907–3912.

3. Niu, Wang, et al. Numerical simulation of 94 GHz complex cavity gyrotron. High Power Laser Part Beams 2011; 23(11): 3033–3035.

4. LeVine. The Active Denial System: a revolutionary, non-lethal weapon for today’s battlefield. Report no. ADA501865, 2009. Fort Belvoir, VA: Defense Technical Information Center. DOI: 10.21236/ada501865.

5. Hao, Yuan, et al. Observer-based distributed convex optimization of bipartite containment control for higher order nonlinear uncertain multi-agent systems. Meas Control 2024; 57(5): 566–579.

6. Shi, Ijaz, et al. Adaptive backstepping integral sliding mode control of multirotor UAV system used for smart agriculture. In: Proceedings of the 2024 IEEE international conference on systems, man, and cybernetics (SMC), 2024, pp.303–308. DOI: 10.1109/smc54092.2024.10831925.

7. Xiao, Zhang, Zhao, et al. Output feedback integral sliding mode predictive control for linear discrete time systems. Eur J Control 2024; 80(b): 101131.

8. Feng. Deep reinforcement learning based optimal defense for cyber-physical system in presence of unknown cyber-attack. In: 2017 IEEE symposium series on computational intelligence (SSCI), 2018, pp.1–8. DOI: 10.1109/SSCI.2017.8285298.

9. Liu, Quan, Gao, et al. Distributed asynchronous learning for multipath data transmission based on P-DDQN. China Commun 2021; 18(8): 62–74.

10. Zhang, Zhou, Shi, et al. DDQNC-P: a framework for civil aircraft tactical synergetic trajectory planning under adverse weather conditions. Chin J Aeronaut 2024; 37(12): 434–457.

11. Zhai, Zhou, Zhang, et al. Delay minimization in hybrid edge computing networks: a DDQN-based task offloading approach. IEEE Trans Vehicular Technol 2024; 73(10): 15098–15108.

12. Lan, Gang, Zhang, et al. A multi-stage collision avoidance model for autonomous ship based on fuzzy set theory with TL-DDQN algorithm. Ocean Eng 2024; 311(1): 118912.

13. Qiu, et al. DT-DDQNR: a digital twin assisted direct-to-cell satellite network intelligent routing algorithm. Adv Space Res 2025; 75(1): 635–647.

14. Wang, Tang. Comprehensive evaluation of power flow and adjustment method to restore solvability based on GCRNN and DDQN. Int J Electr Power Energy Syst 2021; 133: 107160.

15. Jing, Chuhang. Intelligent cluster-based routing protocol for wireless sensor networks using fitness-guided FGO and DDQN with intrinsic curiosity. IEEE Sens J 2026; 26: 1295–1310.

16. Cao, Kou, et al. Autonomous maneuver decision of UCAV air combat based on double deep Q network algorithm and stochastic game theory. Int J Aerosp Eng 2023; 2023: 1–20.

17. Chen, Wang, Deng, et al. Improved double deep Q-network algorithm applied to multi-dimensional environment path planning of hexapod robots. Sensors 2024; 24(7): 2061.

18. Wang, Tang, Huang, et al. Power system emergency control strategy based on severely disturbed units identification and STGCN-DDQN. Elect Power Syst Res 2024; 226: 109903.

19. Cho, Park. Joint multi-agent reinforcement learning and message-passing for resilient multi-UAV networks. IEEE Trans Netw Serv Manag 2026; 23: 2051–2063.

20. Yuan, Sun. Time-scale decoupled adaptive prescribed-time control for flexible-joint manipulator: fuzzy dead-zone compensation without prior knowledge. IEEE Trans Ind Electron 2025: 1–12. DOI: 10.1109/TIE.2025.3629408.

21. Zhang, Zhao, Lim, et al. Periodic event-triggering adaptive control for networked uncertain nonlinear systems against actuator attacks and its applications. IEEE Trans Syst Man Cybern Syst 2026; 56(1): 579–592.

Adaptive Integral Sliding Mode control based on reinforcement learning for high-precision tracking in active denial systems
