Abstract
Millimeter-wave (mmWave) MIMO systems are pivotal for enabling high-capacity wireless communications in next-generation networks. However, effective beamforming remains a key challenge due to high propagation loss and the dynamic nature of mmWave channels. Traditional approaches that rely on exhaustive search or static codebooks are inefficient in mobile and blockage-prone environments. In this work, we propose a deep reinforcement learning-based adaptive beamforming algorithm that formulates beam selection as a Markov Decision Process and applies the Deep Deterministic Policy Gradient (DDPG) method to learn continuous-valued beamforming strategies. To enhance reliability, the framework integrates a hybrid policy mechanism that fuses learned actions with domain-aware heuristics for robustness under uncertainty. Simulation results show that our method achieves up to 35% higher spectral efficiency and 1.5 dB lower beam misalignment loss than DQN and heuristic baselines, while converging 40% faster during training. These results highlight the promise of actor-critic reinforcement learning in realizing intelligent, low-overhead beam control for dynamic mmWave MIMO environments.
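To make the core idea concrete, the following is a minimal, self-contained sketch of deterministic-policy-gradient beam steering, not the paper's actual DDPG implementation. It assumes a hypothetical half-wavelength uniform linear array of N = 4 antennas, a linear actor (a stand-in for the DDPG actor network), and direct queries of the true beamforming gain in place of a learned critic; all numeric choices (angle range, noise level, learning rate) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4  # assumed ULA antenna count (illustrative, not from the paper)

def steering(theta):
    # Half-wavelength ULA steering vector for angle theta (radians).
    n = np.arange(N)
    return np.exp(1j * np.pi * n * np.sin(theta)) / np.sqrt(N)

def gain(theta_true, theta_beam):
    # Normalized beamforming gain in [0, 1]; maximized when the beam
    # points exactly at the true channel direction.
    return np.abs(np.vdot(steering(theta_true), steering(theta_beam))) ** 2

# Linear actor: beam angle = w * observed_state + b.
w, b = 0.5, 0.0
lr = 0.05

for step in range(2000):
    theta_true = rng.uniform(-0.4, 0.4)        # true angle of departure
    s = theta_true + rng.normal(0.0, 0.05)     # noisy channel-state observation
    a = w * s + b                              # deterministic continuous action
    # Deterministic policy gradient: ascend dQ/da, estimated here by a
    # finite difference on the true gain (a learned critic would supply this).
    eps = 1e-4
    dq_da = (gain(theta_true, a + eps) - gain(theta_true, a - eps)) / (2 * eps)
    w += lr * dq_da * s                        # chain rule: da/dw = s
    b += lr * dq_da                            # chain rule: da/db = 1

# After training, the actor should roughly track the true angle (w near 1).
test_angles = np.linspace(-0.35, 0.35, 50)
avg_gain = float(np.mean([gain(t, w * t + b) for t in test_angles]))
```

The sketch keeps the action space continuous, which is what distinguishes DDPG-style actor-critic learning from the discrete beam indices a DQN baseline would select over.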
