Abstract
Massive multiple-input multiple-output (MIMO) systems are central to next-generation wireless communications because they promise high spectral efficiency, improved signal quality, and power savings. However, optimal beamforming design for such systems is extremely compute-intensive, particularly when channel conditions vary and the channel state information (CSI) is imperfect. This paper presents a reinforcement learning (RL) approach to beamforming that sidesteps these limitations through an adaptive, data-driven policy-learning paradigm for efficient transmission. The Q-learning-based RL algorithm was trained for 500 episodes and converged steadily, with the training loss decreasing from 0.87 to 0.04 and the validation loss from 0.90 to 0.05. Key performance figures were a mean signal-to-interference-plus-noise ratio (SINR) of 22.9 dB, a spectral efficiency of 14.8 bits/s/Hz, and a bit error rate (BER) of 3.1 × 10⁻⁴ at 15 dB signal-to-noise ratio (SNR). The achieved SINR improved on maximum ratio transmission (MRT) and zero-forcing (ZF) by 50% and 22.5%, respectively, while consuming 9.6 W. Robustness tests with 5% CSI error and 100 Hz Doppler fading showed only minimal performance degradation. These results indicate that RL-based beamforming is a promising approach for real-time systems and for future adaptive, scalable MIMO deployments. This work can be extended to multi-user and multi-cell settings for broader applicability.
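The abstract does not detail the Q-learning formulation, so the following is only a minimal illustrative sketch of tabular Q-learning applied to beam selection. The codebook size, number of quantized channel states, step count per episode, and the SINR-as-reward model are all assumptions introduced here for illustration; only the 500-episode budget comes from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

N_BEAMS = 8          # hypothetical beamforming codebook size (assumption)
N_STATES = 4         # hypothetical quantized channel states (assumption)
EPISODES = 500       # episode budget reported in the abstract
STEPS = 50           # steps per episode (assumption)
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1  # learning rate, discount, exploration

# Hypothetical mean SINR (dB) per (state, beam) pair; hidden from the agent.
true_sinr = rng.uniform(5.0, 25.0, size=(N_STATES, N_BEAMS))

Q = np.zeros((N_STATES, N_BEAMS))

def step(state, beam):
    """Environment stub: noisy SINR observation as reward, random next state."""
    reward = true_sinr[state, beam] + rng.normal(0.0, 1.0)
    return reward, int(rng.integers(N_STATES))

state = int(rng.integers(N_STATES))
for _ in range(EPISODES):
    for _ in range(STEPS):
        if rng.random() < EPS:
            beam = int(rng.integers(N_BEAMS))   # explore
        else:
            beam = int(np.argmax(Q[state]))     # exploit
        reward, nxt = step(state, beam)
        # Standard Q-learning temporal-difference update
        Q[state, beam] += ALPHA * (reward + GAMMA * Q[nxt].max() - Q[state, beam])
        state = nxt

# Greedy beam choice per channel state after training
learned = Q.argmax(axis=1)
```

Under these toy assumptions, the greedy policy recovered from the Q-table selects beams whose true SINR is close to the per-state optimum; a real system would replace the environment stub with measured SINR feedback.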
