Novel Convolutional Restricted Boltzmann Machine manifold learning inspired dynamic user clustering hybrid precoding for millimeter-wave massive multiple-input multiple-output systems

Abstract

Millimeter-wave massive multiple-input multiple-output is a key technology in 5G communication systems. In particular, hybrid precoding is more power efficient and less expensive than full-digital precoding, so it has attracted increasing attention. Its effectiveness has been well verified in simple systems, but its performance in real deployments remains unclear because of practical problems such as interference from other users and base stations and the constant movement of users. In this article, we propose a dynamic user clustering hybrid precoding method for high-dimensional millimeter-wave multiple-input multiple-output systems, which uses low-dimensional manifolds to avoid complicated calculations when the number of antennas is large. We model each user set as a novel Convolutional Restricted Boltzmann Machine manifold, which transforms the problem into cluster-oriented multi-manifold learning. Novel Convolutional Restricted Boltzmann Machine manifold learning seeks to learn embedded low-dimensional manifolds that cope with user mobility within clusters. After proper user clustering, the hybrid precoder is designed for sum-rate maximization by manifold quasi-conjugate gradient methods. The algorithm avoids the direct processing of high-dimensional channel parameters required by traditional methods, achieves a high signal-to-noise ratio, and reduces computational complexity. Simulation results show that, compared with traditional methods, the proposed method achieves a near-optimal sum-rate and higher spectral efficiency.

Keywords

mmWave massive multiple-input multiple-output; manifold analysis; hybrid precoding; user clustering

Introduction

Millimeter-wave (mmWave) massive MIMO (multiple-input multiple-output) technology has become a key technology in 5G communication because of its rich spectrum resources.^1–3 Because of the high carrier frequency, mmWave signals suffer from high propagation loss, so large-scale antenna arrays are used to compensate for the path loss.⁴ However, the number of antennas at the transmitter and receiver of a massive MIMO system is very large,⁵ and configuring a radio frequency (RF) chain for each antenna, as in the traditional all-digital solution, incurs high hardware cost and power consumption. In response to this problem, hybrid schemes have emerged that reduce hardware requirements while taking spectral efficiency (SE) and energy efficiency (EE) into account.^6–8 However, hybrid precoding in wideband channels remains a difficult problem.

How to obtain the optimal precoding matrix is the key issue in hybrid precoding. With large-scale antenna arrays in mmWave communication, large-scale matrix calculations are usually required,⁸ and the difficulty of hybrid precoding lies in reducing this complexity.⁹ Several advanced beam-space-based hybrid precoding algorithms have been studied.^10,11 Previous investigations^12–16 exploit the sparsity of the beam-space channel through sparse signal processing. In the literature,^12,13 the problem is transformed into finding the optimal precoder with a hybrid structure, and an algorithm based on basis pursuit is proposed. A hybrid precoding scheme based on the Orthogonal Matching Pursuit (OMP) algorithm, which makes full use of channel sparsity, is proposed in the literature.^14,15 For the multi-user scenario, low-complexity multi-user hybrid precoding for mmWave systems is studied.¹⁶ A Kronecker decomposition hybrid beamforming (KDHB) method based on sparse propagation paths is proposed for multi-cell multi-user massive MIMO systems.¹⁷

However, the resolution of the beam space is finite. Because of power leakage, the sparse channel is non-ideal and may contain many non-zero terms. Some papers consider hybrid precoding over interfering mmWave channels.^18,19 Dealing with interference is very challenging, because the number of antennas is large and high-complexity precoding matrices are difficult to implement.²⁰ To address severe interference, closed-form broadband hybrid precoding schemes were proposed in the literature.^21–24 An analytical framework of hybrid beamforming (AFHB) in multi-cell mmWave systems was proposed,²⁵ which combines analog and digital beamforming: the former is based on phase shifters, and the latter on regularized zero-forcing.

Recently, manifold methods have been applied to mmWave massive MIMO systems. Yu et al.²⁶ proposed a manifold optimization (MO)-based hybrid precoding algorithm with lower complexity. Chen²⁷ proposed a Riemannian conjugate gradient algorithm that replaces the constant-envelope constraint with a circular manifold. In Mai et al.,²⁸ a Riemannian vector perturbation manifold method was studied for multi-user massive MIMO systems, in which the RF and baseband precoders are designed jointly. A Riemannian trust-region Newton manifold (RTRNM) method improved beamforming in multi-cluster scenarios;²⁹ there, beamforming is optimized to mitigate inter-cell interference by dividing users into clusters with spatial correlation. However, the multi-user high-dimensional channels are not mapped into low-dimensional subspaces to reduce their dimensionality. A learning-based double digital beamforming scheme optimizes network resource allocation in massive MIMO networks.³⁰ Moe Thet et al.³¹ use a greedy algorithm to analyze how static and time-varying user clustering perform, in terms of system sum-rate, when users move fast. Manifold learning has also been used to reduce the dimension of multi-user high-dimensional channels, lowering the computational complexity of fully digital beamforming while mitigating inter-cell interference. However, this approach focuses on the local linear spatial structure between user channels and ignores their global spatial characteristics, and it cannot quickly analyze the global and local correlations between user channels when users move.

Traditional precoding methods exploit channel sparsity, but they do not extend well to multiple users. In the multi-user scenario, traditional methods suffer from high precoding complexity and high channel dimensionality, and they do not consider user mobility. It is therefore necessary to develop new algorithms that take these practical problems into account.

In this article, we propose a low-complexity hybrid precoding algorithm with dynamic user clustering for mmWave massive MIMO systems. Specifically, the channels of the large-scale antenna array are embedded in a low-dimensional subspace. mmWave channel measurements show that mmWave signals undergo diffuse scattering on the surfaces of rough scatterers, and the scattering range increases as the wavelength decreases.³² When users are dense and the spacing between them is insufficient, diffuse scattering may cause adjacent users to receive signals over the same path, which causes serious inter-user interference. Our goal is to design a hybrid precoding matrix that manages intra-cell and inter-cell interference with limited channel knowledge and that can be realized by a low-complexity hybrid analog/digital architecture, that is, one with far fewer RF chains than antennas. To solve the set classification problem, we adopt manifold discriminant analysis (MDA).³³ By modeling each user set as a manifold, we formulate the problem as clustering-oriented multi-manifold learning. Manifold discriminative learning seeks embedded low-dimensional manifolds in which manifolds with different user-cluster labels are better separated and the local spatial correlation of the high-dimensional channels within each manifold is enhanced. By mapping the high-dimensional channels to low-dimensional manifolds through discriminative manifold learning, the latent spatial correlation of the high-dimensional channels can be fully exploited. Dimensionality reduction is achieved by decomposing the non-linear structure of the high-dimensional channels into global and local non-linearities. In the low-dimensional manifolds, intra-cluster channels become more tightly clustered and the separability of the embedded features is enhanced.

Because the users in each cluster keep moving, we introduce the novel Convolutional Restricted Boltzmann Machine (NCRBM) framework into manifold learning. By continuously updating the manifold discrimination within clusters, we obtain the best manifold learning state for the users in each cluster. After proper user clustering, the hybrid precoder is designed for sum-rate maximization by manifold quasi-conjugate gradient methods.³⁴ To enhance the spectral efficiency of the system, the analog RF precoder of each cluster should balance maximizing the cluster's own transmission against the interference it causes. The digital precoding matrix is obtained from the Karush-Kuhn-Tucker (KKT) conditions.^35–37 Compared with traditional methods, the proposed method does not require solving for large-scale channel parameters and achieves a high signal-to-noise ratio (SNR) while reducing computational complexity. Simulation results show that the algorithm achieves a near-optimal sum-rate and high spectral efficiency.

The rest of this article is organized as follows. Section “System model and channel model” introduces the system model and channel model. Section “User clustering hybrid precoding scheme” presents the dimensionality reduction algorithm and the hybrid precoding algorithm for multi-user high-dimensional channels. Section “Simulation results” presents the simulation results, and section “Conclusion” concludes the article.

Notations

Upper- and lower-case boldface letters denote matrices and vectors, respectively. $(\cdot)^{H}$, $(\cdot)^{-1}$, $(\cdot)^{T}$, $(\cdot)^{*}$, $tr(\cdot)$, and $\|\cdot\|_{F}$ denote the Hermitian transpose, inverse, transpose, complex conjugate, trace, and Frobenius norm of a matrix, respectively. $E(\cdot)$ denotes expectation. $diag(\cdot)$ denotes a diagonal matrix. $|G|$ is the cardinality of the set $G$. $\otimes$ denotes the Kronecker product. $CN(0, σ^{2})$ denotes the complex Gaussian distribution with zero mean and variance $σ^{2}$. $span(Y)$ denotes the subspace spanned by the columns of $Y$. $\nabla(\cdot)$ denotes the gradient. Finally, $I_{N}$ denotes the $N \times N$ identity matrix.

System model and channel model

System model

We consider a hybrid mmWave massive MIMO system consisting of $B$ cells. In each cell, a base station (BS) equipped with $N_{t}$ antennas and $N_{RF}$ RF chains ($N_{t} \geq N_{RF} \geq K$) serves $K$ single-antenna users, as shown in Figure 1. To manage interference and improve user data rates, the users are partitioned into $L$ clusters $G_{1}, \dots, G_{L}$ with $g_{i} = |G_{i}|$, $\sum_{i = 1}^{L} g_{i} = K$, and $G_{i} \cap G_{i'} = \emptyset$, $\forall i \neq i'$, where $G_{i}$ is the $i$th cluster, $i = 1, \dots, L$. The set $\{G_{1}, \dots, G_{L}\}$ contains all user clusters.

Figure 1.

Hybrid mmWave massive MIMO system model.

Let $u_{b, i, k}$, $k = 1, \dots, g_{i}$, denote the $k$th user of $G_{i}$ in the $b$th cell ($b = 1, 2, \dots, B$). Hybrid precoding consists of two parts: digital precoding in the baseband domain and analog precoding in the RF domain. In the downlink, the transmitted symbol first passes through the digital precoder, and the resulting signal is fed to the RF chains. The output of the RF chains is analog precoded and then sent to the antenna elements. That is, the symbol $x_{b, i, k}$ is first precoded at the BS by the digital precoder $W_{b, i, k}$, and the result is fed to the analog precoder $F_{b, i, k}$. The received signal $y_{b, i, k}$ of user $u_{b, i, k}$ is given by

\begin{aligned} y_{b, i, k} = {} & h_{b, i, k}^{H} F_{b, i, k} W_{b, i, k} x_{b, i, k} \\ & + \sum_{k' = 1, k' \neq k}^{| G_{i} |} h_{b, i, k}^{H} F_{b, i, k'} W_{b, i, k'} x_{b, i, k'} \\ & + \sum_{i' = 1, i' \neq i}^{L} h_{b, i'}^{H} F_{b, i'} W_{b, i'} x_{b, i'} \\ & + \sum_{b' = 1, b' \neq b}^{B} h_{b', i'}^{H} F_{b', i'} W_{b', i'} x_{b', i'} + n_{b, i, k} \end{aligned}

(1)

where $h_{b, i, k} \in C^{N_{t}}$ is the channel vector between the BS and user $u_{b, i, k}$, $x_{b, i, k} \in C$ is the transmitted symbol of user $u_{b, i, k}$, and $n_{b, i, k} \sim CN(0, σ^{2})$ is additive white Gaussian noise. $F_{b, i, k} \in C^{N_{t} \times n_{RF, i}}$ is the analog precoding matrix that adaptively steers an $n_{RF, i}$-dimensional RF beamspace to cover $G_{i}$, with $n_{RF, i} \geq g_{i}$.

$W_{b, i, k} \in C^{n_{RF, i}}$ is the digital precoding vector, and $C$ is the set of complex numbers. In equation (1), $h_{b, i, k}^{H} F_{b, i, k} W_{b, i, k} x_{b, i, k}$ is the desired signal, $\sum_{k' = 1, k' \neq k}^{| G_{i} |} h_{b, i, k}^{H} F_{b, i, k'} W_{b, i, k'} x_{b, i, k'}$ is the intra-cluster interference, $\sum_{i' = 1, i' \neq i}^{L} h_{b, i'}^{H} F_{b, i'} W_{b, i'} x_{b, i'}$ is the inter-cluster interference, and $\sum_{b' = 1, b' \neq b}^{B} h_{b', i'}^{H} F_{b', i'} W_{b', i'} x_{b', i'}$ is the inter-cell interference. The hybrid method gives more accurate results than the statistical method and produces faster and more general results, but it cannot model the intra-cluster angles accurately enough, which is necessary for beamforming and inter-cluster interference optimization.^28,38
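The decomposition in equation (1) can be made concrete with a short numerical sketch. The code below is a minimal illustration, not the implementation used in this article: it assumes a single cell (so the inter-cell term is dropped), unit-power symbols, and, as a simplifying assumption, evaluates every interference term through user $k$'s own channel. The function names `signal_terms` and `sum_rate` are hypothetical.

```python
import numpy as np

def signal_terms(h, F, W, clusters, k, noise_var):
    """Split the received power of user k into the terms of equation (1).

    h[j]: channel of user j, shape (Nt,); F[j]: analog precoder, shape (Nt, nRF);
    W[j]: digital precoder, shape (nRF,); clusters[j]: cluster index of user j.
    Assumptions: single cell (no inter-cell term), unit-power symbols, and every
    term is seen through user k's own channel.
    """
    def gain(j):
        # |h_k^H F_j W_j|^2; np.vdot conjugates its first argument.
        return abs(np.vdot(h[k], F[j] @ W[j])) ** 2

    users = range(len(h))
    desired = gain(k)
    intra = sum(gain(j) for j in users if j != k and clusters[j] == clusters[k])
    inter = sum(gain(j) for j in users if clusters[j] != clusters[k])
    sinr = desired / (intra + inter + noise_var)
    return desired, intra, inter, sinr

def sum_rate(h, F, W, clusters, noise_var):
    """Sum over users of log2(1 + SINR_k), the quantity the precoder maximizes."""
    return sum(np.log2(1 + signal_terms(h, F, W, clusters, k, noise_var)[3])
               for k in range(len(h)))
```

For example, with `clusters = [0, 0, 1]`, user 0 sees user 1 as intra-cluster interference and user 2 as inter-cluster interference.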

Channel model

To exploit the spatial selectivity and scattering characteristics of the mmWave massive MIMO channel, this article adopts the Saleh–Valenzuela (SV) model,³⁵ in which the channel of the $k$th user in the $i$th cluster can be expressed as

h_{i, k} = \sum_{l = 1}^{N_{l}} α_{i, l} a_{r} (θ_{r, i, l}) a_{t} (θ_{t, i, l})

(2)

where $N_{l}$ is the number of paths, $α_{i, l}$ is the complex gain of the $l$th path, and $a_{r}(θ_{r, i, l})$ and $a_{t}(θ_{t, i, l})$ are the user and BS array response vectors, respectively; $θ_{r, i, l}$ is the angle of arrival (AoA) at the user, and $θ_{t, i, l}$ is the angle of departure (AoD) at the BS. For an $N$-element uniform linear array (ULA), the response vector is

a (θ) = \frac{1}{\sqrt{N}} {[1, e^{j (2 π / λ) d_{ULA} \sin (θ)}, \dots, e^{j (N - 1) (2 π / λ) d_{ULA} \sin (θ)}]}^{T}

(3)

where $λ$ is the wavelength and $d_{ULA}$ is the spacing between adjacent antennas. Because scattering in mmWave propagation is limited, the mmWave massive MIMO channel $h_{i, k}$ is low-rank. Therefore, a limited number of RF chains suffices to achieve near-optimal throughput.
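The following sketch generates the ULA response of equation (3) and a channel following equation (2). It assumes a single-antenna user, so the receive response $a_{r}$ reduces to 1, half-wavelength spacing ($d_{ULA} = λ/2$), path gains drawn from $CN(0, 1)$, and AoDs uniform in $[-π/2, π/2]$; these distributions and the function names are illustrative assumptions rather than the settings used in this article.

```python
import numpy as np

def ula_response(theta, n, d_over_lambda=0.5):
    """Equation (3): normalized n-element ULA response for angle theta (radians)."""
    idx = np.arange(n)
    # (2*pi/lambda) * d_ULA = 2*pi * (d_ULA / lambda)
    return np.exp(1j * 2 * np.pi * d_over_lambda * idx * np.sin(theta)) / np.sqrt(n)

def sv_channel(n_t, n_paths, rng):
    """Equation (2) for a single-antenna user: the receive response is 1,
    so the channel is a gain-weighted sum of n_paths transmit steering vectors."""
    alpha = (rng.standard_normal(n_paths) + 1j * rng.standard_normal(n_paths)) / np.sqrt(2)
    aod = rng.uniform(-np.pi / 2, np.pi / 2, n_paths)   # assumed AoD distribution
    return sum(a * ula_response(t, n_t) for a, t in zip(alpha, aod))
```

Because each channel is a combination of only $N_{l}$ steering vectors, channels that share the same AoDs span a subspace of dimension at most $N_{l}$, which is the low-rank property noted above.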

User clustering hybrid precoding scheme

Our goal is to design a hybrid precoding matrix. This requires, first, handling intra-cluster, inter-cluster, and inter-cell interference with limited channel knowledge and, second, realizing the hybrid analog/digital architecture with few RF chains, avoiding the high complexity of traditional methods. Next, we propose a hybrid precoding method based on manifold learning to achieve these goals.

NCRBM manifold learning for user clusters

As the numbers of antennas and users in a mmWave massive MIMO system grow, inter-cell and intra-cell directional interference occurs during signal transmission, and the high-dimensional channel matrix calls for high-complexity hybrid analog/digital architectures. By modeling each user set as a manifold, we formulate the problem as clustering-oriented manifold discriminative learning.

The similarity among users is represented as an undirected graph by the graph embedding method. To represent each user set as a manifold, the user channel characteristic graphs $\{(h_{i, k}, m_{k, j})\}_{i = 1}^{L}$ are constructed, as shown in Figure 2.

Figure 2.

User cluster undirected characteristic graph.

$m_{ξ, k, j}$ denotes the intra-cluster channel weight function between users $k$ and $j$, and $m_{ζ, k, j}$ denotes the inter-cluster channel weight function between users $k$ and $j$. The set of cluster channel weight functions is $M = \{m_{k, j} : k, j \in \{1, \dots, K\}\}$.

The intra-cluster weight function $m_{ξ, k, j}$ is defined as

\begin{cases} 0 < m_{ξ, k, j} \leq 1, & \text{if users } k \text{ and } j \text{ are in the same cluster} \\ m_{ξ, k, j} = 0, & \text{otherwise} \end{cases}

(4)

The inter-cluster weight function $m_{ζ, k, j}$ is defined as

\begin{cases} 0 < m_{ζ, k, j} \leq 1, & \text{if users } k \text{ and } j \text{ are in different clusters} \\ m_{ζ, k, j} = 0, & \text{otherwise} \end{cases}

(5)

The intra-cluster weight function is positive when users $k$ and $j$ are in the same cluster and 0 when they are in different clusters. Conversely, the inter-cluster weight function is positive when users $k$ and $j$ are in different clusters and 0 when they are in the same cluster. Manifold discriminative learning seeks embedded low-dimensional manifolds in which manifolds with different user-cluster labels are easier to separate and the local spatial correlation of the high-dimensional channels within each manifold is enhanced. Some existing manifold learning algorithms, such as Locally Linear Embedding (LLE),³⁹ cannot retain the complete global non-linear channel structure of user clusters.

We propose to perform manifold discriminative learning for global dimensionality reduction, mapping the high-dimensional channels into low-dimensional manifolds, as shown in Figure 3. To reveal the latent non-linear manifold structure of the high-dimensional channels, an intra-cluster graph and an inter-cluster graph are constructed using the label information of the user characteristics. This also makes the low-dimensional channels more tightly clustered and enhances the separability of the embedded low-dimensional channels. The RF eigen-beamformer is considered the best solution for transmission to a user group: the learned channel eigenvector corresponding to the largest eigenvalue is taken as the spatial direction, which in theory is the beamforming direction. Multiple users in the same cluster have highly correlated transmission paths. We seek to learn a generic mapping $A$ defined as

h'_{k} = A^{T} h_{k}

(6)

where $A$ is the projection matrix and $h'_{k}$ is the low-dimensional mapping of the $k$th user's high-dimensional channel $h_{k}$. The relative spatial relationship of neighboring users in the high-dimensional channel space is preserved in the low-dimensional manifold. To maintain the manifold structure of the high-dimensional channels, we optimize the projection direction of the manifold: for all $h_{k}, h_{j}$ ($k \neq j$) in the same cluster, the intra-cluster objective function is

\max_{A} \sum_{k, j} {({h'}_{k} - {h'}_{j})}^{2} m_{ξ, k, j}

(7)

Figure 3.

Schematic diagram of dimension reduction.

Summing over all user pairs within the clusters, the objective can be rewritten as equation (8), where $H = [h_{1}, \dots, h_{K}]$ stacks the user channels as columns, $S_{ξ, local} = H (D_{ξ} - M_{ξ}) H^{T}$ is the local manifold structure of the intra-cluster channels, and $D_{ξ}$ is the diagonal matrix with entries $D_{ξ}(k, k) = \sum_{j \neq k} M_{ξ}(k, j)$.

\begin{aligned} \max_{A} \sum_{k, j} {({h'}_{k} - {h'}_{j})}^{2} m_{ξ, k, j} & = \max_{A} \frac{1}{2} \sum_{k, j} {(A^{T} h_{k} - A^{T} h_{j})}^{2} m_{ξ, k, j} \\ & = \max_{A} (A^{T} H D_{ξ} H^{T} A - A^{T} H M_{ξ} H^{T} A) \\ & = \max_{A} A^{T} S_{ξ, local} A \end{aligned}

(8)
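The step from the pairwise sum to the trace form in equation (8) rests on the standard graph-Laplacian identity $\sum_{k, j} m_{k, j} \|A^{T} h_{k} - A^{T} h_{j}\|^{2} = 2\, tr(A^{T} H (D - M) H^{T} A)$ for a symmetric weight matrix $M$, which accounts for the factor $1/2$. The sketch below checks this identity numerically; it assumes real-valued channels stacked as the columns of $H$, and the function names are hypothetical.

```python
import numpy as np

def pairwise_objective(H, A, M):
    """Pairwise form: sum_{k,j} m_kj * ||A^T h_k - A^T h_j||^2."""
    Y = A.T @ H                               # columns are the projected channels h'_k
    diff = Y[:, :, None] - Y[:, None, :]      # diff[:, k, j] = h'_k - h'_j
    return float(np.sum(M * np.sum(diff ** 2, axis=0)))

def laplacian_objective(H, A, M):
    """Trace form: 2 * tr(A^T H (D - M) H^T A) with D_kk = sum_j M_kj."""
    D = np.diag(M.sum(axis=1))
    S_local = H @ (D - M) @ H.T               # plays the role of S_{xi,local}
    return float(2 * np.trace(A.T @ S_local @ A))
```

Both functions agree for any symmetric $M$ with a zero diagonal, so maximizing the pairwise sum is equivalent to maximizing $A^{T} S_{ξ, local} A$.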

According to the SV model, $R_{i, k} = E[h_{i, k} h_{i, k}^{H}]$ is the covariance matrix of the $k$th user in the $i$th cluster. Users in the same cluster share the same transmit covariance matrix, and the eigendecomposition of $R_{i, k}$ is

R_{i, k} = U_{i, k} Λ_{i, k} U_{i, k}^{H}

(9)

where $U_{i, k} \in C^{N_{t} \times r_{i}}$ is the matrix of eigenvectors corresponding to the $r_{i}$ ($r_{i} \leq N_{t}$) non-zero eigenvalues of $R_{i, k} \in C^{N_{t} \times N_{t}}$, and $Λ_{i, k} \in C^{r_{i} \times r_{i}}$ is the diagonal matrix of these non-zero eigenvalues. Since users in the same cluster experience similar local scattering and hence have similar spatial correlations, $R_{i} = R_{i, k}, \forall k \in G_{i}$. The similarity between users is measured by a distance function between their channel subspaces. Since the subspace $span(U)$ is represented by the projection matrix $U U^{T}$, for any $U_{i, k}$ and $U_{i', k'}$ the similarity measure between two users, based on the distance between their subspace projection matrices, can be expressed as

\begin{matrix} d_{pm} (U_{k} U_{k}^{T}, V_{i} V_{i}^{T}) \\ = \frac{1}{\sqrt{2}} ‖ U_{k} U_{k}^{T} - V_{i} V_{i}^{T} ‖_{F}^{2} \\ = \frac{1}{\sqrt{2}} tr (ψ_{k, i} ψ_{k, i}^{T}) \end{matrix}

(10)

where $U_{k}$ is the eigenvector matrix of $R_{k}$ for a user in any cluster, that is, $R_{k} = U_{k} Λ_{k} U_{k}^{H}$, and $V_{i}$ is the eigenvector matrix of the $i$th cluster center $R_{i}$. $ψ_{k, i} = U_{k} U_{k}^{T} - V_{i} V_{i}^{T}$ is the symmetric difference between the two projection matrices; it depends on the cluster center $V_{i}$, which is learned. The global manifold structure $S_{ξ, global}$ of the intra-cluster channels is measured as

S_{ξ, global} = \sum_{i = 1}^{L} \sum_{k \in G_{i}} \frac{1}{\sqrt{2}} \frac{1}{g_{i}} tr (ψ_{k, i} ψ_{k, i}^{T})

(11)
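A minimal sketch of equations (9) to (11) follows. It extracts the dominant eigenvectors of a covariance matrix, evaluates the projection distance of equation (10), and averages it over cluster members as in equation (11). It uses the conjugate transpose $U U^{H}$, which coincides with the $U U^{T}$ of the text for real-valued bases; the function names and the choice of rank $r$ are assumptions for illustration.

```python
import numpy as np

def subspace_basis(R, r):
    """Equation (9): orthonormal basis of the r dominant eigenvectors of Hermitian R."""
    _, vecs = np.linalg.eigh(R)              # eigenvalues come in ascending order
    return vecs[:, ::-1][:, :r]

def projection_distance(U, V):
    """Equation (10): (1/sqrt(2)) * ||U U^H - V V^H||_F^2 = (1/sqrt(2)) tr(psi psi^H)."""
    psi = U @ U.conj().T - V @ V.conj().T
    return float(np.real(np.trace(psi @ psi.conj().T))) / np.sqrt(2)

def global_intra_structure(bases, labels, centers):
    """Equation (11): per-cluster mean distance of member subspaces to their center."""
    total = 0.0
    for i, V in enumerate(centers):
        members = [U for U, lab in zip(bases, labels) if lab == i]
        if members:                          # skip empty clusters
            total += sum(projection_distance(U, V) for U in members) / len(members)
    return total
```

For two orthogonal one-dimensional subspaces, $ψ$ has eigenvalues $1$ and $-1$, so the distance equals $2/\sqrt{2} = \sqrt{2}$.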

To exploit both the global characteristics and the local manifold structure of the intra-cluster channels, we obtain the intra-cluster dispersion $η_{ξ}$ by combining equations (8) and (11):³³

η_{ξ} = υ S_{ξ, global} + (1 - υ) S_{ξ, local}

(12)

where $υ$ is a constant that weights the global and local terms.

The intra-cluster weight functions $m_{ξ, k, j}$ are obtained as

m_{ξ, k, j} = \exp (\frac{- d_{k, j}}{s'})

(13)

where $s'$ is a constant and $d_{k, j}$ is the similarity measure between users $k$ and $j$.

To maintain the manifold structure of the inter-cluster user channels, we again optimize the projection direction of the manifold: for all $h_{k}, h_{j}$ ($k \neq j$) in different clusters, the inter-cluster objective function is

\max_{A} \sum_{k, j} {({h'}_{k} - {h'}_{j})}^{2} m_{ζ, k, j}

(14)

Summing over all user pairs in different clusters, the objective becomes

\begin{aligned} \max_{A} \sum_{k, j} {({h'}_{k} - {h'}_{j})}^{2} m_{ζ, k, j} & = \max_{A} \frac{1}{2} \sum_{k, j} {(A^{T} h_{k} - A^{T} h_{j})}^{2} m_{ζ, k, j} \\ & = \max_{A} (A^{T} H D_{ζ} H^{T} A - A^{T} H M_{ζ} H^{T} A) \\ & = \max_{A} A^{T} S_{ζ, local} A \end{aligned}

(15)

where $S_{ζ, local} = H (D_{ζ} - M_{ζ}) H^{T}$ is the local manifold structure of the inter-cluster channels and $D_{ζ}$ is the diagonal matrix with entries $D_{ζ}(k, k) = \sum_{j \neq k} M_{ζ}(k, j)$. The global inter-cluster manifold structure $S_{ζ, global}$ is measured as

S_{ζ, global} = \sum_{i = 1}^{L} \sum_{k \notin G_{i}} \frac{1}{\sqrt{2}} \frac{1}{K - g_{i}} tr (ψ_{k, i} ψ_{k, i}^{T})

(16)

To exploit both the global characteristics and the local manifold structure of the inter-cluster channels, we obtain the inter-cluster dispersion $η_{ζ}$ by combining equations (15) and (16):

η_{ζ} = ℘ S_{ζ, global} + (1 - ℘) S_{ζ, local}

(17)

where ℘ is a constant that weights the global and local terms.

The inter-cluster weight functions $m_{ζ, k, j}$ are obtained as

m_{ζ, k, j} = \exp (\frac{- s''}{d_{k, j}})

(18)

where $s''$ is a constant.
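The graph weights of equations (4), (5), (13), and (18), together with the diagonal matrices $D_{ξ}$ and $D_{ζ}$, can be assembled from a matrix of pairwise subspace distances. This is a sketch under the assumption that the cluster labels and the distances $d_{k, j}$ are already available; `graph_weights`, `s1`, and `s2` (standing for $s'$ and $s''$) are hypothetical names.

```python
import numpy as np

def graph_weights(dist, labels, s1=1.0, s2=1.0):
    """Intra-/inter-cluster weights (eqs. (13), (18)) masked as in eqs. (4), (5).

    dist[k, j]: subspace distance d_{k,j} between users k and j;
    labels[k]: cluster label of user k; s1, s2 stand for s' and s''.
    """
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    with np.errstate(divide="ignore"):      # d = 0 on the diagonal gives exp(-inf) = 0
        m_intra = np.where(same & off_diag, np.exp(-dist / s1), 0.0)
        m_inter = np.where(~same, np.exp(-s2 / dist), 0.0)
    # Degree matrices used in S_local = H (D - M) H^T.
    D_intra = np.diag(m_intra.sum(axis=1))
    D_inter = np.diag(m_inter.sum(axis=1))
    return m_intra, m_inter, D_intra, D_inter
```

Both weight matrices are symmetric whenever `dist` is, as the trace forms in equations (8) and (15) require.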

After the intra-cluster dispersion $η_{ξ}$ and the inter-cluster dispersion $η_{ζ}$ are obtained, user mobility causes both values to change continuously. To cope with this problem, we introduce the NCRBM model⁴⁰ to obtain their optimal values. The NCRBM is a multi-layer network built from Convolutional Restricted Boltzmann Machines (CRBMs) stacked on top of each other. It extends the standard Restricted Boltzmann Machine (RBM), inheriting its properties while running faster. The NCRBM optimizes the intra-cluster dispersion value over time. For the $i$th cluster of users ($i = 1, \dots, L$), let $η_{ξ}^{i} = \{η_{ξ}^{t_{0}}, η_{ξ}^{t_{0} + Δ t}, \dots, η_{ξ}^{t_{0} + μ Δ t}\}$ denote the set of intra-cluster dispersions observed over $μ$ time intervals, where $t_{0}$ is the start time and $Δ t$ is the time interval; the sets of all clusters are $\{η_{ξ}^{i}\}_{i = 1}^{L}$. In addition, we define ${\tilde{η}}_{ξ}$ as the optimal intra-cluster dispersion after optimization. It is computed by averaging the manifold structure features over all the intra-cluster dispersions in the set $η_{ξ}^{i}$; it is thus a mean representation of a specific cluster in the manifold structure feature space and must be calculated for each cluster separately. For an NCRBM with real-valued input units, the energy of the joint configuration $(η_{ξ}^{i}, {\tilde{η}}_{ξ}^{i})$ of the input intra-cluster dispersion units and the optimized units is defined as

\begin{array}{l} E (η_{ξ}^{i}, {\tilde{η}}_{ξ}^{i}) = \frac{1}{2} \sum_{u, v = 1}^{μ} {(η_{ξ}^{i})}_{u, v}^{2} \\ - \sum_{q = 1}^{Q} \sum_{m, n = 1}^{μ} {({\tilde{η}}_{ξ}^{i})}_{m, n}^{q} {(w^{q} * η_{ξ}^{i})}_{m, n} \\ - \sum_{q = 1}^{Q} b_{q} \sum_{m, n = 1}^{μ} {({\tilde{η}}_{ξ}^{i})}_{m, n}^{q} - c \sum_{u, v = 1}^{μ} {(η_{ξ}^{i})}_{u, v} \end{array}

(19)

where $*$ denotes convolution; $w^{q}$ is the $q$th manifold structure feature detector, a 2-D filter that slides horizontally and vertically over the input; $({\tilde{η}}_{ξ}^{i})_{m, n}^{q}$ is the optimized unit at location $(m, n)$ of feature map $q$; and $(η_{ξ}^{i})_{u, v}$ is the input intra-cluster dispersion unit at location $(u, v)$. In addition, $b_{q}$ is the bias shared among all units in feature map $q$, and $c$ is the bias of the input intra-cluster dispersion units.

Through this energy function, the network assigns a probability to every possible pair of input and optimized units:

P (η_{ξ}^{i}, {\tilde{η}}_{ξ}^{i}) = \frac{1}{Z} \exp (- E (η_{ξ}^{i}, {\tilde{η}}_{ξ}^{i}))

(20)

where Z is the partition function.
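To make equations (19) and (20) concrete, the sketch below evaluates the energy of one input dispersion map and a stack of $Q$ optimized maps. It assumes square maps and a "valid" 2-D convolution, so each optimized map has size $(μ - f + 1) \times (μ - f + 1)$ for an $f \times f$ filter; this sizing is an assumption, since the text indexes both maps up to $μ$. The partition function $Z$ is omitted because computing it requires summing over all configurations of the optimized units. All function names are hypothetical.

```python
import numpy as np

def conv2d_valid(x, k):
    """2-D 'valid' convolution of a square map x with a square filter k."""
    kf = k[::-1, ::-1]                       # convolution flips the filter
    f = k.shape[0]
    out = x.shape[0] - f + 1
    return np.array([[np.sum(x[m:m + f, n:n + f] * kf) for n in range(out)]
                     for m in range(out)])

def crbm_energy(v, h, w, b, c):
    """Equation (19): v is the input map, h has shape (Q, o, o), w has shape (Q, f, f)."""
    energy = 0.5 * np.sum(v ** 2) - c * np.sum(v)
    for q in range(w.shape[0]):
        energy -= np.sum(h[q] * conv2d_valid(v, w[q])) + b[q] * np.sum(h[q])
    return float(energy)

def unnormalized_prob(v, h, w, b, c):
    """Equation (20) without the partition function Z."""
    return np.exp(-crbm_energy(v, h, w, b, c))
```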

The model parameters are optimized by minimizing the following objective function with contrastive divergence:

\min_{w, b, c} \Big\{ - \sum_{i = t_{0}}^{t_{0} + μ Δ t} \log \Big( \sum_{{\tilde{η}}_{ξ}^{i}} P (η_{ξ}^{i}, {\tilde{η}}_{ξ}^{i}) \Big) + λ_{sparsity} \sum_{q = 1}^{Q} {\Big| p - \frac{1}{μ^{2}} \sum_{m, n = 1}^{μ} P \big( {({\tilde{η}}_{ξ}^{i})}_{m, n}^{q} = 1 \mid η_{ξ}^{i} \big) \Big|}^{2} \Big\}

(21)

Because the convolutional model is overcomplete, the second term in equation (21) adds sparsity regularization, which pushes the average activation of each feature map toward a target value $p$. For each intra-cluster dispersion $η_{ξ}^{i}$, we obtain ${\tilde{η}}_{ξ}^{i}$ as

{\tilde{η}}_{ξ}^{i} = \frac{1}{Q} \sum_{q = 1}^{Q} P ({({\tilde{η}}_{ξ}^{i})}^{q} = 1 | η_{ξ}^{i})

(22)

where $Q$ is the number of manifold structure features. Then, the optimal value for each cluster can be computed as follows

{\tilde{η}}_{ξ} = \frac{1}{μ} \sum_{i = t_{0}}^{t_{0} + μ Δ t} {\tilde{η}}_{ξ}^{i}

(23)
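For binary optimized units, the energy in equation (19) gives the standard CRBM conditional $P(({\tilde{η}}_{ξ}^{i})_{m, n}^{q} = 1 \mid η_{ξ}^{i}) = \mathrm{sigmoid}((w^{q} * η_{ξ}^{i})_{m, n} + b_{q})$. The sketch below uses this conditional to evaluate equations (22) and (23). It is an illustration under stated assumptions: a "valid" 2-D convolution, and an average over the number of observations in the window (the text divides by $μ$). The function names are hypothetical.

```python
import numpy as np

def conv2d_valid(x, k):
    """2-D 'valid' convolution of a square map x with a square filter k."""
    kf = k[::-1, ::-1]                       # convolution flips the filter
    f = k.shape[0]
    out = x.shape[0] - f + 1
    return np.array([[np.sum(x[m:m + f, n:n + f] * kf) for n in range(out)]
                     for m in range(out)])

def activation_probs(v, w, b):
    """P((h^q)_{m,n} = 1 | v) = sigmoid((w^q * v)_{m,n} + b_q) for every filter q."""
    return np.stack([1.0 / (1.0 + np.exp(-(conv2d_valid(v, w[q]) + b[q])))
                     for q in range(w.shape[0])])

def optimal_dispersion(window, w, b):
    """Equation (22): average over the Q filters; equation (23): average over the window."""
    per_step = [activation_probs(v, w, b).mean(axis=0) for v in window]
    return np.mean(per_step, axis=0)
```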

The NCRBM objective function consists of two main parts, namely a generative part and an optimized part. The generative objective (the first two terms in equation (24)) is the same as that of the sparsity-regularized CRBM, but on its own it does not guarantee the best intra-cluster dispersion value. The generative part is optimized with contrastive divergence. The last two terms in equation (24) form the optimized part, which can be minimized by a gradient-based procedure. The optimal intra-cluster dispersion is obtained by minimizing the following objective function:

\begin{aligned} \min_{w, b, c} \Big\{ & - \sum_{i = t_{0}}^{t_{0} + μ Δ t} \log \Big( \sum_{{\tilde{η}}_{ξ}^{i}} P (η_{ξ}^{i}, {\tilde{η}}_{ξ}^{i}) \Big) + λ_{sparsity} \sum_{q = 1}^{Q} {\Big| p - \frac{1}{μ^{2}} \sum_{m, n = 1}^{μ} P \big( {({\tilde{η}}_{ξ}^{i})}_{m, n}^{q} = 1 \mid η_{ξ}^{i} \big) \Big|}^{2} \\ & + λ_{dis} \frac{1}{Q} \sum_{r = 1}^{L} \sum_{q = 1}^{Q} e_{i}^{r} \big\| P \big( {({\tilde{η}}_{ξ}^{i})}^{q} = 1 \mid η_{ξ}^{i} \big) - M_{r} \big\|_{F}^{2} - β_{dis} \sum_{u, v = 1}^{L} \| M_{u} - M_{v} \|_{F}^{2} \Big\} \end{aligned}

(24)

e_{i}^{r} = \begin{cases} 1, & \text{if } η_{ξ}^{i} \text{ belongs to cluster } r \\ 0, & \text{otherwise} \end{cases}

(25)

where $L$ is the number of clusters. In equation (24), the third term pulls the features of each intra-cluster dispersion in cluster $r$ toward its class map $M_{r}$, while the last term maximizes the distance between the class maps $M_{u}$ and $M_{v}$ to better separate the clusters. The weighting parameters $λ_{dis}$ and $β_{dis}$ control the contribution of the optimized terms to the overall objective.
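The two optimized terms of equation (24) can be evaluated as follows. In this sketch, the indicator $e$ is replaced by an explicit cluster label per sample, each sample's features are its $Q$ activation maps $P(({\tilde{η}}_{ξ}^{i})^{q} = 1 \mid η_{ξ}^{i})$, and the class maps $M_{r}$ are given; all names are hypothetical, and the gradient step itself is left out.

```python
import numpy as np

def discriminative_terms(features, labels, class_maps, lam_dis, beta_dis):
    """Last two terms of equation (24).

    features[s]: activation maps of sample s, shape (Q, o, o);
    labels[s]: cluster r of sample s (plays the role of the indicator e);
    class_maps[r]: class map M_r, shape (o, o).
    """
    Q = features[0].shape[0]
    # Pull term: squared Frobenius distance of every map to its cluster's class map
    # (M_r broadcasts over the Q filters).
    pull = sum(np.sum((f - class_maps[r]) ** 2) for f, r in zip(features, labels))
    L = len(class_maps)
    # Push term: squared distances between all ordered pairs of class maps.
    spread = sum(np.sum((class_maps[u] - class_maps[v]) ** 2)
                 for u in range(L) for v in range(L))
    return lam_dis * pull / Q - beta_dis * spread
```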

The gradient of the optimized part can be computed exactly at each iteration; for each cluster, we sum the gradient contributions of the two components. Based on equation (23), the intra-cluster dispersion of each cluster is updated after optimization. A limitation of this approach is that an overly long time window makes the obtained intra-cluster dispersion inaccurate, so the procedure must be run several times, which increases the overall computing time of the algorithm.

For the inter-cluster dispersion, we use a similar approach to obtain the optimized inter-cluster value ${\tilde{η}}_{ζ}$. In this way, when the users in a cluster move within a given time window, the NCRBM model yields the optimal intra-cluster and inter-cluster dispersions while preserving the manifold structure features over that window. However, the overhead of manifold learning increases because the dispersion values of intra- and inter-cluster users are updated constantly, mainly because the NCRBM model requires several training iterations.

The discriminative function $J(A)$ is then defined as

J (A) = \max_{A} \frac{A^{T} {\tilde{η}}_{ξ} A}{A^{T} {\tilde{η}}_{ζ} A}

(26)

\begin{aligned} J (A) & = \max_{A} \frac{A^{T} (\tilde{υ} {\tilde{S}}_{ξ, global} + (1 - \tilde{υ}) {\tilde{S}}_{ξ, local}) A}{A^{T} (\tilde{℘} {\tilde{S}}_{ζ, global} + (1 - \tilde{℘}) {\tilde{S}}_{ζ, local}) A} \\ \text{s.t.} \quad {h'}_{i, k} & = A^{T} h_{i, k} \end{aligned}

(27)

According to equation (27), the low-dimensional mapping $h'_{i, k}$ of the $k$th user's channel is determined by the projection matrix $A$. Introducing a Lagrange multiplier transforms the ratio maximization into a Lagrangian problem whose solution gives the optimal projection matrix $A$ for the given intra-cluster and inter-cluster dispersion values.⁴¹ Setting the gradient of the Lagrangian to zero yields the generalized eigenvalue problem ${\tilde{η}}_{ξ} a = γ {\tilde{η}}_{ζ} a$, and the projection matrix $A = [A_{1}, \dots, A_{n}]$ is formed from the generalized eigenvectors associated with the $n$ largest eigenvalues $γ$, where $n$ is the reduced dimension of the user channel. After user clustering, the channel correlation of users in the same cluster is enhanced.
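Solving the generalized eigenvalue problem ${\tilde{η}}_{ξ} a = γ {\tilde{η}}_{ζ} a$ for the discriminative function of equation (26) can be sketched with NumPy alone by whitening with a Cholesky factor of the denominator matrix. The sketch assumes both dispersion matrices are real symmetric and that the denominator becomes positive definite after a small ridge term; `discriminant_projection` and `ridge` are hypothetical names.

```python
import numpy as np

def discriminant_projection(S_num, S_den, n, ridge=1e-9):
    """Solve S_num a = gamma * S_den a and keep the n leading eigenvectors.

    With S_den = L L^T, substituting y = L^T a gives the ordinary symmetric
    problem (L^{-1} S_num L^{-T}) y = gamma * y.
    """
    L = np.linalg.cholesky(S_den + ridge * np.eye(S_den.shape[0]))
    L_inv = np.linalg.inv(L)
    C = L_inv @ S_num @ L_inv.T              # symmetric, same eigenvalues gamma
    gamma, Y = np.linalg.eigh(C)             # ascending order
    order = np.argsort(gamma)[::-1][:n]      # n largest generalized eigenvalues
    A = L_inv.T @ Y[:, order]                # map back: a = L^{-T} y
    return A, gamma[order]
```

Each returned column maximizes the Rayleigh quotient in equation (26) subject to being $S_{den}$-orthogonal to the preceding columns.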

Then, according to the intra-cluster graph and inter-cluster graph constructed using the label information of user characteristics, the user clusters can be divided more accurately with lower complexity. Based on the maximum and minimum distances and the weighted likelihood similarity criterion, an optimized spatial fuzzy c-means clustering algorithm is proposed. The algorithm is an iterative optimization that minimizes the cost function defined as follows

J (μ_{i, k}) = \sum_{k = 1}^{K} \sum_{i = 1}^{L} μ_{i, k}^{ℑ} d_{i, k}

(28)

where $d_{i, k} = (1 / \sqrt{2}) tr (ψ_{k, i} ψ_{k, i}^{T})$ is the similarity measure between the $k$th user and the $i$th cluster center, $μ_{i, k}$ is the membership function of user $u_{i, k}$ in the $i$th cluster, and ℑ is a constant that controls the fuzziness of the resulting partition; in this article, we set $ℑ = 2$. The cost function $J (μ_{i, k})$ is minimized by assigning high membership values to users $u_{i, k}$ close to a cluster center and low membership values to users far from it. The membership function represents the probability that user $u_{i, k}$ belongs to a specific cluster. The membership functions and cluster centers are updated as follows

μ_{i, k} = \frac{1}{\sum_{i' = 1}^{L} {(\frac{d_{i, k}}{d_{i, i'}})}^{\frac{1}{(ℑ - 1)}}}

(29)

and

V_{i} V_{i}^{T} = \frac{\sum_{k = 1}^{K} μ_{i, k}^{ℑ} U_{i, k} U_{i, k}^{T}}{\sum_{k = 1}^{K} μ_{i, k}^{ℑ}}

(30)

where $d_{i, i'} = (1 / \sqrt{2}) tr (ψ_{i, i'} ψ_{i, i'}^{T})$ is the similarity measure between the $i$th and $i'$th cluster centers. In summary, with each user set represented as a manifold, clustering-oriented manifold discriminative learning proceeds as follows:

Step 1: construct the user channel characteristic graphs ${(h_{i, k}, m_{k, j})}_{i = 1}^{L}$ .

Step 2: find the two users $U_{i}$ and $U_{i'}$ that are farthest apart and take them as the initial cluster centers, that is, $V_{1}^{(0)} = U_{i}, V_{2}^{(0)} = U_{i'}$. The number of user clusters is initialized to $i = 2$.

Step 3: using the Euclidean distance criterion $d_{pm} (U_{k} U_{k}^{T}, V_{i} V_{i}^{T}) = (1 / \sqrt{2}) tr (ψ_{k, i} ψ_{k, i}^{T})$, assign all users to the $i$ user clusters.

Step 4: among the $i$ user groups that have completed clustering, find the point of weakest similarity in each group to obtain the $i$ user clusters. Calculate in turn the distance $d_{i, k}$ between each user $k (k = 1, 2, \dots, K)$ and the center point $V_{i}^{(0)} (i = 1, 2, \dots, L)$ of each user cluster, together with the membership functions $μ_{i, k}^{(0)}$.

Step 5: calculate the spatial membership functions ${μ'}_{i, k}$ and update the center point $V_{i}^{(0)} (i = 1, 2, \dots, L)$ of each user cluster with

V_{i} V_{i}^{T} = \frac{\sum_{k = 1}^{K} {({μ'}_{i, k})}^{ℑ} U_{i, k} U_{i, k}^{T}}{\sum_{k = 1}^{K} {({μ'}_{i, k})}^{ℑ}}

Then, the user with the largest $d_{i, k}$ is taken as a new center, $V_{i + 1}^{(0)} = \arg\max_{k} d_{i, k}$, and all users are re-divided into the $i + 1$ user clusters.

Step 6: set $i \leftarrow i + 1$; if $i \geq L$, perform Step 5; otherwise, repeat Step 3.

Step 7: calculate the similarity coefficient $‖ {(U_{k} Σ_{k}^{1 / 2})}^{H} V_{i} ‖_{F}^{2}$ and assign each user to the user cluster with the largest similarity coefficient.

Step 8: output the clustering result and the number of users in each cluster.

Step 9: calculate $m_{ξ, k, j}$ and $m_{ζ, k, j}$ according to equations (13) and (18), and construct the intra-cluster and inter-cluster graphs using the label information of user characteristics.

Step 10: calculate $S_{ξ, whole}$, $S_{ξ, local}$, $S_{ζ, local}$, and $S_{ζ, whole}$ according to equations (9), (11), (15), and (16).

Step 11: calculate $η_{ζ}$ and $η_{ξ}$ according to equations (12) and (17).

Step 12: calculate the optimized ${\tilde{η}}_{ξ}$ and ${\tilde{η}}_{ζ}$ according to equation (23).

Step 13: optimize the discriminative function $J (A)$ according to equation (27).

Step 14: use the obtained projection matrix $A$ to compute the projection $h' = A^{T} h$ in the low-dimensional subspace.
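The clustering part of these steps (Steps 2 to 8) can be sketched as a fuzzy c-means iteration over user subspaces. The Python/NumPy sketch below is a minimal illustration under several assumptions: each user is represented by a real orthonormal basis `U_k`, $d_{i, k}$ is taken as $(1 / \sqrt{2}) ‖ U_{k} U_{k}^{T} - V_{i} V_{i}^{T} ‖_{F}^{2}$, the centers are initialized by the farthest-point rule of Steps 2 and 5, the memberships use the textbook fuzzy c-means update with $d_{i', k}$ in the denominator, and the center update follows equation (30) by taking the dominant eigenvectors of the weighted mean projection matrix. The final assignment uses the largest membership rather than the similarity coefficient of Step 7, and all function names are hypothetical.

```python
import numpy as np


def proj_dist(U, V):
    """(1/sqrt 2) * ||U U^T - V V^T||_F^2, used here as d_{i,k}."""
    P = U @ U.T - V @ V.T
    return np.sum(P * P) / np.sqrt(2.0)


def subspace_fcm(Us, L, m=2.0, iters=50, eps=1e-12):
    """Fuzzy c-means over subspaces; Us is a list of (N x r) orthonormal bases."""
    K, r = len(Us), Us[0].shape[1]
    # Farthest-point initialization: the two most distant users first,
    # then repeatedly the user farthest from its nearest chosen center.
    D = np.array([[proj_dist(a, b) for b in Us] for a in Us])
    k1, k2 = np.unravel_index(np.argmax(D), D.shape)
    idx = [k1, k2]
    while len(idx) < L:
        idx.append(int(np.argmax(D[idx].min(axis=0))))
    Vs = [Us[k] for k in idx]
    for _ in range(iters):
        # d[i, k]: distance between center i and user k (eps avoids 0/0)
        d = np.array([[proj_dist(Uk, Vi) for Uk in Us] for Vi in Vs]) + eps
        # Membership: mu[i, k] = 1 / sum_i' (d[i, k] / d[i', k])^(1/(m-1))
        ratio = (d[:, None, :] / d[None, :, :]) ** (1.0 / (m - 1.0))
        mu = 1.0 / ratio.sum(axis=1)
        w = mu ** m
        new_Vs = []
        for i in range(L):
            # Equation (30): weighted mean of the users' projection matrices
            P = sum(w[i, k] * Us[k] @ Us[k].T for k in range(K)) / w[i].sum()
            _, vecs = np.linalg.eigh(P)
            new_Vs.append(vecs[:, -r:])  # dominant r-dimensional subspace
        Vs = new_Vs
    labels = mu.argmax(axis=0)
    return labels, mu, Vs
```

Each column of `mu` sums to one, so it can be read as the per-user membership probabilities described after equation (28).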

Manifold discriminative learning for hybrid precoding

On the basis of manifold discriminative learning for global dimensionality reduction and user clustering, we now investigate the sum-rate maximization problem for hybrid precoding. The precoding matrix $F'_{G_{i}} W'_{G_{i}}$ is designed to maximize the sum rate while managing both intra-cluster and inter-cluster interference. To improve the spectral efficiency of the system, the analog precoder of each cluster should balance its own transmission against the interference it causes. With each user set represented as a manifold, the received signal of the cluster can be expressed as

\begin{matrix} {y'}_{G_{i}} = H'_{G_{i}}^{H} {F'}_{G_{i}} {W'}_{G_{i}} x_{G_{i}} \\ + \sum_{k' = 1, k' \neq k}^{| G_{i} |} H'_{G_{i}, k}^{H} {F'}_{G_{i}, k'} {W'}_{G_{i}, k'} x_{G_{i}, k'} \\ + \sum_{i' = 1, i' \neq i}^{L} H'_{G_{i'}}^{H} {F'}_{G_{i'}} {W'}_{G_{i'}} x_{G_{i'}} + n_{G_{i}} \end{matrix}

(31)

where $y'_{G_{i}} = [y'_{G_{i}, 1}^{T}, \dots, y'_{G_{i}, g_{i}}^{T}]^{T}$ is the received signal, $H'_{G_{i}} = [H'_{G_{i}, 1}, \dots, H'_{G_{i}, g_{i}}]$ is the channel matrix of the $i$th cluster, $F'_{G_{i}} = [F'_{G_{i}, 1}, \dots, F'_{G_{i}, g_{i}}]$, and $W'_{G_{i}} = diag (W'_{G_{i}, 1}, \dots, W'_{G_{i}, g_{i}})$. After the low-dimensional mapping, $H'_{G_{i}}^{H} F'_{G_{i}} W'_{G_{i}} x_{G_{i}}$ is the desired signal, $\sum_{k' = 1, k' \neq k}^{| G_{i} |} H'_{G_{i}, k}^{H} {F'}_{G_{i}, k'} {W'}_{G_{i}, k'} x_{G_{i}, k'}$ is the intra-cluster interference, and $\sum_{i' = 1, i' \neq i}^{L} H'_{G_{i'}}^{H} {F'}_{G_{i'}} {W'}_{G_{i'}} x_{G_{i'}}$ is the inter-cluster interference. To adapt to specific scenarios and requirements, the hybrid precoding matrix can be determined by per-cluster processing (PCP), whose goal is to balance performance and complexity by effectively separating the clusters in the RF beam domain.

In PCP mode, the analog precoding matrix $F'_{G_{i}}$ of each cluster is calculated by the manifold quasi-conjugate gradient algorithm, while the digital precoding matrix $W'_{G_{i}}$ of each user cluster is calculated from its equivalent channel matrix. Let $H'_{eq} = H'^{H} F'$ denote the equivalent channel matrix after analog precoding; it is approximately block diagonal and can be expressed as

H'_{eq} = [\begin{matrix} H'_{G_{1}}^{H} {F'}_{G_{1}} & H'_{G_{1}}^{H} {F'}_{G_{2}} & \dots & H'_{G_{1}}^{H} {F'}_{G_{L}} \\ H'_{G_{2}}^{H} {F'}_{G_{1}} & H'_{G_{2}}^{H} {F'}_{G_{2}} & \dots & H'_{G_{2}}^{H} {F'}_{G_{L}} \\ \dots & \dots & \dots & \dots \\ H'_{G_{L}}^{H} {F'}_{G_{1}} & H'_{G_{L}}^{H} {F'}_{G_{2}} & \dots & H'_{G_{L}}^{H} {F'}_{G_{L}} \end{matrix}]

(32)

where the diagonal blocks $H'_{e q_{G_{i}}} = H'_{G_{i}}^{H} F'_{G_{i}}$ are the equivalent channels of the individual clusters, and the off-diagonal blocks $H'_{G_{i}}^{H} F'_{G_{i'}} (i \neq i')$ are the interference channel matrices between user clusters. If analog precoding suppresses the inter-cluster interference, that is, $H'_{G_{i}}^{H} F'_{G_{j}} \approx 0$ for $i \neq j$, then $H'_{eq}$ can be expressed as

H'_{eq} = [\begin{matrix} H'_{G_{1}}^{H} {F'}_{G_{1}} & 0 & \dots & 0 \\ 0 & H'_{G_{2}}^{H} {F'}_{G_{2}} & \dots & 0 \\ \dots & \dots & \dots & \dots \\ 0 & 0 & \dots & H'_{G_{L}}^{H} {F'}_{G_{L}} \end{matrix}]

(33)

The digital precoding matrix $W'$ is a block diagonal matrix as follows

W' = diag ({W'}_{G_{1}}, \dots, {W'}_{G_{L}})

(34)
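To make the block structure of equations (32) to (34) concrete, the sketch below forms $H'_{eq} = H'^{H} F'$ from per-cluster blocks and measures the fraction of its energy that lies outside the diagonal blocks. As a hypothetical stand-in for the analog precoder (which in the article comes from the manifold algorithm and has constant-modulus entries), each $F'_{G_{j}}$ is drawn from the null space of the other clusters' channels, the idealized case in which $H'_{G_{i}}^{H} F'_{G_{j}} = 0$ holds exactly. The helper names are illustrative.

```python
import numpy as np


def equivalent_channel(H_blocks, F_blocks):
    """H'_eq = H'^H F' with H' = [H_1, ..., H_L] and F' = [F_1, ..., F_L]."""
    H = np.hstack(H_blocks)   # N x sum(g_i)
    F = np.hstack(F_blocks)   # N x sum(n_i)
    return H.conj().T @ F


def offdiag_leakage(Heq, g, n):
    """Fraction of ||H'_eq||_F^2 outside the diagonal blocks (i != i')."""
    r = np.cumsum([0] + list(g))
    c = np.cumsum([0] + list(n))
    diag = sum(np.linalg.norm(Heq[r[i]:r[i + 1], c[i]:c[i + 1]]) ** 2
               for i in range(len(g)))
    return 1.0 - diag / np.linalg.norm(Heq) ** 2


def null_space_precoder(H_others, n_cols, rng):
    """Hypothetical idealized F_j: random columns orthogonal to H_others."""
    Q, _ = np.linalg.qr(H_others, mode="complete")  # N x N unitary
    null = Q[:, H_others.shape[1]:]                 # orthogonal complement of range(H_others)
    shape = (null.shape[1], n_cols)
    Z = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    return null @ Z
```

With the null-space construction the leakage is zero up to rounding; a practical constant-modulus analog precoder would only make it small, which is the approximation stated before equation (33).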

With scalar equalization $β_{G_{i}}^{- 1}$ , the signal estimate ${\hat{x}}_{G_{i}}$ for $G_{i}$ can be expressed as

{\hat{x}}_{G_{i}} = β_{G_{i}}^{- 1} (H'_{G_{i}}^{H} {F'}_{G_{i}} {W'}_{G_{i}} x_{G_{i}} + \sum_{k' = 1, k' \neq k}^{| G_{i} |} H'_{G_{i}, k}^{H} {F'}_{G_{i}, k'} {W'}_{G_{i}, k'} x_{G_{i}, k'} + \sum_{i' = 1, i' \neq i}^{L} H'_{G_{i'}}^{H} {F'}_{G_{i'}} {W'}_{G_{i'}} x_{G_{i'}} + n_{G_{i}})

(35)

where $β_{G_{i}}$ is a scaling factor that is jointly optimized with the hybrid precoders. The conditional mean square error (MSE) for $G_{i}$ is

\begin{matrix} ε ({F'}_{G_{i}}, {W'}_{G_{i}}, β_{G_{i}}) = E [{‖ x_{G_{i}} - {\hat{x}}_{G_{i}} ‖}^{2}] \\ = E [{‖ x_{G_{i}} - β_{G_{i}}^{- 1} (H'_{G_{i}}^{H} {F'}_{G_{i}} {W'}_{G_{i}} x_{G_{i}}) ‖}^{2}] \\ + E [\sum_{k' = 1, k' \neq k}^{| G_{i} |} {‖ β_{G_{i}}^{- 1} H'_{G_{i}, k}^{H} {F'}_{G_{i}, k'} {W'}_{G_{i}, k'} x_{G_{i}, k'} ‖}^{2}] \\ + E [\sum_{i' = 1, i' \neq i}^{L} {‖ β_{G_{i}}^{- 1} H'_{G_{i'}}^{H} {F'}_{G_{i'}} {W'}_{G_{i'}} x_{G_{i'}} ‖}^{2}] + β_{G_{i}}^{- 2} E [{‖ n_{G_{i}} ‖}^{2}] \end{matrix}

(36)

The conditional MSE in equation (36) is simplified as

ε ({F'}_{G_{i}}, {W'}_{G_{i}}, β_{G_{i}}) = ε_{G_{i}}^{(1)} + ε_{G_{i}}^{(2)}

(37)

where

ε_{G_{i}}^{(1)} = E [{‖ x_{G_{i}} - β_{G_{i}}^{- 1} (H'_{G_{i}}^{H} {F'}_{G_{i}} {W'}_{G_{i}} x_{G_{i}}) ‖}^{2}]

(38)

\begin{matrix} ε_{G_{i}}^{(2)} = E [\sum_{k' = 1, k' \neq k}^{| G_{i} |} {‖ β_{G_{i}}^{- 1} H'_{G_{i}, k}^{H} {F'}_{G_{i}, k'} {W'}_{G_{i}, k'} x_{G_{i}, k'} ‖}^{2}] \\ + E [\sum_{i' = 1, i' \neq i}^{L} {‖ β_{G_{i}}^{- 1} H'_{G_{i'}}^{H} {F'}_{G_{i'}} {W'}_{G_{i'}} x_{G_{i'}} ‖}^{2}] + β_{G_{i}}^{- 2} E [{‖ n_{G_{i}} ‖}^{2}] \end{matrix}

(39)

Therefore, the interference-leakage-based hybrid precoding is obtained by jointly optimizing $F'_{G_{i}}$, $W'_{G_{i}}$, and $β_{G_{i}}$. Following Kim and Lee,¹⁹ $W'_{G_{i}}$ can be rewritten as $W'_{G_{i}} = β_{G_{i}} {W''}_{G_{i}}$, where ${W''}_{G_{i}}$ is an unnormalized digital precoding matrix that can be obtained from the KKT conditions as

\begin{matrix} {W''}_{G_{i}} = {(H'_{e q_{G_{i}}}^{H} {H'}_{e q_{G_{i}}} + γ_{G_{i}}^{- 1} I)}^{- 1} H'_{e q_{G_{i}}}^{H} \\ = {(F'_{G_{i}}^{H} {H'}_{G_{i}} H'_{G_{i}}^{H} {F'}_{G_{i}} + γ_{G_{i}}^{- 1} I)}^{- 1} F'_{G_{i}}^{H} {H'}_{G_{i}} \end{matrix}

(40)

where $γ_{G_{i}}^{- 1}$ is the regularization factor, which depends on the noise variance and the base station transmit power. The interference-plus-noise term $I_{G_{i}}$ of cluster $G_{i}$ is

\begin{matrix} I_{G_{i}} = \sum_{k' = 1, k' \neq k}^{| G_{i} |} H'_{G_{i}, k}^{H} {F'}_{G_{i}, k'} {W'}_{G_{i}, k'} x_{G_{i}, k'} \\ + \sum_{i' = 1, i' \neq i}^{L} H'_{G_{i'}}^{H} {F'}_{G_{i'}} {W'}_{G_{i'}} x_{G_{i'}} + n_{G_{i}} \end{matrix}

(41)

The optimal value given in Ayach et al.¹² corresponds to $γ_{G_{i}} = P_{tol} / (K σ^{2})$, that is, a regularization factor $γ_{G_{i}}^{- 1} = K σ^{2} / P_{tol}$, where $P_{tol}$ is the total power of the transmitted signal. The optimal scaling factor $β_{G_{i}}$ follows from the base station transmit power constraint $tr (F' W' W'^{H} F'^{H}) \leq P_{tol}$ as

β_{G_{i}} = \sqrt{\frac{P_{tol}}{\sum_{i = 1}^{L} tr ({F'}_{G_{i}} {W''}_{G_{i}} {W''}_{G_{i}}^{H} F'_{G_{i}}^{H})}}

(42)
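A compact Python/NumPy sketch of equations (40) and (42): the unnormalized regularized zero-forcing precoder $W''_{G_{i}}$ is computed from the equivalent channel of each cluster, and, since $W'_{G_{i}} = β_{G_{i}} W''_{G_{i}}$, a single scaling factor is then computed from the unnormalized precoders so that the total transmit power equals $P_{tol}$. The regularization value `gamma_inv`, the function names, and the random channels and unit-modulus analog precoders in the check are assumptions for illustration only.

```python
import numpy as np


def rzf_digital_precoder(H_i, F_i, gamma_inv):
    """Unnormalized W''_{G_i} = (Heq^H Heq + gamma^-1 I)^-1 Heq^H, Heq = H_i^H F_i."""
    Heq = H_i.conj().T @ F_i                       # g_i x n_i equivalent channel
    n_i = F_i.shape[1]
    gram = Heq.conj().T @ Heq + gamma_inv * np.eye(n_i)
    return np.linalg.solve(gram, Heq.conj().T)     # n_i x g_i


def power_scaling(F_blocks, W2_blocks, P_tot):
    """Common beta so that sum_i tr(F_i (beta W''_i)(beta W''_i)^H F_i^H) = P_tot."""
    total = sum(np.trace(F @ W @ W.conj().T @ F.conj().T).real
                for F, W in zip(F_blocks, W2_blocks))
    return np.sqrt(P_tot / total)
```

Using `np.linalg.solve` instead of forming the inverse explicitly is the usual numerically safer way to apply $(\cdot)^{-1}$ in equation (40).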

Accordingly, equation (38) can be expressed as

\begin{matrix} ε_{G_{i}}^{(1)} = E [{‖ x_{G_{i}} - β_{G_{i}}^{- 1} (H'_{G_{i}}^{H} {F'}_{G_{i}} {W'}_{G_{i}} x_{G_{i}}) ‖}^{2}] \\ = E \{tr [{(x_{G_{i}} - β_{G_{i}}^{- 1} (H'_{G_{i}}^{H} {F'}_{G_{i}} {W'}_{G_{i}} x_{G_{i}}))}^{H} (x_{G_{i}} - β_{G_{i}}^{- 1} (H'_{G_{i}}^{H} {F'}_{G_{i}} {W'}_{G_{i}} x_{G_{i}}))]\} \\ = E \{tr [(x_{G_{i}}^{H} - β_{G_{i}}^{- 1} {(H'_{G_{i}}^{H} {F'}_{G_{i}} {W'}_{G_{i}} x_{G_{i}})}^{H}) (x_{G_{i}} - β_{G_{i}}^{- 1} (H'_{G_{i}}^{H} {F'}_{G_{i}} {W'}_{G_{i}} x_{G_{i}}))]\} \\ = E \{tr (x_{G_{i}}^{H} x_{G_{i}})\} - E \{tr [β_{G_{i}}^{- 1} x_{G_{i}}^{H} (H'_{G_{i}}^{H} {F'}_{G_{i}} {W'}_{G_{i}} x_{G_{i}})]\} \\ - E \{tr [β_{G_{i}}^{- 1} {(H'_{G_{i}}^{H} {F'}_{G_{i}} {W'}_{G_{i}} x_{G_{i}})}^{H} x_{G_{i}}]\} + E \{tr [β_{G_{i}}^{- 2} {(H'_{G_{i}}^{H} {F'}_{G_{i}} {W'}_{G_{i}} x_{G_{i}})}^{H} (H'_{G_{i}}^{H} {F'}_{G_{i}} {W'}_{G_{i}} x_{G_{i}})]\} \end{matrix}

(43)

After some algebra, equation (39) can be expressed as

\begin{matrix} ε_{G_{i}}^{(2)} = E [\sum_{k' = 1, k' \neq k}^{| G_{i} |} {‖ β_{G_{i}}^{- 1} H'_{G_{i}, k}^{H} {F'}_{G_{i}, k'} {W'}_{G_{i}, k'} x_{G_{i}, k'} ‖}^{2}] + \sum_{i' = 1, i' \neq i}^{L} E [{‖ β_{G_{i}}^{- 1} H'_{G_{i'}}^{H} {F'}_{G_{i'}} {W'}_{G_{i'}} x_{G_{i'}} ‖}_{F}^{2}] + β_{G_{i}}^{- 2} E [{‖ n_{G_{i}} ‖}^{2}] \\ = \sum_{k' = 1, k' \neq k}^{| G_{i} |} β_{G_{i}}^{- 2} tr [H'_{G_{i}, k}^{H} {F'}_{G_{i}, k'} {W'}_{G_{i}, k'} E (x_{G_{i}, k'} x_{G_{i}, k'}^{H}) W'_{G_{i}, k'}^{H} F'_{G_{i}, k'}^{H} {H'}_{G_{i}, k}] \\ + \sum_{i' = 1, i' \neq i}^{L} β_{G_{i}}^{- 2} tr [H'_{G_{i'}}^{H} {F'}_{G_{i'}} {W'}_{G_{i'}} E (x_{G_{i'}} x_{G_{i'}}^{H}) W'_{G_{i'}}^{H} F'_{G_{i'}}^{H} {H'}_{G_{i'}}] + β_{G_{i}}^{- 2} g_{i} σ^{2} \\ = \sum_{k' = 1, k' \neq k}^{| G_{i} |} β_{G_{i}}^{- 2} g_{i} tr (H'_{G_{i}, k}^{H} {F'}_{G_{i}, k'} {W'}_{G_{i}, k'} W'_{G_{i}, k'}^{H} F'_{G_{i}, k'}^{H} {H'}_{G_{i}, k}) \\ + \sum_{i' = 1, i' \neq i}^{L} β_{G_{i}}^{- 2} g_{i'} tr (H'_{G_{i'}}^{H} {F'}_{G_{i'}} {W'}_{G_{i'}} W'_{G_{i'}}^{H} F'_{G_{i'}}^{H} {H'}_{G_{i'}}) + β_{G_{i}}^{- 2} g_{i} σ^{2} \end{matrix}

(44)

From the above analysis, the problem essentially reduces to finding an RF precoding matrix $F'_{G_{i}}$ that minimizes $ε_{G_{i}}^{(1)}$ and $ε_{G_{i}}^{(2)}$. This problem is not convex, but it is equivalent to an unconstrained optimization problem that can be solved using manifold optimization methods.⁴⁰
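The article relies on a manifold quasi-conjugate gradient method for this step. As a simpler illustration of the same idea, the sketch below runs Riemannian steepest descent with Armijo backtracking on the complex circle manifold, the set of matrices with unit-modulus entries that a phase-shifter analog precoder must satisfy. Because the MSE of equations (43) and (44) depends on many quantities, the sketch swaps in a stand-in objective, $‖ F_{opt} - F W ‖_{F}^{2}$ for a fixed digital precoder $W$ and a hypothetical fully digital target $F_{opt}$, of the kind used in manifold-optimization approaches to hybrid precoding such as the alternating-minimization work cited as reference 26. A conjugate gradient variant would additionally carry a transported search direction between iterations; all names here are illustrative.

```python
import numpy as np


def cost(F, W, F_opt):
    """Stand-in objective ||F_opt - F W||_F^2 (not the article's MSE)."""
    return np.linalg.norm(F_opt - F @ W) ** 2


def riemannian_grad(F, W, F_opt):
    # Euclidean gradient for the real inner product Re tr(X^H Y)
    egrad = -2.0 * (F_opt - F @ W) @ W.conj().T
    # Project onto the tangent space of the complex circle manifold at F
    return egrad - np.real(egrad * F.conj()) * F


def retract(X):
    """Map back to unit-modulus entries (elementwise normalization)."""
    return X / np.abs(X)


def circle_manifold_descent(F0, W, F_opt, iters=200, tol=1e-8):
    F = retract(F0)
    f = cost(F, W, F_opt)
    for _ in range(iters):
        G = riemannian_grad(F, W, F_opt)
        gn2 = np.linalg.norm(G) ** 2
        if gn2 < tol:
            break
        step = 1.0
        # Armijo backtracking along -G, followed by retraction
        while True:
            F_new = retract(F - step * G)
            f_new = cost(F_new, W, F_opt)
            if f_new <= f - 1e-4 * step * gn2 or step < 1e-12:
                break
            step *= 0.5
        if f_new > f:          # no acceptable step: stop without increasing the cost
            break
        F, f = F_new, f_new
    return F, f
```

Every iterate keeps unit-modulus entries, so the constraint of the analog precoder never has to be enforced separately.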

To obtain this minimum, we learn for each cluster $L$ an embedding space $Ψ_{L}$ of the RF precoding matrix $F'_{G_{i}}$ that is both geometric and discriminant. We solve the following pair of objective functions with symmetric weights $B_{ϕ, f_{1} f_{2}}^{L}$ and $B_{φ, f_{1} f_{2}}^{L}$, where $f_{1}, f_{2} \in G_{i}$ are users in cluster $L$

min \sum_{f_{1} f_{2}} {‖ Ψ_{L}^{T} {F'}_{f_{1}} - Ψ_{L}^{T} {F'}_{f_{2}} ‖}^{2} B_{ϕ, f_{1} f_{2}}^{L}

(45)

max \sum_{f_{1} f_{2}} {‖ Ψ_{L}^{T} {F'}_{f_{1}} - Ψ_{L}^{T} {F'}_{f_{2}} ‖}^{2} B_{φ, f_{1} f_{2}}^{L}

(46)

For the $L$th cluster, $B_{φ, f_{1} f_{2}}^{L}$ incurs a penalty if users are close to each other but belong to different clusters, whereas $B_{ϕ, f_{1} f_{2}}^{L}$ encourages users $f_{1}$ and $f_{2}$ to be mapped closer together if they are in the same cluster. We define

B_{φ, f_{1} f_{2}}^{L} = \begin{cases} \exp (\frac{- ‖ {F'}_{f_{1}} - {F'}_{f_{2}} ‖}{t}), & \text{if } {F'}_{f_{1}} \in N_{φ} ({F'}_{f_{2}}) \text{ or } {F'}_{f_{2}} \in N_{φ} ({F'}_{f_{1}}) \\ 0, & \text{otherwise} \end{cases}

(47)

B_{ϕ, f_{1} f_{2}}^{L} = \begin{cases} \exp (\frac{- ‖ {F'}_{f_{1}} - {F'}_{f_{2}} ‖}{t}), & \text{if } {F'}_{f_{1}} \in N_{ϕ} ({F'}_{f_{2}}) \text{ or } {F'}_{f_{2}} \in N_{ϕ} ({F'}_{f_{1}}) \\ 0, & \text{otherwise} \end{cases}

(48)

where $N_{ϕ} (F'_{f_{1}})$ contains the $G_{i}^{'}$ nearest users that share the precoding with $F'_{f_{1}}$, $N_{φ} (F'_{f_{1}})$ contains the $G_{i}^{'}$ nearest neighbors that have different precoding, and $t$ is the parameter of the kernel function, which approximately follows a Gaussian form.⁴²
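The weights in equations (47) and (48) and the Laplacian identity used in equation (49) can be checked numerically. The sketch below treats each user's vectorized precoder $F'_{f}$ as a real column of a matrix `X` and uses integer cluster labels to decide which users share a precoding; real-valued data, the neighborhood size `k`, and the helper names are assumptions. It builds the $k$-nearest-neighbor heat-kernel weights with the "or" symmetrization of equations (47) and (48), and the graph Laplacian $Ω = D - B$.

```python
import numpy as np


def knn_heat_weights(X, labels, k, t, same):
    """B of eq. (48) if same=True, of eq. (47) if same=False; X is d x M (one column per user)."""
    M = X.shape[1]
    dist = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)  # M x M pairwise distances
    B = np.zeros((M, M))
    for a in range(M):
        # Candidate neighbors: same label (shared precoding) or different label
        mask = (labels == labels[a]) if same else (labels != labels[a])
        mask[a] = False
        cand = np.flatnonzero(mask)
        nn = cand[np.argsort(dist[a, cand])[:k]]
        B[a, nn] = np.exp(-dist[a, nn] / t)
    # "f1 in N(f2) or f2 in N(f1)": keep an edge if either endpoint selected it
    return np.maximum(B, B.T)


def laplacian(B):
    """Omega = D - B with D(f1, f1) = sum_f2 B(f1, f2)."""
    return np.diag(B.sum(axis=1)) - B
```

For any projection `Psi`, the weighted sum `0.5 * sum(B[a, b] * ||Psi.T @ X[:, a] - Psi.T @ X[:, b]||**2)` equals `np.trace(Psi.T @ X @ laplacian(B) @ X.T @ Psi)`, which is the identity behind equation (49).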

Following some simple algebraic steps, equation (45) can be reduced to

\begin{matrix} S_{ϕ} = \frac{1}{2} \sum_{f_{1} f_{2}} {‖ Ψ_{L}^{T} {F'}_{f_{1}} - Ψ_{L}^{T} {F'}_{f_{2}} ‖}^{2} B_{ϕ, f_{1} f_{2}}^{L} \\ = \sum_{f_{1}} Ψ_{L}^{T} {F'}_{f_{1}} D_{ϕ, f_{1} f_{1}}^{L} {({F'}_{f_{1}})}^{T} Ψ_{L} \\ - \sum_{f_{1} f_{2}} Ψ_{L}^{T} {F'}_{f_{1}} B_{ϕ, f_{1} f_{2}}^{L} {({F'}_{f_{2}})}^{T} Ψ_{L} \\ = Ψ_{L}^{T} \tilde{F}' (D_{ϕ}^{L} - B_{ϕ}^{L}) {(\tilde{F}')}^{T} Ψ_{L} \\ = Ψ_{L}^{T} \tilde{F}' Ω_{ϕ} {(\tilde{F}')}^{T} Ψ_{L} \end{matrix}

(49)

where $D_{ϕ}^{L}$ is a diagonal matrix with $D_{ϕ}^{L} (f_{1}, f_{1}) = \sum_{f_{2}} B_{ϕ, f_{1} f_{2}}^{L}$, $Ω_{ϕ} = D_{ϕ}^{L} - B_{ϕ}^{L}$ is the Laplacian matrix of the graph with weights $B_{ϕ, f_{1} f_{2}}^{L}$, and $\tilde{F}' = [{F'}_{1}, {F'}_{2}, \dots, {F'}_{G_{i}}]$.

Similarly, equation (46) can be simplified as

\begin{matrix} S_{φ} = \frac{1}{2} \sum_{f_{1} f_{2}} {‖ Ψ_{L}^{T} {F'}_{f_{1}} - Ψ_{L}^{T} {F'}_{f_{2}} ‖}^{2} B_{φ, f_{1} f_{2}}^{L} \\ = Ψ_{L}^{T} \tilde{F}' (D_{φ}^{L} - B_{φ}^{L}) {(\tilde{F}')}^{T} Ψ_{L} \\ = Ψ_{L}^{T} \tilde{F}' Ω_{φ} {(\tilde{F}')}^{T} Ψ_{L} \end{matrix}

(50)

Thus, the discriminant embedding space $Ψ_{L}$ can be derived by maximizing the following objective function

J (Ψ_{L}) = \frac{tr (S_{ϕ})}{tr (S_{φ})}

(51)

In its usual ratio-trace relaxation, this is solved by finding the $ϑ$ largest eigenvalues of the following generalized eigenvalue problem

\tilde{F}' Ω_{ϕ} {(\tilde{F}')}^{T} v = α \tilde{F}' Ω_{φ} {(\tilde{F}')}^{T} v

(52)

Let $α_{1} \geq α_{2} \geq \dots \geq α_{ϑ}$ denote the $ϑ$ largest eigenvalues of equation (52) and $v_{1}, v_{2}, \dots, v_{ϑ}$ the corresponding eigenvectors; these eigenvectors form the discriminant embedding space $Ψ_{L} = [v_{1}, v_{2}, \dots, v_{ϑ}]$.

Once the discriminant embedding space $Ψ_{L}$ is obtained, the most discriminative precoding features are preserved in the most suitable projection space. In the low-dimensional discriminant embedding space, neighboring users of the same cluster move closer to each other, while users of other clusters are kept out of the neighborhood.

The local discriminant matrix is defined as follows

G_{local} = \frac{\frac{α'}{(1 - α') Ω_{ϕ}}}{Ω_{φ}}

(53)

where $α'$ is a scalar parameter. The local discriminant matrix is used as the input of the kernel function.

In general, the distance on a Grassmann manifold (a particular class of manifolds) between the linear subspaces $U_{f_{1}}$ and $U_{f_{2}}$ of two users is the length of the shortest geodesic connecting them, that is

d_{G} (U_{f_{1}}, U_{f_{2}}) = ‖ Θ ‖_{2}

(54)

where $Θ = [τ_{1}, τ_{2}, \dots, τ_{κ}]$ is the vector of principal angles, with $0 \leq τ_{1} \leq τ_{2} \leq \dots \leq τ_{κ} \leq π / 2$.
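Principal angles, and hence the geodesic distance of equation (54), can be computed from the singular values of $Q_{1}^{H} Q_{2}$, where $Q_{1}$ and $Q_{2}$ are orthonormal bases of the two subspaces; those singular values are the cosines of the principal angles. A minimal NumPy sketch (function names hypothetical):

```python
import numpy as np


def principal_angles(U1, U2):
    """Principal angles between span(U1) and span(U2), in ascending order."""
    Q1, _ = np.linalg.qr(U1)   # orthonormalize, in case the inputs are not
    Q2, _ = np.linalg.qr(U2)
    s = np.linalg.svd(Q1.conj().T @ Q2, compute_uv=False)  # cosines of the angles
    return np.sort(np.arccos(np.clip(s, -1.0, 1.0)))


def grassmann_geodesic(U1, U2):
    """d_G(U1, U2) = ||Theta||_2 as in equation (54)."""
    return np.linalg.norm(principal_angles(U1, U2))
```

For example, two orthogonal lines give a distance of $π / 2$, and two bases spanning the same subspace give zero.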

Having defined the distance on the Grassmann manifold, we also need to account for the distance from user $f_{1}$ to user $f_{2}$ in the discriminant embedding space $Ψ_{L}$. To capture the intrinsic linear subspace, this user-to-user term is defined as

\frac{{({({F'}_{f_{1}})}^{T} Ψ_{L} {(Ψ_{L})}^{T} {F'}_{f_{2}})}^{2}}{({({F'}_{f_{1}})}^{T} Ψ_{L} {(Ψ_{L})}^{T} {F'}_{f_{1}}) ({({F'}_{f_{2}})}^{T} Ψ_{L} {(Ψ_{L})}^{T} {F'}_{f_{2}})}

(55)

When the two distances are formalized as above, we arrive at the following form of user-to-user distance metric

\begin{matrix} d (f_{1}, f_{2}, Ψ_{L}) = (1 - σ) \\ \frac{{({({F'}_{f_{1}})}^{T} Ψ_{L} {(Ψ_{L})}^{T} {F'}_{f_{2}})}^{2}}{({({F'}_{f_{1}})}^{T} Ψ_{L} {(Ψ_{L})}^{T} {F'}_{f_{1}}) ({({F'}_{f_{2}})}^{T} Ψ_{L} {(Ψ_{L})}^{T} {F'}_{f_{2}})} \\ + σ ‖ U_{f_{1}} - U_{f_{2}} U_{f_{2}}^{T} U_{f_{1}} ‖_{F}^{2} \end{matrix}

(56)

In equation (56), the first term describes how far apart the origins of the two coordinate systems $U_{f_{1}}$ and $U_{f_{2}}$ are in the discriminant embedding space $Ψ_{L}$, whereas the second term reflects the correlation between the two orthonormal basis matrices.
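A direct Python transcription of the user-to-user distance in equation (56) is sketched below. It assumes real-valued vectorized precoders `F1` and `F2`, orthonormal bases `U1` and `U2`, and a projection `Psi`; it also reads $U'_{f_{2}}$ in the original statement of equation (56) as the transpose $U_{f_{2}}^{T}$, so that the second term is the energy of $U_{f_{1}}$ outside the span of $U_{f_{2}}$. Both that reading and the function name are assumptions.

```python
import numpy as np


def user_distance(F1, F2, U1, U2, Psi, sigma):
    """(1 - sigma) * normalized correlation in span(Psi) + sigma * ||U1 - U2 U2^T U1||_F^2."""
    P = Psi @ Psi.T                                   # projector onto the embedding space
    corr = (F1 @ P @ F2) ** 2 / ((F1 @ P @ F1) * (F2 @ P @ F2))
    resid = U1 - U2 @ (U2.T @ U1)                     # part of U1 outside span(U2)
    return (1.0 - sigma) * corr + sigma * np.linalg.norm(resid) ** 2
```

When both users coincide, the first term equals 1 and the second vanishes, so the value reduces to $1 - σ$; the weight $σ$ trades the embedding term against the subspace term.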

The local discriminant matrix $G_{local}$ is chosen as the input of the kernel function

G_{f_{1} f_{2}} = {\begin{matrix} \exp (\frac{- \sum_{f_{1} f_{2}} {‖ (α_{f_{1}} + \sum_{f_{1}} β_{f_{1}} U_{f_{1}}) - (α_{f_{2}} + \sum_{f_{2}} β_{f_{2}} U_{f_{2}}) ‖}^{2} G_{local}}{t}) \\ 0, otherwise \end{matrix}

(57)

where $t$ is the parameter of the kernel function, $α_{f_{1}}$ and $α_{f_{2}}$ are the mean vectors of the manifolds on which the users lie, and $β_{f_{1}}$ and $β_{f_{2}}$ are the vectors of free parameters of $F'_{f_{1}}$ and $F'_{f_{2}}$.

By minimizing the objective function below, we find a mapping under which manifolds belonging to the same subspace move closer together and manifolds in different subspaces move farther apart

\frac{1}{2} \sum_{f_{1} f_{2}} ‖ Γ^{T} (α_{f_{1}} + \sum_{f_{1}} β_{f_{1}} U_{f_{1}}) - Γ^{T} (α_{f_{2}} + \sum_{f_{2}} β_{f_{2}} U_{f_{2}}) ‖ G_{f_{1} f_{2}}

(58)

The above formula can be simplified to

Γ^{T} D_{kernel} (D - G_{f_{1} f_{2}}) D_{kernel}^{T} Γ

(59)

where

D_{kernel} = \sum_{f_{1} f_{2}} ‖ (α_{f_{1}} + \sum_{f_{1}} β_{f_{1}} U_{f_{1}}) - (α_{f_{2}} + \sum_{f_{2}} β_{f_{2}} U_{f_{2}}) ‖

(60)

$D$ is a diagonal matrix, that is, $D (f_{1}, f_{1}) = \sum_{f_{2}} G_{f_{1} f_{2}}$, which establishes a natural metric between different subspaces. To fix the scale of $z = Γ^{T} D_{kernel}$, we add the following constraint

zD z^{T} = Γ^{T} D_{kernel} {DD}_{kernel}^{T} Γ = 1

(61)

Therefore, the objective function with constraint is obtained as follows

\begin{matrix} Γ = \arg min Γ^{T} D_{kernel} (D - G_{f_{1} f_{2}}) D_{kernel}^{T} Γ \\ s . t . Γ^{T} D_{kernel} {DD}_{kernel}^{T} Γ = 1 \end{matrix}

(62)

Let $J (Γ)$ represent the objective function. The hybrid precoding optimization problem based on interference leakage under orthogonal constraints is

\begin{matrix} \arg {min}_{Γ} J (Γ) = \sum_{i = 1}^{L} ε_{G_{i}}^{(1)} + \sum_{i = 1}^{L} ε_{G_{i}}^{(2)} \\ s . t . Γ^{T} D_{kernel} {DD}_{kernel}^{T} Γ = 1 \end{matrix}

(63)

Finally, the optimal $Γ$ is obtained as the eigenvector associated with the smallest eigenvalue of the corresponding generalized eigenvalue problem.

Therefore, solving the objective function can be transformed into a convex optimization problem, and the optimal RF precoding matrix $F'_{G_{i}}$ is the one that minimizes $ε_{G_{i}}^{(1)}$ and $ε_{G_{i}}^{(2)}$.
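The constrained problem in equation (62) has the form $\min_{Γ} Γ^{T} P Γ$ subject to $Γ^{T} Q Γ = 1$, with $P = D_{kernel} (D - G_{f_{1} f_{2}}) D_{kernel}^{T}$ and $Q = D_{kernel} D D_{kernel}^{T}$. Assuming $P$ is symmetric and $Q$ is symmetric positive definite, its solution is the generalized eigenvector of $P Γ = λ Q Γ$ with the smallest eigenvalue, scaled so that the constraint holds. The sketch below (NumPy, hypothetical names, random matrices standing in for $P$ and $Q$) reduces this to a standard symmetric eigenproblem through a Cholesky factor of $Q$.

```python
import numpy as np


def min_generalized_eigvec(P, Q, ridge=1e-10):
    """Solve min g^T P g s.t. g^T Q g = 1 via the smallest eigenpair of P g = lam Q g."""
    # Q = L L^T (a tiny ridge guards against near-singular Q)
    L = np.linalg.cholesky(Q + ridge * np.eye(Q.shape[0]))
    Li = np.linalg.inv(L)
    C = Li @ P @ Li.T                        # symmetric standard-form problem
    lam, Y = np.linalg.eigh(0.5 * (C + C.T)) # eigenvalues sorted ascending
    g = Li.T @ Y[:, 0]                       # unit-norm y gives g^T Q g = 1
    return g, lam[0]                         # lam[0] is the minimal objective value
```

Because `eigh` returns unit-norm eigenvectors, the back-transformed vector satisfies the normalization constraint automatically, so no separate rescaling step is needed.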

The manifold algorithm to find the optimal radio frequency precoding matrix $F'_{G_{i}}$ is as follows:

Step 1: initialize the analog precoding matrix $F'_{G_{i}, 1}$, the error threshold $ε \in (0, 1)$, and the discriminant embedding space $Ψ_{L}$.

Step 2: learn the discriminant embedding space $Ψ_{L}$ according to equations (45)–(51).

Step 3: find the $ϑ$ largest eigenvalues according to equation (52).

Step 4: compute the local discriminant matrix $G_{local}$ according to equation (53).

Step 5: compute the subspace-to-subspace distance according to equation (56).

Step 6: update $Γ$ subject to $Γ^{T} D_{kernel} {DD}_{kernel}^{T} Γ = 1$, and find the optimal RF precoding matrix $F'_{G_{i}}$ according to equation (63). Repeat the update of the analog precoding matrix until the error threshold condition is satisfied; the algorithm then ends.

Within a cluster, the users' channels are correlated. The interference from users in non-adjacent clusters is much smaller than that from users in adjacent clusters, so the impact of remote user clusters on the users in a given cluster can be ignored. The signal-to-interference-plus-noise ratio (SINR) of user cluster $G_{i}$ in the $b$th cell is

\begin{matrix} SINR_{G_{i}} = \\ \frac{{| H'_{G_{i}}^{H} F'_{G_{i}} {W'}_{G_{i}} |}^{2} P_{G_{i}}}{\sum_{k' = 1, k' \neq k}^{| G_{i} |} {| {IN}_{G_{i}, k'} |}^{2} P_{G_{i}, k'} + \sum_{i' = 1, i' \neq i}^{L} {| {IN}_{G_{i'}} |}^{2} P_{G_{i'}} + σ_{G_{i}}^{2}} \end{matrix}

(64)

where ${IN}_{G_{i}, k'} = H'_{G_{i}, k}^{H} F'_{G_{i}, k'} W'_{G_{i}, k'}$, ${IN}_{G_{i'}} = H'_{G_{i'}}^{H} F'_{G_{i'}} W'_{G_{i'}}$, $P_{G_{i}}$ is the transmit power of cluster $G_{i}$, and $P_{G_{i}, k'}$ and $P_{G_{i'}}$ are the transmit power of the $k'$th user in the $i$th cluster and of cluster $G_{i'}$, respectively.

The achievable sum rate of the mmWave massive MIMO system can be expressed as

SUM = \sum_{G_{i} = G_{1}}^{G_{L}} \log_{2} (1 + SINR_{G_{i}})

(65)

Equation (65) can be written as

SUM = \sum_{G_{i} = G_{1}}^{G_{L}} \log_{2} (1 + \frac{{| H'_{G_{i}}^{H} F'_{G_{i}} {W'}_{G_{i}} |}^{2} P_{G_{i}}}{\sum_{k' = 1, k' \neq k}^{| G_{i} |} {| {IN}_{G_{i}, k'} |}^{2} P_{G_{i}, k'} + \sum_{i' = 1, i' \neq i}^{L} {| {IN}_{G_{i'}} |}^{2} P_{G_{i'}} + σ_{G_{i}}^{2}})

(66)
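A small sketch of how equations (64) to (66) can be evaluated numerically. For simplicity it assigns one data stream per column of the effective precoder $V = F' W'$ and folds the intra-cluster and inter-cluster interference of equation (64) into a single sum over all other streams; the flattened indexing, the per-stream powers `p`, and the function name are assumptions rather than the article's evaluation code.

```python
import numpy as np


def sum_rate(H, V, p, noise_var):
    """Sum of log2(1 + SINR_k) with SINR_k = |h_k^H v_k|^2 p_k / (sum_{j != k} |h_k^H v_j|^2 p_j + noise).

    H: N x K channels (column k = h_k); V: N x K effective precoders (column k = v_k).
    """
    G = np.abs(H.conj().T @ V) ** 2     # G[k, j] = |h_k^H v_j|^2
    signal = np.diag(G) * p
    interference = G @ p - signal       # all streams except the user's own
    sinr = signal / (interference + noise_var)
    return np.sum(np.log2(1.0 + sinr)), sinr
```

For example, with `H = np.eye(2)`, `V = np.eye(2)`, unit powers, and unit noise variance, each SINR is 1 and the sum rate is 2 bit/s/Hz.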

Simulation results

In this section, we study the SE and bit error rate (BER) performance of the proposed hybrid precoder. We compare the proposed method with several traditional methods, namely OMP, KDHB, AFHB, MO, and RTRNM, and we also consider its performance for both non-mobile and mobile users. The basic simulation parameters are as follows.

The carrier frequency is 60 GHz. The AoAs and AoDs are uniformly distributed in $[0, 2 π]$, with a common angle spread (AS) of $Δ = 8$. The complex gain of each path follows $CN (0, 1)$. A uniform linear array (ULA) is adopted in the simulations.²⁶ The channel power azimuth spectra overlap considerably, which causes inter-cluster interference.
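One way to generate channels with these parameters is sketched below. It assumes a half-wavelength ULA with response $a (θ) = e^{j π n \sin θ} / \sqrt{N}$, a single scattering cluster per user whose mean AoD is uniform in $[0, 2 π)$, rays spread uniformly within $\pm Δ$ around that mean with $Δ$ read as degrees, and ray gains drawn from $CN (0, 1)$. The number of rays, the degree interpretation of $Δ$, the normalization, and the function names are assumptions, since the article does not state them.

```python
import numpy as np


def ula_steering(N, theta):
    """Half-wavelength ULA response a(theta) = exp(j*pi*n*sin(theta)) / sqrt(N)."""
    n = np.arange(N)
    return np.exp(1j * np.pi * n * np.sin(theta)) / np.sqrt(N)


def user_channel(N, n_rays, spread_deg, rng):
    """Hypothetical single-cluster channel for one user (N-element vector)."""
    mean = rng.uniform(0.0, 2.0 * np.pi)                       # mean AoD ~ U[0, 2*pi)
    offsets = np.deg2rad(rng.uniform(-spread_deg, spread_deg, n_rays))
    gains = (rng.standard_normal(n_rays)
             + 1j * rng.standard_normal(n_rays)) / np.sqrt(2.0)  # CN(0, 1) path gains
    h = sum(g * ula_steering(N, mean + d) for g, d in zip(gains, offsets))
    return np.sqrt(N / n_rays) * h
```

For the setting of Figure 4, `np.column_stack([user_channel(128, 10, 8.0, rng) for _ in range(32)])` gives a $128 \times 32$ channel matrix, where the value of 10 rays per user is a hypothetical choice.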

Figure 4 compares the sum-rate performance of the proposed hybrid precoding scheme with that of traditional schemes in the mmWave massive MIMO system, with $N_{t} = 128$, $N_{RF} = 32$, and $K = 32$. Figure 4 shows that the proposed scheme achieves a higher sum rate than the other existing precoding schemes across the SNRs considered. This is because the traditional methods do not address the non-linear resolution of multi-user high-dimensional channels. In the proposed scheme, each user set is represented by a manifold, and cluster-oriented multi-manifold learning is used to address this problem. For high-density hotspot scenarios in the cell, the geometric model of clustered users is studied, which better handles intra-cluster and inter-cluster interference. Through user clustering and hybrid precoding, the achievable sum rate of the mmWave massive MIMO system is improved.

Figure 4.

Sum-rate comparison of different schemes.

Figure 5 compares the BER performance of the different hybrid precoding schemes, using the same single-cell channel parameters as above. Across the SNRs in Figure 5, a conclusion similar to that of Figure 4 can be drawn: the proposed scheme based on manifold discriminative learning achieves better BER performance than the other schemes. The proposed scheme improves beamspace resolution and reduces the influence of power leakage on the beamspace channel.

Figure 5.

BER performance comparison of different schemes.

Figure 6 compares the average sum rate of the proposed scheme with those of the traditional scheme and other existing precoding schemes for different numbers of users. We set $n_{RF, i} \geq g_{i}$ in each cluster. Figure 6 shows that the proposed method is superior to the other methods. As the number of users increases, the traditional solutions do not account for the non-linear resolution of multi-user high-dimensional channels, whereas the proposed scheme handles intra-cluster, inter-cluster, and inter-cell interference well. The proposed solution can therefore significantly improve the average sum rate of mmWave massive MIMO systems.

Figure 6.

Two-tier system average sum-rate comparison.

Figure 7 shows how the system average SE changes with the cell-edge SNR, and Figure 8 shows the average SE as the number of BS antennas changes. The figures show that the proposed method achieves an average SE significantly higher than that of the traditional schemes. Figure 7 indicates that, with the proposed manifold learning scheme, each user and its adjacent high-dimensional channels lie in a global and local non-linear neighborhood. For high-density hotspot scenarios in the cell, the geometric model of clustered users is studied. The proposed scheme manages multi-user and inter-cell interference and improves the data rate for cell-edge users. Figure 8 shows that the proposed method can effectively exploit the antennas across multiple low-dimensional manifolds.

Figure 7.

Average SE versus cell-edge SNR.

Figure 8.

Average SE versus the number of BS antennas.

Conclusion

A user-clustering hybrid precoding scheme is proposed that addresses large-scale mmWave MIMO in multiple low-dimensional manifolds, avoiding the high-dimensional complex operations of traditional schemes. At the BS, a low-dimensional channel matrix of the mmWave massive MIMO system is learned through manifold learning. User clustering hybrid precoding is then designed for the transmitted signal based on this low-dimensional channel matrix. Manifold discriminative learning seeks an embedded low-dimensional subspace in which manifolds with different user group labels are easier to distinguish and the local spatial correlation of the high-dimensional channels in each manifold is enhanced. With proper user clustering, the hybrid precoding is investigated for the sum-rate maximization problem by manifold quasi-conjugate gradient methods. The simulation results show that the method is robust while reducing the computational complexity of mmWave massive MIMO systems.

More realistic MIMO precoding is expected in future research. In the manifold learning after user clustering, we use only a traditional method, and the choice of user cluster centers is crucial; if users move at high speed, our method becomes inaccurate, so more advanced research is needed. In addition, our current method applies to fully connected hybrid precoding; precoding for sub-connected and dynamically connected architectures still needs to be studied.

Footnotes

Handling Editor: Yanjiao Chen

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This work was supported by the Shanghai Capacity Building Projects in Local Institutions under Grant 19070502900.

ORCID iDs

Xiaoping Zhou

Yang Wang

References

1. Wang, Jia, Guo, et al. Joint power, original bandwidth, and detected hole bandwidth allocation for multi-homing heterogeneous networks based on cognitive radio. IEEE T Veh Technol 2019; 68(3): 2777–2790.

2. Wang, Song, et al. Multi-gigabit millimeter wave wireless communications for 5G: from fixed access to cellular networks. IEEE Commun Mag 2015; 53(1): 168–178.

3. Rangan, Rappaport, Erkip. Millimeter-wave cellular wireless networks: potentials and challenges. P IEEE 2014; 102(3): 366–385.

4. Vlachos, Alexandropoulos, Thompson. Massive MIMO channel estimation for millimeter wave systems via matrix completion. IEEE Signal Proc Let 2018; 25(11): 1675–1679.

5. Sun. Weighted sum-rate maximization for analog beamforming and combining in millimeter wave massive MIMO communications. IEEE Commun Lett 2017; 21(8): 1883–1886.

6. Sohrabi. Hybrid analog and digital beamforming for mmWave OFDM large-scale antenna arrays. IEEE J Sel Area Comm 2017; 35(7): 1432–1443.

7. Liu, Lau VKN, Zhao. Stochastic successive convex optimization for two-timescale hybrid precoding in massive MIMO. IEEE J Sel Top Signa 2018; 12(3): 432–444.

8. Energy-efficient transceiver design for hybrid sub-array architecture MIMO systems. IEEE Access 2016; 4: 9895–9905.

9. Alexandropoulos, Chouvardas. Low complexity channel estimation for millimeter wave systems with hybrid A/D antenna processing. In: 2016 IEEE Globecom Workshops (GC Wkshps), Washington, DC, 4–8 December 2016, pp.1–6. New York: IEEE.

10. Han S, I C, et al. Large-scale antenna systems with hybrid analog and digital beamforming for millimeter wave 5G. IEEE Commun Mag 2015; 53(1): 186–194.

11. Zhang, Letaief KB. A hardware-efficient analog network structure for hybrid precoding in millimeter wave systems. IEEE J Sel Top Signa 2018; 12(2): 282–297.

12. Ayach, Rajagopal, Abu-Surra, et al. Spatially sparse precoding in millimeter wave MIMO systems. IEEE T Wirel Commun 2014; 13(3): 1499–1513.

13. Huang, Liu, Yuen, et al. A LSE and sparse message passing-based channel estimation for mmWave MIMO systems. In: 2016 IEEE Globecom Workshops (GC Wkshps), Washington, DC, 4–8 December 2016, pp.1–6. New York: IEEE.

14. Gao, Dai, Han, et al. Compressive sensing techniques for next-generation wireless communications. IEEE Wirel Commun 2018; 25(3): 144–153.

15. Chen, Tsai, Liu, et al. Compressive sensing (CS) assisted low-complexity beamspace hybrid precoding for millimeter-wave MIMO systems. IEEE T Signal Proces 2017; 65(6): 1412–1424.

16. Huang, Zhang, Xiao. Constant envelope hybrid precoding for directional millimeter-wave communications. IEEE J Sel Area Comm 2018; 36(4): 845–859.

17. Zhu, Huang, Lau VKN, et al. Hybrid beamforming via the Kronecker decomposition for the millimeter-wave massive MIMO systems. IEEE J Sel Area Comm 2017; 35(9): 2097–2114.

18. Wang, Huang, et al. Codebook-based hybrid precoding for millimeter wave multiuser systems. IEEE T Signal Proces 2017; 65(20): 5289–5304.

19. Kim, Lee YH. MSE-based hybrid RF/baseband processing for millimeter-wave communication systems in MIMO interference channels. IEEE T Veh Technol 2015; 64(6): 2714–2720.

20. Mir, Zain Siddiqi, Mir, et al. Machine learning inspired hybrid precoding for wideband millimeter-wave massive MIMO systems. IEEE Access 2019; 7: 62852–62864.

21. Zhang, Huang, Wang, et al. Hybrid precoding for wideband millimeter-wave systems with finite resolution phase shifters. IEEE T Veh Technol 2018; 67(11): 11285–11290.

22. Park, Alkhateeb, Heath RW. Dynamic subarrays for hybrid precoding in wideband mmWave MIMO systems. IEEE T Wirel Commun 2017; 16(5): 2907–2920.

23. Liu. Hybrid beamforming with dynamic subarrays and low-resolution PSs for mmWave MU-MISO systems. IEEE T Commun 2020; 68(1): 602–614.

24. Jiang, Yuan, Zhen. Multi-user hybrid precoding for dynamic subarrays in mmWave massive MIMO systems. IEEE Access 2019; 7: 101718–101728.

25. Sun, Rappaport, Shafi, et al. Analytical framework of hybrid beamforming in multi-cell millimeter-wave systems. IEEE T Wirel Commun 2018; 17(11): 7528–7543.

26. Shen, Zhang, et al. Alternating minimization algorithms for hybrid precoding in millimeter wave MIMO systems. IEEE J Sel Top Signa 2016; 10(3): 485–500.

27. Chen. Low-PAPR precoding design for massive multiuser MIMO systems via Riemannian manifold optimization. IEEE Commun Lett 2017; 21(4): 945–948.

28. Mai, Le-Ngoc, Nguyen DHN. Two-timescale hybrid RF-baseband precoding with MMSE-VP for multi-user massive MIMO broadcast channels. IEEE T Wirel Commun 2018; 17(7): 4462–4476.

29. Lin, Cong, Zhu, et al. Hybrid beamforming for millimeter wave systems using the MMSE criterion. IEEE T Commun 2019; 67(5): 3693–3708.

30. Zhou, Wang, Yang, et al. A manifold learning two-tier beamforming scheme optimizes resource management in massive MIMO networks. IEEE Access 2020; 8: 22976–22987.

31. Moe Thet, Baykas, Ozdemir. Performance analysis of user scheduling in massive MIMO with fast moving users. In: 2019 IEEE 30th annual international symposium on personal, indoor and mobile radio communications (PIMRC), Istanbul, 8–11 September 2019, pp.1–6. New York: IEEE.

32. Salous, Esposti, Fuschini, et al. Millimeter-wave propagation: characterization and modeling toward fifth-generation systems [Wireless Corner]. IEEE Antenn Propag M 2016; 58(6): 115–127.

33. Feng, Wang, Zhang, et al. Fault diagnosis method of joint fisher discriminant analysis based on the local and global manifold learning and its kernel version. IEEE T Autom Sci Eng 2016; 13(1): 122–133.

34. Sun, Gao, Wang, et al. Principal component analysis-based broadband hybrid precoding for millimeter-wave massive MIMO systems. IEEE T Wirel Commun 2020; 19(10): 6331–6346.

35. Cao. LMDAPNet: a novel manifold-based deep learning network. IEEE Access 2020; 8: 65938–65946.

36. Selvan, Amato, Gallivan, et al. Descent algorithms on oblique manifold for source-adaptive ICA contrast. IEEE T Neur Net Lear 2012; 23(12): 1930–1947.

37. Hong, Wang. A nonconvex splitting method for symmetric nonnegative matrix factorization: convergence analysis and optimality. IEEE T Signal Proces 2017; 65(12): 3120–3135.

38. Pascual-García, Molina-García-Pardo, Martínez-Inglés, et al. On the importance of diffuse scattering model parameterization in indoor wireless channels at mm-Wave frequencies. IEEE Access 2016; 4: 688–701.

39. Meng, Cao, et al. Supervised feature learning network based on the improved LLE for face recognition. In: 2016 international conference on audio, language and image processing (ICALIP), Shanghai, China, 11–12 July 2016, pp.306–311. New York: IEEE.

40. Van Tulder, de Bruijne. Combining generative and discriminative representation learning for lung CT analysis with convolutional restricted Boltzmann machines. IEEE T Med Imaging 2016; 35(5): 1262–1272.

41. Ruan, Xiao, et al. Joint iterative optimization-based low-complexity adaptive hybrid beamforming for massive MU-MIMO systems. IEEE T Commun 2021; 69(3): 1707–1722.

42. Belkin, Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In: Proceedings of the advances in neural information processing systems, 2002, pp.585–591. https://proceedings.neurips.cc/paper/2001/hash/f106b7f99d2cb30c3db1c3cc0fde9ccb-Abstract.html