Abstract
With the number of Internet of Things devices continually increasing, the endogenous security of Internet of Things communication systems is increasingly critical. Physical layer authentication is a powerful means of resisting active attacks by exploiting the unique characteristics inherent in wireless signals and physical devices. Many existing physical layer authentication schemes assume that physical layer attributes obey certain statistical distributions, which are usually unknown to receivers. To overcome this uncertainty, machine learning–based authentication approaches have been employed to implement threshold-free authentication. In this article, we utilize an expectation–conditional maximization algorithm to provide the physical layer attribute estimates required for the authentication phase and a logistic regression model to achieve threshold-free physical layer authentication. Moreover, a Frank–Wolfe algorithm is used to achieve fast convergence of the logistic regression parameters, and multiple attributes are adopted to better differentiate transmitters. Simulation results demonstrate that the obtained attribute estimates provide a reliable source of data for authentication and that the proposed threshold-free multi-attributes physical layer authentication scheme effectively improves authentication accuracy, with the false alarm rate $P_f$ reduced to 0.0263% and the miss detection rate $P_m$ reduced to 0.3466%.
Introduction
With the innovation of wireless access technologies, the complex heterogeneity of network access architectures, and the proliferation of Internet of Things (IoT) devices, the security risks of IoT communications are increasing. In an open and time-varying wireless channel environment, communications are exposed and unstable, which makes information and data transmission more susceptible to eavesdropping, tampering, and forgery.1 Network security has now entered the era of endogenous security, which requires the continuous growth of self-adaptive and autonomous security capabilities within IoT communication systems. Therefore, effective responses to wireless communication security issues are urgent and critical.
Authentication of the signaling entity is one of the core technologies for securing wireless communications. Traditional key-based high-level authentication techniques have been well researched and widely used over the last few decades. However, the explosive growth in the number of access devices complicates the distribution and management of keys for high-level authentication. The high transmission overhead of high-level authentication results in high latency, making it difficult to adapt to latency-sensitive industrial IoT communication systems.2,3 Accordingly, physical layer authentication, which uses the physical layer attributes of the signal source to verify whether the transmitting entity is legitimate, is receiving increasing attention.
In this article, we propose a physical layer authentication scheme that uses multi-antenna technology and a combination of multiple attributes as authentication fingerprints to improve reliability. In the channel estimation phase, we use semi-blind estimation based on the expectation–conditional maximization (ECM) algorithm, assisted by a few pilots, so that pilots do not take up too much bandwidth while the complexity of the estimation algorithm remains manageable. Physical layer attributes are time-varying and random due to the environment and equipment, which makes them difficult for attackers to imitate. An attacker can launch a spoofing attack by imitating one attribute, but imitating multiple attributes simultaneously is challenging, so combining multiple attributes as authentication fingerprints reduces the communication security risk if one attribute fails. Thus, we combine the received signal strength indicator (RSSI), channel impulse response (CIR), and carrier frequency offset (CFO) as authentication fingerprints to enhance authentication performance. To avoid dependence of the physical layer authentication system on a statistical channel model, we use a logistic regression model from machine learning (ML) to design the threshold-free authentication process. Logistic regression is a parametric learning method that requires less training data than non-parametric learning methods and can be regularized to avoid overfitting. In the convex optimization problem constructed with the loss function of logistic regression, we use the Frank–Wolfe (FW) algorithm to find the optimal solution of the parameters with a fast convergence rate. Moreover, we improve the reliability of authentication using multi-antenna technology.
The major contributions of this work can be summarized as follows:
We obtain multiple physical layer attributes, including RSSI, CIR, and CFO, from the received signals, where an ECM algorithm aided by a few pilots is designed to get the values of CIR and CFO.
Multiple attributes are combined as authentication fingerprints and multi-antenna technology is utilized to improve authentication accuracy. The contribution of each attribute to the authentication decision is weighted according to its stability.
A logistic regression model is used to achieve threshold-free physical layer authentication, and the FW algorithm is adopted to find the optimal solution of the parameters with a fast convergence rate.
Related works
Some physical layer authentication schemes are based on watermarking. The transmitter uses a hash function to fuse a shared key with the signal to generate a tag, which arrives at the receiver together with the transmitted signal and is authenticated by the receiver. For two attack scenarios, with and without user–attacker complicity, several tag overlay schemes for physical layer authentication in the Non-Orthogonal Multiple Access (NOMA) system are presented in Xie et al.4 Optimal certified tag embedding and optimized power distribution between signal and tag are designed in Gu et al.5 Xie and Chen6 have developed a slope authentication that divides the transmit signal into two equal groups based on the secret key, with labels used to mark the time index of each group. These schemes are based on shared private keys and use hash function encryption and signal processing techniques for physical layer authentication, so they require complex signal preprocessing and perfect secrecy of the shared private key.
In contrast to these schemes, physical layer authentication schemes based on channel characteristics and device attributes have also been studied. Such techniques exploit characteristics of the transceiver and the channel that the signal carries during transmission, for example, RSSI, CIR, and CFO, so no additional authentication information needs to be transmitted. These characteristics are device- and environment-dependent, unpredictable, and difficult to imitate. Moreover, instantaneous measurements of the device and channel by the receiver can track the temporal variation of the authentication information, which is equivalent to providing a natural refreshing mechanism.7
In the schemes proposed in Hou et al.,8,9 the time-varying CFO is used as a radio frequency (RF) feature for physical layer authentication. A hypothesis test is established to verify that the estimated value of the CFO is consistent with the predicted value for the authentication decision. The CIR is quantized by Liu and colleagues10,11 in the magnitude dimension and the multipath delay dimension to simplify the decision rules for authentication, and hypothesis testing on the output of the quantizer is then used to achieve physical layer authentication. The RSSI of the signals measured by multiple landmarks is used as the verification input to the authentication system of Xiao et al.12 to detect spoofing attacks in a wireless network. The quantized channel gain and phase noise are combined to implement threshold-based physical layer authentication in Zhang et al.,13 which demonstrates that combining two attributes provides higher accuracy than using a single attribute as the basis for authentication. However, the optimal values of the quantization thresholds and authentication decision thresholds in these methods are found by exhaustive search rather than learned from data.
Pan et al.14 conduct extensive simulation and field experiments to verify the performance of different ML algorithms for threshold-free authentication, where the channel difference matrix or the channel state matrix is used as the input of each ML algorithm. However, the authentication accuracy in these experiments is still below 90%, which means that many attackers cannot be identified. Authentication methods based on kernel machines that fuse multiple attributes to achieve reliable authentication in time-varying environments are presented in Fang et al.1 The same authors also propose hierarchical authentication and progressive authorization, at the cost of high complexity, in Fang et al.15 This strategy is used to resist the risks associated with missed detections in one-time binary authentication. Fang et al.16 suggest selecting those physical layer attributes that have historically performed well for authentication as the basis for current authentication. However, it is also necessary to select an appropriate ML algorithm and carefully design the authentication process to improve authentication accuracy. An ML-based scheme that achieves threshold-free physical layer authentication with multiple landmarks is proposed in Xiao et al.,12 which reaches a higher authentication accuracy and reduces overhead but requires the deployment of a large number of peripheral devices. A comparison of statistical and ML-based physical layer authentication methods for different channel correlation coefficients is presented in Senigagliesi et al.17 However, it is usually difficult for the receiver to know the channel correlation coefficient between the attacker and the legitimate transmitter.
Only a few of the above physical layer authentication methods address channel estimation prior to the authentication phase, and those that do implement it using pilots or training sequences. The problem of frequency synchronization, channel estimation, and data detection for all active users in the uplink of an orthogonal frequency division multiple access (OFDMA) system has been intensively studied in Wang and Liew,18 Chen et al.,19 and Pun et al.20 Since exact maximum likelihood estimation is complex in practical scenarios, an alternative scheme operating in an iterative manner, where each user's signal is processed using the ECM algorithm, has been proposed. This approach to channel estimation is well suited to obtaining physical layer attributes as authentication fingerprints.
Multi-attributes threshold-free physical layer authentication schemes that achieve higher authentication accuracy in combination with receiver-side channel estimation are still to be investigated. Many existing physical layer authentication schemes are based on a statistical approach, which requires the assumption that physical layer attributes obey a specific distribution. In reality, however, receivers usually do not know the distribution model of physical layer attributes for wireless communication. ML-based authentication approaches can achieve model independence and overcome the difficulty of modeling the uncertainty and unknown dynamics of the authentication process.3 While the aid of peripheral devices is beneficial in improving authentication accuracy, it increases system deployment costs. Imperfect estimation and time-varying characteristics of physical layer attributes in wireless communication systems are inevitable and affect the accuracy of physical layer authentication. In addition, ML-based authentication techniques can enhance authentication performance by analyzing and fusing multiple physical layer attributes.
System model
This section describes the system model for physical layer authentication, formulas for the received signals, and the calculation formula of path loss.
In this article, we consider the physical layer authentication (PLA) in a single-input multiple-output (SIMO) system, where the universal Alice–Bob–Eve model is employed. Alice and Eve are transmitters equipped with one antenna each and Bob is the receiver equipped with

System model.
To enhance the reliability of PLA at Bob, multiple attributes, including RSSI, CIR, and CFO, are used in combination as authentication fingerprints. Specifically, signals from different transmitters experience different path loss and multipath fading, which results in differences in the received signal power and channel impulse response, so RSSI and CIR can be exploited as authentication fingerprints of the transmitting entity. The CIR from Alice to Bob is illustrated as
It is assumed that the initial secure transmission has been established between Alice and Bob before Eve arrives so that Bob can collect the physical layer attributes of the legitimate transmitter. The initial transmission can be implemented by existing authentication methods or physical measures (such as manual setup during initial communication).11,17 Bob keeps a record of Alice’s authentication information, that is, a record of physical layer attributes for the previous M moments. Assume that Eve has been listening to the channel between Alice and Bob and takes the opportunity to send spoofing signals when the channel is free to avoid packet loss due to channel congestion. Bob first extracts information about the physical layer attributes of the transmitter from the received signal and then uses them as fingerprints in the subsequent authentication process. Assume that the signal received by the
where
where
Physical layer authentication algorithm
In this section, we first discuss how to obtain the physical attributes from the received signal by using an estimation algorithm, then design a logistic regression model to make the authentication decision based on the estimation results, and give performance metrics for evaluating authentication decisions.
Physical attribute estimation
RSSI calculation
RSSI can be calculated as follows
where
CIR and CFO estimation
The ECM algorithm is employed to estimate the CIR and CFO, that is,
Although channel estimation based on the EM algorithm has many advantages, its computational complexity increases exponentially with the number of transmitted signals. Therefore, a variant of the EM algorithm, the ECM algorithm, is exploited to iteratively estimate the parameters
The ECM algorithm is an iterative optimization strategy. In this estimation task, the observed data is
E-step
The expected log-likelihood function is defined as follows
where
Assume that the signal undergoes log-normal shadowing, delay spread, and Rayleigh fading as it travels from the transmitter to the receiver. The process of estimating the parameters starts with estimating
where
M-step
Once the
To solve this optimization problem,
where
Logistic regression authentication model
Different attributes have different ranges of variation and magnitudes, so the raw data need to be normalized before fitting the logistic regression model. There are several reasons for doing this. First, some attributes have a much wider range of variation than others, so the classification result would depend mainly on those attributes, which may not reflect reality. Second, normalization removes the effect of magnitude. Finally, normalized data help improve the convergence speed of the gradient descent algorithm. To give the normalized data better discrimination without losing the characteristics of the original data, we apply different normalizations to attributes that are assumed to follow different distributions. In this article, we assume that the CFO evolves incrementally, so that the CFOs from Alice and Eve fluctuate around a constant value without outliers. The RSSI varies relatively little over short periods, and its maximum and minimum values are relatively stable, so we normalize these two attributes using the
After normalization, CFO and RSSI values are distributed in the range
where
We utilize an ML approach to implement threshold-free physical layer authentication. In the authentication phase, the decision process is modeled as a binary classifier. The receiver marks the current message sent by the transmitter as legal or illegal based on the attributes obtained in the channel estimation phase and records them as training data for the next message authentication. Assume that Bob marks the current authentication result as
The critical issue is to choose a suitable ML model to obtain better authentication performance. Here we choose a logistic regression model, which is suited to binary classification problems, to achieve threshold-free authentication. Suppose that the data in the same dataset are all estimated under the same signal-to-noise ratio (SNR). We use the attribute values of M historical records, which come from both Alice and Eve, as the training dataset for the model. It is assumed that the training data labels are consistent with the actual situation, that is, the authentication results of all M previous messages are correct. We encapsulate the normalized attributes of all antennas in
where
Logistic regression uses the minimization of the cross-entropy loss as its objective. Maximum likelihood estimation for logistic regression has no closed-form solution, so iterative algorithms such as gradient descent are commonly used to optimize the parameters. For the optimization task of this article, we choose the FW algorithm to solve the parameter matrix
The following functional expressions can be obtained after substituting equations (11) and (12) into the loss function
The prediction results get closer to the actual category as the loss function decreases.
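The relationship between the model output and the loss can be illustrated with a minimal sketch. It assumes that each record is a vector of three normalized attributes, that label 1 denotes Alice and label 0 denotes Eve, and that the model has a weight vector `w` and a bias `b`. All function names, the label convention, and the numeric values are illustrative assumptions rather than the notation of this article.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    """Probability that each row of X (normalized attributes) comes from Alice."""
    return sigmoid(X @ w + b)

def cross_entropy(X, y, w, b, eps=1e-12):
    """Mean cross-entropy loss; labels y are 1 (Alice) or 0 (Eve) by assumption."""
    p = np.clip(predict_proba(X, w, b), eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def authenticate(X, w, b):
    """Accept a message when 'Alice' is the more probable class."""
    return predict_proba(X, w, b) >= 0.5

# Two made-up records: one Alice-like, one Eve-like.
X = np.array([[0.9, 0.8, 0.7],
              [0.1, 0.2, 0.3]])
y = np.array([1.0, 0.0])

untrained = cross_entropy(X, y, np.zeros(3), 0.0)            # equals ln 2
fitted = cross_entropy(X, y, np.array([2.0, 2.0, 2.0]), -3.0)
print(untrained, fitted, authenticate(X, np.array([2.0, 2.0, 2.0]), -3.0))
```

With all weights at zero every record receives probability 0.5, so the loss equals ln 2; parameters that separate the two records give a smaller loss, which matches the statement that predictions approach the actual category as the loss decreases.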
FW-based parameter optimization algorithm
In this subsection, the adjustment of the weight parameters
The risk of overfitting the model can be reduced by imposing regularization constraints, and
where
To solve this optimization problem, we decompose equation (15) into two simpler optimization problems. First, treat
The gradient
In order to construct a feasible descent direction for the approximate linear programming, the index of the largest element of the gradient is denoted as
According to the rules of the FW algorithm, the feasible descent direction should be the difference between the solution of equation (20) and
If
The optimal step size
Once
When both
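The general shape of an FW iteration for logistic regression can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of this article: the feasible set is assumed to be an l1-norm ball of radius `radius`, the bias term is omitted, and the standard diminishing step size 2/(k + 2) is used in place of an optimal step size. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(w, X, y, eps=1e-12):
    """Mean cross-entropy loss of logistic regression and its gradient in w."""
    p = sigmoid(X @ w)
    loss = -np.mean(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

def frank_wolfe_l1(X, y, radius=5.0, n_iter=200, tol=1e-8):
    """Frank-Wolfe over the assumed constraint ||w||_1 <= radius."""
    w = np.zeros(X.shape[1])
    for k in range(n_iter):
        _, grad = loss_and_grad(w, X, y)
        # Linear subproblem: over an l1 ball the minimizer is the vertex on
        # the coordinate with the largest absolute gradient entry.
        i = np.argmax(np.abs(grad))
        s = np.zeros_like(w)
        s[i] = -radius * np.sign(grad[i])
        d = s - w                      # feasible descent direction
        if -(grad @ d) < tol:          # Frank-Wolfe duality gap
            break
        gamma = 2.0 / (k + 2.0)        # standard diminishing step size
        w = w + gamma * d              # convex combination stays feasible
    return w

# Synthetic, made-up data: labels depend on the first two features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
w = frank_wolfe_l1(X, y)
print(w, loss_and_grad(w, X, y)[0])
```

Because every update is a convex combination of the current iterate and a vertex of the feasible set, the iterate never leaves the constraint set and no projection step is needed, which is the main practical appeal of FW for constrained problems of this kind.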
Performance metrics
To facilitate the analysis of PLA performance, specific indicators need to be selected to quantify the errors. Typically, the mean squared error (MSE) is used to quantify the error between the estimated attribute and the actual value, and it is calculated as
where
The performance of our proposed method is measured by the probabilities of miss detection (MD) and false alarm (FA). An MD occurs when Bob marks physical layer attributes from Eve as legal and accepts the message. An FA occurs when Bob marks physical layer attributes from Alice as illegal and thus issues a spoofing warning. In the test data, samples from Alice that are correctly classified are true positives (TP); otherwise, they are false negatives (FN). Samples from Eve that are correctly classified are true negatives (TN); otherwise, they are false positives (FP).
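The miss detection rate $P_m$ is the fraction of Eve's samples that Bob accepts as legal:

$$P_m = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}$$

The false alarm rate $P_f$ is the fraction of Alice's samples that Bob marks as illegal:

$$P_f = \frac{\mathrm{FN}}{\mathrm{TP} + \mathrm{FN}}$$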
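As a small worked example, the two rates can be computed from labeled test decisions. The sketch assumes, consistently with the TP/FN/TN/FP definitions above, that the miss detection rate is the fraction of Eve's samples that are accepted and the false alarm rate is the fraction of Alice's samples that are rejected. The label convention (1 for Alice, 0 for Eve), the function name, and the sample data are illustrative.

```python
def error_rates(y_true, y_pred):
    """Return (miss detection rate, false alarm rate).

    Assumed convention: 1 = Alice (legal), 0 = Eve (illegal).
    """
    tp = fn = tn = fp = 0
    for truth, decision in zip(y_true, y_pred):
        if truth == 1 and decision == 1:
            tp += 1          # Alice accepted
        elif truth == 1 and decision == 0:
            fn += 1          # Alice rejected: false alarm
        elif truth == 0 and decision == 0:
            tn += 1          # Eve rejected
        else:
            fp += 1          # Eve accepted: miss detection
    p_m = fp / (fp + tn) if (fp + tn) else 0.0
    p_f = fn / (tp + fn) if (tp + fn) else 0.0
    return p_m, p_f

# Made-up decisions: 4 messages from Alice, 5 from Eve.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1]
print(error_rates(y_true, y_pred))  # (0.2, 0.25)
```

Here one of Eve's five messages is accepted and one of Alice's four messages is rejected, giving a miss detection rate of 0.2 and a false alarm rate of 0.25.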
Algorithm complexity analysis
The number of floating point operations (FLOPs) represents the computational complexity to some extent. The number of FLOPs needed for each operation of Algorithm 1 is listed in Table 1. After neglecting the constants, lower powers, and coefficients of the highest powers, the total computational complexity of Algorithm 1 is represented by a total FLOP count of
Complexity analysis.
Performance evaluation
This section presents and analyzes the simulation results of the proposed multi-attributes physical layer authentication scheme.
The modulation method is QAM on the transmitter side and the access method is OFDM. The signal, which carries the imperfect characteristics of the device oscillator, passes through a multipath fading channel and is corrupted by noise. On the receiver side, the sampling rate
Simulation parameters.
For the subsequent evaluation comparison, we define the following variables.
The signal experiences path loss on its way from the transmitter to the receiver. The path loss exponent is set to 2.1 for the indoor model with obstacle occlusion. Define
where
Figure 2(a) shows that the MSE of the CFO declines with increasing SNR at a decreasing rate, which means that the estimation accuracy of the hardware device attribute improves as the noise in the environment decreases, most markedly at low SNR. Moreover, the CFO estimation error of the eight-antenna system is smaller than that of the single-antenna system, and this advantage is more pronounced at low SNR than at high SNR. Figure 2(b) shows that the MSE of the CIR decreases at a constant rate with increasing SNR for both the single-antenna and eight-antenna systems, and that the MSE of the eight-antenna system is always lower than that of the single-antenna system. For example, at SNR = 0 dB, the multi-antenna technology reduces the estimation errors of the CFO and CIR by 2.14% and 98.43%, respectively, compared with the single antenna. Thus, we can conclude that the multi-antenna technology improves the received signal quality and reduces the estimation errors of physical layer attributes.

Estimation errors versus SNR with the setting of (
In order to compare the authentication accuracy using multi-attributes, we also carried out the authentication experiment using each physical layer attribute separately and their combinations. Figure 3 shows the

Miss detection rate versus SNR with the setting of (Nr = 8,

False alarm rate versus SNR with the setting of (Nr = 8,
It can be seen from Figures 3 and 4 that the
The CIR-based
When SNR < 10 dB, the authentication error rates of CFO-based single-attribute physical layer authentication decrease as the SNR increases. When SNR > 10 dB, the authentication error rates of CFO-based authentication do not decrease significantly as the signal quality improves, because the MSE of the CFO does not decrease significantly. The
The physical layer authentication based on both CFO and CIR attributes performs better than authentication using either attribute alone as a fingerprint, indicating that the combination of multiple attributes can indeed improve the accuracy of physical layer authentication. When an attribute fails in the current authentication, that is, when this attribute of Eve and Alice is so similar that a legitimate transmitter cannot be distinguished from an illegitimate one by this attribute, other attributes can be utilized as authentication fingerprints to distinguish them. The
The accuracy of CFO estimation and CIR estimation depends on the signal quality. Improved channel conditions improve the estimation accuracy, which is more favorable for Bob to distinguish between Eve and Alice, that is, the authentication accuracy is much higher. The multi-antenna technology can improve the signal reception quality and indirectly reduce the estimation error. Moreover, using multi-attributes is significantly more accurate than a single attribute as a fingerprint for authentication.
The distance between Alice and Bob is fixed at 5 m and a specific value of

Miss detection rate versus

False alarm rate versus
In the experiment, we fix the RMS delay spread from Alice to Bob at 25 ns and obtain different values of

Miss detection rate versus

False alarm rate versus
The experiment assumes that Eve and Alice have the same variance of the Wiener process, and the impact of the change in CFO on the authentication performance is observed by changing the value of the variance. We set

Miss detection rate versus

False alarm rate versus
Figures 11 and 12 demonstrate the effect of the number of receive antennas on the multi-attributes authentication performance. It can be observed that the authentication error rate continues to decrease as the number of antennas increases. At low SNR, increasing the number of antennas improves the authentication performance significantly. In contrast, when the signal conditions are inherently good, increasing the number of antennas only slightly improves the authentication performance. When SNR = 0,

Miss detection rate versus Nr with the setting of (

False alarm rate versus Nr with the setting of (
Figure 13 illustrates the relationship between the authentication error rate and the number of iterations of the logistic regression algorithm. It can be seen from the figure that the authentication error rate decreases as the number of iterations increases. When the number of iterations is less than 2, the error rates are high because the weight parameters of logistic regression are initialized to 0, equivalent to no attribute involved in authentication. When the number of iterations is greater than 5, the authentication error rates can remain stable. For example, after five iterations,

Physical layer authentication error rates for different number of iterations with the setting of (SNR = 20 dB,
Figure 14 characterizes the influence of different authentication schemes on authentication performance, which quantifies the

The solid line in the figure depicts the relationship between
Conclusion
We consider threshold-free multi-attributes physical layer authentication based on the ECM channel estimation algorithm. Once a signal is received, the receiver can directly calculate the RSSI and use the ECM algorithm to estimate the CFO and CIR from the received signal. The RSSI, CFO, and CIR obtained from the signal processing are exploited as fingerprints for physical layer authentication and serve as the input of the logistic regression model during the authentication phase. The logistic regression model achieves threshold-free authentication, and the model parameters are optimized by the FW algorithm to achieve lower authentication error rates. Moreover, the combination of multiple attributes as authentication fingerprints enhances authentication reliability.
Experimental results show that the proposed threshold-free multi-attributes physical layer authentication scheme can effectively improve authentication accuracy, with
Footnotes
Handling Editor: Yanjiao Chen
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partly supported by the National Natural Science Foundation of China (Grants 61931001 and 61871023) and Beijing Natural Science Foundation (Grant 4202054).
