
Abstract

At microwave frequencies, the electromagnetic (EM) characterization of materials is not possible through direct measurements. In this case, the Nicolson-Ross-Weir (NRW) algorithm is the standard phenomenological analytical approach for the inverse problem. The algorithm finds the equivalent complex permittivity and permeability, starting from the scattering parameters at the two terminals of a waveguide structure containing the material. To be successful, NRW needs a sufficiently thin material sample and careful handling of the complex-valued logarithm. When it succeeds, the extracted parameters are affected only by round-off errors; when it fails, the results are wrong. Thus, the inverse problem of EM material characterization at microwave frequencies is an interesting benchmark for optimization algorithms and machine learning alternatives. A neural network alternative is investigated in this paper for the case of non-magnetic materials, operating at a fixed frequency. This is a necessary first step before approaching neural network (NN) models valid for whole frequency ranges and magneto-dielectric materials. A feed-forward NN with one hidden layer was used, its hyperparameters being tuned by a multi-objective optimization procedure. Numerical results show that a carefully chosen NN can provide accurate results for a relatively large domain of complex permittivity components, being successful in areas where NRW fails. The implementation, carried out in Python with the Optuna module, is available for free download.

Keywords

complex permittivity, complex permeability, frequency domain solving, full-wave electromagnetic field, neural networks

Introduction

Inverse problems in engineering can be seen as strategies that search for approximations of the corresponding inverse functions or operators. Function approximation is also one of the possible learning tasks for neural networks (NNs).¹ NNs excel at approximating non-linear functions and learning complex mappings from data, making them well-suited for tasks where traditional algorithms struggle. One such algorithm is the Nicolson-Ross-Weir (NRW) computational method for the extraction of electromagnetic (EM) material properties at high frequencies.^2,3

In this paper we investigate the use of NNs to solve the inverse problem of finding frequency dependent EM material properties, starting from a known frequency dependent terminal behavior. This is useful for the EM characterization of materials at microwave frequencies, where direct characterization from measurements is not possible, and it supports technological advances in which new materials and composites are devised for various applications in microwave engineering (antennas, sensors, tags,⁴ materials for EM shielding,⁵ etc.). Among the existing determination methods, the class based on transmission/reflection (TR) measurements is widely used. A detailed review of TR based methods can be found e.g. in^4,6 and references therein. The NRW algorithm belongs to this class of methods, and it is considered a standard approach. NRW extracts material properties assuming that the material is homogeneous and isotropic and that the EM field propagates according to the transverse electric TE₁₀ mode. To be successful, NRW requires the sample of the material under test (MUT) to be thin enough. Moreover, the logarithm function applied to complex numbers must be evaluated carefully.⁷ Improvements of the standard NRW algorithm were proposed to reduce the effect of measurement uncertainties.^8,9 Other modes of propagation (e.g. TM₁₁) for microwave measurements were recently investigated.¹⁰ There is thus a high interest in material characterization at microwave frequencies.¹¹ Finding an alternative for NRW is a good benchmark to start with, before investigating the characterization of anisotropic materials or other modes of propagation. The idea of using NNs for EM characterization of materials was also investigated in¹², where two separate multi-layer perceptron (MLP) NNs with one hidden layer of 10 neurons each were used, one having $ε_{r}^{'}$ as output and the other having $ε_{r}^{″}$ as output. The parameters are searched for in relatively small domains, e.g. $(ε_{r}^{'}, ε_{r}^{″}) \in [5, 9] \times [0.2, 1]$ or $[60, 90] \times [1, 20]$, meaning that there is some a priori knowledge about the materials. In this paper our goal is to find a NN able to provide an accurate approximation over a larger domain, $(ε_{r}^{'}, ε_{r}^{″}) \in [1, 200] \times [0, 100]$, which corresponds to new nano-composite materials used in EM shielding.⁵

Problem formulation

The NRW computational method starts from the scattering (S) parameters of a material sample, assumed homogeneous and isotropic, inserted into a rectangular waveguide through which the TE₁₀ mode propagates (Figure 1). The method is based on closed form relationships that give the complex permittivity $ε_{r} = ε_{r}^{'} - j ε_{r}^{″}$ and permeability $μ_{r} = μ_{r}^{'} - j μ_{r}^{″}$ as functions of the S parameters. The MUT is of rectangular shape, of thickness d, and it is placed in a metallic waveguide with a rectangular cross-section of width a. This is seen as a two-port device for which the frequency dependence of the S parameters is known. The complex values of the material parameters are obtained from computation.

Figure 1.

Geometrical model for the NRW procedure: rectangular waveguide, of length d and width a, filled with a homogeneous and isotropic material. Physical model: full-wave EM field, TE₁₀ mode of propagation, PEC conditions on the lateral surface except for two waveguide ports.

It is useful to recall the direct problem, which is stated as follows.

Given: the width of the waveguide a, the thickness of the material sample d, the complex material properties (they can be frequency dependent, losses are included in the imaginary parts) – the permittivity $ε_{r} (f) = ε_{r}^{'} (f) - j ε_{r}^{″} (f)$ and the permeability $μ_{r} (f) = μ_{r}^{'} (f) - j μ_{r}^{″} (f)$, where f are frequency samples in a given frequency domain $f \in [f_{\min}, f_{\max}]$, find the scattering parameters $S_{11} (f), S_{21} (f)$.

The direct problem admits the following analytical solution.⁷ Let us denote by: $ω = 2 π f$ the angular frequency; $k_{0}^{2} = ω^{2} ε_{0} μ_{0}$ the square of the propagation constant in air; $k_{t10} = π / a$ the separation constant for the TE₁₀ mode; $β_{e} = \sqrt{k_{0}^{2} - k_{t10}^{2}}$ and $β_{s} = \sqrt{k_{0}^{2} ε_{r} μ_{r} - k_{t10}^{2}}$ the propagation constants for air and sample, respectively; $Z_{e} = \frac{ω μ_{0}}{β_{e}}$ and $Z_{s} = \frac{ω μ_{0} μ_{r}}{β_{s}}$ the complex impedances in air and in the sample; $Γ$ the reflection coefficient at Port 1 and $P$ the transmission coefficient at Port 2, where $Γ = \frac{Z_{s} - Z_{e}}{Z_{s} + Z_{e}}$ and $P = \exp (- j β_{s} d)$. Then, the S parameters are:

$$S_{11} = \frac{Γ (1 - P^{2})}{1 - Γ^{2} P^{2}}, \quad S_{21} = \frac{P (1 - Γ^{2})}{1 - Γ^{2} P^{2}}. \tag{1}$$
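As an illustration, the direct problem (1) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' released code: the function name `s_parameters` and the numerical values of the vacuum constants are assumptions introduced here, and the principal branch of the complex square root is used throughout (the S parameters in (1) are unchanged if $β_{s}$ is replaced by $-β_{s}$, so the branch choice does not affect the result).

```python
import cmath
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity in F/m (assumed constant value)
MU0 = 4e-7 * math.pi      # vacuum permeability in H/m (classical value)


def s_parameters(eps_r, mu_r, f, a, d):
    """Return (S11, S21) for a TE10 waveguide section filled with the sample.

    eps_r, mu_r : complex relative permittivity / permeability (eps' - j*eps'')
    f           : frequency in Hz
    a, d        : waveguide width and sample thickness in metres
    """
    omega = 2.0 * math.pi * f
    k0_sq = omega**2 * EPS0 * MU0          # k0^2
    kt = math.pi / a                       # separation constant k_t10
    beta_e = cmath.sqrt(k0_sq - kt**2)     # propagation constant in air
    beta_s = cmath.sqrt(k0_sq * eps_r * mu_r - kt**2)  # ... in the sample
    z_e = omega * MU0 / beta_e             # wave impedance in air
    z_s = omega * MU0 * mu_r / beta_s      # wave impedance in the sample
    gamma = (z_s - z_e) / (z_s + z_e)      # interface reflection coefficient
    p = cmath.exp(-1j * beta_s * d)        # propagation factor through the sample
    den = 1.0 - gamma**2 * p**2
    return gamma * (1.0 - p**2) / den, p * (1.0 - gamma**2) / den


# Example: the material of Figure 2 with a thin (1.5 mm) sample at 8.2 GHz.
s11, s21 = s_parameters(2.8 - 0.015j, 1.0, 8.2e9, 22.86e-3, 1.5e-3)
print(abs(s11), abs(s21))
```

A quick sanity check for such a routine is power conservation: for a lossless sample $| S_{11} |^{2} + | S_{21} |^{2} = 1$, while for a lossy one the sum is smaller than 1.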

The inverse problem is formulated as follows.

Given: the width of the waveguide a, the thickness of the material sample d, the scattering parameters $S_{11} (f), S_{21} (f)$, where f are frequency samples in a given frequency domain $f \in [f_{\min}, f_{\max}]$, find the complex material properties $ε_{r} (f) = ε_{r}^{'} (f) - j ε_{r}^{″} (f)$ and $μ_{r} (f) = μ_{r}^{'} (f) - j μ_{r}^{″} (f)$.

The NRW computation is based on the analytical formulas above, rearranged so that they yield the material properties. The computation is carried out independently for each frequency sample in the chosen frequency range. The algorithm is as follows.

Step 1: Compute auxiliary quantities $V_{1} = S_{21} + S_{11}$ , $V_{2} = S_{21} - S_{11}$ ; $X = \frac{1 - V_{1} V_{2}}{V_{1} - V_{2}}$ ;

Step 2: Solve second order equation $Γ^{2} - 2 Γ X + 1 = 0$ and choose the solution with $| Γ | \leq 1$ , this gives $Γ$ ;

Step 3: Compute the transmission coefficient $P = \frac{V_{1} - Γ}{1 - Γ V_{1}}$ ;

Step 4: Compute the propagation constant for the sample $β_{s} = - \frac{\log P}{j d}$ ;

Step 5: Compute material parameters as in (2):

$$μ_{r} = \frac{1 + Γ}{1 - Γ} \frac{β_{s}}{β_{e}}, \quad ε_{r} = \frac{β_{s}^{2} + k_{t10}^{2}}{k_{0}^{2} μ_{r}}. \tag{2}$$
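Steps 1 to 5 can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: the names `forward` and `nrw` are introduced here, the forward model is repeated only so that the block is self-contained, and the logarithm in Step 4 is taken on its principal branch with no phase correction. The test permittivities are illustrative choices inside the domain studied in the paper.

```python
import cmath
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity in F/m (assumed constant value)
MU0 = 4e-7 * math.pi      # vacuum permeability in H/m


def forward(eps_r, mu_r, f, a, d):
    """Direct problem (1): material parameters -> (S11, S21)."""
    omega = 2.0 * math.pi * f
    k0_sq, kt = omega**2 * EPS0 * MU0, math.pi / a
    beta_e = cmath.sqrt(k0_sq - kt**2)
    beta_s = cmath.sqrt(k0_sq * eps_r * mu_r - kt**2)
    z_e, z_s = omega * MU0 / beta_e, omega * MU0 * mu_r / beta_s
    gamma = (z_s - z_e) / (z_s + z_e)
    p = cmath.exp(-1j * beta_s * d)
    den = 1.0 - gamma**2 * p**2
    return gamma * (1.0 - p**2) / den, p * (1.0 - gamma**2) / den


def nrw(s11, s21, f, a, d):
    """NRW inversion, Steps 1-5, principal-branch logarithm (no phase fix)."""
    omega = 2.0 * math.pi * f
    k0_sq, kt = omega**2 * EPS0 * MU0, math.pi / a
    beta_e = cmath.sqrt(k0_sq - kt**2)
    # Step 1: auxiliary quantities
    v1, v2 = s21 + s11, s21 - s11
    x = (1.0 - v1 * v2) / (v1 - v2)
    # Step 2: root of gamma^2 - 2*gamma*x + 1 = 0 with |gamma| <= 1
    root = cmath.sqrt(x * x - 1.0)
    gamma = x + root
    if abs(gamma) > 1.0:
        gamma = x - root
    # Step 3: transmission coefficient
    p = (v1 - gamma) / (1.0 - gamma * v1)
    # Step 4: propagation constant in the sample
    beta_s = -cmath.log(p) / (1j * d)
    # Step 5: material parameters, equation (2)
    mu_r = (1.0 + gamma) / (1.0 - gamma) * beta_s / beta_e
    eps_r = (beta_s**2 + kt**2) / (k0_sq * mu_r)
    return eps_r, mu_r


f, a, d = 8.2e9, 22.86e-3, 1.5e-3
for eps_true in (2.8 - 0.015j, 180.0 - 20.0j):
    eps_rec, mu_rec = nrw(*forward(eps_true, 1.0, f, a, d), f, a, d)
    rel_err = abs(eps_rec - eps_true) / abs(eps_true)
    print(eps_true, "->", eps_rec, "mu_r =", mu_rec, "relative error =", rel_err)
```

For the low-permittivity case the phase $β_{s}^{'} d$ stays below $π$ and the round trip recovers the input, whereas for the high-permittivity case the phase exceeds $π$, the principal-branch logarithm selects the wrong branch, and the recovered values are far from the true ones.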

One sensitive part of the algorithm is Step 1, where numerical instabilities are expected if $V_{1}$ is close to $V_{2}$, which corresponds to $S_{11} \approx 0$, i.e. the sample has properties close to those of vacuum. Another sensitive part is Step 4, where a phase correction might be needed if the sample is not thin enough. The phase correction at a frequency sample is done with respect to the phase of the previous sample, assumed correct. Figure 2 illustrates this idea for a material with $ε_{r} = 2.8 - 0.015 j$, $μ_{r} = 1$, $d = 10.22$ mm, $a = 22.86$ mm, $f \in [8.2, 12.4]$ GHz (the frequency range of the X band). Figure 2 left shows the imaginary part of the natural logarithm of the complex transmission coefficient P for 100 frequencies in the X band. The blue dashed curve is the value before correction (obtained with the output of the log function in MATLAB, applied to a complex number). By examining the continuity of this curve, and assuming that the computation is correct at the minimum frequency, the incorrect values are identified from the jumps and the imaginary part is corrected by subtracting $2 π$. The final, continuous, correct curve is displayed in dotted red. A possible implementation of this correction step in MATLAB is shown in Figure 2 right. However, if the first frequency point is not correct, the whole dependency, even if continuous, is not correct. That is why it is important to have a correct result for the minimum frequency. The study that follows is only for $f = 8.2$ GHz, the minimum frequency value of the X band.

Figure 2.

If the logarithm of the first frequency in the frequency range is correct, then phase jumps for the next frequencies can be corrected if needed (MATLAB code is shown on the right).
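The same correction idea can be sketched in Python (the paper's Figure 2 shows a MATLAB version, which is not reproduced here). The function name `correct_phase` and the helper `beta_s` are assumptions made for this sketch; the material and geometry are those quoted for Figure 2, and the first sample is assumed to lie on the correct branch.

```python
import cmath
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity in F/m (assumed constant value)
MU0 = 4e-7 * math.pi      # vacuum permeability in H/m
A, D = 22.86e-3, 10.22e-3                 # waveguide width, sample thickness (m)
EPS_R = 2.8 - 0.015j                      # non-magnetic sample (mu_r = 1)
F_MIN, F_MAX, N = 8.2e9, 12.4e9, 100      # X band, 100 frequency points


def beta_s(f):
    """Propagation constant in the sample for the TE10 mode."""
    k0_sq = (2.0 * math.pi * f) ** 2 * EPS0 * MU0
    return cmath.sqrt(k0_sq * EPS_R - (math.pi / A) ** 2)


def correct_phase(phases):
    """Remove 2*pi jumps, trusting the first sample (assumed correct)."""
    corrected = [phases[0]]
    offset = 0.0
    for prev, cur in zip(phases, phases[1:]):
        step = cur - prev
        if step > math.pi:        # upward jump: subtract 2*pi from here on
            offset -= 2.0 * math.pi
        elif step < -math.pi:     # downward jump: add 2*pi from here on
            offset += 2.0 * math.pi
        corrected.append(cur + offset)
    return corrected


freqs = [F_MIN + i * (F_MAX - F_MIN) / (N - 1) for i in range(N)]
# Imaginary part of log(P) as returned by a principal-branch logarithm.
wrapped = [cmath.phase(cmath.exp(-1j * beta_s(f) * D)) for f in freqs]
corrected = correct_phase(wrapped)
print(wrapped[0], wrapped[-1], corrected[-1])
```

In this example the wrapped phase jumps upward by $2 π$ once inside the band, and the corrected curve is continuous and decreasing with frequency.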

Let us now fix the frequency at its minimum value and consider a sample with $d = 1.5$ mm. We will investigate the behavior of the NRW algorithm for a broader range of permittivity values, $ε_{r}^{'} \in [1, 200]$, $ε_{r}^{″} \in [0, 100]$, and a non-magnetic material $(μ_{r} = 1)$. Some newly devised materials are known to be non-magnetic (e.g. in⁶ or¹³). In such cases, the permeability extracted with NRW can serve as a validation of the extraction process, because it must be close to 1; if it is not, this is a clear indication of NRW failure. However, if this information is known, we can use it from the very beginning in methodologies that are not based on NRW, as is the case in this paper.

The permittivity domain $(ε_{r}^{'}, ε_{r}^{″})$ is sampled on a grid of 200 × 100 equidistant points. For each point of this grid, the S parameters are computed with the direct problem (1) and then used as input to NRW, which recomputes the material parameters with (2). The aim is to recover the initial values of the material properties. Results are shown in Figure 3: only a part of the NRW computations is successful, up to round-off errors, as can be seen in Figure 3(b), where only successful points are shown. In failure points the relative errors are over 200%. For other frequencies and for other values of the sample thickness, a similar NRW failure map in the complex permittivity plane is obtained; only the position of the success-failure interface is different. That is why we consider that the procedure that follows, aiming to build a NN to solve the inverse problem, can be applied for any frequency and material thickness. Our goal is to investigate whether such a NN is able to escape the difficulties of NRW. The NN will be trained with data generated with the direct analytical method (1), which is robust and does not have any difficulties.

Figure 3.

Testing NRW for a fixed frequency f = 8.2 GHz. (a) Materials corresponding to points in the “OK” region led to successful computation, the “Not OK” region corresponds to NRW failures; (b) relative errors that correspond to successful points are round-off errors, in the failure zone the relative errors are over 100%.

NN-MLP based approximation

MLPs are a class of feedforward artificial neural networks that have been widely used for function approximation, classification, and regression tasks. An MLP consists of an input layer, one or more hidden layers, and an output layer, each layer being fully connected to the subsequent layer. The universal approximation theorem demonstrates that an MLP with a single hidden layer can approximate any continuous function to arbitrary accuracy, given enough neurons.¹⁴ That is why, in the context of material property extraction, we will consider an MLP with one hidden layer, with 4 inputs (the real and imaginary parts of the scattering parameters $S_{11}$, $S_{21}$) and 2 outputs (the real and imaginary parts of the permittivity, assuming that the material is non-magnetic). The network can be defined as follows

$$N_{θ} : \mathbb{R}^{4} \to \mathbb{R}^{2}, \quad N_{θ} (ℜ (S_{11}), ℑ (S_{11}), ℜ (S_{21}), ℑ (S_{21})) = (\hat{ε}^{'}, \hat{ε}^{″}),$$

where the input vector is constructed from the real ($ℜ$) and imaginary ($ℑ$) components of the reflection ($S_{11}$) and transmission ($S_{21}$) coefficients, $(\hat{ε}^{'}, \hat{ε}^{″})$ denotes the predicted permittivity components, and $θ$ represents the learnable network parameters.

Network architecture 4-H-2

The network consists of a single hidden layer containing H neurons (also known as the width of the layer). The input vector $x \in \mathbb{R}^{4}$ can be expressed as $x = [ℜ (S_{11}), ℑ (S_{11}), ℜ (S_{21}), ℑ (S_{21})]^{T}$. The analytical expression for the forward pass yields the predicted permittivity $\hat{y} = [\hat{ε}^{'}, \hat{ε}^{″}]^{T}$ as in (3):

$$\hat{y} = W_{2} \cdot ϕ (Norm (W_{1} x + b_{1})) + b_{2}, \tag{3}$$

where $W_{1} \in \mathbb{R}^{H \times 4}$ is the input-to-hidden weight matrix, $b_{1} \in \mathbb{R}^{H}$ is the hidden bias, $W_{2} \in \mathbb{R}^{2 \times H}$ is the hidden-to-output weight matrix, and $b_{2} \in \mathbb{R}^{2}$ is the output bias.¹⁵ The total number of trainable parameters scales linearly with the hidden size according to $N_{params} = 9 H + 2$. The function $ϕ (\cdot)$ denotes a nonlinear activation. We evaluated several functions including ReLU, Tanh, and GELU (Gaussian Error Linear Unit), with GELU ultimately proving optimal due to its smooth approximation of the rectifier.¹⁶ The architecture incorporates Layer Normalization ($Norm (\cdot)$) to stabilize the distribution of hidden activations. For a feature vector h, the normalization is defined as

$$Norm (h) = γ ⊙ \frac{h - μ (h)}{\sqrt{σ^{2} (h) + ε}} + β$$

where $μ$ and $σ$ are the mean and standard deviation computed over a chosen axis (e.g., feature or batch dimension), while $γ$ and $β$ are learnable affine parameters. The symbol $⊙$ represents component-wise multiplication.¹⁷
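To make the 4-H-2 architecture concrete, the forward pass and the parameter count can be sketched in plain Python. This is only an illustration under stated assumptions: the dictionary layout, the random initialization, and the stabilizing constant `eps=1e-5` inside the normalization (not to be confused with the permittivity) are choices made here, the normalization is taken over the H hidden features of one sample, and the exact erf-based form of GELU is used. With these choices the count is $4H + H$ for the hidden layer, $2H$ for $γ$ and $β$, and $2H + 2$ for the output layer, i.e. $9H + 2$.

```python
import math
import random


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def layer_norm(h, gamma, beta, eps=1e-5):
    """Normalize one feature vector h, then apply the affine map (gamma, beta)."""
    mean = sum(h) / len(h)
    var = sum((v - mean) ** 2 for v in h) / len(h)
    scale = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * scale + b for v, g, b in zip(h, gamma, beta)]


def init_params(hidden, seed=0):
    """Random weights for a 4-H-2 network (illustrative initialization)."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.gauss(0.0, 0.5) for _ in range(4)] for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "gamma": [1.0] * hidden,
        "beta": [0.0] * hidden,
        "W2": [[rng.gauss(0.0, 0.5) for _ in range(hidden)] for _ in range(2)],
        "b2": [0.0, 0.0],
    }


def forward(params, x):
    """Equation (3): y = W2 * gelu(Norm(W1 x + b1)) + b2."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(params["W1"], params["b1"])]
    hid = [gelu(v) for v in layer_norm(pre, params["gamma"], params["beta"])]
    return [sum(w * h for w, h in zip(row, hid)) + b
            for row, b in zip(params["W2"], params["b2"])]


def count_params(params):
    """Total number of scalar entries in all weight matrices and vectors."""
    total = 0
    for value in params.values():
        if isinstance(value[0], list):
            total += sum(len(row) for row in value)
        else:
            total += len(value)
    return total


params = init_params(32)
x = [0.1, -0.4, 0.7, 0.2]   # [Re S11, Im S11, Re S21, Im S21], arbitrary values
print(forward(params, x), count_params(params))
```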

Data generation and preprocessing

Training data is generated synthetically using the robust EM forward model described in Section 2, ensuring noise-free ground truth labels. The permittivity domain covers $ϵ^{'} \in [1, 200]$ and $ϵ^{″} \in [0, 100]$ . To rigorously assess generalization rather than interpolation, we employ a shifted-grid strategy for dataset splitting, as in Table 1.

Table 1.

Description of the training, validation, and test datasets, including grid size, number of samples, offsets, and their respective purposes.

Split	Grid size	Samples	Offset
Training	$200 \times 100$	20,000	$(0, 0)$
Validation	$70 \times 35$	2,450	$(0.2, 0.1)$
Test	$70 \times 35$	2,450	$(10, 5)$

Given that the permittivity values span two orders of magnitude, numerical stability is enforced via z-score standardization applied to both inputs and targets.¹⁸ The standardized value ${\tilde{x}}_{i}$ is computed using statistics from the training set ${\tilde{x}}_{i} = \frac{x_{i} - μ_{i}}{σ_{i}}$ where $μ_{i}$ and $σ_{i}$ are the mean and standard deviation of feature i, computed from the training set only. The same transformation is applied to validation and test sets. Predictions are inverse transformed before evaluation.
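The standardization step can be illustrated with a short Python sketch. The helper names (`fit_standardizer`, `transform`, `inverse_transform`) and the use of the population standard deviation are assumptions of this sketch, and the three rows of targets are made-up values used only to exercise the functions.

```python
import statistics


def fit_standardizer(train_rows):
    """Per-feature mean and standard deviation, from the training set only."""
    columns = list(zip(*train_rows))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]  # population std (assumed)
    return means, stds


def transform(rows, means, stds):
    """Apply z = (x - mean) / std feature by feature."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def inverse_transform(rows, means, stds):
    """Map standardized values back to physical units."""
    return [[v * s + m for v, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical targets (eps', eps'') for illustration only.
train_targets = [[1.0, 0.0], [100.0, 50.0], [200.0, 100.0]]
test_targets = [[11.0, 5.0]]

means, stds = fit_standardizer(train_targets)
z_test = transform(test_targets, means, stds)         # reuse training statistics
print(z_test, inverse_transform(z_test, means, stds))
```

Fitting the statistics on the training set alone and reusing them for the validation and test sets avoids leaking information from the held-out data into the preprocessing.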

Training and backpropagation

The training phase is formulated as a global optimization problem where the objective is to minimize a scalar loss function, $L$ , which quantifies the difference between the model's predictions and the ground-truth targets.¹⁹ The goal is to identify the optimal set of parameters $θ^{*}$ that minimizes the expected loss over the data distribution $D$ :

$$θ^{*} = \arg \min_{θ} E_{(x, y) \sim D} [L (\hat{y}, y)]. \tag{4}$$

In this formulation, $θ^{*}$ represents the optimal network weights and biases; $D$ denotes the probability distribution of the input-target pairs $(x, y)$; $\hat{y} = N_{θ} (x)$ is the output predicted by the network $N$ parameterized by $θ$ given the input x; and $E$ denotes the expected value over the dataset. To solve this problem, we employ an iterative gradient-based approach. The training procedure processes the dataset in mini-batches, and for each iteration, the learning process follows a systematic four-step cycle¹⁵:

Forward Propagation. A batch of input vectors is propagated through the network's layers. The input is successively transformed by the current parameters (weights and biases) and non-linear activation functions to generate the predicted output $\hat{y}$ .

Loss Evaluation. A generic loss function $L (\hat{y}, y)$ is evaluated to measure the fidelity of the reconstruction against the ground truth. This function condenses the prediction error of the entire batch into a single scalar value.

Backward Propagation. The sensitivity of the loss with respect to each trainable parameter is computed using the chain rule of calculus. This step calculates the gradient vector $\nabla_{θ} L$ , effectively propagating the error signal backward from the output layer to the input.

Parameter Update. The network parameters are updated using a specific optimization algorithm $O$. This step functions as a computational “black box” where the exact update rule depends on the chosen solver (e.g., SGD, AdamW, or L-BFGS). Mathematically, this operation abstracts the update logic, mapping the current parameters, gradients, and internal optimizer state to a new configuration: $θ_{t + 1} \leftarrow O (θ_{t}, \nabla_{θ} L, S_{t}, η)$. Here, $θ_{t}$ and $θ_{t + 1}$ are the parameter vectors at the current and next time steps, respectively, $\nabla_{θ} L$ is the gradient of the loss function computed in step 3, $S_{t}$ represents the auxiliary internal state of the optimizer (e.g., momentum buffers, velocity vectors, or Hessian approximations) at step t, and $η$ denotes the learning rate, a hyperparameter scaling the magnitude of the update. This cycle repeats for multiple epochs until the validation loss stabilizes, triggering early stopping to prevent overfitting.
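The four-step cycle can be illustrated on a deliberately tiny problem. In the sketch below, a linear model $y = w x + b$ stands in for the MLP, the gradients are written analytically instead of being obtained by automatic differentiation, and SGD with momentum plays the role of the optimizer $O$, with the momentum buffers acting as the state $S_{t}$. All names and numerical settings are assumptions made for illustration, and early stopping is omitted.

```python
import random


def sgd_momentum(theta, grad, state, lr, momentum):
    """One optimizer call O(theta_t, grad, S_t, lr) -> (theta_{t+1}, S_{t+1})."""
    new_state = [momentum * v + g for v, g in zip(state, grad)]
    new_theta = [p - lr * v for p, v in zip(theta, new_state)]
    return new_theta, new_state


def train(data, lr=0.1, momentum=0.5, batch_size=4, epochs=300, seed=0):
    """Fit y = w*x + b by mini-batch gradient descent on the MSE loss."""
    rng = random.Random(seed)
    data = list(data)
    theta = [0.0, 0.0]    # parameters: slope w and intercept b
    state = [0.0, 0.0]    # optimizer state: one momentum buffer per parameter
    loss = float("inf")
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            n = len(batch)
            # 1. Forward propagation
            preds = [theta[0] * x + theta[1] for x, _ in batch]
            # 2. Loss evaluation (mean squared error over the batch)
            loss = sum((p - y) ** 2 for p, (_, y) in zip(preds, batch)) / n
            # 3. Backward propagation (analytical gradient of the MSE)
            grad_w = sum(2.0 * (p - y) * x for p, (x, y) in zip(preds, batch)) / n
            grad_b = sum(2.0 * (p - y) for p, (_, y) in zip(preds, batch)) / n
            # 4. Parameter update
            theta, state = sgd_momentum(theta, [grad_w, grad_b], state, lr, momentum)
    return theta, loss


# Noise-free synthetic data from y = 3x - 1 on 20 equidistant points in [-1, 1].
points = [(-1.0 + 2.0 * i / 19.0, 3.0 * (-1.0 + 2.0 * i / 19.0) - 1.0)
          for i in range(20)]
theta, last_loss = train(points)
print(theta, last_loss)
```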

Hyperparameters optimization

The performance of neural networks is critically dependent on the selection of hyperparameters, where suboptimal choices can lead to poor convergence or overfitting. Manual tuning of these parameters is often inefficient and may fail to uncover the optimal configuration. To systematically explore the configuration space, we employed Optuna,²⁰ an automated hyperparameter optimization framework. We defined a comprehensive search space encompassing both architectural and training parameters. Rather than optimizing a single metric, we formulated a bi-objective problem to balance predictive performance against MLP architecture complexity. Let $λ \in Λ$ represent a hyperparameter configuration vector drawn from the search space $Λ$ . The search space consists of architectural parameters (defining the network structure) and optimization parameters (governing the training dynamics). The specific ranges and distributions were chosen based on established practices for small regression networks.¹⁹

Architectural Parameters determine the capacity and nonlinearity of the neural network.

Hidden Width ($H$): the number of neurons in the hidden layer. We explored the discrete values $\{8, 16, 32, 64\}$.¹⁴

Activation Function: the nonlinear function applied after the linear transformation. The search included ReLU, Tanh, GELU, Leaky_ReLU, and ELU to evaluate different saturation and gradient behaviors.¹⁶

Normalization: techniques to stabilize training dynamics. The options included None, LayerNorm, and BatchNorm.²¹

Dropout Rate: a regularization technique to prevent overfitting. The rate was sampled uniformly from the continuous range $[0.0, 0.4]$.²²

Optimization Parameters control the gradient descent process and loss landscape navigation.

Optimizer ($O$): the algorithm used to update network weights. The categorical choice included Adam, AdamW (for decoupled weight decay), RMSprop, SGD (with momentum), and LBFGS (for full-batch updates).

Learning Rate ($η$): the step size for parameter updates, sampled from a log-uniform distribution in the range $[10^{-4}, 5 \times 10^{-2}]$ to cover multiple orders of magnitude.¹⁹

Batch Size ($B$): the number of samples processed before a parameter update. Options ranged from stochastic mini-batches to full-batch training: $\{8, 16, 32, 64, 128, 256, 512, \text{“ALL”}\}$.¹⁵

Weight Decay ($κ$): the coefficient for L2 regularization, sampled log-uniformly from $[10^{-10}, 10^{-2}]$.²³

Gradient Clipping: a threshold to prevent exploding gradients, selected from $\{\text{None}, 0.5, 1.0, 2.0, 5.0\}$.²⁴

Learning Rate Scheduler: strategies to adapt the learning rate during training, including None, Plateau (reduce on stagnation), and Cosine annealing.²⁵

Loss Function ($L$): the objective function minimized during training, chosen between MSE and Smooth L1 (Huber loss).²⁶

Input Noise ($σ_{noise}$): Gaussian noise added to inputs for data augmentation, with standard deviation sampled uniformly from $[0.0, 0.01]$.²⁷

We define two competing objective functions: 1. Maximize Strict Accuracy ( $f_{1}$ ): The percentage of test samples satisfying the strict 1% error threshold. 2. Minimize Complexity ( $f_{2}$ ): The width of the hidden layer H, which serves as a proxy for inference latency and memory footprint. The optimization problem is formally stated as:

$$\begin{matrix} \underset{λ \in Λ}{\text{maximize}} & f_{1} (λ) = OK_{1} (λ) = \frac{100}{N_{test}} \sum_{i = 1}^{N_{test}} I (e_{i} (λ) \leq 1\%) \\ \underset{λ \in Λ}{\text{minimize}} & f_{2} (λ) = H (λ) \\ \text{subject to} & g (λ) = \max_{i \in \{1, \dots, N_{test}\}} e_{i} (λ) \leq 10\% \end{matrix} \tag{5}$$

where $e_{i} (λ)$ denotes the local relative error

$$e_{i} = \frac{| \hat{y}_{i} - y_{i} |}{| y_{i} |} \tag{6}$$

of the i-th test sample predicted by the model trained with configuration $λ$, y represents the true complex value and $\hat{y}$ is the predicted complex value, $I (\cdot)$ is the indicator function, evaluating to 1 if the condition is met and 0 otherwise, and $g (λ)$ represents the feasibility constraint. This “hard” constraint ensures that valid solutions must not only be accurate on average but must also be robust, preventing catastrophic failures (outliers $> 10\%$) anywhere in the test domain.
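The quantities entering (5) and (6) are simple to compute once predictions are available. The following Python sketch uses hypothetical function names and made-up complex permittivity values purely to exercise the definitions; the thresholds (1% and 10%) are the ones stated above.

```python
def relative_errors(predicted, true):
    """Local relative error (6) for complex values eps' - j*eps''."""
    return [abs(p - t) / abs(t) for p, t in zip(predicted, true)]


def ok1_percent(errors, threshold=0.01):
    """Objective f1: percentage of samples with relative error <= 1%."""
    return 100.0 * sum(e <= threshold for e in errors) / len(errors)


def is_feasible(errors, limit=0.10):
    """Constraint g: the worst-case relative error must not exceed 10%."""
    return max(errors) <= limit


# Made-up values: three predictions off by 0.5% and one off by 5%.
true_eps = [2.0 - 0.5j, 10.0 - 1.0j, 100.0 - 50.0j, 50.0 - 20.0j]
pred_eps = [t * 1.005 for t in true_eps[:3]] + [true_eps[3] * 1.05]

errors = relative_errors(pred_eps, true_eps)
print(ok1_percent(errors), max(errors), is_feasible(errors))
```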

The optimization was conducted using the NSGA-II algorithm,²⁸ which efficiently explores multi-objective spaces by maintaining a population of diverse solutions. To ensure practical utility, we imposed a hard feasibility constraint: any trial yielding a maximum per-sample error exceeding 10% was marked as infeasible and excluded from consideration.
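One possible way to express this search space and the constrained bi-objective study with Optuna is sketched below. It is not taken from the authors' repository: the parameter names, the encoding of full-batch training as `0`, and the callable `train_and_evaluate` (which would train one network and return the $OK_{1}$ percentage and the maximum relative error in percent) are hypothetical. Optuna is imported lazily inside `build_study` so that the search-space function can be read and exercised on its own.

```python
def suggest_config(trial):
    """Sample one hyperparameter configuration from the search space."""
    return {
        "hidden": trial.suggest_categorical("hidden", [8, 16, 32, 64]),
        "activation": trial.suggest_categorical(
            "activation", ["relu", "tanh", "gelu", "leaky_relu", "elu"]),
        "norm": trial.suggest_categorical("norm", ["none", "layer", "batch"]),
        "dropout": trial.suggest_float("dropout", 0.0, 0.4),
        "optimizer": trial.suggest_categorical(
            "optimizer", ["adam", "adamw", "rmsprop", "sgd", "lbfgs"]),
        "lr": trial.suggest_float("lr", 1e-4, 5e-2, log=True),
        # 0 is a hypothetical encoding of "ALL" (full-batch training).
        "batch_size": trial.suggest_categorical(
            "batch_size", [8, 16, 32, 64, 128, 256, 512, 0]),
        "weight_decay": trial.suggest_float("weight_decay", 1e-10, 1e-2, log=True),
        "grad_clip": trial.suggest_categorical(
            "grad_clip", [None, 0.5, 1.0, 2.0, 5.0]),
        "scheduler": trial.suggest_categorical(
            "scheduler", ["none", "plateau", "cosine"]),
        "loss": trial.suggest_categorical("loss", ["mse", "smooth_l1"]),
        "input_noise": trial.suggest_float("input_noise", 0.0, 0.01),
    }


def make_objective(train_and_evaluate):
    """Wrap a user-supplied training routine into a two-objective function."""
    def objective(trial):
        cfg = suggest_config(trial)
        ok1_pct, max_err_pct = train_and_evaluate(cfg)
        # Stored so that the constraint function can read it later.
        trial.set_user_attr("max_err_pct", max_err_pct)
        return ok1_pct, cfg["hidden"]      # (maximize, minimize)
    return objective


def build_study():
    """Create an NSGA-II study with the 10% feasibility constraint."""
    import optuna  # imported here so the functions above do not require it

    def constraints(frozen_trial):
        # Optuna treats values <= 0 as feasible.
        return (frozen_trial.user_attrs["max_err_pct"] - 10.0,)

    sampler = optuna.samplers.NSGAIISampler(constraints_func=constraints)
    return optuna.create_study(directions=["maximize", "minimize"], sampler=sampler)
```

With a real training routine in hand, a run would look like `build_study().optimize(make_objective(train_and_evaluate), n_trials=...)`.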

Numerical results

The NN described above was implemented in Python with the Optuna module, and it is available at https://github.com/bumbeneciconstantinbogdan/S-Param-Inversion, together with the database used for training, validation and testing.

The experimental evaluation assesses the efficacy of the proposed neural network approach against the classical NRW algorithm. The results are derived from a comprehensive hyperparameter study comprising 10,000 trials, of which 6,104 (61.0%) satisfied the feasibility constraint of keeping maximum errors below 10%. The multi-objective optimization identified a Pareto front shown in Figure 4, consisting of four non-dominated configurations, representing the optimal trade-offs between model size ( $H$ ) and strict accuracy ( $O K_{1}$ ). These configurations, detailed below, outline the “efficient frontier” where no single objective can be improved without degrading the other.

Figure 4.

The Pareto front illustrates a steep efficiency curve that flattens rapidly. Upgrading from an ultra-compact $H = 8$ model to $H = 16$ yields a substantial accuracy improvement of 14.4 percentage points ($80.2\% \to 94.6\%$). In contrast, further scaling to $H = 32$ and $H = 64$ provides diminishing returns, adding only 2.6 and 2.2 percentage points, respectively.

Based on this, we distinguish three deployment tiers: Ultra-Low Resource ($H = 8$) – viable for extreme constraints with only $N_{params} = 74$, offering basic estimation (80.2% accuracy), Balanced Efficiency ($H = 32$) – the sweet spot for embedded systems, achieving 97.2% accuracy with 290 parameters, and Maximum Precision ($H = 64$) – the superior choice for high-fidelity tasks, achieving 99.4% accuracy and minimizing maximum error to 2.93%.
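The trade-off can be checked with a few lines of Python. The four $(H, OK_{1})$ pairs below are the values reported in the text; the extra candidate `(32, 90.0)` is a hypothetical dominated point added only to show the filter at work, and the function name `pareto_front` is an assumption of this sketch.

```python
def pareto_front(points):
    """Keep the (H, accuracy) pairs that no other pair dominates.

    A pair dominates another if it has a hidden width that is no larger and
    an accuracy that is no lower, with at least one of the two strictly better.
    """
    front = []
    for h, acc in points:
        dominated = any(
            h2 <= h and acc2 >= acc and (h2 < h or acc2 > acc)
            for h2, acc2 in points
        )
        if not dominated:
            front.append((h, acc))
    return sorted(front)


reported = [(8, 80.2), (16, 94.6), (32, 97.2), (64, 99.4)]   # from the text
candidates = reported + [(32, 90.0)]                          # hypothetical extra

front = pareto_front(candidates)
gains = [round(b[1] - a[1], 1) for a, b in zip(front, front[1:])]
n_params = [9 * h + 2 for h, _ in front]   # N_params = 9H + 2

print(front)
print(gains)
print(n_params)
```

The successive gains shrink as the width doubles, while the parameter count roughly doubles at each step, which is the diminishing-returns pattern described for Figure 4.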

Optimum analysis

While the $H = 64$ model offers maximum precision, the configuration with 32 hidden neurons represents a solid alternative. This configuration achieves a strict accuracy of 97.22%, meaning that nearly all samples satisfy the 1% error threshold. The mean relative error is held to a low 0.314%, and the $R^{2}$ coefficient of determination is 0.99997 for both the real and imaginary permittivity components (Figure 5(a)). Figure 5(b) shows the classification map quantifying the distribution of samples exceeding the 1% error threshold. Note that, with the NN, there are no failures in finding the complex permittivity in the area where NRW failed, shown in Figure 3(a). Figure 5(c) shows the 3D relief map visualizing the magnitude of relative errors across the domain. The accuracy is not as high as that offered by NRW where NRW succeeds, but the model is valid for the whole search domain, the maximum local relative error being less than 6%.

Figure 5.

Accuracy assessment for the network configuration. (a) Scatter plots demonstrating the high correlation between predicted and true permittivity values; (b) Classification map quantifying the distribution of samples exceeding the 1% error threshold; (c) 3D relief map visualizing the magnitude of relative errors across the domain.

Hyperparameters final values

The systematic search done with Optuna identified a consistent “recipe” for success shared across the top-performing models. This consensus configuration, summarized in Table 2, provides a robust template for training regression networks in this domain. The search over 10,000 trials was distributed across 8 parallel processes on a 14-inch MacBook (M4 Pro, 24 GB RAM). Accounting for process synchronization locks and varying convergence times, the total wall-clock time was approximately 7 h. The average execution time of a single optimization trial was approximately 13 s, and the typical loss variation during the epochs is shown in Figure 6.

Figure 6.

Typical variation of the loss for one single trial, during hyperparameter optimization.

Table 2.

Summary of the optimal training components and hyperparameters used in the model.

Component	Optimal choice
Activation Function	GELU
Normalization	Layer Normalization
Optimizer	AdamW
Learning Rate Scheduler	Cosine Annealing
Loss Function	MSE (Mean Squared Error)
Train Batch Size	16
Dropout Rate	$\approx 1 %$
Weight Decay	$\approx 10^{- 7}$

The preference for GELU activation and Layer Normalization was particularly strong, appearing in all Pareto-optimal trials. This consistency suggests that the smooth non-linearity of the GELU function, which avoids the vanishing gradient issues often encountered with standard ReLUs, is essential for accurately approximating the continuous physical mapping of complex permittivity. Furthermore, the dominance of Layer Normalization in the sensitivity analysis indicates that stabilizing activations on a per-sample basis is far more effective than batch-dependent methods for this specific inverse problem.

To understand the drivers of model performance, we conducted a sensitivity analysis using the fANOVA framework.²⁹ The result is shown in Figure 7.

Figure 7.

The analysis reveals that layer normalization is the single most important factor for achieving high accuracy, followed by network capacity (hidden size) and dropout regularization. Learning rate and weight decay, often considered critical, show relatively low importance, suggesting that the optimal ranges are broad and robust.

Conclusions

This paper proposes a methodology based on NNs to replace the standard NRW computational algorithm in the case of non-magnetic, isotropic materials. Our methodology uses a feed-forward network to find a good approximation of the dependency between the input (the S parameters, 2 complex numbers) and the output (the components of the complex permittivity). The chosen architecture for the NN was an MLP with one hidden layer, its hyperparameters being tuned automatically. This strategy proved to be successful for S inputs for which NRW fails.

The use of the NN can be seen as an exploration strategy of the search space for the inverse problem. This exploration could also be done with a stochastic optimization. However, the availability of the NN, valid for a large range of the parameters, saves the time needed by the stochastic optimization, which has to be performed again for any new S input parameters. The directions to be explored in the future are: the extension to the frequency domain, and then to the case of magneto-dielectric materials, where both the complex permittivity and permeability need to be found; the use of a training dataset in which points are placed in the search domain in an adaptive manner; and the exploration of alternative NN architectures and training strategies.

Footnotes

ORCID iDs

Constantin Bogdan Bumbeneci

Ruxandra Barbulescu

Anton Duca

Gabriela Ciuprina

Ethical considerations

Ethical approval was not required.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Author contributions statement

GC and BB proposed and presented the main idea of the paper.

BB was involved in the design and implementation of the inverse problem, the NRW procedure, simulations and data collection.

GC wrote the initial version of the paper and code. BB implemented the Python version of the code with the Optuna module, and carried out the simulations, data collection and results analysis.

AD and RB supervised the design process, configuration and parameters tuning for the inverse problem, data collection and results analysis.

All authors were involved in the writing and reviewing process of the manuscript.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Haykin. Neural networks and learning machines. 3rd ed. Prentice Hall, 2009.

2. Nicolson, Ross. Measurement of the intrinsic properties of materials by time-domain techniques. IEEE Trans Instrum Meas 1970; 19: 377–382.

3. Weir. Automatic measurement of complex dielectric constant and permeability at microwave frequencies. Proc IEEE 1974; 62: 33–36.

4. Costa, Borgese, Degiorgi, et al. Electromagnetic characterisation of materials by using transmission/reflection (T/R) devices. Electronics (Basel) 2017; 6: 95.

5. Necolau, Biru, Aldrigo, et al. A ternary multiscale nanocomposite system based on functionalized graphene oxide, carbon fibers and bio-based polybenzoxazine for electromagnetic shielding. Mater Adv 2025; 23. DOI: 10.1039/D5MA00343A.

6. Yang, Huang. Determination of complex permittivity of low-loss materials from reference-plane invariant transmission/reflection measurements. IEEE Access 2019; 7: 131865–131872.

7. Crowgey. Rectangular waveguide material characterization: anisotropic property extraction and measurement validation. PhD thesis, Michigan State University, 2013. https://d.lib.msu.edu/etd/2235

8. de Paula, Rezende, Barroso. Modified Nicolson-Ross-Weir (NRW) method to retrieve the constitutive parameters of low-loss materials. In: 2011 SBMO/IEEE MTT-S international microwave and optoelectronics conference (IMOC 2011), 2011, pp.488–492.

9. Chen, Zhang, Wang, et al. An improved NRW method for thin material characterization using dielectric filled waveguide and numerical compensation. IEEE Trans Instrum Meas 2022; 71: 1–9.

10. Susek, Dukata, Pomaranska. A formal approach to the extraction of permittivity and permeability of isotropic and anisotropic media using the TM11 mode in rectangular waveguides. Electronics (Basel) 2023; 12: 2899.

11. Requena-Perez, Albero-Ortiz, Monzo-Cabrera, et al. Combined use of genetic algorithms and gradient descent optimization methods for accurate inverse permittivity measurement. IEEE Trans Microwave Theory Tech 2006; 54: 615–624.

12. Eugene, Kopyt, Yakovlev. Determination of complex permittivity with neural networks and FDTD modeling. Microw Opt Technol Lett 2004; 40: 183–188.

13. Ciuprina, Aldrigo, Iordănescu, et al. Optimization-based algorithm with a robust de-embedding technique for microwave characterization of isotropic materials. In: 2025 international semiconductor conference (CAS), Sinaia, Romania, 2025, pp.83–86. DOI: 10.1109/CAS66707.2025.11222237.

14. Hornik, Stinchcombe, White. Multilayer feedforward networks are universal approximators. Neural Netw 1989; 2: 359–366.

15. Goodfellow, Bengio, Courville. Deep learning. Cambridge, MA, USA: MIT Press, 2016, pp.267–290. http://www.deeplearningbook.org

16. Hendrycks, Gimpel. Gaussian Error Linear Units (GELUs). arXiv preprint arXiv:1606.08415. 2016. https://arxiv.org/abs/1606.08415

17. Kiros, Hinton. Layer normalization. arXiv preprint arXiv:1607.06450. 2016. https://arxiv.org/abs/1607.06450

18. LeCun, Bottou, Orr, et al. Efficient backprop. In: Montavon, Orr, Müller (eds) Neural networks: tricks of the trade. 2nd ed. Berlin, Heidelberg: Springer, 2012, pp.9–48.

19. Bergstra, Bengio. Random search for hyper-parameter optimization. J Mach Learn Res 2012; 13: 281–305.

20. Akiba, Sano, Yanase, et al. Optuna: a next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery and data mining, 2019.

21. Ioffe, Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd international conference on machine learning (ICML), vol. 37, Lille, France, 2015, pp.448–456.

22. Srivastava, Hinton, Krizhevsky, et al. Dropout: A simple way to prevent neural networks from overfitting. J Mach Learn Res 2014; 15: 1929–1958.

23. Krogh, Hertz. A simple weight decay can improve generalization. In: Advances in neural information processing systems (NIPS), vol. 4, Denver, CO, USA, 1992, pp.950–957.

24. Pascanu, Mikolov, Bengio. On the difficulty of training recurrent neural networks. In: Proceedings of the 30th international conference on machine learning (ICML), vol. 28, Atlanta, GA, USA, 2013, pp.1310–1318.

25. Loshchilov, Hutter. SGDR: stochastic gradient descent with warm restarts. In: Proceedings of the international conference on learning representations (ICLR), Toulon, France, 2017. https://arxiv.org/abs/1608.03983

26. Huber. Robust estimation of a location parameter. The Annals of Mathematical Statistics 1964; 35: 73–101.

27. Bishop. Training with noise is equivalent to Tikhonov regularization. Neural Comput 1995; 7: 108–116.

28. Deb, Pratap, Agarwal, et al. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput 2002; 6: 182–197.

29. Hutter, Hoos, Leyton-Brown. An efficient approach for assessing hyperparameter importance. In: Proceedings of the 31st international conference on machine learning (ICML), vol. 32, Beijing, China, 2014, pp.754–762. https://proceedings.mlr.press/v32/hutter14.html

Neural network alternative to the Nicholson-Ross-Weir algorithm for complex permittivity extraction of non-magnetic materials at microwave frequencies
