
Abstract

In this study, we introduce a machine learning-based method to predict the modeling parameters of superelastic shape memory alloys (SMAs). Our goal is to simultaneously determine and fine-tune all internal and material-related parameters, including thermodynamic ones, for a specific constitutive model using only cyclic tensile tests. We employ feedforward neural networks (FNNs) for their versatile structure. First, we sample the searched parameters within a predefined parameter space using the Latin hypercube sampling method. Then, using the constitutive model with the sampled parameters and representative strain loading, we generate the corresponding stress responses and finally train the FNN. To address the ill-posed nature of this inverse parameter identification problem and ensure a unique parameter set, we use a dual network architecture during training, with an additional FNN-based surrogate of the constitutive model. We also utilize transfer learning to accelerate the training process through knowledge transfer and handle multiple load cases simultaneously, ensuring consistent parameter identification across different scenarios. We validate the method by comparing the numerical results with the experimental data and demonstrate the importance of accurately identified parameter sets by numerical investigations on an SMA-retrofitted frame structure.

Keywords

Parameter identification, feedforward neural networks, shape memory alloys, superelastic, constitutive model, ill-posed problems, transfer learning

1. Introduction

Shape memory alloys (SMAs) constitute a class of two-phase polycrystalline metals that respond superelastically (pseudoelastically) during austenitic steady-state. Under dynamic loading, energy dissipation occurs due to repeated phase transformations between the austenite and martensite states, as described in detail by Altay (2021). In addition to their exceptional energy dissipation capabilities, SMAs possess distinct attributes, such as high deformation recovery, resistance to corrosion, and low fatigue. Compared to the shape memory effect, superelastic behavior is solely driven by mechanical loads without the need for additional temperature input. Consequently, superelastic SMAs have attracted substantial interest in the field of civil engineering, particularly in structural vibration control. For instance, Han et al. (2005) developed an SMA wire-based damper, which simultaneously reduces vibrations under tension, compression, and torsion loading. Furthermore, AlSaleh et al. (2012) studied the vibration behavior of SMA wires in retrofitting applications. More recently, Nekouei et al. (2020) conducted numerical studies on the dynamic behavior of hybrid laminated composite cylindrical shells reinforced with SMA fibers. A comprehensive overview of applications related to seismic protection can be found in the book of Fang and Wang (2020). The recent review of Tabrizikahou et al. (2022) includes examples of further applications.

The design of control devices incorporating SMAs requires accurate constitutive models. A notable uniaxial model was introduced by Auricchio et al. (2008). For numerical efficiency, macroscopic models are generally favored. Common forms of SMAs, such as wires, cables, or rods, are preferred in control devices due to their ability to facilitate efficient heat transfer. While these forms are prevalent, other SMA configurations exist, with examples provided by Fang (2022). For a comprehensive review of various SMA modeling techniques, the reader is referred to the work of Cisse et al. (2016) and the references therein.

Particularly during dynamic loading, SMAs may exhibit extensive thermodynamic effects. Their accurate representation in constitutive models poses a significant challenge and necessitates the precise identification of model parameters. Beyond conventional cyclic tensile tests, additional experiments, such as those for thermodynamic parameters, might be necessary (Elwaleed (2018)). Furthermore, for internal model parameters that lack a direct correlation with any physical quantity, a subsequent tuning procedure is usually required. Gradient-based optimization algorithms are often employed to solve these problems, such as in Hartloper et al. (2021). However, for high-dimensional parameter spaces and ill-posed problems, gradient-based algorithms can become computationally inefficient due to ill-conditioned objective functions. To overcome these challenges, machine learning techniques, such as artificial neural networks (ANNs), offer efficient solutions. In this context, ANNs are advantageous as they are robust to noisy data, can handle non-linear problems, offer end-to-end learning, and are easy to implement. Although the weights and biases of ANNs must be optimized, the optimization problem is often easier to solve because, in this case, the gradients are not computed directly with respect to the desired parameters themselves. Instead, the ANN learns the mapping between the desired functions and the corresponding parameters over a whole parameter space. Therefore, the gradients of the objective function are computed with respect to the weights and biases of the network. To train ANNs effectively, the data set must be of high quality with a sufficient quantity of representative data pairs. In addition, selecting an appropriate set of hyperparameters is crucial to provide accurate predictions.

Similar techniques have been adopted for the forward modeling of SMAs. For instance, Ozbulut and Hurlebaus (2010) employed an adaptive neuro-fuzzy inference system (ANFIS) for seismic applications. Another approach, recently proposed by Shao and Andrawes (2022), utilized feedforward neural networks (FNNs), a type of ANNs, to predict the seismic drift of a reinforced concrete column retrofitted with superelastic SMA wires. Additional examples of forward modeling in other SMA applications, such as actuators, can be found in the review by Hmede et al. (2022).

For the identification of SMA parameters, optimization-based methods have been proposed, such as by Meraghni et al. (2014). Although these methods have proven to be powerful, the advent of ANNs has introduced techniques that are efficient and easier to implement. A pioneering example of this was presented by Helm (2004), who employed FNNs with stress as an input for SMAs under quasi-static loading. Furthermore, for SMA actuators, Henrickson et al. (2013) employed an FNN architecture, which required strain and temperature responses as input.

In our prior study (Lenzen and Altay (2022)), we introduced a deep FNN architecture that required only stress responses as input to identify SMA parameters under dynamic loading. We intentionally chose FNNs in our method primarily due to their adaptability across different scenarios and their simplicity in terms of hyperparameter tuning and model fitting. Compared to other network architectures, FNNs offer a less complex setup, which significantly eases the process of training. Additionally, FNNs are known for their computational efficiency, making them more practical and faster to deploy than their more complex counterparts. In this case, we implemented the methodology for a constitutive model proposed by Zhu and Zhang (2007), identifying its three thermodynamic parameters and one internal parameter. Our subsequent studies (Lenzen et al. (2023)) have revealed that as the number of parameters increases, the precision can be compromised due to the ill-posed nature of the identification process. One potential solution could be to constrain the parameter space. However, this approach would require expert knowledge about the SMA specimens, which might not always be available.

In the present study, we address this challenge by pairing the initial FNN (inverse model) with a second FNN (forward model) that has been pretrained to serve as a surrogate for the constitutive model. Supervised training of these coupled FNNs requires only stress responses. Originally adopted in other fields (Long et al. (2019); Kumar et al. (2020)), this dual network architecture enables the inverse model to discern unique solutions to identify model parameters based on stress responses of the SMAs. We train for each relevant load case separately and employ transfer learning (Bozinovski (2020)) to accommodate multiple load cases in the identification process. As a result, we simultaneously ensure consistent parameter identification across different load cases. The key contributions of this study are:

• Simultaneous identification of a vast array of parameters through tensile testing, eliminating the need for additional thermodynamic experiments and calibration.

• Resolution of the ill-posed nature of the parameter identification problem through the implementation of a dual network architecture, which is critical for robust parameter estimation in SMA models.

• Simultaneous handling of multiple load cases through transfer learning, ensuring consistent parameter identification across different scenarios.

We would like to emphasize that the proposed method does not aim to model the responses directly; instead, it provides parameters for existing material models. Our work addresses the critical gap of parameter identification of SMA models, an area that continues to be an open research issue. Existing methods, such as gradient-based approaches, often struggle due to the ill-posed nature of this problem. Accurate parameter identification is particularly essential for the successful implementation of complex material models, such as in applications involving dampers in civil engineering structures.

The structure of this paper is organized as follows. The section titled Parameter Identification Methodology introduces the identification method and details its implementation within the macroscopic models proposed by Auricchio et al. (2008) and Zhu and Zhang (2007), models that require numerous parameters. In the Results and Discussion section, we explore various influences on the training performance and accuracy of our method, including network architecture, sample size, and transfer learning. This is followed by the presentation of experimental results, serving to validate our findings. We also demonstrate the importance of accurately identified parameter sets by numerical investigations on a shear frame structure retrofitted with dampers incorporating superelastic SMA wires. We conclude with the Conclusions section, summarizing the key insights from our study and suggesting directions for future research.

2. Parameter Identification Methodology

2.1. Rate-dependent superelastic material response

SMAs are distinguished by their two crystalline phase states: austenite (A) and martensite (M). When the material temperature exceeds the austenite finish temperature A_f, SMAs exhibit superelastic behavior with austenite as the parent phase. Under high mechanical stresses, a forward phase transformation (A → M) occurs, causing SMA crystals to reorient their atomic lattice from a body-centered structure to a monoclinic structure, which is more stable at higher stresses. During unloading, a reverse phase transformation (M → A) follows, restoring the material to its initial shape with no residual deformation. Both forward and reverse phase transformations lead to pseudoelastic deformations, which are manifested as stress plateaus in the stress–strain response.

The forward transformation is exothermic and generates internal heat, which is dissipated to the environment through heat convection and conduction. In contrast, the reverse transformation is accompanied by endothermic processes, causing a reduction in the material temperature during austenite formation. For high-rate cyclic loading, delays in heat transfer to the surroundings cause an increase in material temperature and simultaneously suppress the forward transformation as described by the Clausius–Clapeyron relation and evidenced by studies, such as those conducted by Kaup et al. (2021a).

At high strain rates, SMAs start the forward transformation at higher stress levels. In this case, the martensite formation is suppressed by the increase in the material temperature. During unloading, austenite formation is favored so that the reverse transformation is initiated at higher stress levels. As a result, the hysteretic surface shows significant changes, indicating a direct effect on the energy dissipation. Predicting this dynamic response requires complex models with multiple parameters, which need to be identified and tuned accurately.

2.2. Constitutive modeling

To demonstrate the implementation of parameter identification, we use two different uniaxial thermomechanical models. The first model was originally developed by Auricchio et al. (2008) and was later extended by Kaup et al. (2019). The second model originates from Zhu and Zhang (2007) and was updated in Kaup et al. (2021b).

2.2.1. Auricchio model

The model computes from strain ɛ and strain rate $\dot{ε}$ the resulting stress σ, material temperature T, and the volume fraction of martensite ξ. In this study, we focus only on wires subjected to tensile strain and accordingly formulate the free energy as

\begin{array}{l} ψ = [(u_{A} - T η_{A}) - ξ (Δ u - T Δ η)] \\ + C [(T - T_{0}) - T \ln \frac{T}{T_{0}}] + \frac{1}{2} E {(ε - ε_{l} ξ)}^{2}, \end{array}

(1)

where u_A is the internal energy of the austenite phase; η_A is the initial entropy of the austenite phase; Δu is the internal energy change between the austenite and martensite phases; Δη is the time-dependent change in entropy between the austenite and martensite phases; C is the heat capacity; T₀ is the ambient/reference temperature; E is the Young’s modulus; ɛ_l is the maximum inelastic strain at ξ = 100%. The ambient temperature T₀ is treated as a variable parameter. This design allows the models to accurately simulate the responses of the SMAs under varying ambient temperature conditions. By incorporating T₀ as a variable, the models can adapt to different temperature scenarios without the need for recalibration or adjustment of the core model parameters. According to Kaup et al. (2019), the change in entropy is expressed as

Δ η = - \frac{\partial ψ}{\partial T} - η_{A} = \frac{C}{1 + ξ} \ln \frac{T}{T_{0}}

(2)

depending on the variables ξ and T. The Young’s modulus is formulated according to Auricchio and Sacco (1997) as a function of the martensite volume fraction to allow for a transition between the two phases as

E = \frac{E_{A} E_{M}}{E_{M} + ξ (E_{A} - E_{M})},

(3)

where E_A and E_M are constants representing the associated elastic moduli of the austenite and martensite phases, respectively. The heat equation is derived from the first law of thermodynamics as

\begin{array}{l} C \dot{T} = - \frac{\partial ψ}{\partial ξ} \dot{ξ} + T \frac{\partial^{2} ψ}{\partial T \partial ε} \dot{ε} + T \frac{\partial^{2} ψ}{\partial T \partial ξ} \dot{ξ} - γ (T - T_{0}) \\ = σ ε_{l} \dot{ξ} + Δ u \dot{ξ} - γ (T - T_{0}), \end{array}

(4)

where γ is the heat convection coefficient. The stress response is derived from the free energy as

σ = \frac{\partial ψ}{\partial ε} = E (ε - ε_{l} ξ) .

(5)

The evolution of the martensite fraction ξ is expressed through the following first-order differential equations

A \to M : \dot{ξ} = β^{A M} (1 - ξ) \frac{\dot{F}}{{(F - R_{f}^{A M})}^{2}} H^{A M},

(6a)

M \to A : \dot{ξ} = β^{M A} ξ \frac{\dot{F}}{{(F - R_{f}^{M A})}^{2}} H^{M A},

(6b)

where F is the driving force; $R_{f}^{A M / M A}$ are the thermo-coupled limit stress levels; β^AM/MA are the speed parameters controlling the rate effects on the phase transformation; and $H^{A M / M A}$ are defined as the activation factors that are expressed as

H^{A M} = \begin{cases} 1 & \text{when } \dot{F} > 0 \text{ and } R_{s}^{A M} < F < R_{f}^{A M}, \\ 0 & \text{otherwise}, \end{cases}

(7a)

H^{M A} = \begin{cases} 1 & \text{when } \dot{F} < 0 \text{ and } R_{f}^{M A} < F < R_{s}^{M A}, \\ 0 & \text{otherwise}, \end{cases}

(7b)

where

F = σ - T \frac{Δ η}{ε_{l}},

(8)

\begin{array}{l} R_{s / f}^{A M} = σ_{s / f}^{A M} - T_{0} \frac{Δ η}{ε_{l}} \end{array}

(9a)

\begin{array}{l} R_{s / f}^{M A} = σ_{s / f}^{M A} - T_{0} \frac{Δ η}{ε_{l}} . \end{array}

(9b)

Here, $σ_{s / f}^{A M / M A}$ represent the critical stress levels at which phase transformations start/finish at the ambient/reference temperature T₀. Moreover, the speed parameter β^AM is an internal parameter, while β^MA is formulated in an entropy-dependent form as

β^{M A} = 1 + \frac{Δ η}{η_{A}}

(10)
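The algebraic relations of the Auricchio model can be collected in a few small functions. The following is a minimal sketch, not the authors' implementation: it covers Equations (2), (5), (6a), (7a), (7b), (8), and (9) only, takes the Young's modulus E as an input, and leaves out the heat equation and the time integration. The function names, the critical stresses of 200 and 400 MPa, and the values C = 5 MPa/K and β^AM = 2 are hypothetical choices for illustration. Strain is entered as a fraction, so ɛ_l = 3.3% becomes 0.033.

```python
import math


def entropy_change(xi, T, T0, C):
    """Eq. (2): entropy change between austenite and martensite."""
    return C / (1.0 + xi) * math.log(T / T0)


def stress(eps, xi, E, eps_l):
    """Eq. (5): stress from total strain and martensite fraction."""
    return E * (eps - eps_l * xi)


def driving_force(sigma, T, d_eta, eps_l):
    """Eq. (8): thermodynamic driving force F."""
    return sigma - T * d_eta / eps_l


def limit_stress(sigma_crit, T0, d_eta, eps_l):
    """Eqs. (9a)/(9b): thermo-coupled limit stress R for one critical stress."""
    return sigma_crit - T0 * d_eta / eps_l


def activation_AM(F, F_dot, R_s, R_f):
    """Eq. (7a): forward transformation is active during loading inside (R_s, R_f)."""
    return 1.0 if (F_dot > 0.0 and R_s < F < R_f) else 0.0


def activation_MA(F, F_dot, R_s, R_f):
    """Eq. (7b): reverse transformation is active during unloading inside (R_f, R_s)."""
    return 1.0 if (F_dot < 0.0 and R_f < F < R_s) else 0.0


def xi_rate_AM(xi, F, F_dot, R_s, R_f, beta_AM):
    """Eq. (6a): martensite rate during the forward transformation."""
    if activation_AM(F, F_dot, R_s, R_f) == 0.0:
        return 0.0
    # F < R_f holds whenever the activation factor is 1, so no division by zero.
    return beta_AM * (1.0 - xi) * F_dot / (F - R_f) ** 2


# Illustrative state at T = T0, where Eq. (2) gives a zero entropy change.
E_A, eps_l = 32350.0, 0.033               # MPa and -, values reported in Section 3.1
sigma_s_AM, sigma_f_AM = 200.0, 400.0     # MPa, hypothetical, inside the Table 3 ranges
C, beta_AM = 5.0, 2.0                     # hypothetical
T0 = T = 296.0                            # K
xi, eps = 0.0, 0.01                       # fully austenitic, 1 % strain

d_eta = entropy_change(xi, T, T0, C)
sigma = stress(eps, xi, E_A, eps_l)       # E = E_A at xi = 0 according to Eq. (3)
F = driving_force(sigma, T, d_eta, eps_l)
R_s = limit_stress(sigma_s_AM, T0, d_eta, eps_l)
R_f = limit_stress(sigma_f_AM, T0, d_eta, eps_l)
rate_loading = xi_rate_AM(xi, F, 100.0, R_s, R_f, beta_AM)     # F_dot > 0
rate_unloading = xi_rate_AM(xi, F, -100.0, R_s, R_f, beta_AM)  # F_dot < 0
print(sigma, F, rate_loading, rate_unloading)
```

Because the rate of F itself depends on the rates of σ, ξ, and T, a full simulation has to solve these relations together with the heat equation (4), which this sketch deliberately omits.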

The required model parameters are detailed in Table 1. To ensure accurate modeling of the dynamic response, these parameters need precise identification. It is worth mentioning that the internal energy of the austenite phase, u_A, is not required to calculate the response of the material and is therefore not included here. Some material parameters (E_A, E_M, ɛ_l) can be determined from quasi-static tensile tests and are usually provided by manufacturers. On the other hand, others ($σ_{s / f}^{A M / M A}$, C, γ) may require further tests, including thermodynamic studies, to identify them. Furthermore, there are also three parameters (Δu, η_A, and β^AM) that cannot be directly derived from experiments and thus necessitate tuning. It is also important to note that multiple combinations of parameters can represent the same stress response. This can be observed in Equations (6a) and (9a), which demonstrate a direct interaction between the effects of speed parameters, critical stresses, and the difference in entropy. Consequently, the parameter identification process becomes highly ill-posed.

Table 1.

Parameters required by the Auricchio model.

Category	Parameter	Symbol
Material parameters	Young’s modulus-austenite	E_A
	Young’s modulus-martensite	E_M
	Max. inelastic strain at ξ = 1	ɛ_l
	Critical stress levels	$σ_{s / f}^{A M / M A}$
	Heat capacity	C
	Heat convection coefficient	γ
Internal parameters	Internal energy change	Δu
	Initial entropy-austenite	η_A
	Speed parameter A → M	β^AM

2.2.2. Zhu and Zhang model

The model is strain- and strain rate-driven and computes the resulting temperature T, martensite volume fraction ξ and stress σ responses based on the following free energy formulation:

\begin{array}{l} ψ = \frac{E}{2 ρ} {(ε - ε_{l} ξ)}^{2} + \frac{L}{T_{c r}} (T - T_{c r}) ξ \\ - C_{p} T \ln (\frac{T}{T_{0}}), \end{array}

(11)

where E is the homogenized Young’s modulus; ρ is the density; ɛ_l is the maximum strain at ξ = 100%; L is the latent heat; T_cr is the critical temperature relevant for the critical stress levels of the phase transformation; C_p is the specific heat; and T₀ is the ambient temperature. In contrast to the Auricchio model, the homogenized Young’s modulus is formulated according to Liang (1990), and Sato and Tanaka (1988) as

E = E_{A} + ξ (E_{M} - E_{A}),

(12)

where E_A and E_M are the elastic moduli of the austenite and martensite phases (cf. Equation (3)). The heat equation is derived according to Equation (4) as

\begin{array}{l} C_{p} \dot{T} = \frac{σ}{ρ} ε_{l} \dot{ξ} + \frac{E_{A} - E_{M}}{2 ρ} {(ε - ε_{l} ξ)}^{2} \dot{ξ} + L \dot{ξ} \\ - \frac{k}{V ρ} (T - T_{0}), \end{array}

(13)

where k and V are the heat-transfer coefficient and the specimen volume, respectively. According to Kaup et al. (2021b), to improve the thermal energy calculation, the model uses the latent heat L as an internal variable, which is formulated as a function of the strain:

L = c_{1} e^{c_{2} ε} + L_{0},

(14)

where c₁ and c₂ are internal coefficients; and L₀ is the initial latent heat. The stress response is computed as given in Equation (5). Furthermore, contrary to the Auricchio model, the martensite evolution is formulated as defined in Liang and Rogers (1997). Accordingly, its rate form is computed by

A \to M : \dot{ξ} = - \frac{e^{x}}{{(1 + e^{x})}^{2}} \left( \dot{T} - \frac{\dot{σ}}{c_{M}} \right) a_{M} b,

(15a)

M \to A : \dot{ξ} = - \frac{e^{x}}{{(1 + e^{x})}^{2}} \left( \dot{T} - \frac{\dot{σ}}{c_{A}} \right) a_{A} b,

(15b)

where

x = a_{A / M} (T - T_{c r} - \frac{σ}{c_{A / M}}),

(16)

with

a_{A / M} = \frac{\ln (10000)}{| T_{f}^{A M / M A} - T_{s}^{A M / M A} |},

(17a)

T_{c r} = \frac{T_{s}^{A M / M A} + T_{f}^{A M / M A}}{2} .

(17b)

Here, $T_{s / f}^{A M / M A}$ correspond to the start and finish temperatures of the austenite and martensite phase transformations, respectively; c_A/M denotes the critical stress–temperature (σ-T) slope of the austenite and martensite phase transformations; and b is the initial austenite or martensite fraction. To indicate the start and finish stresses of the phase transformations, four critical stress levels are defined as

σ_{s / f}^{A M / M A} = c_{A / M} (T - T_{s / f}^{A M / M A}) .

(18)

To simultaneously solve the differential equations, time integration algorithms can be utilized, such as the fourth-order Runge–Kutta method. The model parameters are listed in Table 2. As stated before, the parameters E_A, E_M, and ɛ_l can be determined from quasi-static tensile tests. Those parameters as well as the density ρ are usually provided by the manufacturers. However, to enable accurate modeling of the dynamic response, precise calibration of the remaining thermodynamic and internal parameters is required. Similar to the Auricchio model, the parameter identification process is highly ill-posed due to multiple possible combinations of parameters. This can be observed in Equation (13), where for instance the specific heat C_p directly interacts with the latent heat.
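As an illustration of the time integration mentioned above, the following sketch implements one classical fourth-order Runge–Kutta step and applies it to a deliberately reduced problem: only the convective term of Equation (13) with no phase transformation, which has a closed-form solution to compare against. The lumped cooling rate `lam` (standing for k/(VρC_p)) and the initial temperature are hypothetical values; the step size and the number of steps follow the Δt = 0.001 s and J = 200 used later in the paper. The coupled model would use the same step with a state vector instead of a scalar.

```python
import math


def rk4_step(f, t, y, dt):
    """Advance y' = f(t, y) by one classical fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + 0.5 * dt, y + 0.5 * dt * k1)
    k3 = f(t + 0.5 * dt, y + 0.5 * dt * k2)
    k4 = f(t + dt, y + dt * k3)
    return y + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


# Reduced problem (assumption): pure convective cooling, T' = -lam * (T - T0).
T0 = 296.0        # ambient temperature in K
lam = 2.0         # hypothetical lumped cooling rate k / (V * rho * C_p) in 1/s
dt, n_steps = 0.001, 200


def cooling(t, T):
    return -lam * (T - T0)


T = 306.0         # hypothetical initial wire temperature in K
for step in range(n_steps):
    T = rk4_step(cooling, step * dt, T, dt)

# Closed-form solution of the reduced problem for comparison.
T_exact = T0 + (306.0 - T0) * math.exp(-lam * dt * n_steps)
print(T, T_exact)
```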

Table 2.

Parameters required by the Zhu and Zhang model.

Category	Parameter	Symbol
Material parameters	Young’s modulus-austenite	E_A
	Young’s modulus-martensite	E_M
	Max. inelastic strain at ξ = 1	ɛ_l
	Density	ρ
	Volume	V
	Start/finish temperatures	$T_{s / f}^{A M / M A}$
	Critical σ-T slope-austenite	c_A
	Critical σ-T slope-martensite	c_M
	Specific heat	C_p
	Heat-transfer coefficient	k
Internal parameters	Latent heat coefficients	c₁, c₂
	Initial latent heat	L₀
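The two models interpolate the Young's modulus differently: Equation (3) is a harmonic-type mixture of the phase moduli, whereas Equation (12) is a linear one. The short sketch below evaluates both for the moduli reported in Section 3.1 (E_A = 32,350 MPa, E_M = 18,550 MPa); the function names are illustrative only. Both rules agree at ξ = 0 and ξ = 1, and the harmonic-type rule never exceeds the linear one in between.

```python
def modulus_auricchio(xi, E_A, E_M):
    """Eq. (3): harmonic-type interpolation used in the Auricchio model."""
    return E_A * E_M / (E_M + xi * (E_A - E_M))


def modulus_zhu_zhang(xi, E_A, E_M):
    """Eq. (12): linear interpolation used in the Zhu and Zhang model."""
    return E_A + xi * (E_M - E_A)


E_A, E_M = 32350.0, 18550.0   # MPa, values reported in Section 3.1

for xi in (0.0, 0.25, 0.5, 0.75, 1.0):
    e3 = modulus_auricchio(xi, E_A, E_M)
    e12 = modulus_zhu_zhang(xi, E_A, E_M)
    print(f"xi = {xi:4.2f}: Eq. (3) = {e3:9.1f} MPa, Eq. (12) = {e12:9.1f} MPa")
```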

2.3. Parameter identification

The constitutive model can be expressed as $M ((ε, \dot{ε}), p)$, where the vectors ɛ and $\dot{ε}$ represent the time histories of the strain and the strain rate, respectively. The vector p contains all the parameters required by the model. Our goal is to develop an FNN-based approach to identify these parameters using only cyclic tensile tests. As depicted in Figure 1, the preparation of the corresponding FNN requires two steps: (a) data generation, and (b) training phase.

Figure 1.

Steps involved in preparation of the inverse model. (a) Data generation: Parameter samples are fed into the constitutive model to produce the corresponding stress outputs. (b) Training: The inverse model $I$ is trained to estimate the parameters from stress responses. A pretrained forward model $F$ is coupled to reconstruct the stress responses from the estimated parameters. The estimated stresses are then compared with the target stresses.

The data set $D$ for FNN training is generated using the constitutive model for different combinations of parameters and representative load cases. In total, I combinations of searched parameters p_i are sampled from a predefined parameter space using the Latin hypercube sampling (LHS) (McKay et al. (2000)). The LHS is chosen as it strategically balances the exploration of the entire parameter space by ensuring a representative and well-distributed set of samples across intervals. However, other sampling strategies can be chosen, such as Sobol sequence (Sobol (1967)).
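A minimal sketch of Latin hypercube sampling is given below, using only the Python standard library. It is not the authors' code: the function name and the seed handling are illustrative. For each parameter, the interval is split into as many equal strata as there are samples, one point is drawn per stratum, and the strata are shuffled independently per parameter. The two bounds in the example are the ranges of C and γ from Table 3.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that every stratum of every parameter is hit once."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniformly placed point inside each of the n equal strata of [0, 1).
        unit = [(k + rng.random()) / n_samples for k in range(n_samples)]
        # Shuffle so that strata are paired randomly across parameters.
        rng.shuffle(unit)
        columns.append([low + u * (high - low) for u in unit])
    # Transpose: one row per sampled parameter combination p_i.
    return [list(row) for row in zip(*columns)]


bounds = [(0.1, 10.0), (0.0, 1.0)]   # ranges of C and gamma from Table 3
samples = latin_hypercube(1000, bounds)
print(len(samples), samples[0])
```

In practice, a library routine such as SciPy's `scipy.stats.qmc.LatinHypercube` could be used instead; the hand-written version is shown only to make the stratification explicit.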

To cover all characteristics of the investigated SMA wires, such as strain rate effects, a variety of representative load cases has to be considered. To optimize the choice of load cases, active learning strategies can be applied, as proposed by Milicevic and Altay (2023). These strategies can be quite useful to reduce the experimental and computational costs in cases where a large variety of loads is required or the parameter space spans a wide area. However, in this study, as the training data is not generated experimentally and can be efficiently obtained by the material model, active learning strategies are not required. Each set of sampled parameters p_i, along with the strain and strain rate time histories over the J time instants, is applied to the constitutive model to generate the corresponding stress response vector σ_i. As a result, the training data set $D = \{p_{i}, σ_{i}\}$ is obtained. To improve training performance, min–max normalization is applied to stress responses and parameter samples. For greater stability against measurement errors, Gaussian noise is added to the stress responses, where the mean and variance are chosen based on the magnitude of the noise in the experimental test data.
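The preprocessing described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the normalization is applied column-wise (per parameter or per time instant), which the text does not specify, and all function names are hypothetical. The inverse transform is included because normalized network outputs have to be mapped back to physical parameter values.

```python
import random


def min_max_fit(rows):
    """Column-wise minima and maxima of a list of equally long rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def min_max_apply(rows, lows, highs):
    """Scale every column to [0, 1]; constant columns are mapped to 0."""
    spans = [h - l if h > l else 1.0 for l, h in zip(lows, highs)]
    return [[(v - l) / s for v, l, s in zip(row, lows, spans)] for row in rows]


def min_max_invert(rows, lows, highs):
    """Map normalized values back to physical units."""
    return [[l + v * (h - l) for v, l, h in zip(row, lows, highs)] for row in rows]


def add_gaussian_noise(rows, mean, variance, seed=0):
    """Add independent Gaussian noise with the given mean and variance."""
    rng = random.Random(seed)
    std = variance ** 0.5
    return [[v + rng.gauss(mean, std) for v in row] for row in rows]


# Tiny illustrative data set: three responses with two time instants each.
stresses = [[0.0, 10.0], [5.0, 30.0], [10.0, 20.0]]
lows, highs = min_max_fit(stresses)
normalized = min_max_apply(stresses, lows, highs)
noisy = add_gaussian_noise(normalized, mean=0.0, variance=1e-3)
print(normalized)
print(noisy)
```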

Due to the ill-posed nature of the problem, training an FNN to directly map the stress response to parameters can yield poor results, as the FNN might inadvertently penalize valid parameter combinations. To avoid this, a dual network architecture as illustrated in Figure 1(b) is employed, where the training phase is divided into two steps. First, a forward model $F$ parameterized by weights and biases ω is trained, which maps the parameters to stress responses $(F_{ω} : p_{i} \to σ_{i})$. To train the forward model, a loss function is defined by the mean squared error (MSE):

L_{F} = \frac{1}{I J} \sum_{i = 1}^{I} \sum_{j = 1}^{J} {(σ_{i j} - F_{ω} [p_{i}])}^{2},

(19)

where σ_ij denotes the target stress response for the ith parameter combination at the jth time instant. This is compared with the predicted stress response $F_{ω} [p_{i}]$. The MSE is chosen as the default configuration because the data sets are generally free of outliers, which the user can control during data generation. However, other loss functions, such as the mean absolute error, can be used as well.

Subsequently, we introduce the inverse model $I$, which is parameterized by weights and biases θ. This model is designed to map stress responses to their respective parameters, such that $I_{θ} : σ_{i} \to p_{i}$. During its training phase, the pretrained forward model is leveraged to reconstruct the stress response, utilizing the estimated parameters derived from the inverse model (Long et al. (2019); Kumar et al. (2020)). In this framework, the forward model operates as a surrogate of the constitutive model. Its true advantage is realized within the backpropagation algorithm, as it can seamlessly track gradients via automatic differentiation. Consequently, the inverse model is trained using the MSE:

L_{I} = \frac{1}{I J} \sum_{i = 1}^{I} \sum_{j = 1}^{J} {(σ_{i j} - F_{ω} [I_{θ} [σ_{i j}]])}^{2},

(20)

where $F_{ω} [I_{θ} [σ_{i j}]]$ is the reconstructed stress response. During the training of the inverse model, the weights and biases ω of the forward model are frozen. In this way, only deviations from the target stress response are penalized during training. Simultaneously, the dual network architecture minimizes the reliance on expert knowledge for both the constitutive model and the choice of parameter space. As the networks are trained to reconstruct the target stresses, this training procedure ensures that the constitutive model, regardless of the physical interpretations of the identified parameters, accurately mimics the material responses.
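The two training steps can be illustrated on a deliberately small toy problem. The sketch below is not the authors' implementation and contains no SMA physics: the stand-in "constitutive model" `toy_model`, the linear forward surrogate, the single-layer inverse model, and all sizes and learning rates are assumptions chosen so that the gradients can be written by hand in NumPy. The toy stress depends only on the sum of two parameters, so the identification is ill-posed by construction. The inverse model is trained on the reconstruction loss of Equation (20) while the surrogate weights `W` stay frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the constitutive model (assumption, not an SMA model):
# the response depends only on p0 + p1, so many parameter pairs are equivalent.
J = 20
profile = np.sin(np.linspace(0.0, np.pi, J))


def toy_model(p):
    return (p[:, :1] + p[:, 1:2]) * profile


P = rng.random((512, 2))      # normalized parameter samples p_i
S = toy_model(P)              # target stress responses sigma_i

# Step 1: forward surrogate F_w(p) = p @ W, fitted once and then frozen.
W, *_ = np.linalg.lstsq(P, S, rcond=None)

# Step 2: inverse model I_theta(sigma) = sigmoid(sigma @ V + c).
V = 0.01 * rng.standard_normal((J, 2))
c = np.zeros(2)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def reconstruction_loss(V, c):
    """Eq. (20): MSE between target stresses and F_w(I_theta(sigma))."""
    p_hat = sigmoid(S @ V + c)
    return np.mean((S - p_hat @ W) ** 2)


loss_before = reconstruction_loss(V, c)
lr = 0.2
for _ in range(2000):
    p_hat = sigmoid(S @ V + c)
    residual = p_hat @ W - S                           # shape (N, J)
    grad_p = 2.0 * residual @ W.T / residual.size      # gradient through frozen W
    grad_z = grad_p * p_hat * (1.0 - p_hat)            # sigmoid derivative
    V -= lr * (S.T @ grad_z)                           # only theta = (V, c) is updated
    c -= lr * grad_z.sum(axis=0)
loss_after = reconstruction_loss(V, c)

# Error with respect to the sampled parameters, shown for comparison only.
param_mse = np.mean((sigmoid(S @ V + c) - P) ** 2)
print(loss_before, loss_after, param_mse)
```

In this toy setting, the identified parameters need not match the sampled ones individually, since any pair with the same sum reproduces the same response. The reconstruction loss does not penalize this, which is the behavior the dual architecture is designed to provide.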

It is important to note that the forward model is limited to producing the stress responses corresponding to the strain and strain rate time histories utilized for the data generation. Therefore, to accommodate multiple load cases, distinct forward and inverse models are trained for each specific case. To accelerate the training process, transfer learning (Bozinovski (2020)) is employed. Once the initial forward and inverse models are trained, this technique is applied to train the models for subsequent load cases. More importantly, this approach ensures consistent parameter identification across different loading scenarios, as the weights and biases originate from a consistent initial region within the predefined parameter space.
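One common way to realize such a knowledge transfer is to initialize the model for a new load case with the weights trained on the previous one. The sketch below illustrates only this warm-start idea on a linear toy surrogate; whether the authors additionally freeze layers or adapt learning rates is not stated, and the data, the 20% scaling between the two toy load cases, and the step counts are hypothetical. With the same small training budget, the warm-started model reaches a lower loss than the one trained from scratch.

```python
import numpy as np

rng = np.random.default_rng(1)

J = 20
P = rng.random((256, 3))                      # normalized parameter samples
W_reference = rng.standard_normal((3, J))     # hypothetical toy mapping
S_lc1 = P @ W_reference                       # toy responses for load case 1
S_lc2 = 1.2 * S_lc1                           # toy load case 2: 20 % higher stresses


def mse(W, S):
    return np.mean((P @ W - S) ** 2)


def train(W_init, S, steps, lr=5.0):
    """Plain gradient descent on the MSE of a linear surrogate p @ W."""
    W = W_init.copy()
    for _ in range(steps):
        grad = 2.0 * P.T @ (P @ W - S) / S.size
        W -= lr * grad
    return W


# Train the first load case thoroughly, starting from zero weights.
W_lc1 = train(np.zeros((3, J)), S_lc1, steps=500)

# Second load case with the same small budget: cold start versus warm start.
W_cold = train(np.zeros((3, J)), S_lc2, steps=20)
W_warm = train(W_lc1, S_lc2, steps=20)

print(mse(W_cold, S_lc2), mse(W_warm, S_lc2))
```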

Several additional key features are crucial during training. We incorporate the batch normalization algorithm (Ioffe and Szegedy (2015)) along with the Adam optimizer (Kingma and Ba (2014)). In the hidden layers, the ReLU activation function is paired with the He initializer, in line with the suggestions by He et al. (2015). In the output layer of the inverse model, the sigmoid activation function is employed together with the Glorot initializer (Glorot and Bengio (2010)) to ensure that the outputs lie between 0 and 1. This constraint is vital to avoid exceeding the boundaries of the parameter space. Furthermore, a linear activation function is implemented in the final layer of the forward model.
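The layer configuration of the inverse model can be sketched in plain NumPy as follows. The input size of 200 and the output size of 9 correspond to the J = 200 time instants and the nine searched parameters of the Auricchio model used later; the two hidden layers with 64 neurons each are a hypothetical choice, and batch normalization and the Adam optimizer are omitted to keep the example short. The He initializer draws weights with standard deviation sqrt(2 / fan_in), and the Glorot uniform initializer uses the limit sqrt(6 / (fan_in + fan_out)).

```python
import numpy as np


def he_normal(rng, fan_in, fan_out):
    """He initialization for layers followed by ReLU."""
    return rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)


def glorot_uniform(rng, fan_in, fan_out):
    """Glorot (Xavier) uniform initialization for the sigmoid output layer."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


def init_inverse_model(rng, n_inputs, hidden, n_params):
    """Return a list of (weights, biases) pairs, hidden layers first."""
    sizes = [n_inputs, *hidden]
    layers = [(he_normal(rng, a, b), np.zeros(b)) for a, b in zip(sizes[:-1], sizes[1:])]
    layers.append((glorot_uniform(rng, sizes[-1], n_params), np.zeros(n_params)))
    return layers


def inverse_model(layers, sigma):
    """Map normalized stress responses to normalized parameters in [0, 1]."""
    h = sigma
    for W, b in layers[:-1]:
        h = np.maximum(h @ W + b, 0.0)             # ReLU in the hidden layers
    W, b = layers[-1]
    return 1.0 / (1.0 + np.exp(-(h @ W + b)))      # sigmoid keeps outputs bounded


rng = np.random.default_rng(0)
layers = init_inverse_model(rng, n_inputs=200, hidden=(64, 64), n_params=9)
p_hat = inverse_model(layers, rng.random((5, 200)))   # five dummy stress responses
print(p_hat.shape, p_hat.min(), p_hat.max())
```

Because the outputs are bounded by the sigmoid, mapping them back with the min–max bounds of the sample space cannot produce parameters outside that space.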

3. Results and Discussion

3.1. Auricchio model

We aim to identify the parameters of the constitutive model for a Nitinol wire sample with alloy composition Ni-55.8% and Ti-43.55%, manufactured by SAES Getters S.p.A. The known parameters are as follows:

• Young’s modulus-austenite: E_A = 32,350 MPa

• Young’s modulus-martensite: E_M = 18,550 MPa

• Max. inelastic strain at ξ = 1: ɛ_l = 3.3%

As outlined in the Methodology section, nine parameters need to be identified using an appropriate FNN (inverse model). These include the four critical stress levels $σ_{s / f}^{A M / M A}$, heat capacity C, heat convection coefficient γ, internal energy difference Δu, initial entropy of the austenite phase η_A, and speed parameter β^AM. In the following subsections, we describe the generation of data required for FNN training, detail the hyperparameters, and discuss the training results. Subsequently, we validate the performance of the FNN through experiments. Finally, in the concluding subsection, we present an application example utilizing the identified parameters.

3.1.1. Data generation

From the parameter space detailed in Table 3, we sample I = 40 ⋅ 10³ parameter combinations using LHS. The effect of I will be discussed in the subsequent Training subsection. This parameter space is defined based on experience and on values obtained from previous SMA wire studies (Kaup et al. (2019)). Notably, we deliberately choose a broad parameter space to evaluate the accuracy of our proposed method comprehensively. In particular, the critical stress levels are defined within overlapping boundaries, such that the following rules may be violated

σ_{s}^{A M} < σ < σ_{f}^{A M},

(21a)

σ_{s}^{M A} < σ < σ_{f}^{A M},

(21b)

σ_{f}^{M A} < σ < σ_{s}^{M A},

(21c)

σ_{f}^{M A} < σ < σ_{s}^{A M}

(21d)

Table 3.

Sample space of the searched parameters of the Auricchio model.

Parameter	Symbol	Min	Max	Unit
Critical stress levels	$σ_{s}^{A M}$	85	300	MPa
	$σ_{f}^{A M}$	300	450	MPa
	$σ_{s}^{M A}$	200	350	MPa
	$σ_{f}^{M A}$	70	250	MPa
Heat capacity	C	0.1	10.0	MPa/K
Heat convection coefficient	γ	0	1	-
Internal energy change	Δu	500	3500	MPa
Initial entropy-austenite	η_A	0.0	0.01	MPa/K
Speed parameter A → M	β^AM	1	10	-

However, as the FNNs are trained to reconstruct the target stresses, our parameter identification method can deal with samples that violate these rules. More precisely, the FNNs are able to identify a parameter set that represents the desired stress response whether this set respects these rules or not. The only requirement is that the desired stress response is covered by the parameter space during training. In fact, our parameter identification method gives the user flexibility in defining the parameter space, such that a suitable parameter set can be found without requiring expert knowledge about the material model. In general, the parameter space could also be chosen such that the rules are respected. For this purpose, the user must inspect the experiments and also have knowledge of the material model.
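To make the effect of the overlapping boundaries concrete, the sketch below checks a set of critical stresses against the ordering rules. It rests on an interpretation: Equations (21a) to (21d) are read here as pairwise orderings of the four critical stresses (for example, the forward transformation starts below the stress at which it finishes), and the function name and the two example parameter sets are hypothetical. Both example sets lie inside the Table 3 bounds, yet only the first respects the orderings.

```python
def respects_ordering(s_AM, f_AM, s_MA, f_MA):
    """Pairwise ordering of the critical stresses, as read from Eqs. (21a)-(21d)."""
    return (
        s_AM < f_AM       # (21a): forward start below forward finish
        and s_MA < f_AM   # (21b): reverse start below forward finish
        and f_MA < s_MA   # (21c): reverse finish below reverse start
        and f_MA < s_AM   # (21d): reverse finish below forward start
    )


# Bounds of the critical stresses from Table 3 in MPa.
bounds = {
    "s_AM": (85.0, 300.0),
    "f_AM": (300.0, 450.0),
    "s_MA": (200.0, 350.0),
    "f_MA": (70.0, 250.0),
}

# Two hypothetical samples, both inside the bounds.
plausible = {"s_AM": 250.0, "f_AM": 400.0, "s_MA": 220.0, "f_MA": 100.0}
violating = {"s_AM": 100.0, "f_AM": 320.0, "s_MA": 340.0, "f_MA": 240.0}

for name, sample in (("plausible", plausible), ("violating", violating)):
    inside = all(bounds[k][0] <= v <= bounds[k][1] for k, v in sample.items())
    print(name, "inside bounds:", inside, "ordering respected:", respects_ordering(**sample))
```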

Utilizing the constitutive model, we compute the stress responses for these combinations of sampled parameters, assuming an ambient temperature of T₀ = 296 K (approximately 23°C). We employ strain and strain rate time histories with J = 200 time instants and a time step size of Δt = 0.001 s. Since the 200 stress values of each computed response serve as inputs for the inverse model and as outputs for the forward model, the value of J dictates the number of neurons in the respective layers. In the Training subsection, we provide further information on the choice of the remaining network architecture.

For the training process, we consider three distinct dynamic load cases, referred to as LC1–LC3 (cf. Table 4). To cover the thermodynamic effects of the SMA wire, the load cases have to be generated for the maximum desired strain amplitude. To underline the impact of these selected parameter combinations on stress responses, Figure 2 presents the wide range of stress response variations within the data set associated with LC1.

Table 4.

Load cases used for testing the Auricchio model. Load cases LC1–LC3 were also used for training.

	LC1	LC2	LC3	LC4	LC5	LC6
$ε [%]$	4	4	4	2	2	2
$\dot{ε} [% / s]$	12.57	25.13	50.27	6.28	12.57	25.13

Figure 2.

Stress response range covered by 10 of sampled parameter combinations for LC1. Each response time history has J = 200 time instants.

Of the total generated data, 90% is used for training, and the remaining 10% is reserved for validation. After training, we test the models using $\tilde{I} = 1000$ additional parameter combinations, which are again sampled from the parameter space using LHS. Analogous to the training data, stress responses for these parameter combinations are derived using the constitutive model. Because the data set is generated directly from the material model, it is large and diverse, and further data for validation and testing can be generated as needed, which eliminates the need for cross-validation.

To enhance robustness and provide a buffer against potential measurement inaccuracies, we add Gaussian noise with zero mean and a variance of 1 ⋅ 10⁻³ to all computed stress responses.
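The noise augmentation and the 90/10 split can be illustrated as follows. This is a hedged sketch using only the Python standard library. The function names are hypothetical, and the text does not state whether the variance of 1 ⋅ 10⁻³ refers to normalized or physical stresses, so the example simply treats the stress values as given.

```python
import math
import random


def add_noise(stress, variance=1e-3, rng=random):
    """Add zero-mean Gaussian noise to one stress history.

    random.gauss expects the standard deviation, so the variance quoted in
    the text is converted with a square root first.
    """
    sigma = math.sqrt(variance)
    return [s + rng.gauss(0.0, sigma) for s in stress]


def split_train_val(items, train_fraction=0.9, seed=0):
    """Shuffle a copy of the data and cut it into training and validation parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Example: indices of I = 40 * 10**3 samples split 90/10.
train_ids, val_ids = split_train_val(range(40_000))
```

With a variance of 1 ⋅ 10⁻³, the standard deviation of the added noise is about 0.032 in the units of the stress data.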

3.1.2. Training

The hyperparameters for both the inverse and the forward models are fine-tuned by testing various configurations. For all studies, the mini-batch gradient descent method is employed for training with a batch size of n_b = 1000. As the training data set is generated synthetically by the constitutive model and enriched with Gaussian noise, it does not contain any outliers; therefore, the MSE is chosen as the loss function for the training process.

3.1.2.1. Effects of network architecture

To explore the impact of network architecture, we vary the number of hidden layers and adjust the number of neurons in each layer while maintaining a consistent learning rate of η = 1.0 ⋅ 10⁻³. We utilize a sample size of I = 40 ⋅ 10³ parameter combinations for this study and train the models over n_e = 2000 epochs. For this training process, the load case labeled LC1, characterized by ɛ = 4% and $\dot{ε} = 12.57 % / s$, is selected. Upon completion of training, the MSE values for both forward and inverse models are calculated based on Equations (19) and (20), and the results are detailed in Table 5. Our investigations show that the optimal architecture for both models is a 6-layered FNN with 100 neurons in each layer. From the trend in MSE values, we can predict that a further increase in the number of neurons will not lead to a significant improvement in the accuracy of the models.

Table 5.

Training performance for various FNN architectures, represented by MSE · 10⁵ values. Both models achieve highest accuracy using six layers with 100 neurons each.

	Neurons
Models	Layers	20	50	100
$F$	2	5.40	3.54	2.75
	4	3.17	3.15	1.98
	6	2.65	2.40	1.79
$I$	2	2.70	2.35	1.71
	4	2.44	1.62	1.04
	6	2.31	1.38	0.98
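To put the architectures in Table 5 into perspective, the number of trainable weights and biases can be counted. The sketch below assumes fully connected layers, six hidden layers of 100 neurons followed by a separate output layer, 200 stress values on the stress side, and the nine searched parameters on the parameter side. Whether the forward model receives additional inputs is not stated in this section, so the forward count is an assumption.

```python
def count_trainable(n_in, n_out, n_hidden_layers=6, n_neurons=100):
    """Number of weights plus biases of a fully connected feedforward network."""
    sizes = [n_in] + [n_neurons] * n_hidden_layers + [n_out]
    # Each layer contributes (inputs * outputs) weights and (outputs) biases.
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))


J = 200        # time instants per stress response (from the text)
N_PARAMS = 9   # searched Auricchio parameters (from the text)

inverse_size = count_trainable(n_in=J, n_out=N_PARAMS)  # stress history -> parameters
forward_size = count_trainable(n_in=N_PARAMS, n_out=J)  # parameters -> stress history
smallest = count_trainable(n_in=J, n_out=N_PARAMS, n_hidden_layers=2, n_neurons=20)
```

Under these assumptions, both models of the largest configuration have roughly 7 ⋅ 10⁴ trainable values, which is of the same order as the I = 40 ⋅ 10³ sampled responses with 200 values each.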

3.1.2.2. Effects of learning rate

Using the optimal FNN architecture determined above, we assess the influence of different learning rates. Initial weights and biases remain consistent across all configurations. For this evaluation, we sample I = 40 ⋅ 10³ parameter combinations and utilize n_e = 2000 epochs with load case LC1. The MSE outcomes for both forward and inverse models are studied with the learning rates η = 0.1 ⋅ 10⁻³ to η = 1 ⋅ 10⁻³. The most effective learning rate is identified as η = 1 ⋅ 10⁻³ with a corresponding MSE value of 1.79 ⋅ 10⁻⁵ for the forward model and 0.98 ⋅ 10⁻⁵ for the inverse model.

3.1.2.3. Effects of number of sampled parameter combinations

We evaluate the MSE values of both models across various numbers of sampled parameter combinations, ranging from I = 10 to I = 100 ⋅ 10³. This is depicted in Figure 3. The network architecture and learning rate selections are based on previous studies. For this analysis, we use n_e = 2000 epochs with the load case LC1. The data reveals that by the time I reaches 40 ⋅ 10³, the MSE values for both models converge to a notably low value.

Figure 3.

MSE values corresponding to varying numbers of training samples. Both models converge to a low MSE value at I = 40 ⋅ 10³.

3.1.2.4. Effects of number of epochs

Figure 4 illustrates the impact of the number of epochs on the training loss for both the forward and the inverse models. The best architecture and learning rate, determined from previous studies, are used along with the load case LC1. For this analysis, I = 40 ⋅ 10³ sampled parameter combinations are employed. Observing the results, it becomes evident that both models achieve convergence by n_e = 2000 epochs.

Figure 4.

Effect of number of epochs on the training of the forward $F$ and inverse $I$ models. Both models reach convergence at n_e = 2000 epochs.

3.1.2.5. Effects of transfer learning

Training the models from scratch without shared weights and biases might result in parameter sets with notable disparities across different load cases. To derive a consistent parameter set across all load cases, we employ transfer learning. Additionally, transfer learning is also pursued to reduce the overall training effort.

For training, we select the optimal architecture comprising six layers with 100 neurons each. The learning rate is set at η = 1.0 ⋅ 10⁻³, and n_e = 2000 epochs are used. Initially, the forward model is trained using load case LC1 for I = 40 ⋅ 10³ parameter combinations. Subsequently, the weights and biases of this forward model are retained and employed for the training of the inverse model. This training methodology is then extended to load cases LC2 and LC3, leveraging the weights and biases from the LC1-trained models without reinitialization. Remarkably, this approach reduces the necessary epochs for LC2 and LC3 to a mere n_e = 20, representing only 1.0% of the training effort compared to LC1.
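The training sequence described above can be summarized as a simple schedule. The sketch below only enumerates the stages and their epoch budgets; it does not train any network. The stage labels and the function name are illustrative, and it is assumed that the forward and the coupled inverse model of a load case use the same number of epochs.

```python
def training_schedule(load_cases, epochs_first=2000, epochs_warm=20):
    """List the training stages as (load case, model, initialization, epochs).

    The first load case starts from random weights; later load cases start
    from the models trained on the first one (transfer learning).
    """
    first = load_cases[0]
    stages = []
    for lc in load_cases:
        warm = lc != first
        epochs = epochs_warm if warm else epochs_first
        init = f"weights from {first}" if warm else "random initialization"
        # The forward surrogate is trained first, then the inverse model
        # is trained while coupled to that forward model.
        stages.append((lc, "forward", init, epochs))
        stages.append((lc, "inverse + forward", init, epochs))
    return stages


stages = training_schedule(["LC1", "LC2", "LC3"])
forward_epochs = sum(e for _, model, _, e in stages if model == "forward")
from_scratch_epochs = 3 * 2000  # assumption: 2000 epochs per load case without transfer
effort_ratio = forward_epochs / from_scratch_epochs
```

For each model type, the schedule sums to 2040 epochs, compared with 6000 epochs if every load case were trained from scratch for the full duration, that is, about 34% of the effort.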

Figure 5 illustrates the precision of the models for LC1 by juxtaposing the target stresses (obtained via the constitutive model) with the estimated values. In the first scenario, denoted as $F$, the forward model computes the estimated stresses using the sampled $\tilde{I} = 1000$ test parameter combinations. In the second scenario, labeled $I + F$, the inverse model is evaluated: the associated forward model estimates stresses using parameters inferred by the inverse model. As input for the inverse model, stress responses are generated by the constitutive model with the sampled $\tilde{I} = 1000$ test parameter combinations. In both scenarios, the estimated stresses closely match the target stress values.

Figure 5.

Comparison of stress estimation accuracy between the forward model $(F)$ and the combined inverse-forward model $(I + F)$ for load case LC1.

Figure 6 compares the target stresses, determined by the constitutive model, with the estimated stresses for the load case LC3. In the first case, $F$ , the forward model estimates the stresses using the sampled $\tilde{I} = 1000$ test parameter combinations. Meanwhile, in the second case, $I + F$ , the estimated stresses are determined by the coupled forward model using the estimated parameters. For comparison, we also consider models trained from scratch without shared weights and biases labeled as $\tilde{F}$ and $\tilde{I} + \tilde{F}$ . For the training of LC3, we utilize n_e = 20 epochs. Notably, models trained without transfer learning exhibit substantially lower accuracy compared to those trained via transfer learning.

Figure 6.

Comparison of transfer learning effects. Models $F$ (forward) and $I + F$ (coupled inverse-forward) trained with transfer learning are juxtaposed against models $\tilde{F}$ and $\tilde{I} + \tilde{F}$ , which are trained without transfer learning.

To evaluate the accuracy in more detail, the coefficient of determination R² is used and the results are reported in Table 6. This statistical metric indicates the accuracy of each model in approximating the target data. Its value ranges from 0 to 1. The closer the value gets to 1, the more precise the estimation of the model. It is defined as

R^{2} = 1 - \frac{\sum_{j = 1}^{200} {(σ_{j} - {\hat{σ}}_{j})}^{2}}{\sum_{j = 1}^{200} {(σ_{j} - {\bar{σ}}_{j})}^{2}},

(22)

where σ_j is the target value, ${\hat{σ}}_{j}$ is the estimated value, and ${\bar{σ}}_{j}$ is the mean of the target data. Table 6 also reports the corresponding MSE values (Equations (19) and (20)) to highlight the effect of transfer learning on model training performance. Models trained through transfer learning are found to be significantly more accurate than those trained from scratch. Additionally, during training, they exhibit rapid convergence. In contrast, the models trained from scratch do not converge within the chosen number of epochs, as evidenced by the high MSE values. Moreover, when comparing the coupled inverse–forward models to the forward models alone, the former exhibit superior accuracy. This is attributed to the reduced complexity of the problem, as the mappings become unique (from stress to stress) once the pretrained forward model is coupled. Consequently, training the coupled models also converges faster.
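Equation (22) translates directly into code. The following minimal sketch evaluates the coefficient of determination for one stress history; the function name and the toy data are illustrative only.

```python
def r_squared(target, estimate):
    """Coefficient of determination following Equation (22)."""
    mean_target = sum(target) / len(target)
    ss_res = sum((t - e) ** 2 for t, e in zip(target, estimate))  # residual sum of squares
    ss_tot = sum((t - mean_target) ** 2 for t in target)          # total sum of squares
    return 1.0 - ss_res / ss_tot


perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])    # exact reconstruction
mean_only = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])  # predicting the mean everywhere
```

An exact reconstruction yields 1, and an estimate equal to the mean of the target data yields 0. For estimates that are worse than the mean, the same expression becomes negative, which is worth keeping in mind when evaluating poorly fitting parameter sets.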

Table 6.

Model accuracy (R² [%]) and training performance (MSE ⋅ 10⁵) for each load case. The table compares results for the forward and inverse models trained with transfer learning ( $F$ and $I + F$ ) against those trained without transfer learning ( $\tilde{F}$ and $\tilde{I} + \tilde{F}$ ).

	LC1		LC2		LC3
Model	R²	MSE	R²	MSE	R²	MSE
$F$	99.89	1.79	99.90	3.17	99.93	5.49
$I + F$	99.95	0.98	99.96	0.91	99.95	0.96
$\tilde{F}$			93.44	101.88	93.01	116.07
$\tilde{I} + \tilde{F}$			97.16	58.75	96.84	75.72

3.1.3. Experimental validation

To validate the proposed methodology, we conducted stress–strain experiments and compared the outcomes with our estimations. The test setup, as illustrated in Figure 7, comprised a shaking table that can generate intricate uniaxial motion signals relative to a fixed test rig. An SMA wire specimen, with dimensions l = 150 mm in length and D = 0.2 mm in diameter, was examined. We mounted a load cell on the test rig to capture the stress response. To ensure that the load is applied centrally to the load cell and to prevent any buckling, a pulley system was utilized to redirect the wires. Weights were used to apply a pre-stress of σ₀ = 134.9 MPa, facilitating the desired phase transformation at a strain level that matched the limits of the test setup. Additionally, the shaking table motion was continuously monitored using a laser sensor.

Figure 7.

Test setup used for the validation study: shaking table (a), test rig (b), SMA wire specimen (c), load cell (d), pulley (e), weights (f), motion direction (g), and laser sensor (h).
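As a plausibility check of the pre-stress, the mass that would have to hang on a single wire can be estimated from the wire cross-section. The sketch assumes that the weight acts on one wire, that the pulley transmits the force without loss, and a gravitational acceleration of 9.81 m/s². None of these details are stated in the text.

```python
import math


def hanging_mass(prestress_mpa, diameter_mm, g=9.81):
    """Mass in kg whose weight produces the given pre-stress in a round wire."""
    area_mm2 = math.pi * diameter_mm ** 2 / 4.0  # wire cross-section in mm^2
    force_n = prestress_mpa * area_mm2           # MPa * mm^2 = N
    return force_n / g


mass_kg = hanging_mass(prestress_mpa=134.9, diameter_mm=0.2)
```

Under these assumptions, a pre-stress of σ₀ = 134.9 MPa in a wire of D = 0.2 mm corresponds to a force of about 4.2 N, that is, a mass of roughly 0.43 kg.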

Throughout the experiments, the ambient temperature was maintained at T₀ ≈ 296 K (approximately 23°C), which is consistent with the conditions assumed during data generation. For the identification of the parameters, we first executed experiments for the same load cases, namely LC1–LC3, that were employed during the model training. Moreover, the number of time instants and the time step sizes selected for the experiments corresponded to those used in the training of the models.

To identify the parameters detailed in Table 3, we utilized the inverse models trained specifically for each load case. The measured stress responses were entered into the corresponding inverse models, which returned the parameter estimates. The results of this identification process are presented in Table 7. Notably, due to the application of transfer learning, there is minimal deviation in the parameters identified across different load cases. Consequently, we derive the final parameter values, denoted as p_pro, by calculating the mean across the examined load cases.

Table 7.

Parameters of the Auricchio model derived from measured cyclic stress responses of the SMA wire specimen using the proposed method.

Parameter	LC1	LC2	LC3	Final result p_pro	Unit
${\hat{σ}}_{s}^{A M}$	168.07	207.48	190.69	188.75	MPa
${\hat{σ}}_{f}^{A M}$	307.67	308.44	329.89	315.34	MPa
${\hat{σ}}_{s}^{M A}$	337.44	346.77	293.00	325.74	MPa
${\hat{σ}}_{f}^{M A}$	199.92	199.94	199.32	199.73	MPa
$\hat{C}$	4.38	3.57	3.24	3.73	MPa/K
$\hat{γ}$	0.37	0.33	0.38	0.36	-
$Δ \hat{u}$	3374.25	3460.75	3412.94	3415.98	MPa
${\hat{η}}_{A}$	0.0036	0.0032	0.0026	0.0031	MPa/K
${\hat{β}}^{A M}$	9.44	9.76	8.97	9.39	-
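The final column of Table 7 can be reproduced by averaging the three per-load-case estimates. The sketch below copies the values from Table 7; the dictionary keys are illustrative names.

```python
from statistics import mean

# Identified values for (LC1, LC2, LC3), copied from Table 7.
IDENTIFIED = {
    "sigma_s_AM": (168.07, 207.48, 190.69),     # MPa
    "sigma_f_AM": (307.67, 308.44, 329.89),     # MPa
    "sigma_s_MA": (337.44, 346.77, 293.00),     # MPa
    "sigma_f_MA": (199.92, 199.94, 199.32),     # MPa
    "C": (4.38, 3.57, 3.24),                    # MPa/K
    "gamma": (0.37, 0.33, 0.38),                # -
    "delta_u": (3374.25, 3460.75, 3412.94),     # MPa
    "eta_A": (0.0036, 0.0032, 0.0026),          # MPa/K
    "beta_AM": (9.44, 9.76, 8.97),              # -
}

# Final parameter set p_pro: arithmetic mean across the load cases.
p_pro = {name: mean(values) for name, values in IDENTIFIED.items()}
```

The computed means agree with the tabulated p_pro values to within rounding of the last reported digit.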

Furthermore, to validate the accuracy of the identified parameter set, we executed experiments for load cases that were not used in model training, referred to as LC4–LC6 (cf. Table 4). Figure 8 juxtaposes the measured stress–strain responses with those predicted by the constitutive model using the identified parameters for the load cases LC3 and LC6. These predictions closely mirror the dynamic behavior observed in the experimental results.

Figure 8.

Comparison of the experimental results (EXP) to the predictions (COM) generated by the Auricchio model using parameter set p_pro identified by the proposed method (Table 7).

The accuracy of the stress responses predicted by the constitutive model is quantified for the load cases LC1–LC3 and LC4–LC6 by the coefficient of determination, as reported in Table 8. Moreover, the rules defined in Equations (21b) and (21d) are not respected by the identified parameter set, cf. Table 7. However, the high accuracy of the predicted stresses demonstrates that the method is still able to identify suitable parameter sets across the chosen parameter space.
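The rule check can be made explicit with a short script. It reads each rule in Equations (21a) to (21d) as an ordering between the two critical stresses that bound it, which is an interpretation made for this sketch, and uses the p_pro values from Table 7. The key and function names are illustrative.

```python
def violated_rules(p):
    """Return the labels of the ordering rules that a parameter set violates."""
    rules = {
        "21a": p["sigma_s_AM"] < p["sigma_f_AM"],
        "21b": p["sigma_s_MA"] < p["sigma_f_AM"],
        "21c": p["sigma_f_MA"] < p["sigma_s_MA"],
        "21d": p["sigma_f_MA"] < p["sigma_s_AM"],
    }
    return [label for label, satisfied in rules.items() if not satisfied]


# Critical stress levels of p_pro in MPa (Table 7).
P_PRO_STRESSES = {
    "sigma_s_AM": 188.75,
    "sigma_f_AM": 315.34,
    "sigma_s_MA": 325.74,
    "sigma_f_MA": 199.73,
}

violations = violated_rules(P_PRO_STRESSES)
```

With this reading, the check flags rules (21b) and (21d), since 325.74 MPa exceeds 315.34 MPa and 199.73 MPa exceeds 188.75 MPa.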

Table 8.

Auricchio model accuracy (R² [%]) computed using parameters identified by the proposed method p_pro and manually chosen p_manual for the load cases LC1–LC6.

	LC1	LC2	LC3	LC4	LC5	LC6
Using p_pro	99.26	99.62	98.12	94.40	93.65	93.64
Using p_manual	65.87	55.84	43.37	67.60	86.22	87.31

Moreover, to highlight the effectiveness of the methodology, Figure 9 juxtaposes the measured stress–strain responses for the load cases LC3 and LC6 with those predicted by the constitutive model using the identified parameters, where the internal energy change between the austenite and martensite phases is manually determined as Δu = 650.18 MPa. This parameter set is denoted as p_manual. As a result, the constitutive model accuracy reduces to the R² values reported in Table 8 for the load cases LC1–LC3 and LC4–LC6. Especially for LC1–LC3, due to imprecise computations of thermodynamic effects, the accuracy of the material model decreases significantly. It is important to note that the accuracy is inherently bound by the fidelity of the constitutive model itself. Certain phenomena, such as residual deformations, which are not accounted for in the constitutive model, cannot be captured by the parameter identification process.

Figure 9.

Comparison of the experimental results (EXP) to the predictions (COM) generated by the Auricchio model using a manually identified parameter set p_manual (Δu = 650.18 MPa, other parameters according to Table 7).

3.2. Zhu and Zhang model

For the Zhu and Zhang model, our objective is to identify the parameters for another Nitinol wire sample with alloy composition Ni-55.90% and Ti-43.95%, manufactured by Baoji Hanz Metal Material Co. The known parameters are as follows:

• Young’s modulus-austenite: E_A = 29,000 MPa

• Young’s modulus-martensite: E_M = 14,100 MPa

• Max. inelastic strain at ξ = 1: ɛ_l = 4.0%

• Density: ρ = 6500 kg/m³

In this case, 11 parameters need to be identified including the four start and finish temperatures $T_{s / f}^{A M / M A}$ , the critical σ-T slopes of the austenite and martensite phases c_A and c_M, the specific heat C_p, the heat-transfer coefficient k, the latent heat coefficients c₁ and c₂, and the initial latent heat L₀.

3.2.1. Data generation

As for the Auricchio model, we sample I = 40 ⋅ 10³ parameter combinations using LHS. The parameter space is reported in Table 9, where the upper and lower boundaries are chosen based on previous SMA wire studies (Kaup et al., 2021b). To demonstrate the adaptability and robustness of our method across diverse and non-standard parameter spaces, we intentionally expand the boundaries of the parameter space beyond typical ranges. While this design choice demonstrates flexibility, it may result in physically unconventional parameter sets. However, as the dual network architecture is trained to reconstruct the target stresses, our method is able to identify a parameter set that accurately mimics the material responses, provided the parameter space covers the desired stress response during training. Employing the Zhu and Zhang constitutive model, and assuming an ambient temperature of T₀ = 296 K (approximately 23°C), we compute the stress responses for these parameter combinations. Again, we use strain and strain rate time histories with J = 200 time instants and a time step size of Δt = 0.001 s. For the training process, we consider three dynamic load cases, denoted as LC7–LC9 (cf. Table 10). To increase the robustness of the model, we add Gaussian noise with zero mean and a variance of 1 ⋅ 10⁻³ to the computed stress responses. For training, we split the generated data into training and validation data sets, where 90% is used for training and 10% for validation. Additionally, we test all models using $\tilde{I} = 1000$ unseen parameter combinations sampled from the parameter space via LHS. The resulting data set is large and diverse and is generated directly from the material model, so that further data for validation and testing can be generated as needed.

Table 9.

Sample space of the searched parameters of the Zhu and Zhang model.

Parameter	Symbol	Min	Max	Unit
Start/finish temperatures	$T_{s}^{A M}$	230	270	K
	$T_{f}^{A M}$	210	250	K
	$T_{s}^{M A}$	250	290	K
	$T_{f}^{M A}$	250	300	K
Critical σ-T slope-austenite	c _A	4	12	MPa/K
Critical σ-T slope-martensite	c _M	4	12	MPa/K
Specific heat	C _p	800	20,000	J/(kgK)
Heat-transfer coefficient	k	0.001	0.1	W/K
Latent heat coefficients	c ₁	500	2000	-
	c ₂	20	200	-
Initial latent heat	L ₀	400	10,000	J/kg

Table 10.

Load cases used for testing the Zhu and Zhang model. Load cases LC7–LC9 were also used for training.

	LC7	LC8	LC9	LC10	LC11	LC12	LC13	LC14	LC15	LC16	LC17	LC18
$ε [%]$	6	6	6	4	4	4	6	6	6	4	4	4
$\dot{ε} [% / s]$	18.85	37.70	75.40	12.57	25.13	50.27	18.85	37.70	56.55	12.57	25.13	37.70
$T_{0} [K]$	296	296	296	296	296	296	289	289	289	289	289	289

3.2.2. Training

For training, we choose the same architecture as optimized for the Auricchio model consisting of six layers with 100 neurons each. The learning rate is set at η = 1.0 ⋅ 10⁻³. Moreover, the networks are trained for n_e = 2000 epochs using the mini-batch gradient descent method with a batch size of n_b = 1000. Beginning with LC7, the training of the forward model is conducted for I = 40 ⋅ 10³ parameter combinations. In the next step, the weights and biases of this forward model are saved and utilized for the training of the inverse model. After training the models for LC7, the forward and inverse models corresponding to LC8 and LC9 are trained for n_e = 20 epochs, exploiting the weights and biases from the LC7-trained models.

To determine the accuracy, the trained networks are tested using $\tilde{I} = 1000$ test parameter combinations. In Table 11, the R² and MSE values are reported for the forward models $F$ and the coupled forward and inverse models $I + F$. For comparison, the models trained from scratch without shared weights and biases ($\tilde{F}$ and $\tilde{I} + \tilde{F}$) are also considered. As in the prior study, the models trained through transfer learning are significantly more accurate than those trained from scratch.

Table 11.

	LC7		LC8		LC9
Model	R²	MSE	R²	MSE	R²	MSE
$F$	97.70	4.53	99.00	27.17	98.65	29.00
$I + F$	99.94	3.35	99.90	5.62	99.78	6.36
$\tilde{F}$			91.75	369.49	89.54	432.01
$\tilde{I} + \tilde{F}$			96.55	113.61	96.31	133.82

3.2.3. Experimental validation

For validation, we conduct stress–strain experiments with the test setup shown in Figure 7 and compare the measurement data with our predictions. In this study, the SMA wire was tested with a length of l = 150 mm and a diameter of D = 0.2 mm. In addition, a pre-stress of σ₀ = 139.4 MPa was applied, enabling the desired phase transformation at a strain level that matched the limits of the test setup. Throughout the experiments, the ambient temperature was kept constant.

For the identification of parameters, we first execute experiments for the same load cases that were used for model training, namely LC7–LC9, and feed the measured stress responses into the inverse models. Higher strains were avoided to protect the SMA wire from tearing. The experiments were conducted at T₀ ≈ 296 K (approximately 23°C), which is consistent with the temperature assumed for data generation. The results are presented in Table 12, where the final parameter set p_pro is obtained by calculating the mean across all considered load cases. Next, to validate the accuracy of the identified parameter set, we also conduct experiments according to LC10–LC12 at T₀ ≈ 296 K (approximately 23°C) as well as LC13–LC18 at T₀ ≈ 289 K (approximately 16°C) for this wire composition, cf. Table 10.

Table 12.

Parameters of the Zhu and Zhang model derived from measured cyclic stress responses of the SMA wire specimen using the proposed method.

Parameter	LC7	LC8	LC9	Final result p_pro	Unit
${\hat{T}}_{s}^{A M}$	256.21	256.71	253.49	255.47	K
${\hat{T}}_{f}^{A M}$	234.77	233.49	232.76	233.67	K
${\hat{T}}_{s}^{M A}$	251.72	251.68	251.67	251.69	K
${\hat{T}}_{f}^{M A}$	297.05	294.50	297.00	296.18	K
${\hat{c}}_{A}$	7.56	7.48	7.27	7.44	MPa/K
${\hat{c}}_{M}$	7.26	6.93	9.27	7.82	MPa/K
${\hat{C}}_{p}$	6379.46	7359.53	11,762.64	8620.55	J/(kgK)
$\hat{k}$	0.052	0.038	0.013	0.034	W/K
${\hat{c}}_{1}$	999.54	1054.95	1408.92	1154.47	-
${\hat{c}}_{2}$	101.34	86.56	62.86	83.58	-
${\hat{L}}_{0}$	9304.64	8664.48	9842.19	9270.44	J/kg

Figure 10 juxtaposes the measured stress–strain responses with those predicted by the Zhu and Zhang model employing the identified parameter set for the load cases LC7 and LC10. Both predictions accurately map the stress–strain behavior of the measurement data. Furthermore, Figure 11 compares the measured stress–strain responses with those predicted by the Zhu and Zhang model using the identified parameter set for the load cases LC15 and LC18. The results show that the constitutive model employing the identified parameter set is able to accurately map the experimental stress–strain behavior even at different ambient temperatures, demonstrating the generality of the parameter set identified by the proposed method.

Figure 10.

Comparison of the experimental results (EXP) to the predictions (COM) generated by the Zhu and Zhang model using parameter set p_pro identified by the proposed method (Table 12).

Figure 11.

Comparison of the experimental results (EXP) to the predictions (COM) generated by the Zhu and Zhang model using parameter set p_pro identified by the proposed method (Table 12).

To quantify the accuracy of the stress responses predicted by the Zhu and Zhang model, the coefficient of determination is used. As a result, the R² values for the load cases LC7–LC9 and LC10–LC18 are reported in Table 13.

Table 13.

Zhu and Zhang model accuracy (R² [%]) computed using parameters identified by the proposed method p_pro and manually chosen p_manual for the load cases LC7–LC18.

	LC7	LC8	LC9	LC10	LC11	LC12	LC13	LC14	LC15	LC16	LC17	LC18
Using p_pro	98.16	98.06	96.39	96.64	94.52	91.46	95.83	98.51	97.73	97.63	97.45	98.96
Using p_manual	41.66	51.66	48.05	90.71	87.60	84.23	21.98	38.54	24.40	92.03	91.22	96.22

Additionally, to demonstrate the efficiency of the methodology, Figure 12 compares the measured stress–strain responses for load cases LC7 and LC10 with those predicted by the constitutive model using the identified parameters, where the latent heat coefficient c₂ is manually set to c₂ = 146.36. This parameter set is denoted as p_manual. The accuracy of the constitutive model prediction reduces to the R² values reported in Table 13 for the load cases LC7–LC9 as well as for the load cases LC10–LC18. Particularly for LC7–LC9 and LC13–LC15, the inaccurate computation of the latent heat influences the stress–strain behavior.

Figure 12.

Comparison of the experimental results (EXP) to the predictions (COM) generated by the Zhu and Zhang model using a manually identified parameter set p_manual (c₂ = 146.36, other parameters according to Table 12).

3.3. Application example

To demonstrate the importance of accurately identified model parameters, we use the Auricchio model with the determined model parameter sets p_pro and p_manual. The case in point is a shear frame structure with three degrees of freedom (DoFs), as illustrated in Figure 13. The design assumes that axial deformations of the columns are negligible and treats the floors as rigid entities. As a result, only horizontal DoFs come into play.

Figure 13.

Shear frame structure with its degrees of freedom (x_i for i = 1, 2, 3) and retrofitted with dampers made of SMA wire bundles. The structure faces the challenge of ground acceleration, represented as ${\ddot{x}}_{g}$ . Floor mass (m_i), stiffness (k_i), and damping (c_i) are depicted for each respective degree of freedom.

To protect the structure from seismic activities, particularly the historical El Centro, May 1940 earthquake, two dampers made up of SMA wire bundles are envisaged to be affixed between each floor. Designed to operate exclusively under tension, these SMA wires match the properties of the previously identified Nitinol alloy (comprising Ni-55.8% and Ti-43.55%). Each wire bundle spans a length of l = 600 mm and possesses a combined diameter of D = 77.46 mm.

The equation of motion of the structure reads

M \ddot{x} + C \dot{x} + K x + f_{S} (x, \dot{x}) = f (t),

(23)

where $\ddot{x}$, $\dot{x}$, and x are the acceleration, velocity, and displacement vectors. The mass and stiffness matrices are given by

M = \mathrm{diag} [\begin{array}{lll} 1000 & 1000 & 1000 \end{array}] [t],

(24)

and

K = [\begin{array}{rrr} 2.0 & - 1.0 & 0 \\ - 1.0 & 2.0 & - 1.0 \\ 0 & - 1.0 & 1.0 \end{array}] \cdot 10^{6} [kN / m] .

(25)

The damping matrix is formulated using the Rayleigh damping as

C = α M + β K,

(26)

where the coefficients are computed with periods T_i = {0.45; 0.16} s and damping ratios D_i = {0.5; 0.5}% as α = 10.37 ⋅ 10⁻² and β = 0.02 ⋅ 10⁻². The vector

f_{S} (x, \dot{x})

represents the nonlinear forces induced by the SMA dampers as a function of the interstory displacements and velocities of the structure:

f_{S} (x, \dot{x}) = {[f_{S_{1}}, f_{S_{2}}, f_{S_{3}}]}^{⊤} .

(27)
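The Rayleigh coefficients in Equation (26) can be cross-checked from the mass and stiffness matrices in Equations (24) and (25). The sketch below is a standard-library illustration, not the authors' computation: it finds the two lowest eigenfrequencies by bisection on the characteristic polynomial of the normalized stiffness matrix and then applies the usual two-mode Rayleigh formulas with equal damping ratios of 0.5%. All function names are hypothetical.

```python
import math

# (10^6 kN/m) / (1000 t) expressed in SI units: 1e9 N/m divided by 1e6 kg.
K_OVER_M = 1.0e9 / 1.0e6  # 1/s^2


def char_poly(lam):
    """det(K_hat - lam * I) for K_hat = [[2,-1,0],[-1,2,-1],[0,-1,1]]."""
    p1 = 2.0 - lam                    # leading 1x1 minor
    p2 = (2.0 - lam) * p1 - 1.0       # leading 2x2 minor (tridiagonal recurrence)
    return (1.0 - lam) * p2 - p1      # full 3x3 determinant


def bisect_root(f, lo, hi, tol=1e-12):
    """Bisection for a root of f bracketed by a sign change on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def rayleigh(omega_1, omega_2, zeta):
    """Rayleigh coefficients for the same damping ratio zeta in two modes."""
    alpha = 2.0 * zeta * omega_1 * omega_2 / (omega_1 + omega_2)
    beta = 2.0 * zeta / (omega_1 + omega_2)
    return alpha, beta


# The polynomial changes sign on [0, 1] and on [1, 2], bracketing the two lowest roots.
omega_1 = math.sqrt(bisect_root(char_poly, 0.0, 1.0) * K_OVER_M)
omega_2 = math.sqrt(bisect_root(char_poly, 1.0, 2.0) * K_OVER_M)
T_1, T_2 = 2.0 * math.pi / omega_1, 2.0 * math.pi / omega_2
alpha, beta = rayleigh(omega_1, omega_2, zeta=0.005)
```

This reproduces periods of approximately 0.45 s and 0.16 s and coefficients of approximately α = 10.37 ⋅ 10⁻² and β = 0.02 ⋅ 10⁻², in agreement with the values stated above.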

The structural responses are derived using the Newmark-beta algorithm. In this approach, the restoring force for each damper is calculated based on the stress responses given by the constitutive model, represented as $f_{S_{i}} = σ A$ . Here, σ denotes the stress according to Equation (5), while A represents the cross-sectional area of the SMA bundles. To guarantee the numerical stability of the constitutive model, a time step of Δt = 0.001 s is selected, consistent with the choice made during data generation.
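For orientation, one step of the Newmark-beta scheme can be sketched for a linear single-DoF oscillator. The constants γ = 1/2 and β = 1/4 (average acceleration) are an assumption, since the text does not state them, and the nonlinear SMA restoring force is omitted here. The Newmark constants are unrelated to the Rayleigh coefficient β and the heat convection coefficient γ used elsewhere in the paper.

```python
import math


def newmark_step(x, v, a, f_next, m, c, k, dt, gamma=0.5, beta=0.25):
    """Advance m*a + c*v + k*x = f by one time step (linear single DoF)."""
    k_eff = k + gamma / (beta * dt) * c + m / (beta * dt ** 2)
    f_eff = (f_next
             + m * (x / (beta * dt ** 2) + v / (beta * dt) + (0.5 / beta - 1.0) * a)
             + c * (gamma / (beta * dt) * x + (gamma / beta - 1.0) * v
                    + dt * (0.5 * gamma / beta - 1.0) * a))
    x_new = f_eff / k_eff
    a_new = (x_new - x) / (beta * dt ** 2) - v / (beta * dt) - (0.5 / beta - 1.0) * a
    v_new = v + dt * ((1.0 - gamma) * a + gamma * a_new)
    return x_new, v_new, a_new


def free_vibration(period=1.0, dt=0.001, n_steps=1000):
    """Undamped free vibration from x = 1, v = 0; returns the final displacement."""
    m, c = 1.0, 0.0
    k = m * (2.0 * math.pi / period) ** 2
    x, v = 1.0, 0.0
    a = (0.0 - c * v - k * x) / m  # initial acceleration from the equation of motion
    for _ in range(n_steps):
        x, v, a = newmark_step(x, v, a, 0.0, m, c, k, dt)
    return x


x_after_one_period = free_vibration()
```

After one full period, the undamped oscillator returns to its initial displacement, which serves as a simple check of the step. In the structural model, the damper force $f_{S_{i}} = σ A$ would enter the effective load and would generally require an iteration or an explicit evaluation within each step.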

The time history of the earthquake is shown in Figure 14. Figure 15 presents the displacement responses of the third floor with and without the dampers. The responses with dampers are calculated for both parameter sets: p_pro, which was identified by the proposed method, and p_manual, which differs from p_pro only in the internal energy change between the austenite and martensite phases, manually set to Δu = 650.18 MPa.

Figure 14.

Ground acceleration of the El Centro earthquake applied to the structure.

Figure 15.

Displacement response of the third floor. The SMA responses are computed using the parameters identified by the proposed method p_pro and manually chosen p_manual. The uncontrolled case is shown in the background.

Additionally, to evaluate the control effect of the dampers, the root mean squared (RMS) values of the floor displacements are determined as

\mathrm{RMS} = \sqrt{\frac{1}{N} \sum_{n = 1}^{N} x_{i, n}^{2}},

(28)

where N = 1 ⋅ 10⁵ denotes the number of time steps and x_{i,n} is the absolute displacement response of the i-th DoF at time step n. The reductions of the RMS values of the floor displacements relative to the uncontrolled configuration, computed using the parameters identified by the proposed method p_pro and the manually chosen p_manual, are reported in Table 14. We observe that the control effect computed using p_manual is overestimated by about 19 percentage points on average (18.0 to 19.7 across the three floors). These results show that for an effective design, it is important to precisely identify the constitutive model parameters.

Table 14.

Reduction of the RMS values [%] of the floor displacements computed using parameters identified by the proposed method p_pro and manually chosen p_manual. The control effect is overestimated by about 19 percentage points on average when using p_manual.

	x ₁	x ₂	x ₃
Using p_pro	35.6	34.5	35.2
Using p_manual	53.6	54.2	54.7
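Equation (28) and the comparison in Table 14 can be expressed compactly. In the sketch below, the helper names are illustrative, and the reduction values are copied from Table 14; the displacement histories themselves are not available here.

```python
import math


def rms(values):
    """Root mean square of a displacement history, Equation (28)."""
    return math.sqrt(sum(x * x for x in values) / len(values))


def rms_reduction(controlled, uncontrolled):
    """Reduction of the RMS displacement in percent relative to the uncontrolled case."""
    return 100.0 * (1.0 - rms(controlled) / rms(uncontrolled))


# RMS reductions in percent, copied from Table 14.
REDUCTION_PRO = {"x1": 35.6, "x2": 34.5, "x3": 35.2}
REDUCTION_MANUAL = {"x1": 53.6, "x2": 54.2, "x3": 54.7}

# Difference in percentage points between the two parameter sets, per floor.
overestimation = {
    dof: round(REDUCTION_MANUAL[dof] - REDUCTION_PRO[dof], 1) for dof in REDUCTION_PRO
}
mean_overestimation = sum(overestimation.values()) / len(overestimation)
```

The per-floor differences are 18.0, 19.7, and 19.5 percentage points, with a mean of about 19.1.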

4. Conclusions

This paper introduced a method for parameter identification and tuning that uses FNNs for uniaxial superelastic SMA models under dynamic loading. Because the inverse parameter identification problem is ill-posed, a dual network architecture is used, in which a surrogate forward model is paired with the inverse model. This architecture also reduces reliance on expert knowledge when selecting the constitutive model and the parameter space. The networks are trained to reconstruct the target stresses, ensuring that the constitutive model accurately mimics the material responses, regardless of the physical interpretation of the identified parameters. This coupling benefits from transfer learning, which ensures consistent parameter sets across load scenarios. In our case study, transfer learning also accelerated training: after an initial training on the first load case for 2000 epochs, only 20 epochs sufficed for the subsequent two load cases. Upon completion of training, the inverse models accurately identified and fine-tuned multiple parameters of two different constitutive models and for two different SMA wire compositions. Experimental validation using a total of 18 load cases, including two different ambient temperatures, confirmed that, with the identified parameters, the constitutive models reproduced the stress responses with high precision, demonstrating the generality of the parameter sets identified by the proposed method. For the first SMA wire composition, the Auricchio model exceeded an accuracy of 98% for the trained load cases and 93% for the test load cases. The Zhu and Zhang model achieved similar accuracy for the second SMA wire composition, reaching 96% for the trained load cases and 91% for the test load cases. Furthermore, applying the method to a numerical example with an SMA-retrofitted structure demonstrated the importance of precisely identified parameter sets. Two scenarios were tested: parameters identified by the proposed method and parameters chosen manually. The results show that varying even a single parameter can lead to significantly miscalculated responses.
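The reconstruction-based training summarized above can be illustrated with a minimal NumPy sketch. It is not the implementation used in this study. Everything in it is a hypothetical stand-in: the toy `forward` function replaces both the constitutive model and its FNN surrogate, the inverse model is a single linear layer rather than an FNN, and the parameters are sampled uniformly instead of by Latin hypercube sampling. The toy stress depends only on the product of two parameters, so the inverse problem is deliberately ill-posed, and the inverse model is trained on the stress reconstruction error instead of on the parameter error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the constitutive model and its surrogate (hypothetical,
# NOT the Auricchio or Zhu and Zhang model). The stress depends only on the
# product p0 * p1, so different parameter pairs give identical responses.
strain = np.linspace(0.0, 1.0, 16)
basis = np.tanh(3.0 * strain)

def forward(params):
    """Map parameter sets of shape (N, 2) to stress responses of shape (N, 16)."""
    return (params[:, 0] * params[:, 1])[:, None] * basis[None, :]

# Training data: sampled parameters and the stresses they produce.
# Plain uniform sampling is used here only to keep the sketch short.
true_params = rng.uniform(0.5, 1.5, size=(200, 2))
target_stress = forward(true_params)

# Inverse model (assumed form): one linear layer, params_hat = stress @ W.T + b.
W = np.zeros((2, strain.size))
b = np.ones(2)

def reconstruction_loss(W, b):
    """Mean squared error between reconstructed and target stresses."""
    params_hat = target_stress @ W.T + b
    residual = forward(params_hat) - target_stress
    return np.mean(residual ** 2), params_hat, residual

initial_loss, _, _ = reconstruction_loss(W, b)

learning_rate = 2e-3
for _ in range(3000):
    _, params_hat, residual = reconstruction_loss(W, b)
    n_samples, n_points = residual.shape
    # Gradient of the loss with respect to the product p0 * p1, per sample.
    d_product = 2.0 * (residual @ basis) / (n_samples * n_points)
    # Chain rule through the forward model: d(p0*p1)/dp0 = p1, d(p0*p1)/dp1 = p0.
    d_params = np.stack(
        [d_product * params_hat[:, 1], d_product * params_hat[:, 0]], axis=1
    )
    # Plain gradient descent on the inverse model only; the forward model is fixed.
    W -= learning_rate * (d_params.T @ target_stress)
    b -= learning_rate * d_params.sum(axis=0)

final_loss, params_hat, _ = reconstruction_loss(W, b)
print(f"reconstruction loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

In this toy setting, `params_hat` is not expected to match `true_params`, because only the product of the two parameters is observable in the stress. The loss therefore measures what the conclusions emphasize: whether the forward model reproduces the target stresses with the identified parameters, not whether a particular parameter set is recovered.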

Looking forward, the study opens several directions for further exploration. While the current study focused on a specific Nitinol, the method is adaptable to other SMA compositions. This will require expanding the parameter space, including Young’s moduli and the maximum inelastic strain, alongside additional experimental validation. Furthermore, the data in this study pertain to a single SMA wire, with experiments designed accordingly. For SMA bundles, however, alterations in thermodynamic behavior (Casciati et al., 2018) suggest that model refinements could enhance accuracy. Experimental validation of such models requires adequate cyclic tensile tests, for which our test setup may need to be adapted so that, for example, thicker SMA wires or bundles can be tested. Lastly, as in other machine learning applications, the training performance of the models depends on the representativeness of the data, which in this study is determined by the choice and discretization of the load cases. This aspect deserves further research, for instance using methods from active learning.

With continuous advancements in the field of materials science and computational methods, the integration of FNNs with constitutive modeling holds promising potential to discover and harness the unique properties of SMAs in practical applications.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Niklas Lenzen

Okyay Altay

References

1. AlSaleh, Casciati, El-Attar, et al. (2012) Experimental validation of a shape memory alloy retrofitting application. Journal of Vibration and Control 18(1): 28–41. DOI: 10.1177/1077546311399946.

2. Altay (2021) Vibration Mitigation Systems in Structural Engineering. Boca Raton: CRC Press. DOI: 10.1201/9781315122243.

3. Auricchio, Sacco (1997) A one-dimensional model for superelastic shape-memory alloys with different elastic properties between austenite and martensite. International Journal of Non-linear Mechanics 32(6): 1101–1114. DOI: 10.1016/S0020-7462(96)00130-8.

4. Auricchio, Fugazza, Desroches (2008) Rate-dependent thermo-mechanical modelling of superelastic shape-memory alloys for seismic applications. Journal of Intelligent Material Systems and Structures 19(1): 47–61. DOI: 10.1177/1045389X06073426.

5. Bozinovski (2020) Reminder of the first paper on transfer learning in neural networks, 1976. Informatica 44(3): 2828. DOI: 10.31449/inf.v44i3.2828.

6. Casciati, Torra, Vece (2018) Local effects induced by dynamic load self-heating in NiTi wires of shape memory alloys. Structural Control and Health Monitoring 25(4): e2134. DOI: 10.1002/stc.2134.

7. Cisse, Zaki, Zineb (2016) A review of constitutive models and modeling techniques for shape memory alloys. International Journal of Plasticity 76: 244–284. DOI: 10.1016/j.ijplas.2015.08.006.

8. Elwaleed (2018) Shape memory alloys: identification of the parameters necessary for constitutive models. In: IOP Conference Series: Materials Science and Engineering. Bristol, England: IOP Publishing; 012027. DOI: 10.1088/1757-899X/453/1/012027.

9. Fang (2022) SMAs for infrastructures in seismic zones: a critical review of latest trends and future needs. Journal of Building Engineering 57: 104918. DOI: 10.1016/j.jobe.2022.104918.

10. Fang, Wang (2020) Shape Memory Alloys for Seismic Resilience. Berlin, Germany: Springer. DOI: 10.1007/978-981-13-7040-3.

11. Glorot, Bengio (2010) Understanding the difficulty of training deep feedforward neural networks. Proceedings of the 13th International Conference on Artificial Intelligence and Statistics. Italy: JMLR Workshop and Conference Proceedings, 249–256.

12. Han, Xing, Xiao, et al. (2005) NiTi-wire shape memory alloy dampers to simultaneously damp tension, compression, and torsion. Journal of Vibration and Control 11(8): 1067–1084. DOI: 10.1177/1077546305055773.

13. Hartloper, de Castro e Sousa, Lignos (2021) Constitutive modeling of structural steels: nonlinear isotropic/kinematic hardening material model and its calibration. Journal of Structural Engineering 147(4): 04021031. DOI: 10.1061/(ASCE)ST.1943-541X.0002964.

14. He, Zhang, Ren, et al. (2015) Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE, 1026–1034.

15. Helm (2004) Pseudoelasticity: experimental observations, thermomechanical modeling, and identification of the material parameters. In: Smart Structures and Materials 2004: Active Materials: Behavior and Mechanics. France: SPIE, Vol. 5387, 198–209. DOI: 10.1117/12.539787.

16. Henrickson, Kirkpatrick, Valasek (2013) Characterization of shape memory alloys using artificial neural networks. In: 51st AIAA Aerospace Sciences Meeting Including the New Horizons Forum and Aerospace Exposition. Texas: ARC, 129. DOI: 10.2514/6.2013-129.

17. Hmede, Chapelle, Lapusta (2022) Review of neural network modeling of shape memory alloys. Sensors 22(15): 5610. DOI: 10.3390/s22155610.

18. Ioffe, Szegedy (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, 448–456. DOI: 10.5555/3045118.3045167.

19. Kaup, Altay, Klinkel (2019) Macroscopic modeling of strain-rate dependent energy dissipation of superelastic SMA dampers considering destabilization of martensitic lattice. Smart Materials and Structures 29(2): 025005. DOI: 10.1088/1361-665X/ab5e42.

20. Kaup, Altay, Klinkel (2021a) Strain amplitude effects on the seismic performance of dampers utilizing shape memory alloy wires. Engineering Structures 244: 112708. DOI: 10.1016/j.engstruct.2021.112708.

21. Kaup, Ding, Wang, et al. (2021b) Strain rate dependent formulation of the latent heat evolution of superelastic shape memory alloy wires incorporated in multistory frame structures. Journal of Intelligent Material Systems and Structures 32(11): 1198–1214. DOI: 10.1177/1045389X20975473.

22. Kingma (2014) Adam: a method for stochastic optimization. arXiv:1412.6980. DOI: 10.48550/arXiv.1412.6980.

23. Kumar, Tan, Zheng, et al. (2020) Inverse-designed spinodoid metamaterials. npj Computational Materials 6(1): 73. DOI: 10.1038/s41524-020-0341-6.

24. Lenzen, Altay (2022) Machine learning enhanced dynamic response modelling of superelastic shape memory alloy wires. Materials 15(1): 304. DOI: 10.3390/ma15010304.

25. Lenzen, Klinkel, Altay (2023) Parameter identification and modeling of superelastic shape memory alloy wires subjected to dynamic loads. In: 10th ECCOMAS Thematic Conference on Smart Structures and Materials. Greece: Department of Mechanical Engineering and Aeronautics University of Patras, 1768–1777. DOI: 10.7712/150123.9947.444214.

26. Liang (1990) The Constitutive Modeling of Shape Memory Alloys. Blacksburg: Virginia Polytechnic Institute and State University.

27. Liang, Rogers (1997) One-dimensional thermomechanical constitutive relations for shape memory materials. Journal of Intelligent Material Systems and Structures 8: 285–302. DOI: 10.1177/1045389X9700800402.

28. Long, Ren, et al. (2019) Inverse design of photonic topological state via machine learning. Applied Physics Letters 114(18): 181105. DOI: 10.1063/1.5094838.

29. McKay, Beckman, Conover (2000) A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 42(1): 55–61. DOI: 10.1080/00401706.2000.10485979.

30. Meraghni, Chemisky, Piotrowski, et al. (2014) Parameter identification of a thermodynamic model for superelastic shape memory alloys using analytical calculation of the sensitivity matrix. European Journal of Mechanics - A: Solids 45: 226–237. DOI: 10.1016/j.euromechsol.2013.12.010.

31. Milicevic, Altay (2023) Data generation framework for inverse modeling of nonlinear systems in structural dynamics applications. Acta Mechanica 235: 1–23. DOI: 10.1007/s00707-023-03532-3.

32. Nekouei, Raghebi, Mohammadi (2020) Free vibration analysis of hybrid laminated composite cylindrical shells reinforced with shape memory alloy fibers. Journal of Vibration and Control 26(7-8): 610–626. DOI: 10.1177/1077546319889857.

33. Ozbulut, Hurlebaus (2010) Neuro-fuzzy modeling of temperature- and strain-rate-dependent behavior of NiTi shape memory alloys for seismic applications. Journal of Intelligent Material Systems and Structures 21(8): 837–849. DOI: 10.1177/1045389X10369720.

34. Sato, Tanaka (1988) Estimation of energy dissipation in alloys due to stress-induced martensitic transformation. Res Mechanica 23: 381–393.

35. Shao, Andrawes (2022) Using machine learning to predict the seismic response of an SDOF RC structure with superelastic dampers. International Journal of Civil Engineering 20(10): 1165–1180. DOI: 10.1007/s40999-022-00724-1.

36. Sobol (1967) On the distribution of points in a cube and the approximate evaluation of integrals. USSR Computational Mathematics and Mathematical Physics 7(4): 86–112. DOI: 10.1016/0041-5553(67)90144-9.

37. Tabrizikahou, Kuczma, Łasecka-Plura, et al. (2022) Application and modelling of shape-memory alloys for structural vibration control: state-of-the-art review. Construction and Building Materials 342: 127975. DOI: 10.1016/j.conbuildmat.2022.127975.

38. Zhu, Zhang (2007) A thermomechanical constitutive model for superelastic SMA wire with strain-rate dependence. Smart Materials and Structures 16(5): 1696. DOI: 10.1088/0964-1726/16/5/023.

Feedforward neural network-assisted parameter identification and tuning for uniaxial superelastic shape memory alloy models under dynamic loads
