Abstract
Following an earthquake, ground motion time series are needed to carry out site-specific nonlinear response history analysis. However, the number of currently available recording instruments is sparse; thus, the ground motion time series at uninstrumented sites must be estimated. Tamhidi et al. developed a Gaussian process regression (GPR) model to generate ground motion time series given a set of recorded ground motions surrounding the target site. This GPR model interpolates the observed ground motions’ Fourier Transform coefficients to generate the target site’s Fourier spectrum and the corresponding time series. The robustness of the optimized hyperparameter of the model depends on the surrounding observation density. In this study, we carried out sensitivity analysis and tuned the hyperparameter of the GPR model for various observation densities. The 2019
Keywords
Introduction
The current number of ground-level recording instruments is approximately 2000 in California over multiple recording networks (Southern California Earthquake Data Center, 2022). Consequently, following a major event, ground motion time series at locations devoid of recording instruments are needed to carry out site-specific nonlinear response history analysis. Currently, important products “ShakeCast” and “ShakeMap” established by US Geological Survey (Fraser et al., 2008; Lin et al., 2018; Wald et al., 2008; Worden et al., 2018) provide ground motion intensity measures (GMIM), such as peak ground acceleration and response spectral ordinates. However, as stated above, ground motion time series at uninstrumented sites are also required for seismic performance evaluation of specific structures.
Two commonly used methodologies for generating ground motion time series at uninstrumented sites can be mentioned here. First, the physics-based simulations employing the finite-fault and seismic velocity models consider the source, path, and site effects (e.g. Aagaard et al., 2008; Atkinson and Assatourians, 2015) and the topography of the Earth’s surface (Rodgers et al., 2019; Thomson et al., 2020). Second, the coherency function-based simulations employing cross-spectral density (CSD) and auto-spectral density (ASD) functions (e.g. Kameda and Morikawa, 1992; Konakli and Der Kiureghian, 2012; Zentner, 2013; Rodda and Basu, 2018). Several research studies have also been conducted to simulate unconditional spatially varying ground motions. Deodatis (1996) and Shinozuka and Deodatis (1996) employed the spectral representation method (SRM) to simulate non-stationary stochastic ground motions. Furthermore, conditional simulation of non-stationary random fields was extensively investigated (Cui and Hong, 2020; Heredia-Zavoni and Santa-Cruz, 2000; Hu et al., 2012; Vanmarcke and Fenton, 1991; Wu et al., 2015). The physics-based approaches require detailed information regarding the subsurface properties and fault features. However, The CSDs are driven using empirical or semi-empirical coherency functions, whose coefficients are often determined via data-driven methods (Abrahamson et al., 1991). The CSD functions might need some detailed site properties and wave propagation characteristics. Therefore, both methods have challenges for a rapid post-earthquake structural damage assessment in real time (Loos et al., 2020; Mangalathu and Jeon, 2020) as they are computationally expensive and time-consuming.
Tamhidi et al. (2021, 2022b) recently developed a method to simulate ground motion time series using a trained Gaussian process regression (GPR) model. This GPR model interpolates the discrete Fourier transform (DFT) coefficients of the observed nearby recorded ground motions to construct the time series at the target uninstrumented sites. The GPR model performs conditioned simulation of ground motions at an ensemble of target locations, imposing a comparatively low computational cost (Rasmussen and Williams, 2006).
The intrinsic uncertainty of ground motions affects earthquake engineering disciplines, such as the performance-based post-earthquake assessment and decision-making (Aghababaei et al., 2021; Roohi and Hernandez, 2020; Weatherill et al., 2015). Several studies attempted to quantify and model this randomness in various seismic problems (Alamilla et al., 2001; Wen et al., 2003; Yazdi et al., 2022). In this study, we focus on quantifying the uncertainty and validity of the generated motions using the GPR model introduced by Tamhidi et al. (2021). The model’s hyperparameter (the regularization factor) is fine-tuned based on observation densities, enabling the users to choose the optimum hyperparameter corresponding to the existing observed data set. Then, a methodology to generate random realizations of ground motions using the trained GPR and an inter-frequency correlation model (Bayless and Abrahamson, 2019) is reviewed. This random realization methodology provides a means to quantify the uncertainty of the generated motions at the target sites. We implemented this methodology to investigate the simulated motions’ accuracy and uncertainty using the 2019
Theoretical background
Suppose the ground motion acceleration time series at site s, as(t), is constructed with N discrete data points, as(t i ), i = 1, …, N. The time series can be decomposed into its DFT coefficients Ak (e.g. Oppenheim et al., 1997) as follows:
where
In Equations 1 and 2,
Gaussian Process Regression
A Gaussian process (GP) is a set of indexed random variables, with every finite subset following multivariate Gaussian distribution (Rasmussen and Williams, 2006). The standard form of GP is shown in Equation 3:
In Equation 3, m(
where
In Equation 4,
In Equation 7,
Tamhidi et al. (2020, 2021) demonstrated that a four-dimension input vector,
In Equation 9, T stands for the transpose operator and
The regularization factor,
Hyperparameter optimization
A higher penalty is needed when there are sparse observations (Li and Sudjianto, 2005). In other words, the optimum hyperparameter,
The observation density for the 252 CSN recording sites distributed over a 464 km2 region (CSN domain in Figure 1) is 0.54 sites/km2. To tune the optimum

Distribution of the randomly chosen subsets from CSN’s recorded 2019
In Equation 11,
1. For each site, s, within the data set, s = 1,…, Nsites (Nsites = number of sites in the data set)
1.1. Obtain the optimum parameters
1.2. Generate the ground motion time series at site s using posterior mean (Equation 5) of
1.3. Obtain the RotD50 spectrum of the mean estimated and recorded ground motions at site s and calculate NRMSE between them, called as Errors.
2. Compute the average of Errors among all sites, s = 1, …, Nsites, and store it as Erroravg corresponding to
Eventually, the
The optimum regularization factor,
NRMSE: normalized root mean square error.
We have employed the

Distribution of the RotD50 NRMSE between the recorded and mean estimated ground motions using corresponding
Ground motion random realizations
It is desirable to quantify the uncertainty of the mean estimated ground motions at any target site. The posterior mean vector and covariance matrix for all target sites’kth frequency DFT coefficients, k = 0, …, N−1, are given by Equations 5 and 6. In this study, we generate ground motion time series at one target site at each process. Therefore, Equations 5 and 6 can be converted to Equations 12 and 13, providing the scalar posterior mean,
In Equations 12 and 13,
At each kth frequency, k = 0, …, N−1:
1.1. The posterior mean and standard deviation of 1.2. The correlation between the 1.3. A set of random samples of 2 × 1 vectors of ( 1.4. The logarithmic mean and standard deviation of
The N × N covariance matrix of
Random Gaussian N × 1 vector samples of
The phase spectrum of the mean estimated ground motion is coherent with the nearby observed ground motions. Therefore, the generated samples of Fourier amplitude spectrum (FAS) in Step 3 are combined with the Phase spectrum constructed with the posterior mean DFT coefficients to generate ground motion time series realizations.
We also examined the randomization of Fourier phase spectra; yet, the results were not as promising as the outcomes stated in Step 4 above. We employ the 2019
Site properties of the Test Site 1 shown in Figure 2
Two different observed sets are considered to generate ground motion realizations at Test Site 1; first, all 251 CSN sites in Figure 2a, and second, all 24 CSN sites in Figure 2f. The

Estimated mean and five generated ground motion realizations time series along EW direction at the Test Site 1 within the
Figure 4 depicts the mean estimated and 100 ground motion realizations’ 5%-damped RotD50 spectra at Test Site 1 using 251 and 24 observed sites. It is acknowledged that the generated motions’ uncertainty is lower at long periods than those at short periods. Moreover, Figure 4 indicates that the long-period prediction (longer than 1 s) has minor variation and error for having 251 observed sites than those estimated with 24 observations. However, neither 251 nor 24 observations are dense enough to provide informative detail of short-length waves corresponding to the short periods. That is why it is seen in Figures 4a and b that the short periods’ variation is high and does not differ significantly from having 251 observations to 24 ones.

Figure 5 presents the 68% confidence interval (CI), mean ± standard deviation, for the RotD50 spectra of generated realizations at Test Site 1 employing 251 and 24 observed sites. In addition, Figure 5 demonstrates the average RotD50 spectrum provided by CB14 (Campbell and Bozorgnia, 2014), ASK14 (Abrahamson et al., 2014), and BSSA14 (Boore et al., 2014) ground motion models (GMMs) and their average within-event standard deviation.

Figure 5a indicates that the recorded ground motion response spectrum falls inside 68% CI of the generated motions using 251 observations for the majority of periods. Furthermore, Figure 5a shows that within-event uncertainty for the average GMMs is greater than that of generated motions using 251 observations. Figure 5a also displays that the logarithmic CI of the estimated motions narrows at longer periods. In contrast, the within-event standard deviation of GMMs does not change considerably. In other words, the estimated ground motions’ variability is less than that of GMMs, especially at long periods. However, Figure 5b demonstrates that the recorded ground motion response spectrum falls either outside or on the edge of 68% CI for having 24 observed sites. In addition, Figure 5 indicates that the standard deviation from short to long periods does not alter considerably for having fewer observations. Interested readers are referred to Tamhidi et al. (2022a) to know more about the observation density’s effect on 68% CI of estimated motions at other target sites of CSN.
Uncertainty quantification and sensitivity analysis
Accuracy and uncertainty of the generated time series are quantified in this section. We employed 252 CSN sites’ LOO analysis results for the 2019

The PSA logarithmic standard deviation of the estimated ground motions along EW direction at two periods T = 0.4 and 2.0 s for the
Figure 6 illustrates that estimated ground motion realizations at CSN sites on the Los Angeles basin show minor variations at long periods (T = 2.0 s) compared to those located outside the basin. However, the generated motions’ uncertainty at short period, T = 0.4 s, changes insignificantly between CSN sites inside and outside the Los Angeles basin. Comparing results at T = 0.4 with 2.0 s in Figure 6 reveals that the PSA’s logarithmic standard deviation at long period is smaller than those of short period for sites located on the basin. However, the PSA logarithmic standard deviation does not vary considerably from the short to the long period for sites located outside the basin. This is primarily because the observation density surrounding the target sites in southern part of the CSN is high enough to produce reliable long-period motions. Furthermore, the sites atop the Los Angeles basin receives more coherent long-period motions as evidenced by Kohler et al. (2020). Thus, the estimated motions at long periods are less uncertain for the target sites on the basin.
As a metric of observation density surrounding each target site, we determined the average distance (inside the 4D space established in previous sections) between each target site and its four nearest observed neighbors. In this article, we refer to this distance as “average separation distance.” The shorter average separation distance indicates a higher observation density surrounding the target site. Figure 7 depicts the scatter plot of the mean estimated motions’ PSA NRMSE within usable bandwidth along EW, North–South (NS), and RotD50 concerning the average separation distance. Figure 7 also depicts the fitted lines to the scatter plots and their R-squared, R2. The separation distance in Figure 7 is unitless as the feature vectors are all normalized, as elaborated in the “Theoretical background” section.

Scatter plot of the PSA NRMSE with respect to the average separation distance for the 2019
Figure 7 shows that estimation error and average separation distance have a general direct correlation. One can recognize that having more observations closer to the target site results in more accurate ground motion prediction, as expected and now quantified in Figure 7. Figure 8 indicates the scatter plot of the PSA’s logarithmic standard deviation along EW and NS at T = 0.4 and 2.0 s relative to the average separation distance. Figure 8 displays that the estimation uncertainty at T = 2.0 s grows as the average separation distance increases. In addition, it is noticeable that in a general trend, the uncertainty of the long-period estimation is sensitive to the observation density; but, the short-period estimation (T = 0.4 s) is not significantly correlated with the number of observations surrounding the target site. This phenomenon is due to the complexities and intrinsic unpredictability of the short-period motions, making added observations less useful to produce reliable short-period waves. Comparing the scatter plots at T = 0.4 and 2.0 s in Figure 8 reveals that the long-period motions have less variability than short-period ones at shorter average separation distances. Figure 8 demonstrates that for a target site with average distance of 0.2 from its four nearest neighbors, the estimated logarithmic standard deviation for PSA is around 0.45 and 0.30 at T = 0.4 and 2.0 s, respectively. Furthermore, Figure 8 depicts that both short and long periods’ uncertainty saturates for very long average separation distances. In other words, the GPR model produces random estimations with similar variance at short and long periods where there are too few observations.

Scatter plot of the logarithmic standard deviation of PSA along (a) EW and (b) NS for the
Figure 9 demonstrates the stacked bar plots for the proportion of target sites where the recorded PSA falls inside (or outside) the estimated motions’ 68% CI with regard to the average separation distance. The eight spans of average separation distance shown in Figure 9 are selected so that each span includes an approximately same number of target sites. Figure 9 indicates that the percentage of sites where the recorded PSA locates outside of the 68% CI rises as average separation distance grows. This pattern becomes more apparent at T = 2.0 s. The percentage of sites where their 68% CI includes the recorded PSA decreases steadily for average distances greater than 0.3 and 0.2 for T = 0.4 and 2.0 s, respectively.

Stacked bar plots of the percentage of target sites where the EW recorded PSA falls inside the 68% CI with respect to average separation distance for 2019
Consequently, it may be concluded that the target sites with higher observation densities close to them are more likely to have the recorded PSA within their 68% CI estimation. About 76% and 74% of the 252 target sites’ estimated PSA at T = 0.4 s captures the recorded one within 68% CI along EW and NS, respectively. Similarly, 70% and 76% of the target sites’ generated PSA at T = 2.0 s includes the recorded spectral ordinate within 68% CI along EW and NS, respectively. The effect of other governing parameters, such as uncertainty of the predicted site conditions (
In summary, Figure 8 shows that increasing the density of instrumentation closer to the target site reduces the variability of the generated ground motions. As a result, the higher observation density is expected to decrease uncertainty of structural engineering demand parameters derived from nonlinear response history analyses. Figure 9 indicates that for the target sites with a greater observation density, the uncertainty of long-period estimated motions is smaller than that for short periods; however, the probability that the recorded spectrum falls within 68% CI is almost the same at both short and long periods. In other words, for target sites with average distances shorter than 0.3, the PSA realizations are about 80% likely to capture the recorded spectra within their mean ± standard deviation bandwidth.
Performance evaluation on combined network data sets
Herein, we study the potential improvement of the ground motion prediction using combined observations from different seismic networks. There are various seismic networks in California, and the combined network is called California Integrated Seismic Network (CISN). First, we execute LOO ground motion prediction at each CISN station as a target site using all other CISN sites (except the target site) as observation. Second, we perform the same procedure to estimate the ground motion time series at each CISN site using all other CISN and CSN sites as observation. Comparing the predicted motions resulting from these two observed sets with the recorded ones reveals the improvement of the GPR model’s output. Ground motions recorded in two recent earthquakes are employed for this purpose: (1) 2019
2019 M7.1 Ridgecrest earthquake
We selected 121 ground-level sites from CISN that recorded the 2019

Distribution of (a) CISN sites, (b) CISN and CSN sites, (c) RotD50 spectrum NRMSE for having CISN sites as observation, and (d) RotD50 spectrum NRMSE for having both CISN and CSN sites as observation for 2019
The implemented
CISN: California Integrated Seismic Network; CSN: Community Seismic Network.
Table 3 demonstrates how the sites within the Exterior domain (Figure 10b) require
We need to make the recorded ground motions at the CSN and CISN sites consistent with each other. First, all CISN and CSN motions are rotated to line up with the EW and NS directions. In addition, zero padding at the records’ beginning and end is implemented to ensure that all motions start and finish at the same Universal Time Coordinated (UTC). Finally, the lowest sampling rate among all recorded motions is chosen as the target site’s generated motion’s sampling rate. Figures 10c and d show the distribution of the mean estimated motions’ RotD50 NRMSE. The average RotD50 NRMSE for all target sites using just CISN as observation and both CISN and CSN as observation is 0.48 and 0.39, respectively. This means that the average RotD50 NRMSE is reduced by 19% due to the added CSN sites. In general, the NRMSE below a judgmental value 0.3 indicates a reasonably precise estimation in terms of both time series and response spectrum. However, an NRMSE larger than 0.4 demonstrates a poor estimation. Interested readers are referred to the Appendix section of Tamhidi et al. (2022a), which contains a variety of instances for predictions’ NRMSE.
About 80% of the target sites inside the inner domain had mean estimated motions’ RotD50 NRMSE lower than 0.34. However, there are two sites within the inner domain with RotD50 NRMSE values of 0.52 and 0.9 (orange and red points in Figure 10d). Figure 10b indicates that the majority of added CSN observations are positioned on almost one side of these two target points, resulting in a non-uniform observation distribution around them, which might lead to an inaccurate ground motion estimations as evidenced by Tamhidi et al. (2021). Table 4 compares the average of the mean estimated ground motions’ NRMSE along each horizontal component and RotD50 spectra for various domains.
The prediction error along EW, NS, and RotD50 response spectra in different domains for the 2019
NRMSE: normalized root mean square error; CISN: California Integrated Seismic Network; CSN: Community Seismic Network.
Error reduction shows the reduction in the average NRMSE among all CISN target sites due to the added CSN sites.
Table 4 demonstrates that additional CSN sites generally improve the generated motions’ accuracy along both horizontal components for Inner domain target sites. Furthermore, Table 4 reveals that the added CSN sites had the least impact on the predictions for the target sites in the Exterior domain. Therefore, the prediction for the target sites inside the added network’s borders is improved as more observations become available. However, this effect is less substantial for the target sites outside the added network’s domain. The effect of having more observations on the predictions’ error and uncertainty at different periods is investigated by Tamhidi et al. (2022a).
Three CISN target sites are chosen (Figures 10c and d) to indicate the improvement of the generated motions after adding CSN sites. Figure 11 displays the predicted motions’ RotD50 spectra and velocity time series along EW. Figures 11a and b illustrate how the amplitude of the velocity time series fits closer to the recorded one after observing additional sites from CSN. Similarly, the response spectrum of the prediction matches more precisely to the recorded one, having more observations.

The RotD50 and velocity time series of the prediction using CISN and CISN plus CSN observation along EW direction for the test sites (a) No. 1, (b) No. 2, and (c) No. 3 for 2019
2020 M4.5 South El Monte earthquake
In addition, we evaluate the influence of added observations for the recently recorded ground motions of the 2020

Distribution of (a) CISN sites, (b) CISN and CSN sites, (c) RotD50 spectrum NRMSE for having CISN sites as observation, and (d) RotD50 spectrum NRMSE for having both CISN and CSN sites as observation for 2020 South El Monte earthquake.
The 2020 M4.5 South El Monte earthquake features (USGS, 2020)
UTC: Universal Time Coordinated.
Table 6 summarizes the observation density and the corresponding employed
The implemented
CISN: California Integrated Seismic Network; CSN: Community Seismic Network.
Figures 12c and d demonstrate the distribution of the mean estimated motions’ RotD50 NRMSE at each CISN site. The average RotD50 NRMSE among all target sites for CISN-only and CISN-plus-CSN observed sites is 0.80 and 0.75, respectively. Approximately 67% (12 sites) of the target sites inside the Inner domain had an NRMSE smaller than 0.32 (Figure 12d). There are three target sites inside the Inner domain with an NRMSE larger than 0.5, indicating that their estimates worsened after adding more ground motions from CSN sites.
Table 7 compares the NRMSE of the mean estimated ground motions for each target domain. Table 7 and Figure 12 indicate that the addition of observed sites from CSN, generally improved the prediction of the ground motions inside the Inner domain (30% reduction in RotD50 NRMSE); yet, there are a few sites within the Inner domain where the estimation deteriorated after observing more sites from CSN (orange sites in Figure 12d). Comparing Table 7 and Table 4 reveals that the influence of added CSN sites for the
The prediction error along EW, NS, and RotD50 response spectra in different domains for the 2020
NRMSE: normalized root mean square error; CISN: California Integrated Seismic Network; CSN: Community Seismic Network.
Error reduction shows the reduction in the average NRMSE among all target sites due to the added CSN sites.
The influence of added observations is negligible for the sites within the Middle or Exterior domains and, in some cases, can worsen the estimations. It should be noted that the number of available CISN sites within the Middle domain is sparse (9 sites), which can affect the statistical inference of the added CSN observations’ effect in that region.
Three CISN target sites are selected to demonstrate the estimated ground motion velocity time series using CISN-only, and CISN-plus-CSN observed sites (Figures 12c and d). Figure 13 illustrates the estimated velocity time series along the NS direction using two sets of observations. Figure 13 shows how the velocity time series for the CISN target sites inside the Inner domain fits closer to the recorded one’s amplitude after the GPR model observed more CSN sites.

The RotD50 and velocity time series of the prediction using CISN and CISN plus CSN observation along NS direction for the test sites (a) No. 1, (b) No. 2, and (c) No. 3 for 2020
Concluding remarks
This research aimed to generate post-earthquake ground motion time series at uninstrumented (“target”) sites. We developed a GPR model to generate such ground motions. We explored the influence of observation spatial density (instrumented sites) on the model’s optimal hyperparameter. The optimized hyperparameter allows users to implement the GPR model for various observation densities. It was demonstrated that the required regularization factor is smaller for the region with a higher observation density. In contrast, greater regularization is needed where there are fewer observations.
We also produced ground motion realizations at the target points. This approach offers an ensemble of estimated ground motion time series for uninstrumented sites to conduct the site-specific nonlinear response history analysis, which reveals the uncertainty of the predicted structural damage resulting from record-to-record variation. Quantification of uncertainty of structural responses is a future study which is under development by the authors. The uncertainty of the estimated ground motions is assessed using the generated ground motion realizations at different CSN sites using the 2019
The effect of having additional observations from various recording networks on prediction accuracy was also investigated using the motions recorded during the 2019
Footnotes
Acknowledgements
The authors thank Prof. Tadahiro Kishida for his efforts in signal processing of the recorded ground motions. Prof. Chukwuebuka Nweke and Prof. Pengfei Wang have kindly provided the estimation of the Vs30, Z1.0, and Z2.5 values at the recording stations. Cooperation of Prof. Monica Kohler and Dr Richard Guy on Community Seismic Network (CSN) data is greatly acknowledged. The comments from two Earthquake Spectra anonymous reviewers are appreciated.
Data and Resources
The raw
(last accessed May 2022).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by the University of California, Los Angeles (UCLA) Graduate Fellowship to the first author, which is highly appreciated. Partial support from the National Science Foundation (award no. 2025310), California Department of Transportation, and Pacific Gas and Electric Company is also gratefully acknowledged. Any opinions, findings, conclusions, or recommendations expressed are those of the authors and do not necessarily reflect those of the supporting agencies.
