Abstract
In this study, we derived a regionalized partially nonergodic empirical ground-motion model (GMM) for subduction interface and intraslab earthquakes using an extensive global database compiled as part of the NGA-Subduction project. The model can be used to estimate peak ground acceleration (PGA), peak ground velocity (PGV), and ordinates of 5%-damped pseudo-spectral acceleration (PSA) at periods ranging from 0.01 to 10 s for
Introduction
Subduction zones can produce very large megathrust earthquakes and, thus, pose a significant seismic risk in many regions of the world. Since ground-motion models (GMMs) are an integral part of probabilistic seismic hazard analysis (PSHA), the development of GMMs for subduction earthquakes is of major importance to accurately quantify the seismic hazard from such events. This has led to the development of several empirical and simulation-based GMMs over the years for both interface and intraslab subduction earthquakes. The models based on stochastic or physics-based ground-motion simulations are typically developed for a specific application to a localized region, such as the CASC in the Pacific Northwest of the United States (Atkinson and Macias, 2009; Gregor et al., 2002), whereas many empirical models are developed for more global applications (Abrahamson et al., 2016; Abrahamson and Gulerce, 2022; Atkinson and Boore, 2003, 2008; Parker et al., 2022) or for regions where there are an abundance of recordings, such as Chile (Fayaz et al., 2023; Idini et al., 2017; Macedo and Liu, 2022; Montalva et al., 2017, 2022), Ecuador (Arteta et al., 2021), Greece (Kkallas et al., 2018), JP (Campbell et al., 2022; Ghofrani and Atkinson, 2014; Hassani and Atkinson, 2021; Morikawa and Fujiwara, 2013; Si et al., 2022; Zhao et al., 2006, 2016a, 2016b), and TW (Chao et al., 2020; Phung et al., 2020).
In this article, we summarize a regionalized partially nonergodic empirical GMM that was developed by Kuehn et al. (2020a) as part of the Next Generation Attenuation (NGA) research project on subduction earthquakes (Bozorgnia et al., 2022), hereafter referred to as NGA-Sub. We refer to this model as KBCG20 throughout the article. KBCG20 can be used to predict the RotD50 horizontal component (Boore, 2010) of peak ground acceleration (PGA), peak ground velocity (PGV), and ordinates of 5%-damped pseudo-spectral acceleration (PSA) at 21 periods ranging from 0.01 to 10 s. Details about the model and modeling approach is available in the study by Kuehn et al. (2020a), although we note that some of the coefficients from that study have since been modified. We regionalized the GMM for seven subduction-zone regions with special attention to CASC, because of the limited number of earthquakes from this latter region and its importance to seismic hazards in the Pacific Northwest of the United States (Petersen et al., 2020). In particular, KBCG20 was developed as a partially nonergodic model similar to the approach described in Stafford (2014), which means that some terms in the model vary by region while other terms are shared among the regions. The model parameters are estimated using Bayesian inference (Gelman et al., 2013; Spiegelhalter and Rice, 2009), which allows the inclusion of prior information in a probabilistic way. We also place strong emphasis on the estimation of within-model epistemic uncertainty, which is estimated from a posterior distribution of model parameters derived from the Bayesian regression.
Database
We used a subset of the NGA-Sub database based on a set of selection criteria designed to include only those data that are deemed to be accurate, reliable, and usable for GMM development (Kuehn et al., 2020a). The predictor variables included in KBCG20 are summarized in Table 1. Our data selection and exclusion criteria are listed below.
Definition of predictor variables
Inclusion criteria:
Interface (FS = 0); intraslab (FS = 1), or the lower part of a double seismicity intraslab zone (FS = 5). The latter two are combined and referred to simply as intraslab or slab events (FS = 1). Contreras et al. (2022) provides details on event classification in the NGA-Sub database.
Magnitudes of
Ratios of the largest to smallest rupture distance larger than 2, which ensures that each event has a reasonable distance range that allows a reliable estimate of event terms and attenuation parameters.
Recordings with non-missing values of moment magnitude (
Recordings with a minimum rupture distance of RRUP < 800 km or RRUP < RMAX if smaller, where RMAX is the minimum distance to non-triggered recordings for each event, which arises when an instrument triggers only above a certain amplitude threshold (see Contreras et al., 2022).
Exclusion criteria:
6. Predictor variables other than basin depths with missing values.
7. Stations with an instrument depth > 2 m to avoid depletion of high-frequencies due to embedment effects.
8. Recordings with PGA > 10 g, which likely represent incorrect instrument gains or other errors.
9. Recordings with a multiple event flag equal to 1 representing recordings with more than one recorded earthquake.
10. Recordings with a visual quality flag equal to 2 representing a late S-trigger or 9 representing non-useable data.
11. Recordings with a GMX site classification first letter code of N, Z, or F representing non-free-field stations.
We do not discard recordings from stations with no estimate of Z1.0 or Z2.5. These so-called basin or sediment depths are only available for a limited number of stations and regions. Discarding these stations would severely reduce the number of usable data. Furthermore, where they are available, including them in the regression was found to significantly reduce the within-event aleatory variability at long periods.
For CASC, two events were classified as interface earthquakes in the database. Located off the central Oregon coast, they have magnitudes of
We also excluded the 4 October 1994
The selected database has a maximum of 16,045 three-component recordings from 238 events and 3769 recording stations. There are 6864 recordings from 113 interface events and 9181 recordings from 125 intraslab events. At long periods, the number of recordings is reduced due to a smaller useable bandwidth. Table 2 lists the number of recordings, number of events, number of stations, and region abbreviation for each of the subduction regions for the subset of NGA-Sub data selected for analysis. JP dominates the data set with over 50% of the recordings. TW and SA also have a significant number of events but fewer recordings per event than JP. The number of available recordings for CASC, CAM, and NZ is relatively sparse. The magnitude–distance distribution of the data is shown in Figure 1.
Selected number of recordings, events, and stations by region

Magnitude and distance distribution of data from (a) interface and (b) intraslab events with separate colors for the seven modeled subduction regions.
Ground-motion model
The GMM is developed as a Bayesian multi-level/hierarchical model (Gelman and Hill, 2006). This means that we recognize that there are different levels in the data that can be exploited during the model-building process. Each level consists of data that can be grouped according to some criterion, such as by event or geographic region. The levels form the following hierarchy from broadest to narrowest: global → regional → event → recording. Within each level, some parameters are shared within groups, such as the event term for all recordings from the same event, while others are taken from higher levels. The inclusion of the regional, event, and recording levels allows us to relax the ergodic assumption that all recordings come from the same population by associating them with groups or hierarchical levels (Anderson and Brune, 1999), leading to what is referred to as a partially nonergodic model (Stafford, 2014).
The coefficients that are regionalized in KBCG20 are those related to the overall amplitude (constant), anelastic attenuation, linear site response, and basin response. The constant, linear site response, and anelastic attenuation are modeled as regional random effects (Stafford, 2014). The basin response is modeled as an adjustment to the final regional GMM. Because basin depths are not available for all regions and for all sites within a region, they were estimated from the regional within-event residuals of the model. We describe the regionalization of the model in more detail in the “Geographic regionalization” section of the article.
Base model
At the recording level, the model assumes that the target ground-motion variable
where µ is the sum of the natural logarithmic median base function
The mean prediction
Description of model coefficients
Table 3 also identifies which of the coefficients are regionalized. In addition to the estimated coefficients, there are a few coefficients that are fixed (see Table 4). To ensure a physically meaningful spectrum, the predicted median PSA at short periods is not allowed to be smaller than PGA as given by the equation:
where
Fixed model parameters
Logistic hinge function
A bilinear logistic hinge function is used in several of the terms listed in Equation 4. It provides a smooth transition from one slope to another. It is defined by the equation:
where x0 is the slope breakpoint, b0 is the slope for x < x0, b1 is the slope for x > x0, and δ controls the smoothness of the transition between slope b0 and b1 with smaller values of δ leading to a sharper transition. The bilinear logistic hinge function is used to model the magnitude-scaling term
Constant term
The constant term is given by the equation:
which allows a different constant depending on whether the earthquake is an interface or an intraslab event.
Magnitude term
The magnitude-scaling term is modeled by the following bilinear logistic hinge function:
The magnitude-scaling functional form is the same for interface and intraslab events, but the model coefficients and fixed parameters are different (see Tables 3 to 5). In particular, the breakpoint magnitude (
Example breakpoint magnitudes
During an initial regression, it became apparent that at long periods (T > 1 s), the event terms for interface events at large magnitudes were biased low compared to the observations. As a result, an adjustment to
This adjustment is only applied to JP and SA interface events in the regression, since these regions are the only ones with large-enough magnitudes to be effected by this adjustment. However, we believe that there is no reason why it should not be applied to every subduction region in a forward prediction of large-magnitude ground motions.
Geometric attenuation term
The geometric attenuation term (i.e. geometrical spreading) is modeled by the equation:
where the finite-fault or “pseudo-depth” term is defined by the equation:
The regression coefficients θnft,1 and θnft,2 are given a relatively strong prior distribution because they are not well-constrained by the data.
Source-depth term
The source-depth term is given by a bilinear logistic hinge function with different slopes and depth breakpoints for interface and intraslab events as follows:
where Zif,ref and Zslab,ref are reference depths (see Table 4) and δZ is the smoothness transition parameter between the two linear slopes. The bilinear DSR has breakpoints at ZB,if + δZB,if and ZB,slab + δZB,slab determined from the regression. The breakpoints were set to ZB,if = 30 km and ZB,slab = 80 km based on initial regressions. The period-dependent adjustment coefficients δZB,if and δZB,slab were determined by regression because the DSR breakpoints have no theoretical basis. The coefficients for the DSR up to the depth breakpoints are θ9,if and θ9,slab. Both the coefficients for the DSR above the break points were fixed at θ10 = 0, consistent with the study by Abrahamson et al. (2016, 2018), because there are not many events with depths larger than these breakpoints and those depths that do exist do not show any systematic trend with ground-motion amplitude.
Site-response term
The site-response (site-amplification) term is parameterized as in the study by Campbell and Bozorgnia (2014):
The linear site-scaling coefficient θ7 is a regression coefficient that varies by geographic region. The period-dependent parameters k1 and k2, as well as the period-independent constants c and n, are fixed to the values of Campbell and Bozorgnia (2014). Hence, we make the implicit assumption that the nonlinear site amplification for subduction events is similar to that of crustal events, consistent with the assumptions of Abrahamson et al. (2016) and Montalva et al. (2017), although we note that Zhao et al. (2016a, 2016b) Campbell et al. (2022), Parker et al. (2022), and Si et al. (2022) suggest a different site-response term for Japanese recordings.
Anelastic attenuation term
The anelastic attenuation term is defined by the equation:
where FX = 1 when the travel path passes through a volcanic arc (i.e. the transition between the forearc and backarc attenuation subregions) and FX = 0 otherwise. The regression coefficient θ6,i represents the anelastic attenuation coefficient for attenuation subregion i before it crosses a volcanic arc. The regression coefficient θ6,xi represents the anelastic attenuation coefficient for attenuation subregion i after it crosses a volcanic arc. The distance Ri is the distance traveled within each subregion, the sum of which equals RRUP. The coefficient θxc is the offset as the travel path crosses the volcanic arc and θ6,2 is the attenuation rate for those subduction regions that do not have different attenuation subregions.
The difference in how the subduction regions are modeled is based on inspection of residual plots and ground-motion scaling of trial regressions for each subregion that did not include forearc and backarc attenuation coefficients. Subregion index 1 indicates a travel path within the backarc of JP, CAM, or SA; Subregion 2 indicates a travel path within the forearc of CAM, SA, or northeastern JP; and Subregion 3 indicates a travel path within the forearc of southeastern JP. The northeastern JP forearc sits atop the JP Trench megathrust and the southeastern JP forearc sits atop the Nankai Trough megathrust. Figure 2 shows the attenuation subregions for JP, SA, and CAM.

Maps showing anelastic attenuation subregions for (a) Japan and (b) South America and Central America and Mexico.
Basin-response term
The effect of the deep crustal structure beneath the site is modeled with the following basin-response term:
where Zxx is either Z1.0 or Z2.5 depending on the region,
where the regional coefficients θz1 through θz4 are different for CASC, JP, NZ, and TW. Basin depths are not available for all subduction regions or for all recording sites within a region. Although both Z1.0 and Z2.5 are available for CASC, the values of Z1.0 are typically associated with the depth to the relatively shallow high shear-wave velocity horizon at the quaternary–tertiary boundary (Stephenson et al., 2019) and is not a good representation of the deeper basin structure that amplifies intermediate-to-long period ground motion (Chang et al., 2014; Frankel et al., 2009). Both basin depths are also available for JP, but we selected Z2.5 to be consistent with the depth parameter used by Si et al. (2022) in their basin-response term and the relatively deep basin depths that Morikawa and Fujiwara (2013) found gave the smallest standard deviations at T > 0.3 s when a basin-response term was included in their GMM. Only Z1.0 is available for NZ and TW and no basin depths are available for AK, CAM, and SA. For regions where basin depths are available, the basin-response term is defined as a function of the difference between the observed and a reference basin depth defined by the regional relationship between basin depth and site velocity given by Equation 16. This is done to minimize the impact of the correlation between basin depth and site velocity in the regression.
Figure 3 plots VS30 versus Z1.0 or Z2.5 for JP, NZ, and TW together with the predictions from Equation 16. Figure 4 shows a plot of VS30 versus Z2.5 for CASC. For sites in CASC, we distinguish between non-basin sites, for which the basin-response term is zero, and sites located within the Everett, Georgia, North Portland, Seattle, Tacoma, and Tualatin basins (Kuehn et al., 2020a). In this latter figure, the sites inside the Seattle basin are found to have basin depths of Z2.5≈ 7000 m independent of the value of VS30. If these sites were included in the regression for Equation 16, it would bias the slope of the

Scaling of (left) Z2.5 versus VS30 for Japan, (middle) Z1.0 versus VS30 for New Zealand, and (right) Z1.0 versus VS30 for Taiwan compared to the fitted models.

Scaling of Z2.5 versus VS30 for Cascadia showing values of Z2.5 for the separate basins in the region compared to the fitted model for all of the basins except for the Seattle basin (red solid circles) that was assigned a constant depth of 7000 m.
Geographic regionalization
As indicated in previous sections of this article, we account for regional differences in the constants θ1,if and θ1,slab; the linear site-response coefficient θ7; the anelastic attenuation coefficients θ6,x1, θ6,x2, θ6,x3, θ6,1, θ6,2, and θ6,3; and the basin-response coefficients θz1, θz2, θz3, and θz4 (see Table 2). Regional differences in ground-motion scaling are expected as a result of differences in tectonic and geological conditions. In particular, regional differences in the anelastic attenuation coefficients are expected to be related to differences in the seismological quality factor Q, and regional differences in the linear site-response and basin-response terms are expected to be related to differences in the average shear-wave velocity profile. In addition, differences in the average stress drop can translate to differences in the constants at short periods. Although basin response is also regionalized, because this term is fit to the within-event residuals after the GMM is developed, it is not part of the regionalization included in the regression. There are only enough data to estimate regional basin-response terms and their epistemic standard deviations for CASC, JP, NZ, and TW and not for a global model plus these regional adjustments.
In the Bayesian regression, the regionalized coefficients are modeled as regional random effects (Stafford, 2014) according to the following equation:
where µθ is the global coefficient and
Past studies have found that the small-magnitude short-period CASC intraslab ground motions are significantly lower compared to intraslab events from other regions (Abrahamson et al., 2016, 2018; Atkinson, 1997; Atkinson and Boore, 2003). Although we have confirmed this result, we also found that the ground motions from the two largest intraslab events in CASC are comparable to, although somewhat lower than, our global model. We did not want to bias the regional CASC constant θ1,slab toward low values based on ground motion data from these smaller earthquakes yet there are too few of them to define a CASC-specific MSR. Therefore, we used only the two largest events, the
We also found that the within-event standard deviation ϕ was smaller for those regions with a basin-response term. This was not the case for the between-event standard deviation τ that was not regionalized. We also evaluated whether the standard deviations were dependent on magnitude, distance, and subduction type (i.e. interface and intraslab), but no significant biases or trends were found due to limited data and a significant overlap of their posterior distributions. However, we did find and included a strong regional dependence of epistemic uncertainty. Additional details are available in the study by Kuehn et al. (2020a).
Regionalization of breakpoint magnitude
It is clear from modern empirical subduction interface GMMs (Abrahamson et al., 2016, 2018; Campbell et al., 2022; Morikawa and Fujiwara, 2013; Zhao et al., 2016b) and other literature reviews (Campbell, 2020; Stewart et al., 2013) that there is a magnitude at which the MSR of ground-motions from subduction interface earthquakes must become smaller. In KBCG20, this magnitude is referred to as breakpoint magnitude (
Model estimation
The model parameters estimated in the Bayesian regression are the global coefficients
Posterior distributions
The posterior distributions of the model parameters are estimated from Markov Chain Monte Carlo (MCMC) sampling using the Bayesian computer platform Stan (Carpenter et al., 2017). These posterior distributions are then used to calculate summary statistics such as the mean, median, fractiles and correlation coefficients of the coefficients, predicted values, and standard deviations. In total, we generate 800 samples from the posterior distribution. In Equation 1, the recording level of the GMM is modeled as a normal distribution with mean µ and within-event standard deviation ϕ, which means that the ground-motion parameter of interest Y is distributed according to a lognormal distribution. In a forward application of the model, this is how the model should be used even though the fit is done assuming a Student’s t-distribution as discussed next.
During exploratory regressions, we observed several apparent outlier recordings. Including such outliers can lead to a biased estimate of the median prediction and an increased estimate of the standard deviations. To mitigate this issue, we performed a Bayesian robust regression. Instead of modeling the within-event variability with a normal distribution as is traditionally the case, a Student’s t-distribution is used (Gelman and Hill, 2006; Kruschke, 2015). Compared to the normal distribution, the Student’s t-distribution has wider tails, which makes it less susceptible to outliers. For more details, see Chapter 5 in the study by Kuehn et al. (2020a) and the Supplemental material to this article.
To ensure a smooth predicted response spectrum, we smoothed the means of the model coefficients and aleatory standard deviations (i.e. the mean of the 800 posterior samples for each model parameter) using a Gaussian process (GP) regression (Rasmussen and Williams, 2006) with a squared exponential covariance function. For more details see Section 4.3 in the study by Kuehn et al. (2020a) and the Supplemental material to the article. The regional standard deviations are not smoothed because they do not affect the predicted median spectrum. For the same reason, the parameters of the correlation matrix of the regional coefficients are not smoothed. We re-centered the posterior distribution of each coefficient by subtracting its smoothed value from its distribution mean and adding this difference to each posterior value. Not only does this retain the range of values in the original posterior distribution, it also retains the correlation between the posterior distributions of the coefficients.
Prior distributions
We classify the prior distributions of the regression coefficients and the other model parameters as being a mix of weakly informative and informative depending on the available data. For those regression coefficients that are not well constrained by data, it is important to define informative prior distributions using other types of information (e.g. physical constraints or physics-based ground-motion simulations) for the results to be meaningful. For other model parameters, a weakly informative prior distribution is used to down-weight ranges of the parameter space that can be ruled out a priori. Although the term weakly informative is not well defined, it is usually interpreted to mean a probability distribution that is not very wide. Stafford (2019) suggests defining the prior distributions based on previous GMMs, which can help to stabilize regression results and lead to more reasonable predicted values (Kowsari et al., 2020; Kuehn and Scherbaum, 2016). However, because most GMMs do not report the full distributions of their coefficients, this information is generally not available. Furthermore, there is considerable overlap between the NGA-Sub database and data sets used in previous GMMs, which means that prior distributions based on published models might double-count some data.
We used informative prior distributions for the MSR coefficients θ4,if and θ4,slab for small events and θ5 for large events, as well as for the coefficients θnft,1 and θnft,2 of the near-fault saturation term. The prior distributions for the other regression coefficients are described in Section 4.1.8 of the study by Kuehn et al. (2020a) and in the Supplemental material to the article. The prior distribution of θ5 needs to be informative because there are not many events to empirically constrain it. Therefore, we base its prior distribution on results of the simulation-based GMMs of Atkinson and Macias (2009) and Gregor et al. (2002). The effective large-magnitude MSRs of these two models are shown in Figure 5. For comparison, we also plot the MSR estimated by Ghofrani and Atkinson (2014) from
where T(a,b) indicates a truncated normal distribution with lower limit a and upper limit b.

Large-magnitude scaling rates (MSRs) for selected GMMs used to set prior values in KBCG20.
The prior distributions of the MSR at magnitudes below the breakpoint magnitude are defined by the equations:
These prior distributions are informed by the physics-based simulations for intraslab events by Ji and Archuleta (2018). We performed a simple regression on the simulated ground motions and found the estimated MSR to be approximately one across all periods. Since the simulated data range is rather limited, we used the regression results as guidance to impart a wider standard deviation on the prior distribution based on the regression. The GMMs of Gregor et al. (2002) and Atkinson and Macias (2009) are valid for
Another parameter that needs to be constrained with an informative prior distribution is the near-fault or “pseudo-depth” term h(
Figure 6 shows the finite-fault terms predicted by the Abrahamson et al. (2016) and Parker et al. (2022) GMMs and the resulting prior distribution for h(

Comparison of the prior distribution of the near-fault term h(
The prior distributions for the regional standard deviations are exponentially distributed. The rate parameters of the exponential distributions are chosen such that large deviations from the global model are penalized, consistent with that proposed by Simpson et al. (2017). This approach is akin to Occam’s razor, in which the simpler model is the global model and deviations from the global model are only modeled if there is a strong regional signal in the data. Since the number of regions is small, the estimate of the regional standard deviations can be quite noisy, which means it is important to set reasonable prior distributions for these parameters (Kuehn et al., 2020a).
Results
Regional coefficients
All of the mean model coefficients and their posterior distributions are provided in the Supplemental material to the article. Herein, we briefly summarize the results of the Bayesian regression analyses with respect to regional differences, since this is one of the unique features of KBCG20 compared to previous subduction GMMs. A more comprehensive evaluation of the other coefficients is available in Section 4.2.3 of the study by Kuehn et al. (2020a) and the Supplemental material to the article. Figure 7 shows the values of the regional adjustment values of the interface constant (δθ1,if), the intraslab constant (δθ1,slab), the linear site-response coefficient (δθ7), and the anelastic-attenuation coefficient (δθ6,2). We only show δθ6,2 because this coefficient is used in the anelastic-attenuation terms of all regions as indicated in Equation 14.

Regional regression adjustment coefficients for the (a) interface constant, (b) intraslab constant, (c) linear site-amplification coefficient, and (d) anelastic attenuation coefficient.
Figure 7 indicates that regional differences in the coefficients can be relatively large, especially at short periods. Differences at long periods are mainly restricted to linear site-response scaling. As seen in Figure 7a, the regional adjustment for the CASC interface constant is slightly negative even though there are no CASC interface events in the database. The reason for this is that the regional adjustment coefficients are modeled as correlated between interface and intraslab events and the regional adjustment of the intraslab constant is negative. This leads to a negative interface constant adjustment for CASC.
Adjustment to AK regional coefficients
The database for the NGA-Sub project was finalized on 22 April 2019 (Bozorgnia and Stewart, 2020). The GMM presented in the study by Kuehn et al. (2020a) was developed using this database. Some changes were made to the database between this date and its final publication that we reviewed to determine which aspects of our model required modification. The modifications included values of VS30 for some of the recording stations in JP and SA and values of the distance metrics in AK (Mazzoni et al., 2022). Preliminary analyses showed that changes in the predicted values due to differences in VS30 were small and could be neglected, but that changes in the predicted values due to differences in the distance metrics in AK were non-negligible. Therefore, we decided to refit the Kuehn et al. (2020a) model based on the updated AK distances. To avoid computational complexities, we only refit the AK regional adjustment coefficients to avoid having to refit the entire GMM.
In the refitting process, we fixed the global coefficients to their estimated mean values and estimated the new regionalized AK adjustment constant, site-response coefficient, and anelastic-attenuation coefficient using the revised database. The outcome of this process was a set of new posterior samples of these regional coefficients. We then calculated the mean of the posterior samples and re-centered the posterior distribution obtained from the first fit to be centered around the mean value obtained from the refitting process. In this way, we keep the original posterior range and correlations. The resulting value of the kth posterior sample for coefficient i was obtained from the equation:
where i is either the adjustment constant for interface events, the adjustment constant for intraslab events, the anelastic-attenuation coefficient, or the site-response coefficient, k is an index representing the 800 posterior samples, µi, origfit is the mean regional coefficient from the original fit, and µi, refit is the mean regional coefficient from the updated fit. This refitting process did not change the coefficients for the other subduction regions or the global model.
Aleatory variability
The total aleatory variance is calculated from the equation

Mean original (dashed lines) and smoothed (solid lines) aleatory between-event standard deviation τ, within-event standard deviation ϕ, total standard deviation σ, and 5% and 95% fractiles (vertical lines) versus period calculated from the posterior distributions.
Model evaluation
As part of the evaluation process, we plot residuals for a limited number of ground-motion parameters and predictor variables as an example of the model validation we performed. Other residual plots are available in the study by Kuehn et al. (2020a) and at https://github.com/nikuehn/KBCG20. Figure 9 plots event terms (between-event residuals) against magnitude for T = 0.2 s and T = 3 s. Figure 10 plots within-event residuals for the same two periods against RRUP, indicating that attenuation is not dependent on subduction type. Overall, the residuals do not show any significant biases or trends with respect to the predictor variables that would indicate an issue with the GMM.

Event terms versus magnitude (

Within-event residuals of (a) interface events for T = 0.2 s, (b) intraslab events for T = 0.2 s, (c) interface events for T = 1 s, and (d) intraslab events for T = 1 s.
Example model predictions
Figure 11 plots predicted response spectra by region for interface events at

Example spectra for different subduction regions and subregions for (a) interface events with ZTOR = 10 km and (b) intraslab events with ZTOR = 50 km.

Magnitude scaling of PGA for (a) interface events with ZTOR = 10 km and (b) intraslab events with ZTOR = 50 km.
Epistemic uncertainty
The model parameters are estimated from a finite set of data. As a result, they inherently contain epistemic uncertainty, also referred to as within-model uncertainty. This uncertainty translates into uncertainty in the median predictions. It is important to be able to estimate this uncertainty because for some event scenarios the between-model uncertainty from a logic-tree of alternative GMMs can be small compared to within-model uncertainty. Examples of within-model uncertainty models are given by Al Atik and Youngs (2014), Lanzano et al. (2019), and Kotha et al. (2020).
In a Bayesian regression, epistemic uncertainty is quantified by a posterior distribution of model parameters. In KBCG20, the posterior distribution of each regression parameter consists of 800 samples using the MCMC methodology. Thus, there are 800 sets of regression coefficients, aleatory standard deviations, and coefficient correlations, which can be used to create 800 samples of the median predicted values and standard deviations of a given scenario event from which the mean, standard deviation, and fractiles of the median prediction can be calculated.
Figure 13a gives an example of the epistemic distribution of the predicted PSA spectrum for a particular scenario in JP. The results are in the form of a histogram of the median estimates of PSA at T = 0.01 s calculated from the 800 sets of coefficients from the posterior distribution. The median prediction is also shown for comparison. Figure 13b shows the mean MSR of PSA at T = 0.01 s for the scenario event calculated from the mean coefficients along with 10 randomly selected samples from the posterior distribution. Also shown are the 5% and 95% fractiles of the epistemic distribution for each magnitude. The 5% and 95% fractiles are calculated independently for each magnitude, meaning that different sets of sampled coefficients can contribute to the highest and lowest predicted values at each of the magnitudes. We highlight the predicted value from one particular sample in Figure 13b that has a steeper MSR than the mean to demonstrate the importance of within-model uncertainty in estimating ground motion. It also serves as a reminder that epistemic uncertainty permeates all aspects of the model (e.g. magnitude scaling, distance scaling, and site response).

Examples of epistemic uncertainty showing (a) a histogram of 800 calculated median predictions of PSA at T = 0.01 s for Japan (JP),
It is especially important to account for epistemic uncertainty in partially nonergodic models so as not to underestimate the seismic hazard (Abrahamson et al., 2019). The regional adjustment terms are calculated from a relatively small number of recordings and are associated with more epistemic uncertainty than the global model in which all data are pooled. This is particularly important for regions with a small number of recordings such as CASC. This is demonstrated in Figure 14 in which we plot the epistemic standard deviation of the median predicted value of PGA (ψµ) from the global model and the seven defined regions versus

Epistemic standard deviations (ψµ) associated with median predictions of ground motion: (a) PGA versus magnitude (
Figure 14d shows that there are larger regional differences between the values of µ at short periods than at long periods. This is because the longer-period coefficients are better constrained in the regression (see Figure 7). We also show values of µ for the global model, which are applicable when KBCG20 is applied to a new region. To calculate median predictions for the global model, we sample regional adjustment terms from their joint distribution and add the sampled adjustment terms to the coefficients from the posterior distribution as described in the study by Gelman and Hill (2006). This shows that the CASC values of µ are closer to the values of the global model for interface events than for intraslab events, consistent with the fact that there are no CASC interface events in the database.
Table 6 illustrates the effect of regionalization of epistemic uncertainty on the predicted value of PGA. Note that, although the value of ϕ in this table is the same for all regions, it is smaller at longer periods for those regions that include a basin-response term (CASC, JP, NZ, TW). Table 6 compares the values of ϕ, τ, and total aleatory standard deviation σ and the regional epistemic standard deviations ψµ to the standard deviations from a nonregionalized model (i.e. a model that includes only event terms). The nonregionalized model uses the same functional form and Bayesian regression methodology as KBCG20 except that none of the coefficients are regionalized. For the nonregionalized model, the coefficients are determined using all recordings in the database, which leads to a very small value of epistemic uncertainty. However, this small value of epistemic uncertainty is offset by larger values of ϕ and τ. The total predictive variability, which sums the aleatory and epistemic variances and describes the full range of all possible ground motions, is calculated as
Standard deviations of aleatory variability (ϕ, τ and σ), epistemic uncertainty (ψµ), and total predictive variability (σpred) for the different regional models compared to a (nonregionalized) global model
The epistemic uncertainty (ψµ) is calculated for
Engineering application
We consider KBCG20 to be generally applicable for estimating ground motions for the following range of parameters (Kuehn et al., 2020a).
General applications and global model
Ground-motion parameters: PGA (g), PGV (cm/s), PSA (g) at T = 0.01–10 s
Magnitudes:
Breakpoint magnitudes:
Source depths: ZTOR≤ 50 km (interface events), ZTOR≤ 200 km (intraslab events)
Rupture distances: 10 ≤RRUP≤ 800 km
Anelastic attenuation: Forearc (e.g. coefficient θ6,2)
Site velocity: 100 ≤VS30≤ 1000 m/s
Basin or sediment depths: Not applicable (see regional applicability)
Regional applications
Breakpoint magnitudes: See Table 5.
Source depths: ZTOR≤ 150 km for intraslab events in Colombia (SA)
Anelastic attenuation: Forearc (AK, CAM, CASC, NZ, JP, SA, TW), Backarc (AK, JP, NZ)
Basin (sediment) depths: Z2.5≤ 10 km (CASC, JP), Z1.0≤ 2.2 km (NZ, TW)
The user should be aware that the epistemic standard deviation can increase (possibly considerably) if one or more of the predictor variables are near or beyond the limit of their observed values for the region of interest.
The total value of the aleatory component of variance is
KBCG20 can be used to estimate ground motions for seven different subduction-zone regions. However, there are other subduction zones located throughout the world for which GMMs might be needed to perform a PSHA or deterministic seismic hazard analysis (e.g. the 79 subduction zones identified in the studies by Berryman et al., 2015 and Campbell, 2020). Some of these subduction zones might have limited or no ground-motion data. In that case the regionalized KBCG20 global model with its larger epistemic standard deviation should be used.
The epistemic uncertainty discussed above and in the Epistemic Uncertainty section of the article represents within-model uncertainty, since it only addresses uncertainty in the median estimates of the ground motion. Additional GMMs, such as those developed as part of the NGA-Sub project (Bozorgnia et al., 2022) or other published GMMs, should be used to account for between-model epistemic uncertainty similar to that recommended by Al Atik and Youngs (2014) for the NGA-West2 GMMs. The use of additional GMMs can address differences in basic assumptions, such as the functional form of the model and the database used to develop it, that are not accounted for in within-model uncertainty. The within-model uncertainty should be seen as a minimum representation of epistemic uncertainty.
An important aspect of KBCG20 is the use of regionalized breakpoint magnitudes (
We did not partition the aleatory standard deviations of the within-event residuals from KBCG20 into a site term (ϕS2S) and an event- and site-corrected (single-station) standard deviation (ϕSS) because of the relatively narrow azimuthal range of the recordings and the potential that unmodeled path effects might be mapped into the site terms. Therefore, the aleatory variability is ergodic with respect to the site model. If a single-station standard deviation is desired, we suggest that studies available in the literature be used to estimate reduction factors to apply to ϕ to estimate ϕSS. The majority of these single-station standard deviations are for crustal earthquakes, but some recent studies provide estimates of ϕ and ϕSS for subduction earthquakes in Chile (Montalva et al., 2017, 2022), JP (Campbell et al., 2022; Hassani and Atkinson, 2021; Zhao et al., 2016a, 2016b), northern South America (Arteta et al., 2021), TW (Chao et al., 2020; Phung et al., 2020), and globally (Abrahamson and Gulerce, 2022; Parker et al., 2022).
Cascadia Subduction Zone
The CASC is an example of why uncertainty in breakpoint magnitude should be considered. Campbell (2020) estimates a mean value for

Magnitude scaling for interface events in Cascadia (CASC) using the mean breakpoint magnitude and its uncertainty according to Campbell (2020).
Another potential issue with the prediction of ground motions for CASC interface events is the lack of interface earthquakes. As a result, we assumed that anelastic attenuation for CASC interface events was the same as for CASC intraslab earthquakes consistent with the CASC model of Abrahamson et al. (2018). On the contrary, Abrahamson and Gulerce (2022) assume that CASC intraslab events have a higher rate of attenuation than interface events consistent with their global GMM and Parker et al. (2022) assume that CASC interface events have a higher rate of attenuation than intraslab events consistent with their global GMM for interface events and their CASC-specific attenuation term for intraslab events. This uncertainty is not restricted to global and regional NGA-Sub models. Zhao et al. (2016a, 2016b) found that intraslab events had a higher rate of attenuation than interface events, whereas Si et al. (2022) found both types of events to have the same rate of attenuation. These results indicate that our assumption of similar anelastic attenuation of interface and intraslab events is not unreasonable.
All of the NGA-Sub GMMs have stronger attenuation in the backarc region of CASC than that in the study by Abrahamson et al. (2016), but similar forearc attenuation at short periods as that by Abrahamson et al. (2016) and the M9 project simulations of Frankel et al. (2018). At long periods, the M9 simulations have less attenuation than all of the NGA-Sub models for reasons that are not known at this time. We conclude that there is a great deal of epistemic uncertainty associated with ground-motion attenuation in CASC that should be taken into account in a hazard analysis for this region.
Summary and conclusion
This article presents a partially nonergodic empirical GMM for subduction interface and intraslab events developed using the extensive global NGA-Sub database. The GMM is partially nonergodic in three aspects: (1) it includes an event term as a random effect; (2) it accounts for regional differences in overall amplitude, anelastic attenuation, linear site response, and basin response as random effects for subduction zones in AK, CAM, CASC, JP, NZ, SA, and TW; and (3) it accounts for regional differences in breakpoint magnitude between subregions of the seven regionalized subduction zones. In addition, there is a global model that can be used for sites outside of the defined regions or as an epistemic alternative to a regional model. Site terms are not evaluated in the GMM but can be estimated from published GMMs. Implementation guidelines are provided in the “Engineering Application” section of the article.
We recommend that our GMM be used together with other credible and appropriate GMMs for the site or region of interest to capture between-model epistemic uncertainty in a seismic hazard analysis. No single model, even if it includes within-model epistemic uncertainty as ours does, should be used to characterize seismic hazard. Our regionalized global model should be used along with its larger epistemic uncertainty when applying it to a subduction zone not characterized in this study. If some data are available for a new region, a Bayesian approach similar to that recommended by Stafford (2019) can be used to adjust the global GMM to this region.
Supplemental Material
sj-pdf-1-eqs-10.1177_87552930231180906 – Supplemental material for A regionalized partially nonergodic ground-motion model for subduction earthquakes using the NGA-Sub database
Supplemental material, sj-pdf-1-eqs-10.1177_87552930231180906 for A regionalized partially nonergodic ground-motion model for subduction earthquakes using the NGA-Sub database by Nicolas M Kuehn, Yousef Bozorgnia, Kenneth W Campbell and Nicholas Gregor in Earthquake Spectra
Footnotes
Acknowledgements
Following the tradition of all NGA research projects, the ground-motion modeling teams as well as database developers have had continuous technical interactions that resulted in a higher quality of the final products than each researcher could achieve individually. Special thanks should be given to numerous junior and senior researchers and practicing professionals who worked on various sub-tasks of the NGA-Sub research program. Their contributions, dedication, and teamwork are greatly appreciated. We also thank three reviewers for their insightful comments that helped to improve the article.
Data and resources
The GMM developed in this study has been coded in multiple computer platforms including Visual Basic in Excel, R, MATLAB, and Python available at
https://www.risksciences.ucla.edu/nhr3/gmtools
. The coefficients of the model as well as an implementation of the GMM in the R software environment (R Core Team, 2023) is available at
https://github.com/nikuehn/KBCG20
and the
to the article and can be used to estimate median predictions and between-event, within-event, and within-model standard deviations and their associated posterior distributions.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by FM Global, the US Geological Survey, the California Department of Transportation, and the Pacific Gas and Electric Company. This support is gratefully acknowledged. The opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the study sponsors.
Supplemental material
Supplemental material for this article is available online.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
