Abstract
It is of clinical interest to estimate pure-tone thresholds from potentially available objective measures, such as stimulus-frequency otoacoustic emissions (SFOAEs). SFOAEs can determine hearing status (normal hearing vs. hearing loss), but few studies have explored their further potential in predicting audiometric thresholds. The current study investigates the ability of SFOAEs to predict hearing thresholds at octave frequencies from 0.5 to 8 kHz. SFOAE input/output functions and pure-tone thresholds were measured from 230 ears with normal hearing and 737 ears with sensorineural hearing loss. Two methods were used to predict hearing thresholds. Method 1 is a linear regression model; Method 2, proposed in this study, is a back propagation (BP) network predictor built on the basis of a BP neural network and principal component analysis. In addition, a BP network classifier was built to identify hearing status. Both Methods 1 and 2 were able to predict hearing thresholds from 0.5 to 8 kHz, but Method 2 achieved better performance than Method 1. The BP network classifiers achieved excellent performance in determining the presence or absence of hearing loss at all test frequencies. The results show that SFOAEs are not only able to identify hearing status with great accuracy at all test frequencies but, more importantly, can predict hearing thresholds at octave frequencies from 0.5 to 8 kHz, with best performance at 0.5 to 4 kHz. The BP network predictor is a potential tool for quantitatively predicting hearing thresholds, at least at 0.5 to 4 kHz.
Keywords
Audiometric thresholds are the gold standard for quantitatively evaluating hearing status. However, because pure-tone audiometry requires responses from subjects, its reliability depends on subject attention and cooperation, which may be difficult to obtain in certain populations. Hence, objective estimates of pure-tone threshold are clinically desirable.
Hearing thresholds can be determined objectively using electrophysiological measurements, such as the auditory brainstem response (Gorga et al., 2006; Johnson & Brown, 2005), the auditory steady-state response (Yeung & Wong, 2007), and cortical auditory-evoked potentials (Lightfoot & Kennedy, 2006). Electrophysiological methods, however, are time-consuming (e.g., approximately 10.5 min were needed for a single frequency; Van Dun et al., 2015). Therefore, it is worthwhile to explore alternative objective methods, such as otoacoustic emissions (OAEs).
Many studies indicate that distortion-product otoacoustic emissions (DPOAEs) can distinguish between normal and impaired ears from 2 to 4 kHz (Gorga et al., 1993a, 1993b, 1997, 2000; Kim et al., 1996; Musiek & Baran, 1997; Norton et al., 2000; Stover et al., 1996). DPOAE thresholds derived from DPOAE input/output (I/O) functions (Boege & Janssen, 2002; Gorga et al., 2003; Johnson et al., 2007, 2010; Oswald & Janssen, 2003) are significantly correlated with audiometric thresholds (e.g., r = .65, Boege & Janssen, 2002; r = .83, Gorga et al., 2003). Cochlear status at the
Transient-evoked otoacoustic emissions (TEOAEs) can identify hearing status at 2 and 4 kHz (Gorga et al., 1993a; Hurley & Musiek, 1994; Hussain et al., 1998; Lichtenstein & Stapells, 1996; Prieve et al., 1993) but not at 0.5 kHz (Gorga et al., 1993a; Prieve et al., 1993). Several previous studies (Gorga et al., 1993a; Hurley & Musiek, 1994; Hussain et al., 1998; Lichtenstein & Stapells, 1996; Prieve et al., 1993) could not measure TEOAEs above 4 kHz because of the analysis methods they used, described by Bray and Kemp (1987) and Kemp et al. (1990), in which the first 2.5 ms of the TEOAE were set to zero and an onset ramp was applied from 2.5 to 5.0 ms to reduce stimulus artifact. Because TEOAE latencies decrease with increasing frequency, removing or attenuating the first 5 ms of the response reduced high-frequency (>4 kHz) TEOAEs. Later studies (Goodman et al., 2009; Keefe et al., 2011) adopted a new technique based on the double-evoked paradigm to measure TEOAEs up to 16 kHz, suggesting the clinical potential of TEOAEs in predicting hearing status from at least 1 to 10 kHz.
Stimulus-frequency otoacoustic emissions (SFOAEs) are measured at the same frequency as the probe tone within the cochlea, providing frequency-specific responses. However, SFOAEs have received less attention in clinical applications than DPOAEs and TEOAEs. Avan et al. (1991) found that audiometric thresholds at 1.5 and 2 kHz were significantly correlated with SFOAE thresholds at 0.75 and 1 kHz, respectively. Ellison and Keefe (2005) showed that SFOAEs can distinguish between normal and impaired ears from 0.5 to 8 kHz. Although clinical decision theory (Swets, 1988) has been widely used to identify the presence or absence of hearing loss (Ellison & Keefe, 2005; Go et al., 2019; Gorga et al., 1997; Stover et al., 1996), there is still much to learn about the ability of SFOAEs to quantitatively predict pure-tone thresholds.
Artificial neural networks (ANNs) are mathematical models comprising many nodes (“neurons”) arranged in interconnected layers. Each neuron sums its weighted inputs and then applies a function to the sum to produce its output. The ANN that has received the most attention is the back propagation (BP) neural network (Rumelhart et al., 1986). A BP neural network is constructed with at least three layers: an input layer receives and distributes the input pattern, one or more hidden layers capture the nonlinearities of the input–output relationship, and an output layer produces the output pattern. It is trained with a supervised learning technique called BP, which has the advantages of being able to approximate any nonlinear function with satisfactory precision and to capture useful information from patterns. Furthermore, the BP neural network is widely used because of its strong generalization ability, that is, the ability to apply the trained model to new samples. Upon “training” with many trials under supervision, the BP neural network “learns” the input–output relations, and the model can then be applied to other (different) samples. Here, we use a BP neural network to systematically assess the ability of SFOAEs to predict pure-tone thresholds.
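As a rough illustration of the mechanics described above (weighted sums, an activation function, and error back propagation), the following minimal Python sketch trains a network with one hidden layer. The tanh activation, the linear output node, the learning rate, the number of hidden nodes, and the toy data are all assumptions made for illustration; none of them is taken from the present study.

```python
import math
import random

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: each hidden neuron sums its weighted inputs, then applies tanh."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    output = sum(w * h for w, h in zip(w_out, hidden)) + b_out  # linear output node
    return hidden, output

def train(samples, targets, n_hidden=3, lr=0.05, epochs=500, seed=0):
    """Train a one-hidden-layer network with per-sample error back propagation."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    # Initial connection weights are assigned randomly (hypothetical range).
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_out = 0.0
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            hidden, y = forward(x, w_hidden, b_hidden, w_out, b_out)
            err = y - t  # error signal at the output node
            for j in range(n_hidden):
                # Propagate the error back through tanh (derivative is 1 - h^2),
                # using the output weight as it was during the forward pass.
                delta = err * w_out[j] * (1.0 - hidden[j] ** 2)
                w_out[j] -= lr * err * hidden[j]
                for i in range(n_in):
                    w_hidden[j][i] -= lr * delta * x[i]
                b_hidden[j] -= lr * delta
            b_out -= lr * err
    return w_hidden, b_hidden, w_out, b_out

def mse(samples, targets, params):
    """Mean squared error of the network defined by params on the given samples."""
    return sum((forward(x, *params)[1] - t) ** 2
               for x, t in zip(samples, targets)) / len(targets)

# Toy data (made up): a single input and a target that is twice the input.
toy_x = [[0.0], [0.25], [0.5], [0.75], [1.0]]
toy_y = [2.0 * x[0] for x in toy_x]
params = train(toy_x, toy_y)
```

Calling `train` with `epochs=0` returns the untrained initial weights, which makes it easy to compare the error before and after training with `mse`.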
Materials and Methods
Subjects
Data were collected monaurally from 230 ears of 123 subjects (62 females) with normal hearing (NH) and 737 ears of 538 subjects (256 females) with sensorineural hearing loss (SNHL) due to cochlear lesions (i.e., a loss of hair cell function). Normal-hearing subjects had air-conduction (AC) thresholds for both ears equal to or less than 25 dB HL between 0.25 and 8 kHz, with age ranging from 18 to 42 years (mean = 23.78 years, standard deviation [SD] = 4.13 years). For subjects with SNHL, AC thresholds were greater than 25 dB HL and less than or equal to 75 dB HL for at least one octave frequency between 0.5 and 8 kHz. Their ages ranged from 12 to 75 years (mean = 47.25 years, SD = 14.37 years). All participants had normal middle-ear function. The SNHL group was divided into three subcategories on a frequency-by-frequency basis: mild (i.e., >25 dB HL and ≤40 dB HL), moderate (i.e., ≥45 dB HL and ≤60 dB HL), and severe (i.e., ≥65 dB HL). Thus, an individual ear could be classified as having, for example, moderate hearing loss at one frequency and severe hearing loss at another. Table 1 lists the total number of normal and impaired ears for each test frequency (the number of NH/SNHL: 218/227, 198/244, 206/238, 218/250, and 214/244 at 0.5, 1, 2, 4, and 8 kHz, respectively). During the SFOAE test, all subjects sat comfortably in a recliner in a sound-attenuating chamber and were instructed to sleep or watch silent films with subtitles and to avoid gnashing their teeth, chewing, and swallowing to reduce noise. All subjects were informed of all experimental procedures and objectives and provided written, informed consent. They were given appropriate compensation. All procedures were approved by the institutional review board at Tsinghua University (IRB00008273).
Summary of the Number of Ears in Each Category for Each Test Frequency.
Stimulus Generation and SFOAE Recording
Stimulus generation and SFOAE recording were performed using a custom software program. Digital-to-analog and analog-to-digital conversions were accomplished with a 24-bit sound card (Fireface 800, RME, Haimhausen, Germany) using a sampling rate of 48 kHz. Stimuli were presented to the ear via an insert earphone (ER-2, Etymotic Research, Elk Grove Village, IL, USA), and responses were recorded using a low-noise microphone (ER-10B+, Etymotic Research, Elk Grove Village, IL, USA) with an amplification of 20 dB. Prior to data collection, stimuli were calibrated in a Brüel & Kjær ear simulator (type 4157; IEC 711 standard) at half-octave frequencies from 0.125 to 8 kHz.
SFOAEs were recorded using a procedure based on the two-tone suppression method (Brass & Kemp, 1993). Figure 1 shows how the probe and suppressor tones are presented for a single SFOAE acquisition. Intervals M and N were added to the traditional four-interval paradigm to eliminate the effects of system delay and SFOAE latency. There was an interval of 2

Presentation of Probe and Suppressor Tones for a Single SFOAE Acquisition. Top line shows the presentation of probe tones, and the bottom line shows the presentation of suppressor tones for a single SFOAE acquisition. The tones are presented in six consecutive Intervals M, A, B, N, C, and D, and the duration of the first interval is
Procedure
All subjects underwent an external auditory canal examination prior to the test, and cerumen (if present) was removed from the ear canal. Pure-tone AC and bone-conduction thresholds from 0.25 to 8 kHz were measured in 5-dB steps on a clinical diagnostic audiometer (Otometrics, Denmark Inc., Astera). Tympanometry was performed using a 0.226-kHz probe via a clinical middle-ear analyzer (Grason-Stadler Inc., TympStar). Normal 0.226-kHz tympanometry and air-bone gaps of 10 dB or less together ensured that all participants had normal middle-ear function. Normal tympanometry required peak pressure between –83 and 0 daPa, peak-compensated admittance between 0.3 and 1.4 mmhos, and equivalent ear-canal volume between 0.6 and 1.5 ml. To avoid interference from spontaneous otoacoustic emissions (SOAEs), test frequencies with strong SOAEs (i.e., peak amplitudes > 3 dB) within ± 300 Hz of the center frequencies of 1, 2, 4, and 8 kHz and within ± 150 Hz of the center frequency of 0.5 kHz were excluded. SFOAEs were not measured in 1.5%, 5.5%, 3.6%, 2.0%, and 1.2% of ears due to the presence of SOAEs at 0.5, 1, 2, 4, and 8 kHz, respectively. SFOAE I/O functions from 0.5 to 8 kHz were measured by fixing the probe frequency
Data Analyses
Part I: SFOAEs as Predictors of Hearing Thresholds
Here, we propose a new method based on a BP neural network and principal component analysis (PCA) to predict hearing thresholds. To test the effectiveness of this method, we compared it with the method of Boege and Janssen (2002) and Gorga et al. (2003), who performed a correlation analysis between hearing thresholds and DPOAE thresholds.
Method 1
SFOAE thresholds were estimated with the approach of Boege and Janssen (2002) and Gorga et al. (2003). There were four inclusion subcriteria (collectively identical to the inclusion criterion of Method 1) for subsequent analyses. First, at least three points of the SFOAE I/O functions must have signal-to-noise ratio (SNR)

SFOAE I/O Function at 1 kHz for Subject #6 (Right Ear) and #12 (Left Ear). In the upper panel, the SFOAE level (dB SPL) is plotted as a function of probe level (log-log scale), and in the lower panel, the same data are plotted as SFOAE pressure as a function of probe level. In the lower panel, the solid line shows the fitted linear function (left panel:
Method 2
Inclusion Criterion
Figure 3 shows the process of extracting the SFOAE threshold for three normal ears (left panel) and three ears with SNHL (right panel). An inclusion criterion different from that used for Method 1 was applied to determine whether thresholds could be predicted accurately in more ears. The probe level was raised in 5-dB increments from 5 dB SPL until the SFOAE SNR was ≥ 9 dB (point in the dark gray-shaded area of Figure 3). This candidate level was regarded as the SFOAE threshold if at least N–1 of the following N stimulus points had SNR ≥ 9 dB (N equaled 3 if there were 3 or more stimulus points after the candidate; otherwise, N equaled the total number of stimulus points after the candidate; see top right and bottom right panels). Generally, SFOAE thresholds were higher in impaired ears than in normal ears. If an SFOAE threshold could not be determined, the ear was excluded from further analyses.
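The threshold rule described above can be sketched in Python as follows. This is one possible reading of the criterion, not the authors' implementation: the function name is hypothetical, the search is assumed to continue with the next level whose SNR reaches 9 dB when a candidate fails the check, and the case in which no points follow the candidate is assumed to pass (the literal "at least N–1 of N" rule with N = 0).

```python
from typing import Optional, Sequence

SNR_CRITERION_DB = 9.0  # SNR criterion stated in the text

def sfoae_threshold(levels_db_spl: Sequence[float],
                    snr_db: Sequence[float]) -> Optional[float]:
    """Return the lowest probe level that qualifies as the SFOAE threshold.

    levels_db_spl: probe levels in ascending 5-dB steps, starting at 5 dB SPL.
    snr_db: SFOAE signal-to-noise ratio measured at each probe level.
    Returns None when no level qualifies (the ear would be excluded).
    """
    for idx, (level, snr) in enumerate(zip(levels_db_spl, snr_db)):
        if snr < SNR_CRITERION_DB:
            continue  # not a candidate
        # N is 3 when three or more points follow the candidate,
        # otherwise the number of points that remain.
        following = snr_db[idx + 1: idx + 4]
        n = len(following)
        passing = sum(1 for s in following if s >= SNR_CRITERION_DB)
        if passing >= n - 1:
            return level
        # Assumption: a failed candidate is skipped and the search continues.
    return None

# Made-up example: the 10 dB SPL point is an isolated peak and is rejected,
# while 25 dB SPL is followed by two passing points out of three.
levels = list(range(5, 45, 5))
print(sfoae_threshold(levels, [2, 10, 3, 4, 12, 11, 8, 13]))
```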

The Process of Extracting SFOAE Threshold for I/O Functions Meeting the Inclusion Criterion of Method 2.
Feature Extraction
In previous studies (Ellison & Keefe, 2005; Go et al., 2019), SFOAE level or SNR at a certain probe level was typically used as an independent predictor to predict hearing status or thresholds. In the present Method 2, feature extraction was not limited to SFOAE measurements at only one probe level. Rather, we captured as much information related to pure-tone thresholds as possible from SFOAEs measured at all probe levels. Given the likelihood of highly correlated SFOAE parameters across probe levels, PCA was performed on each of the three data sets—that is, SFOAE levels (
Figure 4 shows the percentage of variance (i.e., information) in the original data set explained by each PC for the three data sets, as well as the correlation coefficient (r) between each PC and the measured pure-tone threshold. As shown in Figure 4, the first PCs accounted for the most information in the original data set (typically more than 70% of the variance). Most of them were also more strongly related to the measured pure-tone threshold than the subsequent PCs were. The exception was that the relation between the second PC and the pure-tone threshold was strongest when PCA was separately performed across SFOAE levels and

The Percentage of Variance in the Original Data Set Explained by Each Principal Component (PC) for SFOAE Level, SFOAE SNR, and
Pearson correlation analysis was performed to determine the significance of each input variable of Method 2 as a predictor of hearing threshold. Figure 5(A–D) plots the measured audiometric threshold as a function of SFOAE threshold, principal component (PC) of SFOAE level, PC of SFOAE SNR, and PC of

Audiometric Threshold (dB HL) as a Function of Each Input Variable in Method 2 From 0.5 to 8 kHz. A: Audiometric threshold (dB HL) plotted versus SFOAE threshold in Method 2. B: Audiometric threshold (dB HL) plotted versus principal component (PC) of SFOAE level. C: Audiometric threshold (dB HL) plotted versus principal component (PC) of SFOAE SNR. D: Audiometric threshold (dB HL) plotted versus principal component (PC) of
Model Construction and Evaluation
A BP neural network was used to predict hearing thresholds. As shown in Figure 6A, the BP network predictors contained three layers: the input layer, the hidden layer, and the output layer. The input layer had 4 nodes (or 6 for 0.5 kHz, i.e., the number of input variables), and the hidden layer had 5 nodes. The output layer had a single node, which gave the estimated hearing threshold. Each frequency was analyzed separately and thus had its own neural network. Five experimental runs (or iterations) were conducted through fivefold cross-validation to guard against overfitting. As shown in Figure 6B, each data set was divided into five approximately equal-sized disjoint folds. Each fold served in turn as the test set for validating the accuracy of the model trained on the other four folds (i.e., the training set). The process of network training and prediction is as follows:

BP Neural Network Model Construction and Evaluation. A: The architecture of BP network predictor for estimating hearing thresholds in Method 2. B: The schematic diagram of fivefold cross-validation. C: The process of normalizing the pure-tone thresholds estimated by BP network predictor at 5-dB intervals. D: The architecture of BP network classifier for identifying hearing status.
Step 1: During the kth run, take four folds as the training set and the remaining fold as the test set (see Figure 6B). The initial connection weights among the nodes are assigned randomly.
Step 2: The signal from the training set is propagated forward from the input layer, via the hidden layer, to the output layer. During forward propagation, the weights are held constant, and the state of each neuron influences only the next layer.
Step 3: If the expected output is not obtained at the output layer, the error signal (i.e., the difference between the actual output and the expected output of the network) is back propagated from the output layer toward the input layer, and the weights are updated.
Step 4: Steps 2 and 3 are repeated, continuously updating the weights to bring the output closer to the expected one, until the error falls to a preset minimum or the maximum number of training steps is reached. The weights are then fixed, and network training is complete.
Step 5: The samples of the training set are fed to the trained network to obtain predictions for the training set. Likewise, the samples of the test set are fed to the trained network to obtain their predicted hearing thresholds. The predicted hearing thresholds of the training set and the test set are then normalized at intervals of 5 dB according to Figure 6C.
Step 6: The mean absolute error (MAE), defined in Equation 1, is used to evaluate performance in estimating hearing thresholds. The MAE of the training set and of the test set was calculated separately for each run. After all five runs were completed, the final performance was taken as the average of the five MAEs. The mean MAE over the five runs was calculated for both the training set and the test set to monitor whether the model was overfitting.
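The evaluation loop in Steps 1, 5, and 6 can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the authors' code: the model is abstracted as a pair of hypothetical callables (`fit`, `predict`) so that any predictor can be plugged in, the 5-dB normalization is assumed to be rounding to the nearest 5-dB step, and MAE is computed in its usual form, the mean of the absolute differences between predicted and measured thresholds.

```python
import math
import random
from statistics import mean

def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and split them into disjoint, near-equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[k::n_folds] for k in range(n_folds)]

def round_to_5_db(value):
    """Snap a predicted threshold to the nearest 5-dB step (assumed rule)."""
    return 5 * int(math.floor(value / 5 + 0.5))

def mean_absolute_error(predicted, measured):
    """Mean of |predicted - measured| over all samples."""
    return mean(abs(p - m) for p, m in zip(predicted, measured))

def cross_validate(features, thresholds, fit, predict, n_folds=5):
    """Return (mean training MAE, mean test MAE) over the cross-validation runs."""
    folds = five_fold_indices(len(features), n_folds)
    train_mae, test_mae = [], []
    for k, test_idx in enumerate(folds):
        # The other four folds form the training set for this run.
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        model = fit([features[i] for i in train_idx],
                    [thresholds[i] for i in train_idx])
        for idx, store in ((train_idx, train_mae), (test_idx, test_mae)):
            preds = [round_to_5_db(predict(model, features[i])) for i in idx]
            store.append(mean_absolute_error(preds, [thresholds[i] for i in idx]))
    return mean(train_mae), mean(test_mae)

# Stand-in model (hypothetical): always predicts the mean training threshold.
def fit_mean(x_train, y_train):
    return mean(y_train)

def predict_mean(model, x):
    return model
```

Comparing the two returned values gives the overfitting check described in Step 6: a training MAE far below the test MAE would suggest that the model has memorized the training folds.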
Part II: SFOAEs as Identifiers of Hearing Status
Feature Extraction
All data were included in the BP network classifiers to identify hearing status (the inclusion criterion was irrelevant, as all data were meaningful in terms of identifying hearing status). For the BP network classifier, three features (PC of SFOAE level, PC of SFOAE SNR, PC of
Model Construction and Evaluation
The structure of the BP network classifier is shown in Figure 6D. It consists of an input layer, a hidden layer, and an output layer. The input layer had 3 nodes (or 5 for 0.5 kHz, i.e., the number of input variables), and the hidden layer had 5 nodes. The two nodes in the output layer represented the two classes of hearing status (normal vs. impaired). Each frequency was analyzed separately and thus had its own neural network. Fivefold cross-validation was conducted. The process of network training and prediction for the BP network classifier is the same as for the aforementioned BP network predictor in Method 2. Receiver operating characteristic (ROC) curves (plots of true positive rate, which is the proportion of ears with hearing loss that were correctly identified, versus false positive rate, which is the proportion of ears with NH incorrectly classified as having hearing loss) were constructed. The area under the ROC curve (AUC) and classification accuracy (i.e., the percentage of ears that were correctly identified) were used to assess the performance of the BP network classifier at each frequency.
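To make the three evaluation measures concrete, the sketch below computes ROC points, AUC, and accuracy in Python. It assumes that each ear receives a single score in which higher values indicate hearing loss (for example, the output of the "impaired" node), that labels are coded 1 for impaired and 0 for normal, and that a cutoff of 0.5 is used for accuracy; these conventions and the example values are illustrative and are not specified in the text.

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) pairs, one per decision threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        # Ears scoring at or above the threshold are called impaired.
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def roc_auc(scores, labels):
    """AUC as the probability that an impaired ear outscores a normal ear.

    Ties count as one half. This pairwise form is equivalent to the
    trapezoidal area under the ROC curve.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(scores, labels, threshold=0.5):
    """Fraction of ears whose predicted class matches the true class."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)

# Made-up scores for five ears (1 = impaired, 0 = normal).
example_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
example_labels = [1, 1, 0, 1, 0]
print(roc_auc(example_scores, example_labels),
      accuracy(example_scores, example_labels))
```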
Results
Part I: SFOAEs as Predictors of Hearing Thresholds
Evaluation of the Audiometric Thresholds of Ears Not Meeting the Inclusion Criterion
Data analyzed with Method 1 were derived from SFOAE I/O functions in which at least three points had SNRs
Percentage of Ears Failing to Meet the Inclusion Criterion of Method 1 and Method 2, the Percentage of These Ears That Have Audiometric Thresholds Greater Than 25 dB HL, and the Mean and Standard Deviations for These Ears at Each Frequency.
Performance of Method 1 in Predicting Hearing Thresholds
The measured audiometric thresholds are plotted as a function of estimated SFOAE threshold for each test frequency in Figure 7. Linear regression analyses revealed that SFOAE threshold was significantly correlated with audiometric threshold at all frequencies, with correlation coefficients r of .6, .86, .79, .68, and .49 (p < .001 for all frequencies) at 0.5, 1, 2, 4, and 8 kHz, respectively. MAE was used to quantify the prediction performance of the linear regression model. As shown in Figure 7, the best performance was achieved at 1 kHz, with an MAE of 6.34 dB (SD = 5.85 dB). MAEs were 8.47 dB (SD = 7.82 dB), 7.72 dB (SD = 6.32 dB), and 8.28 dB (SD = 7.88 dB) for 0.5, 2, and 4 kHz, respectively. The poorest performance occurred at 8 kHz (MAE of 9.96 dB; SD = 7.57 dB).
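For readers who want to reproduce this kind of analysis on their own data, the Python sketch below fits a least-squares line, computes the Pearson correlation coefficient, and reports the MAE of the fitted line. The function names and the example values are hypothetical, and the sketch makes no claim about the exact software or fitting options used in the study.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares slope and intercept of y on x, plus Pearson r."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r

def regression_mae(x, y):
    """Mean absolute error of the fitted line on the same data."""
    slope, intercept, _ = fit_line(x, y)
    return mean(abs(slope * xi + intercept - yi) for xi, yi in zip(x, y))

# Made-up values: SFOAE thresholds (dB SPL) and audiometric thresholds (dB HL).
sfoae_thr = [10, 20, 30, 40]
audio_thr = [15, 25, 35, 45]
slope, intercept, r = fit_line(sfoae_thr, audio_thr)
```

Note that an MAE computed on the same ears used for the fit, as in this sketch, is an in-sample error and will tend to be slightly optimistic compared with a cross-validated estimate.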

Audiometric Threshold (dB HL) as a Function of SFOAE Threshold (dB SPL) Estimated With Method 1 From 0.5 to 8 kHz. The solid lines represent the best-fit line to the data in each panel. Also indicated is the Pearson correlation coefficient (r), the number of threshold comparisons (n) (i.e., the number of I/O functions meeting all inclusion subcriteria of Method 1), the total number of I/O functions (N), and the MAE of the linear regression model.
Performance of Method 2 in Predicting Hearing Threshold
The BP network predictor in Method 2 can predict hearing thresholds. The prediction performance of the predictor was quantified by the MAE. Table 3 lists mean MAE values of fivefold cross-validation for each test frequency. There was no overfitting in the model as the mean MAE of the training set was quite close to that of the test set, with the difference of mean MAE between the training set and the test set not exceeding 0.05 dB at each
Mean MAE of Fivefold Cross-Validation When Using Method 2’s BP Network Predictor to Estimate Hearing Thresholds.
Note. N represents the number of I/O functions meeting the inclusion criterion of Method 2. MAE = mean absolute error.
Fivefold cross-validation was conducted for modeling and evaluation. Each fold served in turn as the test fold (or the test set of each run, see Figure 6B), so that each of the five folds was used exactly once to test the model. Figure 8 pools the test folds of all five runs to plot the distribution of prediction error (i.e., the difference between the estimated and measured hearing thresholds) for each frequency. Each panel shows the result for a different frequency, going from 0.5 (left panel) to 8 kHz (right panel). In most ears, predictions with Method 2 were within

The Distribution of Hearing Threshold Prediction Error (i.e., the Difference Between the Estimated and Measured Hearing Thresholds) for Each Frequency When Using Method 2. Also shown is the number of I/O functions meeting inclusion criterion of Method 2 (n), and the total number of I/O functions (N), the mean of absolute error (mean), and the standard deviation (SD).
Performance Comparison Between Method 1 and Method 2.
Note. MAE = mean absolute error.
Table 4 compares the percentage of cases meeting the inclusion criterion and the MAE for Methods 1 and 2. Method 2 performed better than Method 1 in predicting hearing thresholds at all test frequencies: more ears met the inclusion criterion for Method 2, and its MAE was lower. In an additional analysis, Method 2 was applied at each frequency to the same data set as Method 1 (i.e., the data meeting the inclusion criteria for Method 1 rather than Method 2’s own inclusion criterion), which resulted in test MAEs of 6.70, 5.02, 6.56, 5.63, and 8.19 dB for 0.5, 1, 2, 4, and 8 kHz, respectively. With the inclusion criterion held the same, these MAEs were still lower than those of Method 1, which further supports the advantage of Method 2.
To compare how well Method 2 estimated thresholds in ears with different degrees of hearing loss, the mean and SD of the absolute error (i.e., the absolute difference between the estimated and measured hearing thresholds) were calculated for each category, as shown in Table 5. Hearing thresholds in ears with severe hearing loss were predicted least accurately, with the largest MAEs of any category at each frequency. Errors were small in ears with NH and moderate hearing loss, and the mild-loss group showed larger MAEs than either of those groups.
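A per-category breakdown of this kind can be sketched in Python as follows. The category boundaries follow the definitions given in the Subjects section (normal ≤25 dB HL, mild >25 and ≤40 dB HL, moderate ≥45 and ≤60 dB HL, severe ≥65 dB HL, with thresholds measured in 5-dB steps); the function names and the example values are hypothetical.

```python
from statistics import mean, stdev

def hearing_category(threshold_db_hl):
    """Map a measured audiometric threshold (5-dB steps) to a category."""
    if threshold_db_hl <= 25:
        return "normal"
    if threshold_db_hl <= 40:
        return "mild"
    if threshold_db_hl <= 60:
        return "moderate"
    return "severe"

def error_by_category(predicted, measured):
    """Mean and SD of absolute prediction error, grouped by measured category."""
    errors = {}
    for p, m in zip(predicted, measured):
        errors.setdefault(hearing_category(m), []).append(abs(p - m))
    # The sample SD is undefined for a single value, so report 0.0 in that case.
    return {cat: (mean(e), stdev(e) if len(e) > 1 else 0.0)
            for cat, e in errors.items()}

# Made-up thresholds in dB HL for six ears at one frequency.
measured = [10, 20, 30, 50, 70, 65]
predicted = [15, 20, 40, 50, 50, 55]
summary = error_by_category(predicted, measured)
```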
Mean (M) and Standard Deviation (SD) of Absolute Error in Method 2 for Each Category: Normal, Mild, Moderate, and Severe Hearing Loss.
Part II: SFOAEs as Identifiers of Hearing Status
A BP network classifier was built to identify hearing status for all tested ears using an NH criterion of 25 dB HL. Fivefold cross-validation was conducted. The mean ROC curve and average AUC value of the fivefold cross-validation for each frequency are shown in Figure 9A. Figure 9B compares the AUC in the present study with that of Ellison and Keefe (2005). The performance of the BP network classifier was also evaluated according to its accuracy (i.e., the percentage of ears that were correctly identified; see Figure 9C). It can be reasonably assumed that the models did not overfit the data at any frequency, as the accuracies of the training set and the test set were nearly the same. These results show that the BP network classifier exhibited excellent performance at all test frequencies. A mean AUC exceeding 0.97 and an accuracy above 92.1% were observed at frequencies from 0.5 to 4 kHz. The best performance was achieved at 1 kHz, which resulted in the largest AUC of 0.99

Performance of the BP Network Classifier for Identifying Hearing Status. Training and testing were conducted with fivefold cross-validation. A: The ROC curve for the classifier. B: The mean AUC of fivefold cross-validation for the BP network classifiers at all test frequencies (squares), the BP network classifier in this study using a normal-hearing criterion of 15 dB HL (stars), a BP network using univariate SFOAE as the input based on 15 dB HL criterion (circles), and the best AUC from a previous SFOAE study (Ellison & Keefe, 2005; triangles). C: Classification accuracy (%) of the training set and the test set for each frequency. The error bars represent the standard deviations of the fivefold cross-validation. D: The AUC obtained with SFOAEs in this study was compared with univariate (Gorga et al., 1993b, 1997) and multivariate DPOAE models (Gorga et al., 2000). Note the AUCs obtained in Gorga et al. (2000) are approximations based on the published plots.
Discussion
Two methods were used here to predict hearing thresholds from SFOAEs. Method 1 used a linear regression model with estimated SFOAE threshold as the independent variable, following the approach of Boege and Janssen (2002) and Gorga et al. (2003). Method 2, based on a BP neural network, performed better than Method 1 at all frequencies, as revealed by lower MAEs and a higher percentage of ears meeting the inclusion criterion. The better performance of Method 2 may result from the use of PCA, which helped maximize the extraction of pure-tone threshold information from SFOAEs, and of multiple variables (SFOAE threshold, principal component of SFOAE level, SFOAE SNR, and
The BP network predictors of Method 2 performed well in estimating hearing thresholds at all test frequencies, but prediction performance differed across frequency. Performance was better at 1 to 4 kHz than at 0.5 kHz, probably because noise levels during SFOAE measurement increase as frequency decreases (much as previously found in studies involving DPOAEs and TEOAEs; Gorga et al., 1993b; Prieve et al., 1993). SFOAEs were weaker at 8 kHz than at lower frequencies (as found by Dewey & Dhar, 2017; Dhar & Shaffer, 2004) and hence were difficult to separate from noise, so a larger proportion of ears failed to meet the inclusion criterion and MAEs were larger.
With Method 2, the large errors in predicting high hearing thresholds (severe hearing loss) probably resulted from SFOAEs that were too small or unreliable. It is well known that OAEs are generated as a by-product of the normal function of outer hair cells in the cochlea, and outer hair cell-related SNHL generally accounts for hearing loss of no more than 60 dB HL. This may also explain why thresholds in ears with severe hearing loss could rarely be predicted correctly. Poor prediction performance also occurred in ears with mild hearing loss, consistent with previous studies (Ellison & Keefe, 2005; Gorga et al., 1997).
The BP network classifiers achieved excellent performance in determining the presence or absence of hearing loss across all test frequencies, with performance better than in the SFOAE study of Ellison and Keefe (2005) regardless of whether a normal audiometric criterion of 15 or 25 dB HL was used (see Figure 9B). In another set of tests with the 15 dB HL criterion (i.e., the NH criterion used by Ellison and Keefe, 2005), SFOAE level or SNR at a moderate probe level (50 or 60 dB SPL) was taken as the univariate input to a BP neural network. As shown in Figure 9B, the best AUC obtained with univariate analysis in this study was larger than the AUC in Ellison and Keefe (2005) but lower than that of the BP network classifier in the present study at each frequency. Thus, the improved performance in the present study compared with that of Ellison and Keefe (2005) appears to reflect the advantage of multivariate models over univariate models and the use of a BP neural network, as well as PCA. In addition, the present study excluded a small number of ears with strong SOAEs, whereas these were included in Ellison and Keefe (2005).
Several investigations have shown that DPOAEs can be used to predict hearing status. Figure 9D compares the AUCs of the present SFOAE study with those of univariate and multivariate DPOAE models (Gorga et al., 1993b, 1997, 2000). The performance of SFOAEs and DPOAEs in identifying hearing status was generally similar, except that SFOAEs were slightly poorer than DPOAEs at 8 kHz. Also, univariate SFOAE models were superior to univariate DPOAE models at 0.5 and 1 kHz. Standard error was also calculated for Method 2 to compare the performance of SFOAEs and DPOAEs in predicting thresholds, as shown in Table 6. SFOAEs performed better in predicting hearing thresholds than DPOAEs at 1, 2, and 8 kHz (Gorga et al., 2003; Johnson et al., 2007), as evidenced by a much higher percentage of ears meeting the inclusion criterion and a lower standard error for SFOAEs (see Table 6). At 0.5 kHz, despite a larger standard error for SFOAEs than for DPOAEs, SFOAEs appeared to be superior to DPOAEs because the result for DPOAEs (Gorga et al., 2003) at this frequency was obtained from only the 17% of ears that met the inclusion criterion (27 ears), whereas a much larger proportion of ears (76.9%; 342 ears) met the inclusion criterion in the present SFOAE study. Similar performance of SFOAEs and DPOAEs in threshold prediction was observed at 4 kHz (Johnson et al., 2007). Thus, SFOAEs have similar potential to DPOAEs in the identification of SNHL and improve upon the prediction of hearing thresholds at some frequencies. A more complete comparison of the prediction performance of SFOAEs and DPOAEs would require carrying out all these tests on the same subjects. It is also clear that recording times should be shortened and signal extraction simplified prior to clinical applications.
Comparison of Threshold Prediction Performance Between SFOAEs in the Present Study and DPOAEs in the Studies of Gorga et al. (2003) and Johnson et al. (2007).
Note. Standard error is also calculated for Method 2 in this study to compare with the results for DPOAEs. Also shown is the percentage of cases meeting the inclusion criterion. Dashes indicate that predictions were not reported at that frequency. SFOAEs = stimulus-frequency otoacoustic emissions; DPOAEs = distortion-product otoacoustic emissions.
The present study has two limitations. One is that ears failing to meet the inclusion criterion were excluded from the analysis of the regression model. The other is that SFOAEs were measured at a set of standard audiometric frequencies (i.e., 0.5, 1, 2, 4, and 8 kHz). SFOAE spectra are plagued by deep notches whose frequencies differ across ears. These notches might shift to the standard recording frequency as level increases, and thus level-dependent notches would be observed in some SFOAE I/O functions. Such notches may have reduced SFOAE level and SNR, leading to overestimation of the hearing thresholds. The choice of input variables in this study (extracting the principal components from SFOAEs at all probe levels, instead of relying solely on a single level at which a notch might happen to occur) probably minimized the effects of notches. However, a better methodology in future studies would be to measure SFOAE I/O functions and the corresponding audiometric thresholds at frequencies chosen individually for each ear to avoid these notches, even if they differ somewhat from the standard values.
In conclusion, SFOAEs can quantitatively predict hearing thresholds at octave frequencies from 0.5 to 8 kHz, with best performance at 0.5 to 4 kHz. In addition, SFOAEs can identify hearing status with great accuracy at all test frequencies. Further work is needed to improve prediction accuracy in ears with mild hearing loss and reduce the test time to improve the clinical potential of SFOAEs.
Footnotes
Acknowledgments
We thank Professor Mario Ruggero (Northwestern University) for helping with text editing and Fei Ji (The General Hospital of the People’s Liberation Army) for helping with data collection.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (grant numbers 61871252 and 61271133).
