Principles of Digital Dynamic-Range Compression

Abstract

This article provides an overview of dynamic-range compression in digital hearing aids. Digital technology is becoming increasingly common in hearing aids, particularly because of the processing flexibility it offers and the opportunity to create more-effective devices. The focus of the paper is on the algorithms used to build digital compression systems. Of the various approaches that can be used to design a digital hearing aid, this paper considers broadband compression, multi-channel filter banks, a frequency-domain compressor using the FFT, the side-branch design that separates the filtering operation from the frequency analysis, and the frequency-warped version of the side-branch approach that modifies the analysis frequency spacing to more closely match auditory perception. Examples of the compressor frequency resolution, group delay, and compression behavior are provided for the different design approaches.

Introduction

The objective of this article is to explain the design approaches that have been used for digital dynamic-range compression in hearing aids. As with any algorithm design, a dynamic-range compressor involves several engineering trade-offs. The most important processing concerns are the system frequency resolution and the processing time delay. Most digital compression systems use multiple frequency bands. For any given processing approach, increased frequency resolution comes at the price of increased processing delay. It is important to realize that there is no single best compressor design. Each system involves trade-offs between processing complexity, frequency resolution, time delay, and quantization noise.

One concern in designing a multi-channel compressor is to match the frequency resolution of the digital system to the resolution of the human auditory system. For example, several hearing-aid fitting procedures are based on loudness scaling in the impaired ear (Dillon et al., 1997), and the estimation of loudness presupposes an auditory frequency analysis. Digital frequency analysis, such as the discrete Fourier transform, typically provides constant-bandwidth frequency resolution. The frequency resolution of the human auditory system, however, is more accurately modeled by a filter bank having a nearly constant bandwidth at low frequencies but with bandwidth becoming proportional to frequency as the frequency increases (Zwicker and Terhardt, 1980; Moore and Glasberg, 1983).

A second concern in designing a compression system for a hearing aid is the overall processing delay. Processing delays can cause coloration effects when the hearing-aid user is talking. The user's own voice reaches the cochlea with minimal delay via bone conduction and through the hearing-aid vent. This signal interacts with the delayed and amplified signal produced by the hearing aid to produce a comb-filtered spectrum at the cochlea. Delays as short as 3 to 6 msec are detectable (Agnew and Thornton, 2000; Stone and Moore, 2002), and overall delays in the range of 15 to 20 msec can be judged disturbing or objectionable (Stone and Moore, 1999; 2002).
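The comb filtering produced by mixing a direct and a delayed signal can be illustrated numerically. The sketch below assumes a simple two-path model that is not taken from the cited studies: a direct path with unit gain and no delay, plus a delayed path with a hypothetical relative gain. The function names and parameter values are illustrative only.

```python
import cmath
import math

def comb_magnitude_db(freq_hz, delay_ms, delayed_gain=1.0):
    """Magnitude (dB) of a direct signal plus a delayed, scaled copy.

    Assumed model: H(f) = 1 + g * exp(-j * 2 * pi * f * tau).
    """
    tau = delay_ms / 1000.0
    h = 1.0 + delayed_gain * cmath.exp(-2j * math.pi * freq_hz * tau)
    return 20.0 * math.log10(max(abs(h), 1e-12))

def notch_frequencies(delay_ms, max_hz):
    """Notch frequencies for a positive delayed gain: odd multiples of 1/(2*tau)."""
    tau = delay_ms / 1000.0
    notches = []
    k = 0
    while (2 * k + 1) / (2.0 * tau) <= max_hz:
        notches.append((2 * k + 1) / (2.0 * tau))
        k += 1
    return notches
```

Under this model a 5-msec delay places the notches 200 Hz apart, starting at 100 Hz. When the two paths differ in level, as they would with amplification, the notches become shallower but remain at the same frequencies.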

The system frequency-dependent delay can also introduce audible artifacts when listening to speech even when the user of the hearing aid is not talking. For example, a click is converted into a descending chirp when passed through a cascade of all-pass filters having a group delay that increases with decreasing frequency. Relatively short delays can be detected for click stimuli when the group delay varies across frequency. Blauert and Laws (1978) passed clicks through all-pass filters giving increased delay in narrow frequency regions, and found that normal-hearing subjects can detect delays as short as 1 msec at 2 kHz, with the detection threshold increasing to 2 msec at 8 kHz or 1 kHz.

Group-delay detection thresholds for speech are greater than for clicks. Based on results using one normal-hearing subject, Greer (1975) reported that for narrow-band all-pass filters the detection thresholds for speech sounds were 4 to 8 msec for a plosive, 8 to 16 msec for a vowel, and 16 to 32 msec for a fricative. For broader all-pass filters, having bandwidths approximately equal to the filter center frequency, the detection thresholds were 2 to 4 msec for a plosive, 2 to 4 msec for a vowel, and 4 to 8 msec for a fricative. The frequency-dependent group delay also can interfere with speech intelligibility, but at delays that greatly exceed the detection thresholds. Stone and Moore (2003) found that hearing-impaired listeners’ identification of nonsense syllables decreased by a small but significant amount as the delay at low frequencies, relative to the delay at high frequencies, was increased from 0 to 24 msec.

An additional processing concern is quantization noise. The basic DSP chip in a hearing aid uses 16-bit fixed-point arithmetic. The maximum dynamic range (peak level to root-mean-squared average quantization noise floor) of a 16-bit system is approximately 107 dB (Oppenheim and Schafer, 1975). This dynamic range is adequate for amplified speech, assuming no additional sources of noise. However, the signal processing operations within the hearing aid contribute additional quantization noise as numbers are multiplied together and summed. Quantization noise tends to be greater in recursive (IIR) than in non-recursive (FIR) filters, and tends to be greater in fast Fourier transform (FFT) systems than in direct time-domain implementations. Often the amount of quantization noise depends on details of the signal-processing implementation that are beyond the scope of this paper.
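A figure close to 107 dB can be reproduced with a short calculation. The sketch below is one possible derivation and rests on two assumptions that are not spelled out in the text: the peak level is taken as the full-scale range of 2^16 quantization steps, and the rounding error is taken to be uniformly distributed over one step, giving an RMS value of one step divided by the square root of 12.

```python
import math

def dynamic_range_db(bits):
    """Full-scale range over RMS quantization noise, in dB.

    Assumptions: the full-scale range spans 2**bits quantization steps,
    and the rounding error is uniform over one step (RMS = step / sqrt(12)).
    """
    full_scale_steps = 2.0 ** bits
    noise_rms_steps = 1.0 / math.sqrt(12.0)
    return 20.0 * math.log10(full_scale_steps / noise_rms_steps)

print(round(dynamic_range_db(16), 1))  # 107.1
```

Under the same assumptions, each additional bit adds about 6 dB to the dynamic range.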

This paper begins with a brief summary of the signal-processing issues important in the design of a digital compressor. An overview of digital hearing aids is then provided. Single-channel compression and the envelope detection used to control the compressor are then described. Filter banks using recursive and non-recursive designs are discussed, followed by a treatment of multi-channel compression. Frequency-domain compression is described next. Digital filtering for compression using a side-branch frequency analysis approach is then discussed, and the technology description concludes with a discussion of compression based on digital frequency warping.

Digital Principles

There are some basic concepts that are worth keeping in mind when dealing with digital signal processing. These concepts will be briefly reviewed in this section, but this paper is not intended to be a tutorial in digital signal processing. Other publications (Rosen and Howell, 1991; Schweitzer, 1997; Lyons, 1996) provide an introduction to signal processing and digital principles, and many engineering texts are also available (Oppenheim and Schafer, 1975).

Group Delay

Group delay corresponds to the amount of time it takes a tone burst to propagate through a system. A minimum-phase system is one that has the shortest possible group delay for a given magnitude frequency response. In a minimum-phase system, the group delay is related to the slope of the magnitude frequency response; the steeper the slope, the greater the delay. Analog systems tend to be minimum-phase, so response peaks in an analog hearing aid will be associated with large amounts of group delay because of the steep slopes of the response on either side of the peak. Similarly, a low-pass or high-pass filter with a steep slope will have a greater group delay, especially in the region surrounding the cutoff frequency, than a filter having a shallow slope.

The basic principle of greater group delay associated with steeper filter slopes also applies to minimum-phase digital systems. However, in a digital system it is also possible to compensate for the variation of group delay with frequency and create a system with the same time delay at all frequencies. Such a system is called a linear-phase system. But nothing comes for free in signal processing, and a linear-phase system will always have a longer delay than a minimum-phase system realizing the same magnitude frequency response.

Frequency Analysis, Block Processing, and Sampling Rate

Frequency analysis is important in multi-channel compression. Frequency analysis can be provided by a filter bank or by a procedure such as the fast Fourier transform (FFT). In a filter-bank system, increasing the resolution of the analysis requires more filters, each having a narrower bandwidth. The narrower bandwidth in turn leads to an increase in group delay. In an FFT system, increasing the resolution of the analysis requires a larger FFT. The FFT can only be computed after the complete data segment has been acquired, so a longer FFT in turn requires a longer delay while one waits for the data buffer to be filled. Thus for both filter banks and FFTs, the better the frequency resolution, the longer the delay.
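The trade-off between FFT resolution and buffering delay can be made concrete with a short calculation. The sketch below is illustrative only: the function name is hypothetical, the 16-kHz sampling rate is an assumed value, and the delay counted is just the time needed to fill one FFT buffer, ignoring any computation time or overlap between blocks.

```python
def fft_tradeoff(fft_size, sample_rate_hz):
    """Return (bin spacing in Hz, buffer-fill time in msec) for one FFT block."""
    bin_spacing_hz = sample_rate_hz / fft_size
    buffer_ms = 1000.0 * fft_size / sample_rate_hz
    return bin_spacing_hz, buffer_ms

for n in (32, 64, 128, 256):
    spacing, delay = fft_tradeoff(n, 16000.0)
    print(f"N={n:4d}  spacing={spacing:6.1f} Hz  buffer={delay:5.1f} msec")
```

Each doubling of the FFT size halves the spacing between analysis frequencies and doubles the time spent waiting for the buffer to fill.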

Processing systems based on a filter bank can be implemented by operating on each data sample as it is acquired, or by accumulating a block of samples and then processing them all at once. An FFT system, on the other hand, requires that a complete block be acquired first, after which the FFT can be computed. One could, if desired, compute a new FFT every time a new data sample is acquired. However, the compression gain for a hearing aid does not generally need to be updated for every sample, but rather can be updated at a slower rate, for example once every millisecond. Thus it is computationally more efficient to use a block implementation for both filter banks and FFT systems. A block of data is acquired (e.g., 1–2 msec), then the entire block is filtered or the frequency analysis performed on the entire block of data at once, after which the compression gains are computed for the entire block. The advantage of block processing is computational efficiency resulting from the reduced rate of computing the compression gains, but the price paid is additional time delay since the compression gains cannot be computed until the data buffer has been filled.

An additional issue in designing a digital system is the sampling rate. The highest frequency that can be represented in a digital system is the Nyquist frequency, which is half the sampling rate. Thus to include frequencies up to 8 kHz in a system using 16-bit words to represent the data, one must acquire the 16-bit words at a rate of 16 kHz or higher. Increasing the bandwidth of the hearing aid requires increasing the sampling rate. However, a higher sampling rate draws more power from the battery; doubling the sampling rate to add an additional octave to the signal bandwidth will typically halve the battery life. Thus in designing a digital hearing aid, one must balance the considerations of processing effectiveness and complexity, group delay, and battery life.

Quantization Noise

Quantization noise is a fundamental property of any digital system. Quantization noise arises from using a limited number of bits to represent the numbers in the hearing aid; the digital processing in a hearing aid typically uses 16-bit words. For example, assume that there are two numbers represented using 16-bit words:

\begin{matrix} y_{1} = x_{1} + ε_{1} \\ y_{2} = x_{2} + ε_{2} \end{matrix}

(1)

where x_k is the exact number, y_k is the approximation using the limited number of bits, and ε_k is the error between the exact number and its digital representation. When the two digital numbers are added, the errors add as well, giving

y_{1} + y_{2} = (x_{1} + x_{2}) + (ε_{1} + ε_{2})

(2)

If two large numbers of equal amplitude but opposite sign are added, the correct values may cancel, but the quantization error may still remain. Multiplication of the two numbers results in

y_{1} y_{2} = (x_{1} x_{2}) + (x_{1} ε_{2}) + (x_{2} ε_{1}) + (ε_{1} ε_{2})

(3)

If a large number is multiplied by a small number, the correct result may be small but the quantization error will be proportional to the larger number. Thus as numbers are added or multiplied, the errors accumulate along with the correct answer. The greater the number of arithmetic operations, the greater the total error, and audible noise can result. Controlling the amount of quantization noise in a hearing aid requires careful adjustment of the arithmetic operations in the processing algorithms to keep the quantization noise at a minimum.
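The behavior described by Eqs. (1) through (3) can be demonstrated with a small numerical experiment. The sketch below assumes a 16-bit fixed-point format with values in the range -1 to 1 and rounding to the nearest quantization step; the `quantize` helper and the test values are hypothetical and chosen only for illustration.

```python
def quantize(x, bits=16):
    """Round x (assumed in [-1, 1)) to the nearest multiple of 2**-(bits - 1)."""
    step = 2.0 ** -(bits - 1)
    return round(x / step) * step

# Two large values of opposite sign that nearly cancel
x1, x2 = 0.3, -0.3 + 1e-5
y1, y2 = quantize(x1), quantize(x2)

exact_sum = x1 + x2                     # about 1e-5
quantized_sum = y1 + y2                 # sum of the 16-bit approximations
sum_error = quantized_sum - exact_sum   # epsilon_1 + epsilon_2 in Eq. (2)

# A large value multiplied by a small value, as in Eq. (3)
big, small = 0.9, 0.001
product_error = quantize(big) * quantize(small) - big * small
```

In this example both quantized values land on the same step with opposite signs, so the quantized sum is exactly zero and the small correct result is lost entirely. The error in the sum is a fraction of one quantization step, yet it is as large as the quantity being computed.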

The Hearing Aid

A block diagram of a basic digital hearing aid is presented in Figure 1. The digital device contains all of the parts needed for an analog hearing aid, plus the analog-to-digital converter (A/D), the digital-to-analog converter (D/A), and the digital processing that has replaced the analog compression circuitry. A more complete picture of the hearing aid in the ear is presented in Figure 2. Placing the hearing aid in the ear modifies the acoustic environment in which it operates. The vent, if present, provides both a feed-forward path for the sound to reach the ear canal without amplification and a feedback path by which the amplified sound in the ear canal can return to the microphone. The sound output of the hearing aid is produced by vibrations of the receiver diaphragm, which can also cause vibrations in the hearing-aid shell, thus providing mechanical feedback in addition to the acoustic feedback. The output of the receiver is also modified by the acoustics of the ear canal in which it operates and the acoustic load provided by the tympanum and the ossicles connecting the tympanum to the oval window of the cochlea.

Figure 1.

Block diagram of a generic digital hearing aid.

Figure 2.

Block diagram of a digital hearing aid inserted in the ear canal.

Every processing step and block in the hearing aid provides a modification of the frequency response and adds to the system group delay. Consider a simple in-the-ear hearing aid consisting of a microphone and receiver and a small vent in the hearing-aid shell. Assume that the A/D and D/A converters are ideal devices providing data conversion with no error or delay, and that the hearing-aid processing is a simple 30-dB gain at all frequencies which also has zero group delay. This digital system is thus equivalent to a linear analog hearing aid.

The frequency response from a simulation (Kates, 1988) of this simple hearing aid is plotted in Figure 3. At frequencies below 300 Hz, the gain is 0 dB despite the 30-dB amplification provided by the amplifier. The loss of gain is the result of the low-frequency roll-off in the microphone and receiver responses, combined with the additional loss of low-frequency power due to the acoustic interaction of the vent with the ear canal. At approximately 600 Hz there is a resonance peak in the frequency response due to the resonance of the acoustic mass of the vent and the acoustic compliance of the air in the ear canal. The response stays relatively flat at the specified 30-dB gain up to approximately 2200 Hz, at which point the receiver resonances increase the output of the hearing aid. The frequency range of the hearing aid is limited by the frequency response of the receiver.

Figure 3.

Magnitude frequency response from a computer simulation of a linear in-the-ear hearing aid having a gain of 30 dB.

The presence of the vent will also modify the low-frequency behavior of a compressor. The sound pressure in the ear canal at low frequencies is a combination of the direct sound through the vent and the compressed and amplified sound from the hearing-aid processing. Below the low-frequency cutoff caused by the vent, the only sound is that which comes through the vent. The low-frequency sound coming through the vent has a gain of 0 dB and is linear (no compression); thus in a vented hearing aid, the compression ratio will tend to 1:1 and the amount of amplification will decrease at low frequencies. It is incorrect to assume that the gain and compression ratio indicated in the digital processing will actually be realized in the frequency region below the vent-ear canal resonance frequency.

The group delay for this simulated hearing aid is plotted in Figure 4. Each resonance peak and each abrupt change in slope of the magnitude frequency response has an associated peak in the group delay. Thus there is a peak of 2.6 msec delay at 300 Hz that corresponds to the corner frequency of the system high-pass filter behavior, and a second peak of 2.0 msec delay due to the interaction of the vent and ear canal. The receiver resonances also contribute corresponding peaks in the group delay.

Figure 4.

Group delay for the hearing aid of Figure 3.

In a real digital hearing aid, the group delay would be increased by the delays inherent in the A/D and D/A converters, each of which contributes about 1 msec of delay. Thus even before the delay in the digital processing is considered, the other components of the hearing aid (microphone, receiver, A/D, D/A) and the acoustic interactions will contribute from 2 to almost 5 msec group delay. A manufacturer who only quotes the digital processing delay is misrepresenting the actual delay of the product.

Single-Channel Compression

The steady-state input/output relationship for a typical compression hearing aid is shown in Figure 5. For input levels below the lower compression kneepoint (typically 40–50 dB SPL), the system provides a constant linear gain. For input levels above the upper kneepoint (typically 85–100 dB SPL), the system provides compression limiting, in which the output level remains constant despite the changes in the input level. For input levels between the kneepoints, the system provides dynamic-range compression. The output level increases by 1/CR dB for each dB increase in the input level, where CR is the compression ratio.
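The three regions of the steady-state input/output function can be written as a short piecewise function. The sketch below is illustrative only: the function name and all default parameter values (30-dB gain, kneepoints at 45 and 90 dB SPL, compression ratio of 2:1) are assumptions chosen to fall within the typical ranges quoted above.

```python
def output_level_db(input_db, gain_db=30.0, lower_knee_db=45.0,
                    upper_knee_db=90.0, compression_ratio=2.0):
    """Steady-state output level (dB SPL) for a given input level (dB SPL).

    All default parameter values are illustrative assumptions.
    """
    if input_db <= lower_knee_db:
        # Linear region: constant gain
        return input_db + gain_db
    out_at_lower_knee = lower_knee_db + gain_db
    if input_db <= upper_knee_db:
        # Compression region: output rises 1/CR dB per dB of input
        return out_at_lower_knee + (input_db - lower_knee_db) / compression_ratio
    # Compression limiting: output held at its value at the upper kneepoint
    return out_at_lower_knee + (upper_knee_db - lower_knee_db) / compression_ratio
```

With these values a 65-dB SPL input produces an 85-dB SPL output, so the effective gain has dropped from 30 dB to 20 dB, and any input above 90 dB SPL produces the same 97.5-dB SPL output.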

Figure 5.

Input/output relationship for a typical hearing-aid compression amplifier.

Analog hearing aids typically have a volume control, with the volume control placed after the compression amplification stage of the hearing aid (Cole, 1993). The compressor operates in a feedback configuration, as illustrated in Figure 6. The block labeled “Detect” determines the signal level, and the block labeled “Comp.” compresses the signal level. The hearing-aid designer has several options as to where the volume control is to be placed. Each option, as shown by the A, B, and C in the figure, gives a different family of input/output curves as the volume control is adjusted by the user.

Figure 6.

Block diagram of a compression hearing aid using feedback compression control. The volume-control attenuation points are indicated by A, B, and C. The corresponding input/output functions are shown along the bottom of the figure (after Cole, 1993).

Assume that the compression system is configured as a high-level limiter. Control point A for the volume control shifts the output of the compressed signal and also shifts the input to the compression control circuit. The volume control thus simultaneously adjusts the gain and the compression threshold, and a separate trimmer is used to adjust the maximum output level. This operation is termed output AGC because the output level is limited. Control point B in Figure 6 shifts just the level of the compressed signal. The gain and maximum output levels are simultaneously adjusted by the volume control, but the input-referred compression threshold is unaffected. A separate trimmer adjustment is normally provided for the compression threshold. This operation is termed input AGC. Control point C in Figure 6 shifts the level of the input to the detector. Reducing the volume control output at this point reduces the signal level at the input to the detector, which in turn causes an increase in the compression gain and a higher hearing-aid output.

Digital hearing aids typically use feed-forward compression, as shown in Figure 7. The signal detection block extracts the incoming signal level, and this level is used to compute the compression gain. A volume control often is not provided in a digital instrument given the assumption that wide-dynamic-range compression will place every sound at a comfortable listening level. If a volume control is provided, however, the placement of the control will affect the compression behavior in a manner similar to that of the feedback compression shown in Figure 6. The volume control at position B shifts the level of the compressed signal, while a volume control at position A shifts both the input to the compression amplifier and the input to the level detector.

Figure 7.

Block diagram of a compression hearing aid using feed-forward compression control. The volume-control attenuation points are indicated by A, B, and C. The corresponding input/output functions are shown along the bottom of the figure.

Envelope Detection

The input/output characteristic shown in Figure 5 relates the signal output level to its input level. The input level has to be estimated from the signal using a level detector. In general, hearing aids use peak detectors to follow the maxima of the incoming signal. Because the compressor gain depends on the estimated signal level, the behavior of the compressor, whether analog or digital, will depend on the characteristics of the envelope detector.

One of the challenges in designing an envelope detector is to get desirable temporal behavior. The detector should have a quick response to rapid increases in the input signal level to prevent saturating the digital arithmetic, overloading the output amplifier, or exceeding the listener's uncomfortable loudness level. At the same time, the detector should smooth out the variations in estimates of noise and steady-state signals to prevent audible amplitude modulation of signals that are perceived as having constant loudness. In general, over-amplification of a loud sound is a greater problem than insufficient amplification of a soft sound. Thus, the envelope detector is typically designed to have a fast response to increases in the signal level to prevent overload, and a slow response to decreases in the signal level to reduce audible perturbations in the gain. This behavior gives rise to a peak detector since it tends to track the peaks of the input signal.
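A fast-attack, slow-release detector of this kind can be sketched as a one-pole low-pass filter whose smoothing coefficient switches depending on whether the signal magnitude is above or below the current estimate. The code below is a minimal illustration, not a description of any particular hearing aid. It assumes that the attack and release values are used directly as filter time constants, which is not the same as ANSI attack and release times measured on a complete device.

```python
import math

def peak_detector(samples, sample_rate_hz, attack_ms=5.0, release_ms=50.0):
    """Track the envelope of |x| with a fast attack and a slow release.

    Assumption: attack_ms and release_ms are one-pole time constants,
    not ANSI attack and release times.
    """
    attack_coef = math.exp(-1.0 / (attack_ms * 1e-3 * sample_rate_hz))
    release_coef = math.exp(-1.0 / (release_ms * 1e-3 * sample_rate_hz))
    envelope = []
    level = 0.0
    for x in samples:
        magnitude = abs(x)
        # Use the fast coefficient when the signal rises above the estimate
        coef = attack_coef if magnitude > level else release_coef
        level = coef * level + (1.0 - coef) * magnitude
        envelope.append(level)
    return envelope
```

For a signal that switches abruptly on and then off, the estimate rises to about 63% of the final value within one attack time constant and decays to about 37% of its starting value within one release time constant.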

To illustrate the behavior of different envelope detectors, consider the detector response to a short segment of speech. The word “air,” spoken by a male, is plotted in Figure 8. The speech segment is predominantly voiced, and the pitch periods are clearly visible. The onset of the word is relatively quick, taking about 40 msec, and the transition to silence at the end takes about 100 msec. The segment is followed by a pause, which is filled in by the low-level background noise of the recording.

Figure 8.

Speech segment for the word “air” from the “Rainbow Passage” spoken by a male.

The results for three different envelope detectors are plotted in Figure 9 for the segment of speech shown in Figure 8. For the curve labeled “Block RMS,” the speech was divided into 8-msec blocks, and the root-mean-squared (RMS) level was found for the signal within each block. A rapid response to increases in the signal level is provided by using a short block size. This approach has the advantage of giving the exact signal power within each block, but has the disadvantage that the RMS value cannot be computed until the entire block has been read into digital memory. Thus the 8-msec block size requires an 8-msec processing delay; this delay, when added to the additional delays in the hearing-aid processing, is too long for most hearing-aid applications. In addition, the block boundaries are not synchronized to the pitch period, so during the vowel there are occasional jumps in the estimated signal level. Similarly, during the noise that follows the word, the block RMS levels vary in response to the fluctuations in the noise since the same minimal smoothing, provided by the block size, is used for decreases in the signal level as is used for increases.
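A block RMS detector can be sketched in a few lines. The example below is an illustration under stated assumptions: the function name is hypothetical, the blocks are non-overlapping, levels are expressed in dB relative to an amplitude of 1.0, and a small floor is applied to avoid taking the logarithm of zero.

```python
import math

def block_rms_db(samples, sample_rate_hz, block_ms=8.0, floor=1e-12):
    """RMS level in dB (re 1.0) for consecutive non-overlapping blocks."""
    block_len = int(round(block_ms * 1e-3 * sample_rate_hz))
    levels = []
    for start in range(0, len(samples) - block_len + 1, block_len):
        block = samples[start:start + block_len]
        mean_square = sum(x * x for x in block) / block_len
        levels.append(10.0 * math.log10(max(mean_square, floor)))
    return levels
```

Each level becomes available only after the last sample of its block has arrived, which is the source of the block-length delay. At a 16-kHz sampling rate, an 8-msec block contains 128 samples.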

Figure 9.

Broad-band peak detector output for the speech segment plotted in Figure 8. The detector curves are for the root-mean-squared (RMS) level in 8-msec blocks (“Block RMS”, dashed), syllabic response with an ANSI attack time of 5 msec and a release time of 50 msec (“Syllabic”, solid), and long time constants with an ANSI attack time of 50 msec and a release time of 250 msec (“Long”, dot-dash).

A peak detector with short syllabic time constants was used for the curve labeled “Syllabic.” The detector uses a low-pass filter to smooth the magnitude of the signal. The attack time to track increases in signal level was set to 5 msec, and the release time to track decreases in the signal level was set to 50 msec. The envelope of the syllabic peak detector lies about 3 dB above that of the block RMS detector. During the voiced speech, the syllabic peak detector shows some ripple in response to the periodic glottal pulses. At the end of the utterance, the fast release time allows the syllabic peak detector to track the decrease in the signal level as closely as the block RMS detector. In the noise segment, the syllabic peak detector shows level variations that are very similar to those shown by the block RMS detector.

The ripples in the peak detector output can be reduced by increasing the attack and release time. A peak detector using an attack time of 50 msec and a release time of 250 msec is indicated in Figure 9 by the curve labeled “Long.” The longer attack and release times greatly reduce the variance in the estimated signal level during the vowel and noise segments of the utterance. However, the peak detector now responds much more slowly to changes in the input signal level. The long attack time means that the detector takes about 100 msec to respond to the onset of the vowel, during which time the compressor gain may be higher than desired because of the low estimated signal level. Similarly, the estimated signal level remains high at the end of the utterance, which will cause a reduction in gain and output level compared to the syllabic peak detector.

Filter Banks

Most hearing aids today use multi-channel compression. The most obvious way to build a multichannel system is to use a filter bank, as shown in Figure 10. Individual filters are used to separate the input signal into a multiplicity of frequency bands. The lowest frequencies are output by a low-pass filter, the highest frequencies by a high-pass filter, and the remaining intermediate frequencies by band-pass filters. The compression, typically using the feed-forward approach shown in Figure 7, is implemented independently in each frequency band.

Figure 10.

Block diagram of a multi-channel compression system.

Infinite-Impulse Response Filters

One way of designing the filters in the filter bank is to use digital signal processing to approximate the characteristics of analog filters. This approach uses recursive or infinite-impulse response (IIR) filters. In an IIR filter, the current output sample depends on the current and past input samples and on the past output samples. Let x(n) be the input sequence and y(n) be the output sequence. The filter output is then given by:

y (n) = \sum_{k = 0}^{K} b_{k} x (n - k) + \sum_{j = 1}^{J} a_{j} y (n - j)

(4)

This filter has J poles and K zeros. The weights are chosen using a design rule to give the desired magnitude frequency response, and are often based on a digital transformation of an existing analog filter design.
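Eq. (4) translates directly into a loop over the input samples. The sketch below is a straightforward, unoptimized illustration using floating-point arithmetic; the function name is hypothetical. It follows the sign convention of Eq. (4), in which the feedback terms are added. Many filter-design tools store the feedback coefficients with the opposite sign, so coefficients taken from such a tool would need their signs reversed before use here.

```python
def iir_filter(b, a, x):
    """Apply Eq. (4): y(n) = sum_k b[k] x(n-k) + sum_j a[j] y(n-j).

    b holds b_0..b_K; a holds a_1..a_J (so a[0] here is a_1).
    Samples before the start of the input are taken to be zero.
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        for j, aj in enumerate(a, start=1):
            if n - j >= 0:
                acc += aj * y[n - j]
        y.append(acc)
    return y

# One-pole smoothing filter driven by a unit impulse
impulse = [1.0] + [0.0] * 7
response = iir_filter([0.5], [0.5], impulse)
```

The impulse response of this one-pole example halves with every sample and never reaches exactly zero in exact arithmetic, which is the sense in which the response is infinite.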

As an example of an IIR filter bank, consider a combination of three filters. The system sampling rate is 16 kHz, so the highest frequency that can be represented is 8 kHz. The first filter is a three-pole Butterworth (maximally flat) low-pass filter that passes the frequencies up to 1000 Hz. The Butterworth low-pass filter design has no ripples, but instead proceeds monotonically from the gain of approximately 1 in the pass-band to the attenuation of the stop-band. The second is a six-pole Butterworth band-pass filter that passes the frequencies from 1000 to 2500 Hz, and the third is a three-pole Butterworth high-pass filter that passes the frequencies above 2500 Hz. The filter magnitude frequency responses are plotted in Figure 11. Because the digital filter design is based on an analog Butterworth filter, the response of each filter is down 3 dB at the frequencies that define its band edges.

Figure 11.

Individual frequency response curves for third-order recursive (IIR) Butterworth filters having band edges at 1000 and 2500 Hz.

The filter outputs of a Butterworth low-pass/high-pass filter pair are 90 deg out of phase at the crossover frequency, and the filter outputs combine to give constant power and a flat magnitude frequency response. The combined output of the three filters is plotted in Figure 12. The outputs of the three filters do not combine in the same way as for a simple pair of filters. The phase responses of the filters interact to produce peaks and valleys in the composite frequency response. Changing the gain in each frequency band, as would occur in a compression system, will shift the locations of the peaks and valleys in frequency. Thus a naïve implementation of an IIR filter bank results in unanticipated phase interactions that interfere with achieving the desired frequency response and which could lead to audible processing artifacts.
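The behavior of a single low-pass/high-pass pair can be checked numerically. The sketch below uses the third-order analog Butterworth prototype with frequency normalized so that the crossover is at w = 1. This is an assumption made for simplicity: the digital filters in the example are derived from analog designs, but the digital transformation and the three-filter combination are not modeled here.

```python
import cmath
import math

def butterworth3_pair(w):
    """Third-order analog Butterworth low-pass and high-pass responses.

    w is the frequency normalized to the crossover (w = 1 at the band edge).
    """
    s = 1j * w
    denom = s**3 + 2.0 * s**2 + 2.0 * s + 1.0
    return 1.0 / denom, s**3 / denom

low, high = butterworth3_pair(1.0)
# Phase of the high-pass output relative to the low-pass output at crossover
phase_diff_deg = math.degrees(cmath.phase(high / low))
# Magnitude of the summed outputs at several normalized frequencies
sum_magnitudes = [abs(sum(butterworth3_pair(w))) for w in (0.25, 0.5, 1.0, 2.0, 4.0)]
```

For this pair the two outputs are in quadrature at the crossover, each is 3 dB down, and the magnitude of the sum is 1 at every frequency tested. Adding a third filter with its own phase response breaks this relationship.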

Figure 12.

Composite frequency response for the combined output of the three filters presented in Figure 11.

Another way to look at the system phase behavior is to compute the group delay. Group delay is defined as the negative of the derivative of the phase with respect to radian frequency, and can be considered as the amount of time it takes a tone burst at a given frequency to propagate through the signal-processing system. The group delay for the combination of three IIR filters is plotted in Figure 13. For most IIR filters, the group delay depends on the slope of the magnitude frequency response with frequency; the steeper the slope, the greater the delay. The maximum delay occurs at the band edges. Filters at lower frequencies have steeper slopes and thus have greater group delay. Thus in Figure 13, there is a peak of about 1 msec in the delay at 1000 Hz and a second peak of approximately 0.5 msec at 2500 Hz. Reducing the order of the IIR filter will reduce the group delay, but at the expense of reducing the slope of the filter stop band.
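Group delay can be estimated numerically from the filter coefficients by differencing the phase at two closely spaced frequencies. The sketch below is an approximation under stated assumptions: it uses the sign convention of Eq. (4) for the feedback coefficients, a central difference with a hypothetical 1-Hz half-width, and function names chosen for illustration.

```python
import cmath
import math

def freq_response(b, a, freq_hz, sample_rate_hz):
    """Response of Eq. (4) at one frequency (feedback terms added, a = a_1..a_J)."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate_hz)
    num = sum(bk * z_inv**k for k, bk in enumerate(b))
    den = 1.0 - sum(aj * z_inv**j for j, aj in enumerate(a, start=1))
    return num / den

def group_delay_ms(b, a, freq_hz, sample_rate_hz, delta_hz=1.0):
    """Approximate -d(phase)/d(omega) by a central difference, in msec."""
    h_lo = freq_response(b, a, freq_hz - delta_hz, sample_rate_hz)
    h_hi = freq_response(b, a, freq_hz + delta_hz, sample_rate_hz)
    dphi = cmath.phase(h_hi / h_lo)      # phase change, wrapped to (-pi, pi]
    domega = 2.0 * math.pi * (2.0 * delta_hz)
    return -1000.0 * dphi / domega
```

As a check, a pure delay of four samples at a 16-kHz sampling rate, `group_delay_ms([0, 0, 0, 0, 1.0], [], 1000.0, 16000.0)`, gives 0.25 msec at any frequency. The estimate is unreliable at frequencies where the response magnitude is close to zero.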

Figure 13.

Group delay for the combined output of the three filters presented in Figure 11.

Finite-Impulse Response Filters

In an ideal filter bank, every filter would have exactly the same phase response. If this were the case, the output of each filter would be in phase with the output of every other filter at every frequency. One could then compute the composite output by simply adding the magnitudes of the outputs of each filter without worrying about additional peaks or valleys introduced by phase interactions.

This ideal can be realized in a digital filter bank by using linear-phase finite-impulse response (FIR) filters. An FIR filter is equivalent to a tapped delay line. In terms of Eq. (4), the output y(n) depends only on weighted delayed samples of the input x(n). There is no dependence on the past output samples and the {a} coefficients are all zero. The FIR filter is thus given by:

y (n) = \sum_{k = 0}^{K} b_{k} x (n - k)

(5)

A linear-phase FIR filter imposes the additional constraint that the set of coefficients {b} has even symmetry, with b_0 = b_K, b_1 = b_{K-1}, and so on. The filter delay is then K/2 samples at all frequencies. The quantization noise of an FIR filter is also low.
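The symmetry constraint and the resulting K/2-sample delay can be illustrated with a simple design. The sketch below uses a windowed-sinc low-pass filter with a Hamming window, which is an illustrative choice and not necessarily the design method used for the examples in this paper. The filter length of 65 coefficients (K = 64), the 1000-Hz cutoff, and the function names are likewise assumptions.

```python
import math

def lowpass_fir(num_taps, cutoff_hz, sample_rate_hz):
    """Windowed-sinc low-pass design (Hamming window), an illustrative choice."""
    center = (num_taps - 1) / 2.0
    fc = cutoff_hz / sample_rate_hz      # cutoff in cycles per sample
    taps = []
    for n in range(num_taps):
        t = n - center
        ideal = 2.0 * fc if t == 0 else math.sin(2.0 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def fir_filter(b, x):
    """Apply Eq. (5): y(n) = sum_k b[k] x(n-k), with zero initial conditions."""
    return [sum(bk * x[n - k] for k, bk in enumerate(b) if n - k >= 0)
            for n in range(len(x))]

taps = lowpass_fir(65, 1000.0, 16000.0)              # K = 64
delay_ms = 1000.0 * (len(taps) - 1) / 2.0 / 16000.0  # K/2 samples = 2.0 msec
```

The coefficients are mirror images about the center tap, and passing a unit impulse through `fir_filter` places the peak of the output at sample K/2 = 32, corresponding to 2 msec at the assumed 16-kHz sampling rate. Doubling the filter length to sharpen the transition bands would double this delay.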

The output from a three-band linear-phase FIR filter bank is plotted in Figure 14. The filters have been designed to have band edges at 1000 and 2500 Hz, the same as for the IIR example discussed previously in this section, and each filter has a length of 128 samples. While IIR filters are 3 dB down at the band edges, the FIR filters are 6 dB down. Each filter shown in Figure 14 has a nearly flat frequency response in its pass-band, stop-band attenuation of 60 dB or more, and narrow transition bands. The stop-band level means that the filter will have some response, albeit greatly reduced, to a signal that is at a frequency outside its designated pass-band. The FIR filters also have a large amount of ripple in the stop bands, the result of the finite filter length. Increased filter length can be used to reduce the ripple in the pass-band, reduce the sidelobe levels, and/or reduce the width of the transition regions at the filter band edges.

Figure 14.

Individual frequency response curves for 129-point non-recursive (FIR) filters having band edges at 1000 and 2500 Hz. The system sampling rate is 16 kHz.

The combined frequency response of the three FIR filters is plotted in Figure 15, and the group delay for the combined output is plotted in Figure 16. There is essentially no ripple in the magnitude frequency response, and the group delay is a constant 4 msec independent of frequency.

Figure 15.

Composite frequency response for the combined output of the three filters presented in Figure 14.

Figure 16.

Group delay for the combined output of the three filters presented in Figure 14.

Thus, the linear-phase FIR filter satisfies the desire for a filter bank where the phase response of the filter can be ignored in summing the weighted outputs of the individual bands. However, the advantages of the FIR filter do not come without their cost. The group delay of the FIR filter, while independent of frequency, is substantially greater than the group delay plotted in Figure 13 for the IIR filter bank. Furthermore, the computational requirements for the linear-phase FIR filter are much greater than for the IIR filter; the third-order low-pass IIR filter, for example, requires seven multiplications per output sample while the 129-tap low-pass FIR filter requires 129 multiplications per output sample. Thus the design of a filter bank requires many trade-offs between the allowable degree of filter interaction, the width of the filter transition bands, the filter side-lobe suppression, and the number of arithmetic operations per second permitted by the digital signal processor and battery in the hearing aid.

In the FIR filters used in the previous example, the sampling rate in each frequency band is the same as for the original signal. But, according to the Nyquist theorem, the sampling rate of the filter output can be reduced as the bandwidth is reduced. Not every output sample is needed at the original sampling rate; one only needs to sample at higher than twice the filter bandwidth. To reduce the number of operations per second needed to implement the compressor, the signals in each frequency band can be processed at the reduced sampling rate within each frequency band. The contents of the separate bands are then passed through a complementary filter bank and recombined at the original sampling rate. The ability to reduce the sampling rate in the separate frequency bands can thus be exploited to reduce the overall processing complexity in the hearing aid (Brennan and Schneider, 1998; Chau et al., 2004), but the basic behavior of the frequency analysis and group delay is the same as for the filter bank that maintains the original sampling rate in the separate frequency bands.

Multi-Band Time-Domain Compression

The multi-band compressor shown in Figure 10 combines a filter bank with compression in each frequency band. In most implementations, the compression operates independently in each channel. The compressor output involves the reaction of each frequency band to the signal, and even simple test signals can cause complicated responses.

Temporal Response

Consider a stepped sinusoid test signal. The ANSI (1996) test for compression dynamic behavior uses a sinusoid that jumps 35 dB in level and then returns to the original level. The envelope of a stepped 1.6 kHz sinusoid is plotted in Figure 17. The level increases by 40 dB at 200 msec and then returns to the original level at 500 msec. The sinusoid frequency of 1.6 kHz was chosen to be roughly centered in the mid-frequency band of the three-channel filter bank illustrated in Figure 14.

Figure 17.

Envelope of the stepped 1.6-kHz sinusoid test signal.

The output of a compressor using the three-channel FIR filter bank to the stepped 1.6-kHz sinusoid is plotted in Figure 18. The plot has been normalized so that the highest output level is set to 0 dB. The compression ratio has been set to 2:1 in each of the three channels, and the compressor operation in each channel is independent of the operation in the other two channels. The control signal in each channel is the peak-detected filtered input, with an attack time of 5 msec and a release time of 70 msec.

Figure 18.

Envelope of the output of the three-band FIR filter compressor for the stepped 1.6-kHz test signal. The compression ratio was set to 2:1 in each frequency band.

There is obvious overshoot in the output for the jump in the test signal that occurs at 200 msec. The input signal level changes suddenly, but the peak detector output changes more slowly. Thus, the compressor gain as the input jumps is still set at the higher gain that corresponds to the lower initial input level. The higher gain applied to the suddenly higher input level causes the overshoot. The peak detector then reacts to the jump in the input signal; the peak detector output increases and the compressor gain goes down until it reaches steady-state for the higher signal level. The speed of the compressor gain adjustment is controlled by the attack time constant.

The sudden decrease in the test signal level that occurs at 500 msec causes the opposite behavior. The compressor gain has stabilized at the lower gain that corresponds to the 40-dB higher signal level. When the signal level is suddenly reduced, the compressor gain is still at the lower gain that corresponds to the high signal level. The lower gain applied to the new lower signal level causes the undershoot in the output. The peak detector then reacts to the reduction in the input signal; the peak detector output decreases and the compressor gain goes up until it reaches steady-state for the lower signal level. The speed of the compressor gain adjustment is controlled by the release time constant.
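The overshoot and undershoot mechanism can be reproduced with a small envelope-level simulation. The sketch below is only illustrative: it assumes a one-pole peak detector whose attack and release coefficients are simple exponential time constants (not the ANSI attack and release definitions), a gain law referenced to 0 dB, and a single band that sees the whole signal. The names `stepped_envelope` and `compress_envelope` are hypothetical.

```python
import math

FS = 16000            # sampling rate in Hz (from the text)
ATTACK_S = 0.005      # 5-msec attack
RELEASE_S = 0.070     # 70-msec release
RATIO = 2.0           # 2:1 compression

def stepped_envelope(n_samples):
    """Envelope that is 40 dB higher between 200 and 500 msec."""
    lo, hi = 10 ** (-40 / 20), 1.0
    return [hi if 0.2 * FS <= n < 0.5 * FS else lo for n in range(n_samples)]

def compress_envelope(env):
    """Return the output level in dB for each envelope sample."""
    alpha = math.exp(-1.0 / (ATTACK_S * FS))   # assumed one-pole attack coefficient
    beta = math.exp(-1.0 / (RELEASE_S * FS))   # assumed one-pole release coefficient
    peak = env[0]                              # start in steady state
    out_db = []
    for x in env:
        if x >= peak:
            peak = alpha * peak + (1.0 - alpha) * x   # attack: track a rising level
        else:
            peak = beta * peak                        # release: decay toward a lower level
        level_db = 20.0 * math.log10(peak)
        gain_db = (1.0 / RATIO - 1.0) * level_db      # gain follows the detected level
        out_db.append(20.0 * math.log10(x) + gain_db)
    return out_db

out_db = compress_envelope(stepped_envelope(FS))      # one second of signal
# Output at the instant of the upward jump relative to the settled high-level output.
overshoot_db = out_db[int(0.2 * FS)] - out_db[int(0.5 * FS) - 1]
# Output at the instant of the downward jump relative to the settled low-level output.
undershoot_db = out_db[int(0.5 * FS)] - out_db[-1]
```

In this sketch the gain at each jump is still the value set by the previous level, so the output first lands well above (or below) its eventual steady-state level and then converges at a rate set by the attack (or release) coefficient.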

But notice the small amount of overshoot in the output at 500 msec in Figure 18 before the undershoot appears. To understand this effect, look at the individual band outputs plotted in Figure 19. The mid-frequency band shows overshoot at the jump in the 1.6 kHz test signal at 200 msec and undershoot when the signal level is suddenly reduced at 500 msec. The low- and high-frequency bands, however, show spikes in their outputs at both the jump in signal level at 200 msec and sudden reduction at 500 msec. The spikes are the result of the spectral splatter caused by the sudden changes in signal level. Any change, be it an increase or a decrease in signal level, generates spectral content across a wide frequency range. The greater the change, the greater the amount of spectral splatter.

Figure 19.

Envelope of the output in each of the three compressor bands for the stepped 1.6-kHz test signal.

The low- and high-frequency compression bands initially have high gains corresponding to the low signal levels in these two frequency regions. The jump in signal level at 200 msec then generates spectral content in these two bands, which is amplified by the high gain that was established for the low initial input levels. The low- and high-frequency compression bands then adjust to the shift in detected signal level between 200 and 500 msec. At 500 msec the sudden decrease in the test signal level generates out-of-band spectral energy that again is amplified by the compression gains set by the detected signal level prior to the sudden decrease. The low- and high-frequency channel gains then adjust once again to the lower detected levels in each frequency band.

Swept Frequency Response

Another aspect of multi-band compressor behavior is illustrated by plotting the response to a swept sinusoid. The test signal was a swept sinusoid starting at 200 Hz and continuing up to 8 kHz. The sweep duration was 5 sec, and the instantaneous frequency as a function of time for the swept sinusoid is plotted in Figure 20. The compressor output to the swept sinusoid excitation is plotted in Figure 21. All three bands of the compressor were set to a compression ratio of 3:1 with a 5-msec attack time and a 70-msec release time. The response to 1 kHz occurs at 2190 msec and the response to 2500 Hz occurs at 3430 msec into the sweep. There is a response peak of 4 dB at each of these two frequencies, each of which corresponds to the boundary between adjacent frequency bands for the FIR filters plotted in Figure 14.

Figure 20.

Instantaneous frequency as a function of time for the swept sinusoid excitation signal.

Figure 21.

Envelope of the output of the three-band FIR filter compressor for an input sinusoid sweep at a constant amplitude. The compression ratio was set to 3:1 in each frequency band. The sweep ran from 200 Hz to 8 kHz; 1 kHz occurs at 2190 msec, and 2.5 kHz occurs at 3430 msec.
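The text does not state the sweep law, but the quoted times are consistent with a logarithmic sweep, in which the frequency is multiplied by a constant factor per unit time. The sketch below makes that assumption explicit and recovers the times at which the sweep reaches the two band edges; the function names are hypothetical.

```python
import math

F_START = 200.0     # Hz (from the text)
F_END = 8000.0      # Hz (from the text)
DURATION = 5.0      # seconds (from the text)

def sweep_frequency(t):
    """Instantaneous frequency of an assumed logarithmic sweep at time t (seconds)."""
    return F_START * (F_END / F_START) ** (t / DURATION)

def time_at_frequency(f):
    """Time in seconds at which the assumed logarithmic sweep reaches frequency f."""
    return DURATION * math.log(f / F_START) / math.log(F_END / F_START)

t_1k = time_at_frequency(1000.0)     # about 2.18 s
t_2k5 = time_at_frequency(2500.0)    # about 3.42 s
```

Under this assumption the computed crossing times fall within about 10 msec of the 2190-msec and 3430-msec values quoted for the band edges.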

The response peaks are the result of the filter gain at the band edges interacting with the compression amplification (Lindemann, 1997). At 1000 Hz, for example, the low-frequency band and the mid-frequency band filters both have a gain of −6 dB relative to their passband responses. The reduced signal level in each frequency band causes an increase in the gain of the compression amplifier in each band compared to the gain that the compressor would have for a signal in the center of its passband. The system output is the combination of the amplified signals in the two bands. With no compression (compression ratio of 1:1), the filter outputs would combine to give the same signal level as would occur for the signal in the center of one of the filter bands. But, with the 3:1 compression, the added gain for the reduced signal level in each band results in a peak in the system response when the band outputs are combined.
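The band-edge argument can be turned into a short calculation. The sketch below is an idealized steady-state estimate: it assumes the tone is exactly 6.02 dB down in each of the two adjacent bands, that each compressor has settled, that the two band outputs add in phase (as they do for linear-phase filters), and that the third band contributes nothing. Detector dynamics and stop-band leakage in an actual system are ignored, so a measured peak may differ. The function name is hypothetical.

```python
import math

def band_edge_peak_db(edge_loss_db, ratio):
    """Idealized steady-state peak at a crossover shared by two bands."""
    # Each band sees the tone edge_loss_db below its passband level, so a
    # compressor with the given ratio raises its gain by this amount.
    extra_gain_db = edge_loss_db * (1.0 - 1.0 / ratio)
    band_level_db = -edge_loss_db + extra_gain_db
    # The two band outputs are assumed to be in phase, so amplitudes add.
    total_amplitude = 2.0 * 10.0 ** (band_level_db / 20.0)
    return 20.0 * math.log10(total_amplitude)

peak_no_compression = band_edge_peak_db(6.0206, 1.0)   # about 0 dB
peak_3_to_1 = band_edge_peak_db(6.0206, 3.0)           # about 4 dB
```

With a 1:1 ratio the two half-amplitude outputs sum back to the passband level, while with 3:1 compression each band regains two-thirds of the 6-dB loss before the outputs are summed.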

An additional source of ripple in the compressed output is the ripple in the filter sidelobes. A signal at 4 kHz, for example, will be in the pass-band of the high-frequency filter but in the stop bands of the mid- and low-frequency filters. The FIR filter stop-bands all have ripple, and the level of the signal in the stop bands is amplified relative to that in the filter passband by the compression. Thus, the multi-band compression response to the swept sinusoid amplifies any ripples or irregularities in the filter stop-band responses.

Frequency-Domain Compression

The filter bank represents an approach to time-domain processing. The input sequence is convolved with the filters one sample at a time, and the output sequence is formed by summing the filter outputs. The compressor operates independently on the signal levels estimated for each filter. An alternative approach is to divide the signal into short segments, transform each segment into the frequency domain, compute the compression gains from the computed input spectrum and apply them to the signal, and then inverse transform to return to the time domain.

Overlap-Add

The frequency-domain approach requires that the input signal be divided into short segments for processing. The overlap-add technique is typically used for processing signal segments. The overlap-add procedure in the time domain using data windowing is illustrated in Figure 22. The input sequence is divided into segments each of length L samples. Each segment is multiplied by a smooth window, also of length L samples. The windowed segment is then convolved with the filter of length M samples. The output sequence from the convolution has a length of L+M-1 samples. A new windowed input segment is acquired every L/2 samples. The filtered output sequence is formed by summing all of the output segments that overlap for each L+M-1 sample interval.

Figure 22.

Overlap-add processing using windowed input data segments.
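The time-domain overlap-add procedure can be sketched in a few lines. The example below assumes a periodic Hann window, chosen because copies of it shifted by L/2 samples sum to exactly one; the article does not specify the window. The signal, filter, and function names are illustrative assumptions.

```python
import math

def convolve(x, h):
    """Direct linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def overlap_add_filter(x, h, L):
    """Filter x with h using windowed segments of length L taken every L/2 samples."""
    hop = L // 2
    # Periodic Hann window: copies shifted by L/2 sum to exactly one.
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / L) for n in range(L)]
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x) - L + 1, hop):
        segment = [x[start + n] * window[n] for n in range(L)]
        piece = convolve(segment, h)              # length L + M - 1
        for n, value in enumerate(piece):
            y[start + n] += value                 # sum the overlapping outputs
    return y

# Illustrative signal and filter (assumed values, not from the article).
L, M = 16, 5
x = [math.sin(0.3 * n) for n in range(8 * L)]
h = [0.1, 0.2, 0.4, 0.2, 0.1]

y_ola = overlap_add_filter(x, h, L)
y_direct = convolve(x, h)

# The first and last L/2 input samples are covered by only one window, so
# compare the two results away from those edges.
lo, hi = L // 2 + M, len(x) - L // 2 - M
max_error = max(abs(y_ola[n] - y_direct[n]) for n in range(lo, hi))
```

Away from the signal edges the summed segment outputs match direct convolution to within round-off, which is the property that lets block processing stand in for sample-by-sample filtering.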

For frequency-domain processing, the input sequence is again divided into segments or blocks of length L samples. Each segment is windowed, and then extended with zeros to give a total length of L+M-1 or more samples. The filter impulse response is also padded with zeros to give the same length in samples as the windowed and padded input sequence. The Fourier transforms of the zero-padded input segment and the zero-padded filter segment are then computed. Most often the fast Fourier transform (FFT) algorithm is used for these computations using a transform size given by a power of 2, although other procedures for computing the discrete Fourier transform or other transform sizes could also be used. The two transforms are then multiplied together to effect the filtering in the frequency domain, and the resultant spectrum is inverse transformed to return to the time domain. The filtered time-domain sequences are then combined across processing segments using the overlap-add procedure as shown in Figure 22.

Ideal Fast Fourier Transform System

A block diagram of a frequency-domain compressor is shown in Figure 23. The sampling rate has been set to 16 kHz and the FFT size set to 128 samples (8 msec) for purposes of illustration. The FFT implementation uses overlap-add processing as described in the paragraph above. The input fills a data buffer and is windowed and zero-padded. The FFT of the segment is calculated and the power spectrum is then computed from the FFT. For the 128-point FFT used in this example, the power spectrum is computed at a 125-Hz frequency spacing. Power estimates in the desired auditory frequency bands are computed by using individual FFT bins at low frequencies and combining FFT bins as the frequency is increased. The power spectrum is computed for each input block of data, and the levels are peak-detected to give the desired attack and release times at the block sampling rate.

Figure 23.

Block diagram of an ideal frequency-domain compression system using 128-point FFTs and a sampling rate of 16 kHz.

The compressor gains in each frequency band are computed for the FFT system in the same way as for the time-domain filter-bank approach. The gains are then interpolated in frequency to give a gain value for each FFT bin. The FFT of the input signal is then multiplied by the compressor gains to give the compressed signal in the frequency domain. The compressed signal is then inverse transformed to give the time sequence, and the sequences are combined using overlap-add.

For the overlap-add processing diagrammed in Figure 22, the length of the input segment and the length of the filter impulse response are both known. But, for the FFT compressor, the compression filter is designed in the frequency domain and the length of its impulse response is not known. The unknown compression filter length can lead to a problem known as temporal aliasing. The size of the FFT is set to 128 points in this example. For overlap-add processing, the size of the input sequence L (before zero-padding) and the length of the filter impulse response M must be chosen so that L+M-1 ≤ 128. For example, if the input segment has length L=64, then the compression filter must have length M ≤ 65. If M is longer than this amount, the total length of the convolution of the input segment with the filter will exceed the FFT size of 128 points, with the result that the samples in the convolution that lie beyond 128 samples will be wrapped around to the beginning and will be perceived as distortion. A solution to this problem is to compute the compression filter response in the frequency domain, inverse transform into the time domain, and then truncate the filter impulse response to 65 samples so that temporal aliasing will not occur. The truncated filter is then transformed back into the frequency domain to give the frequency response of a filter that approximates the desired gain-vs.-frequency characteristic but for which temporal aliasing has been eliminated.
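Temporal aliasing is easy to demonstrate at a reduced scale. The sketch below uses a 16-point transform and an 8-sample segment as stand-ins for the 128-point FFT and 64-sample segment in the text, so the longest alias-free filter has M = 9 taps. A naive DFT replaces the FFT for self-containment, and the rectangular test segment and averaging filter are illustrative assumptions.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    N = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(N):
        acc = sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / N) for n in range(N))
        out.append(acc / N if inverse else acc)
    return out

def fft_filter(segment, h, N):
    """Zero-pad both sequences to N points, multiply the transforms, and invert."""
    X = dft(segment + [0.0] * (N - len(segment)))
    H = dft(h + [0.0] * (N - len(h)))
    return [v.real for v in dft([a * b for a, b in zip(X, H)], inverse=True)]

def linear_convolution(x, h):
    """Direct linear convolution of length len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

N, L = 16, 8                       # scaled-down stand-ins for 128 and 64
segment = [1.0] * L

def wraparound_error(M):
    """Largest difference between FFT filtering and true linear convolution."""
    h = [1.0 / M] * M              # illustrative filter of length M
    exact = (linear_convolution(segment, h) + [0.0] * N)[:N]
    approx = fft_filter(segment, h, N)
    return max(abs(approx[n] - exact[n]) for n in range(N))

error_ok = wraparound_error(9)         # L + M - 1 = 16, fits in the transform
error_aliased = wraparound_error(12)   # L + M - 1 = 19, last samples wrap around
```

When the filter fits, the frequency-domain product reproduces the linear convolution; when it is too long, the tail of the convolution is folded back onto the first output samples of the block.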

Practical Fast Fourier Transform System

The FFT system with temporal aliasing eliminated requires a total of four FFTs: a forward FFT for the input segment, an inverse FFT for the compression gains, a forward FFT for the truncated compression impulse response, and an inverse FFT for the filtered segment. A practical digital hearing aid, in general, will not have the signal-processing capability to perform four FFTs. The DSP may not be fast enough, or the battery drain may be too great. Therefore a compromise approach is often used for FFT compression, as shown in Figure 24. Instead of the inverse transform of the compression gains, truncation, and the forward transform, the compression gains are smoothed in frequency. There is a relationship in signal processing that indicates that the shorter the impulse response, the smoother the frequency response. Thus smoothing the compression-gain frequency response is equivalent to an approximate truncation of the impulse response. The smoothing does not produce an exact truncation, so some residual temporal aliasing distortion is possible. A careful selection of the input segment length, FFT size, and frequency-domain smoothing will result in temporal aliasing distortion that cannot be perceived under most listening conditions.

Figure 24.

Block diagram of a practical frequency-domain compression system using 128-point FFTs and a sampling rate of 16 kHz.

The time delay of the FFT compressor depends on the size of the input buffer and the size of the FFT. The FFT cannot be computed until the input buffer is filled, so there is a processing delay while the input segment is accumulated. In addition, the compression frequency response is specified as a real number greater than zero in each frequency band. A frequency response that is pure real has a corresponding impulse response that is linear-phase. Thus, a set of compression gains for a 128-point FFT has a compression filter delay equal to 64 samples. If the filter is truncated to fewer than 128 samples, it will still be centered at 64 samples, but will have zeros at its beginning and end. Again, consider a system with a 16-kHz sampling rate and an FFT size of 128 points. If the windowed input segment length is 128 samples with an FFT computed every 64 samples, then the overall signal-processing delay will be 64 + 64 = 128 samples, or 8 msec.

The delay can be adjusted by changing the size of the input segment and/or that of the FFT. A shorter input segment means that the input buffer will be filled sooner, with a corresponding reduction in the overall delay. However, a shorter input buffer means that the FFTs will have to be computed more often, and the processing capacity of the DSP or the battery drain will need to be increased. The other option is to use a smaller FFT. If the input buffer size is halved and the FFT size halved, then the delay will also be halved without an increase in the computational or power requirements. However, the frequency resolution for a smaller FFT is reduced. For example, if the FFT size is reduced from 128 to 64 points, the frequency resolution will be 250 Hz instead of 125 Hz, and the low-frequency resolution of the compression system will be reduced.
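The trade-off between delay and resolution can be tabulated directly from the relationships given above. The sketch assumes the configuration used in the example: a windowed segment as long as the FFT, with a new FFT computed every half segment. The function name is hypothetical.

```python
def fft_compressor_tradeoff(fft_size, fs=16000.0):
    """Delay and bin spacing for a segment length equal to fft_size, hop fft_size/2."""
    buffer_delay = fft_size // 2      # samples needed to fill the new half-segment
    filter_delay = fft_size // 2      # linear-phase gain filter centered at fft_size/2
    delay_ms = 1000.0 * (buffer_delay + filter_delay) / fs
    bin_spacing_hz = fs / fft_size
    return delay_ms, bin_spacing_hz

delay_128, spacing_128 = fft_compressor_tradeoff(128)   # 8.0 msec, 125.0 Hz
delay_64, spacing_64 = fft_compressor_tradeoff(64)      # 4.0 msec, 250.0 Hz
```

Halving the FFT size halves the delay and doubles the bin spacing, so the two quantities cannot be improved together within this design.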

An additional concern in an FFT compressor is quantization noise. The DSP chip in a hearing aid uses 16-bit fixed-point arithmetic to reduce the size of the circuit and to reduce power consumption. An FFT involves a large number of multiply-add operations, and for each calculation there is the possibility of round-off error. If the amplitude of the input signal is increased to reduce the relative magnitude of the round-off error, then there is a possibility of overflow because combining two numbers can result in a number twice as large. The FFT computation often involves scaling the input segment during the forward FFT calculation and compensatory scaling during the inverse FFT calculation to prevent overflow while minimizing the quantization noise, but even with these adjustments the quantization noise in an FFT compressor will be higher than for an FIR filter bank.

Side-Branch Compressor

The side-branch architecture (Williamson, Cummins, and Hecox, 1991) combines the low quantization noise of the FIR filter bank with the frequency resolution of the FFT system. The side-branch system, shown in Figure 25, separates the input signal filtering from the frequency analysis and calculation of the compression gains. The side-branch structure is therefore a digital multi-band cousin of the simple broadband compressor shown in Figure 7. Conversely, the broadband system corresponds to a side-branch compressor with a 1-tap FIR filter that adjusts the overall gain of the input signal. Increasing the number of taps in the FIR filter allows for finer adjustment of the compressor frequency response at the expense of increased processing complexity and system delay. Another way of viewing the side-branch compressor is that it is an FFT system in which the compression filter is transformed into the time domain, with the filtering then performed via time-domain convolution rather than frequency-domain multiplication.

In the implementation shown in Figure 25, the input signal fills a K/2-sample buffer. The present K/2 samples are appended to the previous K/2 samples to give a total of K samples that are then windowed to give the input to the K-point FFT. The signal power spectrum is then computed from the FFT bins. As with the FFT compressor, individual bins are used at low frequencies and sums of adjacent bins used at high frequencies to give the desired auditory frequency bands. The frequency bands are then peak-detected, and the compressor gains as a function of frequency are then computed from the peak-detector outputs. The compression gains as a function of frequency are inverse transformed to give the impulse response of the compression filter. Because the gains as a function of frequency are real, the impulse response is linear-phase. The impulse response can be windowed if desired to smooth its frequency response. The K/2 most-recent input samples are then convolved with the K-point FIR filter to produce the output.

Figure 25.

Block diagram of the side-branch compressor structure.
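The step from real compression gains to a linear-phase FIR filter can be sketched as follows. The example assumes a 32-point transform and produces a 31-tap filter by centering the inverse transform and dropping its one unpaired sample, followed by an optional Hann window; this is one plausible way to obtain an odd-length symmetric filter and is not necessarily the procedure used in the cited systems. A naive inverse DFT stands in for the inverse FFT, and the gain values are invented for illustration.

```python
import cmath
import math

def gains_to_fir(gains_half):
    """Turn real gains for bins 0..K/2 into a symmetric (K-1)-tap FIR filter."""
    half = len(gains_half) - 1
    K = 2 * half
    # Mirror the gains so the full K-point response is real and even.
    G = list(gains_half) + [gains_half[half - i] for i in range(1, half)]
    # Inverse DFT (naive stand-in for the inverse FFT); the result is real and even.
    h = [sum(G[k] * cmath.exp(2j * cmath.pi * k * n / K) for k in range(K)).real / K
         for n in range(K)]
    # Rotate so the peak sits at index K/2, then drop the unpaired first sample
    # (an assumed way of obtaining an odd-length, exactly symmetric filter).
    centered = [h[(n - half) % K] for n in range(K)][1:]
    # Optional Hann window to smooth the frequency response.
    taps = len(centered)
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * (n + 1) / (taps + 1))
              for n in range(taps)]
    return [c * w for c, w in zip(centered, window)]

# Illustrative gains for a 32-point FFT (17 bins from 0 Hz to the Nyquist frequency).
gains_db = [20.0 - n for n in range(17)]
fir = gains_to_fir([10.0 ** (g / 20.0) for g in gains_db])
```

Because the gains are real and mirrored, the resulting coefficients are symmetric about the center tap, so the filter delay does not depend on the gain values.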

The processing delay for the side-branch compressor is K/2 samples to fill the input buffer and K/2 samples for the linear-phase compression filter. This delay is exactly the same as for the FFT compressor described in the section above. The delay can be reduced by filtering each sample of the input signal as it is acquired, giving a total delay of K/2+1 samples. The disadvantage of this filtering approach is that the compression gains will lag the input signal by up to K/2 samples, increasing the chance of overshoot in the compressor output for a jump in the input signal level.

Frequency Warping

One of the problems in building a digital hearing aid is the frequency resolution. Digital frequency analysis inherently embodies a uniform frequency scale, while the ear embodies a critical-band frequency scale having increasing bandwidth with increasing frequency. A second concern is the trade-off between system delay and frequency resolution. Increased frequency resolution in any of the systems described in the previous sections requires longer FIR filters or larger FFTs. Digital frequency warping provides a technique for approximating the frequency resolution of the ear while reducing the overall time delay compared to other approaches (Kates, 2003; Kates and Arehart, 2005).

Digital Frequency Warping

Digital frequency warping is achieved by replacing the unit delays in a digital filter with first-order all-pass filters (Oppenheim et al., 1971; Smith and Abel, 1999; Härmä, et al., 2000). The all-pass filter has one zero and one pole, and is given by

A (z) = \frac{z^{-1} - a}{1 - a z^{-1}}

(6)

where a is the warping parameter and z⁻¹ is the unit delay operator. In the time domain the all-pass filter becomes:

y (n) = - ax (n) + x (n - 1) + ay (n - 1)

(7)

where x(n) is the filter input and y(n) is the filter output. The value for the warping parameter that gives a closest fit to the Bark frequency scale is a=0.576 for a 16-kHz sampling rate (Smith and Abel, 1999). The group delay for this choice of parameters is illustrated in Figure 26 for a single all-pass filter. The delay at low frequencies exceeds one sample, while the delay at high frequencies is less than one sample.

Figure 26.

Group delay in samples for a single all-pass filter having the warping parameter a=0.576.
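A minimal sketch of the all-pass section follows. The time-domain recursion is Eq (7) as written; the closed-form group delay is the standard result for a first-order all-pass with a real coefficient and is derived from Eq (6) rather than quoted from the article. The function names are hypothetical.

```python
import math

A_WARP = 0.576      # warping parameter from the text (16-kHz sampling rate)
FS = 16000.0

def allpass_filter(x, a=A_WARP):
    """Eq (7): y(n) = -a*x(n) + x(n-1) + a*y(n-1), starting from rest."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for xn in x:
        yn = -a * xn + x_prev + a * y_prev
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y

def allpass_group_delay(freq_hz, a=A_WARP, fs=FS):
    """Group delay in samples of the first-order all-pass of Eq (6)."""
    omega = 2.0 * math.pi * freq_hz / fs
    return (1.0 - a * a) / (1.0 - 2.0 * a * math.cos(omega) + a * a)

delay_dc = allpass_group_delay(0.0)            # (1 + a) / (1 - a), about 3.7 samples
delay_nyquist = allpass_group_delay(FS / 2)    # (1 - a) / (1 + a), about 0.27 samples

# An all-pass filter changes phase but not magnitude, so the energy of its
# impulse response should be one.
impulse_energy = sum(v * v for v in allpass_filter([1.0] + [0.0] * 200))
```

The delay exceeds one sample at low frequencies and falls below one sample at high frequencies, which is what stretches the low-frequency end of the analysis scale.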

The warped FIR filter transfer function is the weighted sum of the outputs of each all-pass section:

B (z) = \sum_{k = 0}^{K} b_{k} A^{k} (z)

(8)

for a filter having K+1 taps (K all-pass sections). A warped FIR filter is compared with the conventional filter in Figure 27. Forcing the real filter coefficients {b_k} to have even symmetry for an unwarped FIR filter yields a linear-phase filter, in which the filter delay is independent of the coefficients as long as the symmetry is preserved. If the unwarped FIR filter has K+1 taps, the delay is K/2 samples. Similarly, forcing even symmetry for the coefficients of a warped FIR filter gives a filter having a fixed frequency-dependent group delay that is independent of the actual filter coefficient values (Kates and Arehart, 2005). If the warped FIR filter has K+1 taps, the group delay is K/2 times that of a single all-pass filter. This filter coefficient symmetry property ensures that no phase modulation is audible as the compressor changes gain in response to the incoming signal, and that phase localization cues are preserved in a binaural fitting.

Figure 27.

Comparison of a) conventional FIR filter structure with b) its frequency-warped equivalent.

Warped Compression System

A dynamic-range compression system using warped frequency analysis is presented in Figure 28. The basic design is similar to the Side-Branch compressor shown in Figure 25. The compressor combines a warped FIR filter and a warped FFT. The same tapped delay line is used for both the frequency analysis and the FIR compression filter. The incoming signal x(n) is passed through a cascade of first-order all-pass filters of the form given by Eq (6), with the output of the kth all-pass stage given by p_k(n). The sequence of delayed samples {p_k(n)} is then windowed, and an FFT is calculated using the windowed sequence. The result of the FFT is a spectrum sampled at a constant spacing on a Bark frequency scale. The algorithm can be implemented on a sample-by-sample basis or using block data processing. Block processing is typically used, with the FFT computed after a block of samples is read in and processed through the cascade of all-pass filters; the compression gains are therefore updated once per block.

Figure 28.

Block diagram of a side-branch compression system using frequency warping.

Because the data sequence is windowed, the spectrum is smoothed in the warped frequency domain, giving smoothly-overlapping frequency bands. The compression gains are then computed from the warped power spectrum for the auditory analysis bands. The compression gains are pure real numbers, so the inverse FFT to give the warped time-domain filter results in a set of filter coefficients that is real and has even symmetry. The system output is then calculated by convolving the delayed samples with the compression gain filter:

y (n) = \sum_{k = 0}^{K} g_{k} (n) p_{k} (n)

(9)

where {g_k(n)} are the compression filter coefficients.
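Eq (9) can be sketched as a cascade of the all-pass sections of Eq (7) feeding a weighted sum. For simplicity the example keeps the coefficients fixed, whereas the compressor updates g_k(n) once per block; the coefficient values and the function name are illustrative assumptions.

```python
def warped_fir(x, g, a):
    """Eq (9): y(n) = sum_k g[k] * p_k(n), where p_0(n) = x(n) and each later
    p_k(n) is p_{k-1}(n) passed through the all-pass filter of Eq (7)."""
    K = len(g) - 1
    prev_in = [0.0] * K       # x(n-1) seen by each all-pass section
    prev_out = [0.0] * K      # y(n-1) produced by each all-pass section
    y = []
    for xn in x:
        p = xn                # p_0(n)
        acc = g[0] * p
        for k in range(K):
            out = -a * p + prev_in[k] + a * prev_out[k]
            prev_in[k], prev_out[k] = p, out
            p = out           # p_{k+1}(n)
            acc += g[k + 1] * p
        y.append(acc)
    return y

# Illustrative symmetric coefficients (assumed values, held fixed here).
g = [0.1, 0.2, 0.4, 0.2, 0.1]
impulse = [1.0] + [0.0] * 9

unwarped = warped_fir(impulse, g, a=0.0)      # reduces to a conventional FIR filter
warped = warped_fir(impulse, g, a=0.576)      # impulse response no longer finite
```

Setting the warping parameter to zero turns each all-pass section back into a unit delay, so the same routine reproduces the conventional FIR filter of Eq (5).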

In comparison with a conventional FIR system having the same FIR filter length, the warped compression system will require more computational resources because of the all-pass filters in the tapped delay line. However, in many cases the warped FIR filter will be shorter than the conventional FIR filter needed to achieve the same degree of auditory frequency resolution. A nine-band compressor, for example, requires a 31-tap conventional FIR filter but can be realized with a 15-tap warped FIR filter.

Simulation Results

Two compression systems were simulated for the performance evaluation. The systems operated at a 16-kHz sampling rate and were simulated in MATLAB using floating-point arithmetic. The first compressor is the Side-Branch system of Figure 25. For a short system delay, a 16-sample buffer is used for the block time-domain processing, and the signal is processed by a 31-tap FIR filter. The frequency analysis uses a 32-point FFT operating on the present and previous 16-point data segments. A window is used to provide adequate FFT smoothing at low frequencies, and overlapping FFT bins are summed to give the analysis bands at high frequencies. This system has a total of 9 analysis bands, with a low-frequency resolution of 500 Hz. The frequency resolution can be improved by increasing the FFT size, but the system delay will also be increased. The compression gains are calculated in the frequency domain, and the gains inverse transformed to give the symmetric compression filter used to modify the incoming signal.

The second compressor is the warped FIR side-branch system of Figure 28 in which a 16-sample data buffer and a 32-point FFT are used in conjunction with a 31-tap warped FIR filter. This compressor is essentially the frequency-warped version of the Side-Branch compressor of Figure 25. The input data segment is windowed with a 32-point Hann window, and no frequency-domain smoothing is applied to the spectrum. The compression gains are smoothed by applying a 31-point Hann window to the compression filter after the gain values are transformed into the time domain. This system is termed the Warp-31 compressor.

The Warp-31 compressor provides frequency analysis with a separation of approximately 1.3 Bark. There are a total of 17 bands covering the positive frequencies, including 0 and π radians. The low-frequency bands are approximately spaced at multiples of 135 Hz, with the spacing increasing to 1800 Hz at the highest frequency. The Side-Branch compressor using the 32-point FFT, on the other hand, uses the output of the FFT to approximate frequency bands on a Bark scale. The limited resolution of the short FFT with its uniform 500 Hz bin spacing causes a poor match between the Side-Branch frequency bands and the Bark band spacing at low frequencies. At high frequencies, however, FFT bins can be combined to give a reasonably good match. To achieve the same low-frequency resolution as the Warp-31 system, the Side-Branch compressor requires an FFT size of 128 points, which gives a bin spacing of 125 Hz.
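The band spacings quoted for the warped system can be checked by mapping the uniformly spaced bins of the warped FFT back to actual frequencies. The sketch below assumes that bin k corresponds to an all-pass phase of 2πk/N and inverts the phase of Eq (6) in closed form; the mapping is derived from Eq (6) rather than quoted from the article, and the function name is hypothetical.

```python
import math

def warped_bin_frequencies(fft_size=32, a=0.576, fs=16000.0):
    """Frequencies in Hz analysed by the bins of a warped FFT (bins 0..fft_size/2)."""
    freqs = []
    for k in range(fft_size // 2 + 1):
        theta = 2.0 * math.pi * k / fft_size      # uniform spacing on the warped axis
        # Invert the all-pass phase of Eq (6) to recover the true frequency.
        omega = theta - 2.0 * math.atan2(a * math.sin(theta), 1.0 + a * math.cos(theta))
        freqs.append(omega * fs / (2.0 * math.pi))
    return freqs

freqs = warped_bin_frequencies()
lowest_spacing = freqs[1] - freqs[0]       # roughly 135 Hz
highest_spacing = freqs[-1] - freqs[-2]    # roughly 1800 Hz
```

The 17 bins run from 0 Hz to 8 kHz, with spacing close to 135 Hz at the low end and close to 1800 Hz at the top, in line with the values given above.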

The overall system processing group delay is due to several factors. Certain aspects of the overall system delay, such as the A/D and D/A converter delays, are fixed by the hardware and are not affected by the signal processing. The total software processing delay is the sum of the time required to fill the input buffer, the group delay inherent in the frequency-domain or time-domain filtering operation provided by the compressor, and the time needed to execute the code before the output signal is available.

The Side-Branch compressor uses a linear-phase FIR filter, so the delay is independent of frequency. The Warp-31 compressor uses all-pass filters to replace the unit delays in the FIR filter implementation, so this system has a frequency-dependent delay. The total delay for the Warp-31 compressor is an estimate, assuming that the hardware delays and the time needed for the code execution will be similar to that needed for the Side-Branch compressor, with an additional allowance for the all-pass filters. The delay values for the 32-point FFT version of the Side-Branch compressor are based on measurements of an actual hearing aid, and assume 2.5 msec for the hardware and code execution and 1.0 msec for the 16-sample input buffer.

The group delay for the compression systems is plotted in Figure 29. The Side-Branch system has a constant delay as a function of frequency because of the linear-phase filters used for the processing. The delay is 3.5 msec for an FFT size of 32 points, and increases to 10.5 msec when the FFT size is increased to 128 points. The Warp-31 system has a smooth frequency-dependent delay due to the group-delay characteristics of the all-pass filters used for the warped FIR filtering. The maximum delay for the Warp-31 compressor is 6.1 msec at 0 Hz, with the delay falling to 2.9 msec at high frequencies. Thus, the Warp-31 compressor has delay characteristics similar to those of the Side-Branch system with a 32-point FFT, while providing frequency resolution that can only be achieved when a 128-point FFT with its much greater delay is used. The warped compressor thus has substantially reduced delay in comparison with a conventional design having comparable frequency resolution.

Figure 29. Group delay for the frequency-warped compressor and two versions of the side-branch compressor.
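The frequency dependence of the warped-filter delay can be illustrated with a short sketch. It relies on two standard results: a first-order all-pass section A(z) = (z^-1 - a) / (1 - a z^-1) has a group delay of (1 - a^2) / (1 + a^2 - 2a cos w) samples, and a symmetric FIR filter whose K unit delays are replaced by such sections has a delay of K/2 times that value. The warping parameter a = 0.5756, the 16 kHz sampling rate, the count of 31 all-pass sections, and the assumption of symmetric coefficients are all assumptions made for this illustration and are not taken from the text above. The sketch covers only the filter term, so hardware, buffer, and code-execution delays are not included.

```python
import math


def allpass_group_delay(freq_hz, fs_hz, a):
    """Group delay in samples of A(z) = (z^-1 - a) / (1 - a z^-1)."""
    w = 2.0 * math.pi * freq_hz / fs_hz  # radian frequency per sample
    return (1.0 - a * a) / (1.0 + a * a - 2.0 * a * math.cos(w))


def warped_fir_delay_ms(freq_hz, fs_hz=16000.0, a=0.5756, n_sections=31):
    """Delay in msec of a symmetric warped FIR filter.

    Defaults are illustrative assumptions, not values from the article.
    A symmetric filter with n_sections all-pass stages delays the signal
    by n_sections / 2 times the delay of a single stage.
    """
    samples = 0.5 * n_sections * allpass_group_delay(freq_hz, fs_hz, a)
    return 1000.0 * samples / fs_hz


for f in (0.0, 500.0, 1000.0, 2000.0, 4000.0, 8000.0):
    print(f"{f:7.0f} Hz: {warped_fir_delay_ms(f):5.2f} msec")
```

With a positive warping parameter the delay is largest at 0 Hz and falls steadily toward the Nyquist frequency, which is the qualitative behavior described for the Warp-31 curve. Setting a = 0 reduces each section to a unit delay and gives the constant delay of a conventional linear-phase FIR filter.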

Summary

Multi-channel dynamic-range compression is a basic part of digital hearing aids. The design of a digital compressor involves many considerations, including frequency resolution, processing group delay, quantization noise, and algorithm complexity. This paper has focused on frequency resolution and group delay. Quantization noise and processing complexity also influence the choice of algorithm and the details of the implementation, but they are beyond the scope of this paper.

The compression algorithms are directly characterized by the frequency-analysis procedure. Of the various approaches that can be used to design a digital hearing aid, this paper considered broadband compression, multi-channel filter banks, a frequency-domain compressor using the FFT, the side-branch design that separates the filtering operation from the frequency analysis, and the frequency-warped version of the side-branch approach that modifies the analysis frequency spacing to more closely match auditory perception.

The design of the compressor also influences the design of the other aspects of the hearing aid. Because processing capacity in a digital hearing aid is limited, it is necessary to avoid duplicating processing steps. The frequency analysis used for the multi-channel compression therefore also forms the basis for any other processing, such as noise reduction, that may be included. The properties of frequency resolution and group delay described for the compressor approaches will apply to these other algorithms as well. The frequency resolution and group delay thus become fundamental properties of the hearing aid.

References

1. Agnew, Thornton. Just noticeable and objectionable group delays in digital hearing aids. J Am Acad Audiol 11: 330–336, 2000.

2. ANSI S3.22–1996. Specification of hearing-aid characteristics. New York: American National Standards Institute.

3. Blauert, Laws. Group delay distortions in electroacoustic systems. J Acoust Soc Am 63: 1478–1483, 1978.

4. Braida, Durlach, Lippmann, Hicks, Rabinowitz, Reed. Hearing aids: A review of past research on linear amplification, amplitude compression, and frequency lowering. ASHA Monograph No. 10, Rockville, MD: Am Speech-Lang-Hear Assn, 1979.

5. Brennan, Schneider. A flexible filterbank structure for extensive signal manipulations in digital hearing aids. Proc 1998 IEEE Int Symp Circuits and Systems, pp 569–572, 1998.

6. Chau, Sheikhzadeh, Brennan. Complexity reduction and regularization of a fast affine projection algorithm for oversampled subband adaptive filters. Proc 2004 Int Conf Acoust Speech and Sig Proc, Montreal, May 17–21, 2004.

7. Cole. Current design options and criteria for hearing aids. J Speech-Lang Path and Audiol Monogr Suppl 1, pp 7–14, 1993.

8. Dillon, Katsch, Byrne, Ching, Keidser, Brewer. The NAL-NL1 prescription procedure for non-linear hearing aids. National Acoustics Laboratories Research and Development, Annual Report 1997/98 (pp. 4–7), 1997. Chatswood NSW, Australia: National Acoustics Laboratories, 1998.

9. Greer. Monaural sensitivity to dispersion in impulses and speech. PhD Thesis, University of Utah, 1975.

10. Härmä, Karjalainen, Savioja, Välimäki, Laine, Huopaniemi. Frequency-warped signal processing for audio applications. J Audio Eng Soc 48: 1011–1031, 2000.

11. Kates. A computer simulation of hearing aid response and the effects of ear canal size. J Acoust Soc Am 83: 1952–1963, 1988.

12. Kates. Signal processing for hearing aids. In Applications of Digital Signal Processing to Audio and Acoustics, ed. by Kahrs, Brandenberg, Kluwer Acad Pubs: Boston, 1998.

13. Kates. Dynamic-range compression using digital frequency warping. Proc 37th Asilomar Conf on Signals, Systems, and Computers, Nov. 9–12, 2003, Asilomar Conf Ctr, Pacific Grove, CA, 2003.

14. Kates, Arehart. Multi-channel dynamic range compression using digital frequency warping. Eurasip J Appl Sig Proc 2005.

15. Lindemann. The continuous frequency dynamic range compressor. Proc 1997 IEEE Workshop on Applications of Sig Proc to Audio and Acoust, New Paltz, NY, 1997.

16. Lyons. Understanding Digital Signal Processing, Englewood Cliffs, N.J.: Prentice Hall, 1996.

17. Moore BCJ, Glasberg. Suggested formulae for calculating auditory-filter bandwidths and excitation patterns. J Acoust Soc Am 74: 750–753, 1983.

18. Oppenheim, Johnson, Steiglitz. Computation of spectra with unequal resolution using the fast Fourier transform. Proc IEEE 59: 299–300, 1971.

19. Oppenheim, Schafer. Digital Signal Processing, Englewood Cliffs, N.J.: Prentice-Hall, 1975.

20. Rosen, Howell. Signals and Systems for Speech and Hearing, London: Academic Press, 1991.

21. Schweitzer. Digital developments in hearing aid systems. Trends in Amplification 2: 38–87, 1997.

22. Smith, Abel. Bark and ERB bilinear transforms. IEEE Trans Speech and Audio Proc 7: 697–708, 1999.

23. Souza. Effects of compression on speech acoustics, intelligibility, and sound quality. Trends in Amplification 6: 131–165, 2002.

24. Stone, Moore BCJ. Tolerable hearing aids delays. I: Estimation of limits imposed by the auditory path alone using simulated hearing losses. Ear and Hearing 20: 182–192, 1999.

25. Stone, Moore BCJ. Tolerable hearing aids delays. II: Estimation of limits imposed during speech production. Ear and Hearing 23: 325–338, 2002.

26. Stone, Moore BCJ. Tolerable hearing aids delays. III: Effects on speech production and perception of across-frequency variation in delay. Ear and Hearing 24: 175–183, 2003.

27. Williamson, Cummins, Hecox. Adaptive programmable signal processing and filtering for hearing aids. U.S. Patent 5,027,410, issued 25 June 1991.

28. Zwicker, Terhardt. Analytical expressions for critical-band rate and critical bandwidth as a function of frequency. J Acoust Soc Am 68: 1523–1525, 1980.