Abstract
We hypothesized that vibrations created by the pulmonary circulation would create sound like the vocal cords during speech and that subjects with pulmonary artery hypertension (PAH) might have a unique sound signature. We recorded heart sounds at the cardiac apex and the second left intercostal space (2LICS), using a digital stethoscope, from 27 subjects (12 males) with a median age of 7 years (range: 3 months–19 years) undergoing simultaneous cardiac catheterization. Thirteen subjects had mean pulmonary artery pressure (mPAp) < 25 mmHg (range: 8–24 mmHg). Fourteen subjects had mPAp ≥ 25 mmHg (range: 25–97 mmHg). We extracted the relative power of the frequency band, the entropy, and the energy of the sinusoid formants from the heart sounds. We applied linear discriminant analysis with leave-one-out cross validation to differentiate children with and without PAH. The significance of the results was determined with a
Untreated pulmonary artery hypertension (PAH) is a progressive, fatal disease.1 It complicates many conditions and may affect up to 100 million people worldwide.2,3 PAH is difficult to diagnose because symptoms appear late in the disease course and the findings on clinical examination are easily missed.
The finding on auscultation of a loud pulmonary component of the second heart sound (S2) in PAH has led to the exploration of phonocardiographic associations between S2 and pulmonary artery pressure (PAp) in the time domain.4–10 However, precise demarcation, timing, and segmentation of the components of S2 remain challenging.7–9,11–13
Instead, we have explored quantitative information in the frequency domain of heart sounds that distinguishes between subjects with and without PAH.14 The relative power of the frequencies between 21 and 22 Hz of the heart sounds recorded at the second left intercostal space (2LICS) was significantly reduced in subjects with PAH.14 However, there was a 22% error in detecting PAH. Therefore, by further investigating the recordings from these same subjects, we sought other features of the heart sounds in this frequency range that might uniquely identify subjects with PAH.
Normal speech patterns have a unique signature related to vocal cord vibration, which can be used, for example, to recognize a speaker as male or female. We hypothesized that vibrations created by the movement of the pulmonary valve leaflets or pulmonary artery would create sound in a manner similar to that of vocal cords during speech and that subjects with PAH might present a unique sound signature. The frequency resonance of sound is called a “formant.” Formants are concentrations of energy that are prominent in a sound spectrogram and collectively constitute the identifying frequency spectrum for a sound produced by speech.
The relative positioning of the first and second formants is usually sufficiently unique to distinguish speech sounds, thus imparting a special quality, or timbre. Therefore, we investigated whether the energy and entropy of the first formant of recorded heart sounds could distinguish subjects with PAH.
METHODS
The University of Alberta Research Ethics Board approved the study. All subjects or their parents gave informed and written consent to participate in the study. Informed assent was obtained from children who were developmentally able to do so.
Clinical-data collection
We included consecutive subjects who were undergoing right heart catheterization as a requirement for managing their underlying cardiac condition. We excluded subjects with an abnormal or prosthetic valve.
The direct measurement of PAp, collected simultaneously with the heart sounds, was obtained with fluid-filled catheters in a standard manner. The heart sounds were recorded with a 3M Littmann 3200 electronic stethoscope (3M, Copenhagen, Denmark), which works in conjunction with Zargis Cardioscan software (Zargis Medical, Princeton, NJ) to store recorded heart sounds in *.wav mono audio format. Heart sound recordings were obtained over 20 seconds with a sampling frequency of 4,000 Hz. We recorded the heart sounds sequentially at the 2LICS and over the cardiac apical impulse. For signal analysis and optimization, MATLAB 2010b (MathWorks, Natick, MA) was used.
Definition of PAH
PAH, in adults and children, is defined as mean PAp (mPAp) ≥ 25 mmHg and pulmonary artery wedge pressure (PAWp) or left atrial pressure (LAp) ≤ 15 mmHg measured at cardiac catheterization in subjects at rest.15–17
Definition of entropy
We defined entropy as a measure of the disorder of the heart sound pattern. A lower entropy value suggests the existence of an organized heart sound pattern, while a higher entropy value indicates more disorder.
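As an illustration of the idea that lower entropy corresponds to a more organized pattern, the following minimal sketch computes the Shannon entropy of a normalized power distribution. This is an assumption made for illustration only: the entropy used in this study is computed from the log-transformed spectrogram of a sinusoid formant, and the function name `shannon_entropy` and the example values are hypothetical.

```python
import numpy as np

def shannon_entropy(power):
    """Shannon entropy (in bits) of a nonnegative power distribution.

    Illustrative only; not the entropy formula used in the study.
    """
    p = np.asarray(power, dtype=float)
    p = p / p.sum()   # normalize so the values sum to 1
    p = p[p > 0]      # treat 0 * log(0) as 0
    return float(-np.sum(p * np.log2(p)))

# Hypothetical distributions, not study data.
organized = [0.97, 0.01, 0.01, 0.01]    # power concentrated in one bin
disordered = [0.25, 0.25, 0.25, 0.25]   # power spread evenly across bins

print(shannon_entropy(organized) < shannon_entropy(disordered))
```

A distribution concentrated in one bin yields a lower value than one spread evenly, which matches the interpretation given above.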
Heart sound analysis
We classified the subjects and their heart sound recordings into two groups based on whether their mPAp was ≥25 mmHg or <25 mmHg. All subjects had PAWp < 15 mmHg. We extracted three spectral features: the relative power, energy, and entropy of the first 4 sinusoids of the heart sound frequency bands. We undertook separability tests to discover which recording site (the cardiac apex or the 2LICS) was more informative for the diagnosis of PAH. We applied linear discriminant analysis (LDA) to the most informative feature, with the aim of distinguishing subjects with PAH.
Feature extraction
A main part of our data analysis was the extraction of a feature from the heart sounds that provided the highest prediction rate for PAH. As discussed above, we began this process by collecting 2 heart sound recordings from 2 separate sites (the 2LICS and the apex) for each subject. Then we determined which site was the more informative site for PAH prediction. Once this site was identified, we extracted features that identified PAH. We selected the feature that provided the highest prediction for PAH from among all of the identified features.
The features that were extracted from the optimal site were relative power of the frequency band, energy, and entropy. Below, we expand on the detailed descriptions of each feature.
Relative power of the frequency band
The relative power of a frequency band is usually obtained by dividing the power of the band by the total power. In this investigation, however, the relative power was calculated by dividing the power of the 21–22-Hz frequency band by the power of the 1–80-Hz frequency band, as suggested by our previous work.14
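The ratio described above can be sketched as follows. This is a minimal illustration, not the MATLAB code used in the study: it assumes a plain FFT periodogram as the power estimate, and the function names and the synthetic test tones are hypothetical.

```python
import numpy as np

def band_power(x, fs, f_lo, f_hi):
    """Sum of periodogram power between f_lo and f_hi (Hz), inclusive."""
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return spectrum[mask].sum()

def relative_band_power(x, fs, band=(21.0, 22.0), reference=(1.0, 80.0)):
    """Power in `band` divided by power in `reference` (both in Hz)."""
    return band_power(x, fs, *band) / band_power(x, fs, *reference)

# Synthetic 20-second signals sampled at 4,000 Hz (not study data).
fs = 4000
t = np.arange(20 * fs) / fs
in_band = np.sin(2 * np.pi * 21.5 * t)    # tone inside the 21-22 Hz band
out_of_band = np.sin(2 * np.pi * 50 * t)  # tone outside it, inside 1-80 Hz

print(relative_band_power(in_band, fs))
print(relative_band_power(out_of_band, fs))
```

With a 20-second recording the frequency resolution of the FFT is 0.05 Hz, so a 1-Hz-wide band spans about 20 frequency bins.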
Sine wave heart sound replicas
Formants (unique heart sound signatures) were extracted by means of sine wave replicas, which distill the sound patterns down to key elements by removing extraneous noise. The audio track of each heart sound recording was transformed into sine wave replicas, as in speech analysis.18 These sine wave replicas were transformed by tracking the frequencies and amplitudes of the first 4 formants as they varied over time. The acoustic measurements were obtained in a 2-step process. First, each sound file was resampled to 8 kHz. The resampled heart sound recording was then broken into 32-millisecond windows. Each window was subjected to an eighth-order linear-predictive-coding (LPC) analysis. LPC finds the coefficients of an eighth-order linear predictor (finite-impulse-response filter) that predicts the current value of the heart sound segment on the basis of past samples. The 4 coefficients with the highest magnitudes were converted to frequencies and stored in a data file. Each heart sound recording thus had an associated data file with 8 parameters (4 frequencies and 4 associated amplitudes) measured in each 32-millisecond window. This window captured information sufficient to track the change of the major formants in the original sound file over time. These data were submitted to a synthesis routine developed by Ellis,19 which produced 4 sinusoidal tones that varied over time. We calculated the spectrogram (short-time Fourier transform) for each sinusoid, which we refer to as
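The per-window LPC step can be sketched as below. This is a hedged illustration under stated assumptions, not the routine used in the study: it uses the autocorrelation method with a Hamming window, and it converts the predictor to frequencies through the angles of the roots of the prediction polynomial, which is a common convention in speech analysis and may differ from the conversion applied here. The function names and the two-tone test frame are hypothetical.

```python
import numpy as np

def lpc_coefficients(frame, order=8):
    """Autocorrelation-method LPC; returns a[0..order] with a[0] = 1."""
    frame = frame * np.hamming(len(frame))
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = -r[1:], lightly regularized
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + 1e-9 * r[0] * np.eye(order)
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))

def resonance_frequencies(a, fs):
    """Frequencies (Hz) of the complex roots of the prediction polynomial."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]   # keep one root of each conjugate pair
    return np.sort(np.angle(roots) * fs / (2 * np.pi))

# One 32-millisecond window at 8 kHz (256 samples) of a synthetic two-tone
# signal with a little noise; hypothetical data, not a heart sound.
fs = 8000
t = np.arange(256) / fs
rng = np.random.default_rng(0)
frame = (np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 1500 * t)
         + 0.01 * rng.standard_normal(t.size))

freqs = resonance_frequencies(lpc_coefficients(frame, order=8), fs)
print(freqs)
```

An eighth-order predictor has at most 4 complex-conjugate root pairs, which is consistent with tracking up to 4 frequencies per window.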
Energy and entropy
Energy.
We calculated the energy of a sinusoid as the power of the spectrogram: energy
Entropy.
We calculated the entropy of a sinusoid as the power of the log-transformed spectrogram: entropy
LDA
We applied LDA to classify patients as either having PAH or normal, on the basis of the entropy of the first sinusoid formant of the heart sound. In studies where the sample size is small and cross validation is needed—such as our study—leave-one-out (LOO) is the only available method to estimate how accurately a predictive model will perform in a real practice setting. We assessed the classification performance with LDA through LOO cross validation to determine how the results of our statistical analysis would generalize to an independent heart sound data set.
Throughout the LOO process, each patient provided one case. Each training set was constructed by taking all cases except one, which was held out as a disjoint test set. For each training set, an LDA classifier was produced whose classification accuracy was determined on the single held-out test case. The average accuracy over all
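The leave-one-out procedure described above can be sketched for a single feature as follows. This is a minimal illustration that assumes a two-class LDA with a pooled variance and empirical class priors; the function names are hypothetical and the feature values are made up for the example, not taken from the study.

```python
import numpy as np

def lda_fit(x, y):
    """Fit a one-feature LDA: class means, priors, and pooled variance."""
    classes = np.unique(y)
    means = np.array([x[y == c].mean() for c in classes])
    priors = np.array([(y == c).mean() for c in classes])
    pooled = sum(((x[y == c] - m) ** 2).sum() for c, m in zip(classes, means))
    var = pooled / (len(x) - len(classes))
    return classes, means, priors, var

def lda_predict(model, x_new):
    """Assign x_new to the class with the largest linear discriminant score."""
    classes, means, priors, var = model
    scores = x_new * means / var - means ** 2 / (2 * var) + np.log(priors)
    return classes[np.argmax(scores)]

def loo_accuracy(x, y):
    """Hold out each case once, train on the rest, and average the hits."""
    hits = 0
    for i in range(len(x)):
        train = np.arange(len(x)) != i
        model = lda_fit(x[train], y[train])
        hits += int(lda_predict(model, x[i]) == y[i])
    return hits / len(x)

# Hypothetical feature values: label 0 = normal PAp, label 1 = PAH.
x = np.array([4.8, 5.0, 5.2, 5.1, 4.9, 1.0, 1.2, 0.9, 1.1, 0.8])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

print(loo_accuracy(x, y))
```

Because the classifier is refit for every held-out case, no case is ever used to train the model that classifies it.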
Nonstationarity
Heart sounds are noisy and highly nonstationary. If a heart sound signal was stationary, one could use the entire signal to calculate the spectral features (relative power of the frequency band and energy and entropy of the first sinusoid formant). However, since they are nonstationary, such features can vary over time, making it no longer meaningful to estimate the features from the entire 20-second duration of the signal. Thus, to accommodate potential nonstationarity, we conducted a search over segments of the heart sound recordings to identify an appropriate window length,
Statistical tests
We calculated each spectral feature (relative power of the frequency band or energy or entropy of the first sinusoid formant) for each heart sound recording. As we had 27 subjects, each spectral-feature set contained 27 values (13 values from subjects with mPAp < 25 mmHg and 14 values from subjects with mPAp ≥ 25 mmHg).
To demonstrate significance of the mean and median of the samples within each spectral-feature set, we compared the values within each spectral-feature set by applying two tests: the 2-sample
Since we considered three different features and many different window lengths settings simultaneously, it is likely that a few
Two statistical measures were used for the output of the LDA analysis: sensitivity, which was calculated from the formula TP/(TP + FN), and specificity, which was calculated from the formula TN/(TN + FP), where TP is the number of true positives (PAH subjects detected as PAH subjects), FN is the number of false negatives (PAH subjects detected as normal-PAp subjects), TN is the number of true negatives (normal-PAp subjects detected as normal-PAp subjects), and FP is the number of false positives (normal-PAp subjects detected as PAH subjects).
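As a worked check of these formulas, the sketch below uses counts derived from the figures reported in this article for the entropy of the first sinusoid formant: 14 subjects with PAH and 13 with normal PAp, with one false negative and one false positive. The counts TP = 13 and TN = 12 are inferred from those figures, and the function names are illustrative.

```python
def sensitivity(tp, fn):
    """TP / (TP + FN): fraction of PAH subjects detected as PAH."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """TN / (TN + FP): fraction of normal-PAp subjects detected as normal."""
    return tn / (tn + fp)

tp, fn = 13, 1   # 14 subjects with PAH, one missed
tn, fp = 12, 1   # 13 subjects with normal PAp, one flagged as PAH

print(round(100 * sensitivity(tp, fn)))  # 93
print(round(100 * specificity(tn, fp)))  # 92
```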
RESULTS
We collected recordings from 27 subjects (12 males and 15 females) with a median age of 7 years (range: 3 months–19 years). Thirteen subjects (group 1) had mPAp < 25 mmHg (range: 8–24 mmHg), and 14 subjects (group 2) had mPAp ≥ 25 mmHg (range: 25–97 mmHg). All subjects had mean PAWp or LAp < 15 mmHg. We did not exclude any subjects or recordings from the analysis. The demographic and hemodynamic details of the subjects are summarized in Tables 1 and 2. The only statistically significant differences between the two groups were hemodynamic measurements reflecting the presence or absence of PAH. There was no difference in the PAWp, LAp, or cardiac index between the two groups. The two groups did not differ statistically by age, weight, height, body surface area, or body mass index (Tables 1, 2).
Summary of the demographic and hemodynamic data of all subjects
Note: PAp was measured during auscultation. PVRI is calculated from mean PAp measured at the time of thermodilution or oxygen consumption measurement and oximetry. BMI: body mass index; BSA: body surface area; BP: systemic blood pressure; LAp: left atrial pressure; PAp: pulmonary artery pressure; PVRI: pulmonary vascular resistance index; QPI: pulmonary blood flow index; RAp: right atrial pressure; WU · m2: Wood unit.
a For sex, the ratio is reported.
Comparison of clinical and hemodynamic data between subjects with pulmonary artery hypertension (mPAp ≥ 25 mmHg) and subjects with normal pulmonary artery pressures (mPAp < 25 mmHg)
Note: ECG: electrocardiogram; LAp: left atrial pressure; PAp: pulmonary artery pressure; PVRI: pulmonary vascular resistance index; RAp: right atrial pressure.
*
Relative power of the frequency band
The postcorrection overall
Window size analysis of three spectral features of the heart sounds at the second left intercostal space (2LICS): relative power of the frequency band 21–22 Hz, energy of first sinusoid formant, and entropy of first sinusoid formant
Note: Data are overall
Sine wave heart sound replicas
The middle panels of Figure 1 show examples of the spectrograms of the original heart sounds (

Heart sound recordings (
Feature selection (sinusoid choice) based on energy (Fig. 2)

Two-dimensional
We investigated which of the 4 sinusoid formants of the heart sounds was most informative. We found that the first sinusoid was the most informative feature for heart sounds collected at the 2LICS. The energy of the first sinusoid obtained from heart sound recordings at the 2LICS of subjects with mPAp ≥ 25 mmHg was higher than that of subjects with mPAp < 25 mmHg (overall
Entropy of the first sinusoid formant derived from the heart sounds (Fig. 3)

Box plot of the entropy of the first sinusoid formant extracted from the heart sounds recorded at the second left intercostal space. The left-hand box represents the entropy from the heart sounds of children with a mean pulmonary artery pressure (PAp) of 8–24 mmHg (
The entropy of the first sinusoid formant of the heart sounds recorded at the 2LICS of subjects with mPAp < 25 mmHg was significantly higher than that of subjects with mPAp ≥ 25 mmHg (overall
LDA
To ensure that the entropy of the first sinusoid formant of the heart sounds was the most informative feature in PAH, we conducted LDA on the recordings at the 2LICS. Table 4 shows that the entropy of the first sinusoid formant of the heart sounds incurred one false-positive and one false-negative result (Fig. 4). The sensitivity of 93% and specificity of 92% of the entropy of the first sinusoid formant of the heart sounds to detect PAH were superior to both the relative power of the frequency band 21–22 Hz and the energy of the first sinusoid formant (respectively, sensitivity: 71% and 71%, specificity: 69% and 92%).
Linear discriminant analysis error results, computed through LOO cross validation
Note: FN: false negative; FP: false positive; LOO: leave-one-out; PAH: pulmonary artery hypertension; PAp: pulmonary artery pressure; TN: true negative; TP: true positive.

Comparison of three spectral features of the heart sounds (
DISCUSSION
Our main finding was that the entropy of the first sinusoid formant contained within an optimized 2-second window length of the heart sound recordings at the 2LICS was significantly lower in subjects with PAH (mPAp ≥ 25 mmHg), with a sensitivity of 93% and a specificity of 92% (Figs. 3, 4c). The reduced entropy of the heart sounds in subjects with PAH suggests the existence of an organized pattern within the heart sounds. A decrease in entropy suggests less chaos and more organization. We have found that the presence of mPAp ≥ 25 mmHg imparts a unique signature to the heart sounds that is reflected by a decrease in entropy within the frequency band 21–22 Hz. Thus, within the frequency range of 21–22 Hz, the heart sounds of subjects with pulmonary hypertension are more organized (less entropy) and demonstrate a different pattern from those of subjects with normal PAp. This pattern could be captured by a noninvasive recording device and used to diagnose PAH. Figure 4 demonstrates that the energy and entropy of the first sinusoid formant were more informative than the relative power of the frequency band 21–22 Hz in distinguishing patients with PAH.14 Moreover, the entropy of the first sinusoid formant provided better separability between subjects with PAH and those with normal PAp than the energy of the first sinusoid formant. The low entropy suggests that there was an ordered pattern in heart sounds of subjects with PAH that is clearly distinguishable (Fig. 5).

Three-second duration of heart sound recording (
A short recording time of 20 seconds for diagnostic-data acquisition is helpful in real-life clinical settings, particularly in a pediatric clinic, where patient cooperation is unpredictable and of limited duration. We used recordings from a digital stethoscope (3M Littmann 3200 electronic stethoscope) and did not exclude any recordings. This suggests that this analysis of the heart sounds in the frequency domain is robust. We speculate that higher-fidelity heart sound sensors would improve the sensitivity and specificity of the results.
We did not focus our analysis on the detection, timing, or splitting interval between the aortic and pulmonary components of S2. Although these are traditional clinical indicators of PAH, the analysis of S2, particularly differentiating its aortic and pulmonary components and the splitting interval, remains a significant challenge.6–9,11,12 Therefore, we concentrated on using hidden information within the frequency domain and improved on our previous findings, which used the relative power of the frequency band 21–22 Hz.14 We have attempted to characterize the heart sounds of subjects with and without PAH in the same manner as speech patterns and to detect the unique signature of heart sounds in subjects with an increased PAp. This approach is advantageous because it is not necessary to register the timing of heart sound recordings with right heart or pulmonary artery events, which simplifies the approach to noninvasive diagnosis of PAH. It is interesting to note that the recording site that best distinguished patients with PAH was the 2LICS, which is the traditional area for auscultation of pulmonary artery events.
Study limitations
A larger sample size is needed to confirm the findings of this study. We acknowledge that applying this technique to a true screening population with a different prevalence of PAH might decrease the sensitivity and specificity and adversely affect the positive and negative predictive values.
Future studies require prospective recordings, with the investigators blinded to the patients' diagnoses, in a population with a lower prevalence of PAH. However, the use of LDA and LOO cross validation to analyze the findings reduces investigator bias considerably.
Conclusion
Our data, obtained with a digital stethoscope simultaneously with PAp measurements, showed that the entropy of the first sinusoid formant (within an optimized window length of 2 seconds within a 20-second recording of the heart sounds) at the 2LICS was significantly lower in subjects with PAH, yielding a classification sensitivity of 93% and a specificity of 92%. The reduced entropy of the first sinusoid formant of the heart sounds in subjects with PAH reveals an organized pattern in heart sounds. The analysis of this pattern reveals a unique sound signature produced by the hypertensive pulmonary artery and right ventricle that can be captured and potentially used to diagnose PAH.
