Comparative Study
Sci Rep. 2016 Sep 9;6:33182.
doi: 10.1038/srep33182.

Acoustic diagnosis of pulmonary hypertension: automated speech-recognition-inspired classification algorithm outperforms physicians

Tarek Kaddoura et al. Sci Rep. 2016.

Abstract

We hypothesized that an automated speech-recognition-inspired classification algorithm could differentiate between the heart sounds of subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74%, compared with 56% for physicians (p = 0.005). The false positive rate was 34% for the algorithm versus 50% for physicians (p = 0.04). The false negative rate was 23% for the algorithm versus 68% for physicians (p = 0.0002). We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians and could be used to screen for PH and encourage earlier specialist referral.

Figures

Figure 1
Panels (a–c) illustrate how we identified an area of interest that included the second heart sound (S2), using automated detection of the R and T waves in the ECG to select an area of approximately 30% of the cardiac cycle on the phonocardiogram. The x-axis shows time in seconds and the y-axis shows the relative amplitude of the signals.
Figure 1a. Simultaneous phonocardiographic and electrocardiographic tracings. An example of a 20-second recording of the phonocardiogram (top tracing) and electrocardiogram (lower tracing) in a patient with normal pulmonary artery pressures (mean PA pressure <25 mmHg). Automatically detected R waves of the QRS complex are marked with an O, and T waves are marked with an X. The algorithm identifies the S2 window as 30% of the cardiac cycle around the T wave.
Figure 1b. Simultaneous phonocardiographic and electrocardiographic tracings of a single cardiac cycle. The tracings are from a single cardiac cycle from the same subject and recording as in Figure 1a.
Figure 1c. Identifying the window that included S2 on the phonocardiogram. A 0.25-second window around the T wave identified the area of interest used in the algorithm. The loudest signal in the boxed area was designated the second heart sound (S2), shown in the phonocardiogram at approximately 0.1 seconds on the x-axis.
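The S2 search described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `find_s2`, the assumption that the window is centred on the T wave, and the synthetic signal are all hypothetical. Only the 0.25-second window length and the "loudest signal in the window" rule come from the caption.

```python
def find_s2(pcg, fs, t_wave_time, window_s=0.25):
    """Return (sample index, time in s) of the loudest phonocardiogram
    sample inside a window centred on the detected T wave.

    Assumption: "around the T wave" is read as a symmetric window.
    """
    half = int(round(window_s * fs / 2))
    centre = int(round(t_wave_time * fs))
    start = max(0, centre - half)
    stop = min(len(pcg), centre + half + 1)
    if start >= stop:
        raise ValueError("T-wave time lies outside the recording")
    # "Loudest" is taken to mean the largest absolute amplitude.
    peak = max(range(start, stop), key=lambda i: abs(pcg[i]))
    return peak, peak / fs


# Synthetic one-second trace at 1 kHz (illustrative values only).
fs = 1000
pcg = [0.0] * fs
pcg[100] = 0.9    # a loud S1-like event outside the window
pcg[420] = -0.6   # the S2-like event near the T wave
s2_index, s2_time = find_s2(pcg, fs, t_wave_time=0.40)
```

The louder event at sample 100 is ignored because it falls outside the window, which is the purpose of gating the search on the ECG.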
Figure 2
Figure 2. Flow diagram of MFCC extraction process.
This flow diagram depicts the process for extracting Mel-Frequency Cepstral Coefficients (MFCC) from the second heart sound (S2). The identified S2 is pre-processed with a pre-emphasis filter. It is then divided into frames lasting 25 milliseconds, and a Hamming window is applied to each frame. The following operations are then performed on each frame: (1) a Fast Fourier Transform (FFT) is computed, (2) frequencies are linearly spaced into a mel frequency bank, (3) a logarithmic transformation is applied, and (4) a discrete cosine transformation is applied. These steps generate the mel-frequency cepstral coefficients that are used as feature vectors for the training and testing stages.
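The pipeline in this caption can be sketched in plain Python as below. Only the 25 ms frame length, the Hamming window, and the order of operations come from the caption. Everything else is an assumption made for illustration: the 0.97 pre-emphasis coefficient, the 10 ms frame step, 20 triangular filters with centres equally spaced on the mel scale, 12 retained coefficients, and the common mel formula 2595·log10(1 + f/700). A direct DFT stands in for the FFT to keep the sketch dependency-free.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc(signal, fs, frame_s=0.025, step_s=0.010,
         n_filters=20, n_coeffs=12, alpha=0.97):
    """Return one list of n_coeffs cepstral coefficients per frame."""
    # Pre-emphasis filter: y[t] = x[t] - alpha * x[t-1].
    x = [signal[0]] + [signal[i] - alpha * signal[i - 1]
                       for i in range(1, len(signal))]
    n = int(round(frame_s * fs))        # samples per 25 ms frame
    step = int(round(step_s * fs))      # hop between frames (assumed)
    hamming = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))
               for i in range(n)]
    n_bins = n // 2 + 1

    # Triangular filters; edges are fractional DFT-bin positions.
    top = hz_to_mel(fs / 2)
    edges = [mel_to_hz(top * k / (n_filters + 1)) * n / fs
             for k in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        bank.append([max(0.0, min((b - lo) / (mid - lo),
                                  (hi - b) / (hi - mid)))
                     for b in range(n_bins)])

    features = []
    for start in range(0, len(x) - n + 1, step):
        frame = [x[start + i] * hamming[i] for i in range(n)]
        # (1) power spectrum via a direct DFT (stand-in for the FFT)
        power = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                         for t in range(n))) ** 2 / n
                 for k in range(n_bins)]
        # (2) mel filter bank energies, (3) logarithm
        log_e = [math.log(sum(w * p for w, p in zip(filt, power)) + 1e-10)
                 for filt in bank]
        # (4) DCT-II, dropping the zeroth coefficient
        features.append([
            sum(log_e[m] * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m in range(n_filters))
            for c in range(1, n_coeffs + 1)])
    return features


# Illustrative input: 0.1 s of a 100 Hz tone sampled at 4 kHz.
demo = [math.sin(2 * math.pi * 100 * t / 4000) for t in range(400)]
demo_features = mfcc(demo, 4000)
```

Each inner list is one feature vector, so a single S2 segment yields a small matrix with one row per frame.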
Figure 3
Figure 3. Flow diagram illustrating the training of the acoustic models.
The second heart sounds were extracted from the recordings as shown in Fig. 1. The MFCC feature vectors were obtained by the process described in Fig. 2. The feature vectors for the subjects with PH (mean PA pressure ≥25 mmHg) were combined into one matrix, and the feature vectors for the subjects with normal PA pressures (mean PA pressure <25 mmHg) were combined into another matrix. A Gaussian Mixture Model (GMM) was fitted to each feature vector matrix, resulting in one model for all subjects with PH and one for all subjects without PH.
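The training scheme in this caption, one model per class fitted to the pooled feature vectors, can be sketched as below. The caption does not state how the GMMs were fitted, so the expectation-maximization loop, the diagonal covariances, the two mixture components, the variance floor, and the rule of picking the class whose model gives the higher average log-likelihood are all assumptions. The data are synthetic two-dimensional points, not heart-sound features.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def logsumexp(vals):
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))


def fit_gmm(data, k=2, n_iter=30, seed=0, var_floor=1e-3):
    """Fit a k-component diagonal GMM by EM; returns (weights, means, vars)."""
    rng = random.Random(seed)
    dim = len(data[0])
    means = [list(p) for p in rng.sample(data, k)]
    centre = [sum(p[d] for p in data) / len(data) for d in range(dim)]
    spread = [max(var_floor,
                  sum((p[d] - centre[d]) ** 2 for p in data) / len(data))
              for d in range(dim)]
    variances = [list(spread) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for p in data:
            logs = [math.log(weights[j])
                    + log_gauss_diag(p, means[j], variances[j])
                    for j in range(k)]
            norm = logsumexp(logs)
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / len(data)
            means[j] = [sum(r[j] * p[d] for r, p in zip(resp, data)) / nj
                        for d in range(dim)]
            variances[j] = [
                max(var_floor,
                    sum(r[j] * (p[d] - means[j][d]) ** 2
                        for r, p in zip(resp, data)) / nj)
                for d in range(dim)]
    return weights, means, variances


def avg_log_likelihood(frames, model):
    weights, means, variances = model
    return sum(logsumexp([math.log(w) + log_gauss_diag(x, m, v)
                          for w, m, v in zip(weights, means, variances)])
               for x in frames) / len(frames)


def classify(frames, ph_model, normal_model):
    ph = avg_log_likelihood(frames, ph_model)
    normal = avg_log_likelihood(frames, normal_model)
    return "PH" if ph > normal else "normal"


# Synthetic stand-ins for the two pooled feature matrices.
rng = random.Random(1)
ph_matrix = [[rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)] for _ in range(200)]
normal_matrix = [[rng.gauss(-2.0, 1.0), rng.gauss(-2.0, 1.0)]
                 for _ in range(200)]
ph_model = fit_gmm(ph_matrix)
normal_model = fit_gmm(normal_matrix)
```

Pooling all frames of a class into one matrix means each model describes the class as a whole rather than any individual subject, which matches the "one model for all subjects" wording of the caption.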
Figure 4
Figure 4. The receiver operating characteristic (ROC) curve for our algorithm to detect the presence or absence of PH.
The area under the curve (AUC) was 0.74. The False Positive Rate (FPR)/True Positive Rate (TPR) point for the clinicians’ performance is also shown on the graph. The automated algorithm performs better than the clinicians’ interpretation of the recorded heart sounds. The x-axis shows the False Positive Rate, and the y-axis shows the True Positive Rate.
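For readers unfamiliar with how such a curve is built, here is a minimal sketch that sweeps a decision threshold over classifier scores and integrates the result with the trapezoid rule. The scores and labels are made up for illustration and are unrelated to the study data. The clinicians' single operating point is derived from the rates in the abstract, taking TPR as one minus the reported false negative rate.

```python
def roc_curve(scores, labels):
    """Return (FPR, TPR) points; a higher score means more likely positive."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Tied scores are admitted together so they share one point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Trapezoid-rule area under a list of (FPR, TPR) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up scores (1 = PH, 0 = normal), for illustration only.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
example_labels = [1, 1, 0, 1, 0, 0]
example_auc = auc(roc_curve(example_scores, example_labels))

# Clinicians give one decision per recording, hence a single point:
# FPR = 0.50 and TPR = 1 - FNR = 1 - 0.68 (rates from the abstract).
clinician_point = (0.50, 1 - 0.68)
```

In the toy example, 8 of the 9 positive-negative pairs are ranked correctly, so the area is 8/9.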
Figure 5
Figure 5. Comparison between the correct rate of the Gaussian Mixture Model (GMM) algorithm and other commonly used machine-learning algorithms.
The GMM-based algorithm has a higher correct rate than the other algorithms on our dataset.
Figure 6
Figure 6. Distribution of negative-log-likelihood ratio between the models.
The PH group is shown as the left (blue) distribution and the non-PH group as the right (red) distribution. A Gaussian curve is fitted to each distribution and is overlaid on it. The means for the PH and non-PH groups are 0.9 and 1.1 respectively, with a standard deviation of 0.1 for each group. The x-axis shows the negative-log-likelihood ratios, and the y-axis shows the frequency.
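The caption does not define the ratio. One reading consistent with group means close to 1 is a quotient of the negative log-likelihoods of a recording under the two class models. This is an assumption, and the symbols below are introduced only for illustration: $X = (x_1, \dots, x_T)$ are the feature vectors of one recording, and $\lambda_{\mathrm{PH}}$ and $\lambda_{\mathrm{N}}$ are the PH and non-PH Gaussian mixture models.

```latex
p(x \mid \lambda) = \sum_{k=1}^{K} w_k \, \mathcal{N}(x;\, \mu_k, \Sigma_k),
\qquad
\mathrm{NLL}(X \mid \lambda) = -\sum_{t=1}^{T} \log p(x_t \mid \lambda),
\qquad
r(X) = \frac{\mathrm{NLL}(X \mid \lambda_{\mathrm{PH}})}
            {\mathrm{NLL}(X \mid \lambda_{\mathrm{N}})}
```

Under this reading, and provided both negative log-likelihoods are positive, $r(X) < 1$ means the PH model explains the recording better. That matches the PH group sitting to the left of the non-PH group in the figure.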
