What is it about?

A recent systematic comparison of speech intelligibility models, covering several spectro-temporal manipulations of speech maskers and gender combinations of target and masker speakers [Schubotz et al. (2016). J. Acoust. Soc. Am. 140, 524-540], showed the importance of short-time power (intensity) features. Conversely, Jørgensen and colleagues [Jørgensen et al. (2013). J. Acoust. Soc. Am. 134, 436-446] demonstrated that short-time envelope power (amplitude modulation domain) SNRs have higher predictive power than power SNRs for speech processed with reverberation and spectral subtraction. Here, the generalized power-spectrum model [GPSM; Biberger and Ewert (2016). J. Acoust. Soc. Am. 140, 1023-1038], which uses long-time power and short-time envelope power SNRs, was extended to use both power and envelope power SNRs on short-time scales. The extended model, denoted multi-resolution GPSM (mr-GPSM), was evaluated with the speech intelligibility experiments from Schubotz et al. (2016) and Jørgensen et al. (2013), and also with the psychoacoustic test battery proposed in Biberger and Ewert (2016).
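
As a rough illustration of the two feature types named above, the following Python sketch computes a short-time power SNR and a short-time envelope power SNR per frame from a speech-plus-masker mixture and the masker alone. It is not the mr-GPSM: it leaves out the auditory and modulation filterbanks, the multi-resolution windowing, and the decision stage. The SNR form (mixture minus masker, divided by masker), the rectify-and-smooth envelope, the non-overlapping frames, the -30 dB floor, and all function and parameter names are assumptions made for illustration only.

```python
import math


def frames(x, frame_len):
    """Split x into consecutive non-overlapping frames, dropping any remainder."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, frame_len)]


def power(x):
    """Mean squared value of a frame (intensity feature)."""
    return sum(v * v for v in x) / len(x)


def crude_envelope(x, smooth_len):
    """Rectify and smooth with a causal moving average.

    Assumption: a simple stand-in for a proper (e.g. Hilbert) envelope.
    """
    rect = [abs(v) for v in x]
    out = []
    acc = 0.0
    for i, v in enumerate(rect):
        acc += v
        if i >= smooth_len:
            acc -= rect[i - smooth_len]
        out.append(acc / min(i + 1, smooth_len))
    return out


def envelope_power(env):
    """AC power of the envelope, normalized by its squared mean (DC) value."""
    mean = sum(env) / len(env)
    if mean == 0.0:
        return 0.0
    ac = sum((v - mean) ** 2 for v in env) / len(env)
    return ac / (mean * mean)


def snr_db(p_mix, p_noise, floor_db=-30.0):
    """(p_mix - p_noise) / p_noise in dB, limited below by floor_db (assumed value)."""
    if p_noise <= 0.0:
        return floor_db
    ratio = (p_mix - p_noise) / p_noise
    if ratio <= 0.0:
        return floor_db
    return max(floor_db, 10.0 * math.log10(ratio))


def short_time_snrs(mix, noise, frame_len, smooth_len):
    """Return (power SNRs, envelope power SNRs) in dB, one value per frame."""
    power_snrs = [
        snr_db(power(m), power(n))
        for m, n in zip(frames(mix, frame_len), frames(noise, frame_len))
    ]
    env_mix = crude_envelope(mix, smooth_len)
    env_noise = crude_envelope(noise, smooth_len)
    env_snrs = [
        snr_db(envelope_power(m), envelope_power(n))
        for m, n in zip(frames(env_mix, frame_len), frames(env_noise, frame_len))
    ]
    return power_snrs, env_snrs
```

In this simplified picture, a masker that leaves the mixture's power largely unchanged but flattens or adds envelope fluctuations would show up mainly in the envelope power SNR, while a steady masker with high level would show up mainly in the power SNR.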

Why is it important?

Combining short-time power (intensity) and envelope power SNRs in the mr-GPSM led to more robust speech intelligibility predictions than the extended speech intelligibility index [ESII; Rhebergen et al. (2006). J. Acoust. Soc. Am. 120, 3988-3997] and the multi-resolution speech-based envelope power-spectrum model (mr-sEPSM; Jørgensen et al., 2013), each of which applies only a single metric. Because power and envelope power SNRs reflect energetic and amplitude modulation masking, respectively, the mr-GPSM might be a useful tool for assessing the relative role of both types of masking. Moreover, with the exception of the forward masking experiment, the proposed mr-GPSM performs comparably to the GPSM (Biberger and Ewert, 2016) and the perception model [PEMO; Dau et al. (1997a,b). J. Acoust. Soc. Am. 102, 2892-2905, 2906-2919] on a large critical set of psychoacoustic experiments. Including a forward masking function as proposed by Ludvigsen [(1985). J. Acoust. Soc. Am. 78, 1271-1280] could allow the model to account for non-simultaneous masking.

Read the Original

This page is a summary of: The role of short-time intensity and envelope power for speech intelligibility and psychoacoustic masking, The Journal of the Acoustical Society of America, August 2017, Acoustical Society of America (ASA),
DOI: 10.1121/1.4999059.
