What is it about?

The arousal dimension refers to the activation level of the speaker. It is accompanied by changes in physiological attributes, primarily the respiratory pattern, the subglottal air pressure, and the glottal vibration pattern. In general, high arousal speech is produced with a higher respiration rate and increased subglottal air pressure. Studies have reported a relationship between subglottal air pressure and glottal vibration characteristics: as the subglottal air pressure increases, so do the abruptness of the glottal closure, the glottal closed phase quotient, and the rate of glottal vibration. These observations are based on intraoral flow and electroglottograph (EGG) measurements, and it is difficult to extract these parameters directly from the speech signal.

Apart from the voice quality parameters, spectral parameters derived from the speech signal have been studied, such as spectral tilt, peak amplitude and shift of the first formant, the ratio of high-frequency to low-frequency band energy, and harmonic behavior. A fundamental characteristic of the spectrum of a speech signal is that it is sound-dependent, so these spectral parameters for high arousal speech are analyzed for various sound units with reference to neutral speech. Even given a parameter set of the speech signal, it is still a challenge to identify the degree of arousal. In the current study, an approach for describing high arousal regions is presented that uses spectral characteristics derived from speech segments shorter than the local pitch period, i.e., at the subsegmental level, without any neutral reference.
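One of the spectral parameters mentioned above, the ratio of high-frequency to low-frequency band energy, can be illustrated with a minimal sketch. This is not the zero-time windowing method of the paper: it uses a plain Hann-windowed DFT on a conventional 20 ms frame, and the function names (`band_energy_ratio`, `make_tone_mix`), the 1000 Hz split frequency, and the synthetic two-tone signal are all assumptions made for illustration only.

```python
import cmath
import math


def band_energy_ratio(frame, sample_rate, split_hz=1000.0):
    """Ratio of spectral energy above split_hz to energy at or below it.

    Illustrative only: a plain DFT with a Hann window, not the zero-time
    windowing method. split_hz is an assumed value, not taken from the paper.
    """
    n = len(frame)
    if n < 2:
        raise ValueError("frame must contain at least two samples")
    # Hann window to reduce spectral leakage at the frame edges.
    windowed = [
        x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    low = 0.0
    high = 0.0
    # Skip the DC bin (k = 0) and stop at the Nyquist bin.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            windowed[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)
        )
        energy = abs(coeff) ** 2
        freq = k * sample_rate / n
        if freq <= split_hz:
            low += energy
        else:
            high += energy
    return high / low if low > 0.0 else float("inf")


def make_tone_mix(sample_rate, n, low_amp, high_amp):
    """Synthetic frame (hypothetical test signal): 200 Hz plus 3000 Hz tones."""
    return [
        low_amp * math.sin(2 * math.pi * 200 * i / sample_rate)
        + high_amp * math.sin(2 * math.pi * 3000 * i / sample_rate)
        for i in range(n)
    ]


sr = 8000  # assumed sampling rate in Hz
weak_high = make_tone_mix(sr, 160, low_amp=1.0, high_amp=0.1)
strong_high = make_tone_mix(sr, 160, low_amp=1.0, high_amp=0.5)
ratio_weak = band_energy_ratio(weak_high, sr)
ratio_strong = band_energy_ratio(strong_high, sr)
print(ratio_weak, ratio_strong)  # the second ratio is the larger one
```

The frame with the stronger 3000 Hz component gives the larger ratio, which is the kind of contrast such a parameter is meant to capture. On real speech the value also depends on the sound unit being spoken, which is why such parameters are usually compared against a neutral reference.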

Why is it important?

The acoustic parameters of emotional speech are expressed relative to those of neutral speech. Emotion-related and expressive-speech tasks, such as the detection of shouted speech, depressive speech, and speech-laughter, have also been analyzed with respect to neutral speech. Emotional speech is an expressive voice that combines several voice qualities, such as arousal and rhythm. Focusing on the individual voice qualities might give important insights.


The work proposed in this paper gives important insights into the analysis of expressive speech.

Gangamohan Paidi
International Institute of Information Technology Hyderabad

Read the Original

This page is a summary of: Subsegmental level analysis of high arousal speech using the zero-time windowing method, The Journal of the Acoustical Society of America, January 2019, Acoustical Society of America (ASA), DOI: 10.1121/1.5087816.
