Doctoral Thesis

Neural mechanisms of attention and speech perception in complex, spatial acoustic environment

Patel, Prachi

We can hold conversations in environments that typically contain additional simultaneous talkers or background noise, such as vehicles on the street or music playing at a sidewalk café. This seemingly trivial everyday task is difficult for people with hearing deficits and extremely hard to model in machines. This dissertation explores the neural mechanisms by which the human brain encodes such complex acoustic environments and how cognitive processes like attention shape the processing of attended speech. My initial experiments explore the representation of the acoustic features that help us localize single sound sources in the environment, namely the direction and spectrotemporal content of sounds, and how these representations interact with each other. I play natural American English sentences coming from five azimuthal directions in space.

Using intracranial electrocorticography (ECoG) recordings from the human auditory cortex of the listener, I show that the direction of a sound and its spectrotemporal content are encoded in two distinct aspects of the neural response: the direction modulates the mean of the response, while the spectrotemporal features modulate the response around that mean. Furthermore, I show that these two representations are orthogonal and do not interact. This representation enables successful decoding of both spatial and phonetic information. These findings help define the functional organization of responses in the human auditory cortex, with implications for more accurate neurophysiological models of spatial speech processing.
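The encoding scheme above can be illustrated numerically. The following Python snippet is a toy sketch, not the dissertation's analysis code: the baseline values, gain, and noise level are invented for illustration. A simulated response takes its mean from the sound's direction while a stand-in spectrotemporal envelope drives fluctuation around that mean, so the two features can be read out independently, one from the response mean and one from the demeaned response.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical baseline offsets for five azimuthal directions (degrees).
baselines = {-90: 0.5, -45: 0.8, 0: 1.2, 45: 1.6, 90: 2.0}

def simulate_response(direction, stim_env, noise=0.05):
    """Toy neural response: direction sets the mean; the stimulus
    envelope modulates the response around that mean."""
    return baselines[direction] + 0.3 * stim_env + noise * rng.standard_normal(stim_env.size)

stim = np.sin(np.linspace(0, 8 * np.pi, 500))  # stand-in spectrotemporal envelope

for direction in baselines:
    r = simulate_response(direction, stim)
    # Mean of the response recovers direction; fluctuation recovers content.
    decoded_dir = min(baselines, key=lambda d: abs(r.mean() - baselines[d]))
    env_corr = np.corrcoef(r - r.mean(), stim)[0, 1]
    print(direction, decoded_dir, round(env_corr, 2))
```

Because the two components occupy orthogonal parts of the response (the mean versus the variation around it), decoding one does not interfere with decoding the other, which mirrors the finding that spatial and phonetic information can both be read out from the same recordings.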

I then take a step further to investigate the role of attention in encoding the direction and phonetic features of speech. I play a mixture of two spatialized talkers, e.g., a male talker to the listener's left and a female talker to the right, with the talkers' locations switching randomly after each sentence. I ask the listener to follow a given talker, e.g., the male talker, as he switches location after each uttered sentence. While the listener performs this task, I collect intracranial EEG data from their auditory cortex. I investigate both the bottom-up, stimulus-dependent and attention-independent encoding of such cocktail-party speech and the top-down, attention-driven encoding of location and speech features. I find a bottom-up, stimulus-driven contralateral preference in the encoding of the mixed speech: the left hemisphere automatically and predominantly encodes speech coming from the right, and vice versa. On top of this bottom-up representation, I find that the attended talker's direction modulates the baseline of the neural response, while the attended talker's voice modulates its spectrotemporal tuning. Moreover, the modulation by the attended talker's location is present throughout the auditory cortex, whereas the modulation by the attended talker's voice appears only in higher-order auditory cortical areas. My findings provide crucially needed evidence on how bottom-up and top-down signals interact in the auditory cortex in crowded, complex acoustic scenes to enable robust speech perception. They also shed light on the hierarchical encoding of attended speech, with implications for improving auditory attention decoding models.
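The two attentional effects described above can likewise be sketched with a toy simulation (again, invented numbers, not the dissertation's model): attending to a location shifts the baseline of a simulated cortical site's response, while attending to a talker changes the gain on that talker's envelope, i.e., the site's spectrotemporal tuning.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 4, 400)
env_a = np.abs(np.sin(3 * t))        # toy envelope of talker A
env_b = np.abs(np.sin(5 * t + 1.0))  # toy envelope of talker B

def response(attend_left, attend_a):
    """Toy cocktail-party response of a single cortical site:
    attended location shifts the baseline (mean), while attention
    to a talker boosts the gain on that talker's envelope (tuning)."""
    baseline = 1.0 if attend_left else 0.4  # location -> baseline shift
    gain_a = 0.6 if attend_a else 0.2       # voice -> spectrotemporal gain
    gain_b = 0.8 - gain_a
    return baseline + gain_a * env_a + gain_b * env_b + 0.02 * rng.standard_normal(t.size)

r_att = response(attend_left=True, attend_a=True)
r_ign = response(attend_left=False, attend_a=False)
print(r_att.mean() > r_ign.mean())  # attended location raises the baseline
```

In this sketch the baseline shift and the tuning change are separable, echoing the finding that location-driven modulation and voice-driven modulation are distinct effects that can be localized to different parts of the auditory cortex.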

Finally, I describe a clinical case study in which we show that electrical stimulation of specific sites in the planum temporale (PT) of an epilepsy patient implanted with intracranial electrodes enhances speech-in-noise perception. When noisy speech is played during such stimulation, the patient perceives that the noise disappears and that the speech sounds like clean speech heard without any noise. We performed a series of analyses to determine the functional organization of the three main subregions of the human auditory cortex: the planum temporale (PT), Heschl's gyrus (HG), and the superior temporal gyrus (STG). Using cortico-cortical evoked potentials (CCEPs), we modeled the PT sites as lying between the sites in HG and STG. Furthermore, we find that the discriminability of speech from nonspeech sounds in population neural responses increases from HG to PT to STG. These findings causally implicate the PT in background noise suppression and may point to a novel neuroprosthetic approach to assist with the challenging task of speech perception in noise.

Together, this dissertation presents new evidence for the neural encoding of spatial speech; for the interaction of stimulus-driven and attention-driven neural processes in spatial multi-talker speech perception; and for the enhancement of speech-in-noise perception by electrical brain stimulation.



More About This Work

Academic Units
Electrical Engineering
Thesis Advisors
Mesgarani, Nima
Ph.D., Columbia University
Published Here
March 29, 2023