Theses Doctoral

Decoding auditory attention from neural representations of glimpsed and masked speech

Raghavan, Vinay S.

Humans have the remarkable capacity to attend to a single person’s voice when many people are talking. However, individuals with hearing loss may struggle to tune into a single voice in such complex acoustic environments. Current hearing aids can remove background noise, but they cannot selectively amplify one person’s voice without first knowing which talker the listener intends to attend to. Studies of multitalker speech perception have demonstrated an enhanced representation of attended speech in a listener’s neural responses. This finding raises the prospect of a brain-controlled hearing aid that uses Auditory Attention Decoding (AAD) algorithms to decode the target of the listener’s attention from their neural signals and selectively amplify it.

In this dissertation, I describe experiments that use non-invasive and invasive electrophysiology to investigate the encoding and decoding of speech representations. These experiments inform our understanding of how attention influences speech perception and advance progress toward brain-controlled hearing devices.

First, using data recorded non-invasively from listeners with hearing loss, I explore how effectively AAD improves speech intelligibility when attention switches between talkers. I show that AAD can improve intelligibility for listeners with hearing loss, but current non-invasive AAD methods cannot detect changes in attention accurately or quickly enough to improve intelligibility in general.
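Non-invasive AAD is commonly framed as stimulus reconstruction: a linear decoder estimates the attended speech envelope from the neural data, and within each decision window the talker whose envelope correlates best with that estimate is labeled as attended. The sketch below illustrates only that decision step, assuming the reconstructed envelope has already been computed. The function `decode_attention` and its parameters are hypothetical, and this is not necessarily the pipeline used in this dissertation.

```python
import numpy as np


def decode_attention(reconstructed, envelopes, fs, window_s):
    """Label each non-overlapping window with the best-matching talker.

    reconstructed: 1-D envelope estimated from neural data (assumed given).
    envelopes: sequence of 1-D speech envelopes, one per talker, same length.
    fs: sampling rate in Hz; window_s: decision window length in seconds.
    Returns a list of talker indices, one per complete window.
    """
    win = int(round(window_s * fs))
    n_windows = len(reconstructed) // win
    decisions = []
    for w in range(n_windows):
        seg = slice(w * win, (w + 1) * win)
        # Pearson correlation between the reconstruction and each talker's envelope
        scores = [np.corrcoef(reconstructed[seg], env[seg])[0, 1] for env in envelopes]
        decisions.append(int(np.argmax(scores)))
    return decisions


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 100
    env_a = rng.standard_normal(4 * fs)
    env_b = rng.standard_normal(4 * fs)
    # Simulated reconstruction: attention on talker A for 2 s, then on talker B
    recon = np.concatenate([env_a[: 2 * fs], env_b[2 * fs :]])
    recon = recon + 0.1 * rng.standard_normal(4 * fs)
    print(decode_attention(recon, [env_a, env_b], fs, window_s=1.0))
```

Lengthening `window_s` averages over more samples and makes each correlation more reliable, but a switch in attention can only be reflected once a full window has elapsed. This accuracy/speed trade-off is one reason windowed decoders can lag behind changes in attention.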

Next, I analyze invasive neural recordings to establish more clearly the boundary between the neural encoding of target and non-target speech during multitalker speech perception. In particular, I investigate whether speech perception can be achieved through glimpses, i.e., spectrotemporal regions where a talker has more energy than the background, or whether the recovery of masked regions is also necessary. I find that glimpsed speech is encoded for both target and non-target talkers, whereas masked speech is encoded only for the target talker, with a longer response latency and a distinct anatomical organization compared to glimpsed speech. These findings suggest that glimpsed and masked speech rely on separate encoding mechanisms and that attention enables the recovery of masked speech to support higher-order speech perception.
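The glimpse definition can be made concrete with a local signal-to-noise ratio (SNR) criterion applied bin by bin. This is a minimal sketch, assuming the target talker and the background (the combined competing talkers) are available as separate power spectrograms of the same shape (frequency × time). The helper `glimpse_masks` and its 0 dB default threshold are hypothetical; the dissertation's actual spectrogram parameters and threshold may differ.

```python
import numpy as np


def glimpse_masks(target_power, background_power, threshold_db=0.0, eps=1e-12):
    """Split a target talker's spectrogram into glimpsed and masked bins.

    target_power, background_power: non-negative power spectrograms with the
    same shape (frequency x time). A bin is "glimpsed" when the target's local
    SNR exceeds threshold_db (0 dB: target energy > background energy) and
    "masked" otherwise. eps avoids division by zero and log of zero.
    """
    target_power = np.asarray(target_power, dtype=float)
    background_power = np.asarray(background_power, dtype=float)
    local_snr_db = 10.0 * np.log10((target_power + eps) / (background_power + eps))
    glimpsed = local_snr_db > threshold_db
    return glimpsed, ~glimpsed


if __name__ == "__main__":
    target = np.array([[4.0, 1.0], [0.5, 2.0]])
    background = np.array([[1.0, 2.0], [1.0, 1.0]])
    glimpsed, masked = glimpse_masks(target, background)
    # Keep target energy only where it dominates (glimpsed) or is dominated (masked)
    glimpsed_part = np.where(glimpsed, target, 0.0)
    masked_part = np.where(masked, target, 0.0)
    print(glimpsed)
    print(glimpsed_part)
```

Applying the two masks splits the target spectrogram into glimpsed and masked components, the distinction on which both the encoding analysis and the later decoding framework rest.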

Finally, I leverage my theory of the neural encoding of glimpsed and masked speech to design a novel framework for AAD. I show that differentially classifying event-related potentials to glimpsed and masked acoustic events is more effective than current models, which ignore the dynamic overlap between a talker and the background. In particular, this framework yields more accurate and stable decoding, identifies changes in attention more quickly, and can detect atypical uses of attention, such as divided attention or inattention. Together, this dissertation identifies key problems in the neural decoding of a listener’s attention, expands our understanding of how attention influences the neural encoding of speech, and leverages this understanding to design new AAD methods that move us closer to effective and intuitive brain-controlled assistive hearing devices.


  • Raghavan_columbia_0054D_18287.pdf (application/pdf, 3.58 MB)

More About This Work

Academic Units: Electrical Engineering
Thesis Advisors: Mesgarani, Nima
Degree: Ph.D., Columbia University
Published Here: February 21, 2024