Theses Doctoral

Selective Audio Filtering for Enabling Acoustic Intelligence in Mobile, Embedded, and Cyber-Physical Systems

Xia, Stephen

We are seeing a revolution in computing and artificial intelligence; intelligent machines have become ingrained in, and have improved, every aspect of our lives. Despite the increasing number of intelligent devices and breakthroughs in artificial intelligence, we have yet to achieve truly intelligent environments. Audio is one of the most common sensing and actuation modalities used in intelligent devices. In this thesis, we focus on how we can more robustly integrate audio intelligence into a wide array of resource-constrained platforms that enable more intelligent environments. We present systems and methods for adaptive audio filtering that enable us to more robustly embed acoustic intelligence into a wide range of real-time and resource-constrained mobile, embedded, and cyber-physical systems that are adaptable to a wide range of different applications, environments, and scenarios.

First, we introduce methods for embedding audio intelligence into wearables, like headsets and helmets, to improve pedestrian safety in urban environments by using sound to detect vehicles, localize vehicles, and alert pedestrians well in advance to give them enough time to avoid a collision. We create a segmented architecture and data processing pipeline that partitions computation between an embedded front-end platform and a smartphone platform. The embedded front-end hardware platform consists of a microcontroller and commercial off-the-shelf (COTS) components embedded into a headset, and it samples audio from an array of four MEMS microphones. Our embedded front-end platform computes a series of spatiotemporal features used to localize vehicles: relative delay, relative power, and zero crossing rate. These features are computed in the embedded front-end headset platform and transmitted wirelessly to the smartphone platform because there is not enough bandwidth to transmit more than two channels of raw audio with low latency using standard wireless communication protocols, like Bluetooth Low Energy. The smartphone platform runs machine learning algorithms to detect vehicles, localize vehicles, and alert pedestrians. To help reduce power consumption, we integrate an application-specific integrated circuit into our embedded front-end platform and create a new localization algorithm called angle via polygonal regression (AvPR). AvPR combines the physics of audio waves, the geometry of a microphone array, and a data-driven training and calibration process, which enables us to estimate the direction of the vehicle at high resolution while being robust to noise resulting from movements in the microphone array as we walk the streets.
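The three spatiotemporal features named above can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the use of a second microphone channel as the reference, the decibel scale for relative power, and the brute-force cross-correlation search are all assumptions made for illustration.

```python
import math


def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(x) - 1)


def relative_power_db(x, ref):
    """Mean power of x relative to ref, in decibels (assumed scale)."""
    px = sum(v * v for v in x) / len(x)
    pr = sum(v * v for v in ref) / len(ref)
    return 10.0 * math.log10(px / pr)


def relative_delay(x, ref, max_lag):
    """Lag in samples that maximizes the cross-correlation of x with ref.

    A positive result means x arrives later than ref.
    """
    def corr(lag):
        return sum(
            x[n] * ref[n - lag]
            for n in range(len(x))
            if 0 <= n - lag < len(ref)
        )

    return max(range(-max_lag, max_lag + 1), key=corr)


# Hypothetical two-microphone frame: a decaying tone burst, and a copy that
# arrives 3 samples later at half the amplitude.
burst = [math.sin(2 * math.pi * k / 16) * math.exp(-k / 20) for k in range(64)]
mic_a = [0.0] * 8 + burst + [0.0] * 8
mic_b = [0.0] * 3 + [0.5 * v for v in mic_a[:-3]]

delay = relative_delay(mic_b, mic_a, max_lag=8)
power = relative_power_db(mic_b, mic_a)
zcr = zero_crossing_rate(mic_a)
```

Each frame reduces to a few numbers per microphone pair, which is consistent with the abstract's motivation for sending features rather than raw multi-channel audio over a low-bandwidth wireless link.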

Second, we explore the challenges in adapting our platforms for pedestrian safety to more general and noisier scenarios, namely construction worker safety, where the sounds of nearby power tools and machinery are orders of magnitude louder than those of a distant vehicle. We introduce an adaptive noise filtering architecture that allows workers to filter out construction tool sounds and reveal low-energy vehicle sounds to better detect them. Our architecture combines the strengths of both the physics of audio waves and data-driven methods to more robustly filter out construction sounds while being able to run on a resource-limited mobile and embedded platform. In our adaptive filtering architecture, we introduce and incorporate a data-driven filtering algorithm, called probabilistic template matching (PTM), that leverages pre-trained statistical models of construction tools to perform content-based filtering. We demonstrate the improvements that our adaptive filtering architecture brings to our audio-based urban safety wearable in real construction site scenarios and against state-of-the-art audio filtering algorithms, while having a minimal impact on the power consumption and latency of the overall system. We also explore how these methods can be used to improve audio privacy and remove privacy-sensitive speech from applications that have no need to detect and analyze speech.

Finally, we introduce a common selective audio filtering platform that builds upon our adaptive filtering architecture for a wide range of real-time mobile, embedded, and cyber-physical applications. Our architecture can account for a wide range of different sounds, model types, and signal representations by integrating an algorithm we present called content-informed beamforming (CIBF). CIBF combines traditional beamforming (spatial filtering using the physics of audio waves) with data-driven machine learning sound detectors and models that developers may already create for their own applications to enhance and filter out specified sounds and noises. Alternatively, developers can select sounds and models from a library we provide. We demonstrate how our selective filtering architecture can improve the detection of specific target sounds and filter out noises in a wide range of application scenarios. Additionally, through two case studies, we demonstrate how our selective filtering architecture can easily integrate into and improve the performance of real mobile and embedded applications over existing state-of-the-art solutions, while having minimal impact on latency and power consumption. Ultimately, this selective filtering architecture enables developers and engineers to more easily embed robust audio intelligence into common objects found around us and into resource-constrained systems to create more intelligent environments.
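For readers unfamiliar with the "traditional beamforming" that CIBF builds on, the following is a minimal sketch of a conventional delay-and-sum beamformer. It is not CIBF and does not include any learned sound detector. The function name, the integer-sample delays, and the toy two-microphone signals are assumptions made for illustration.

```python
def delay_and_sum(channels, delays):
    """Conventional delay-and-sum beamformer with integer sample delays.

    channels: list of equal-length lists of samples, one per microphone
    delays:   assumed arrival delay of the target at each microphone,
              in samples (non-negative)
    """
    n = len(channels[0])
    out = []
    for t in range(n):
        acc = 0.0
        for ch, d in zip(channels, delays):
            idx = t + d  # advance the channel to undo its arrival delay
            if 0 <= idx < n:
                acc += ch[idx]
        out.append(acc / len(channels))
    return out


# Hypothetical source pulse reaching microphone 1 two samples after microphone 0.
mic0 = [0, 0, 1, 2, 1, 0, 0, 0]
mic1 = [0, 0, 0, 0, 1, 2, 1, 0]

steered = delay_and_sum([mic0, mic1], delays=[0, 2])     # aligned with the source
mis_steered = delay_and_sum([mic0, mic1], delays=[0, 0])  # wrong direction
```

When the assumed delays match the true arrival delays, the channels add coherently and the pulse is preserved; with the wrong delays the pulse is smeared and its peak is attenuated. This is the spatial filtering effect that the abstract describes CIBF as combining with data-driven detectors.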



More About This Work

Academic Units
Electrical Engineering
Thesis Advisors
Jiang, Xiaofan
Degree
Ph.D., Columbia University
Published Here
August 10, 2022