Sound Source Localization in the SH Domain

TDOA, Beamforming, Adversarial Masking, ICASSP 2025

The Challenge

The primary challenge is the accurate localization of a target audio source in the presence of strong directional interference, background noise, and reverberation. Traditional methods often fail when unwanted sounds mask the spatial cues of the target source.

My Role & Collaborators

As a research collaborator at IIT Kanpur, I worked mainly on the machine learning side. I developed the generative adversarial network (GAN) that separates the 32-channel spherical microphone array audio into speech and noise components. I also implemented the complete pipeline that generates the adversarial binary mask and integrated it with the downstream CNN. In addition, I contributed to the research on the TDOA equations that informed the model's design. The experimental setup and data collection were handled by the research team at IITK.

Dr. Priyadarshini Dwivedi (Supervisor)

Our Solution

We introduced an Adversarial Masking (Adv-M) framework. The system first decomposes the signals from a spherical microphone array into the Spherical Harmonic (SH) domain. A generative adversarial network then generates a binary mask in real time that suppresses the SH components associated with the interfering source. The masked SH features of the target source are fed into a CNN, which estimates the target's Direction of Arrival (DOA).
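The first stage, projecting the array signals into the SH domain, can be sketched as a least-squares fit of the microphone signals onto an SH basis. This is only an illustration under several assumptions that do not come from the project: the basis is truncated to first order (4 real SH channels), the 32 microphone directions are a Fibonacci lattice rather than the real array geometry, and the radial equalization that a physical spherical array needs is left out. The function names are hypothetical.

```python
import numpy as np

def fibonacci_sphere(n_points):
    """Return (n_points, 3) unit vectors spread roughly evenly over a sphere.

    Assumption: stands in for the real 32-microphone layout, which is not given here.
    """
    i = np.arange(n_points) + 0.5
    z = 1.0 - 2.0 * i / n_points
    phi = np.pi * (1.0 + np.sqrt(5.0)) * i
    r = np.sqrt(1.0 - z**2)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

def real_sh_order1(directions):
    """Real, orthonormal SH basis up to order 1 in ACN order: shape (n_dirs, 4)."""
    x, y, z = directions[:, 0], directions[:, 1], directions[:, 2]
    c0 = 0.5 / np.sqrt(np.pi)            # Y_0^0
    c1 = np.sqrt(3.0 / (4.0 * np.pi))    # scale of the three order-1 terms
    return np.stack([c0 * np.ones_like(x), c1 * y, c1 * z, c1 * x], axis=1)

def sh_encode(mic_signals, mic_directions):
    """Least-squares projection of (n_mics, n_samples) signals onto the SH basis."""
    Y = real_sh_order1(mic_directions)       # (n_mics, 4)
    return np.linalg.pinv(Y) @ mic_signals   # (4, n_samples)

# Synthetic check: build a sound field from known SH coefficients, then recover them.
mics = fibonacci_sphere(32)
rng = np.random.default_rng(0)
true_coeffs = rng.standard_normal((4, 256))
pressure = real_sh_order1(mics) @ true_coeffs   # (32, 256) simulated mic signals
sh_coeffs = sh_encode(pressure, mics)
```

With 32 microphones the same least-squares approach extends to higher orders, at the cost of a larger basis matrix; the first-order truncation here only keeps the sketch short.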

Key Techniques

Spherical Harmonic (SH) Decomposition for spatial feature extraction.
Adversarial Masking (Adv-M) using a GAN to separate target and interference signals.
Real-time binary mask generation.
Convolutional Neural Network (CNN) for DOA estimation from cleaned SH features.
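The masking step in the list above can be illustrated with a few lines of NumPy. This is a minimal sketch, not the Adv-M implementation: random numbers stand in for the GAN generator's output, and the 0.5 threshold, the feature shape (25 SH channels by 64 bins), and the function name `apply_binary_mask` are all assumptions made for the example.

```python
import numpy as np

def apply_binary_mask(sh_features, mask_probs, threshold=0.5):
    """Threshold soft mask values to {0, 1} and apply them element-wise.

    sh_features: SH-domain features, any shape.
    mask_probs:  values in [0, 1] with the same shape (hypothetical generator output).
    """
    mask = (mask_probs >= threshold).astype(sh_features.dtype)
    return sh_features * mask, mask

rng = np.random.default_rng(1)
sh_features = rng.standard_normal((25, 64))        # assumed feature shape
mask_probs = rng.uniform(size=sh_features.shape)   # stand-in for the generator output
clean_features, mask = apply_binary_mask(sh_features, mask_probs)
```

Entries where the mask is 0 are zeroed out, and the remaining entries pass through unchanged to whatever model consumes the cleaned features.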

Project Links

View Demo / Results, Source Code

Results

The Adv-M framework outperformed existing methods. In both simulations and live lab experiments, our approach achieved more than a 30% increase in localization accuracy and a 44% reduction in RMSE compared to traditional techniques, indicating robustness in challenging acoustic environments. This work has been accepted for presentation at the IEEE Asilomar Conference on Signals, Systems, and Computers, to be held in Pacific Grove, California, from October 27–29, 2025.
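For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them for DOA estimates. The exact definitions used in our evaluation are not spelled out on this page, so this is an assumption: error is taken as the great-circle angle between estimated and reference directions, and an estimate counts as correct when that angle is within a tolerance (10 degrees here, chosen arbitrarily). The sample angles are made up for illustration and are not experimental data.

```python
import numpy as np

def unit_vector(azimuth_deg, elevation_deg):
    """Convert azimuth/elevation in degrees to Cartesian unit vectors."""
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    return np.stack(
        [np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)], axis=-1
    )

def angular_error_deg(est, ref):
    """Great-circle angle in degrees between matching rows of unit vectors."""
    cos_angle = np.clip(np.sum(est * ref, axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(cos_angle))

def doa_metrics(est_az, est_el, ref_az, ref_el, tol_deg=10.0):
    """Return (RMSE of angular error in degrees, fraction of estimates within tol_deg)."""
    err = angular_error_deg(unit_vector(est_az, est_el), unit_vector(ref_az, ref_el))
    rmse = float(np.sqrt(np.mean(err**2)))
    accuracy = float(np.mean(err <= tol_deg))
    return rmse, accuracy

# Made-up example: four sources on the horizontal plane.
ref_az = np.array([0.0, 90.0, 180.0, 270.0])
est_az = np.array([3.0, 94.0, 180.0, 300.0])
ref_el = np.zeros(4)
est_el = np.zeros(4)
rmse, accuracy = doa_metrics(est_az, est_el, ref_az, ref_el)
```

In this toy case three of the four estimates fall within the tolerance, and the single 30 degree outlier dominates the RMSE, which is why the two metrics are usually reported together.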

System Architecture

[Figure: Sound Source Localization in the SH Domain architecture]