Rahul Peter

M.Sc. Student in Acoustics & Audio Technology

+91 84319 40842

Summary

M.Sc. student in Acoustics & Audio Technology at Aalto University, Finland. Combined expertise in audio machine learning, software engineering (Rust, Kotlin, Python), signal processing, and acoustics research. Currently collaborating with Savox and Otos on a personalized hearing-protection system for first responders (real-time DSP on Tympan).

Work Experience

Research Assistant, IIT Kanpur

June 2024 -- Sept 2024
  • Developed the Adversarial Masking (Adv-M) framework for robust sound source localization in the Spherical Harmonic (SH) domain, designing and implementing a GAN-based real-time binary mask generator to separate target speech from directional interference.
  • Built and integrated the full ML pipeline: SH decomposition, adversarial mask generation, and downstream CNN-based DOA estimation; contributed to deriving and incorporating TDOA-informed model design choices.
  • The system achieved substantial gains (approximately 30% improvement in localization accuracy and a 44% reduction in RMSE) and led to a paper accepted at the IEEE Asilomar Conference on Signals, Systems & Computers 2025.

Backend Software Engineer, Pine Labs

July 2023 -- June 2025
  • Backend developer on the Plural Online Payments team at Pine Labs; collaborated with industry-leading merchants such as Flipkart and Amazon, and brands like Apple, to build and integrate affordability and EMI features into the online payments platform.
  • Enhanced Svix webhook service performance by implementing custom message serialization in Rust, reducing latency by 34%.
  • Built parallel batch processing pipelines with exponential backoff for Dead Letter Queue handling.
  • Developed retry logic with circuit breaker patterns in Kotlin and secure payload verification for financial events.
  • Integrated metrics collection for delivery success rates and processing times.
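The retry behavior described above can be sketched as follows. This is an illustrative Python sketch, not the production Kotlin/Rust code; the function names (`backoff_delays`, `redeliver`) and parameter defaults are hypothetical choices for the example, assuming capped exponential backoff with full jitter for Dead Letter Queue redelivery.

```python
import random
import time

def backoff_delays(base: float = 0.5, factor: float = 2.0,
                   max_delay: float = 30.0, attempts: int = 5):
    """Yield capped exponential backoff delays with full jitter."""
    for attempt in range(attempts):
        cap = min(max_delay, base * factor ** attempt)
        yield random.uniform(0.0, cap)

def redeliver(message, send, attempts: int = 5) -> bool:
    """Retry delivery of a dead-lettered message; returns True on success."""
    for delay in backoff_delays(attempts=attempts):
        if send(message):
            return True
        time.sleep(delay)  # back off before the next delivery attempt
    return False
```

Jitter spreads retries out so that a burst of failed webhooks does not hammer the downstream endpoint in lockstep; the cap bounds worst-case delay per attempt.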

Freelance Musician & Engineer

June 2021 -- Present
  • Session pianist, composer, and live FOH assistant.
  • Produced, mixed, and mastered tracks; contributed to open-source audio tools.

Projects

Multimodal Lyric Intelligibility (ICASSP Cadenza)

Ranked 2nd of 30 teams on the intrusive track (NCC = 0.69, RMSE = 0.265); invited to present the paper at ICASSP 2026. Compact hybrid system combining audio features with text-level metrics to assess lyric intelligibility in mixtures.

  • Extracted ASR hypotheses (Whisper/Wav2Vec2/WavLM/Canary) and linguistic features (NLTK POS ratios, syllables/word), computed BERTScore against references, and fused scalar blocks with attention-pooled encoder embeddings.
  • Implemented vocal separation (Demucs) and computed intrusive perceptual metrics (STOI, Zimtohrli) to model acoustic quality and lexical alignment.
  • Emphasis on clean, reproducible pipelines for data processing, scoring, and model training.
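The attention-pooling step used to fuse encoder embeddings can be sketched as below. This is a minimal illustration in plain Python (lists standing in for tensors), not the actual model code; `attention_pool` and its inputs are hypothetical names, assuming per-frame scalar scores turned into softmax weights over frame embeddings.

```python
import math

def attention_pool(frames, scores):
    """Pool frame embeddings into one vector using softmax attention.

    frames: list of equal-length embedding vectors, one per frame.
    scores: one scalar relevance score per frame.
    """
    m = max(scores)                               # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]               # softmax over frames
    dim = len(frames[0])
    return [sum(w * f[i] for w, f in zip(weights, frames))
            for i in range(dim)]
```

With equal scores this reduces to mean pooling; a strongly dominant score makes the pooled vector approach that single frame's embedding.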

Esperanto ASR Course Competition at Aalto

Ranked 3rd of 15 teams in class (WER = 0.13, CER = 0.04). Reproducible code: RP335/speech_rec_course_comp.

  • Built NeMo manifests and fine-tuned a Conformer-Transducer with SpecAugment and partial layer freezing; averaged top-5 checkpoints for stability.
  • Ensemble decoding with wav2vec2-large-xlsr-53-esperanto via NIST ROVER; applied rule-based phonetic post-processing (word-final devoicing) for consistent pronunciation.
  • Modular scripts for normalization, inference, and submission; demonstrates multilingual text handling, hypothesis scoring, and reproducible research code.
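Averaging the top-5 checkpoints amounts to a parameter-wise mean over saved model weights. The sketch below is illustrative only, with plain Python lists standing in for tensors and a hypothetical `average_checkpoints` name; real NeMo checkpoints would be averaged key-by-key over their state dicts in the same way.

```python
def average_checkpoints(state_dicts):
    """Parameter-wise mean across checkpoints.

    Each state dict maps a parameter name to a flat list of floats
    (a stand-in for the real tensor); all dicts share the same keys
    and shapes.
    """
    n = len(state_dicts)
    return {
        key: [sum(sd[key][i] for sd in state_dicts) / n
              for i in range(len(state_dicts[0][key]))]
        for key in state_dicts[0]
    }
```

Averaging nearby checkpoints smooths out optimization noise from the final training steps, which is why it tends to stabilize transducer WER.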

Personalized Hearing Protection for First Responders

Defined a smart audio module between radio/comms and headset to balance protection and intelligibility while preserving speech cues.

  • Real-time DSP pipeline (WDRC, noise reduction, safety limiter, smart mixing) targeting <10ms latency on Tympan Rev F.
  • Personalization using audiogram + PAR fit-check + noise dose tracking; planned validation with STOI, LAeq, latency, and field tests.
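The WDRC stage in the pipeline above follows a standard static input/output curve: linear gain below a kneepoint, compression above it, and an output limiter. The sketch below is a simplified single-band illustration, not the Tympan firmware; the function name and default parameters (`knee_db`, `ratio`, `limit_db`) are assumptions for the example.

```python
def wdrc_gain_db(level_db: float, gain_db: float = 20.0,
                 knee_db: float = 50.0, ratio: float = 3.0,
                 limit_db: float = 100.0) -> float:
    """Return the gain (dB) a static WDRC curve applies at an input level.

    Below the kneepoint: fixed linear gain.
    Above the kneepoint: compression at `ratio` (dB in per dB out).
    Output is hard-limited at `limit_db` dB SPL.
    """
    if level_db <= knee_db:
        out_db = level_db + gain_db
    else:
        out_db = knee_db + gain_db + (level_db - knee_db) / ratio
    return min(out_db, limit_db) - level_db
```

In a real-time implementation this curve is driven by a smoothed level estimate per band, with attack/release time constants chosen to keep the sub-10 ms latency budget.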

DCASE 2025 SELD Challenge

Conformer ensemble for sound event localization and detection using synthetic data from SpatialScaper. Technical report

Query-by-Vocal-Imitation (QVIM) Challenge

Top-3 team finisher using MobileNetV3, PANNs, PaSST, and BEATs transformers on VimSketch. Focus on query-to-item matching, embeddings, and retrieval. qvim-aes.github.io | QVIM-Aalto

Voiced/Non-Voiced Detection in Speech

Research project under Prof. Anurag Nishad on detecting voiced vs. non-voiced segments using iterative Variational Mode Decomposition (VMD) to extract fundamental frequency components and their envelopes. Evaluated against Empirical Mode Decomposition and wavelet-based baselines on CMU Arctic and NOISEX-92 datasets under varied noise conditions to quantify robustness. Details

Publications

  • Adversarial Masking Approach for Robust Source Localization in the SH Domain. Accepted to IEEE Asilomar Conference on Signals, Systems & Computers 2025. Session page
  • Class-Aware Hybrid Ensemble for Query-by-Vocal Imitation. Published as a late-breaking demo in the LBDP track at the 2025 AES International Conference on Machine Learning and Artificial Intelligence for Audio (AIMLA), London, UK. LBDP page

Education

2025 -- Present M.Sc., Acoustics and Audio Technology, Aalto University, Espoo, Finland
GPA: 4.44/5.00
2019 -- 2023 B.E., Electronics & Communications, BITS Pilani, K.K. Birla Goa Campus
GPA: 8.14/10.00

Skills

Audio / Research
Audio Machine Learning, Sound Event Localization, Spatial Audio Synthesis, Signal Processing
Software / Dev
Rust, Kotlin, Python, Docker, CI/CD, Team Collaboration
Tools / Frameworks
TensorFlow, PyTorch, SpatialScaper, JUCE, Git, AWS
Last updated: July 24, 2025