Eliya Nachmani - Towards a Realistic Immersive 3D Audio Generation

Recent advancements in audio and language processing have yielded significant progress in audio analysis and synthesis. In the realm of audio analysis, researchers are addressing the crucial challenges of Automatic Speech Recognition (ASR), Sound Localization, Event Detection, Emotion Recognition, Speaker Diarization, and Speaker Identification. Meanwhile, in the synthesis domain, efforts are focused on Speech Synthesis, Speech Separation, Audio Vocoders, and Speech-Bots. Despite this progress, a significant void remains in neural audio generative models capable of understanding audio landscapes and skillfully creating or improving auditory surroundings. In this talk, I will address two pivotal research directions aimed at closing this gap:

(i) Developing an oracle-powered speechbot requires a profound understanding of the acoustic environment combined with comprehensive world knowledge. I will present Spectron, a speechbot that leverages a Large Language Model (LLM) to perform question answering (QA) and speech continuation.

(ii) The second challenge is audio separation for a multitude of sources. While the current audio separation literature predominantly focuses on single-source domains such as speech or sound events, real-world scenarios demand the separation of diverse sources such as speech, noise, and acoustic events. I will present a solution capable of separating numerous speakers from a single-microphone recording, as well as a theoretical upper bound for single-channel speech separation.

Concluding the discussion, I will outline future research directions, focusing on the evolution of multi-agent speechbots, the advancement of generative audio models within the 3D domain, and the fusion of synthetic sounds into real-world environments.

Date and Time: 
Thursday, February 8, 2024 - 13:30 to 14:30
Speaker: 
Eliya Nachmani
Location: 
A208
Speaker Bio: 

Eliya Nachmani currently serves as a research scientist at Google Research, specializing in machine learning for audio processing. Prior to his role at Google, he conducted research at Facebook AI Research (FAIR) and pursued his Ph.D. at Tel-Aviv University. Eliya holds a Master of Science in Electrical Engineering from Tel-Aviv University and a Bachelor of Science in Electrical Engineering from the Technion.

Website: https://sites.google.com/view/eliya-nachmani/home