Multimodal AI Lab

School of Electrical Engineering, KAIST

Announcements

We are looking for motivated students interested in machine learning, speech processing, and computer vision. Please read this page for more information.

Recent highlights

Speech generation from silent videos

Jihoon Kim et al. (2024), "Let There Be Sound: Reconstructing High Quality Speech from Silent Videos", Proc. AAAI

Text-to-Speech with environmental context


Yeonghyeon Lee et al. (2024), "VoiceLDM: Text-to-Speech with Environmental Context", Proc. ICASSP

Talking face synthesis


Youngjoon Jang et al. (2023), "That's What I Said: Fully-Controllable Talking Face Generation", Proc. ACMMM

Audio-visual speech separation


Suyeon Lee et al. (2024), "Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model", Proc. ICASSP

Audio-visual sound source localization


Sooyoung Park et al. (2024), "Can CLIP Help Sound Source Localization?", Proc. WACV

Audio-visual image search


Arda Senocak et al. (2023), "Sound Source Localization is All about Cross-Modal Alignment", Proc. ICCV

