Jihoon Kim et al. (2024), "Let There Be Sound: Reconstructing High Quality Speech from Silent Videos", Proc. AAAI
Yeonghyeon Lee et al. (2024), "VoiceLDM: Text-to-Speech with Environmental Context", Proc. ICASSP
Youngjoon Jang et al. (2023), "That's What I Said: Fully-Controllable Talking Face Generation", Proc. ACM MM
Suyeon Lee et al. (2024), "Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model", Proc. ICASSP
Sooyoung Park et al. (2024), "Can CLIP Help Sound Source Localization?", Proc. WACV
Arda Senocak et al. (2023), "Sound Source Localization is All about Cross-Modal Alignment", Proc. ICCV