Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
M. H. Erol, A. Senocak, J. Feng, J. S. Chung
IEEE Signal Processing Letters PDF
Bridging the Gap between Audio and Text using Parallel-attention for User-defined Keyword Spotting
Y. Kim, J. Jung, J. Park, B. Kim, J. S. Chung
IEEE Signal Processing Letters PDF
Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding
J. Woo, H. Ryu, Y. Jang, J. W. Cho, J. S. Chung
ACM International Conference on Multimedia PDF
VoxSim: A perceptual voice similarity dataset
J. Ahn, Y. Kim, Y. Choi, D. Kwak, J. Kim, S. Mun, J. S. Chung
Interspeech PDF
Lightweight Audio Segmentation for Long-form Speech Translation
J. Lee, S. Kim, H. Kim, J. S. Chung
Interspeech PDF
ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions
J. Feng, M. H. Erol, J. S. Chung, A. Senocak
Interspeech PDF
To what extent can ASV systems naturally defend against spoofing attacks?
J. Jung, X. Wang, N. Evans, S. Watanabe, H. Shim, H. Tak, S. Arora, J. Yamagishi, J. S. Chung
Interspeech PDF
Disentangled Representation Learning for Environment-agnostic Speaker Recognition
K. Nam, H. Heo, J. Jung, J. S. Chung
Interspeech PDF
FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching
C. Jung, S. Lee, J. Kim, J. S. Chung
Interspeech PDF
EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning
J. Kim, H. Lee, K. Rho, J. Kim, J. S. Chung
International Conference on Machine Learning PDF
Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
Y. Jang, J. Kim, J. Ahn, D. Kwak, H. Yang, Y. Ju, I. Kim, B. Kim, J. S. Chung
IEEE Conference on Computer Vision and Pattern Recognition PDF
Scaling Up Video Summarization Pretraining with Large Language Models
D. M. Argaw, S. Yoon, F. C. Heilbron, H. Deilamsalehy, T. Bui, Z. Wang, F. Dernoncourt, J. S. Chung
IEEE Conference on Computer Vision and Pattern Recognition PDF
Towards Automated Movie Trailer Generation
D. M. Argaw, M. Soldan, A. Pardo, C. Zhao, F. C. Heilbron, J. S. Chung, B. Ghanem
IEEE Conference on Computer Vision and Pattern Recognition PDF
SlowFast Network for Continuous Sign Language Recognition
J. Ahn, Y. Jang, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing PDF
Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification
H. Heo, K. Nam, B. Lee, Y. Kwon, M. Lee, Y. J. Kim, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing PDF
Speech Guided Masked Image Modeling for Visually Grounded Speech
J. Woo, H. Ryu, A. Senocak, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing PDF
VoxMM: Rich Transcription of Conversations in the Wild
D. Kwak, J. Jung, K. Nam, Y. Jang, J. Jung, S. Watanabe, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing PDF
From Coarse To Fine: Efficient Training for Audio Spectrogram Transformers
J. Feng, M. H. Erol, J. S. Chung, A. Senocak
International Conference on Acoustics, Speech, and Signal Processing PDF
VoiceLDM: Text-to-Audio Generation with Linguistic Content
Y. Lee, I. Yeon, J. Nam, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing PDF | Project page
TalkNCE: Improving Active Speaker Detection with Talking-Aware Contrastive Learning
C. Jung, S. Lee, K. Nam, K. Rho, Y. J. Kim, Y. Jang, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing PDF
Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model
S. Lee, C. Jung, Y. Jang, J. Kim, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing PDF
Let There Be Sound: Reconstructing High Quality Speech from Silent Videos
J. Kim, J. Kim, J. S. Chung
AAAI Conference on Artificial Intelligence PDF | Project page
Can CLIP Help Sound Source Localization?
S. Park, A. Senocak, J. S. Chung
Winter Conference on Applications of Computer Vision PDF
2023
That's What I Said: Fully-Controllable Talking Face Generation
Y. Jang, K. Rho, J. Woo, H. Lee, J. Park, Y. Lim, B. Kim, J. S. Chung
ACM International Conference on Multimedia PDF | Project page
Sound Source Localization is All about Cross-Modal Alignment
A. Senocak, H. Ryu, J. Kim, T. Oh, H. Pfister, J. S. Chung
International Conference on Computer Vision PDF
Disentangled Representation Learning for Multilingual Speaker Recognition
K. Nam, Y. Kim, J. Huh, H. Heo, J. Jung, J. S. Chung
Interspeech PDF | Project page
Curriculum learning for self-supervised speaker verification
H. Heo, J. Jung, J. Kang, Y. Kwon, B. Lee, Y. J. Kim, J. S. Chung
Interspeech PDF
Self-sufficient framework for continuous sign language recognition
Y. Jang, Y. Oh, J. W. Cho, M. Kim, D. Kim, I. S. Kweon, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing PDF | Project page
Metric learning for user-defined keyword spotting
J. Jung, Y. Kim, J. Park, Y. Lim, B. Kim, Y. Jang, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing PDF | Project page
Hindi as a second language: improving visually grounded speech with semantically similar samples
H. Ryu, A. Senocak, I. S. Kweon, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing PDF
MarginNCE: Robust Sound Localization with a Negative Margin
S. Park, A. Senocak, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing PDF
Advancing the dimensionality reduction of speaker embeddings for speaker diarisation: disentangling noise and informing speech activity
Y. J. Kim, H. Heo, J. Jung, Y. Kwon, B. Lee, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing PDF
In search of strong embedding extractors for speaker diarisation
J. Jung, B. Lee, J. Huh, A. Brown, Y. Kwon, S. Watanabe, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing PDF
Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech
J. Lee, J. S. Chung, S. Chung
International Conference on Acoustics, Speech, and Signal Processing PDF
2022
Signing Outside the Studio: Benchmarking Background Robustness for Continuous Sign Language Recognition
Y. Jang, Y. Oh, J. W. Cho, D. Kim, J. S. Chung, I. S. Kweon
British Machine Vision Conference PDF | Project page
Augmentation adversarial training for self-supervised speaker representation learning
J. Kang, J. Huh, H. Heo, J. S. Chung
IEEE Journal of Selected Topics in Signal Processing PDF
Pushing the limits of raw waveform speaker recognition
J. Jung, Y. J. Kim, H. Heo, B. Lee, Y. Kwon, J. S. Chung
Interspeech PDF
Spell my name: Keyword boosted speech recognition
N. Jung, G. Kim, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing PDF
Multi-scale speaker embedding-based graph attention networks for speaker diarisation
Y. Kwon, H. Heo, J. Jung, Y. J. Kim, B. Lee, J. S. Chung
International Conference on Acoustics, Speech, and Signal Processing PDF
AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks
J. Jung, H. Heo, H. Tak, H. Shim, J. S. Chung, B. Lee, H. Yu, N. Evans
International Conference on Acoustics, Speech, and Signal Processing PDF