Publications
2024

  • Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
    M. H. Erol, A. Senocak, J. Feng, J. S. Chung
    IEEE Signal Processing Letters
    PDF
  • Bridging the Gap between Audio and Text using Parallel-attention for User-defined Keyword Spotting
    Y. Kim, J. Jung, J. Park, B. Kim, J. S. Chung
    IEEE Signal Processing Letters
    PDF
  • Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding
    J. Woo, H. Ryu, Y. Jang, J. W. Cho, J. S. Chung
    ACM International Conference on Multimedia
    PDF
  • VoxSim: A perceptual voice similarity dataset
    J. Ahn, Y. Kim, Y. Choi, D. Kwak, J. Kim, S. Mun, J. S. Chung
    Interspeech
    PDF
  • Lightweight Audio Segmentation for Long-form Speech Translation
    J. Lee, S. Kim, H. Kim, J. S. Chung
    Interspeech
    PDF
  • To what extent can ASV systems naturally defend against spoofing attacks?
    J. Jung, X. Wang, N. Evans, S. Watanabe, H. Shim, H. Tak, S. Arora, J. Yamagishi, J. S. Chung
    Interspeech
    PDF
  • Disentangled Representation Learning for Environment-agnostic Speaker Recognition
    K. Nam, H. Heo, J. Jung, J. S. Chung
    Interspeech
    PDF
  • FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching
    C. Jung, S. Lee, J. Kim, J. S. Chung
    Interspeech
    PDF
  • EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning
    J. Kim, H. Lee, K. Rho, J. Kim, J. S. Chung
    International Conference on Machine Learning
    PDF
  • Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
    Y. Jang, J. Kim, J. Ahn, D. Kwak, H. Yang, Y. Ju, I. Kim, B. Kim, J. S. Chung
    IEEE Conference on Computer Vision and Pattern Recognition
    PDF
  • Scaling Up Video Summarization Pretraining with Large Language Models
    D. M. Argaw, S. Yoon, F. C. Heilbron, H. Deilamsalehy, T. Bui, Z. Wang, F. Dernoncourt, J. S. Chung
    IEEE Conference on Computer Vision and Pattern Recognition
    PDF
  • Towards Automated Movie Trailer Generation
    D. M. Argaw, M. Soldan, A. Pardo, C. Zhao, F. C. Heilbron, J. S. Chung, B. Ghanem
    IEEE Conference on Computer Vision and Pattern Recognition
    PDF
  • SlowFast Network for Continuous Sign Language Recognition
    J. Ahn, Y. Jang, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification
    H. Heo, K. Nam, B. Lee, Y. Kwon, M. Lee, Y. J. Kim, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • Speech Guided Masked Image Modeling for Visually Grounded Speech
    J. Woo, H. Ryu, A. Senocak, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • VoxMM: Rich Transcription of Conversations in the Wild
    D. Kwak, J. Jung, K. Nam, Y. Jang, J. Jung, S. Watanabe, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • From Coarse To Fine: Efficient Training for Audio Spectrogram Transformers
    J. Feng, M. H. Erol, J. S. Chung, A. Senocak
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • VoiceLDM: Text-to-Audio Generation with Linguistic Content
    Y. Lee, I. Yeon, J. Nam, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF Project page
  • TalkNCE: Improving Active Speaker Detection with Talking-Aware Contrastive Learning
    C. Jung, S. Lee, K. Nam, K. Rho, Y. J. Kim, Y. Jang, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model
    S. Lee, C. Jung, Y. Jang, J. Kim, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • Let There Be Sound: Reconstructing High Quality Speech from Silent Videos
    J. Kim, J. Kim, J. S. Chung
    AAAI Conference on Artificial Intelligence
    PDF Project page

2023

  • That's What I Said: Fully-Controllable Talking Face Generation
    Y. Jang, K. Rho, J. Woo, H. Lee, J. Park, Y. Lim, B. Kim, J. S. Chung
    ACM International Conference on Multimedia
    PDF Project page
  • Sound Source Localization is All about Cross-Modal Alignment
    A. Senocak, H. Ryu, J. Kim, T. Oh, H. Pfister, J. S. Chung
    International Conference on Computer Vision
    PDF
  • Curriculum learning for self-supervised speaker verification
    H. Heo, J. Jung, J. Kang, Y. Kwon, B. Lee, Y. J. Kim, J. S. Chung
    Interspeech
    PDF
  • Self-sufficient framework for continuous sign language recognition
    Y. Jang, Y. Oh, J. W. Cho, M. Kim, D. Kim, I. S. Kweon, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF Project page
  • Metric learning for user-defined keyword spotting
    J. Jung, Y. Kim, J. Park, Y. Lim, B. Kim, Y. Jang, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF Project page
  • Hindi as a second language: improving visually grounded speech with semantically similar samples
    H. Ryu, A. Senocak, I. S. Kweon, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • MarginNCE: Robust Sound Localization with a Negative Margin
    S. Park, A. Senocak, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • Advancing the dimensionality reduction of speaker embeddings for speaker diarisation: disentangling noise and informing speech activity
    Y. J. Kim, H. Heo, J. Jung, Y. Kwon, B. Lee, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • In search of strong embedding extractors for speaker diarisation
    J. Jung, B. Lee, J. Huh, A. Brown, Y. Kwon, S. Watanabe, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech
    J. Lee, J. S. Chung, S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF

2022

  • Signing Outside the Studio: Benchmarking Background Robustness for Continuous Sign Language Recognition
    Y. Jang, Y. Oh, J. W. Cho, D. Kim, J. S. Chung, I. S. Kweon
    British Machine Vision Conference
    PDF Project page
  • Augmentation adversarial training for self-supervised speaker representation learning
    J. Kang, J. Huh, H. Heo, J. S. Chung
    IEEE Journal of Selected Topics in Signal Processing
    PDF
  • Pushing the limits of raw waveform speaker recognition
    J. Jung, Y. J. Kim, H. Heo, B. Lee, Y. Kwon, J. S. Chung
    Interspeech
    PDF
  • Spell my name: Keyword boosted speech recognition
    N. Jung, G. Kim, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • Multi-scale speaker embedding-based graph attention networks for speaker diarisation
    Y. Kwon, H. Heo, J. Jung, Y. J. Kim, B. Lee, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF
  • AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks
    J. Jung, H. Heo, H. Tak, H. Shim, J. S. Chung, B. Lee, H. Yu, N. Evans
    International Conference on Acoustics, Speech, and Signal Processing
    PDF