Publications


2022

  • Spell my name: Keyword boosted speech recognition
    N. Jung, G. Kim, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF

  • Multi-scale speaker embedding-based graph attention networks for speaker diarisation
    Y. Kwon, H. Heo, J. Jung, Y. Kim, B. Lee, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF

  • AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks
    J. Jung, H. Heo, H. Tak, H. Shim, J. S. Chung, B. Lee, H. Yu, N. Evans
    International Conference on Acoustics, Speech, and Signal Processing
    PDF

2021

  • Adapting Speaker Embeddings for Speaker Diarization
    Y. Kwon, J. Jung, H. Heo, Y. Kim, B. Lee, J. S. Chung
    Interspeech
    PDF

  • Three-class Overlapped Speech Detection using a Convolutional Recurrent Neural Network
    J. Jung, H. Heo, Y. Kwon, J. S. Chung, B. Lee
    Interspeech
    PDF

  • Look Who's Talking: Active Speaker Detection in the Wild
    Y. Kim, H. Heo, S. Choe, S. Chung, Y. Kwon, B. Lee, Y. Kwon, J. S. Chung
    Interspeech
    PDF | Dataset

  • Playing a Part: Speaker Verification at the Movies
    A. Brown, J. Huh, A. Nagrani, J. S. Chung, A. Zisserman
    International Conference on Acoustics, Speech, and Signal Processing
    PDF

  • The ins and outs of speaker recognition: lessons from VoxSRC 2020
    Y. Kwon, H. Heo, B. Lee, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF

  • Graph Attention Networks for Speaker Verification
    J. Jung, H. Heo, H. Yu, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF

  • Look who's not talking
    Y. Kwon, H. Heo, J. Huh, B. Lee, J. S. Chung
    IEEE Spoken Language Technology Workshop
    Best Paper Finalist
    PDF

  • Metric Learning for Keyword Spotting
    J. Huh, M. Lee, H. Heo, S. Mun, J. S. Chung
    IEEE Spoken Language Technology Workshop
    PDF

  • Cross attentive pooling for speaker verification
    S. Kye, Y. Kwon, J. S. Chung
    IEEE Spoken Language Technology Workshop
    PDF

  • Supervised attention for speaker recognition
    S. Kye, J. S. Chung, H. Kim
    IEEE Spoken Language Technology Workshop
    PDF

2020

  • Perfect Match: Self-Supervised Embeddings for Cross-modal Retrieval
    S. W. Chung, J. S. Chung, H. G. Kang
    IEEE Journal of Selected Topics in Signal Processing
    PDF

  • Augmentation adversarial training for self-supervised speaker recognition
    J. Huh, H. Heo, J. Kang, S. Watanabe, J. S. Chung
    Workshop on Self-Supervised Learning for Speech and Audio Processing, NeurIPS
    PDF

  • FaceFilter: Audio-visual speech separation using still images
    S. W. Chung, S. Choe, J. S. Chung, H. G. Kang
    Interspeech
    Best Student Paper Award
    PDF | Video

  • Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision
    S. W. Chung, H. G. Kang, J. S. Chung
    Interspeech
    PDF

  • Spot the conversation: speaker diarisation in the wild
    J. S. Chung*, J. Huh*, A. Nagrani*, T. Afouras, A. Zisserman
    Interspeech
    PDF | Project page

  • Now you’re speaking my language: Visual language identification
    T. Afouras, J. S. Chung, A. Zisserman
    Interspeech
    PDF | Project page

  • In defence of metric learning for speaker recognition
    J. S. Chung, J. Huh, S. Mun, M. Lee, H. Heo, S. Choe, C. Ham, S. Jung, B. Lee, I. Han
    Interspeech
    PDF | Code

  • Self-supervised learning of audio-visual objects from video
    T. Afouras, A. Owens, J. S. Chung, A. Zisserman
    European Conference on Computer Vision
    PDF

  • BSL-1K: Scaling up co-articulated sign recognition using mouthing cues
    S. Albanie, G. Varol, L. Momeni, T. Afouras, J. S. Chung, N. Fox, A. Zisserman
    European Conference on Computer Vision
    PDF

  • Delving into VoxCeleb: environment invariant speaker recognition
    J. S. Chung*, J. Huh*, S. Mun
    Speaker Odyssey
    PDF

  • ASR is all you need: Cross-modal distillation for lip reading
    T. Afouras, J. S. Chung, A. Zisserman
    International Conference on Acoustics, Speech, and Signal Processing
    PDF

  • Disentangled Speech Embeddings using Cross-Modal Self-Supervision
    A. Nagrani*, J. S. Chung*, S. Albanie*, A. Zisserman
    International Conference on Acoustics, Speech, and Signal Processing
    PDF

  • The sound of my voice: speaker representation loss for target voice separation
    S. Mun, S. Choe, J. Huh, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF

2019

  • Deep Audio-Visual Speech Recognition
    T. Afouras*, J. S. Chung*, A. Senior, O. Vinyals, A. Zisserman
    IEEE Transactions on Pattern Analysis and Machine Intelligence
    PDF | Dataset

  • You said that?: Synthesising talking faces from audio
    A. Jamaludin*, J. S. Chung*, A. Zisserman
    International Journal of Computer Vision
    PDF

  • VoxCeleb: Large-scale speaker verification in the wild
    A. Nagrani*, J. S. Chung*, W. Xie, A. Zisserman
    Computer Speech and Language
    PDF

  • Who said that?: Audio-visual speaker diarisation of real-world meetings
    J. S. Chung, B. Lee, I. Han
    Interspeech
    PDF

  • My lips are concealed: Audio-visual speech enhancement through obstructions
    T. Afouras, J. S. Chung, A. Zisserman
    Interspeech
    PDF | Project page

  • Naver at ActivityNet Challenge 2019 - Task B Active Speaker Detection (AVA)
    J. S. Chung
    International Challenge on Activity Recognition
    PDF

  • Utterance-level Aggregation For Speaker Recognition In The Wild
    W. Xie, A. Nagrani, J. S. Chung, A. Zisserman
    International Conference on Acoustics, Speech, and Signal Processing
    PDF | Project page

  • Perfect match: Improved cross-modal embeddings for audio-visual synchronisation
    S. W. Chung, J. S. Chung, H. G. Kang
    International Conference on Acoustics, Speech, and Signal Processing
    PDF | Model

2018

  • Learning to Lip Read Words by Watching Videos
    J. S. Chung, A. Zisserman
    Computer Vision and Image Understanding
    PDF

  • VoxCeleb2: Deep Speaker Recognition
    J. S. Chung*, A. Nagrani*, A. Zisserman
    Interspeech
    PDF | Dataset | Mirror

  • The Conversation: Deep Audio-Visual Speech Enhancement
    T. Afouras, J. S. Chung, A. Zisserman
    Interspeech
    PDF | Project page

  • Deep Lip Reading: a comparison of models and an online application
    T. Afouras, J. S. Chung, A. Zisserman
    Interspeech
    PDF | Project page

2017

  • VoxCeleb: a large-scale speaker identification dataset
    A. Nagrani*, J. S. Chung*, A. Zisserman
    Interspeech
    Best Student Paper Award
    PDF | Dataset | Mirror

  • Lip Reading in Profile
    J. S. Chung, A. Zisserman
    British Machine Vision Conference
    PDF

2016

  • Out of time: automated lip sync in the wild
    J. S. Chung, A. Zisserman
    Workshop on Multi-view Lip-reading, ACCV
    PDF | Project page

  • Lip Reading in the Wild
    J. S. Chung, A. Zisserman
    Asian Conference on Computer Vision
    Best Student Paper Award
    PDF | Dataset

  • Signs in time: Encoding human motion as a temporal image
    J. S. Chung, A. Zisserman
    Workshop on Brave New Ideas for Motion Representations, ECCV
    PDF | Video

Preprints and Technical Reports

  • K-Celeb: a collaborative approach to face dataset curation
    J. H. Bae, B. Bebensee, et al.
    Seoul National University
    PDF

  • VoxSRC 2020: The Second VoxCeleb Speaker Recognition Challenge
    A. Nagrani, J. S. Chung, J. Huh, A. Brown, E. Coto, W. Xie, M. McLaren, D. Reynolds, A. Zisserman
    arXiv:2012.06867
    PDF

  • VoxSRC 2019: The first VoxCeleb Speaker Recognition Challenge
    J. S. Chung, A. Nagrani, E. Coto, W. Xie, M. McLaren, D. Reynolds, A. Zisserman
    arXiv:1912.02522
    PDF

  • LRS3-TED: a large-scale dataset for visual speech recognition
    T. Afouras, J. S. Chung, A. Zisserman
    arXiv:1809.00496
    PDF | Dataset

