
The VoxSRC Workshop 2022

Welcome to the VoxSRC Workshop 2022! The workshop includes presentations from the most exciting and novel submissions to the VoxCeleb Speaker Recognition Challenge (VoxSRC), as well as the announcement of the challenge winners.

The workshop was held in conjunction with Interspeech 2022.

VoxSRC 2022 was a hybrid workshop, with both in-person and virtual attendance options. It took place at 5pm KST on Thursday 22nd September 2022 (8am UTC), at Room 110 of Incheon Convensia. Please find our workshop report below.

Information about all editions of this workshop series can be found on this website.

VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge
Jaesung Huh, Andrew Brown, Jee-weon Jung, Joon Son Chung, Arsha Nagrani, Daniel Garcia-Romero, Andrew Zisserman
arXiv, 2023.

Schedule

The workshop was held from 5:00pm to 8:00pm Korea Standard Time (KST).

5:00pm Introduction of dataset and challenges [slides] [video]
5:25pm Keynote speech: Junichi Yamagishi, "The use of speaker embeddings in neural audio generation" [slides] [video]
6:15pm 10 min break
6:25pm Announcement of Winners (Tracks 1 and 2)
6:30pm Invited Talks from Tracks 1 and 2 [video]
7:10pm Announcement of Winners (Track 3)
7:12pm Invited Talks from Track 3 [video]
7:30pm Announcement of Winners (Track 4)
7:35pm Invited Talks from Track 4 [video]
7:55pm Wrap up discussion and conclusion

Introduction


Participant talks


Tracks 1 and 2

Team ravana - ID R&D [slides]
Team KristonAI [slides]
Team SJTU-AIspeech [slides]
Team Strasbourg_spk [slides]

Track 3

Team zzdddz [slides]
Team DKU-Tencent [slides]

Track 4

Team DKU-DukeECE [slides]
Team KristonAI [slides]
Team AiTER [slides]

Technical reports


Team                          Tracks    Report
ravana - ID R&D               1, 2      PDF
KristonAI                     1, 2, 4   PDF
SJTU-AIspeech                 1, 3      arXiv
Strasbourg_spk                2         arXiv
NSYSU-CHT                     1, 2, 3   PDF
ReturnZero                    1         arXiv
zzdddz                        1, 3      PDF
DKU-Tencent                   1, 3      PDF
Royalflush                    1, 3      arXiv
DKU-DukeECE                   4         PDF
AiTER                         4         arXiv
Pyannote                      4         PDF
BUCEA                         4         arXiv
HYU                           3, 4      PDF
Newsbridge-Telecom SudParis   4         PDF

Workshop Registration

Registration for this workshop is now closed.

Keynote Speaker



Junichi Yamagishi

Title

The use of speaker embeddings in neural audio generation

Abstract

Neural speaker embedding vectors are becoming an essential technology not only in speaker recognition but also in speech synthesis. In this talk, I will first outline how speaker embedding vectors are used in voice conversion, where one speaker's voice is converted into another speaker's voice, and in multi-speaker text-to-speech (TTS) systems, where a single model can synthesize natural-sounding voices of multiple speakers from input sentences. Then I will explain how the performance of speaker vectors in the speaker recognition task relates to the speaker similarity of the synthesized voices. The latest performance of voice conversion systems will also be presented, based on the results of the Voice Conversion Challenge 2020.
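As a concrete illustration of the link between the two tasks: speaker verification systems typically score a trial by the cosine similarity between two embedding vectors, and the same score can serve as a proxy for how closely a synthesized voice matches its target speaker. Below is a minimal NumPy sketch of this scoring; the embedding dimension, the synthetic vectors, and the decision threshold are illustrative placeholders, not values from the talk.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder 192-dim embeddings (a common ECAPA-TDNN output size).
# In practice these come from a trained speaker encoder applied to an
# enrollment utterance and a test (or synthesized) utterance.
rng = np.random.default_rng(0)
enrolled = rng.standard_normal(192)
test = enrolled + 0.3 * rng.standard_normal(192)  # same-speaker-like trial

score = cosine_similarity(enrolled, test)
# Real systems calibrate the threshold on held-out trials; 0.5 is arbitrary.
print("accept" if score > 0.5 else "reject", f"(score={score:.3f})")
```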

I will then introduce "speaker anonymization" as a new example of the use of speaker embeddings in the field of speech privacy. Speaker anonymization aims to convert only the speaker characteristics of the input speech, so that automatic speaker verification (ASV) systems cannot identify the original speaker, while preserving the usefulness of the anonymized audio for the downstream tasks the user wishes to perform. As an example of such speaker anonymization using speaker embedding vectors, I will present a language-independent speaker anonymization system built on ECAPA-TDNN, HuBERT, and HiFi-GAN, and show its excellent results under the VoicePrivacy Challenge metrics.
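One common way to realize the anonymizing pseudo-speaker in such systems (used, for example, in the VoicePrivacy Challenge baselines) is to replace the source speaker's embedding with an average over embeddings of distant speakers from an external pool. The sketch below shows only that selection step in NumPy; the pool, the cosine distance, and the pool size k are illustrative assumptions, and the HuBERT content extraction and HiFi-GAN resynthesis around it are omitted.

```python
import numpy as np

def pseudo_speaker_embedding(source: np.ndarray,
                             pool: np.ndarray,
                             k: int = 10) -> np.ndarray:
    """Anonymized embedding: mean of the k pool embeddings farthest
    from the source under cosine similarity (an illustrative choice)."""
    # Length-normalize so cosine similarity reduces to a dot product.
    src = source / np.linalg.norm(source)
    pool_n = pool / np.linalg.norm(pool, axis=1, keepdims=True)
    similarity = pool_n @ src              # cosine similarity to the source
    farthest = np.argsort(similarity)[:k]  # indices of the k least similar
    return pool_n[farthest].mean(axis=0)

# Placeholder pool of 200 speaker embeddings (192-dim).
rng = np.random.default_rng(0)
pool = rng.standard_normal((200, 192))
source = rng.standard_normal(192)

anon = pseudo_speaker_embedding(source, pool)
# In a full system, `anon` would condition the vocoder (e.g. HiFi-GAN)
# together with content features (e.g. from HuBERT) to resynthesize speech.
```

Averaging several distant speakers, rather than picking a single one, makes the resulting voice unlike the source while not impersonating any real speaker in the pool.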

Biography

Junichi Yamagishi (Senior Member, IEEE) received a Ph.D. from the Tokyo Institute of Technology, Tokyo, Japan, in 2006. From 2007 to 2013, he was a Research Fellow with the Centre for Speech Technology Research, The University of Edinburgh, Edinburgh, U.K. In 2013, he joined the National Institute of Informatics, Tokyo, Japan, as an Associate Professor, where he is currently a Professor. His research interests include speech processing, machine learning, signal processing, biometrics, digital media cloning, and media forensics.

He is a co-organizer of the biennial ASVspoof Challenge and the biennial Voice Conversion Challenge. He was a member of the IEEE Speech and Language Processing Technical Committee during 2013-2019, an Associate Editor of the IEEE/ACM Transactions on Audio, Speech, and Language Processing during 2014-2017, and chairperson of ISCA SynSIG during 2017-2021. He is currently a PI of the JST-CREST and ANR-supported VoicePersona project and a Senior Area Editor of the IEEE/ACM Transactions on Audio, Speech, and Language Processing.


Organisers

Jaesung Huh, VGG, University of Oxford
Andrew Brown, VGG, University of Oxford
Arsha Nagrani, Google Research
Joon Son Chung, KAIST, South Korea
Jee-weon Jung, Naver, South Korea
Andrew Zisserman, VGG, University of Oxford
Daniel Garcia-Romero, AWS AI

Advisors

Mitchell McLaren, Speech Technology and Research Laboratory, SRI International, CA
Douglas A. Reynolds, Lincoln Laboratory, MIT

Please contact jaesung[at]robots[dot]ox[dot]ac[dot]uk or abrown[at]robots[dot]ox[dot]ac[dot]uk if you have any queries, or if you would be interested in sponsoring this challenge.

Sponsors

VoxSRC is proudly sponsored by Naver/Line.

Acknowledgements

This work is supported by the EPSRC (Engineering and Physical Sciences Research Council) programme grant EP/T028572/1: Visual AI.