VoxCeleb
Large-scale audio-visual datasets of human speech

7,000+ speakers
VoxCeleb contains speech from speakers spanning a wide range of ethnicities, accents, professions and ages.

1 million+ utterances
All speaking face-tracks are captured "in the wild", with background chatter, laughter, overlapping speech, pose variation and varying lighting conditions.

2,000+ hours
VoxCeleb consists of both audio and video. Each segment is at least 3 seconds long.


URLs and timestamps

We provide the URL of each YouTube video and timestamps for the utterances. Frame numbers assume the video is saved at 25 fps.
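Given the stated 25 fps, frame indices can be converted to timestamps directly; a minimal sketch (the frame value below is only an illustration, not taken from the dataset):

```python
# Convert VoxCeleb frame indices to timestamps, assuming the stated 25 fps.
FPS = 25

def frame_to_seconds(frame: int, fps: int = FPS) -> float:
    """Map a frame number to a timestamp in seconds."""
    return frame / fps

def frames_to_hms(frame: int, fps: int = FPS) -> str:
    """Format a frame number as HH:MM:SS.mmm, e.g. for seeking with ffmpeg."""
    total = frame / fps
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

# Example: frame 4575 at 25 fps is 183.0 s into the video.
print(frames_to_hms(4575))  # 00:03:03.000
```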

File   MD5 checksum
Dev    9c3b51e34038d1bdb2174dcc66543267
Test   8e06592a5f604e23e8cd10f421b36cc3

File   MD5 checksum
Dev    0e7a9f083c4efc27982f748f5f0b540a
Test   f305b5347c9c45362b7c838b561cea7d
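Before unpacking, it is worth verifying a downloaded archive against the checksums above; a minimal sketch using the standard library (the archive filename in the usage comment is a placeholder, not the real file name):

```python
# Verify a downloaded archive against its published MD5 checksum.
import hashlib

def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through MD5 in 1 MiB chunks and return the hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path: str, expected: str) -> bool:
    """True if the file's MD5 digest matches the published checksum."""
    return md5sum(path) == expected.lower()

# Usage, once the dev archive has been downloaded (placeholder name):
# verify("vox_dev.zip", "9c3b51e34038d1bdb2174dcc66543267")
```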

Audio and video files

You can request the audio-visual dataset here.

Trial pairs for speaker verification

List of trial pairs - VoxCeleb1
List of trial pairs - VoxCeleb1 (cleaned)
List of trial pairs - VoxCeleb1-H
List of trial pairs - VoxCeleb1-H (cleaned)
List of trial pairs - VoxCeleb1-E
List of trial pairs - VoxCeleb1-E (cleaned)

The VoxCeleb1-E and VoxCeleb1-H lists are drawn from the VoxCeleb1 training set, so you cannot use any files in VoxCeleb1 for training if you use these lists for testing.
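A sketch of reading a trial list and scoring it. It assumes each line has a "label path1 path2" layout, where label 1 means same speaker and 0 means different; check the exact layout against your downloaded list, as it is an assumption here. The equal error rate is computed with a plain threshold sweep:

```python
# Parse a speaker-verification trial list and compute the equal error rate
# (EER) from per-trial scores. Line layout "label utt1 utt2" is assumed.
from typing import List, Tuple

def read_trials(path: str) -> List[Tuple[int, str, str]]:
    """Parse a trial list into (label, utterance1, utterance2) tuples."""
    trials = []
    with open(path) as f:
        for line in f:
            label, utt1, utt2 = line.split()
            trials.append((int(label), utt1, utt2))
    return trials

def equal_error_rate(labels: List[int], scores: List[float]) -> float:
    """EER: the operating point where false-accept and false-reject rates meet."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = fp = 0
    eer, best_gap = 1.0, float("inf")
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        far = fp / n_neg        # negatives wrongly accepted
        frr = 1 - tp / n_pos    # positives wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap = abs(far - frr)
            eer = (far + frr) / 2
    return eer
```

With perfectly separated scores the EER is 0; chance-level scoring gives roughly 0.5.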


License

The VoxCeleb dataset is available to download for research purposes under a Creative Commons Attribution 4.0 International License. The copyright remains with the original owners of the video. A complete version of the license can be found here.


Citation

Please cite the following if you make use of the dataset.

  • VoxCeleb: a large-scale speaker identification dataset
    A. Nagrani*, J. S. Chung*, A. Zisserman
    Interspeech, 2017

  • VoxCeleb2: Deep Speaker Recognition
    J. S. Chung*, A. Nagrani*, A. Zisserman
    Interspeech, 2018

  • VoxCeleb: Large-scale speaker verification in the wild
    A. Nagrani*, J. S. Chung*, W. Xie, A. Zisserman
    Computer Speech and Language, 2019