VoxCeleb
Large-scale audio-visual datasets of human speech

7,000+ speakers

VoxCeleb contains speech from speakers spanning a wide range of different ethnicities, accents, professions and ages.

1 million+ utterances

All speaking face-tracks are captured "in the wild", with background chatter, laughter, overlapping speech, pose variation and different lighting conditions.

2,000+ hours

VoxCeleb consists of both audio and video. Each segment is at least 3 seconds long.


Downloads

URLs and timestamps

We provide the URL of each YouTube video and timestamps for the utterances. The frame numbers provided assume that the video is saved at 25 fps.
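Because the frame numbers assume 25 fps, converting them to time offsets is a single division. The sketch below illustrates this; the function names are hypothetical, and it assumes zero-based frame indices and an inclusive frame range, neither of which is stated on this page.

```python
FPS = 25  # frame rate that the provided frame numbers assume


def frame_to_seconds(frame: int, fps: float = FPS) -> float:
    """Convert a frame index to a time offset in seconds."""
    if frame < 0:
        raise ValueError("frame index must be non-negative")
    return frame / fps


def seconds_to_frame(seconds: float, fps: float = FPS) -> int:
    """Convert a time offset in seconds to the nearest frame index."""
    return round(seconds * fps)


def utterance_span(first_frame: int, last_frame: int, fps: float = FPS):
    """Return (start_seconds, duration_seconds) for an inclusive frame range.

    Assumption: both endpoints belong to the utterance, so the range
    covers (last_frame - first_frame + 1) frames.
    """
    if last_frame < first_frame:
        raise ValueError("last_frame must not precede first_frame")
    start = frame_to_seconds(first_frame, fps)
    duration = (last_frame - first_frame + 1) / fps
    return start, duration
```

If a video was saved at a different frame rate, the frame numbers would need to be rescaled before being used to cut segments.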

VoxCeleb1
File            MD5 Checksum
Dev   Download  9c3b51e34038d1bdb2174dcc66543267
Test  Download  8e06592a5f604e23e8cd10f421b36cc3

VoxCeleb2
File            MD5 Checksum
Dev   Download  0e7a9f083c4efc27982f748f5f0b540a
Test  Download  f305b5347c9c45362b7c838b561cea7d
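The MD5 checksums above can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib`; the helper names are hypothetical, and the path you pass in is whatever name you saved the archive under.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected: str) -> bool:
    """Compare a file's MD5 digest with an expected hex string from the table."""
    return md5_of_file(path) == expected.strip().lower()
```

For example, `matches_checksum(saved_path, "9c3b51e34038d1bdb2174dcc66543267")` should return `True` for an intact copy of the VoxCeleb1 Dev file, where `saved_path` is your local file name.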

Audio and video files

You can request the audio-visual dataset here.

Trial pairs for speaker verification


List of trial pairs - VoxCeleb1
List of trial pairs - VoxCeleb1 (cleaned)
List of trial pairs - VoxCeleb1-H
List of trial pairs - VoxCeleb1-H (cleaned)
List of trial pairs - VoxCeleb1-E
List of trial pairs - VoxCeleb1-E (cleaned)

The VoxCeleb1-E and VoxCeleb1-H lists are drawn from the VoxCeleb1 training set. Therefore, if you use these lists for testing, you cannot use any files in VoxCeleb1 for training.
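A simple sanity check for this kind of train/test leakage is sketched below. It rests on assumptions not stated on this page: that each trial line has the form `label path_a path_b`, and that the first path component of each file is the speaker ID. The function names are hypothetical. Note that it only flags speaker overlap, whereas the rule above is stricter and excludes all VoxCeleb1 files from training.

```python
def trial_speakers(trial_lines):
    """Collect the speaker IDs that appear in a list of trial-pair lines.

    Assumed line format: "<label> <path_a> <path_b>", where each path
    starts with the speaker ID as its first directory component.
    """
    speakers = set()
    for line in trial_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _label, path_a, path_b = parts
        speakers.add(path_a.split("/")[0])
        speakers.add(path_b.split("/")[0])
    return speakers


def leaked_training_files(training_paths, trial_lines):
    """Return training files whose speaker also appears in the trial list."""
    held_out = trial_speakers(trial_lines)
    return [path for path in training_paths if path.split("/")[0] in held_out]
```

An empty result from `leaked_training_files` means that no training file shares a speaker with the trial list under the assumed layout.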

License

The VoxCeleb dataset is available to download for research purposes under a Creative Commons Attribution 4.0 International License. The copyright remains with the original owners of the video. A complete version of the license can be found here.

Publications

Please cite the following if you make use of the dataset.

  • VoxCeleb: a large-scale speaker identification dataset
    A. Nagrani*, J. S. Chung*, A. Zisserman
    Interspeech, 2017
    PDF

  • VoxCeleb2: Deep Speaker Recognition
    J. S. Chung*, A. Nagrani*, A. Zisserman
    Interspeech, 2018
    PDF

  • VoxCeleb: Large-scale speaker verification in the wild
    A. Nagrani*, J. S. Chung*, W. Xie, A. Zisserman
    Computer Speech and Language, 2019
    PDF