VoxSim is a dataset containing perceptual voice similarity ratings of VoxCeleb1 speech clips. There is no speaker overlap between the clips used to create the test set and those used for the train set. The dataset statistics are given in the table below.
Set | # speakers | # speaker combinations | # pairs | # ratings |
Train | 1,142 | 24,764 | 38,802 | 63,845 |
Test | 109 | 904 | 2,776 | 5,564 |
Total | 1,251 | 25,668 | 41,578 | 69,409 |
Train List (raw score)
Train List (average score per utterance pair)
Test List (csv)
Each line represents {clip1_path},{clip2_path},{same_or_different_speaker},{listener_id},{rating}.
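As a minimal sketch of how a list in the five-field format above might be read, the Python snippet below parses each line and averages the ratings per clip pair. It assumes the file has no header row, that fields contain no embedded commas, and that the rating is numeric. The function names, clip paths, and listener IDs in the sample are hypothetical and are not taken from the dataset.

```python
import csv
from collections import defaultdict


def load_ratings(lines):
    """Parse lines of clip1_path,clip2_path,same_or_different_speaker,listener_id,rating."""
    rows = []
    for fields in csv.reader(lines):
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        clip1, clip2, same_flag, listener_id, rating = (f.strip() for f in fields)
        # Assumption: rating is numeric; same_flag is kept as a raw string.
        rows.append((clip1, clip2, same_flag, listener_id, float(rating)))
    return rows


def mean_rating_per_pair(rows):
    """Average all listener ratings that share the same (clip1, clip2) pair."""
    totals = defaultdict(lambda: [0.0, 0])  # pair -> [sum of ratings, count]
    for clip1, clip2, _same_flag, _listener_id, rating in rows:
        acc = totals[(clip1, clip2)]
        acc[0] += rating
        acc[1] += 1
    return {pair: total / count for pair, (total, count) in totals.items()}


# Hypothetical sample lines, for illustration only.
sample_lines = [
    "id10001/a/00001.wav,id10002/b/00001.wav,0,L01,2",
    "id10001/a/00001.wav,id10002/b/00001.wav,0,L02,4",
    "id10003/c/00001.wav,id10003/d/00002.wav,1,L01,6",
]
rows = load_ratings(sample_lines)
means = mean_rating_per_pair(rows)
```

To read a downloaded list, an open file handle can be passed in place of `sample_lines`, for example `load_ratings(open(path, newline=""))`, where `path` points to the local copy of the list.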
To download the VoxCeleb1 dataset, please refer to here.
The VoxSim dataset is available to download for research purposes under a Creative Commons Attribution 4.0 International License. A complete version of the license can be found here.
@inproceedings{voxsim,
author={Junseok Ahn and Youkyum Kim and Yeunju Choi and Doyeop Kwak and Ji-Hoon Kim and Seongkyu Mun and Joon Son Chung},
booktitle={Proc. Interspeech 2024},
title={VoxSim: A perceptual voice similarity dataset},
year={2024},
}
This work was supported by Samsung Research.