Lip Reading Sentences 3 Dataset

Overview

The dataset consists of over 150,000 spoken sentences (utterances) from TED and TEDx videos. There is no overlap between the videos used to create the test set and those used for the pre-train and trainval sets. The dataset statistics are given in the table below.

Set	# videos	# utterances	# word instances	Vocab
Pre-train	5,090	118,516	3.9M	51k
Trainval	4,004	31,982	358k	17k
Test	412	1,321	10k	2k

The Lip Reading Sentences 3 Languages (LRS3-Lang) dataset is an extended version of LRS3 (English-only) covering 13 different languages.

Downloads

URLs and timestamps

For every sample we provide: (i) the URL (the 'ref' entry in the text file) and frame IDs of the original YouTube video it was created from; (ii) the face detection bounding box for every frame; and (iii) the word boundary timestamps (pre-train set only). The frame numbers assume that the video is sampled at 25 fps.
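
Because the frame numbers assume 25 fps sampling, mapping a frame number to a position in the original video is a simple division. The following is a minimal sketch of that conversion, not code shipped with the dataset: the function names are hypothetical, and it assumes frame numbers count from 0 at the start of the video, which the description above does not state.

```python
FPS = 25.0  # sampling rate the dataset's frame numbers assume


def frame_to_seconds(frame: int, fps: float = FPS) -> float:
    """Convert a frame number to a timestamp in seconds.

    Assumes frame 0 is the first frame of the video (an assumption,
    not something the dataset description confirms).
    """
    if frame < 0:
        raise ValueError("frame number must be non-negative")
    return frame / fps


def seconds_to_frame(seconds: float, fps: float = FPS) -> int:
    """Convert a timestamp in seconds to the nearest frame number."""
    if seconds < 0:
        raise ValueError("timestamp must be non-negative")
    return round(seconds * fps)
```

For example, `frame_to_seconds(250)` gives 10.0 seconds. If a downloaded video has a different native frame rate, it would need to be resampled to 25 fps before the provided frame numbers and bounding boxes line up with its frames.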

Downloads are temporarily unavailable from this website until further notice.

Video files

Downloads are temporarily unavailable from this website until further notice.

License

The LRS3 dataset is available to download for research purposes under a Creative Commons Attribution 4.0 International License. The copyright remains with the original owners of the video. A complete version of the license can be found here.

Publications

Please cite the following if you make use of the dataset.

LRS3-TED: a large-scale dataset for visual speech recognition
T. Afouras, J. S. Chung, A. Zisserman
arXiv preprint arXiv:1809.00496