The dataset consists of thousands of spoken sentences from TED and TEDx videos. There is no overlap between the videos used to create the test set and the ones used for the pre-train and trainval sets. The dataset statistics are given in the table below.
| Set | # videos | # utterances | # word instances | Vocabulary size |
The Lip Reading Sentences 3 Languages (LRS3-Lang) dataset is an extended version of LRS3 (English-only) covering 13 different languages.
For every sample we provide: i) the URL (the 'ref' entry in the text file) and the frame ids of the original YouTube video from which the sample was created; ii) the face detection bounding box for every frame; iii) the word boundary timestamps (pre-train set only). The frame numbers assume that the video is sampled at 25 fps.
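Because the frame numbers are tied to a 25 fps sampling rate, locating a sample in the original video comes down to a simple conversion between frame ids and timestamps. The sketch below illustrates that arithmetic only. The function names are hypothetical, and it assumes frame ids are zero-based (frame 0 at time 0), which the description above does not state; shift by one if the files turn out to be one-based.

```python
from fractions import Fraction

# Frame rate stated in the dataset description.
FPS = 25


def frame_to_seconds(frame_id: int, fps: int = FPS) -> float:
    """Convert a frame id to a timestamp in seconds.

    Assumption: frame ids are zero-based, so frame 0 maps to 0.0 s.
    """
    if frame_id < 0:
        raise ValueError("frame_id must be non-negative")
    # Fraction keeps the division exact before converting to float.
    return float(Fraction(frame_id, fps))


def seconds_to_frame(timestamp: float, fps: int = FPS) -> int:
    """Convert a timestamp in seconds to the nearest frame id."""
    if timestamp < 0:
        raise ValueError("timestamp must be non-negative")
    return int(round(timestamp * fps))


if __name__ == "__main__":
    # A sample spanning frames 250 to 325 covers 10.0 s to 13.0 s.
    print(frame_to_seconds(250), frame_to_seconds(325))
```

If a video was downloaded at a different frame rate, it would need to be resampled to 25 fps (or the ids rescaled) before the bounding boxes line up with the frames.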
If you require the video files (loosely cropped face region), please fill in this form to request a password.
A password issued for the LRW or LRS2 dataset can also be used to download LRS3.
Please cite the following if you make use of the dataset.
LRS3-TED: a large-scale dataset for visual speech recognition
T. Afouras, J. S. Chung, A. Zisserman
arXiv preprint arXiv:1809.00496