Lip Reading Sentences 3

Overview

The dataset consists of thousands of spoken sentences from TED and TEDx videos. There is no overlap between the videos used to create the test set and the ones used for the pre-train and trainval sets. The dataset statistics are given in the table below.

Set          # videos   # utterances   # word instances   Vocab
Pre-train    5,090      118,516        3.9M               51k
Trainval     4,004      31,982         358k               17k
Test         412        1,321          10k                2k

The Lip Reading Sentences 3 Languages (LRS3-Lang) dataset is an extended version of LRS3 (English-only) covering 13 different languages.

Downloads


Updates:
v0.3: We have had to remove a number of videos from the pre-train set, due to errors in the original version. If you downloaded the dataset before 28 October 2018, please remove these folders from the pre-train set only. There should be 118,516 video files in the pre-train set after removing these folders. Alternatively, you can re-download the pre-train set from the updated links on this page.
v0.4: We have removed a number of videos from the test set, due to overlapping identities and duplicate videos between the training and test sets. Please replace the test set only. There should be 412 folders and 1,321 utterances in the updated version.
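
The counts quoted in the update notes can be checked locally after downloading. The following is a minimal sketch, not an official tool. It assumes a layout of one folder per video with one video file per utterance inside it, and it assumes the utterance files end in .mp4; neither assumption is stated on this page, so adjust the suffix and the path to match your copy. The function name count_dataset is hypothetical.

```python
from pathlib import Path
from typing import Tuple


def count_dataset(root: str, video_suffix: str = ".mp4") -> Tuple[int, int]:
    """Return (number of video folders, number of utterance files) under root.

    Assumed layout (not confirmed by the dataset page):
        root/<video_id>/<utterance_id><video_suffix>
    """
    root_path = Path(root)
    # Each immediate sub-directory is treated as one source video.
    folders = [p for p in root_path.iterdir() if p.is_dir()]
    # Each file with the chosen suffix inside a folder is treated as one utterance.
    utterances = sum(
        1
        for folder in folders
        for entry in folder.iterdir()
        if entry.is_file() and entry.suffix.lower() == video_suffix
    )
    return len(folders), utterances
```

Under these assumptions, calling count_dataset on the extracted test set would be expected to return (412, 1321) for v0.4, and the second value for the pre-train set would be expected to be 118516 for v0.3.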

URLs and timestamps

For every sample we provide: i) the URL ('ref' entry in the text file) and frame ids of the original YouTube video it was created from, ii) the face detection bounding box for every frame, iii) the word boundary timestamps (pre-train set only). The frame numbers provided assume that the video is sampled at 25fps.
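
Because the frame numbers assume 25fps sampling, a frame id can be mapped to a time offset in the original video with simple arithmetic. The sketch below illustrates the conversion. Whether frame ids start at 0 or 1 is not stated on this page, so the first_frame parameter is an assumption to be checked against the text files; the function names are hypothetical.

```python
import math

FPS = 25.0  # sampling rate stated on this page


def frame_to_seconds(frame_id: int, fps: float = FPS, first_frame: int = 0) -> float:
    """Convert a frame id to a time offset in seconds.

    first_frame is the id of the first frame of the video; 0 is an
    assumption, not something the dataset page specifies.
    """
    if frame_id < first_frame:
        raise ValueError("frame_id precedes first_frame")
    return (frame_id - first_frame) / fps


def seconds_to_frame(seconds: float, fps: float = FPS, first_frame: int = 0) -> int:
    """Return the id of the frame that is on screen at the given time offset."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return first_frame + math.floor(seconds * fps)
```

For example, with zero-based ids, frame 50 corresponds to an offset of 2.0 seconds, and each frame spans 0.04 seconds.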

File        MD5 Checksum
All sets    d6a322038ce4fb2cd53742b28901070f


Video files

If you require the video files (loosely cropped face region), please fill this form to request a password. Please cite [1] below if you make use of the dataset.

Passwords issued for the LRW and LRS2 datasets can also be used to download LRS3.


File                    MD5 Checksum
Pretrain A   Download   c6db35cf0bd550a6b82712b8311931f5
Pretrain B   Download   f45fd4c6fcd72e55f90792bb204e8c8b
Pretrain C   Download   3cd5c1a85526097a50d04464b1e76d1e
Pretrain D   Download   541b0b449df0bfd173a351f523dbec8c
Pretrain E   Download   be1bdd48e47332ab8143fcb5adefff58
Pretrain F   Download   8f7d70a6ecb8912e4dc7729b311df911
Pretrain G   Download   4fe5ff72e33e58a6cf22964239d9a30f
Trainval     Download   ed87c25127e7baae467f06db4f402838
Test         Download   ffffd01e37e8da95ca1a6c27eb9d29f4


Each pre-train part is approximately 10GB. Download all parts and concatenate them using the command cat lrs3_pretrain_part* > lrs3_pretrain.zip. The md5sum of the concatenated file should be 5cc09122c76e2b5a869283219603905e.
If you are experiencing a slow connection, follow this link.
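
For platforms where cat and md5sum are not available, the concatenation and checksum steps can be reproduced with the Python standard library. This is a minimal sketch rather than an official script. It assumes the downloaded parts sit in the current directory under names matching lrs3_pretrain_part* and that sorting those names alphabetically gives the same order as the shell glob; the function names are hypothetical.

```python
import glob
import hashlib
import shutil
from typing import List

# Checksum of the concatenated pre-train archive, as listed on this page.
EXPECTED_PRETRAIN_MD5 = "5cc09122c76e2b5a869283219603905e"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def concatenate_parts(pattern: str, output_path: str) -> List[str]:
    """Concatenate all files matching pattern, in sorted order, into output_path.

    Returns the list of part paths in the order they were written.
    The output path must not itself match the pattern.
    """
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r}")
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return parts
```

A typical use would be to check each part against its checksum in the table with md5_of_file, call concatenate_parts("lrs3_pretrain_part*", "lrs3_pretrain.zip"), and then compare md5_of_file("lrs3_pretrain.zip") with EXPECTED_PRETRAIN_MD5 before unzipping.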

The data include excerpts of videos obtained from the TED YouTube channel. Use of this content must respect the TED terms of use and the Creative Commons BY-NC-ND 4.0 license.

Please cite the following if you make use of the dataset.

Publications

  [1] LRS3-TED: a large-scale dataset for visual speech recognition
    T. Afouras, J. S. Chung, A. Zisserman
    arXiv preprint arXiv:1809.00496
    PDF