HTML5 video descriptions

How to make <video> accessible
to vision-impaired users

HTML5 <video> element example


        <video poster="video.png" controls>
          <source src="video.mp4"  type="video/mp4">
          <source src="video.webm" type="video/webm">
        </video>
          

Accessibility Experience Improvements

  • video controls: browser provides keyboard access
  • insight: author provides a summary attribute for screen readers
  • video information: author provides a title
  • text representation: author provides a transcription
  • synchronized representation: author provides a video description

Video Description Options

Function                    | Burnt-in solution                 | In-band solution                    | External solution       | Dynamic solution
Audio Description           | Mixed into main audio track       | Extra audio track in main resource  | Separate audio resource | Separate audio snippets
(Extended) Text Description | [TTS mixed into main audio track] | Extra text track in main resource   | Separate text resource  | Separate text snippets

AD Mixed into main audio track

Audio description created by human or TTS

Choice of most appropriate video resource:

  • only main video and audio
  • main video and mixed audio
  • only mixed audio (?)

current idea: Resources given in manifest file, e.g. m3u8

User preference setting determines right resource

NOT IN SPEC YET
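
Because none of this is specified, the following is only a sketch of how a user preference could pick one of the resource variants listed above. The function name chooseResource, the variant labels, and the shape of the preference object are all assumptions made for illustration; they do not come from any manifest format or specification.

```javascript
// Hypothetical sketch: choose a resource variant from user preferences.
// The variant labels below are invented for this example.
function chooseResource(resources, prefs) {
  let wanted = "video-main-audio";            // only main video and audio
  if (prefs.audioDescription) {
    wanted = prefs.audioOnly
      ? "mixed-audio-only"                    // only mixed audio (?)
      : "video-mixed-audio";                  // main video and mixed audio
  }
  // Fall back to the plain resource when the preferred variant is missing.
  return resources.find((r) => r.variant === wanted) ||
         resources.find((r) => r.variant === "video-main-audio") ||
         null;
}
```

A user agent or page script would call this once, before loading, with the list of variants read from the manifest.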

Text Description Options

HTML5 <track> element examples


<video poster="video.png" controls>
  <source src="video.mp4"  type="video/mp4">
  <source src="video.webm" type="video/webm">

  <track label="English HoH" kind="descriptions" srclang="en"
                                                 src="video_audesc_en.wsrt">
  <track label="Chapters"    kind="chapters"     srclang="en"
                                                 src="video_chapters_en.wsrt">
  <track label="English CC"  kind="captions"     srclang="en"
                                                 src="video_en.wsrt">
  <track label="French SUB"  kind="subtitles"    srclang="fr"
                                                 src="video_fr.wsrt">
</video>
        

WebSRT is made for

  • captions
  • subtitles
  • text descriptions
  • chapters
  • timed metadata

... anything time-synchronized with <video> and <audio>.

It achieves this by providing

  • plain text or minimal markup
    • useful for: captions, subtitles, lyrics, karaoke, chapters, descriptions
  • raw text with author-decided markup
    • useful for: timed metadata, educational material, rich markup needs

WebSRT text description example

1
00:00:00,000 --> 00:00:05,000
The orange open movie project presents

2
00:00:05,010 --> 00:00:12,000
Introductory titles are showing on the background of a water pool with fishes
swimming and mechanical objects lying on a stone floor.

3
00:00:12,010 --> 00:00:14,800
Title: Elephants Dream
          
  • aim: display through speech synthesis or braille
  • in sync with the audio / video resource
  • full file: websrt example
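
To illustrate the "in sync" requirement, here is a minimal sketch that hands each description cue to a callback once playback reaches it. The helper name createDescriptionAnnouncer and the cue object shape (startTime and endTime in seconds, plus text) are assumptions for this example; how the text actually reaches a speech synthesizer or braille device is left to the callback.

```javascript
// Sketch: announce each description cue once, when playback reaches it.
// createDescriptionAnnouncer is a hypothetical helper name.
function createDescriptionAnnouncer(cues, speak) {
  const announced = new Set();
  // Call the returned function with the current playback time in seconds.
  return function onTimeUpdate(currentTime) {
    for (const cue of cues) {
      const active = cue.startTime <= currentTime && currentTime < cue.endTime;
      if (active && !announced.has(cue)) {
        announced.add(cue);
        speak(cue.text);
      } else if (!active) {
        // Forget inactive cues so that seeking backwards announces them again.
        announced.delete(cue);
      }
    }
  };
}
```

In a page, the returned function could be called from the video element's timeupdate event with video.currentTime, and speak could write the text into an ARIA live region so that a screen reader picks it up.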

WebSRT consists of

  1. optional U+FEFF BYTE ORDER MARK (BOM) character
  2. a sequence of timed text cues containing
    • optional identifier: arbitrary string
    • start time: time at which the cue becomes relevant
    • end time: time at which the cue stops being relevant
    • cue text: raw text of the cue
    • optional voice declaration: number, narrator, music, lyric, sound, comment, credit
    • optional cue setting: positioning and formatting instruction
  3. cues are separated by 2 or more line terminators (CRLF, CR or LF)
  4. all characters are in UTF-8
  5. the MIME type is text/srt

WebSRT voice identifiers

  1. digit 1-9, followed by digit 0-9 optionally
  2. one of
    • narrator
    • music
    • lyric
    • sound
    • comment
    • credit

To be replaced by <v person></v>
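
As a small illustration of the two voice identifier forms listed above, the check below accepts a number from 1 to 99 or one of the six keywords. The name isVoiceIdentifier is hypothetical, and treating the keywords as case-sensitive is an assumption of this sketch.

```javascript
// Sketch: validate a WebSRT voice identifier as described on this slide.
// Either a digit 1-9 optionally followed by a digit 0-9, or a keyword.
const VOICE_PATTERN =
  /^(?:[1-9][0-9]?|narrator|music|lyric|sound|comment|credit)$/;

function isVoiceIdentifier(value) {
  return VOICE_PATTERN.test(value);
}
```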

TD in-band track in video resource

Text description is carried inside the video resource

Video resource has multiple tracks:

  • main video
  • main audio
  • text description

Tracks are activated based on user preferences

Tracks are exposed in JavaScript

JavaScript API for in-band tracks (1)


interface HTMLMediaElement : HTMLElement {
  [...]
  // timed tracks
  readonly attribute TimedTrack[] tracks;
};

interface TimedTrack {
  readonly attribute DOMString kind;
  readonly attribute DOMString label;
  readonly attribute DOMString language;
  [...]
  readonly attribute TimedTrackCueList cues;
  readonly attribute TimedTrackCueList activeCues;
  readonly attribute Function oncuechange;
};
          

JavaScript API for in-band tracks (2)


interface TimedTrackCueList {
  readonly attribute unsigned long length;
  getter TimedTrackCue (in unsigned long index);
  TimedTrackCue getCueById(in DOMString id);
};

interface TimedTrackCue {
  readonly attribute DOMString id;
  readonly attribute double startTime;
  readonly attribute double endTime;
  DOMString getCueAsSource();
  DocumentFragment getCueAsHTML();
  readonly attribute Function onenter;
  readonly attribute Function onexit;
};
          

Example use: activate a descriptions track


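// The mode attribute and the SHOWING constant belong to the part of
// TimedTrack elided as [...] above; TimedTrack.SHOWING is assumed here.
var video = document.getElementsByTagName('video')[0];
for (var i = 0; i < video.tracks.length; i++) {
  if (video.tracks[i].kind == "descriptions") {
    video.tracks[i].mode = TimedTrack.SHOWING;
  }
}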
          

TD in-band track examples

Ogg Theora video: encapsulates SRT through Kate

Matroska (WebM container): encapsulates SRT

MPEG-4 Video: encapsulates 3GPP TT - can be encoded from SRT using MP4Box

TD dynamically added during playback

Text description is given as separate text snippets

Video resource has only main tracks:

  • main video
  • main audio

Descriptions are added through JavaScript

JavaScript API for mutable tracks


interface HTMLMediaElement : HTMLElement {
  [...]
  // mutable timed tracks
  MutableTimedTrack addTrack(in DOMString kind, in optional DOMString label, in optional DOMString language);
};

interface MutableTimedTrack : TimedTrack {
 void addCue(in TimedTrackCue cue);
 void removeCue(in TimedTrackCue cue);
};

interface TimedTrack {
  readonly attribute DOMString kind;
  readonly attribute DOMString label;
  readonly attribute DOMString language;
  [...]
  readonly attribute TimedTrackCueList cues;
  readonly attribute TimedTrackCueList activeCues;
  readonly attribute Function oncuechange;
};
          

Example use: add a track and a cue


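var video = document.getElementsByTagName('video')[0];

var track = video.addTrack("descriptions", "Descriptions - en", "en");

// startTime and endTime are doubles (seconds) in the TimedTrackCue interface;
// the constructor argument order (id, start, end, text) is assumed here.
var cue = new TimedTrackCue("1", 0.0, 20.0, "Hello new description text");
track.addCue(cue);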
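
To make the behaviour of addCue and removeCue concrete outside a browser, here is a small stand-in model. It is not the browser API: the class name SketchMutableTrack and the method activeCuesAt are invented for this example, and keeping the cue list sorted by start time is an assumption.

```javascript
// Stand-in model of a mutable timed track, for illustration only.
class SketchMutableTrack {
  constructor(kind, label = "", language = "") {
    this.kind = kind;
    this.label = label;
    this.language = language;
    this.cues = [];
  }

  // Add a cue and keep the list ordered by start time (assumption).
  addCue(cue) {
    this.cues.push(cue);
    this.cues.sort((a, b) => a.startTime - b.startTime);
  }

  // Remove a cue if it is present; unknown cues are ignored.
  removeCue(cue) {
    const index = this.cues.indexOf(cue);
    if (index !== -1) {
      this.cues.splice(index, 1);
    }
  }

  // Cues whose time range contains the given playback time (in seconds).
  activeCuesAt(time) {
    return this.cues.filter((c) => c.startTime <= time && time < c.endTime);
  }
}
```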
          

HTML5 Timed Tracks Platform summary

Drawing explains how the video element can receive text cues from source sub-elements, track sub-elements, and JavaScript. It further explains how there is automatic rendering and how CSS can be applied to the cues through the HTML page.

Order of TimedTracks

  • add <track> tracks first
  • add MutableTimedTracks second
  • add resource-specific tracks last
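
The ordering above determines the index at which a script finds a given track. A minimal sketch, assuming three already-collected lists (the function name orderTimedTracks and its parameters are hypothetical):

```javascript
// Sketch: build the combined track list in the order stated on this slide.
function orderTimedTracks(elementTracks, mutableTracks, resourceTracks) {
  return [
    ...elementTracks,   // 1. tracks from <track> elements
    ...mutableTracks,   // 2. tracks added through script
    ...resourceTracks,  // 3. in-band tracks from the media resource
  ];
}
```

With one track of each origin, the <track> element's track is at index 0 and the in-band track is at index 2.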

ADDENDUM

Slides after here describe how audio descriptions work, not text descriptions.

AD in-band track in video resource

Audio description can be created by human or TTS

Video resource has multiple tracks:

  • main video
  • main audio
  • audio description

Tracks are activated based on user preferences

Tracks are exposed in JavaScript

NOT IN SPEC YET

AD in external resource

Audio description can be created by human or TTS

  • video has main video and audio track
  • separate audio has audio description

Tracks are activated based on user preferences

Tracks are exposed in JavaScript

NOT IN SPEC YET

AD in external resource: Design


          <video poster="video.png" controls>
            <source src="video.mp4"  type="video/mp4">
            <source src="video.webm" type="video/webm">
            <track ref="#description" label="description" srclang="en">
          </video>

          <audio id="description">
            <source src="description.mp3" type="audio/mpeg">
            <source src="description.ogg" type="audio/ogg">
          </audio>
            

<track> element synchronizes external resource

AD dynamically added during playback

Audio description can be created by human or TTS

They are then scheduled through JavaScript calls

  • video has only main video and audio track
  • a JavaScript function can schedule an audio snippet

NOT IN SPEC YET