ginger's thoughts

Silvia's blog

Attaching subtitles to HTML5 video

Posted in Digital Media, Open Source, code, video accessibility by silvia on December 12th, 2008

During the last week, I made a proposal to the HTML5 working group about how to support out-of-band time-aligned text in HTML5. What I mean by that is basically: how to link a subtitle file to a video tag in HTML5. This would mirror the way desktop players let you load a separate subtitle file by hand to go alongside a video.

My suggestion is best explained by an example:


<video src="http://example.com/video.ogv" controls>
<text category="CC" lang="en" type="text/x-srt" src="caption.srt"></text>
<text category="SUB" lang="de" type="application/ttaf+xml" src="german.dfxp"></text>
<text category="SUB" lang="ja" type="application/smil" src="japanese.smil"></text>
<text category="SUB" lang="fr" type="text/x-srt" src="translation_webservice/fr/caption.srt"></text>
</video>

  • “text” elements are subelements of the “video” element and therefore clearly related to one video (even if it comes in different formats).
  • the “category” attribute specifies what text category we are dealing with and lets the web browser determine how to display it. The idea is that there would be a default display for each category, and CSS could override it.
  • the “lang” attribute allows the specification of alternative resources based on language, which lets the browser select one by default based on browser preferences, and also turn on by default those tracks that a particular user requires (e.g. because they are blind and have preset the browser accordingly).
  • the “type” attribute specifies which time-aligned text format is being used in this instance; again, it lets the browser determine whether it is able to decode the file and thus whether to make it available through an interface.
  • the “src” attribute points to the time-aligned text resource. This could be a file, a script that extracts data from a database, or even a web service that dynamically creates the data based on some input.
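
To make the selection step concrete, here is a minimal JavaScript sketch of how a browser might pick a default text resource from the attributes described above. The function name `pickDefaultTrack`, the language preference list, and the list of decodable types are all assumptions made for illustration; the proposal itself does not prescribe any particular algorithm.

```javascript
// Hypothetical sketch: choose a default <text> resource the way a browser might.
// Each object mirrors the attributes of one <text> element from the example.
function pickDefaultTrack(tracks, preferredLangs, supportedTypes, category) {
  // Keep only tracks of the wanted category whose format can be decoded.
  const usable = tracks.filter(
    (t) => t.category === category && supportedTypes.includes(t.type)
  );
  // Walk the user's language preferences in order; the first match wins.
  for (const lang of preferredLangs) {
    const match = usable.find((t) => t.lang === lang);
    if (match) return match;
  }
  return null; // nothing suitable, so no text is shown by default
}

// A subset of the <text> elements from the example markup.
const exampleTracks = [
  { category: "CC", lang: "en", type: "text/x-srt", src: "caption.srt" },
  { category: "SUB", lang: "de", type: "application/ttaf+xml", src: "german.dfxp" },
  { category: "SUB", lang: "fr", type: "text/x-srt", src: "translation_webservice/fr/caption.srt" },
];
```

With these tracks, a user who prefers French and then English, in a browser that can only decode SRT, would get the French subtitle file; a user who prefers German in the same browser would get nothing, because the DFXP file cannot be decoded.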

This proposal provides for a lot of flexibility and is somewhat independent of the media file format, while still enabling the Web browser to deal with the text (as long as it can decode it). Also note that this is not meant as the only way in which time-aligned text would be delivered to the Web browser – we are continuing to investigate how to embed text inside Ogg as a more persistent means of keeping your text with your media.

Of course you are now aching to see this in action – and this is where the awesomeness starts. There are already three implementations.

First, Jan Gerber independently thought out a way to provide support for srt files that would be conformant with the existing HTML5 tags. His solution is at http://v2v.cc/~j/jquery.srt/. He is using javascript to load and parse the srt file and map it into HTML and thus onto the screen. Jan’s syntax looks like this:


<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="jquery.srt.js"></script>

<video src="http://example.com/video.ogv" id="video" controls></video>
<div class="srt"
data-video="video"
data-srt="http://example.com/video.srt"></div>
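
Loading and parsing an SRT file in JavaScript essentially means splitting the text into blocks, reading the timing line of each block, and then looking up which block is active at the current playback time. The following is a rough sketch of that idea and not Jan's actual code; the function names `srtTimeToSeconds`, `parseSrt` and `activeCueAt` are made up for this example, and it assumes well-formed input with blank lines between entries.

```javascript
// Convert an SRT timestamp such as "00:01:02,500" into seconds.
function srtTimeToSeconds(stamp) {
  const m = /^(\d+):(\d{2}):(\d{2})[,.](\d{1,3})$/.exec(stamp.trim());
  if (!m) throw new Error("Bad SRT timestamp: " + stamp);
  const millis = Number(m[4].padEnd(3, "0"));
  return Number(m[1]) * 3600 + Number(m[2]) * 60 + Number(m[3]) + millis / 1000;
}

// Parse SRT text into an array of { start, end, text } cues (times in seconds).
function parseSrt(text) {
  const cues = [];
  // Normalise line endings, then split on blank lines between entries.
  const blocks = text.replace(/\r\n?/g, "\n").trim().split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n");
    // The first line is normally a counter; locate the line holding "-->".
    const timingIndex = lines.findIndex((l) => l.includes("-->"));
    if (timingIndex === -1) continue; // skip malformed blocks
    const [start, end] = lines[timingIndex].split("-->");
    cues.push({
      start: srtTimeToSeconds(start),
      end: srtTimeToSeconds(end),
      text: lines.slice(timingIndex + 1).join("\n"),
    });
  }
  return cues;
}

// Return the cue to display at a given playback time, or null if none applies.
function activeCueAt(cues, time) {
  return cues.find((c) => time >= c.start && time < c.end) || null;
}
```

In a page, `activeCueAt` would typically be called with the video element's `currentTime` whenever playback advances, and the resulting text written into the subtitle `div`.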

Then, Michael Dale decided to use my suggested HTML5 syntax and add it to mv_embed. The example can be seen here – it’s the lower of the two videos. You will need to click on the “CC” button on the player and click on “select transcripts” to see the different subtitles in English and Spanish. If you click on a text element, the video will play from that offset. Michael’s syntax looks like this:


<video src="sample_fish.ogg" poster="sample_fish.jpg" duration="26">
<text category="SUB" lang="en" type="text/x-srt" default="true"
title="english SRT subtitles" src="sample_fish_text_en.srt">
</text>
<text category="SUB" lang="es" type="text/x-srt"
title="spanish SRT subtitles" src="sample_fish_text_es.srt">
</text>
</video>
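
The click-to-seek behaviour can be sketched in a few lines of JavaScript. This is only an illustration of the idea, not mv_embed's implementation: `playFromCue` and `attachTranscript` are hypothetical names, and each cue is assumed to be an object with a `start` offset in seconds and a `text` string.

```javascript
// Jump the media element to the start of a cue and begin playback.
// `video` is an HTMLMediaElement (or any object with the same members).
function playFromCue(video, cue) {
  video.currentTime = cue.start; // offset in seconds
  return video.play();
}

// Hypothetical wiring: render one clickable line per cue into `container`.
function attachTranscript(video, container, cues) {
  for (const cue of cues) {
    const line = container.ownerDocument.createElement("p");
    line.textContent = cue.text;
    // Clicking a transcript line plays the video from that cue's offset.
    line.addEventListener("click", () => playFromCue(video, cue));
    container.appendChild(line);
  }
}
```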

Then, after a little conversation with the W3C Timed Text working group, Philippe Le Hegaret extended the current DFXP test suite to demonstrate use of the proposed syntax with DFXP and Ogg video inside the browser. To see the result, you’ll need Firefox 3.1. If you select the “HTML5 DFXP player prototype” as test player, you can click on the tests on the left and it will load the DFXP content. Philippe actually adapted Jan’s javascript file for this. And his syntax looks like this:


<video src="example.ogv" id="video" controls>
<text lang='en' type="application/ttaf+xml" src="testsuite/Content/Br001.xml"></text>
</video>

The cool thing about these implementations is that they all work by mapping the time-aligned text to HTML – and for DFXP the styling attributes are mapped to CSS. In this way, the data can be made part of the browser window and displayed through traditional means.
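
As a rough illustration of the DFXP-to-CSS mapping, the sketch below translates a handful of DFXP `tts:` styling attributes into CSS declarations. It is not taken from any of the three implementations; the table covers only a few attributes whose names correspond directly to CSS properties, and it passes values through unchanged, which is an assumption that does not hold for every DFXP value.

```javascript
// A few DFXP styling attributes and the CSS properties they correspond to.
const TTS_TO_CSS = new Map([
  ["tts:color", "color"],
  ["tts:backgroundColor", "background-color"],
  ["tts:fontFamily", "font-family"],
  ["tts:fontSize", "font-size"],
  ["tts:fontStyle", "font-style"],
  ["tts:fontWeight", "font-weight"],
  ["tts:textAlign", "text-align"],
]);

// Build a CSS declaration string from an object of DFXP attributes.
// Attributes without an entry in the table are ignored.
function ttsToCss(attrs) {
  const declarations = [];
  for (const [name, value] of Object.entries(attrs)) {
    const property = TTS_TO_CSS.get(name);
    if (property) declarations.push(property + ": " + value);
  }
  return declarations.join("; ");
}
```

The resulting string could be assigned to the `style` attribute of the HTML element that holds the subtitle text.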

For time-aligned text that is multiplexed into a media file, we just have to do the same and we will be able to achieve the same functionality. Video accessibility in HTML5 – we’re getting there!

18 Responses to 'Attaching subtitles to HTML5 video'


  1. T. Rapp said,

    on December 16th, 2008 at 10:20 pm

    Is a similar functionality/proposal planned for HTML5 audio? It might also be helpful for long (political) speeches, podcasts, etc.

  2. silvia said,

    on January 4th, 2009 at 3:30 pm

    Everything that is being discussed about <video> will also apply to <audio>.

  3. Brett Zamir said,

    on March 18th, 2009 at 11:40 am

    Out of curiosity, couldn’t http://www.w3.org/TR/XHTMLplusSMIL be used ? Isn’t that what SMIL is all about? (As an aside, I really wish we’d all make a transition to true XHTML already, rather than becoming ever more dependent on HTML–sigh)

  4. silvia said,

    on March 18th, 2009 at 8:19 pm

    Brett Zamir, SMIL is much too complex to take it straight into HTML5. Possibly a subpart of it could be used for a specific purpose. But we have seen from SVG that taking over some SMIL markup isn’t always satisfactory – in fact it is often too restrictive and at the same time too broad. That doesn’t mean inspiration cannot be had from it though!

  5. Brett Zamir said,

    on March 19th, 2009 at 11:25 am

    Ok, but what do you think of the XHTML+SMIL profile? If a specific spec was designed to make them work together, wouldn’t it be doable for HTML5 too? (I see their design rationale at http://www.w3.org/TR/XHTMLplusSMIL/#Design explains why a few modules were left out, though it looks like all that could be were included.) Thanks, Brett

  6. silvia said,

    on March 21st, 2009 at 1:20 pm

    Brett,

    XHTML+SMIL does not actually include any solutions for out-of-band time-aligned text. It is still much too complicated for what we are looking to solve.

    The basic problem is that SMIL still comes from a background of creating multimedia experiences. In particular it is about animation, content control, media objects, timing and synchronization, and transition effects. The idea is to create interactive experiences with SMIL. XHTML+SMIL doesn’t change that.

    HTML5 video comes from a *much* simpler approach: let’s just be able to include <video> as an element into a Web page and possibly associate some text with it. No animations, no interactivity, no transitions.

    Where the more complex functionality is required, SMIL should indeed be regarded. But not for the simple <video> or <audio> element.


  7. on July 1st, 2009 at 3:15 am

    [...] On Ginger’s Thoughts I read a very interesting proposal that I think will remain just that, a proposal. Still, the idea is very good and it could be interesting to have it available in the new HTML5. The proposal is from December 2008, and so far there is no news about it. [...]


  8. on July 3rd, 2009 at 4:44 am

    [...] Ginger’s Thoughts I read a very interesting proposal which I believe will remain just a proposal. Although the idea is very good and it might be [...]

  9. silvia said,

    on July 6th, 2009 at 11:02 am

    Another new JavaScript implementation has been released – this time as a Greasemonkey script that will automatically attach captions created in a wiki to the HTML5 video element. Felipe Sanches from Brazil published it at http://bighead.poli.usp.br/~juca/code/greasemonkey/wiki_subs.user.js and a test page is at http://www.gpopai.usp.br/subs/test_wikisubs.html .

    The Greasemonkey script fetches the subtitles from the following wiki page:
    http://www.wstr.org/subs/index.php?title=Subtitles/URL/http://www.fabricio.org/talks/2009/fisl10/FISL10-Zuardi.ogg
    and asks for subtitle contributions if none exist yet.


  10. on January 25th, 2010 at 9:35 am

    [...] There are several examples of how captions *can* be implemented with javascript, but not standard format. http://blog.gingertech.net/2008/12/12/attaching-subtitles-to-html5-video/ [...]

  11. silvia said,

    on February 11th, 2010 at 8:52 am

    Here’s another cool demo of how to do lyrics for an audio file in a really nicely animated way with SVG. Only problem is they used setInterval, so sync can be out of whack: http://svg-wow.org/audio/animated-lyrics.html

  12. Andre said,

    on April 11th, 2010 at 3:53 am

    You could simply do this with JavaScript in the short term. This site suggests an approach using AJAX:

    http://blog.illyism.com/2010/02/html5-video-subtitles-experiment/

  13. silvia said,

    on April 11th, 2010 at 8:36 am

    @andre there are plenty of existing javascript-based caption approaches. Here is another one: http://colinaarts.com/code/moovie/ and here is one from Opera: http://dev.opera.com/articles/view/accessible-html5-video-with-javascripted-captions/ . The problem is that if everyone does it differently, search engines will never find and index the captions and lose a large opportunity to index into videos. A standard is absolutely necessary.

  14. Frank Lowney said,

    on June 12th, 2010 at 10:40 pm

    Since my interest in mobile computing confines my work to MPEG-4, I am more interested in HTML 5 constructs that can be used to switch between two or more alternate audio tracks and between the default (none) and one or more soft subtitle tracks. These tracks are easily installed (muxed) into an MPEG-4 file using applications such as Subler (http://code.google.com/p/subler/).

    You can see where I am heading with these at: http://hercules.gcsu.edu/~flowney/research/MPEG-4/subtitles/

    Previously, these things were handled nicely by plug-ins such as the QuickTime Plug-in, so as we transition to HTML 5, there is a need (strong, I would say) for methods to switch between and among these tracks so as to be able to create a web interface that uses the same conventions to play video as popular applications such as iTunes.app, QuickTime Player X, the Videos.app on iPhone OS devices and others do. As far as I can tell, there are no such HTML 5 methods at present.

    I would very much like to hear your thoughts on this aspect of the challenge to render video in a more versatile and accessible manner.

  15. silvia said,

    on June 13th, 2010 at 6:40 am

    Hi Frank,
    Have you read about the recent proposals that were made to introduce video accessibility into HTML5? See my blog post at http://blog.gingertech.net/2010/04/11/introducing-media-accessibilit-into-html5-media/ – you will notice there is a JavaScript API, currently defined at http://www.w3.org/WAI/PF/HTML/wiki/Media_MultitrackAPI, which is also proposed for introduction. It should provide what you are after.

  16. Frank Lowney said,

    on June 13th, 2010 at 11:12 am

    Yes, it is helpful to know that this is in the offing. I had also run across: http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#timed-track-label

    BTW, there is more to this than accessibility. These are pedagogically very useful in a number of ways.

  17. silvia said,

    on June 13th, 2010 at 2:36 pm

    Frank, yes, totally agree on the usability aspect of this work – more than just accessibility!

    As for the label element – that is a tag on externally referenced timed tracks, so goes beyond just tracks inside the video resource. But indeed these attributes exist to allow for the creation of a uniform JavaScript API, regardless of whether tracks are sourced from internal data or external files.


  18. on June 16th, 2010 at 10:13 am

    [...] Also: http://blog.gingertech.net/2008/12/12/attaching-subtitles-to-html5-video/ Categories: Accessibility & Equal Access, Technology Comments (0) Trackbacks (0) [...]
