Attaching subtitles to HTML5 video
During the last week, I made a proposal to the HTML5 working group about how to support out-of-band time-aligned text in HTML5. What I mean by that is basically: how to link a subtitle file to a video tag in HTML5. This would mirror the way desktop players let you load a separate subtitle file by hand to go alongside a video.
My suggestion is best explained by an example:
<video src="http://example.com/video.ogv" controls>
<text category="CC" lang="en" type="text/x-srt" src="caption.srt"></text>
<text category="SUB" lang="de" type="application/ttaf+xml" src="german.dfxp"></text>
<text category="SUB" lang="ja" type="application/smil" src="japanese.smil"></text>
<text category="SUB" lang="fr" type="text/x-srt" src="translation_webservice/fr/caption.srt"></text>
</video>
- “text” elements are subelements of the “video” element and therefore clearly related to one video (even if it comes in different formats).
- the “category” attribute allows us to specify what text category we are dealing with and allows the web browser to determine how to display it. The idea is that there would be a default display for the different categories and CSS would allow these to be overridden.
- the “lang” attribute allows the specification of alternative resources based on language, which allows the browser to select one by default based on browser preferences, and also to turn those tracks on by default that a particular user requires (e.g. because they are blind and have preset the browser accordingly).
- the “type” attribute allows specification of what actual time-aligned text format is being used in this instance; again, it will allow the browser to determine whether it is able to decode the file and thus make it available through an interface or not.
- the “src” attribute obviously points to the time-aligned text resource. This could be a file, a script that extracts data from a database, or even a web service that dynamically creates the data based on some input.
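To make the selection logic described in the list above more concrete, here is a minimal JavaScript sketch of how a browser or script might pick one text resource. The function name `selectTextTrack`, the plain-object representation of the “text” elements, and the `supportedTypes` and `preferredLangs` inputs are all assumptions made for illustration; none of them are part of the proposal itself.

```javascript
// Hypothetical helper (not part of the proposal): choose one text resource.
// `tracks` mirrors the attributes of the <text> elements in the example.
// `supportedTypes` lists the MIME types the player can decode (assumed).
// `preferredLangs` lists the user's languages in priority order (assumed).
function selectTextTrack(tracks, supportedTypes, preferredLangs, category) {
  // Keep only tracks of the requested category that can be decoded.
  const decodable = tracks.filter(
    (t) => t.category === category && supportedTypes.includes(t.type)
  );
  // Walk the language preferences in order and return the first exact match.
  for (const lang of preferredLangs) {
    const match = decodable.find((t) => t.lang === lang);
    if (match) return match;
  }
  return null; // nothing suitable: show no text by default
}

// Example data copied from the markup above (only three of the four tracks).
const exampleTracks = [
  { category: "CC", lang: "en", type: "text/x-srt", src: "caption.srt" },
  { category: "SUB", lang: "de", type: "application/ttaf+xml", src: "german.dfxp" },
  { category: "SUB", lang: "fr", type: "text/x-srt", src: "translation_webservice/fr/caption.srt" },
];
const chosen = selectTextTrack(exampleTracks, ["text/x-srt"], ["de", "fr"], "SUB");
console.log(chosen ? chosen.src : "no matching track");
```

This sketch only does exact matching on the language code; a real implementation would probably also need to handle regional variants and a user-visible menu for manual selection.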
This proposal provides for a lot of flexibility and is somewhat independent of the media file format, while still enabling the Web browser to deal with the text (as long as it can decode it). Also note that this is not meant as the only way in which time-aligned text would be delivered to the Web browser – we are continuing to investigate how to embed text inside Ogg as a more persistent means of keeping your text with your media.
Of course you are now aching to see this in action – and this is where the awesomeness starts. There are already three implementations.
First, Jan Gerber independently thought out a way to provide support for srt files that would be conformant with the existing HTML5 tags. His solution is at http://v2v.cc/~j/jquery.srt/. He is using javascript to load and parse the srt file and map it into HTML and thus onto the screen. Jan’s syntax looks like this:
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="jquery.srt.js"></script>
<video src="http://example.com/video.ogv" id="video" controls></video>
<div class="srt"
data-video="video"
data-srt="http://example.com/video.srt"></div>
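The "load and parse the srt file" step can be sketched roughly as follows. This is not Jan's code; it is a minimal illustration, and the function names `srtTimeToSeconds` and `parseSrt` as well as the `{ start, end, text }` cue shape are assumptions. It expects well-formed SRT with `HH:MM:SS,mmm` timestamps.

```javascript
// Convert an SRT timestamp such as "00:01:02,500" into seconds.
function srtTimeToSeconds(stamp) {
  const m = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(stamp.trim());
  if (!m) throw new Error("Bad SRT timestamp: " + stamp);
  return Number(m[1]) * 3600 + Number(m[2]) * 60 + Number(m[3]) + Number(m[4]) / 1000;
}

// Parse SRT text into an array of { start, end, text } cues (times in seconds).
function parseSrt(text) {
  const cues = [];
  // Normalise line endings, then split into blocks separated by blank lines.
  const blocks = text.replace(/\r\n?/g, "\n").trim().split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n");
    // The timing line is the one containing "-->"; the cue counter before it is ignored.
    const timingIndex = lines.findIndex((line) => line.includes("-->"));
    if (timingIndex === -1) continue; // skip blocks without timing
    const [startPart, endPart] = lines[timingIndex].split("-->");
    cues.push({
      start: srtTimeToSeconds(startPart),
      // Ignore anything after the end time (some files append position hints).
      end: srtTimeToSeconds(endPart.trim().split(/\s+/)[0]),
      text: lines.slice(timingIndex + 1).join("\n"),
    });
  }
  return cues;
}
```

In a page, the SRT text would first have to be fetched from the URL given in the `data-srt` attribute, which is subject to the browser's usual same-origin restrictions.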
Then, Michael Dale decided to use my suggested HTML5 syntax and add it to mv_embed. The example can be seen here – it’s the bottom of the two videos. You will need to click on the “CC” button on the player and click on “select transcripts” to see the different subtitles in English and Spanish. If you click on a text element, the video will play from that offset. Michael’s syntax looks like this:
<video src="sample_fish.ogg" poster="sample_fish.jpg" duration="26">
<text category="SUB" lang="en" type="text/x-srt" default="true"
title="english SRT subtitles" src="sample_fish_text_en.srt">
</text>
<text category="SUB" lang="es" type="text/x-srt"
title="spanish SRT subtitles" src="sample_fish_text_es.srt">
</text>
</video>
Then, after a little conversation with the W3C Timed Text working group, Philippe Le Hegaret extended the current DFXP test suite to demonstrate use of the proposed syntax with DFXP and Ogg video inside the browser. To see the result, you’ll need Firefox 3.1. If you select the “HTML5 DFXP player prototype” as test player, you can click on the tests on the left and it will load the DFXP content. Philippe actually adapted Jan’s javascript file for this. And his syntax looks like this:
<video src="example.ogv" id="video" controls>
<text lang='en' type="application/ttaf+xml" src="testsuite/Content/Br001.xml"></text>
</video>
The cool thing about these implementations is that they all work by mapping the time-aligned text to HTML – and for DFXP the styling attributes are mapped to CSS. In this way, the data can be made part of the browser window and displayed through traditional means.
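As a rough illustration of mapping time-aligned text to HTML, the sketch below shows the cue that is active at the current playback position inside an ordinary page element. It is not taken from any of the three implementations; the names `findActiveCue` and `attachCues`, and the `{ start, end, text }` cue objects with times in seconds, are assumptions. It relies on the standard `timeupdate` event and `currentTime` property of the video element.

```javascript
// Return the cue whose time range contains `time`, or null if none does.
function findActiveCue(cues, time) {
  return cues.find((cue) => time >= cue.start && time < cue.end) || null;
}

// Keep `target` (e.g. a <div> placed over the video) in sync with playback.
// `video` is an HTML5 video element; `cues` is an array of { start, end, text }.
function attachCues(video, target, cues) {
  video.addEventListener("timeupdate", () => {
    const cue = findActiveCue(cues, video.currentTime);
    // Writing via textContent keeps the subtitle text from being parsed as markup.
    target.textContent = cue ? cue.text : "";
  });
}
```

Because the text ends up in a normal element, it can then be positioned and styled with CSS like any other part of the page. Driving the update from the video's own time, instead of a free-running timer, keeps the text aligned when the user pauses or seeks.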
For time-aligned text that is multiplexed into a media file, we just have to do the same and we will be able to achieve the same functionality. Video accessibility in HTML5 – we’re getting there!
on December 16th, 2008 at 10:20 pm
Is a similar functionality/proposal planned for HTML5 audio? It might also be helpful for long (political) speeches, podcasts, etc.
on January 4th, 2009 at 3:30 pm
Everything that is being discussed about <video> will also apply to <audio>.
on March 18th, 2009 at 11:40 am
Out of curiosity, couldn’t http://www.w3.org/TR/XHTMLplusSMIL be used ? Isn’t that what SMIL is all about? (As an aside, I really wish we’d all make a transition to true XHTML already, rather than becoming ever more dependent on HTML–sigh)
on March 18th, 2009 at 8:19 pm
Brett Zamir, SMIL is much too complex to take it straight into HTML5. Possibly a subpart of it could be used for a specific purpose. But we have seen from SVG that taking over some SMIL markup isn’t always satisfactory – in fact it is often too restrictive and at the same time too broad. That doesn’t mean inspiration cannot be had from it though!
on March 19th, 2009 at 11:25 am
Ok, but what do you think of the XHTML+SMIL profile? If a specific spec was designed to make them work together, wouldn’t it be doable for HTML5 too? (I see their design rationale at http://www.w3.org/TR/XHTMLplusSMIL/#Design explains why a few modules were left out, though it looks like all that could be were included.) Thanks, Brett
on March 21st, 2009 at 1:20 pm
Brett,
XHTML+SMIL does not actually include any solutions for out-of-band time-aligned text. It is still much too complicated for what we are looking to solve.
The basic problem is that SMIL still comes from a background of creating multimedia experiences. In particular it is about animation, content control, media objects, timing and synchronization, and transition effects. The idea is to create interactive experiences with SMIL. XHTML+SMIL doesn’t change that.
HTML5 video comes from a *much* simpler approach: let’s just be able to include <video> as an element into a Web page and possibly associate some text with it. No animations, no interactivity, no transitions.
Where the more complex functionality is required, SMIL should indeed be considered. But not for the simple <video> or <audio> element.
on July 1st, 2009 at 3:15 am
[...] On Ginger’s Thoughts I read a very interesting proposal that I believe will remain just that, a proposal. Although the idea is very good and it could be interesting to have it available in the new HTML5. The proposal is from December 2008, and so far there is no news about it. [...]
on July 3rd, 2009 at 4:44 am
[...] On Ginger’s Thoughts I read a very interesting proposal which I believe will remain just a proposal. Although the idea is very good and it might be [...]
on July 6th, 2009 at 11:02 am
Another new JavaScript has been released – this time as a Greasemonkey script that will automatically attach captions created in a wiki to the HTML5 video element. Felipe Sanches from Brazil published it at http://bighead.poli.usp.br/~juca/code/greasemonkey/wiki_subs.user.js and a test page is at http://www.gpopai.usp.br/subs/test_wikisubs.html .
The greasemonkey script fetches the subtitles from the following wikipage:
http://www.wstr.org/subs/index.php?title=Subtitles/URL/http://www.fabricio.org/talks/2009/fisl10/FISL10-Zuardi.ogg
and asks for a contribution of subtitles if they don’t exist.
on January 25th, 2010 at 9:35 am
[...] There are several examples of how captions *can* be implemented with javascript, but no standard format. http://blog.gingertech.net/2008/12/12/attaching-subtitles-to-html5-video/ [...]
on February 11th, 2010 at 8:52 am
Here’s another cool demo of how to do lyrics for an audio file in a really nicely animated way with SVG. Only problem is they used setInterval, so sync can be out of whack: http://svg-wow.org/audio/animated-lyrics.html