TTML was specified by the W3C Timed Text Working Group and published as a W3C Recommendation (TTML 1.0) in November 2010. Since then, several organisations have tried to adopt it as their caption file format, including SMPTE, the EBU (European Broadcasting Union), and Microsoft.
Both Microsoft and the EBU looked at TTML in detail and decided that, to make it usable for their use cases, its functionality had to be restricted.
EBU-TT
The EBU released EBU-TT, which restricts the set of valid attributes and features. “The EBU-TT format is intended to constrain the features provided by TTML, especially to make EBU-TT more suitable for the use with broadcast video and web video applications.” (see EBU-TT).
In addition, EBU-specific namespaces were introduced to extend TTML with EBU-specific data types, e.g. ebuttdt:frameRateMultiplierType or ebuttdt:smpteTimingType. Similarly, a bunch of metadata elements were introduced, e.g. ebuttm:documentMetadata, ebuttm:documentEbuttVersion, or ebuttm:documentIdentifier.
Using namespaces as the extensibility mechanism ensures that EBU-TT files remain valid TTML files. However, a vanilla TTML parser will not know what to do with these custom extensions and will drop them on the floor.
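To make the “drop them on the floor” behaviour concrete, here is a minimal Python sketch of a parser that only understands TTML 1.0: it keeps elements and attributes in the TTML namespaces (plus unqualified and xml: attributes) and silently discards everything else. The ebuttm namespace URI (urn:ebu:tt:metadata) and the sample document are assumptions for illustration; real TTML processors may handle foreign namespaces differently.

```python
import xml.etree.ElementTree as ET

# TTML 1.0 namespaces; anything outside these is "unknown" to a vanilla parser.
TTML_NS = "http://www.w3.org/ns/ttml"
KNOWN_NAMESPACES = {
    TTML_NS,
    "http://www.w3.org/ns/ttml#styling",
    "http://www.w3.org/ns/ttml#parameter",
    "http://www.w3.org/ns/ttml#metadata",
}
XML_NS = "http://www.w3.org/XML/1998/namespace"  # xml:id, xml:lang, ...


def namespace_of(name):
    """Return the namespace URI of an ElementTree '{uri}local' name, or '' if none."""
    return name[1:].split("}", 1)[0] if name.startswith("{") else ""


def strip_unknown(elem):
    """Recursively drop child elements and attributes outside the known namespaces.

    Note: a removed element takes its tail text with it, which is harmless
    for metadata blocks like the EBU-TT example below.
    """
    for child in list(elem):
        if namespace_of(child.tag) in KNOWN_NAMESPACES:
            strip_unknown(child)
        else:
            elem.remove(child)
    for attr in list(elem.attrib):
        ns = namespace_of(attr)
        if ns and ns not in KNOWN_NAMESPACES and ns != XML_NS:
            del elem.attrib[attr]
    return elem


# Illustrative EBU-TT-style document; the ebuttm URI is an assumption.
SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ebuttm="urn:ebu:tt:metadata">
  <head>
    <metadata>
      <ebuttm:documentMetadata>
        <ebuttm:documentEbuttVersion>v1.0</ebuttm:documentEbuttVersion>
      </ebuttm:documentMetadata>
    </metadata>
  </head>
  <body><div><p begin="00:00:01.000" end="00:00:03.000">Hello</p></div></body>
</tt>"""

root = strip_unknown(ET.fromstring(SAMPLE))
```

After stripping, the caption paragraph survives untouched while the whole ebuttm:documentMetadata block is gone, which is exactly the information loss described above.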
Simple Delivery Profile
With the intention to make TTML ready for “internet delivery of Captions originated in the United States”, Microsoft proposed a “Simple Delivery Profile for Closed Captions (US)” (see Simple Profile). The Simple Profile is also a restriction of TTML.
Unfortunately, the Microsoft profile is not the same as EBU-TT: for example, it contains the “set” element, which is not permitted in EBU-TT. Similarly, the supported style features differ, e.g. the Simple Profile supports “display-region”, while EBU-TT does not. On the other hand, EBU-TT supports monospace, sans-serif and serif fonts, while the Simple Profile does not.
Thus, files created for the Simple Delivery Profile will not work on players that expect EBU-TT, and vice versa.
Fortunately, the Simple Delivery Profile does not introduce any new namespaces or features, so at least it is a strict subset of TTML rather than both a restriction and an extension like EBU-TT.
SMPTE-TT
SMPTE also created its own version of TTML, called SMPTE-TT. SMPTE did not pick a subset of TTML for its purposes; TTML was adopted in full. “This Standard provides a framework for timed text to be supported for content delivered via broadband means,…” (see SMPTE-TT).
However, SMPTE extended TTML in SMPTE-TT with the ability to store a binary blob containing captions in another format. This allows SMPTE-TT to be used as a transport format for any caption format and is meant to help with “backwards compatibility”.
Now, instead of specifying a profile, SMPTE defined how to convert CEA-608 captions to SMPTE-TT. Even if it’s not called a “profile”, that’s effectively what it is: it even has its own namespace, “m608:”.
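A rough sketch of what a player faces with such a blob. It assumes the blob sits in an smpte:data element with datatype and encoding attributes, in the namespace shown below; these names, the namespace URI and the x-cea608 datatype value are assumptions to check against the SMPTE-TT specification. Pulling out the bytes is easy. The hard part is that the player then needs a separate decoder for whatever format each datatype names.

```python
import base64
import xml.etree.ElementTree as ET

# ASSUMPTION: namespace URI and the smpte:data element/attribute names.
SMPTE_NS = "http://www.smpte-ra.org/schemas/2052-1/2010/smpte-tt"


def extract_blobs(ttml_text):
    """Return a list of (datatype, raw_bytes) for every embedded data blob."""
    root = ET.fromstring(ttml_text)
    blobs = []
    for data in root.iter(f"{{{SMPTE_NS}}}data"):
        encoding = data.get("encoding", "Base64")
        if encoding.lower() != "base64":
            raise ValueError(f"unsupported encoding: {encoding}")
        payload = "".join((data.text or "").split())  # drop whitespace/newlines
        blobs.append((data.get("datatype"), base64.b64decode(payload)))
    return blobs


# Arbitrary placeholder bytes, not a meaningful CEA-608 stream.
placeholder = base64.b64encode(b"\x94\x20\x94\x20").decode("ascii")
SAMPLE = f"""<tt xmlns="http://www.w3.org/ns/ttml" xmlns:smpte="{SMPTE_NS}">
  <head><metadata>
    <smpte:data datatype="x-cea608" encoding="Base64">{placeholder}</smpte:data>
  </metadata></head>
  <body/>
</tt>"""
```

Calling extract_blobs(SAMPLE) yields the datatype label and the raw bytes; everything after that point is outside TTML altogether.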
Conclusion
With all these different versions of TTML, I ask myself what a video player that claims TTML support is supposed to do to get something working. The only chance it has is to implement all the extensions defined in all the different profiles. I pity the player that has to deal with a SMPTE-TT file that has a binary blob in it and is expected to be able to decode this.
Now, what is a caption author supposed to do when creating TTML? They obviously cannot expect all players to be able to play back all TTML versions. Should they create different files depending on what platform they are targeting, i.e. an EBU-TT version, a SMPTE-TT version, a vanilla TTML version, and a Simple Delivery Profile version? Should they throw all the features of all the versions into one TTML file and hope that the players will pick out the things they require and drop the rest on the floor?
Maybe the best way to progress would be to make a list of the “safe” features: those features that every TTML profile supports. That may be the best way to get an “interoperable TTML” file. Here’s me hoping that this minimal set of features doesn’t just end up being the usual (starttime, endtime, text) triple.
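One way to compute such a list is to intersect the feature sets of the profiles. The sets below are hypothetical and deliberately incomplete: they contain only the differences mentioned above plus basic timing and paragraph support, not the real feature tables from the specifications.

```python
# HYPOTHETICAL, incomplete feature sets for illustration only.
PROFILES = {
    "EBU-TT": {"begin", "end", "p", "font:monospace", "font:sansSerif", "font:serif"},
    "Simple Delivery Profile": {"begin", "end", "p", "set", "display-region"},
}


def safe_features(profiles):
    """Return the features supported by every profile (set intersection)."""
    feature_sets = list(profiles.values())
    if not feature_sets:
        return set()
    return set.intersection(*feature_sets)


def unsafe_features(used, profiles):
    """Return the features an authored file uses that at least one profile lacks."""
    return set(used) - safe_features(profiles)


print(sorted(safe_features(PROFILES)))  # ['begin', 'end', 'p']
```

With only these two toy profiles, the “safe” set already collapses to the (starttime, endtime, text) triple, and a file that uses “set” would be flagged by unsafe_features.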
UPDATE:
I just found out that UltraViolet have their own profile of SMPTE-TT called CFF-TT (see UltraViolet FAQ and spec). They make some SMPTE-TT fields optional but introduce a new @forcedDisplayMode attribute in their own namespace, “cff:”.
Silvia Pfeiffer, Jan Gerber, Michael Dale “HTML5 video: how to process and publish video in an open format”, LCA 2010, Linux.conf.au, January 2010, Wellington NZ.
D. Van Deursen, S. Pfeiffer, R. Troncy, Y. Lafon, E. Mannens, R. Van de Walle, “Implementing the Media Fragments URI Specification”, 19th International World Wide Web Conference (WWW’10), Developer’s Track, pages 1361-1364, Raleigh, North Carolina, USA, April 28-30, 2010.
Elephants Dream with a track of text descriptions that can function as audio descriptions with the help of a screen reader. It uses aria-live on the cues. Make sure to use the HTML5 player: http://www.youtube.com/html5 .
I gave a talk at LCA 2011 in Brisbane about some of the things that I have learnt and code I have developed during writing my book, see http://www.amazon.com/Definitive-Guide-HTML5-Video/dp/1430230908/ .
The talk announcement:
The new HTML5 specification continues to change; a particularly large number of changes are still happening for audio and video. We were given a new open codec format called WebM, which didn't really change any functionality but may eventually lead to a common baseline codec. Beyond that, in July 2010 alone, features for accessibility and a new caption format called WebSRT were introduced. Also, a new video API is being discussed that will expose analytics about video performance, e.g. the number of dropped frames, the download rate, and the playback rate. Lastly, an audio data API is proposed that allows the programmer to access raw audio data and do cool things such as frequency analysis.
I will provide a brief introduction to the new HTML5 video and audio elements, their JavaScript API, and the functionality already standardized and available in modern Web browsers, such as pixel manipulation through the Canvas or the application of SVG filters to videos. Then I will show some cool demos of what will be possible once the newer features are standardized and rolled out.
This talk will contain lots of "bling", i.e. lots of visual and aural demonstrations, but there will also be technical content at the level required by more or less hard-core Web developers. Do not expect a kernel talk from this though.
For slides see: http://blog.gingertech.net/2011/01/27/html5-video-presentations-at-lca-2011/
Creative Commons licensed http://creativecommons.org/licenses/by-sa/3.0/ by Linux Australia
Demonstration of a Firefox Plugin developed by Jakub Sendor to show how Media Fragment URIs can be implemented. Media Fragment URIs are being specified at http://www.w3.org/TR/media-frags/ in a W3C technical report.
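As a quick illustration of the syntax, a temporal media fragment such as video.ogv#t=10,20 selects the interval from 10 to 20 seconds. Below is a minimal Python sketch that parses only the temporal dimension in normal play time (NPT); it ignores SMPTE timecodes, wall-clock time and the spatial, track and id dimensions, and its validation is deliberately loose. It is not the plugin's code.

```python
from urllib.parse import urlsplit


def parse_npt(value):
    """Convert an NPT time such as '75', '1:15' or '0:01:15.5' to seconds."""
    parts = value.split(":")
    if len(parts) > 3:
        raise ValueError(f"bad NPT time: {value!r}")
    seconds = 0.0
    for part in parts:  # hours/minutes/seconds, most significant first
        seconds = seconds * 60 + float(part)
    return seconds


def parse_temporal_fragment(url):
    """Return (start, end) in seconds from a '#t=' fragment.

    end is None for an open-ended interval; returns None when the URL
    has no temporal fragment at all.
    """
    fragment = urlsplit(url).fragment
    for pair in fragment.split("&"):  # e.g. '#t=10,20&xywh=...'
        name, _, value = pair.partition("=")
        if name != "t":
            continue
        if value.startswith("npt:"):
            value = value[len("npt:"):]
        start_text, _, end_text = value.partition(",")
        start = parse_npt(start_text) if start_text else 0.0
        end = parse_npt(end_text) if end_text else None
        return start, end
    return None
```

A player would then seek to start and pause at end (or play to the end of the resource when end is None).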
This is a screencast of a first demo implementation of caption/subtitle support in Firefox where captions are stored in a Kate stream in Ogg video. This is a demo for HTML5 media accessibility.
This video demonstrates an Ogg file of Elephants Dream with an additional audio track containing audio descriptions and an additional text track containing captions. The software being used to play it back is part of liboggplay.
Doug Schepers of the W3C gave an overview talk about W3C and Web standards at the Web Directions South 2009 conference in Sydney, Australia, see http://south09.webdirections.org/program/w3ctrack#the-w3c-and-web-standards-big-picture .
This is a short extract from his talk, where he announces that Twitter has bought the W3C. Awesomeness warning!
Doug sent me a transcript:
(Can you hear me all right?)
So, I just wanted to introduce you to W3C, and to do so, I have some exciting information: W3C has been acquired by Twitter. We're really excited about this... there are... there are a couple of complications. All of our specifications are now going to have to be 140 characters or less. But we think this will actually speed up our time to market. I'd like to introduce 2 new specifications: here is HTML5 and CSS3. So, we've cut out everything except the essential bits: border-radius will now let you have rounded corners... so, um, congratulations. And we expect that these will be implemented very quickly.
Here's the text of that slide:
(W3C logo in Twitter font)
W3C has been acquired by Twitter.
Now, all specifications will be 140 characters or less:
* @w3c HTML5:h1-6,p,li,div,video,audio #rec
* @w3c CSS3:font,color,padding,border-radius #rec
This is a submission to the W4A 2009 Web accessibility conference (http://www.w4a.info/ ). In the video, we explain the current status of video accessibility on the Web and means forward for HTML5. We propose a solution for associating textual captions with video and explain it on the example of Ogg Kate, SRT and DFXP. We then explain further challenges such as Sign Language, Audio Annotations, and more general types of time-aligned text, e.g. Karaoke, music lyrics, ticker-text, transcripts, or annotations with hyperlinks.
Higher quality copy: http://www.youtube.com/watch?v=qpbtpeofN3c
A short (2:30 min) video presented as a lightning talk at TPAC 2008 to make the case for getting more out of the HTML5 video element. It demonstrates Metavid and its wiki-style editing of transcripts. It further shows the use of the transcript to navigate and search in long-form video. Exposing the structure and content of video to the User Agent and the Server enables video accessibility, content adaptation, and deep search. The video also points to the W3C Media Fragments working group, the W3C Timed Text working group, and the W3C Media Annotations working group.
This is a demo of the Annodex JavaScript API built on Shane Stephens' liboggplay. It demonstrates how annotations, video hyperlinks, and video controls are handled in an integrated fashion for an HTML5 video tag.
The screencast shows Vquence's Flash player in action. It assembles slices of videos from all over the Web. There will be an authoring tool at www.vquence.com.
HTML5 is an updated version of the HyperText Markup Language that has been empowering the World Wide Web for the last 20 years. One of the things that HTML5 introduces is a <video> element, which makes video content as simple to include in Web pages as images. Similar to the issues that had to be overcome with the introduction of the <img> tag in 1993, we are now facing the issue of a common baseline codec for the <video> element.
In this podcast, Mark Jones interviews Pia Waugh, ICT Policy Advisor for Senator Lundy; Senator Kate Lundy; Matt Barrie, CEO and founder, Freelance.com; and Silvia Pfeiffer, CEO and co-founder, Vquence about the ICT skills shortage and ways of addressing it. Education and tax incentives are two topics under discussion.
Today I am starting a new collection – recordings of interviews, talks I have given, and slides of the talks. There are many that I’ve missed, sorry. You may find some slides also on Slideshare.
Silvia Pfeiffer, “Taking HTML5 <video> a step further”, Web Directions South Conference 2009, W3C Standards Track, Sydney Convention Centre, October 2009.