<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>ginger&#039;s thoughts &#187; video accessibility</title>
	<atom:link href="http://blog.gingertech.net/category/video-accessibility/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.gingertech.net</link>
	<description>Silvia&#039;s blog</description>
	<lastBuildDate>Tue, 13 Jul 2010 19:35:48 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>&#8220;HTML5 Audio And Video Accessibility, Internationalisation And Usability&#8221; talk at Mozilla Summit</title>
		<link>http://blog.gingertech.net/2010/07/14/html5_media_a11y_moz_summit/</link>
		<comments>http://blog.gingertech.net/2010/07/14/html5_media_a11y_moz_summit/#comments</comments>
		<pubDate>Tue, 13 Jul 2010 19:35:48 +0000</pubDate>
		<dc:creator>silvia</dc:creator>
				<category><![CDATA[Digital Media]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[video accessibility]]></category>
		<category><![CDATA[2010]]></category>
		<category><![CDATA[accessibility]]></category>
		<category><![CDATA[html5 media]]></category>
		<category><![CDATA[HTML5 video]]></category>
		<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[Mozilla Summit]]></category>

		<guid isPermaLink="false">http://blog.gingertech.net/?p=1030</guid>
		<description><![CDATA[For 2 months now, I have been quietly working along on a new Mozilla contract that I received to continue working on HTML5 media accessibility. Thanks Mozilla!
Lots has been happening &#8211; the W3C HTML5 accessibility task force published a requirements document, the Media Text Associations proposal made it into the HTML5 draft as a &#60;track> [...]]]></description>
			<content:encoded><![CDATA[<p>For 2 months now, I have been quietly working along on a new Mozilla contract that I received to continue working on HTML5 media accessibility. Thanks Mozilla!</p>
<p>Lots has been happening &#8211; the W3C HTML5 accessibility task force published a <a href="http://www.w3.org/WAI/PF/HTML/wiki/Media_Accessibility_Requirements">requirements document</a>, the <a href="http://www.w3.org/WAI/PF/HTML/wiki/Media_TextAssociations">Media Text Associations proposal</a> made it into the HTML5 draft as a <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#timed-tracks">&lt;track> element</a>, and there are discussions about the advantages and disadvantages of the <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#websrt-0">new WebSRT caption</a> format that Ian Hickson created in the WHATWG HTML5 draft.</p>
<p>In attending the Mozilla Summit last week, I had a chance to <a href="http://blog.gingertech.net/wp-content/uploads/2010/07/summit2010/">present the current state of development of HTML5 media accessibility</a> and some of the ongoing work. I focused on the following four current activities on the technical side of things, which are key to satisfying many of the <a href="http://www.w3.org/WAI/PF/HTML/wiki/Media_Accessibility_Requirements">collected media accessibility requirements</a>:</p>
<ol>
<li>Multitrack Video Support</li>
<li>External Text Tracks Markup in HTML5</li>
<li>External Text Track File Format</li>
<li>Direct Access to Media Fragments</li>
</ol>
<p>The first three now already have first drafts in the HTML5 specification, though the details still need to be improved and an external text track file format agreed on. The last has had a major push ahead with the Media Fragments WG publishing a Last Call Working Draft. So, on the specification side of things, major progress has been made. On the implementation &#8211; even on the example implementation &#8211; side of things, we still fall down badly. This is where my focus will lie in the next few months.</p>
<p><a href="http://blog.gingertech.net/wp-content/uploads/2010/07/summit2010/">Follow this link to read through my slides from the Mozilla 2010 summit</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingertech.net/2010/07/14/html5_media_a11y_moz_summit/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Media Fragment URI Specification in Last Call WD</title>
		<link>http://blog.gingertech.net/2010/07/10/media-fragment-uri-specification-in-last-call-wd/</link>
		<comments>http://blog.gingertech.net/2010/07/10/media-fragment-uri-specification-in-last-call-wd/#comments</comments>
		<pubDate>Fri, 09 Jul 2010 15:44:20 +0000</pubDate>
		<dc:creator>silvia</dc:creator>
				<category><![CDATA[Digital Media]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[video accessibility]]></category>
		<category><![CDATA[HTML5 video]]></category>
		<category><![CDATA[media fragment URI]]></category>
		<category><![CDATA[media fragments]]></category>
		<category><![CDATA[named fragment]]></category>
		<category><![CDATA[spatial fragment]]></category>
		<category><![CDATA[temporal fragment]]></category>
		<category><![CDATA[track fragment]]></category>

		<guid isPermaLink="false">http://blog.gingertech.net/?p=1020</guid>
		<description><![CDATA[After two years of effort, the W3C Media Fragment WG has now created a Last Call Working Draft document. This means that the working group is fairly confident that they have addressed all the required issues for media fragment URIs and their implementation on HTTP and is asking for outside experts and groups for input. [...]]]></description>
			<content:encoded><![CDATA[<p>After two years of effort, the W3C Media Fragment WG has now created a <a href="http://www.w3.org/TR/media-frags/">Last Call Working Draft document</a>. This means that the working group is fairly confident that they have addressed all the required issues for media fragment URIs and their implementation on HTTP and is asking outside experts and groups for input. This is the time for you to get active and proof-read the specification thoroughly and feed back all the concerns that you have and all the things you do not understand!</p>
<p>The <a href="http://www.w3.org/TR/media-frags/">media fragment (MF) URI specification</a> specifies two types of MF URIs: those created with a URI fragment (&#8220;#&#8221;), e.g. <b>video.ogv#t=10,20</b> and those with a URI query (&#8220;?&#8221;), e.g. <b>video.ogv?t=10,20</b>. There is a fundamental difference between the two that needs to be appreciated: with a URI fragment you can specify a subpart of a resource, e.g. a subpart of a video, while with a URI query you will refer to a different resource, i.e. a &#8220;new&#8221; video. This is an important difference to understand for media fragments, because only some things that we want to achieve with media fragments can be achieved with &#8220;#&#8221;, while others can only be achieved by transforming the resource into a different new bitstream.</p>
<p>This all sounds very abstract, so let me give you an example. Say you want to retrieve a video without its audio track. Say you&#8217;d rather not download the audio track data, since you want to save on bandwidth. So, you are only interested in getting the video data. The URI that you may want to use is <b>video.ogv#track=video</b>. This means that you don&#8217;t want to change the video resource, but you only want to see the video. The user agent (UA) has two options to resolve such a URI: it can either map that request to byte ranges and just retrieve those &#8211; or it can download the full resource and ignore the data it has not been requested to display.</p>
<p>Since we do not want the extra bytes of the audio track to be retrieved, we would hope the UA can do the byte range requests. However, most Web video formats will interleave the different tracks of a media resource in time such that a video track will result in a gazillion smaller byte ranges. This makes it impractical to retrieve just the video through a &#8220;#&#8221; media fragment. Thus, if we really want this functionality, we have to make the server more intelligent and allow creation of a new resource from the existing one which doesn&#8217;t contain the audio. Then, the server, upon receiving a request such as <b>video.ogv#track=video</b> can redirect that to <b>video.ogv?track=video</b> and actually serve a new resource that satisfies the needs.</p>
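<p>As a rough illustration of the rewrite from the &#8220;#&#8221; form to the &#8220;?&#8221; form, here is a minimal JavaScript sketch. The function name <b>fragmentToQuery</b> is made up for this example and is not taken from the specification or from the Firefox plugin; it only shows the string transformation, not where in the UA or server it would run.</p>

```javascript
// Hypothetical helper (not from the spec or the plugin): turn a "#" media
// fragment URI into the "?" form that addresses a new, derived resource.
function fragmentToQuery(uri) {
  const hashIndex = uri.indexOf("#");
  if (hashIndex === -1) return uri;          // no fragment, nothing to rewrite
  const base = uri.slice(0, hashIndex);
  const fragment = uri.slice(hashIndex + 1);
  if (fragment === "") return base;          // empty fragment, drop the "#"
  // Append to an existing query string if the URI already has one.
  const separator = base.includes("?") ? "&" : "?";
  return base + separator + fragment;
}

console.log(fragmentToQuery("video.ogv#track=video")); // video.ogv?track=video
```

<p>The same function also covers temporal fragments, so <b>video.ogv#t=10,20</b> becomes <b>video.ogv?t=10,20</b>.</p>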
<p>This is in fact exactly what was implemented in a recently published Firefox Plugin written by Jakub Sendor &#8211; also described in his presentation <a href="http://www.w3.org/2008/WebVideo/Fragments/talks/2010-06-30-Jakub_Sendor-Media_Fragment_Firefox_Plugin.pdf">&#8220;Media Fragment Firefox plugin&#8221;</a>.</p>
<p>Media Fragment URIs are defined for four dimensions:</p>
<ul>
<li>temporal fragments</li>
<li>spatial fragments</li>
<li>track fragments</li>
<li>named fragments</li>
</ul>
<p>The temporal dimension, when not combined with another dimension, can be easily mapped to byte ranges, since all Web media formats interleave their tracks in time and thus create a simple relationship between time and bytes.</p>
<p>The spatial dimension is a very complicated beast. If you address a rectangular image region out of a video, you might want just the bytes related to that image region. That&#8217;s almost impossible since pixels are encoded both aggregated across the frame and across time. Also, actually removing the context, i.e. the image data outside the region of interest may not be what you want &#8211; you may only want to focus in on the region of interest. Thus, the proposal for what to do in the spatial dimension is to simply retrieve all the data and have the UA deal with the display of the focused region, e.g. putting a dark overlay over the regions outside the region of interest.</p>
<p>The track dimension is similarly complicated and here it was decided that a redirect to a URI query would be in order in the demo Firefox plugin. Since this requires an intelligent server &#8211; which is available through the Ninsuna demo server that was implemented by Davy Van Deursen, another member of the MF WG &#8211; the Firefox plugin makes use of that. If the UA doesn&#8217;t have such an intelligent server available, it may again be most useful to only blend out the non-requested data on the UA similar to the spatial dimension.</p>
<p>The named dimension is still a largely undefined beast. It is clear that addressing a named dimension cannot be done together with the other dimensions, since a named dimension can represent any of the other dimensions above, and even a combination of them. Thus, resolving a named dimension requires either the UA or the server to understand what the name maps to. If, for example, a track has a name in a media resource and that name is stored in the media header and the UA already has a copy of all the media headers, it can resolve the name to the track that is being requested and take adequate action.</p>
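<p>To make the four dimensions a bit more concrete, here is a small JavaScript sketch that splits a media fragment into them. It assumes the keys <b>t</b>, <b>xywh</b>, <b>track</b> and <b>id</b>, plain seconds for time, and plain pixel values for the region; the function name and the shape of the returned object are invented for this example, and the many other syntax variants in the draft are deliberately ignored.</p>

```javascript
// Hypothetical parser: only handles plain-second times ("t=10,20") and
// plain pixel regions ("xywh=160,120,320,240"); everything else is ignored.
function parseMediaFragment(uri) {
  const result = {};
  const hashIndex = uri.indexOf("#");
  if (hashIndex === -1) return result;
  for (const pair of uri.slice(hashIndex + 1).split("&")) {
    const eq = pair.indexOf("=");
    if (eq === -1) continue;                 // skip anything that is not key=value
    const key = decodeURIComponent(pair.slice(0, eq));
    const value = decodeURIComponent(pair.slice(eq + 1));
    if (key === "t") {
      // A missing start means "from the beginning"; a missing end is left open.
      const [start, end] = value.split(",");
      result.temporal = {
        start: start ? Number(start) : 0,
        end: end !== undefined ? Number(end) : null,
      };
    } else if (key === "xywh") {
      const [x, y, w, h] = value.split(",").map(Number);
      result.spatial = { x, y, w, h };
    } else if (key === "track") {
      result.track = value;
    } else if (key === "id") {
      result.named = value;                  // still has to be resolved by UA or server
    }
  }
  return result;
}

const parsed = parseMediaFragment("video.ogv#t=10,20&track=video");
console.log(parsed.temporal.start, parsed.temporal.end, parsed.track); // 10 20 video
```

<p>Note that the sketch only separates the dimensions; deciding whether to map them to byte ranges, redirect to a URI query, or blend out data in the UA is a separate step.</p>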
<p>But enough explaining &#8211; I have made a screencast of the Firefox plugin in action for all these dimensions, which explains things a lot more concisely than words will ever be able to &#8211; enjoy:</p>
<p><a href="http://blog.gingertech.net/2010/07/10/media-fragment-uri-specification-in-last-call-wd/"><em>Click here to view the embedded video.</em></a></p>
<p>And do not forget to proofread <a href="http://www.w3.org/TR/media-frags/">the specification</a> and send feedback to <a href="mailto:public-media-fragment@w3.org">public-media-fragment@w3.org</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingertech.net/2010/07/10/media-fragment-uri-specification-in-last-call-wd/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Introducing media accessibility into HTML5</title>
		<link>http://blog.gingertech.net/2010/04/11/introducing-media-accessibilit-into-html5-media/</link>
		<comments>http://blog.gingertech.net/2010/04/11/introducing-media-accessibilit-into-html5-media/#comments</comments>
		<pubDate>Sat, 10 Apr 2010 23:49:12 +0000</pubDate>
		<dc:creator>silvia</dc:creator>
				<category><![CDATA[Digital Media]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[video accessibility]]></category>
		<category><![CDATA[accessibility]]></category>
		<category><![CDATA[audio descriptions]]></category>
		<category><![CDATA[audio element]]></category>
		<category><![CDATA[captions]]></category>
		<category><![CDATA[HTML5]]></category>
		<category><![CDATA[HTML5 audio]]></category>
		<category><![CDATA[HTML5 video]]></category>
		<category><![CDATA[subtitles]]></category>
		<category><![CDATA[video element]]></category>
		<category><![CDATA[W3C]]></category>

		<guid isPermaLink="false">http://blog.gingertech.net/?p=976</guid>
		<description><![CDATA[In recent months, people in the W3C HTML5 Accessibility Task Force developed two proposals for introducing caption, subtitle, and more generally time-aligned text support into HTML5 audio and video.
These time-aligned text files can either come as external files that are associated with the timeline of the media resource, or they come as part of the [...]]]></description>
			<content:encoded><![CDATA[<p>In recent months, people in the W3C HTML5 Accessibility Task Force developed two proposals for introducing caption, subtitle, and more generally time-aligned text support into HTML5 audio and video.</p>
<p>These time-aligned text files can either come as external files that are associated with the timeline of the media resource, or they come as part of the media resource in a binary track.</p>
<p>For both cases we now have proposals to extend the HTML5 specification.</p>
<p>Firstly, let&#8217;s look at time-aligned text in external files. The <a href="http://www.w3.org/WAI/PF/HTML/wiki/Media_TextAssociations">change proposal introduces markup to associate such external files</a> as a kind of &#8220;virtual track&#8221; with a media resource. Here is an example:<br />
<code><br />
&lt;video src="video.ogv"><br />
  &lt;track src="video_cc.ttml" type="application/ttaf+xml" language="en" role="caption">&lt;/track><br />
  &lt;track src="video_tad.srt" type="text/srt" language="en" role="textaudesc">&lt;/track><br />
  &lt;trackgroup role="subtitle"><br />
    &lt;track src="video_sub_en.srt" type="text/srt; charset='Windows-1252'" language="en">&lt;/track><br />
    &lt;track src="video_sub_de.srt" type="text/srt; charset='ISO-8859-1'" language="de">&lt;/track><br />
    &lt;track src="video_sub_ja.srt" type="text/srt; charset='EUC-JP'" language="ja">&lt;/track><br />
  &lt;/trackgroup><br />
&lt;/video><br />
</code><br />
The video resource is &#8220;video.ogv&#8221;. Associated with it are five timed text resources.</p>
<p>The first one is written in TTML (which is the new name for <a href="http://www.w3.org/TR/ttaf1-dfxp/">DFXP</a>), is a caption track and in English. TTML is particularly useful when you want to provide more than just an unformatted piece of text to the viewers. Hearing-impaired users appreciate any visual help they can be provided with to absorb the caption text more quickly. This includes colour coding of speakers, positioning of text close to the speaking person on screen, or even animated musical notes to signify music. Thus, a format like TTML that allows for formatting and positioning information is an appropriate format to specify captions.</p>
<p>All other timed text resources are provided in <a href="http://en.wikipedia.org/wiki/SubRip">SRT</a> format, which is a simpler format than TTML with only plain text in the text cues.</p>
<p>The second text track is a textual audio description track. A textual audio description is in fact targeted at the vision-impaired and contains text that is expected to be read out by a screen reader or routed to a braille device. Thus, as the video plays, a vision-impaired user receives additional information about the visual content of the scene through their screen reader or braille device.  The SRT format is particularly useful for providing textual audio descriptions since it only provides plain text, which can easily be handed on to assistive technology. When authoring such textual audio descriptions, it is very important to pick time intervals in the original media resource where no other significant audio cue is provided, such that the vision-impaired user is able to listen to the screen reader during that time.</p>
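<p>To show how little is needed to get plain text cues out of an SRT file, here is a minimal JavaScript sketch. It assumes well-formed input (a counter line, a timing line of the form <b>hh:mm:ss,mmm --> hh:mm:ss,mmm</b>, then one or more text lines, with blank lines between cues); the function name is made up and there is no handling of malformed files.</p>

```javascript
// Hypothetical minimal SRT reader; assumes well-formed input.
function parseSrt(text) {
  // "00:01:02,500" -> 62.5 seconds
  const toSeconds = (stamp) => {
    const [h, m, rest] = stamp.trim().split(":");
    const [s, ms] = rest.split(",");
    return Number(h) * 3600 + Number(m) * 60 + Number(s) + Number(ms) / 1000;
  };
  return text
    .replace(/\r\n/g, "\n")                  // normalise Windows line endings
    .trim()
    .split(/\n{2,}/)                         // cues are separated by blank lines
    .map((block) => {
      const lines = block.split("\n");
      const [start, end] = lines[1].split("-->");
      return {
        start: toSeconds(start),
        end: toSeconds(end),
        text: lines.slice(2).join("\n"),     // plain text, ready for a screen reader
      };
    });
}

const cues = parseSrt("1\n00:00:01,000 --> 00:00:04,500\nA door opens slowly.\n");
console.log(cues[0].start, cues[0].end, cues[0].text); // 1 4.5 A door opens slowly.
```

<p>The resulting text can be handed on to assistive technology without any further stripping of markup, which is the main appeal of SRT for textual audio descriptions.</p>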
<p>The last three text tracks are subtitle tracks. They are grouped into a trackgroup element, which is not strictly necessary, but enables the author to say that these tracks are supposed to be alternatives. Thus, a Web Browser can create a menu with all the available tracks and put the tracks in the trackgroup into a menu of their own where only one option is selectable (similar to how radio buttons work). Incidentally, the trackgroup element also avoids having to repeat the role attribute in all the contained tracks. It is expected that these menus will be added to the default media controls and will thus be visible if the media element has a controls attribute.</p>
<p>With the role, type and language attributes, it is easy for a Web Browser to understand what the different tracks have to offer. A Web Browser can even decide to offer new functionality that is helpful to certain user groups. For example, an addition to a Web Browser&#8217;s default settings could be to allow users to instruct a Web Browser to always turn on captions or subtitles if they are available in the user&#8217;s main language. Or to always turn on textual audio descriptions. In this way, a user can customise their default experience of a media resource over and on top of what a Web page author decides to expose.</p>
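<p>As a sketch of how such a user preference could be applied, the following JavaScript walks over a list of track descriptions and turns on those that match. The <b>role</b>, <b>language</b> and <b>enabled</b> fields mirror the attributes discussed above; the preference object and the function name are assumptions made up for this example and are not part of any proposal.</p>

```javascript
// Hypothetical preference handling; "prefs" is an invented structure.
function applyUserPreferences(tracks, prefs) {
  const turnedOn = [];
  for (const track of tracks) {
    const roleWanted = prefs.alwaysOn.includes(track.role);
    const languageMatches = track.language === prefs.language;
    if (roleWanted && languageMatches) {
      track.enabled = true;                  // the page author's default is overridden
      turnedOn.push(track);
    }
  }
  return turnedOn;
}

// Plain objects standing in for the tracks of the example markup.
const tracks = [
  { role: "caption", language: "en", enabled: false },
  { role: "textaudesc", language: "en", enabled: false },
  { role: "subtitle", language: "de", enabled: false },
];
const prefs = { language: "en", alwaysOn: ["caption", "textaudesc"] };
console.log(applyUserPreferences(tracks, prefs).length); // 2
```

<p>A real browser would also have to respect the one-of-many rule for tracks inside a trackgroup, which this sketch leaves out.</p>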
<p>Incidentally, the choice of &#8220;track&#8221; as a name for relating external text resources to a media element has a deeper meaning. It is easily possible in future to extend &#8220;track&#8221; elements to not just point to dependent text resources, but also to dependent audio or video resources. For example, an actual audio description that is a recording of a human voice rather than a rendered text description could be associated in the same way. Right now, such an implementation is not envisaged by the Browser vendors, but it will be something to work towards in the future.</p>
<p>Now, with such functionality available, there is naturally a desire to be able to control activation or de-activation of text tracks through JavaScript, not just through user interaction. A Web Developer may for example want to override the default controls provided by a Web Browser and run their own JavaScript-based controls, which requires creating their own selection menu for the tracks.</p>
<p>This is actually also an issue more generally and applies to all track types, including such tracks that come inside an existing media resource. In the current specification such tracks are not exposed and can therefore not be activated.</p>
<p>This is where the second specification that the W3C Accessibility Task Force has worked towards comes in: the <a href="http://www.w3.org/WAI/PF/HTML/wiki/Media_MultitrackAPI">media multitrack JavaScript API</a>.</p>
<p>This specification introduces a read-only JavaScript interface to the audio and video elements to allow Web Developers to find out about the tracks (including the virtual tracks) that a media resource offers. The only action that the interface currently provides is to enable or disable tracks.<br />
Here is an example that turns on a French subtitle track:<br />
<code><br />
if (video.tracks[2].role == "subtitle" &#038;&#038; video.tracks[2].language == "fr") video.tracks[2].enabled = true;<br />
</code></p>
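<p>In practice a page will rarely know the index of a track in advance. A slightly more defensive variant, sketched below, searches the list by role and language and switches off the other tracks of the same role, since subtitle tracks are meant to be alternatives. It assumes that the track list has a <b>length</b> and can be indexed like an array, which the proposal may or may not end up guaranteeing; the function name is made up for this example.</p>

```javascript
// Hypothetical helper: enable the first track with the given role and
// language, and disable all other tracks of that role. If nothing matches,
// every track of that role ends up disabled and -1 is returned.
function enableTrack(tracks, role, language) {
  let chosen = -1;
  for (let i = 0; i < tracks.length; i++) {
    const track = tracks[i];
    if (track.role !== role) continue;       // tracks with other roles are left alone
    const matches = chosen === -1 && track.language === language;
    track.enabled = matches;
    if (matches) chosen = i;
  }
  return chosen;
}

// In a page this would be called as, e.g.:
//   enableTrack(video.tracks, "subtitle", "fr");
```

<p>The return value is the index of the enabled track, which a custom menu could use to highlight the current selection.</p>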
<p>There is still a need to introduce a means to actually expose the text cues as they relate to the currentTime of the media resource. This has not yet been specified in the given proposals.</p>
<p>The text cues could be exposed in several ways. They could be exposed through introducing an event, i.e. every time a new text cue becomes active, a callback is called which is given the active text cue (if such a callback had been registered previously). Another option is to simply write the text cues into a specified div-element in the DOM and thus expose them directly in the Browser. A third idea could be to expose the text cues in an iframe-like element to avoid any cross-site security issues. And a fourth idea that we have discussed is to expose the text cues in an attribute of the track.</p>
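<p>The first option, a callback that fires whenever the active text cue changes, could look roughly like the following JavaScript sketch. None of this is specified anywhere: the cue objects with <b>start</b>, <b>end</b> and <b>text</b> fields, and the function <b>createCueNotifier</b>, are assumptions made purely for illustration. The only existing piece it leans on is the media element&#8217;s <b>timeupdate</b> event and <b>currentTime</b> attribute.</p>

```javascript
// Hypothetical cue notifier: calls onCueChange(cue) when a cue becomes
// active and onCueChange(null) when the active cue ends.
function createCueNotifier(cues, onCueChange) {
  let activeIndex = -1;                      // -1 means "no cue active"
  return function update(currentTime) {
    const index = cues.findIndex(
      (cue) => currentTime >= cue.start && currentTime < cue.end
    );
    if (index !== activeIndex) {
      activeIndex = index;
      onCueChange(index === -1 ? null : cues[index]);
    }
  };
}

// In a page this would be wired up roughly like this:
//   const update = createCueNotifier(cues, (cue) => { /* render or speak cue */ });
//   video.addEventListener("timeupdate", () => update(video.currentTime));
```

<p>The callback could then write the cue into a div, hand it to a screen reader, or do whatever else the page needs, which is exactly where the rendering and security questions come in.</p>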
<p>All of this obviously also relates to how to actually render the text cues and whether to render them in a shadow DOM so as to make the JavaScript reading separate from the rendering and address security and copyright issues. I&#8217;d be curious to hear opinions here on how it should be done.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingertech.net/2010/04/11/introducing-media-accessibilit-into-html5-media/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>HTML5 Media and Accessibility presentation</title>
		<link>http://blog.gingertech.net/2010/02/23/html5-media-and-accessibility-presentation/</link>
		<comments>http://blog.gingertech.net/2010/02/23/html5-media-and-accessibility-presentation/#comments</comments>
		<pubDate>Tue, 23 Feb 2010 03:05:51 +0000</pubDate>
		<dc:creator>silvia</dc:creator>
				<category><![CDATA[Digital Media]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[open codecs]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[video accessibility]]></category>
		<category><![CDATA[accessibility]]></category>
		<category><![CDATA[html5 media]]></category>
		<category><![CDATA[presentation]]></category>

		<guid isPermaLink="false">http://blog.gingertech.net/?p=952</guid>
		<description><![CDATA[Today, I was invited to give a talk at my old workplace CSIRO about the HTML5 media elements and accessibility.
A lot of the things that have gone into Ogg and that are now being worked on in the W3C in different working groups &#8211; including the Media Fragments and HTML5 WGs &#8211; were also of [...]]]></description>
			<content:encoded><![CDATA[<p>Today, I was invited to give a talk at my old workplace CSIRO about the HTML5 media elements and accessibility.</p>
<p>A lot of the things that have gone into Ogg and that are now being worked on in the W3C in different working groups &#8211; including the Media Fragments and HTML5 WGs &#8211; were also of concern in the Annodex project that I worked on while at CSIRO. So I was rather excited to be able to report back about the current status in HTML5 and where we&#8217;re at with accessibility features.</p>
<p><iframe src="http://blog.gingertech.net/wp-content/uploads/2010/02/HAIL_20100223/" width="450px" height="300px"></iframe></p>
<p><a href="http://blog.gingertech.net/wp-content/uploads/2010/02/HAIL_20100223/">Check out the presentation here</a>. It contains a good collection of links to exciting demos of what is possible with the new HTML5 media elements when combined with other HTML features.</p>
<p>I tried something new with this presentation: I wrote it in <a href="http://meyerweb.com/eric/tools/s5/">a tool called S5</a>, which makes use only of HTML features for the presentation. It was quite a bit slower than I expected, e.g. reloading a page always included having to navigate to that page. Also, it&#8217;s not easily possible to do drawings, unless you are willing to code them all up in HTML. But otherwise I have found it very useful for, in particular, including all the used URLs and video element demos directly in the slides. I was inspired to use this tool by Chris Double&#8217;s slides from LCA about <a href="http://www.bluishcoder.co.nz/2010/02/13/lca-2010-implementing-html5-video-in-firefox.html">implementing HTML 5 video in Firefox</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingertech.net/2010/02/23/html5-media-and-accessibility-presentation/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Accessibility support in Ogg and liboggplay</title>
		<link>http://blog.gingertech.net/2010/02/19/accessibility-support-in-ogg-and-liboggplay/</link>
		<comments>http://blog.gingertech.net/2010/02/19/accessibility-support-in-ogg-and-liboggplay/#comments</comments>
		<pubDate>Fri, 19 Feb 2010 01:57:34 +0000</pubDate>
		<dc:creator>silvia</dc:creator>
				<category><![CDATA[Digital Media]]></category>
		<category><![CDATA[FOMS]]></category>
		<category><![CDATA[LCA]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[open codecs]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[video accessibility]]></category>
		<category><![CDATA[accessibility]]></category>
		<category><![CDATA[HTML5]]></category>
		<category><![CDATA[html5 media]]></category>
		<category><![CDATA[Ogg]]></category>
		<category><![CDATA[Ogg Theora]]></category>
		<category><![CDATA[Ogg Theora/Vorbis]]></category>
		<category><![CDATA[Ogg video]]></category>
		<category><![CDATA[open media software]]></category>

		<guid isPermaLink="false">http://blog.gingertech.net/?p=928</guid>
		<description><![CDATA[At the recent FOMS/LCA in Wellington, New Zealand, we talked a lot about how Ogg could support accessibility. Technically, this means support for multiple text tracks (subtitles/captions), multiple audio tracks (audio descriptions parallel to main audio track), and multiple video tracks (sign language video parallel to main video track).
Creating multitrack Ogg files
The creation of multitrack [...]]]></description>
			<content:encoded><![CDATA[<p>At the recent FOMS/LCA in Wellington, New Zealand, we talked a lot about how Ogg could support accessibility. Technically, this means support for multiple text tracks (subtitles/captions), multiple audio tracks (audio descriptions parallel to main audio track), and multiple video tracks (sign language video parallel to main video track).</p>
<p><strong>Creating multitrack Ogg files</strong><br />
The creation of multitrack Ogg files is already possible using one of the muxing applications, e.g. oggz-merge. For example, I have my own little collection of multitrack Ogg files at <a href="http://annodex.net/~silvia/itext/elephants_dream/multitrack/">http://annodex.net/~silvia/itext/elephants_dream/multitrack/</a>. But then you are stranded with files that no player will play back.</p>
<p><strong>Multitrack Ogg in Players</strong><br />
As Ogg is now being used in multiple Web browsers in the new HTML5 media formats, there are in particular requirements for accessibility support for the hard-of-hearing and vision-impaired. Either multitrack Ogg needs to become more of a common case, or the association of external media files that provide synchronised accessibility data (captions, audio descriptions, sign language) to the main media file needs to become a standard in HTML5.</p>
<p>As it turns out, both these approaches are being considered and worked on in the W3C. Accessibility data that are audio or video tracks will in the near future have to come out of the media resource itself, but captions and other text tracks will also be available from external associated elements.</p>
<p>The availability of internal accessibility tracks in Ogg is a new use case &#8211; something Ogg has been ready to do, but has not gone into common usage. MPEG files on the other hand have for a long time been used with internal accessibility tracks and thus frameworks and players are in place to decode such tracks and do something sensible with them. This is not so much the case for Ogg.</p>
<p>For example, a current VLC build installed on Windows will display captions, because Ogg Kate support is activated. A current VLC build on any other platform, however, has Ogg Kate support deactivated in the build, so captions won&#8217;t display. This will hopefully change soon, but we have to look also beyond players and into media frameworks &#8211; in particular those that are being used by the browser vendors to provide Ogg support.</p>
<p><strong>Multitrack Ogg in Browsers</strong><br />
Hopefully gstreamer (which is what Opera uses for Ogg support) and ffmpeg (which is what Chrome uses for Ogg support) will expose all available tracks to the browser so they can expose them to the user for turning on and off. Incidentally, a <a href="http://www.w3.org/WAI/PF/HTML/wiki/Media_MultitrackAPI">multitrack media JavaScript API</a> is in development in the W3C HTML5 Accessibility Task Force for allowing such control.</p>
<p>The current version of Firefox uses liboggplay for Ogg support, but liboggplay&#8217;s multitrack support has been sketchy thus far. So, Viktor Gal &#8211; the liboggplay maintainer &#8211; and I sat down at FOMS/LCA to discuss this and Viktor developed some patches to make the demo player in the liboggplay package, the glut-player, support the accessibility use cases.</p>
<p>I applied Viktor&#8217;s patch to my local copy of liboggplay and I am very excited to show you the screencast of glut-player playing back a video file with an audio description track and an English caption track all in sync:</p>
<p><video src='http://blog.gingertech.net/wp-content/uploads/2010/02/elephants_dream_with_audiodescriptions_and_captions.ogv' poster='http://blog.gingertech.net/wp-content/uploads/2010/02/elephants_dream_with_audiodescriptions.png' width='450px' controls>elephants_dream_with_audiodescriptions_and_captions</video></p>
<p><strong>Further developments</strong><br />
There are still important questions open: for example, how will a player know that an audio description track is to be played together with the main audio track, but a dub track (e.g. a German dub for an English video) is to be played as an alternative. Such metadata for the tracks is something that Ogg is still missing, but that Ogg can be extended with fairly easily through the use of the Skeleton track. It is something the Xiph community is now working on.</p>
<p><strong>Summary</strong><br />
This is great progress towards accessibility support in Ogg and therefore in Web browsers. And there is more to come soon.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingertech.net/2010/02/19/accessibility-support-in-ogg-and-liboggplay/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
<enclosure url="http://blog.gingertech.net/wp-content/uploads/2010/02/elephants_dream_with_audiodescriptions_and_captions.ogv" length="10693593" type="video/ogg" />
		</item>
		<item>
		<title>Audio Track Accessibility for HTML5</title>
		<link>http://blog.gingertech.net/2010/02/12/audio-track-accessibility-for-html5/</link>
		<comments>http://blog.gingertech.net/2010/02/12/audio-track-accessibility-for-html5/#comments</comments>
		<pubDate>Fri, 12 Feb 2010 10:44:43 +0000</pubDate>
		<dc:creator>silvia</dc:creator>
				<category><![CDATA[Digital Media]]></category>
		<category><![CDATA[FOMS]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[open codecs]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[video accessibility]]></category>
		<category><![CDATA[accessibility]]></category>
		<category><![CDATA[audio description]]></category>
		<category><![CDATA[audio element]]></category>
		<category><![CDATA[Firefox]]></category>
		<category><![CDATA[html5 media]]></category>
		<category><![CDATA[HTML5 video]]></category>
		<category><![CDATA[multitrack audio]]></category>
		<category><![CDATA[multitrack video]]></category>
		<category><![CDATA[video element]]></category>
		<category><![CDATA[W3C]]></category>

		<guid isPermaLink="false">http://blog.gingertech.net/?p=911</guid>
		<description><![CDATA[I have talked a lot about synchronising multiple tracks of audio and video content recently. The reason was mainly that I foresee a need for more than two parallel audio and video tracks, such as audio descriptions for the vision-impaired or dub tracks for internationalisation, as well as sign language tracks for the hard-of-hearing.
It is [...]]]></description>
			<content:encoded><![CDATA[<p>I have talked a lot about synchronising multiple tracks of audio and video content recently. The reason was mainly that I foresee a need for more than two parallel audio and video tracks, such as audio descriptions for the vision-impaired or dub tracks for internationalisation, as well as sign language tracks for the hard-of-hearing.</p>
<p>It is almost impossible to introduce a good scheme to deliver the right video composition to a target audience. Most people will prefer plain audio and video, the vision-impaired would probably prefer only audio plus audio descriptions (but will probably take the video), and the hard-of-hearing will prefer video plus captions and possibly a sign language track. While it is possible to dynamically create files that contain such tracks on a server and then deliver the right composition, implementation of such a server method has not been very successful in recent years and it would likely take many years to roll out such new infrastructure.</p>
<p>So, the only other option we have is to synchronise completely separate media resources as they are selected by the audience.</p>
<p>It is this need that this HTML5 accessibility demo is about: Check out the <a href="http://annodex.net/~silvia/itext/elephant_separate_audesc_dub.html">demo of multiple media resource synchronisation</a>.</p>
<p>I created an Ogg video with only a video track (10m53s750). Then I created an audio track that is the original English audio track (10m53s696). Then I used a Spanish dub track that I found through BlenderNation as an alternative audio track (10m58s337). Lastly, I created an audio description track in the original language (10m53s706). This creates a video track with three optional audio tracks.</p>
<p>I took away all native controls from these elements when using the HTML5 audio and video tags and wrote my own play/pause and seeking controls, which handle all media elements in one go.</p>
<p>I was mostly interested in the quality of this experience. Would the different media files stay mostly in sync? They are normally decoded in different threads, so how big would the drift be?</p>
<p>The <a href="http://annodex.net/~silvia/itext/elephant_separate_audesc_dub.html">resulting page</a> is the basis for such experiments with synchronisation.</p>
<p>The page prints the current playback position in all of the media files at a constant interval of 500ms. Note that when you pause and then play again, I am re-synching the audio tracks with the video track, but not when you just let the files play through.</p>
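<p>A minimal sketch of such shared controls might look like the following. It is not the code of the demo page: the helper name <em>createMediaGroup</em> and its methods are hypothetical, and the objects passed in only need <em>play()</em>, <em>pause()</em> and <em>currentTime</em>, which HTML5 media elements provide.</p>

```javascript
// Hypothetical helper (not taken from the demo page): drive several media
// elements as one group. Any object with play(), pause() and currentTime
// works, so HTML5 audio and video elements fit.
function createMediaGroup(master, followers) {
  const all = [master, ...followers];
  return {
    play() {
      // Re-align the followers with the master before resuming playback.
      for (const el of followers) el.currentTime = master.currentTime;
      for (const el of all) el.play();
    },
    pause() {
      for (const el of all) el.pause();
    },
    seek(seconds) {
      for (const el of all) el.currentTime = seconds;
    },
    // Current playback position of every element, master first.
    positions() {
      return all.map((el) => el.currentTime);
    },
  };
}

// In a browser, the positions could be printed every 500ms:
// const group = createMediaGroup(video, [audio, dub, audesc]);
// setInterval(() => console.log(group.positions()), 500);
```

<p>Re-aligning inside <em>play()</em> mirrors the behaviour described above, where the audio tracks are only re-synchronised after a pause.</p>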
<p>I have let the files play through on my rather busy Macbook and have achieved the following interesting drift over the course of about 9 minutes:</p>
<p><a href="http://blog.gingertech.net/wp-content/uploads/2010/02/elephant_multiple_files_drift.png"><img src="http://blog.gingertech.net/wp-content/uploads/2010/02/elephant_multiple_files_drift-300x203.png" alt="Drift between multiple parallel played media elements" title="elephant_multiple_files_drift" width="300" height="203" class="size-medium wp-image-914" /></a></p>
<p>You will see that the video was the slowest, only doing roughly 540s, while the Spanish dub did 560s in the same time.</p>
<p>To fix such drifts, you can always include regular re-synchronisation points into the video playback. For example, you could set a timeout on the playback to re-sync every 500ms. Within such a short time, it is almost impossible to notice a drift. Don&#8217;t re-load the video, because it will lead to visual artifacts. But do use the video&#8217;s currentTime to re-set the others. (UPDATE: Actually, it depends on your situation, which track is the best choice as the main timeline. See also comments below.)</p>
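<p>As a rough sketch, a periodic re-synchronisation could be written like this. The function name <em>resync</em> and the <em>tolerance</em> parameter are assumptions made for illustration; the tolerance simply avoids seeking when the drift is negligible.</p>

```javascript
// Hypothetical helper: pull every follower back onto the master's timeline.
// Returns the drift in seconds that each follower had before correction.
// `tolerance` is an assumed threshold, not a value from the experiment.
function resync(master, followers, tolerance = 0.05) {
  const target = master.currentTime;
  return followers.map((el) => {
    const drift = el.currentTime - target;
    // Only seek when the drift exceeds the tolerance, to avoid needless seeks.
    if (Math.abs(drift) > tolerance) el.currentTime = target;
    return drift;
  });
}

// In a browser this might be scheduled every 500ms, as suggested above:
// const timer = setInterval(() => resync(video, [audio, dub, audesc]), 500);
```

<p>Which element is passed as <em>master</em> is a design choice: the video is the obvious candidate, but another track may make a better main timeline in some situations.</p>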
<p>It is a workable way of associating an arbitrary number of media tracks with videos, in particular in situations where the creation of merged files cannot easily be included in a workflow.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingertech.net/2010/02/12/audio-track-accessibility-for-html5/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Government Report: &#8220;Access to Electronic Media for the Hearing and Vision Impaired&#8221;</title>
		<link>http://blog.gingertech.net/2010/01/29/government-report-access-to-electronic-media-for-the-hearing-and-vision-impaired/</link>
		<comments>http://blog.gingertech.net/2010/01/29/government-report-access-to-electronic-media-for-the-hearing-and-vision-impaired/#comments</comments>
		<pubDate>Fri, 29 Jan 2010 12:44:40 +0000</pubDate>
		<dc:creator>silvia</dc:creator>
				<category><![CDATA[Digital Media]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[video accessibility]]></category>
		<category><![CDATA[Australian Government]]></category>
		<category><![CDATA[HTML5 video]]></category>
		<category><![CDATA[plans]]></category>
		<category><![CDATA[report]]></category>

		<guid isPermaLink="false">http://blog.gingertech.net/?p=894</guid>
		<description><![CDATA[Today was the last day to provide a submission and input to the Australian Government&#8217;s discussion report on &#8220;Access to Electronic Media for the Hearing and Vision Impaired: Approaches for Consideration&#8221;.
The report explains the Australian Government&#8217;s existing regulatory framework for accessibility to audio-visual content on TV, digital TV, DVDs, cinemas, and the Internet, and provides [...]]]></description>
			<content:encoded><![CDATA[<p>Today was the last day to provide a submission and input to the Australian Government&#8217;s discussion report on <a href="http://www.dbcde.gov.au/television/television_captioning/television_captioning_discussion_paper/media_access_discussion_report">&#8220;Access to Electronic Media for the Hearing and Vision Impaired: Approaches for Consideration&#8221;</a>.</p>
<p>The report explains the Australian Government&#8217;s existing regulatory framework for accessibility to audio-visual content on TV, digital TV, DVDs, cinemas, and the Internet, and provides an overview about what it is planning to do over the next 3-5 years.</p>
<p>It is interesting to read that according to the Australian Bureau of Statistics about 2.67 million Australians &#8211; one in every eight people &#8211; have some form of hearing loss and 284,000 are completely or partially blind. Also, these numbers are expected to keep increasing with an ageing population and obesity-linked diabetes.</p>
<p>For obvious reasons, I was particularly interested in the Internet-related part of the report. It was the second-last section (number five), and to be honest, I was rather disappointed: only 3 pages of the 40 page long report concerned themselves with Internet content. Also, the main message was that &#8220;at this time the costs involved with providing captions for online content were deemed to represent an undue financial impost on a relatively new and developing service.&#8221;</p>
<p>Audio descriptions weren&#8217;t even touched with a stick and both were written off with &#8220;a lack of clear online caption production and delivery standard and requirements&#8221;. There is obviously a lot of truth to the statements of the report &#8211; the Internet audio-visual content industry is still fairly young compared to e.g. TV, and there are a multitude of standards rather than a single clear path.</p>
<p>However, I believe the report neglected to mention the new HTML5 video and audio elements and the opportunity they provide. Maybe HTML5 was excluded because it wasn&#8217;t expected to be relevant within the near future. I believe this is a big mistake and governments should pay more attention to what is happening with HTML5 audio and video and the opportunities they open for accessibility.</p>
<p>In the end, I made a <a href='http://blog.gingertech.net/wp-content/uploads/2010/01/submission_accessibility_pfeiffer1.pdf'>submission</a> because I wanted the Australian Government to wake up to the HTML5 efforts and I wanted to correct a mistake they made in claiming MPEG-2 was &#8220;not compatible with the delivery of closed audio descriptions&#8221;.</p>
<p>I believe a lot more can be done with accessibility for Internet content than just &#8220;monitor international developments&#8221; and industry partnership with disability representative groups. I therefore proposed to undertake trials in particular with textual audio descriptions to see if they could be produced in a similar manner to captions, which would make their cost come down enormously. Also I suggested actually aiming for <a href="http://www.w3.org/TR/WCAG20/#media-equiv">WCAG 2.0 conformance</a> within the next 5 years &#8211; which for audio-visual content means at minimum captions and audio descriptions.</p>
<p>You can read the report <a href="http://www.dbcde.gov.au/__data/assets/pdf_file/0004/123187/Media_Access_Review_Discussion_Report.pdf">here</a> and my 4 page long submission <a href="http://blog.gingertech.net/wp-content/uploads/2010/01/submission_accessibility_pfeiffer1.pdf">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingertech.net/2010/01/29/government-report-access-to-electronic-media-for-the-hearing-and-vision-impaired/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Tutorial on HTML5 open video at LCA 2010</title>
		<link>http://blog.gingertech.net/2010/01/26/tutorial-on-html5-open-video-at-lca-2010/</link>
		<comments>http://blog.gingertech.net/2010/01/26/tutorial-on-html5-open-video-at-lca-2010/#comments</comments>
		<pubDate>Tue, 26 Jan 2010 01:40:45 +0000</pubDate>
		<dc:creator>silvia</dc:creator>
				<category><![CDATA[Digital Media]]></category>
		<category><![CDATA[LCA]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[open codecs]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[video accessibility]]></category>
		<category><![CDATA[Firefox]]></category>
		<category><![CDATA[HTML5 video]]></category>
		<category><![CDATA[Ogg Theora]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://blog.gingertech.net/?p=881</guid>
		<description><![CDATA[During last week&#8217;s LCA, Jan Gerber, Michael Dale and I gave a 3 hour tutorial on how to publish HTML5 video in an open format.
We basically taught people how to create and publish Ogg Theora video in HTML5 Web pages and how to make them work across browsers, including much of the available tools and [...]]]></description>
			<content:encoded><![CDATA[<p>During last week&#8217;s LCA, Jan Gerber, Michael Dale and I gave a 3 hour tutorial on <a href="http://www.lca2010.org.nz/programme/schedule/view_talk/50180?day=thursday">how to publish HTML5 video in an open format</a>.</p>
<p>We basically taught people how to create and publish Ogg Theora video in HTML5 Web pages and how to make them work across browsers, including many of the available tools and libraries. We&#8217;re hoping that some people will have learnt enough to include modules in CMSes such as Drupal, Joomla and WordPress, which will easily support the publishing of Ogg Theora.</p>
<p>I have been asked to share the material that we used. It consists of:</p>
<ul>
<li><a href='http://blog.gingertech.net/wp-content/uploads/2010/01/HTML5_Tutorial.pdf'>HTML5_Tutorial</a> (611KB)</li>
<li><a href='http://blog.gingertech.net/wp-content/uploads/2010/01/demo.tar.gz'>the example videos</a> (328MB), and</li>
<li><a href='http://blog.gingertech.net/wp-content/uploads/2010/01/exercises.tar.gz'>HTML5 video exercises</a> (3.4KB).</li>
</ul>
<p>Note that if you would like to walk through the exercises, you should install the following software beforehand:</p>
<ul>
<li>oggz-tools</li>
<li><a href="http://sourceforge.net/projects/oggvideotools/files/">oggvideotools</a></li>
<li>apache2 or a Web server of your choice</li>
<li><a href="http://v2v.cc/~j/ffmpeg2theora/">ffmpeg2theora</a></li>
<li><a href="http://getfirefox.com/">firefox3.5+</a></li>
<li><a href="http://firefogg.org/">firefogg</a> plugin</li>
<li><a href="http://getfirebug.com/">firebug</a> plugin</li>
<li>vlc, mplayer, totem or xine</li>
<li>kino or pitivi or another video editor that exports Theora, e.g. iMovie with XiphQT</li>
</ul>
<p>You might need to look for packages of your favourite OS (e.g. <a href="http://firefogg.org/nightly/">Windows or Mac</a>, <a href="https://launchpad.net/~theora/+archive/ppa">Ubuntu or Debian</a>).</p>
<p>The exercises include:</p>
<ul>
<li>creating an Ogg video from an editor</li>
<li>transcoding a video using http://firefogg.org/</li>
<li>creating a poster image using OggThumb</li>
<li>writing a first HTML5 video Web page with Ogg Theora</li>
<li>publishing it on a Web Server, with correct MIME type &#038; Duration hint</li>
<li>writing a second HTML5 video Web page with Ogg Theora &#038; MP4 to cover Safari/Webkit</li>
<li>transcoding using ffmpeg2theora in a script</li>
<li>writing a third HTML5 video Web page with Cortado fallback</li>
<li>writing a fourth Web page using &#8220;Video for Everybody&#8221;</li>
<li>writing a fifth Web page using &#8220;mwEmbed&#8221;</li>
<li>writing a sixth Web page using firefogg for transcoding before upload</li>
<li>and a seventh one with a progress bar</li>
<li>encoding srt subtitles into an Ogg Kate track</li>
<li>writing an eighth Web page using cortado to display the Ogg Kate track</li>
</ul>
<p>For those that would like to see the slides here immediately, a special flash embed:</p>
<p><object style="margin:0px" width="425" height="355"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=html5tutorial-100126051350-phpapp01&#038;rel=0&#038;stripped_title=html5-open-video-tutorial" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=html5tutorial-100126051350-phpapp01&#038;rel=0&#038;stripped_title=html5-open-video-tutorial" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed></object></p>
<p>Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingertech.net/2010/01/26/tutorial-on-html5-open-video-at-lca-2010/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Opera&#8217;s present for the New Year</title>
		<link>http://blog.gingertech.net/2010/01/01/operas-present-for-the-new-year/</link>
		<comments>http://blog.gingertech.net/2010/01/01/operas-present-for-the-new-year/#comments</comments>
		<pubDate>Fri, 01 Jan 2010 06:58:15 +0000</pubDate>
		<dc:creator>silvia</dc:creator>
				<category><![CDATA[open codecs]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[video accessibility]]></category>
		<category><![CDATA[gstreamer]]></category>
		<category><![CDATA[HTML5 video]]></category>
		<category><![CDATA[media fragments URI]]></category>
		<category><![CDATA[Ogg Theora]]></category>
		<category><![CDATA[Ogg Vorbis]]></category>
		<category><![CDATA[Opera]]></category>
		<category><![CDATA[video element]]></category>

		<guid isPermaLink="false">http://blog.gingertech.net/?p=859</guid>
		<description><![CDATA[I am a very happy camper today! Not because of the New Year &#8211; well, yes, there are new opportunities and challenges for the New Year. But I&#8217;ve just received an email from Philip Jägenstedt announcing the New Year&#8217;s pre-alpha release of Opera 10.50 has HTML5 video support! Congratulations, Philip, congratulations Opera!
Opera&#8217;s HTML5 video support [...]]]></description>
			<content:encoded><![CDATA[<p>I am a very happy camper today! Not because of the New Year &#8211; well, yes, there are new opportunities and challenges for the New Year. But I&#8217;ve just received an email from Philip Jägenstedt announcing the <a href="http://my.opera.com/core/blog/2009/12/31/re-introducing-video">New Year&#8217;s pre-alpha release of Opera 10.50 has HTML5 video support</a>! Congratulations, Philip, congratulations Opera!</p>
<p>Opera&#8217;s HTML5 video support is based on using <a href="http://www.gstreamer.net/">GStreamer</a>, an open source multimedia framework used widely on Linux systems. On Linux, the Opera package will make sure you have GStreamer installed and thus provide HTML5 video support on all codecs that your GStreamer install supports. On other platforms, Opera will come packaged with a rudimentary version of GStreamer which provides only core codec support. Right now, that has only been done for Windows &#8211; I&#8217;m looking forward to the Mac version!</p>
<p>As core codecs, Opera has decided to support Ogg Vorbis, Ogg Theora and uncompressed WAVE PCM. This makes it the third browser to support Ogg Vorbis/Theora next to Firefox and Chrome and moves the balance in codec support in favor of open and royalty free codecs: three browsers to support Ogg Theora/Vorbis vs two browsers to support MPEG H.264/AAC.</p>
<p>It&#8217;s also cool to see Philip&#8217;s announcement of intending to support the W3C Media Fragments specification for directly addressing time offsets (and other fragments). This is probably related to the implementation of seeking, which is the same problem, technically. Lack of seeking is actually a bit annoying right now, since you cannot jump to time offsets in the video or find out how long the video is without having played it through completely.</p>
<p>It&#8217;s also cool to see that Opera is on board with wanting to implement caption support. It has already started accessibility support for the video element with the following:</p>
<ul>
<li>you can tab onto the video controls: play/pause, transport bar, volume are tabbed to separately</li>
<li>space bar toggles between play and pause when keyboard focus is on it</li>
<li>when focused on the volume button, up/down arrow increases/decreases volume, space bar turns it on and off</li>
</ul>
<p>I&#8217;m sure that once Opera has seeking support implemented, the transport bar will get improved and display progress, and also provide keyboard accessibility through being able to jump forwards and backwards with arrow key combinations.</p>
<p>Very nice work, Opera, and an awesome New Year&#8217;s present to the world!!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingertech.net/2010/01/01/operas-present-for-the-new-year/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Manifests for exposing the structure of a Composite Media Resource</title>
		<link>http://blog.gingertech.net/2009/11/25/manifests-exposing-structure-of-a-composite-media-resource/</link>
		<comments>http://blog.gingertech.net/2009/11/25/manifests-exposing-structure-of-a-composite-media-resource/#comments</comments>
		<pubDate>Wed, 25 Nov 2009 01:21:03 +0000</pubDate>
		<dc:creator>silvia</dc:creator>
				<category><![CDATA[Digital Media]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[video accessibility]]></category>
		<category><![CDATA[accessibility]]></category>
		<category><![CDATA[composite media resource]]></category>
		<category><![CDATA[declarative syntax]]></category>
		<category><![CDATA[HTML5]]></category>
		<category><![CDATA[multitrack video]]></category>
		<category><![CDATA[source element]]></category>
		<category><![CDATA[video element]]></category>
		<category><![CDATA[W3C]]></category>

		<guid isPermaLink="false">http://blog.gingertech.net/?p=739</guid>
		<description><![CDATA[In the previous post I explained that there is a need to expose the tracks of a time-linear media resource to the user agent (UA). Here, I want to look in more detail at different possibilities of how to do so, their advantages and disadvantages.
Note: A lot of this has come out of discussions I [...]]]></description>
			<content:encoded><![CDATA[<p>In the <a href="http://blog.gingertech.net/2009/11/23/model-of-a-time-linear-media-resource/">previous post</a> I explained that there is a need to expose the tracks of a time-linear media resource to the user agent (UA). Here, I want to look in more detail at different possibilities of how to do so, their advantages and disadvantages.</p>
<p>Note: A lot of this has come out of discussions I had at the recent <a href="http://www.w3.org/2009/11/TPAC/">W3C TPAC</a> and is still in flux, so I am writing this to start discussions and brainstorm.</p>
<h3>Declarative Syntax vs JavaScript API</h3>
<p>We can expose a media resource&#8217;s tracks either through a JavaScript function that can loop through the tracks and provide access to the tracks and their features, or we can do this through declarative syntax.</p>
<p>Using declarative syntax has the advantage of being available even if JavaScript is disabled in a UA. The markup can be parsed easily and default displays can be prepared without having to actually decode the media file(s).</p>
<p>OTOH, it has the disadvantage that it may not necessarily represent what is actually in the binary resource, but instead what the Web developer assumed was in the resource (or what he forgot to update). This may lead to a situation where a &#8220;404&#8221; may need to be given on a media track.</p>
<p>A further disadvantage is that when somebody copies the media element onto another Web page, together with all the track descriptions, and then the original media resource is changed (e.g. a subtitle track is added), this does not have the desired effect, since the change does not propagate to the other Web page.</p>
<p>For these reasons, I thought that a JavaScript interface was preferable over declarative syntax.</p>
<p>However, recent discussions, in particular with some accessibility experts, have convinced me that declarative syntax is preferable, because it allows the creation of a menu for turning tracks on/off without having to even load the media file. Further, declarative syntax allows multiple files and &#8220;native tracks&#8221; of a <a href="http://blog.gingertech.net/2009/11/23/model-of-a-time-linear-media-resource/">virtual media resource</a> to be treated in an identical manner.</p>
<h3>Extending Existing Declarative Syntax</h3>
<p>The HTML5 media elements already have declarative syntax to specify <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#the-source-element">multiple source media files for media elements</a>. The &lt;source> element is typically used to list video in mpeg4 and ogg format for support in different browsers, but has also been envisaged for different screensize and bandwidth encodings.</p>
<p>The &lt;source> elements are generally meant to list different resources that contribute towards the media element. In that respect, let&#8217;s try using it for declaring a manifest of tracks of the virtual media resource on an example:</p>
<pre>
  &lt;video>
    &lt;source id='av1' src='video.3gp' type='video/3gpp' media='mobile' lang='en'
                     role='media' >
    &lt;source id='av2' src='video.mp4' type='video/mp4' media='desktop' lang='en'
                     role='media' >
    &lt;source id='av3' src='video.ogv' type='video/ogg' media='desktop' lang='en'
                     role='media' >
    &lt;source id='dub1' src='video.ogv?track=audio[de]' type='audio/ogg' lang='de'
                     role='dub' >
    &lt;source id='dub2' src='audio_ja.oga' type='audio/ogg' lang='ja'
                     role='dub' >
    &lt;source id='ad1' src='video.ogv?track=auddesc[en]' type='audio/ogg' lang='en'
                     role='auddesc' >
    &lt;source id='ad2' src='audiodesc_de.oga' type='audio/ogg' lang='de'
                     role='auddesc' >
    &lt;source id='cc1' src='video.mp4?track=caption[en]' type='application/ttaf+xml'
                     lang='en' role='caption' >
    &lt;source id='cc2' src='video.ogv?track=caption[de]' type='text/srt; charset="ISO-8859-1"'
                     lang='de' role='caption' >
    &lt;source id='cc3' src='caption_ja.ttaf' type='application/ttaf+xml' lang='ja'
                     role='caption' >
    &lt;source id='sign1' src='signvid_ase.ogv' type='video/ogg; codecs="theora"'
                     media='desktop' lang='ase' role='sign' >
    &lt;source id='sign2' src='signvid_gsg.ogv' type='video/ogg; codecs="theora"'
                     media='desktop' lang='gsg' role='sign' >
    &lt;source id='sign3' src='signvid_sfs.ogv' type='video/ogg; codecs="theora"'
                     media='desktop' lang='sfs' role='sign' >
    &lt;source id='tad1' src='tad_en.srt' type='text/srt; charset="ISO-8859-1"'
                     lang='en' role='tad' >
    &lt;source id='tad2' src='video.ogv?track=tad[de]' type='text/srt; charset="ISO-8859-1"'
                     lang='de' role='tad' >
    &lt;source id='tad3' src='tad_ja.srt' type='text/srt; charset="EUC-JP"' lang='ja'
                     role='tad' >
  &lt;/video>
</pre>
<p>Note that this somewhat ignores my previously proposed special itext tag for handling text tracks. I am doing this here to experiment with a more integrative approach with the virtual media resource idea from the previous post. This may well be a better solution than a specific new text-related element. Most of the attributes of the itext element are, incidentally, covered.</p>
<p>You will also notice that some of the tracks are references to tracks inside binary media files using the <a href="http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/">Media Fragment URI specification</a> while others link to full files. An example is <em>video.ogv?track=auddesc[en]</em>. So, this is a uniform means of exposing all the tracks that are part of a (virtual) media resource to the UA, no matter whether in-band or in external files. It actually relies on the UA or server being able to resolve these URLs.</p>
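<p>To illustrate, a UA or server would first have to split such a reference into the file and the addressed track. The sketch below handles only the <em>?track=name[lang]</em> notation used in this example, which is not a finalised syntax; the function name <em>parseTrackRef</em> is hypothetical.</p>

```javascript
// Hypothetical parser for the track references used in the example above,
// e.g. "video.ogv?track=auddesc[en]". It only understands the notation of
// this post and treats everything else as a plain external file.
function parseTrackRef(src) {
  const match = /^([^?]+)\?track=([^\[\]]+)(?:\[([^\]]*)\])?$/.exec(src);
  if (!match) return { file: src, track: null, lang: null };
  return { file: match[1], track: match[2], lang: match[3] || null };
}
```

<p>An in-band reference yields the containing file plus a track name and language, while a reference such as <em>audio_ja.oga</em> comes back with the whole file and no track.</p>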
<h4>&#8220;type&#8221; attribute</h4>
<p>&#8220;media&#8221; and &#8220;type&#8221; are existing attributes of the &lt;source> element in HTML5 and meant to help the UA determine what to do with the referenced resource. The current spec states:</p>
<blockquote><p>The &#8220;type&#8221; attribute gives the type of the media resource, to help the user agent determine if it can play this media resource before fetching it.</p></blockquote>
<p>The word &#8220;play&#8221; might need to be replaced with &#8220;decode&#8221; to cover several different MIME types.</p>
<p>The &#8220;type&#8221; attribute was also extended with the possibility to add the &#8220;charset&#8221; MIME parameter of a linked text resource &#8211; this is particularly important for SRT files, which don&#8217;t handle charsets very well. It avoids having to add an additional attribute and is analogous to the &#8220;codecs&#8221; MIME parameter used by audio and video resources.</p>
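<p>As an illustration of how a &#8220;type&#8221; value with such parameters could be taken apart, here is a deliberately simple sketch. The function name <em>parseType</em> is hypothetical, and the parser does not cover the full MIME grammar (for example, semicolons inside quoted values).</p>

```javascript
// Minimal, hypothetical parser for a "type" attribute value such as
// 'text/srt; charset="ISO-8859-1"'. It does not handle semicolons or
// escapes inside quoted parameter values.
function parseType(value) {
  const [essence, ...rest] = value.split(';').map((part) => part.trim());
  const params = {};
  for (const part of rest) {
    const eq = part.indexOf('=');
    if (eq === -1) continue; // skip malformed or empty parameters
    const name = part.slice(0, eq).trim().toLowerCase();
    // Strip one pair of surrounding double quotes, if present.
    params[name] = part.slice(eq + 1).trim().replace(/^"(.*)"$/, '$1');
  }
  return { type: essence.toLowerCase(), params };
}
```

<p>With this, &#8220;charset&#8221; on a text track and &#8220;codecs&#8221; on an audio or video track are read the same way, which is the analogy drawn above.</p>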
<h4>&#8220;media&#8221; attribute</h4>
<p>Further, the spec states:</p>
<blockquote><p>The &#8220;media&#8221; attribute gives the intended media type of the media resource, to help the user agent determine if this media resource is useful to the user before fetching it. Its value must be a valid media query.</p></blockquote>
<p>The &#8220;mobile&#8221; and &#8220;desktop&#8221; values are hints that I&#8217;ve used for simplicity. They are not valid media queries and could be improved by giving appropriate bandwidth limits and width/height values, etc. Other values could be different camera angles such as topview, frontview, backview. The media query aspect has to be looked into in more depth.</p>
<h4>&#8220;lang&#8221; attribute</h4>
<p>The above example further uses &#8220;lang&#8221; and &#8220;role&#8221; attributes:</p>
<p>The &#8220;lang&#8221; attribute is an existing global attribute of HTML5, which typically indicates the language of the data inside the element. Here, it is used to indicate the language of the referenced resource. This is possibly not quite the best name choice and should maybe be called &#8220;hreflang&#8221;, which is already used in multiple other elements to signify the language of the referenced resource.</p>
<h4>&#8220;role&#8221; attribute</h4>
<p>The &#8220;role&#8221; attribute is also an existing attribute in HTML5, included from <a href="http://www.w3.org/TR/wai-aria/">ARIA</a>. It currently doesn&#8217;t cover media resources, but could be extended. The suggestion here is to specify the roles of the different media tracks &#8211; the ones I have used here are:</p>
<ul>
<li>&#8220;media&#8221;: a main media resource &#8211; typically contains audio and video and possibly more</li>
<li>&#8220;dub&#8221;: an audio track that provides an alternative dubbed language track</li>
<li>&#8220;auddesc&#8221;: an audio track that provides an additional audio description track</li>
<li>&#8220;caption&#8221;: a text track that provides captions</li>
<li>&#8220;sign&#8221;: a video-only track that provides an additional sign language video track</li>
<li>&#8220;tad&#8221;: a text track that provides textual audio descriptions to be read by a screen reader or a braille device</li>
</ul>
<p>Further roles could be &#8220;music&#8221;, &#8220;speech&#8221;, &#8220;sfx&#8221; for audio tracks, &#8220;subtitle&#8221;, &#8220;lyrics&#8221;, &#8220;annotation&#8221;, &#8220;chapters&#8221;, &#8220;overlay&#8221; for text tracks, and &#8220;alternate&#8221; for an alternate main media resource, e.g. a different camera angle.</p>
<h4>Track activation</h4>
<p>The given attributes help the UA decide what to display.</p>
<p>It will firstly find out from the &#8220;type&#8221; attribute if it is capable of decoding the track.</p>
<p>Then, the UA will find out from the &#8220;media&#8221; query, &#8220;role&#8221;, and &#8220;lang&#8221; attributes whether a track is relevant to its user. This will require checking the capabilities of the device, network, and the user preferences.</p>
<p>Further, it could be possible for Web authors to influence whether a track is displayed or not through CSS parameters on the &lt;source> element: &#8220;display: none&#8221; or &#8220;visibility: hidden/visible&#8221;.</p>
<p>Examples for track activation that a UA would undertake using the example above:</p>
<p>Given a desktop computer with Firefox, German language preferences, captions and sign language activated, the UA will fetch the original video at video.ogv (for Firefox), the German caption track at video.ogv?track=caption[de], and the German sign language track at signvid_gsg.ogv (maybe also the German dubbed audio track at video.ogv?track=audio[de], which would then replace the original one).</p>
<p>Given a desktop computer with Safari, English language preferences and audio descriptions activated, the UA will fetch the original video at video.mp4 (for Safari) and the textual audio description at tad_en.srt to be displayed through the screen reader, since it cannot decode the Ogg audio description track at video.ogv?track=auddesc[en].</p>
<p>Also, all decodable tracks could be exposed in a right-click menu and added on-demand.</p>
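<p>The activation steps can be sketched as a small selection function. Everything about the <em>ua</em> object here is an assumption made for illustration: <em>canDecode</em> stands in for the UA&#8217;s codec check, <em>device</em> for the media query, <em>langs</em> for the user&#8217;s accepted language codes (including sign languages), and <em>wanted</em> for the activated roles, each given as a list of alternatives in order of preference.</p>

```javascript
// Hypothetical sketch of track activation. Each source is an object with
// { id, type, media, lang, role } as in the manifest example; `ua` is an
// assumed description of the user agent and the user's preferences.
function pickTrack(usable, roles, langs) {
  // Try the roles in order of preference and take the first language match.
  for (const role of roles) {
    const hit = usable.find((s) => s.role === role && langs.includes(s.lang));
    if (hit) return hit;
  }
  return undefined;
}

function activateTracks(sources, ua) {
  // Step 1: drop tracks the UA cannot decode ("type").
  // Step 2: drop tracks meant for another device class ("media").
  const usable = sources.filter(
    (s) => ua.canDecode(s.type) && (!s.media || s.media === ua.device)
  );
  // The main resource is the first usable track with the "media" role.
  const main = usable.find((s) => s.role === 'media');
  // Step 3: one track per activated feature, chosen by "role" and "lang".
  const extras = ua.wanted.map((roles) => pickTrack(usable, roles, ua.langs));
  return [main, ...extras].filter(Boolean).map((s) => s.id);
}
```

<p>For the Firefox case, <em>wanted</em> would be <em>[['caption'], ['sign']]</em> with <em>langs</em> of <em>['de', 'gsg']</em>. For the Safari case, <em>[['auddesc', 'tad']]</em> expresses that a textual audio description is acceptable when no audio description track can be decoded.</p>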
<h4>Display styling</h4>
<p>Default styling of these tracks could be:</p>
<ul>
<li>video or alternate video in the video display area,</li>
<li>sign language probably as picture-in-picture (making it useless on a mobile and only of limited use on the desktop),</li>
<li>captions/subtitles/lyrics as overlays on the bottom of the video display area (or whatever the caption format prescribes),</li>
<li>textual audio descriptions as ARIA live regions hidden behind the video or off-screen.</li>
</ul>
<p>Multiple audio tracks can always be played at the same time.</p>
<p>The Web author could also define the display area for a track through CSS styling and the UA would then render the data into that area at the rate that is required by the track.</p>
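<p>As a rough sketch of what such author styling might look like, the following rules position a sign language track and a caption track inside the video area. This assumes that CSS applied to &lt;source> elements would be honoured by the UA, which is part of this proposal and not current HTML5 behaviour; the role values &#8220;sign&#8221; and &#8220;caption&#8221; and all sizes are assumptions for illustration.</p>
<pre>
&lt;style>
  /* hypothetical: sign language as picture-in-picture, top right */
  video source[role="sign"] {
    position: absolute;
    top: 0;
    right: 0;
    width: 25%;
  }
  /* hypothetical: captions in a band along the bottom edge */
  video source[role="caption"] {
    position: absolute;
    bottom: 0;
    width: 100%;
    text-align: center;
  }
&lt;/style>
</pre>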
<h4>How good is this approach?</h4>
<p>The advantage of this new proposal is that it builds largely on existing HTML5 components with minimal additions to satisfy requirements for content selection and accessibility of media elements. It is a declarative approach to the multi-track media resource challenge.</p>
<p>However, it leaves most of the decisions about which tracks are alternatives of/additions to each other, and which tracks should be displayed, to the UA. The UA makes an informed decision because it gets a lot of information through the attributes, but it still has to make decisions that may become rather complex. Maybe there needs to be a grouping level for alternative tracks and additional tracks &#8211; similar to what I did with the <a href="https://wiki.mozilla.org/Accessibility/HTML5_captions_v2">second itext proposal</a>, or similar to the &lt;switch> and &lt;par> elements of SMIL.</p>
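<p>To illustrate what such a grouping level might look like, here is a rough sketch that borrows &lt;switch> (choose one child) and &lt;par> (play children together) from SMIL and wraps them around &lt;source> elements. Neither element is part of HTML5 or of the proposal above; the nesting, the role values and the file names captions_en.srt and captions_de.srt are assumptions for illustration only.</p>
<pre>
&lt;video>
  &lt;par>
    &lt;!-- hypothetical: exactly one main resource is chosen -->
    &lt;switch>
      &lt;source src="video.ogv" type="video/ogg" />
      &lt;source src="video.mp4" type="video/mp4" />
    &lt;/switch>
    &lt;!-- hypothetical: at most one caption track, chosen by language -->
    &lt;switch>
      &lt;source src="captions_en.srt" role="caption" lang="en" />
      &lt;source src="captions_de.srt" role="caption" lang="de" />
    &lt;/switch>
  &lt;/par>
&lt;/video>
</pre>
<p>With explicit groups like these, the UA would only have to pick one entry per &lt;switch> instead of inferring from the attributes which tracks replace each other.</p>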
<p>A further issue is one that is currently being discussed within the Media Fragments WG: how can you discover the track composition and the track naming/uses of a particular media resource? How, e.g., can a Web author on another Web site know how to address the tracks inside your binary media resource? An HTML specification like the above can help. But what if that doesn&#8217;t exist? And what if the file is being used offline?</p>
<h3>Alternative Manifest descriptions</h3>
<p>The need to manifest the track composition of a media resource is not a new one. Many other formats and applications had to deal with these challenges before &#8211; some have defined and published their format.</p>
<p>I am going to list a few of these formats here with examples. They could inspire a next version of the above proposal with grouping elements.</p>
<h4>Microsoft ISM files (SMIL subpart)</h4>
<p>With the release of IIS7, Microsoft introduced &#8220;Smooth Streaming&#8221;, which uses chunking on files on the server to deliver adaptive streaming to Silverlight clients over HTTP. To inform a smooth streaming client of the tracks available for a media resource, Microsoft defined ism files: <a href="http://msdn.microsoft.com/en-us/library/ee230817.aspx">IIS Smooth Streaming Server Manifest</a> files.</p>
<p>This is a short example &#8211; a longer one can be found <a href="http://msdn.microsoft.com/en-us/library/ee230810.aspx">here</a>:</p>
<pre>
&lt;?xml version="1.0" encoding="utf-8"?>
&lt;smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  &lt;head>
    &lt;meta name="clientManifestRelativePath" content="manifest" />
  &lt;/head>
  &lt;body>
    &lt;switch>
      &lt;video src="video.ismv" systemBitrate="490000">
        &lt;param name="trackID" value="1" valueType="data" />
      &lt;/video>
      &lt;audio src="video.ismv" systemBitrate="76000">
        &lt;param name="trackID" value="2" valueType="data" />
      &lt;/audio>
      &lt;textstream src="video.ismv" systemBitrate="700000" systemLanguage="en">
        &lt;param name="trackID" value="3" valueType="data" />
      &lt;/textstream>
    &lt;/switch>
  &lt;/body>
&lt;/smil>
</pre>
<p>This short example describes a simple video file with a video, an audio, and a text track. The ismv file is actually an MPEG-4 file, but pre-chunked. The bitrate and trackID of the three tracks are specified, and the parallel nature of these three tracks is expressed by listing them side by side inside the &lt;switch> element.</p>
<p><a href="http://msdn.microsoft.com/en-us/library/ee230817.aspx">According to Microsoft,</a> the server manifest serves three key roles:</p>
<ul>
<li>Specify the group of media files that comprise the presentation.</li>
<li>Specify heuristic parameters, such as bit rate and fragment quality index, for each track.</li>
<li>Abstract the layout of the tracks into files on disk for consumption by the client.</li>
</ul>
<p>This is very similar to our needs here and thus the specification also looks very similar to what we ended up with above, though the &lt;source> element&#8217;s specification is much denser than the SMIL subpart used here.</p>
<h4>Xiph ROE files</h4>
<p>The Xiph community also realised that for varying use cases there is a need for a manifest file format for multi-track media files. Authoring of a multi-track file, content negotiation, and content representation are three example uses of <a href="http://wiki.xiph.org/ROE">ROE</a>, the Rich Open multitrack media Exposition format.</p>
<p>This is an example ROE file:</p>
<pre>
&lt;?xml version="1.0"?>
 &lt;ROE>
   &lt;head>
     &lt;link id="html_linkback" rel="alternate" type="text/html"
                 href="http://example.com/full_video.html"/>
   &lt;/head>
   &lt;body>
     &lt;track id="v" provides="video">
       &lt;switch distinction="angle" default="v1">
         &lt;mediaSource id="v1" content-type="video/theora"
                                  src="http://example.com/angle1.ogv?track=v1" />
         &lt;mediaSource id="v2" content-type="video/theora"
                                   src="http://example.com/angle2.ogv" />
       &lt;/switch>
     &lt;/track>
     &lt;track id="a" provides="audio">
       &lt;switch distinction="Content-Language" default="a3">
           &lt;mediaSource id="a1" lang="en" content-type="audio/vorbis"
                                     src="http://example.com/lang1.oga" />
           &lt;mediaSource id="a2" lang="de" content-type="audio/vorbis"
                                     src="http://example.com/lang2.oga" />
           &lt;mediaSource id="a3" lang="fr" content-type="audio/vorbis"
                                     src="http://example.com/lang3.oga" />
       &lt;/switch>
     &lt;/track>
   &lt;/body>
 &lt;/ROE>
</pre>
<p>ROE uses many SMIL features, just like ISM, but has also introduced further attributes and elements. Since ROE is usable for authoring, it includes the &lt;seq> element to sequence audio, video or text files. This is not necessary for a simple manifest of a multi-track media resource and in fact destroys the single timeline paradigm.</p>
<h4>Matterhorn MediaPackage Manifest</h4>
<p>The Opencast Matterhorn project, which is defining an enterprise-level, easy-to-install open source podcast and rich media capture, processing and delivery system, has defined a <a href="https://wiki.opencastproject.org/confluence/display/open/MediaPackage+Manifest">media package manifest</a>, which lists the package&#8217;s contents (tracks and metadata) along with their core technical properties. The track description part of the manifest is again very similar to all the formats described above, even though it contains a lot more technical detail:</p>
<pre>
&lt;mediapackage duration="2704016" id="1">
    &lt;media>
        &lt;track id="track-1" type="track/presentation">
            &lt;mimetype>video/quicktime&lt;/mimetype>
            &lt;checksum type="md5">0adc841a6dfd47bd7c8cf8db6cbb71c9&lt;/checksum>
            &lt;url>http://repository.opencastproject.org/123/tracks/slides-vga.mov&lt;/url>
            &lt;size>8754667&lt;/size>
            &lt;duration>2704016&lt;/duration>
            &lt;audio id="stream-1">
                &lt;encoder type="AAC"/>
                &lt;channels>2&lt;/channels>
                &lt;bitdepth>16&lt;/bitdepth>
                &lt;bitrate>256000.0&lt;/bitrate>
            &lt;/audio>
            &lt;video id="stream-2">
                &lt;encoder type="AVC"/>
                &lt;resolution>1024x768&lt;/resolution>
                &lt;scantype type="progressive"/>
                &lt;bitrate>454904.0&lt;/bitrate>
                &lt;framerate>2&lt;/framerate>
            &lt;/video>
        &lt;/track>
    &lt;/media>
&lt;/mediapackage>
</pre>
<p>Most of this technical information should only be relevant to a decoder, but some of it is helpful for choosing between tracks.</p>
<h4>Further Formats</h4>
<p>Further formats that are capable of describing a media resource manifest, but whose functionality goes far beyond that goal, are: <a href="http://xml.coverpages.org/mpeg7.html">MPEG-7</a>, <a href="http://xml.coverpages.org/MPEG21-WG-11-N3971-200103.pdf">MPEG-21 DIDL</a>, and general <a href="http://www.w3.org/AudioVideo/">SMIL</a>.</p>
<p>Since their functionality goes far beyond a mere description of the manifest of a multi-track media resource, they are not regarded here as options &#8211; it would be too hard to reduce them to the bare necessities for such a simple exercise. Apart from that, subparts of SMIL have already been used further up.</p>
<h3>Summary</h3>
<p>It is possible that the manifest stated above, which is already almost entirely supported by HTML5, is sufficient for many of the use cases and requirements that underpin this post. Maybe the introduction of a &lt;text> or rather &lt;itext> element is not necessary when the UA knows from the MIME type what kind of a data stream it is dealing with. However, a grouping element that specifies alternate and additional tracks, and which tracks should be displayed together with another track choice, may be a good idea.</p>
<p>For content discovery issues and negotiation over the network, the existence of a manifest on the server that can describe the virtual media resource can also be a valuable addition. It could also be used to communicate the currently available tracks to an embedded location. This is, in fact, how <a href="http://metavid.org/w/index.php?title=Special:MvExportStream&#038;feed_format=roe&#038;stream_name=House_proceeding_06-09-08_01&#038;t=0%3A01%3A38%2F0%3A10%3A00">ROE is being used on metavid</a> &#8211; as an additional attribute on the video or audio element.</p>
<p>Please leave your feedback: Do you agree with the idea of re-using the &lt;source> element for describing all the available tracks for a (virtual composite) media resource instead of defining new elements for specific track types (text, sign language, audio description etc.)? How should we solve the need to describe dependencies and relationships between tracks? Do you agree with the need to have an explicit manifest file on the server that accompanies the media resource?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingertech.net/2009/11/25/manifests-exposing-structure-of-a-composite-media-resource/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
	</channel>
</rss>
