<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>ginger&#039;s thoughts &#187; Search Results  &#187;  media fragment URI</title>
	<atom:link href="http://blog.gingertech.net/?s=media+fragment+URI&#038;feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://blog.gingertech.net</link>
	<description>Silvia&#039;s blog</description>
	<lastBuildDate>Sat, 07 Aug 2010 08:36:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>WebSRT and HTML5 media accessibility</title>
		<link>http://blog.gingertech.net/2010/08/07/websrt-and-html5-media-accessibility/</link>
		<comments>http://blog.gingertech.net/2010/08/07/websrt-and-html5-media-accessibility/#comments</comments>
		<pubDate>Sat, 07 Aug 2010 01:09:13 +0000</pubDate>
		<dc:creator>silvia</dc:creator>
				<category><![CDATA[Digital Media]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[video accessibility]]></category>
		<category><![CDATA[accessibility]]></category>
		<category><![CDATA[HTML5]]></category>
		<category><![CDATA[html5 media]]></category>
		<category><![CDATA[HTML5 video]]></category>
		<category><![CDATA[media fragments URI]]></category>
		<category><![CDATA[W3C]]></category>
		<category><![CDATA[WHATWG]]></category>

		<guid isPermaLink="false">http://blog.gingertech.net/?p=1040</guid>
		<description><![CDATA[On 23rd July, Ian Hickson, the HTML5 editor, posted an update to the WHATWG mailing list introducing the first draft of a platform for accessibility for the HTML5 &#60;video> element. The platform provides for captions, subtitles, audio descriptions, chapter markers and similar time-synchronized text both in-band (inside the video resource) and out-of-band (as external text [...]]]></description>
			<content:encoded><![CDATA[<p>On 23rd July, Ian Hickson, the HTML5 editor, <a href="http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-July/027386.html">posted an update to the WHATWG mailing list</a> introducing the first draft of a platform for accessibility for the HTML5 &lt;video> element. The platform provides for captions, subtitles, audio descriptions, chapter markers and similar time-synchronized text both in-band (inside the video resource) and out-of-band (as external text files). Right now, the proposal only regards &lt;video>, but I personally believe the same can be applied to the &lt;audio> element, except we have to be a bit more flexible with the rendering approach. Anyway&#8230;</p>
<p>What I want to do here is to summarize what was introduced, together with the improvements that I and some <a href="http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-July/027283.html">others</a> have proposed in <a href="http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-July/027339.html">follow-up</a> <a href="http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-August/027648.html">emails</a>, and list some of the media accessibility needs that we are not yet dealing with.</p>
<p>For those wanting to only selectively read some sections, here is a clickable table of contents of this rather long blog post:</p>
<ul>
<li><a href="#WebSRT">THE WebSRT TIMED TEXT FORMAT</a></li>
<li><a href="#track">ASSOCIATING EXTERNAL TIMED TEXT RESOURCES WITH A VIDEO</a></li>
<li><a href="#API">EXPOSING A LIST OF TimedTracks TO JAVASCRIPT</a></li>
<li><a href="#rendering">RENDERING TimedTracks</a></li>
<li><a href="#summary">SUMMARY AND FURTHER NEEDS</a></li>
</ul>
<h3 id="WebSRT">THE <a href="http://www.whatwg.org/specs/web-apps/current-work/websrt.html">WebSRT</a> TIMED TEXT FORMAT</h3>
<p>The first and to everyone probably most surprising part is the new file format that is being proposed to contain out-of-band time-synchronized text for video. A new format was necessary after the <a href="http://wiki.whatwg.org/wiki/Timed_track_formats">analysis of all relevant existing formats</a> determined that they were either insufficient or hard to use in a Web environment.</p>
<p>The new format is called <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#websrt-0">WebSRT</a> and is an extension to the existing <a href="http://en.wikipedia.org/wiki/SubRip">SRT SubRip</a> format. It is actually also the part of the new specification that I am personally most uncomfortable with. Not that WebSRT is a bad format. It&#8217;s just not sufficient yet to provide all the functionality that a good time-synchronized text format for Web media should. Let&#8217;s look at some examples.</p>
<p>WebSRT is composed of a sequence of <em>timed text cues</em> (that&#8217;s what we&#8217;ve decided to call the pieces of text that are active during a certain time interval). Because of its SRT ancestry, the text cues can optionally be numbered. The content of the text cues is currently allowed to be one of three different types of text: plain text, minimal markup, and anything at all (also called &#8220;metadata&#8221;).</p>
<p>In its simplest form, a WebSRT file is just an ordinary old SRT file with optional cue numbers and only plain text in cues:</p>
<pre>
  1
  00:00:15.00 --> 00:00:17.95
  At the left we can see...

  2
  00:00:18.16 --> 00:00:20.08
  At the right we can see the...

  3
  00:00:20.11 --> 00:00:21.96
  ...the head-snarlers
</pre>
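<p>To make the cue structure concrete, here is a minimal JavaScript sketch of how such a file could be split into cues. It is not the parsing algorithm of the specification: the function names <code>toSeconds</code> and <code>parseCues</code> and the shape of the returned objects are my own assumptions, and the sketch only understands timestamps of the form hh:mm:ss.ff.</p>
<pre>
// Convert a timestamp such as "00:00:17.95" into seconds.
function toSeconds(stamp) {
  const [h, m, s] = stamp.split(":");
  return Number(h) * 3600 + Number(m) * 60 + Number(s);
}

// Split a WebSRT-style text into cue objects (hypothetical helper).
// Cues are separated by blank lines; the cue number is optional.
function parseCues(text) {
  return text.trim().split(/\r?\n\s*\r?\n/).map((block) => {
    const lines = block.split(/\r?\n/).map((line) => line.trim());
    // If the first line is not the timing line, treat it as the cue number.
    const id = lines[0].includes("-->") ? null : lines.shift();
    // Timing line: start, arrow, end, then any cue settings.
    const [start, , end, ...settings] = lines.shift().split(/\s+/);
    return {
      id,
      start: toSeconds(start),
      end: toSeconds(end),
      settings,
      text: lines.join("\n"),
    };
  });
}
</pre>
<p>Anything that follows the end timestamp on the timing line is simply collected in <code>settings</code> and left uninterpreted.</p>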
<p>A slightly more complex example results when we introduce minimal markup:</p>
<pre>
  00:00:15.00 --> 00:00:17.95 A:start
  Auf der &lt;i>linken&lt;/i> Seite sehen wir...

  00:00:18.16 --> 00:00:20.08 A:end
  Auf der &lt;b>rechten&lt;/b> Seite sehen wir die....

  00:00:20.11 --> 00:00:21.96 A:end
  &lt;1>...die Enthaupter.

  00:00:21.99 --> 00:00:24.36 A:start
  &lt;2>Alles ist sicher.
  Vollkommen &lt;b>sicher&lt;/b>.
</pre>
<p>and add some CSS to provide for colors and special formatting:</p>
<pre>
    ::cue { background: rgba(0,0,0,0.5); }
    ::cue-part(1) { color: red; }
    ::cue-part(2, b) { font-style: normal; text-decoration: underline; }
</pre>
<p>Minimal markup accepts &lt;i>, &lt;b>, &lt;ruby> and a timestamp in &lt;>, providing for italics, bold, and ruby markup as well as karaoke timestamps. Any further styling can be done using the CSS pseudo-elements ::cue and ::cue-part, which accept the features &#8216;color&#8217;, &#8216;text-shadow&#8217;, &#8216;text-outline&#8217;, &#8216;background&#8217;, &#8216;outline&#8217;, and &#8216;font&#8217;.</p>
<p>Note that positioning requires special cue settings after the start/end timestamps, which can provide for vertical text, line position, text position, size and alignment. Here is an example with vertically rendered Chinese text, right-aligned at 98% of the video frame:</p>
<pre>
  00:00:15.00 --> 00:00:17.95 A:start D:vertical L:98%
  在左边我们可以看到...

  00:00:18.16 --> 00:00:20.08 A:start D:vertical L:98%
  在右边我们可以看到...

  00:00:20.11 --> 00:00:21.96 A:start D:vertical L:98%
  ...捕蝇草械.

  00:00:21.99 --> 00:00:24.36 A:start D:vertical L:98%
  一切都安全.
  非常地安全.
</pre>
<p>Finally, WebSRT files can be authored with abstract metadata inside cues, which practically means anything at all. Here&#8217;s an example with HTML content:</p>
<pre>
  00:00:15.00 --> 00:00:17.95 A:start
  &lt;img src="pic1.png"/>Auf der &lt;i>linken&lt;/i> Seite sehen wir...

  00:00:18.16 --> 00:00:20.08 A:end
  &lt;img src="pic2.png"/>Auf der &lt;b>rechten&lt;/b> Seite sehen wir die....

  00:00:20.11 --> 00:00:21.96 A:end
  &lt;img src="pic3.png"/>...die &lt;a href="http://members.chello.nl/j.kassenaar/
elephantsdream/subtitles.html">Enthaupter&lt;/a>.

  00:00:21.99 --> 00:00:24.36 A:start
  &lt;img src="pic4.png"/>Alles ist &lt;mark>sicher&lt;/mark>.&lt;br/>Vollkommen &lt;b>sicher&lt;/b>.
</pre>
<p>Here is another example with JSON in the cues:</p>
<pre>
  00:00:00.00 --> 00:00:44.00
  {
    "slide": "intro.png",
    "title": "\"Really Achieving Your Childhood Dreams\" by Randy Pausch, Carnegie Mellon University, Sept 18, 2007"
  }

  00:00:44.00 --> 00:01:18.00
  {
    "slide": "elephant.png",
    "title": "The elephant in the room..."
  }

  00:01:18.00 --> 00:02:05.00
  {
    "slide": "denial.png",
    "title": "I'm not in denial..."
  }
</pre>
<p>What I like about WebSRT:</p>
<ol>
<li>it allows for all sorts of different content in the text cues &#8211; plain text is useful for <a href="http://www.w3.org/WAI/PF/HTML/wiki/Media_Accessibility_Requirements#Texted_Audio_Description">texted audio descriptions</a>, minimal markup is useful for subtitles, captions, karaoke and <a href="http://wiki.whatwg.org/wiki/Use_cases_for_API-level_access_to_timed_tracks#Chapter_Markers">chapters</a>, and &#8220;metadata&#8221; is useful for, well, any data.</li>
<li>it can be easily encapsulated into media resources and thus turned into in-band tracks by regarding each cue as a data packet with time stamps.</li>
<li>it is not verbose</li>
</ol>
<p>Where I think WebSRT still needs improvements:</p>
<ol>
<li>break with the SRT history: since WebSRT and SRT files are so different, WebSRT should get its own MIME type, e.g. text/websrt, and file extension, e.g. .wsrt; this will free WebSRT for changes that wouldn&#8217;t be possible if it had to remain conformant with SRT</li>
<li>introduce some header fields into WebSRT: the format needs
<ul>
<li>file-wide name-value metadata, such as author, date, copyright, etc</li>
<li>language specification for the file as a hint for font selection and speech synthesis</li>
<li>a possibility for style sheet association in the file header</li>
<li>a means to identify which parser is required for the cues</li>
<li>a magic identifier and a version string of the format</li>
</ul>
</li>
<li>allow innerHTML as an additional format in the cues with the CSS pseudo-elements applying to all HTML elements</li>
<li>allow full use of CSS instead of just the restricted features and also use it for positioning instead of the hard to understand positioning hints</li>
<li>on the minimum markup, provide a neutral structuring element such as &lt;span @id @class @lang> to associate specific styles or specific languages with a subpart of the cue</li>
</ol>
<p>Note that I undertook some experiments with an alternative, XML-based format called <a href="https://wiki.mozilla.org/Accessibility/Video_Text_Format">WMML</a> to gain most of these insights and determine the advantages/disadvantages of an XML-based format. The foremost advantage is that newlines in the source file are not automatically turned into displayed line breaks, which can make the source text file more readable. The foremost disadvantages are verbosity and that there needs to be a simple encoding step to remove all encapsulating header-type content from around the timed text cues before encoding it into a binary media resource.</p>
<h3 id="track"><a href="http://www.whatwg.org/specs/web-apps/current-work/complete/video.html#the-track-element">ASSOCIATING EXTERNAL TIMED TEXT RESOURCES WITH A VIDEO</a></h3>
<p>Now that we have a timed text format, we need to be able to associate it with a media resource in HTML5. This is what the <a href="http://www.whatwg.org/specs/web-apps/current-work/complete/video.html#the-track-element"><i>&lt;track> element</i></a> was introduced for. It associates the timestamps in the timed text cues with the timeline of the video resource. The browser is then expected to render these during the time interval in which the cues are expected to be active.</p>
<p>Here is an example of how to associate multiple subtitle tracks with a video:</p>
<pre>
  &lt;video src="california.webm" controls>
    &lt;track label="English" kind="subtitles" src="calif_eng.wsrt" srclang="en">
    &lt;track label="German" kind="subtitles" src="calif_de.wsrt" srclang="de">
    &lt;track label="Chinese" kind="subtitles" src="calif_zh.wsrt" srclang="zh">
  &lt;/video>
</pre>
<p>In this case, the UA is expected to provide a text menu with a subtitle entry with these three tracks and their label as part of the video controls. Thus, the user can interactively activate one of the tracks.</p>
<p>Here is an example of multiple tracks of different kinds:</p>
<pre>
  &lt;video src="california.webm" controls>
    &lt;track label="English" kind="subtitles" src="calif_eng.wsrt" srclang="en">
    &lt;track label="German" kind="captions" src="calif_de.wsrt" srclang="de">
    &lt;track label="French" kind="chapter" src="calif_fr.wsrt" srclang="fr">
    &lt;track label="English" kind="metadata" src="calif_meta.wsrt" srclang="en">
    &lt;track label="Chinese" kind="descriptions" src="calif_zh.wsrt" srclang="zh">
  &lt;/video>
</pre>
<p>In this case, the UA is expected to provide a text menu with a list of track kinds with one entry each for subtitles, captions and descriptions through the controls. The chapter tracks are expected to provide some sort of visual subdivision on the timeline and the metadata tracks are not exposed visually, but are only available through the JavaScript API.</p>
<p>Here are several ideas for improving the &lt;track> specification:</p>
<ul>
<li>&lt;track> is currently only defined for WebSRT resources &#8211; it should be made generic and then browsers can compete on the formats for which they provide support. WebSRT could be the baseline format. A @type attribute could be added to hint at the MIME type of the provided resource.</li>
<li>&lt;track> needs a means for authors to mark certain tracks as active, others as inactive. This can be overruled by browser settings e.g. on @srclang and by user interaction.</li>
<li>karaoke and lyrics are supported by WebSRT, but aren&#8217;t in the HTML5 spec as track kinds &#8211; they should be added and made visible like subtitles or captions.</li>
</ul>
<h3 id="API"><a href="http://www.whatwg.org/specs/web-apps/current-work/complete/video.html#timed-tracks">EXPOSING A LIST OF TimedTracks TO JAVASCRIPT</a></h3>
<p>This is where we take an extra step and move to a uniform handling of both in-band and out-of-band timed text tracks. Further, a third type of timed text track has been introduced in the form of a MutableTimedTrack &#8211; i.e. one that can be authored and added through JavaScript alone.</p>
<p>The JavaScript API that is exposed for any of these track types is identical. A media element now has this additional IDL interface:</p>
<pre>
interface HTMLMediaElement : HTMLElement {
...
  readonly attribute TimedTrack[] tracks;
  MutableTimedTrack addTrack(in DOMString label, in DOMString kind,
                                 in DOMString language);
};
</pre>
<p>A media element thus manages a list of TimedTracks and provides for adding TimedTracks through addTrack().</p>
<p>The timed tracks are associated with a media resource in the following order:</p>
<ol>
<li>The &lt;track> element children of the media element, in tree order.</li>
<li>Tracks created through the addTrack() method, in the order they were added, oldest first.</li>
<li>In-band timed text tracks, in the order defined by the media resource&#8217;s format specification.</li>
</ol>
<p>The IDL interface of a TimedTrack is as follows:</p>
<pre>
interface TimedTrack {
  readonly attribute DOMString kind;
  readonly attribute DOMString label;
  readonly attribute DOMString language;
  readonly attribute unsigned short readyState;
           attribute unsigned short mode;
  readonly attribute TimedTrackCueList cues;
  readonly attribute TimedTrackCueList activeCues;
  readonly attribute Function onload;
  readonly attribute Function onerror;
  readonly attribute Function oncuechange;
};
</pre>
<p>The first three capture the value of the @kind, @label and @srclang attributes and are provided by the addTrack() function for MutableTimedTracks and exposed from metadata in the binary resource for in-band tracks.</p>
<p>The readyState captures whether the data is available and is one of &#8220;not loaded&#8221;, &#8220;loading&#8221;, &#8220;loaded&#8221;, &#8220;failed to load&#8221;. Data is only available in the &#8220;loaded&#8221; state.</p>
<p>The mode attribute captures whether the track is active for display and is one of &#8220;disabled&#8221;, &#8220;hidden&#8221; and &#8220;showing&#8221;. In the &#8220;disabled&#8221; mode, the UA doesn&#8217;t have to download the resource, allowing for some bandwidth management.</p>
<p>The cues and activeCues attributes provide the list of parsed cues for the given track and the subpart thereof that is currently active.</p>
<p>The onload, onerror, and oncuechange functions are event handlers for the load, error and cuechange events of the TimedTrack.</p>
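<p>To illustrate the relationship between cues and activeCues, here is a small JavaScript sketch that works on plain objects rather than on the real interfaces. The function name <code>activeCuesAt</code> is hypothetical, and treating the start time as inclusive and the end time as exclusive is my assumption, not something taken from the specification.</p>
<pre>
// Return the cues whose time interval contains the given playback time.
// Assumption: startTime is inclusive, endTime is exclusive.
function activeCuesAt(cues, time) {
  return cues
    .filter((cue) => time >= cue.startTime)
    .filter((cue) => cue.endTime > time);
}

// Plain objects standing in for parsed cues of one track.
const exampleCues = [
  { id: "1", startTime: 15.0, endTime: 17.95 },
  { id: "2", startTime: 18.16, endTime: 20.08 },
];
</pre>
<p>With these two cues, a playback time of 16 seconds selects only the first cue, and a time between the two intervals selects none.</p>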
<p>Individual cues expose the following IDL interface:</p>
<pre>
interface TimedTrackCue {
  readonly attribute TimedTrack track;
  readonly attribute DOMString id;
  readonly attribute float startTime;
  readonly attribute float endTime;
  DOMString getCueAsSource();
  DocumentFragment getCueAsHTML();
  readonly attribute boolean pauseOnExit;
  readonly attribute Function onenter;
  readonly attribute Function onexit;
  readonly attribute DOMString direction;
  readonly attribute boolean snapToLines;
  readonly attribute long linePosition;
  readonly attribute long textPosition;
  readonly attribute long size;
  readonly attribute DOMString alignment;
  readonly attribute DOMString voice;
};
</pre>
<p>The @track attribute links the cue to its TimedTrack.</p>
<p>The @id, @startTime and @endTime attributes expose a cue identifier and its associated time interval. The getCueAsSource() and getCueAsHTML() functions return the cue text either unparsed or parsed into an HTML DOM subtree.</p>
<p>The @pauseOnExit attribute can be set to true/false and indicates whether at the end of the cue&#8217;s time interval the media playback should be paused and wait for user interaction to continue. This is particularly important as we are trying to support <a href="http://www.w3.org/WAI/PF/HTML/wiki/Media_Accessibility_Requirements#Extended_audio_description">extended audio descriptions</a> and <a href="http://www.w3.org/WAI/PF/HTML/wiki/Media_Accessibility_Requirements#Extended_Time-aligned_Captions.2FSubtitles">extended captions</a>.</p>
<p>The onenter and onexit functions are event handlers for the enter and exit events of the TimedTrackCue.</p>
<p>The @direction, @snapToLines, @linePosition, @textPosition, @size, @alignment and @voice attributes expose WebSRT positioning and semantic markup of the cue.</p>
<p>My only concerns with this part of the specification are:</p>
<ul>
<li>The WebSRT-related attributes in the TimedTrackCue are in conflict with CSS attributes and really should not be introduced into HTML5, since they are WebSRT-specific. They will not exist in other types of in-band or out-of-band timed text tracks. As a mapping has to be done anyway, why not rely on already available CSS features?</li>
<li>There is no API to expose header-specific metadata from timed text tracks into JavaScript. Information such as the copyright holder, the creation date and the usage rights of a timed text track would be useful to have available. I would propose to add a list of name-value metadata elements to the TimedTrack API.</li>
<li>In addition, I would propose to allow media fragment hyperlinks in a &lt;video> @src attribute to point to the @id of a TimedTrackCue, thus defining that the playback position should be moved to the time offset of that TimedTrackCue. This is a useful feature and builds on bringing named media fragment URIs and TimedTracks together.</li>
</ul>
<h3 id="rendering"><a href="http://www.whatwg.org/specs/web-apps/current-work/complete/rendering.html#timed-tracks-0">RENDERING TimedTracks</a></h3>
<p>The third part of the timed track framework deals with how to render the timed text cues in a Web page.  The rendering rules are explained in the <a href="http://www.whatwg.org/specs/web-apps/current-work/complete/rendering.html#timed-tracks-0">HTML5 rendering section</a>.</p>
<p>I&#8217;ve extracted the following rough steps from the rendering algorithm:</p>
<ol>
<li>All timed tracks of a media resource that are in &#8220;showing&#8221; mode are rendered together to avoid overlapping text from multiple tracks.</li>
<li>The timed track cues that are to be rendered are collected from the active timed tracks and ordered by the timed track order first and by their start time second. Where there are identical start times, the cues are ordered by their end time, earliest first, or by their creation order if all else is identical.</li>
<li>Each cue gets its own CSS box.</li>
<li>The text in the CSS boxes is positioned and formatted by interpreting the positioning and formatting instructions of WebSRT that are provided on the cues.</li>
<li>An anonymous inline CSS box is created into which all the cue CSS boxes are wrapped.</li>
<li>The wrapping CSS box gets the dimensions of the video viewport. The cue CSS boxes are positioned so they don&#8217;t overlap. The text inside the cue CSS boxes inside the wrapping CSS box is wrapped at the edges if necessary.</li>
</ol>
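<p>The ordering in step 2 can be sketched as a JavaScript sort comparator. This is only an illustration on plain objects: the property names <code>trackIndex</code> (position of the cue&#8217;s track in the track list) and <code>created</code> (a creation counter) are hypothetical and not part of the TimedTrackCue interface.</p>
<pre>
// Compare two cues: track order first, then start time,
// then end time (earliest first), then creation order.
function compareCues(a, b) {
  return (
    a.trackIndex - b.trackIndex ||
    a.startTime - b.startTime ||
    a.endTime - b.endTime ||
    a.created - b.created
  );
}

// Return a new array with the cues in rendering order.
function renderingOrder(cues) {
  return [...cues].sort(compareCues);
}
</pre>
<p>Each subtraction yields zero when the two values are equal, so the comparison falls through to the next criterion only on a tie.</p>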
<p>To overcome security concerns with this kind of direct rendering of a CSS box into the Web page where text comes potentially from a different and malicious Web site, it is required to have the cues come from the same origin as the Web page.</p>
<p>To allow application of a restricted set of CSS properties to the timed text cues, a set of pseudo-selectors was introduced. This is necessary since all the CSS boxes are anonymous and cannot be addressed from the Web page. The introduced pseudo-selectors are ::cue to address a complete cue CSS box, and ::cue-part to address a subpart of a cue CSS box based on a set of identifiers provided by WebSRT.</p>
<p>I have several issues with this approach:</p>
<ul>
<li>I believe that it is not a good idea to restrict rendering to same-origin files only. This will disallow linking to external captioning services (or even just a separate caption server of the same company) to provide the captions for a video. <a href="http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-July/027283.html">Henri Sivonen proposed</a> a means to overcome this by parsing every cue basically as its own HTML document (well, the body of a document) and then rendering these in iframe manner into the Web page. This would overcome the same-origin restriction. It would also allow us to do away with the new ::cue CSS selectors, thus simplifying the solution.</li>
<li>In general I am concerned about how tightly the rendering is tied to WebSRT. Step 4 should not be in the HTML5 specification, but only apply to WebSRT. Every external format should provide its own mapping to CSS. As it is specified right now, other formats, such as 3GPP in MPEG-4 or Kate in Ogg, are required to map their format and positioning information to WebSRT instructions. These are then converted again using the WebSRT to CSS mapping rules. That seems like overkill.</li>
<li>I also find step 6 very limiting, since only the video viewport is regarded as a potential rendering area &#8211; this is also the reason why there is no rendering for audio elements. Instead, it would make a lot more sense if a CSS box was provided by the HTML page &#8211; the default being the video viewport, but it could be changed to any area on screen. This would allow rendering music lyrics under or above an audio element, or rendering captions below a video element to avoid any overlap at all.</li>
</ul>
<h3 id="summary">SUMMARY AND FURTHER NEEDS</h3>
<p>We&#8217;ve made huge progress on accessibility features for HTML5 media elements with the specifications that Ian proposed. I think we can turn it into a flexible and feature-rich framework once the improvements that Henri, I and others have proposed are included.</p>
<p>This will meet most of the <a href="http://www.w3.org/WAI/PF/HTML/wiki/Media_Accessibility_Requirements">requirements that the W3C HTML Accessibility Task Force has collected for media elements</a> where the requirements relate to accessibility functionality provided through alternative <strong>text</strong> resources.</p>
<p>However, we are not solving any of the accessibility needs that relate to alternative audio-visual tracks and resources. In particular there is no solution yet to deal with multi-track audio or video files that have e.g. sign language or audio description tracks in them &#8211; not to speak of the issues that can be introduced through dealing with separate media resources from several sites that need to be played back in sync. This latter may be a challenge for future versions of HTML5, since needs for such synchronisation of multiple resources have to be explored further.</p>
<p>In a first instance, we will require an API to expose in-band tracks, a means to control their activation interactively in a UI, and a description of how they should be rendered. E.g. should a sign language track be rendered as picture-in-picture? <a href="http://www.w3.org/WAI/PF/HTML/wiki/Media_Accessibility_Requirements#Clear_audio">Clear audio</a> and <a href="http://www.w3.org/WAI/PF/HTML/wiki/Media_Accessibility_Requirements#Sign_Translation">Sign translation</a> are the two key accessibility needs that can be satisfied with such a multi-track solution.</p>
<p>Finally, another key requirement area for media accessibility is described in a section called <a href="http://www.w3.org/WAI/PF/HTML/wiki/Media_Accessibility_Requirements#Content_Navigation_by_Content_Structure">&#8220;Content Navigation by Content Structure&#8221;</a>. This describes the need for vision-impaired users to be able to navigate through a media resource based on semantic markup &#8211; think of it as similar to a navigation through a book by book chapters and paragraphs. The introduction of chapter markers goes some way towards satisfying this need, but chapter markers tend to address only big time intervals in a video and don&#8217;t let you navigate on a different level to subchapters and paragraphs. It is possible to provide that navigation through providing several chapter tracks at different resolution levels, but then they are not linked together and navigation cannot easily swap between resolution levels.</p>
<p>An alternative might be to include different resolution levels inside a single chapter track and somehow control the UI to manage them as different resolutions. This would only require an additional attribute on text cues and could be useful to other types of text tracks, too. For example, captions could be navigated based on scenes, shots, conversations, or individual captions. Some experimentation will be required here before we can introduce a sensible extension to the given media accessibility framework.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingertech.net/2010/08/07/websrt-and-html5-media-accessibility/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>2010: Implementing the Media Fragments URI Specification</title>
		<link>http://blog.gingertech.net/publications/conference-articles/2010-implementing-the-media-fragments-uri-specification/</link>
		<comments>http://blog.gingertech.net/publications/conference-articles/2010-implementing-the-media-fragments-uri-specification/#comments</comments>
		<pubDate>Fri, 30 Jul 2010 00:44:24 +0000</pubDate>
		<dc:creator>silvia</dc:creator>
				<category><![CDATA[random]]></category>

		<guid isPermaLink="false">http://blog.gingertech.net/?page_id=1032</guid>
		<description><![CDATA[D. van Deursen, S. Pfeiffer, R. Troncy, Y. Lafon, E. Mannens, R. van der Walle,]]></description>
			<content:encoded><![CDATA[<p>D. van Deursen, S. Pfeiffer, R. Troncy, Y. Lafon, E. Mannens, R. van der Walle, </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingertech.net/publications/conference-articles/2010-implementing-the-media-fragments-uri-specification/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Media Fragment URI Specification in Last Call WD</title>
		<link>http://blog.gingertech.net/2010/07/10/media-fragment-uri-specification-in-last-call-wd/</link>
		<comments>http://blog.gingertech.net/2010/07/10/media-fragment-uri-specification-in-last-call-wd/#comments</comments>
		<pubDate>Fri, 09 Jul 2010 15:44:20 +0000</pubDate>
		<dc:creator>silvia</dc:creator>
				<category><![CDATA[Digital Media]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[video accessibility]]></category>
		<category><![CDATA[HTML5 video]]></category>
		<category><![CDATA[media fragment URI]]></category>
		<category><![CDATA[media fragments]]></category>
		<category><![CDATA[named fragment]]></category>
		<category><![CDATA[spatial fragment]]></category>
		<category><![CDATA[temporal fragment]]></category>
		<category><![CDATA[track fragment]]></category>

		<guid isPermaLink="false">http://blog.gingertech.net/?p=1020</guid>
		<description><![CDATA[After two years of effort, the W3C Media Fragment WG has now created a Last Call Working Draft document. This means that the working group is fairly confident that they have addressed all the required issues for media fragment URIs and their implementation on HTTP and is asking for outside experts and groups for input. [...]]]></description>
			<content:encoded><![CDATA[<p>After two years of effort, the W3C Media Fragment WG has now created a <a href="http://www.w3.org/TR/media-frags/">Last Call Working Draft document</a>. This means that the working group is fairly confident that they have addressed all the required issues for media fragment URIs and their implementation on HTTP and is asking for outside experts and groups for input. This is the time for you to get active and proof-read the specification thoroughly and feed back all the concerns that you have and all the things you do not understand!</p>
<p>The <a href="http://www.w3.org/TR/media-frags/">media fragment (MF) URI specification</a> specifies two types of MF URIs: those created with a URI fragment (&#8220;#&#8221;), e.g. <b>video.ogv#t=10,20</b> and those with a URI query (&#8220;?&#8221;), e.g. <b>video.ogv?t=10,20</b>. There is a fundamental difference between the two that needs to be appreciated: with a URI fragment you can specify a subpart of a resource, e.g. a subpart of a video, while with a URI query you will refer to a different resource, i.e. a &#8220;new&#8221; video. This is an important difference to understand for media fragments, because only some things that we want to achieve with media fragments can be achieved with &#8220;#&#8221;, while others can only be achieved by transforming the resource into a different new bitstream.</p>
<p>This all sounds very abstract, so let me give you an example. Say you want to retrieve a video without its audio track. Say you&#8217;d rather not download the audio track data, since you want to save on bandwidth. So, you are only interested to get the video data. The URI that you may want to use is <b>video.ogv#track=video</b>. This means that you don&#8217;t want to change the video resource, but you only want to see the video. The user agent (UA) has two options to resolve such a URI: it can either map that request to byte ranges and just retrieve those &#8211; or it can download the full resource and ignore the data it has not been requested to display.</p>
<p>Since we do not want the extra bytes of the audio track to be retrieved, we would hope the UA can do the byte range requests. However, most Web video formats interleave the different tracks of a media resource in time, such that a video track will result in a gazillion small byte ranges. This makes it impractical to retrieve just the video through a &#8220;#&#8221; media fragment. Thus, if we really want this functionality, we have to make the server more intelligent and allow creation of a new resource from the existing one which doesn&#8217;t contain the audio. Then, the server, upon receiving a request such as <b>video.ogv#track=video</b> can redirect that to <b>video.ogv?track=video</b> and actually serve a new resource that satisfies the needs.</p>
<p>This is in fact exactly what was implemented in a recently published Firefox Plugin written by Jakub Sendor &#8211; also described in his presentation <a href="http://www.w3.org/2008/WebVideo/Fragments/talks/2010-06-30-Jakub_Sendor-Media_Fragment_Firefox_Plugin.pdf">&#8220;Media Fragment Firefox plugin&#8221;</a>.</p>
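<p>To make the &#8220;#&#8221; versus &#8220;?&#8221; distinction concrete, here is a minimal Python sketch. It is not the plugin&#8217;s code: the function names <code>describe</code> and <code>fragment_to_query</code> are made up for illustration, and the sketch only shows how a URI can be split into its addressing mode and dimensions, and how a fragment could be rewritten into a query so that a server can build a new resource.</p>

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl


def describe(uri):
    """Return (mode, dimensions); mode is 'fragment', 'query' or 'none'."""
    parts = urlsplit(uri)
    if parts.fragment:
        # "#": a subpart of the same resource
        return "fragment", dict(parse_qsl(parts.fragment))
    if parts.query:
        # "?": a different (new) resource
        return "query", dict(parse_qsl(parts.query))
    return "none", {}


def fragment_to_query(uri):
    """Hypothetical rewrite step: move the fragment into the query part."""
    parts = urlsplit(uri)
    if not parts.fragment:
        return uri
    query = "&".join(p for p in (parts.query, parts.fragment) if p)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(describe("video.ogv#t=10,20"))
print(describe("video.ogv?t=10,20"))
print(fragment_to_query("video.ogv#track=video"))
```

<p>The last call turns <b>video.ogv#track=video</b> into <b>video.ogv?track=video</b>, which is the redirect target mentioned above.</p>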
<p>Media Fragment URIs are defined for four dimensions:</p>
<ul>
<li>temporal fragments</li>
<li>spatial fragments</li>
<li>track fragments</li>
<li>named fragments</li>
</ul>
<p>The temporal dimension, when not combined with another dimension, can easily be mapped to byte ranges, since all Web media formats interleave their tracks in time and thus create a simple relationship between time and bytes.</p>
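<p>As a rough illustration of that time-to-bytes relationship, the following Python sketch maps a temporal fragment such as <b>t=10,20</b> to a byte range. The seek index, its values and the file size are invented for the example; a real UA or server would take them from the container format.</p>

```python
from bisect import bisect_left, bisect_right

# Hypothetical seek index: (keyframe time in seconds, byte offset).
INDEX = [(0.0, 0), (5.0, 40_000), (10.0, 85_000),
         (15.0, 130_000), (20.0, 170_000), (25.0, 215_000)]
FILE_SIZE = 260_000  # assumed total size in bytes


def byte_range_for(start, end):
    """Return an inclusive (first_byte, last_byte) range covering [start, end]."""
    times = [t for t, _ in INDEX]
    # Last keyframe at or before the requested start time.
    first = max(bisect_right(times, start) - 1, 0)
    # First keyframe at or after the requested end time.
    last = bisect_left(times, end)
    end_byte = INDEX[last][1] if last < len(INDEX) else FILE_SIZE
    return INDEX[first][1], end_byte - 1


print(byte_range_for(10, 20))
```

<p>The range is widened to keyframe boundaries on both sides, so the UA may receive slightly more data than the requested interval.</p>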
<p>The spatial dimension is a very complicated beast. If you address a rectangular image region out of a video, you might want just the bytes related to that image region. That&#8217;s almost impossible since pixels are encoded both aggregated across the frame and across time. Also, actually removing the context, i.e. the image data outside the region of interest may not be what you want &#8211; you may only want to focus in on the region of interest. Thus, the proposal for what to do in the spatial dimension is to simply retrieve all the data and have the UA deal with the display of the focused region, e.g. putting a dark overlay over the regions outside the region of interest.</p>
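<p>The overlay idea can be sketched in a few lines of Python. The sketch assumes the pixel form of the spatial dimension from the specification draft (<b>xywh=x,y,w,h</b>) and a known frame size; the function name <code>overlay_rects</code> is hypothetical. It returns the rectangles that a UA could darken around the region of interest.</p>

```python
def overlay_rects(frame_w, frame_h, xywh):
    """Return (x, y, w, h) rectangles covering everything outside the region."""
    x, y, w, h = (int(v) for v in xywh.split(","))
    # Clamp the region to the frame so the overlay never leaves the picture.
    x2, y2 = min(x + w, frame_w), min(y + h, frame_h)
    rects = [
        (0, 0, frame_w, y),              # band above the region
        (0, y2, frame_w, frame_h - y2),  # band below the region
        (0, y, x, y2 - y),               # strip to the left
        (x2, y, frame_w - x2, y2 - y),   # strip to the right
    ]
    # Drop empty rectangles, e.g. when the region touches a frame edge.
    return [r for r in rects if r[2] > 0 and r[3] > 0]


for rect in overlay_rects(640, 480, "160,120,320,240"):
    print(rect)
```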
<p>The track dimension is similarly complicated and here it was decided that a redirect to a URI query would be in order in the demo Firefox plugin. Since this requires an intelligent server &#8211; which is available through the Ninsuna demo server that was implemented by Davy Van Deursen, another member of the MF WG &#8211; the Firefox plugin makes use of that. If the UA doesn&#8217;t have such an intelligent server available, it may again be most useful to simply hide the non-requested data in the UA, as with the spatial dimension.</p>
<p>The named dimension is still a largely undefined beast. It is clear that addressing a named dimension cannot be done together with the other dimensions, since a named dimension can represent any of the other dimensions above, and even a combination of them. Thus, resolving a named dimension requires either the UA or the server to understand what the name maps to. If, for example, a track has a name in a media resource and that name is stored in the media header and the UA already has a copy of all the media headers, it can resolve the name to the track that is being requested and take adequate action.</p>
<p>But enough explaining &#8211; I have made a screencast of the Firefox plugin in action for all these dimensions, which explains things a lot more concisely than words will ever be able to &#8211; enjoy:</p>
<p><a href="http://blog.gingertech.net/2010/07/10/media-fragment-uri-specification-in-last-call-wd/"><em>Click here to view the embedded video.</em></a></p>
<p>And do not forget to proofread <a href="http://www.w3.org/TR/media-frags/">the specification</a> and send feedback to <a href="mailto:public-media-fragment@w3.org">public-media-fragment@w3.org</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingertech.net/2010/07/10/media-fragment-uri-specification-in-last-call-wd/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Media Fragment URI Demo</title>
		<link>http://blog.gingertech.net/?post_type=external-videos&#038;p=1096</link>
		<comments>http://blog.gingertech.net/?post_type=external-videos&#038;p=1096#comments</comments>
		<pubDate>Thu, 08 Jul 2010 05:46:00 +0000</pubDate>
		<dc:creator>silvia</dc:creator>
				<category><![CDATA[External Videos]]></category>
		<category><![CDATA[audio]]></category>
		<category><![CDATA[hyperlinking]]></category>
		<category><![CDATA[media]]></category>
		<category><![CDATA[technology]]></category>
		<category><![CDATA[URI]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[W3C]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://blog.gingertech.net/?post_type=external-videos&#038;p=1096</guid>
		<description><![CDATA[Demonstration of a Firefox Plugin developed by Jakub Sendor to show how Media Fragment URIs can be implemented. Media Fragment URIs are being specified at http://www.w3.org/TR/media-frags/ in a W3C technical report. Category: Science &#38; Technology. Uploaded by: silviapfeiffer. Hosted: youtube.]]></description>
			<content:encoded><![CDATA[<p><object width="500" height="306"><param name="movie" value="http://www.youtube.com/v/LfRRYp6mnu0?fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/LfRRYp6mnu0?fs=1" type="application/x-shockwave-flash" width="500" height="306" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>Demonstration of a Firefox Plugin developed by Jakub Sendor to show how Media Fragment URIs can be implemented. Media Fragment URIs are being specified at http://www.w3.org/TR/media-frags/ in a W3C technical report.</p>
<p><i>Category:</i> Science &amp; Technology<br /><i>Uploaded by:</i> <a href='http://www.youtube.com/user/silviapfeiffer'>silviapfeiffer</a><br /><i>Hosted:</i> <a href='http://www.youtube.com/watch?v=LfRRYp6mnu0'>youtube</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingertech.net/?post_type=external-videos&#038;p=1096/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>W3C Media Annotations API standard</title>
		<link>http://blog.gingertech.net/2010/04/10/w3c-media-annotations-api-standard/</link>
		<comments>http://blog.gingertech.net/2010/04/10/w3c-media-annotations-api-standard/#comments</comments>
		<pubDate>Sat, 10 Apr 2010 00:39:03 +0000</pubDate>
		<dc:creator>silvia</dc:creator>
				<category><![CDATA[Digital Media]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[open codecs]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[audio]]></category>
		<category><![CDATA[HTML5]]></category>
		<category><![CDATA[media annotations]]></category>
		<category><![CDATA[media elements]]></category>
		<category><![CDATA[media fragments]]></category>
		<category><![CDATA[meta data]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[Ogg]]></category>
		<category><![CDATA[skeleton]]></category>
		<category><![CDATA[specification]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[vorbiscomment]]></category>
		<category><![CDATA[W3C]]></category>

		<guid isPermaLink="false">http://blog.gingertech.net/?p=964</guid>
		<description><![CDATA[Recently, I was asked to review the W3C Media Annotations specifications as they are about to go into Last Call (a state that comes before the request for implementations at the W3C). The W3C Media Annotations group has defined a set of metadata that they believe is representative and common for media resources. The ontology [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, I was asked to review the W3C Media Annotations specifications as they are about to go into <a href="http://www.w3.org/2005/10/Process-20051014/tr#rec-advance">Last Call</a> (a state that comes before the request for implementations at the W3C).</p>
<p>The W3C Media Annotations group has defined a set of metadata that they believe is representative and common for media resources. The <a href="http://www.w3.org/TR/2010/WD-mediaont-10-20100309/">ontology</a> consists of the following fields:</p>
<ul>
<li><strong>ma:identifier</strong>: a URI or string to identify a resource</li>
<li><strong>ma:title</strong>:	a string providing the title of the resource</li>
<li><strong>ma:language</strong>: a language code describing the language used in the resource</li>
<li><strong>ma:locator</strong>: the URI at which the resource can be accessed</li>
<li><strong>ma:contributor</strong>: a URI or string identifying the contributor and the nature of the contribution</li>
<li><strong>ma:creator</strong>: a URI or string identifying an author</li>
<li><strong>ma:createDate</strong>: a date of creation or publication of the resource</li>
<li><strong>ma:location</strong>: a string or geo code identifying where the resource has been shot/recorded</li>
<li><strong>ma:description</strong>: a string describing the content of the resource</li>
<li><strong>ma:keyword</strong>: a word or word combination providing a topic, keyword or tag representing the resource</li>
<li><strong>ma:genre</strong>: a string providing the genre of the resource</li>
<li><strong>ma:rating</strong>: rating value, including the rating scale</li>
<li><strong>ma:relation</strong>: a URI and string identifying a related resource and the relationship</li>
<li><strong>ma:collection</strong>:	a URI or string providing the name of a collection to which the resource belongs</li>
<li><strong>ma:copyright</strong>: a URI or string with the copyright statement.</li>
<li><strong>ma:license</strong>: a string or URI with the usage license</li>
<li><strong>ma:publisher</strong>: a string or URI with the publisher of the resource</li>
<li><strong>ma:targetAudience</strong>: a URI and classification string providing the issuer of the classification and the classification value</li>
<li><strong>ma:fragments</strong>: a list of string and URI values that identify media fragments and their type</li>
<li><strong>ma:namedFragments</strong>: a list of string and URI values that provide names to media fragments</li>
<li><strong>ma:frameSize</strong>: a width &#8211; height pair in pixels</li>
<li><strong>ma:compression</strong>: a string providing the compression algorithm</li>
<li><strong>ma:duration</strong>: a float to provide the resource duration in seconds</li>
<li><strong>ma:format</strong>: a string providing the MIME type of the resource</li>
<li><strong>ma:samplingrate</strong>: a float with the audio sampling rate</li>
<li><strong>ma:framerate</strong>: a float with the video frame rate</li>
<li><strong>ma:bitrate</strong>: a float providing the average bit rate in kbps</li>
<li><strong>ma:numTracks</strong>: an int of the number of tracks</li>
</ul>
<p>Note that some of these fields are not single values, but simple constructs of multiple values. Thus, they are actually more complex than the name-value pairs that are typically used in, e.g., HTML meta headers or Dublin Core. I regard this as an issue for implementations.</p>
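<p>To show what &#8220;constructs of multiple values&#8221; means in practice, here is a small Python sketch. The class names and the sample values are made up for illustration and are not taken from the specification; only the field names come from the list above.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contributor:
    """ma:contributor: who contributed, plus the nature of the contribution."""
    identifier: str              # URI or plain name
    role: Optional[str] = None   # e.g. "performer"


@dataclass
class FrameSize:
    """ma:frameSize: a width/height pair in pixels."""
    width: int
    height: int


# A flat name-value pair, as in an HTML meta header, is a single string ...
flat = {"title": "Media Fragment URI Demo"}

# ... while some ma: fields need structured values (sample data, invented).
record = {
    "ma:title": "Media Fragment URI Demo",
    "ma:contributor": [Contributor("Jakub Sendor", "developer")],
    "ma:frameSize": FrameSize(640, 480),
}

print(record["ma:frameSize"].width, record["ma:contributor"][0].role)
```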
<p>The fields were chosen as typical metadata being available about media resources. The media fragments fields are a bit dubious in this respect, but could be useful in future.</p>
<p>The metadata is determined either from within the resource itself or from a metadata collection about the resource. As such, the document maps several existing metadata and media resource formats to this interface, amongst them:</p>
<ul>
<li><a href="http://www.w3.org/TR/2010/WD-mediaont-10-20100309/XMP.html">XMP</a></li>
<li><a href="http://www.w3.org/TR/2010/WD-mediaont-10-20100309/ID3.html">ID3</a></li>
<li><a href="http://www.w3.org/TR/2010/WD-mediaont-10-20100309/iTunes.html">iTunes</a></li>
<li><a href="http://www.w3.org/TR/2010/WD-mediaont-10-20100309/Quicktime.html">QT</a></li>
<li><a href="http://www.w3.org/TR/2010/WD-mediaont-10-20100309/SearchMonkey.html">SearchMonkey</a></li>
<li><a href="http://www.w3.org/TR/2010/WD-mediaont-10-20100309/MediaRDF.html">MediaRDF</a></li>
<li><a href="http://www.w3.org/TR/2010/WD-mediaont-10-20100309/LOM.html">LOM</a></li>
<li><a href="http://www.w3.org/TR/2010/WD-mediaont-10-20100309/METS.html">METS</a></li>
<li><a href="http://www.w3.org/TR/2010/WD-mediaont-10-20100309/EXIF.html">EXIF</a></li>
<li><a href="http://www.w3.org/TR/2010/WD-mediaont-10-20100309/CableLabs1.html">CableLabs 1.1</a></li>
<li><a href="http://www.w3.org/TR/2010/WD-mediaont-10-20100309/CableLabs2.html">CableLabs 2.0</a></li>
<li><a href="http://www.w3.org/TR/2010/WD-mediaont-10-20100309/DIG.html">DIG35</a></li>
<li><a href="http://www.w3.org/TR/2010/WD-mediaont-10-20100309/MIX.html">MIX</a></li>
<li><a href="http://www.w3.org/TR/2010/WD-mediaont-10-20100309/FRBR.html">FRBR</a></li>
<li><a href="http://www.w3.org/TR/2010/WD-mediaont-10-20100309/MediaRSS.html">MediaRSS</a></li>
<li><a href="http://www.w3.org/TR/2010/WD-mediaont-10-20100309/TXFeed.html">TXFeed</a></li>
<li><a href="http://www.w3.org/TR/2010/WD-mediaont-10-20100309/YouTube.html">YouTube</a></li>
<li><a href="http://www.w3.org/TR/2010/WD-mediaont-10-20100309/VRA.html">VRA</a></li>
<li><a href="http://www.w3.org/TR/2010/WD-mediaont-10-20100309/IPTC.html">IPTC</a></li>
<li><a href="http://www.w3.org/TR/2010/WD-mediaont-10-20100309/TVA.html">TVA</a></li>
<li><a href="http://www.w3.org/TR/2010/WD-mediaont-10-20100309/EBUCore.html">EBUCore</a></li>
<li><a href="http://www.w3.org/TR/2010/WD-mediaont-10-20100309/EBUP.html">EBUP</a></li>
<li><a href="http://www.w3.org/TR/2010/WD-mediaont-10-20100309/MPEG7.html">MPEG7</a></li>
<li><a href="http://www.w3.org/TR/2010/WD-mediaont-10-20100309/SMTPD.html">SMTPD</a></li>
</ul>
<p>As they didn&#8217;t have a mapping table for Ogg content, I offered the following:</p>
<table class="ta12" border="1">
<tbody>
<tr class="ro-header">
<th class="col-mawg">MAWG </th>
<th class="col-relation">Relation </th>
<th class="col-attribute">Ogg properties</th>
<th class="col-how">How to do the mapping</th>
<th class="col-datatype">Datatype</th>
</tr>
<tr class="ro-header">
<td class="col-mawg" colspan="5"><strong>Descriptive Properties (Core Set)</strong></td>
</tr>
<tr class="ro-header">
<td class="col-mawg" colspan="5"><em>Identification</em></td>
</tr>
<tr class="ro-even">
<td class="ma">ma:identifier </td>
<td class="cell">exact</td>
<td class="cell">Name</td>
<td class="cell">Name field in skeleton header (new)</td>
<td class="cell">String</td>
</tr>
<tr class="ro-odd">
<td class="ma">ma:title </td>
<td class="cell">exact</td>
<td class="cell">Title</td>
<td class="cell">TITLE field in vorbiscomment header</td>
<td class="cell">String</td>
</tr>
<tr>
<td class="ro-odd"></td>
<td class="ro-odd">exact</td>
<td class="ro-odd">Title</td>
<td class="ro-odd">Title field in skeleton header (new)</td>
<td class="ro-odd">String</td>
</tr>
<tr>
<td class="ro-odd"></td>
<td class="ro-odd">related</td>
<td class="ro-odd">Album</td>
<td class="ro-odd">ALBUM title in vorbiscomment header</td>
<td class="ro-odd">String</td>
</tr>
<tr class="ro-even">
<td class="ma">ma:language </td>
<td class="cell">exact</td>
<td class="cell">Language</td>
<td class="cell">Language field in skeleton header (new)</td>
<td class="cell">language code</td>
</tr>
<tr class="ro-odd">
<td class="ma">ma:locator </td>
<td class="cell">exact</td>
<td class="cell"></td>
<td class="cell">file URI from system</td>
<td class="cell">URI</td>
</tr>
<tr class="ro-header">
<td class="col-mawg" colspan="5"><em>Creation</em></td>
</tr>
<tr class="ro-even">
<td class="ma">ma:contributor </td>
<td class="cell">exact</td>
<td class="cell">Artist, Performer</td>
<td class="cell">ARTIST and PERFORMER vorbiscomment headers</td>
<td class="cell">Strings</td>
</tr>
<tr class="ro-odd">
<td class="ma">ma:creator </td>
<td class="cell">related</td>
<td class="cell">Organization</td>
<td class="cell">ORGANIZATION field in vorbiscomment header</td>
<td class="cell"></td>
</tr>
<tr class="ro-even">
<td class="ma">ma:createDate </td>
<td class="cell">exact</td>
<td class="cell">Date</td>
<td class="cell">DATE field in vorbiscomment header</td>
<td class="cell">ISO date format</td>
</tr>
<tr class="ro-odd">
<td class="ma">ma:location </td>
<td class="cell">exact</td>
<td class="cell">Location</td>
<td class="cell">LOCATION field in vorbiscomment header</td>
<td class="cell">String</td>
</tr>
<tr class="ro-header">
<td class="col-mawg" colspan="5"><em>Content description</em></td>
</tr>
<tr class="ro-even">
<td class="ma">ma:description </td>
<td class="cell">exact</td>
<td class="cell">Description</td>
<td class="cell">DESCRIPTION field in vorbiscomment header</td>
<td class="cell">String</td>
</tr>
<tr class="ro-odd">
<td class="ma">ma:keyword </td>
<td class="cell">N/A</td>
<td class="cell"></td>
<td class="cell"></td>
<td class="cell"></td>
</tr>
<tr class="ro-even">
<td class="ma">ma:genre </td>
<td class="cell">exact</td>
<td class="cell">Genre</td>
<td class="cell">GENRE field in vorbiscomment header</td>
<td class="cell">String</td>
</tr>
<tr class="ro-odd">
<td class="ma">ma:rating </td>
<td class="cell">N/A</td>
<td class="cell"></td>
<td class="cell"></td>
<td class="cell"></td>
</tr>
<tr class="ro-header">
<td class="col-mawg" colspan="5"><em>Relational</em></td>
</tr>
<tr class="ro-even">
<td class="ma">ma:relation </td>
<td class="cell">related</td>
<td class="cell">Version, Tracknumber</td>
<td class="cell">VERSION (version of a title), TRACKNUMBER (CD track) fields in vorbiscomment header</td>
<td class="cell">Strings</td>
</tr>
<tr class="ro-odd">
<td class="ma">ma:collection </td>
<td class="cell">related</td>
<td class="cell">Album</td>
<td class="cell">ALBUM field of vorbiscomment header</td>
<td class="cell">String</td>
</tr>
<tr class="ro-header">
<td class="col-mawg" colspan="5"><em>Rights</em></td>
</tr>
<tr class="ro-even">
<td class="ma">ma:copyright </td>
<td class="cell">exact</td>
<td class="cell">Copyright</td>
<td class="cell">COPYRIGHT field of vorbiscomment header</td>
<td class="cell">String</td>
</tr>
<tr class="ro-odd">
<td class="ma">ma:license </td>
<td class="cell">exact</td>
<td class="cell">License</td>
<td class="cell">LICENSE field of vorbiscomment header</td>
<td class="cell">String</td>
</tr>
<tr class="ro-header">
<td class="col-mawg" colspan="5"><em>Distribution</em></td>
</tr>
<tr class="ro-even">
<td class="ma">ma:publisher </td>
<td class="cell">related</td>
<td class="cell">Organization</td>
<td class="cell">ORGANIZATION field of vorbiscomment header</td>
<td class="cell">String</td>
</tr>
<tr class="ro-odd">
<td class="ma">ma:targetAudience </td>
<td class="cell">more specific</td>
<td class="cell">Role</td>
<td class="cell">Role field of Skeleton header (new)</td>
<td class="cell">String</td>
</tr>
<tr class="ro-header">
<td class="col-mawg" colspan="5"><em>Fragments</em></td>
</tr>
<tr class="ro-even">
<td class="ma">ma:fragments </td>
<td class="cell">N/A</td>
<td class="cell"></td>
<td class="cell"></td>
<td class="cell"></td>
</tr>
<tr class="ro-odd">
<td class="ma">ma:namedFragments </td>
<td class="cell">N/A</td>
<td class="cell"></td>
<td class="cell"></td>
<td class="cell"></td>
</tr>
<tr class="ro-header">
<td class="col-mawg" colspan="5"><strong>Technical Properties</strong></td>
</tr>
<tr class="ro-even">
<td class="ma">ma:frameSize </td>
<td class="cell">exact</td>
<td class="cell"></td>
<td class="cell">extract from binary header of video track</td>
<td class="cell">int, int (width x height)</td>
</tr>
<tr class="ro-odd">
<td class="ma">ma:compression </td>
<td class="cell">exact</td>
<td class="cell">Content-type</td>
<td class="cell">Content-type field of Skeleton header</td>
<td class="cell">MIME type</td>
</tr>
<tr class="ro-even">
<td class="ma">ma:duration </td>
<td class="cell">exact</td>
<td class="cell"></td>
<td class="cell">calculate as duration = last_sample_time &#8211; first_sample_time of OggIndex header of skeleton</td>
<td class="cell">Float (or rather: rational &#8211; rational)</td>
</tr>
<tr class="ro-odd">
<td class="ma">ma:format </td>
<td class="cell">exact </td>
<td class="cell">Content-type</td>
<td class="cell">Content-type field of Skeleton header</td>
<td class="cell">MIME type</td>
</tr>
<tr class="ro-even">
<td class="ma">ma:samplingrate </td>
<td class="cell">exact</td>
<td class="cell"></td>
<td class="cell">calculate as granulerate = granulerate_numerator / granulerate_denominator of Skeleton header</td>
<td class="cell">Rational (or rather int / int)</td>
</tr>
<tr class="ro-odd">
<td class="ma">ma:framerate </td>
<td class="cell">exact</td>
<td class="cell"></td>
<td class="cell">calculate as granulerate = granulerate_numerator / granulerate_denominator of Skeleton header</td>
<td class="cell">Rational (or rather int / int)</td>
</tr>
<tr class="ro-even">
<td class="ma">ma:bitrate </td>
<td class="cell">exact</td>
<td class="cell"></td>
<td class="cell">calculate as bitrate = length_of_segment / duration from OggIndex headers of skeleton</td>
<td class="cell">Float</td>
</tr>
<tr class="ro-odd">
<td class="ma">ma:numTracks </td>
<td class="cell">exact</td>
<td class="cell">Tracknumber</td>
<td class="cell">TRACKNUMBER field of vorbiscomment header (track number on album)</td>
<td class="cell">Int</td>
</tr>
</tbody>
</table>
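<p>The calculations in the technical rows of the table can be written down as a short Python sketch. The function names are hypothetical, the sample numbers are invented, and the sketch assumes that the segment length is given in bytes, which is why it is converted to kilobits for ma:bitrate.</p>

```python
from fractions import Fraction


def granulerate(numerator, denominator):
    """Frame rate or sampling rate, kept as an exact rational."""
    return Fraction(numerator, denominator)


def duration(first_sample_time, last_sample_time):
    """Duration in seconds from the first and last sample times."""
    return last_sample_time - first_sample_time


def bitrate_kbps(segment_length_bytes, duration_seconds):
    """Average bit rate in kbps, assuming the segment length is in bytes."""
    return float(segment_length_bytes * 8 / duration_seconds / 1000)


rate = granulerate(30000, 1001)            # an NTSC-style frame rate
length = duration(Fraction(0), Fraction(120))
print(rate, float(rate))
print(length, bitrate_kbps(7_500_000, length))
```

<p>Keeping the rates as rationals avoids the rounding that a float would introduce, which is the point of the &#8220;int / int&#8221; remark in the datatype column.</p>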
<p>You will notice that the table mentions 4 fields in skeleton with a &#8220;new&#8221; marker &#8211; they are actually proposed fields in skeleton &#8211; a bit of coding will be necessary to introduce them into software. The space for these fields already exists in message header fields, so it won&#8217;t require a change of the skeleton format.</p>
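<p>For the vorbiscomment rows, the mapping is simple enough to sketch in Python. The dictionary below follows a subset of the table above; the function name <code>map_comments</code> and the sample comments are made up, and the proposed skeleton fields are left out because they do not exist in software yet.</p>

```python
# Vorbiscomment field name -> ma: properties it feeds, per the table above.
VORBISCOMMENT_TO_MA = {
    "TITLE": ["ma:title"],
    "ALBUM": ["ma:title", "ma:collection"],
    "ARTIST": ["ma:contributor"],
    "PERFORMER": ["ma:contributor"],
    "ORGANIZATION": ["ma:creator", "ma:publisher"],
    "DATE": ["ma:createDate"],
    "LOCATION": ["ma:location"],
    "DESCRIPTION": ["ma:description"],
    "GENRE": ["ma:genre"],
    "COPYRIGHT": ["ma:copyright"],
    "LICENSE": ["ma:license"],
}


def map_comments(comments):
    """Map 'NAME=value' vorbiscomment strings to ma: properties."""
    result = {}
    for comment in comments:
        name, _, value = comment.partition("=")
        # Vorbiscomment field names are case-insensitive.
        for prop in VORBISCOMMENT_TO_MA.get(name.upper(), []):
            result.setdefault(prop, []).append(value)
    return result


print(map_comments(["TITLE=Demo", "artist=Silvia", "ALBUM=Talks"]))
```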
<p>In the <a href="http://www.w3.org/TR/mediaont-api-1.0/">second specification</a> of the Media Annotations WG, the group offers a standard API to access (i.e. read) the defined fields. They also intend to create an API to write the fields, but I doubt that will be easy because of the vast number of file types they intend to support.</p>
<p>There is basically a single function that allows the extraction of metadata:<br />
<code><br />
MAObject[] getProperty(in DOMString propertyName, in optional DOMString sourceFormat, in optional DOMString subtype, in optional DOMString language, in optional DOMString fragment );<br />
</code></p>
<p>I proposed that it may be possible to include this in HTML5 as follows:<br />
<code><br />
interface HTMLMediaElement : HTMLElement {<br />
 ...<br />
 getter MAObject getProperty(in DOMString propertyName, in optional unsigned long trackIndex);<br />
 ...<br />
}<br />
</code></p>
<p>This would either extract the property for a particular track in a media resource or for the complete resource if no track index is given. The only problem I see is that the returned object is different depending on the requested property &#8211; the MAObject is only a parent class for the returned object types. I am not sure it is therefore possible to specify this easily in HTML5.</p>
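<p>The typing problem can be illustrated with a small Python mock. Apart from the names MAObject and getProperty, everything here is hypothetical (the subclasses, the <code>MediaResource</code> class and the sample data); the sketch only shows that the caller has to inspect the concrete type of whatever comes back.</p>

```python
from dataclasses import dataclass


class MAObject:
    """Parent class; the concrete return type depends on the property."""


@dataclass
class Title(MAObject):
    title: str


@dataclass
class FrameSize(MAObject):
    width: int
    height: int


class MediaResource:
    """Mock resource holding properties keyed by (name, track index)."""

    def __init__(self, properties):
        self._properties = properties

    def get_property(self, name, track_index=None):
        # track_index=None stands for "the complete resource".
        return self._properties.get((name, track_index))


video = MediaResource({
    ("title", None): Title("Media Fragment URI Demo"),
    ("frameSize", 0): FrameSize(640, 480),
})

for args in [("title",), ("frameSize", 0)]:
    value = video.get_property(*args)
    # The caller must branch on the concrete subclass.
    if isinstance(value, FrameSize):
        print("frame size:", value.width, "x", value.height)
    elif isinstance(value, Title):
        print("title:", value.title)
```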
<p>Overall I thought the specification was a nice piece of work. I am not sure I agree with all the chosen fields, but that is always an issue with metadata. The most important fields are there and that&#8217;s what matters.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingertech.net/2010/04/10/w3c-media-annotations-api-standard/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Manifests for exposing the structure of a Composite Media Resource</title>
		<link>http://blog.gingertech.net/2009/11/25/manifests-exposing-structure-of-a-composite-media-resource/</link>
		<comments>http://blog.gingertech.net/2009/11/25/manifests-exposing-structure-of-a-composite-media-resource/#comments</comments>
		<pubDate>Wed, 25 Nov 2009 01:21:03 +0000</pubDate>
		<dc:creator>silvia</dc:creator>
				<category><![CDATA[Digital Media]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[video accessibility]]></category>
		<category><![CDATA[accessibility]]></category>
		<category><![CDATA[composite media resource]]></category>
		<category><![CDATA[declarative syntax]]></category>
		<category><![CDATA[HTML5]]></category>
		<category><![CDATA[multitrack video]]></category>
		<category><![CDATA[source element]]></category>
		<category><![CDATA[video element]]></category>
		<category><![CDATA[W3C]]></category>

		<guid isPermaLink="false">http://blog.gingertech.net/?p=739</guid>
		<description><![CDATA[In the previous post I explained that there is a need to expose the tracks of a time-linear media resource to the user agent (UA). Here, I want to look in more detail at different possibilities of how to do so, their advantages and disadvantages. Note: A lot of this has come out of discussions [...]]]></description>
			<content:encoded><![CDATA[<p>In the <a href="http://blog.gingertech.net/2009/11/23/model-of-a-time-linear-media-resource/">previous post</a> I explained that there is a need to expose the tracks of a time-linear media resource to the user agent (UA). Here, I want to look in more detail at different possibilities of how to do so, their advantages and disadvantages.</p>
<p>Note: A lot of this has come out of discussions I had at the recent <a href="http://www.w3.org/2009/11/TPAC/">W3C TPAC</a> and is still in flux, so I am writing this to start discussions and brainstorm.</p>
<h3>Declarative Syntax vs JavaScript API</h3>
<p>We can expose a media resource&#8217;s tracks either through a JavaScript function that can loop through the tracks and provide access to the tracks and their features, or we can do this through declarative syntax.</p>
<p>Using declarative syntax has the advantage of being available even if JavaScript is disabled in a UA. The markup can be parsed easily and default displays can be prepared without having to actually decode the media file(s).</p>
<p>OTOH, it has the disadvantage that it may not necessarily represent what is actually in the binary resource, but instead what the Web developer assumed was in the resource (or what he forgot to update). This may lead to a situation where a &#8220;404&#8221; may need to be given on a media track.</p>
<p>A further disadvantage is that when somebody copies the media element onto another Web page, together with all the track descriptions, and the original media resource is then changed (e.g. a subtitle track is added), this does not have the desired effect, since the change does not propagate to the other Web page.</p>
<p>For these reasons, I thought that a JavaScript interface was preferable over declarative syntax.</p>
<p>However, recent discussions, in particular with some accessibility experts, have convinced me that declarative syntax is preferable, because it allows the creation of a menu for turning tracks on/off without even having to load the media file. Further, declarative syntax allows multiple files and &#8220;native tracks&#8221; of a <a href="http://blog.gingertech.net/2009/11/23/model-of-a-time-linear-media-resource/">virtual media resource</a> to be treated in an identical manner.</p>
<h3>Extending Existing Declarative Syntax</h3>
<p>The HTML5 media elements already have declarative syntax to specify <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#the-source-element">multiple source media files for media elements</a>. The &lt;source> element is typically used to list video in MPEG-4 and Ogg formats for support in different browsers, but has also been envisaged for different screen size and bandwidth encodings.</p>
<p>The &lt;source> elements are generally meant to list different resources that contribute towards the media element. In that respect, let&#8217;s try using it to declare a manifest of the tracks of the virtual media resource in an example:</p>
<pre>
  &lt;video>
    &lt;source id='av1' src='video.3gp' type='video/3gpp' media='mobile' lang='en'
                     role='media' >
    &lt;source id='av2' src='video.mp4' type='video/mp4' media='desktop' lang='en'
                     role='media' >
    &lt;source id='av3' src='video.ogv' type='video/ogg' media='desktop' lang='en'
                     role='media' >
    &lt;source id='dub1' src='video.ogv?track=audio[de]' type='audio/ogg' lang='de'
                     role='dub' >
    &lt;source id='dub2' src='audio_ja.oga' type='audio/ogg' lang='ja'
                     role='dub' >
    &lt;source id='ad1' src='video.ogv?track=auddesc[en]' type='audio/ogg' lang='en'
                     role='auddesc' >
    &lt;source id='ad2' src='audiodesc_de.oga' type='audio/ogg' lang='de'
                     role='auddesc' >
    &lt;source id='cc1' src='video.mp4?track=caption[en]' type='application/ttaf+xml'
                     lang='en' role='caption' >
    &lt;source id='cc2' src='video.ogv?track=caption[de]' type='text/srt; charset="ISO-8859-1"'
                     lang='de' role='caption' >
    &lt;source id='cc3' src='caption_ja.ttaf' type='application/ttaf+xml' lang='ja'
                     role='caption' >
    &lt;source id='sign1' src='signvid_ase.ogv' type='video/ogg; codecs="theora"'
                     media='desktop' lang='ase' role='sign' >
    &lt;source id='sign2' src='signvid_gsg.ogv' type='video/ogg; codecs="theora"'
                     media='desktop' lang='gsg' role='sign' >
    &lt;source id='sign3' src='signvid_sfs.ogv' type='video/ogg; codecs="theora"'
                     media='desktop' lang='sfs' role='sign' >
    &lt;source id='tad1' src='tad_en.srt' type='text/srt; charset="ISO-8859-1"'
                     lang='en' role='tad' >
    &lt;source id='tad2' src='video.ogv?track=tad[de]' type='text/srt; charset="ISO-8859-1"'
                     lang='de' role='tad' >
    &lt;source id='tad3' src='tad_ja.srt' type='text/srt; charset="EUC-JP"' lang='ja'
                     role='tad' >
  &lt;/video>
</pre>
<p>Note that this somewhat ignores my previously proposed special itext tag for handling text tracks. I am doing this here to experiment with a more integrative approach with the virtual media resource idea from the previous post. This may well be a better solution than a specific new text-related element. Most of the attributes of the itext element are, incidentally, covered.</p>
<p>You will also notice that some of the tracks are references to tracks inside binary media files using the <a href="http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/">Media Fragment URI specification</a> while others link to full files. An example is <em>video.ogv?track=auddesc[en]</em>. So, this is a uniform means of exposing all the tracks that are part of a (virtual) media resource to the UA, no matter whether in-band or in external files. It actually relies on the UA or server being able to resolve these URLs.</p>
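<p>How a UA or server might tell in-band track references apart from external files can be sketched in Python as follows. The <code>track=kind[lang]</code> form is simply the notation used in the example above, and the function name <code>resolve_source</code> is hypothetical.</p>

```python
from urllib.parse import urlsplit, parse_qs


def resolve_source(src):
    """Split a src value into file, track kind and language."""
    parts = urlsplit(src)
    track = parse_qs(parts.query).get("track")
    if not track:
        # No track query: the src points at a complete external file.
        return {"file": parts.path, "in_band": False, "kind": None, "lang": None}
    # "auddesc[en]" -> kind "auddesc", language "en"
    kind, _, rest = track[0].partition("[")
    lang = rest.rstrip("]") or None
    return {"file": parts.path, "in_band": True, "kind": kind, "lang": lang}


print(resolve_source("video.ogv?track=auddesc[en]"))
print(resolve_source("audio_ja.oga"))
```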
<h4>&#8220;type&#8221; attribute</h4>
<p>&#8220;media&#8221; and &#8220;type&#8221; are existing attributes of the &lt;source> element in HTML5 and meant to help the UA determine what to do with the referenced resource. The current spec states:</p>
<blockquote><p>The &#8220;type&#8221; attribute gives the type of the media resource, to help the user agent determine if it can play this media resource before fetching it.</p></blockquote>
<p>The word &#8220;play&#8221; might need to be replaced with &#8220;decode&#8221; to cover several different MIME types.</p>
<p>The &#8220;type&#8221; attribute was also extended with the possibility to add the &#8220;charset&#8221; MIME parameter of a linked text resource &#8211; this is particularly important for SRT files, which don&#8217;t handle charsets very well. It avoids having to add an additional attribute and is analogous to the &#8220;codecs&#8221; MIME parameter used by audio and video resources.</p>
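<p>Since the &#8220;type&#8221; value is an ordinary MIME type with parameters, it can be taken apart with a standard MIME parser. The Python sketch below uses the standard library&#8217;s <code>email.message.Message</code> for this; the function name <code>parse_type</code> is made up for the example.</p>

```python
from email.message import Message


def parse_type(value):
    """Return (mime_type, parameters) for a 'type' attribute value."""
    msg = Message()
    msg["Content-Type"] = value
    # get_params() yields the MIME type first, then the parameters.
    params = dict(msg.get_params()[1:])
    return msg.get_content_type(), params


print(parse_type('text/srt; charset="ISO-8859-1"'))
print(parse_type('video/ogg; codecs="theora"'))
print(parse_type("video/mp4"))
```

<p>With the parameters separated out, a UA can check the MIME type and the &#8220;codecs&#8221; or &#8220;charset&#8221; value independently before fetching anything.</p>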
<h4>&#8220;media&#8221; attribute</h4>
<p>Further, the spec states:</p>
<blockquote><p>The &#8220;media&#8221; attribute gives the intended media type of the media resource, to help the user agent determine if this media resource is useful to the user before fetching it. Its value must be a valid media query.</p></blockquote>
<p>The &#8220;mobile&#8221; and &#8220;desktop&#8221; values are hints that I&#8217;ve used for simplicity. They could be improved by giving appropriate bandwidth limits and width/height values, etc. Other values could be different camera angles such as topview, frontview, backview. The media query aspect has to be looked into in more depth.</p>
<h4>&#8220;lang&#8221; attribute</h4>
<p>The above example further uses &#8220;lang&#8221; and &#8220;role&#8221; attributes:</p>
<p>The &#8220;lang&#8221; attribute is an existing global attribute of HTML5, which typically indicates the language of the data inside the element. Here, it is used to indicate the language of the referenced resource. This is possibly not quite the best name choice and should maybe be called &#8220;hreflang&#8221;, which is already used in multiple other elements to signify the language of the referenced resource.</p>
<h4>&#8220;role&#8221; attribute</h4>
<p>The &#8220;role&#8221; attribute is also an existing attribute in HTML5, included from <a href="http://www.w3.org/TR/wai-aria/">ARIA</a>. It currently doesn&#8217;t cover media resources, but could be extended. The suggestion here is to specify the roles of the different media tracks &#8211; the ones I have used here are:</p>
<ul>
<li>&#8220;media&#8221;: a main media resource &#8211; typically contains audio and video and possibly more</li>
<li>&#8220;dub&#8221;: an audio track that provides an alternative dubbed language track</li>
<li>&#8220;auddesc&#8221;: an audio track that provides an additional audio description track</li>
<li>&#8220;caption&#8221;: a text track that provides captions</li>
<li>&#8220;sign&#8221;: a video-only track that provides an additional sign language video track</li>
<li>&#8220;tad&#8221;: a text track that provides textual audio descriptions to be read by a screen reader or a braille device</li>
</ul>
<p>Further roles could be &#8220;music&#8221;, &#8220;speech&#8221;, &#8220;sfx&#8221; for audio tracks, &#8220;subtitle&#8221;, &#8220;lyrics&#8221;, &#8220;annotation&#8221;, &#8220;chapters&#8221;, &#8220;overlay&#8221; for text tracks, and &#8220;alternate&#8221; for an alternate main media resource, e.g. a different camera angle.</p>
<h4>Track activation</h4>
<p>The given attributes help the UA decide what to display.</p>
<p>It will firstly find out from the &#8220;type&#8221; attribute if it is capable of decoding the track.</p>
<p>Then, the UA will find out from the &#8220;media&#8221; query, &#8220;role&#8221;, and &#8220;lang&#8221; attributes whether a track is relevant to its user. This will require checking the capabilities of the device, network, and the user preferences.</p>
<p>Further, it could be possible for Web authors to influence whether a track is displayed or not through CSS parameters on the &lt;source> element: &#8220;display: none&#8221; or &#8220;visibility: hidden/visible&#8221;.</p>
<p>Examples for track activation that a UA would undertake using the example above:</p>
<p>Given a desktop computer with Firefox, German language preferences, captions and sign language activated, the UA will fetch the original video at video.ogv (for Firefox), the German caption track at video.ogv?track=caption[de], and the German sign language track at signvid_gsg.ogv (maybe also the German dubbed audio track at video.ogv?track=audio[de], which would then replace the original one).</p>
<p>Given a desktop computer with Safari, English language preferences and audio descriptions activated, the UA will fetch the original video at video.mp4 (for Safari) and the textual audio description at tad_en.srt to be displayed through the screen reader, since it cannot decode the Ogg audio description track at video.ogv?track=auddesc[en].</p>
<p>Also, all decodable tracks could be exposed in a right-click menu and added on-demand.</p>
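<p>The activation steps described in this section can be sketched in JavaScript as follows. This is only an illustration under stated assumptions: the descriptor fields mirror the attributes of the proposal, while the function name, the shape of the environment object, and the MIME type values in the sample data are placeholders invented here. The sketch also does not decide between several decodable main resources, which is exactly the kind of decision the proposal leaves to the UA.</p>
<pre>
```javascript
// Sketch of the track activation steps. The env object and its fields
// (canDecode, device, languages, roles) are assumptions made here.
function selectTracks(sources, env) {
  return sources
    // Step 1: keep only tracks whose "type" the UA can decode.
    .filter((s) => env.canDecode(s.type))
    // Step 2: keep only tracks whose "media" hint fits the device.
    .filter((s) => !s.media || s.media === env.device)
    // Step 3: always take the main resource; take other tracks only if
    // the user enabled that role and prefers that language.
    .filter((s) => {
      if (s.role === "media") return true;
      if (!env.roles.includes(s.role)) return false;
      return env.languages.includes(s.lang);
    })
    .map((s) => s.src);
}

// Sample data loosely following the examples in the text.
// The type values are placeholders.
const sources = [
  { src: "video.mp4", type: "video/mp4", media: "desktop", role: "media", lang: "en" },
  { src: "video.ogv", type: "video/ogg", media: "desktop", role: "media", lang: "en" },
  { src: "video.ogv?track=caption[de]", type: "video/ogg", role: "caption", lang: "de" },
  { src: "video.ogv?track=auddesc[en]", type: "video/ogg", role: "auddesc", lang: "en" },
  { src: "signvid_gsg.ogv", type: "video/ogg", role: "sign", lang: "gsg" },
  { src: "tad_en.srt", type: "text/srt", role: "tad", lang: "en" },
];

const firefoxGerman = {
  canDecode: (type) => type === "video/ogg" || type === "text/srt",
  device: "desktop",
  languages: ["de", "gsg"],
  roles: ["caption", "sign"],
};

console.log(selectTracks(sources, firefoxGerman));
```
</pre>
<p>With this sample data, the Firefox profile selects video.ogv, the German caption track and the German sign language video, in line with the first example above. A profile that cannot decode Ogg but has audio descriptions enabled would fall back to video.mp4 and tad_en.srt, as in the second example.</p>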
<h4>Display styling</h4>
<p>Default styling of these tracks could be:</p>
<ul>
<li>video or alternate video in the video display area,</li>
<li>sign language probably as picture-in-picture (making it useless on a mobile and only of limited use on the desktop),</li>
<li>captions/subtitles/lyrics as overlays on the bottom of the video display area (or whatever the caption format prescribes),</li>
<li>textual audio descriptions as ARIA live regions hidden behind the video or off-screen.</li>
</ul>
<p>Multiple audio tracks can always be played at the same time.</p>
<p>The Web author could also define the display area for a track through CSS styling and the UA would then render the data into that area at the rate that is required by the track.</p>
<h4>How good is this approach?</h4>
<p>The advantage of this new proposal is that it builds basically on existing HTML5 components with minimal additions to satisfy requirements for content selection and accessibility of media elements. It is a declarative approach to the multi-track media resource challenge.</p>
<p>However, it leaves most of the decision on what tracks are alternatives of/additions to each other and which tracks should be displayed to the UA. The UA makes an informed decision because it gets a lot of information through the attributes, but it still has to make decisions that may become rather complex. Maybe there needs to be a grouping level for alternative tracks and additional tracks &#8211; similar to what I did with the <a href="https://wiki.mozilla.org/Accessibility/HTML5_captions_v2">second itext proposal</a>, or similar to the &lt;switch> and &lt;par> elements of SMIL.</p>
<p>A further issue is one that is currently being discussed within the Media Fragments WG: how can you discover the track composition and the track naming/uses of a particular media resource? How, e.g., can a Web author on another Web site know how to address the tracks inside your binary media resource? A HTML specification like the above can help. But what if that doesn&#8217;t exist? And what if the file is being used offline?</p>
<h3>Alternative Manifest descriptions</h3>
<p>The need to manifest the track composition of a media resource is not a new one. Many other formats and applications had to deal with these challenges before &#8211; some have defined and published their format.</p>
<p>I am going to list a few of these formats here with examples. They could inspire a next version of the above proposal with grouping elements.</p>
<h4>Microsoft ISM files (SMIL subpart)</h4>
<p>With the release of IIS7, Microsoft introduced &#8220;Smooth Streaming&#8221;, which uses chunking on files on the server to deliver adaptive streaming to Silverlight clients over HTTP. To inform a smooth streaming client of the tracks available for a media resource, Microsoft defined ism files: <a href="http://msdn.microsoft.com/en-us/library/ee230817.aspx">IIS Smooth Streaming Server Manifest</a> files.</p>
<p>This is a short example &#8211; a longer one can be found <a href="http://msdn.microsoft.com/en-us/library/ee230810.aspx">here</a>:</p>
<pre>
&lt;?xml version=</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingertech.net/2009/11/25/manifests-exposing-structure-of-a-composite-media-resource/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>The model of a time-linear media resource for HTML5</title>
		<link>http://blog.gingertech.net/2009/11/23/model-of-a-time-linear-media-resource/</link>
		<comments>http://blog.gingertech.net/2009/11/23/model-of-a-time-linear-media-resource/#comments</comments>
		<pubDate>Mon, 23 Nov 2009 01:15:28 +0000</pubDate>
		<dc:creator>silvia</dc:creator>
				<category><![CDATA[Digital Media]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[video accessibility]]></category>
		<category><![CDATA[audio element]]></category>
		<category><![CDATA[HTML5]]></category>
		<category><![CDATA[timing model]]></category>
		<category><![CDATA[video element]]></category>
		<category><![CDATA[W3C]]></category>

		<guid isPermaLink="false">http://blog.gingertech.net/?p=710</guid>
		<description><![CDATA[HTML5 has been criticised for not having a timing model of the media resource in its new media elements. This article spells it out and builds a framework of how we should think about HTML5 media resources. Note: these are my thoughts and nothing official from HTML5 &#8211; just conclusions I have drawn from the [...]]]></description>
			<content:encoded><![CDATA[<p>HTML5 has been criticised for not having a timing model of the media resource in its new media elements. This article spells it out and builds a framework of how we should think about HTML5 media resources. Note: these are my thoughts and nothing official from HTML5 &#8211; just conclusions I have drawn from the specs and from discussions I had.</p>
<p><strong>What is a time-linear media resource?</strong></p>
<p>In <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html">HTML5</a> and also in the <a href="http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/">Media Fragment URI specification</a> we deal only with audio and video resources that represent a single timeline exclusively. Let&#8217;s call such a Web resource a time-linear media resource.</p>
<p>The <a href="http://www.w3.org/TR/media-frags-reqs/">Media Fragment requirements</a> document actually has a <a href="http://www.w3.org/TR/media-frags-reqs/#side1">very nice picture</a> to describe such resources &#8211; replicated here for your convenience:</p>
<p><img alt="Model of a Media Resource" src="http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-reqs/800px-Model_of_a_Video_Resource.png" title="Model of a Media Resource" width="500" /></p>
<p>The resource can potentially consist of any number of audio, video, text, image or other time-aligned data tracks. All these tracks adhere to a single timeline, which tends to be defined by the main audio or video track, while other tracks have been created to synchronise with these main tracks.</p>
<p>This model matches with the world view of video on YouTube and any other video hosting service. It also matches with video used on any video streaming service.</p>
<p><strong>Background on the choice of &#8220;time-linear&#8221;</strong></p>
<p>I&#8217;ve deliberately chosen the word &#8220;time-linear&#8221; because we are talking about a single, gap-free, linear timeline here and not multiple timelines that represent the single resource.</p>
<p>The word &#8220;linear&#8221; is, however, somewhat over-used, since the introduction of digital systems into the world of analog film introduced what is now known as &#8220;non-linear video editing&#8221;. This term originates from the fact that non-linear video editing systems don&#8217;t have to linearly spool through film material to get to an edit point, but can directly access any frame in the footage as easily as any other.</p>
<p>When talking about a time-linear media resource, we are referring to a digital resource and therefore direct access to any frame in the footage is possible. So, a time-linear media resource will still be usable within a non-linear editing process.</p>
<p>As a Web resource, a time-linear media resource is not addressed as a sequence of frames or samples, since these are encoding specific. Rather, the resource is handled abstractly as an object that has track and time dimensions &#8211; and possibly spatial dimensions where image or video tracks are concerned. The framerate encoding of the resource itself does not matter and could, in fact, be changed without changing the resource&#8217;s time, track and spatial dimensions and thus without changing the resource&#8217;s address.</p>
<p><strong>Interactive Multimedia</strong></p>
<p>The term &#8220;time-linear&#8221; is used to specify the difference between a media resource that follows a single timeline, in contrast to one that deals with multiple timelines, linked together based on conditions, events, user interactions, or other disruptions to make a fully interactive multi-media experience. Thus, media resources in HTML5 and Media Fragments do not qualify as interactive multimedia themselves because they are not regarded as a graph of interlinked media resources, but simply as a single time-linear resource.</p>
<p>In this respect, time-linear media resources are also different from the kind of interactive multi-media experiences that an Adobe Shockwave Flash, Silverlight, or a SMIL file can create. These can go far beyond what current typical video publishing and communication applications on the Web require and go far beyond what the HTML5 media elements were created for. If your application has a need for multiple timelines, it may be necessary to use SMIL, Silverlight, or Adobe Flash to create it.</p>
<p>Note that the fact that the HTML5 media elements are part of the Web, and therefore expose states and integrate with JavaScript, provides Web developers with a certain control over the playback order of a time-linear media resource. The simple functions pause(), play(), and the currentTime attribute allow JavaScript developers to control the current playback offset and whether to stop or start playback. Thus, it is possible to interrupt a playback and present, e.g. an overlay text with a hyperlink, or an additional media resource, or anything else a Web developer can imagine right in the middle of playing back a media resource.</p>
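<p>As a rough sketch of such an interruption, the following JavaScript pauses playback once when a given offset is reached and resumes when an overlay is dismissed. Only pause(), play() and currentTime come from the HTML5 media element interface; the function name interruptAt and the showOverlay callback are hypothetical and stand for whatever the Web developer wants to present.</p>
<pre>
```javascript
// Sketch: pause playback once at a given offset to show an overlay, then
// resume. "media" is any object with the HTML5 media element interface
// (currentTime, pause(), play()); showOverlay is a hypothetical callback
// that receives a function to call when the overlay is dismissed.
function interruptAt(media, offsetSeconds, showOverlay) {
  let done = false;
  // Intended to be used as a "timeupdate" event handler.
  return function onTimeUpdate() {
    if (done || offsetSeconds > media.currentTime) return;
    done = true;
    media.pause();
    showOverlay(() => media.play());
  };
}

// In a page this could be wired up as follows (myOverlay is assumed):
//   const video = document.querySelector("video");
//   video.addEventListener("timeupdate", interruptAt(video, 20, myOverlay));
```
</pre>
<p>The timeline of the resource itself stays linear in this approach; the interactivity lives entirely in the page script.</p>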
<p>In this way, time-linear media resources can contribute towards an interactive multi-media experience, created by a Web developer through a combination of multiple media resources, image resources, text resources and Web pages. The limitations of this approach are not yet clear at this stage &#8211; how far will such a constructed multi-media experience be able to take us and where does it become more complicated than an Adobe Flash, Silverlight, or SMIL experience? The answer to this question will, I believe, become clearer through the next few years of HTML5 usage and further extensions to HTML5 media may well be necessary then.</p>
<p><strong>Proper handling of time-linear media resources in HTML5</strong></p>
<p>At this stage, however, we have already determined several limitations of the existing HTML5 media elements that require resolution without changing the time-linear nature of the resource.</p>
<p><em>1. Expose structure</em></p>
<p>Above all, there is a need to expose the structure of a time-linear media resource, as painted above, to the Web page. Right now, when the &lt;video> element links to a video file, it only accesses the main audio and video tracks, decodes them and displays them. The media framework that sits underneath the user agent (UA) and does the actual decoding for the UA might know about other tracks and might even decode, e.g. a caption track and display it by default, but the UA has no means of knowing that this happens or of controlling it.</p>
<p>We need a means to expose the available tracks inside a time-linear media resource and allow the UA some control over it &#8211; e.g. to choose whether to turn on/off a caption track, to choose which video track to display, or to choose which dubbed audio track to display.</p>
<p>I&#8217;ll discuss in another article different approaches on how to expose the structure. Suffice for now that we recognise the need to expose the tracks.</p>
<p><em>2. Separate the media resource concept from actual files</em></p>
<p>A HTML page is a sequence of HTML tags delivered over HTTP to a UA. A HTML page is a Web resource. It can be created dynamically and contain links to other Web resources such as images which complete its presentation.</p>
<p>We have to move to a similar &#8220;virtual&#8221; view of a media resource. Typically, a video is a single file with a video and an audio track. But also typically, caption and subtitle tracks for such a video file are stored in other files, possibly even on other servers. The caption or subtitle tracks are still in sync with the video file and therefore are actual tracks of that time-linear media resource. There is no reason to treat this differently to when the caption or subtitle track is inside the media file.</p>
<p>When we separate the media resource concept from actual files, we will find it easier to deal with time-linear media resources in HTML5.</p>
<p><em>3. Track activation and Display styling</em></p>
<p>A time-linear media resource, when regarded completely abstractly, can contain all sorts of alternative and additional tracks.</p>
<p>For example, the existing &lt;source> elements inside a video or audio element are currently mostly being used to link to alternative encodings of the main media resource &#8211; e.g. either in mpeg4 or ogg format. We can regard these as alternative tracks within the same (virtual) time-linear media resource.</p>
<p>Similarly, the &lt;source> elements have also been suggested to be used for alternate encodings, such as for mobile and Web. Again, these can be regarded as alternative tracks of the same time-linear media resource.</p>
<p>Another example are subtitle tracks for a main media resource, which are currently discussed to be referenced using the &lt;itext> element. These are in principle alternative tracks amongst themselves, but additional to the main media resource. Also, some people are actually interested in displaying two subtitle tracks at the same time to learn translations.</p>
<p>Another example are sign language tracks, which are video tracks that can be regarded as an alternative to the audio tracks for hard-of-hearing users. They are then additional video tracks to the original video track and it is not clear how to display more than one video track. Typically, sign language tracks are displayed as picture-in-picture, but on the Web, where video is usually displayed in a small area, this may not be optimal.</p>
<p>As you can see, when deciding which tracks need to be displayed one needs to analyse the relationships between the tracks. Further, user preferences need to come into play when activating tracks. Finally, the user should be able to interactively activate tracks as well.</p>
<p>Once it is clear what tracks need displaying, there is still the challenge of how to display them. It should be possible to provide default displays for typical track types, and allow Web authors to override these default display styles since they know what actual tracks their resource is dealing with.</p>
<p>While the default display seems to be typically an issue left to the UA to solve, the display overrides are typically dealt with on the Web through CSS approaches. How we solve this is for another time &#8211; right now we can just state the need for algorithms for track activation and for default and override styling.</p>
<p><strong>Hypermedia</strong></p>
<p>To make media resources prime citizens on the Web, we have to go beyond simply replicating digital media files. The Web is based on hyperlinks between Web resources, and that includes hyperlinking out of resources (e.g. from any word within a Web page) as well as hyperlinking into resources (e.g. fragment URIs into Web pages).</p>
<p>To turn video and audio into hypervideo and hyperaudio, we need to enable hyperlinking into and out of them.</p>
<p>Hyperlinking into media resources is fortunately already being addressed by the W3C Media Fragments working group, which also regards media resources in the same way as HTML5. The <a href="http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#fragment-dimensions">addressing schemes under consideration</a> are the following:</p>
<ul>
<li>temporal fragment URI addressing: address a time offset/region of a media resource</li>
<li>spatial fragment URI addressing: address a rectangular region of a media resource (where available)</li>
<li>track fragment URI addressing: address one or more tracks of a media resource</li>
<li>named fragment URI addressing: address a named region of a media resource</li>
<li>a combination of the above addressing schemes</li>
</ul>
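<p>To make these dimensions more concrete, here is a minimal JavaScript sketch of a parser for them. It assumes the name=value fragment syntax of the Media Fragments draft with the dimension names t, xywh, track and id, and it only understands plain seconds and plain pixel values. The function name and the returned object layout are choices made for this illustration, not anything defined by the working group.</p>
<pre>
```javascript
// Sketch of a parser for the fragment dimensions listed above, assuming
// the name=value syntax of the Media Fragments draft (t, xywh, track, id).
// Only plain seconds are handled for t; unit prefixes are left out.
function parseMediaFragment(uri) {
  // The base URL is a placeholder so that relative URIs can be parsed.
  const url = new URL(uri, "http://example.invalid/");
  const params = new URLSearchParams(url.hash.slice(1));
  const result = {};
  if (params.has("t")) {
    const [start, end] = params.get("t").split(",");
    result.time = { start: Number(start || 0) };
    if (end !== undefined) result.time.end = Number(end);
  }
  if (params.has("xywh")) {
    const [x, y, w, h] = params.get("xywh").split(",").map(Number);
    result.region = { x, y, w, h };
  }
  if (params.has("track")) result.tracks = params.getAll("track");
  if (params.has("id")) result.id = params.get("id");
  return result;
}

console.log(parseMediaFragment("elephants_dream/elephant.ogv#t=20"));
```
</pre>
<p>Parsing is the easy part; the sketch says nothing about how a track name or an id is mapped onto the actual resource.</p>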
<p>With such addressing schemes available, there is still a need to hook up the addressing with the resource. For the temporal and the spatial dimension, resolving the addressing into actual byte ranges is relatively obvious across any media type. However, track addressing and named addressing need to be resolved. Track addressing will become easier when we solve the above stated requirement of exposing the track structure of a media resource. The name definition requires association of an id or name with temporal offsets, spatial areas, or tracks. The addressing schemes will be available soon &#8211; whether our media resources can support them is another challenge to solve.</p>
<p>Finally, hyperlinking out of media resources is something that is not generally supported at this stage. Certainly, some types of media resources &#8211; QuickTime, Flash, MPEG4, Ogg &#8211; support the definition of tracks that can contain HTML marked-up text and thus can also contain hyperlinks. But standardisation in this space has not really happened yet. It seems to be clear that hyperlinks out of media files will come from some type of textual track. But a standard format for such time-aligned text tracks doesn&#8217;t yet exist. This is a challenge to be addressed in the near future.</p>
<p><strong>Summary</strong></p>
<p>The Web has always tried to deal with new extensions in the simplest possible manner, providing support for the majority of current use cases and allowing for the few extraordinary use cases to be satisfied by use of JavaScript or embedding of external, more complex objects.</p>
<p>With the new media elements in HTML5, this is no different. So far, the most basic need has been satisfied: that of including simple video and audio files into Web pages. However, many basic requirements are not being satisfied yet: accessibility needs, codec choice, device-independence needs are just some of the core requirements that make it important to extend our view of &lt;audio> and &lt;video> to a broader view of a Web media resource without changing the basic understanding of an audio and video resource.</p>
<p>This post has created the concept of a &#8220;media resource&#8221;, where we keep the simplicity of a single timeline. At the same time, it has tried to classify the list of shortcomings of the current media elements in a way that will help us address these shortcomings in a Web-conformant means.</p>
<p>If we accept the need to expose the structure of a media resource, the need to separate the media resource concept from actual files, the need for an approach to track activation, and the need to deal with styling of displayed tracks, we can take the next steps and propose solutions for these.</p>
<p>Further, understanding the structure of a media resource allows us to start addressing the harder questions of how to associate events with a media resource, how to associate a navigable structure with a media resource, or how to turn media resources into hypermedia.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingertech.net/2009/11/23/model-of-a-time-linear-media-resource/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Dealing with multi-track video (and audio)</title>
		<link>http://blog.gingertech.net/2009/10/17/dealing-with-multi-track-video-and-audio/</link>
		<comments>http://blog.gingertech.net/2009/10/17/dealing-with-multi-track-video-and-audio/#comments</comments>
		<pubDate>Sat, 17 Oct 2009 12:15:08 +0000</pubDate>
		<dc:creator>silvia</dc:creator>
				<category><![CDATA[Digital Media]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[video accessibility]]></category>
		<category><![CDATA[media fragment addressing]]></category>
		<category><![CDATA[multitrack audio]]></category>
		<category><![CDATA[multitrack video]]></category>
		<category><![CDATA[track names]]></category>

		<guid isPermaLink="false">http://blog.gingertech.net/?p=632</guid>
		<description><![CDATA[We are slowly approaching the stage where we want to make multi-track video of the following type available and accessible: original video track original audio track dubbed audio tracks in n different languages audio description track in n different languages sign language video tracks in n different sign languages caption tracks in n different languages [...]]]></description>
			<content:encoded><![CDATA[<p>We are slowly approaching the stage where we want to make multi-track video of the following type available and accessible:</p>
<ul>
<li>original video track</li>
<li>original audio track</li>
<li>dubbed audio tracks in n different languages</li>
<li>audio description track in n different languages</li>
<li>sign language video tracks in n different sign languages</li>
<li>caption tracks in n different languages</li>
<li>multiple other time-aligned text tracks in different languages</li>
<li>audio and video track from different camera angles</li>
<li>music and speech tracks can be separate</li>
<li>different quality tracks are available</li>
<li>accompanying images, e.g. slides for a presentation</li>
</ul>
<p>One of the issues with such a sizeable number of tracks is how to display them. Some of them are alternatives, some of them additions. Sign language is typically presented in a PiP (picture-in-picture) approach. If we have a music and a speech (or singing) track, we may want to have control over removing certain tracks &#8211; e.g. to be able to do karaoke. Caption and subtitle tracks in the same language are probably alternatives, while in different languages they could be additions. It is not a trivial challenge to handle such complex files in an application.</p>
<p>At this point, I am only trying to solve a sub-challenge. As we talk about a particular track in a multi-track media file, we will want to identify it by name. Should there be a standard for naming the track, so that we can e.g. address them by a URL, e.g. with the intention of only delivering a subset of tracks from the larger file? We could introduce that for Ogg &#8211; but maybe there is an opportunity to do this across file formats?</p>
<p>To find some answers to these and related questions, I want to discuss two approaches.</p>
<p>The first approach is a simple numbering approach. In it, the audio, video, and annotation tracks are all ordered and then numbered through. This will result in the following sets of track names: video[0] &#8230; [n], audio[0] &#8230; [n], timed text[0] &#8230; [n], and possibly even timed images[0] &#8230; [n]. This approach is simple, easy to understand, and only requires ordering the tracks within their types. It allows addressing of a particular track &#8211; e.g. as required by the media fragment URI scheme for track addressing. However, it does not allow identification of alternatives, additions, or presentation styles.</p>
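<p>A small JavaScript sketch of this numbering approach, assuming the tracks arrive as an ordered list of objects with a type field (that input format, like the function name, is made up for the illustration):</p>
<pre>
```javascript
// Sketch of the numbering approach: tracks are grouped by type and
// numbered in order of appearance within their type.
function numberTracks(tracks) {
  const counters = {};
  return tracks.map((track) => {
    const index = counters[track.type] || 0;
    counters[track.type] = index + 1;
    return { ...track, name: track.type + "[" + index + "]" };
  });
}

const named = numberTracks([
  { type: "video" },
  { type: "audio" },
  { type: "audio" },
  { type: "timed text" },
]);
console.log(named.map((track) => track.name));
```
</pre>
<p>The resulting names say nothing about what a track contains, which is the limitation noted above: audio[1] could be a dub, an audio description, or a music track.</p>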
<p>Should alternatives, additions, and presentation styles be encoded in the name of the track? Or should this information go into a meta description area of the multi-track video? Something like skeleton in Ogg? Or should it go a step further and be buried in an external information file such as an m3u file (or <a href="http://wiki.xiph.org/ROE">ROE</a> for Ogg)?</p>
<p>I want to experiment here with the naming scheme and what we would need to specify to be able to decide which tracks to ignore and which to combine for a presentation. And I want to ask for your comments and advice.</p>
<p>This requires listing exactly what types of content tracks we may have to deal with.</p>
<p>In the video space, we have at minimum the following track types:</p>
<ul>
<li>main video content &#8211; with alternative camera angles</li>
<li>subsidiary video content &#8211; with alternative camera angles</li>
<li>sign language videos &#8211; in alternative languages</li>
</ul>
<p>Alternatives are defined by camera angle and language. Also, each track can be made available in a different quality. I&#8217;d also classify additional image content, such as slides in a presentation, as subsidiary video content. So, here we could use a scheme such as <em>video_[main,side,sign]_language_angle</em>.</p>
<p>In the audio space, we have at minimum the following track types:</p>
<ul>
<li>main audio content &#8211; in alternative languages</li>
<li>background audio content &#8211; e.g. music, SFX, noise</li>
<li>foreground speech or singing content &#8211; in alternative languages</li>
<li>audio descriptions &#8211; in alternative languages</li>
</ul>
<p>Alternatives are defined by language and content type. Again, each track can be made available in a different quality. Here we could use a scheme such as <em>audio_type_language</em>.</p>
<p>In the text space, we have at minimum the following track types:</p>
<ul>
<li>subtitles &#8211; in different languages</li>
<li>captions &#8211; in different languages</li>
<li>textual audio descriptions &#8211; in different languages</li>
<li>other time-aligned text &#8211; in different languages</li>
</ul>
<p>Alternatives are defined by language and content type &#8211; e.g. lyrics, captions and subtitles really compete for the same screen space. Here we could use a scheme such as <em>text_type_language</em>.</p>
<p><b>A generic track naming scheme</b><br />
It seems the generic naming scheme of</p>
<blockquote><p>
<em>&lt;content_type>_&lt;track_type>_&lt;language> [_&lt;angle>]</em>
</p></blockquote>
<p>can cover all cases.</p>
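<p>As a quick experiment with this scheme, the following JavaScript sketch builds and parses such names. The function names and field names are assumptions made for the illustration, and the parser assumes that none of the parts contains an underscore itself, which a real scheme would have to guarantee or escape.</p>
<pre>
```javascript
// Sketch: build and parse names of the form
// content_type_track_type_language[_angle].
// Assumes that no part contains an underscore itself.
function buildTrackName({ content, type, language, angle }) {
  const parts = [content, type, language];
  if (angle) parts.push(angle); // the angle is optional
  return parts.join("_");
}

function parseTrackName(name) {
  const [content, type, language, angle] = name.split("_");
  return { content, type, language, angle };
}

console.log(buildTrackName({ content: "video", type: "sign", language: "gsg" }));
console.log(parseTrackName("video_main_en_topview"));
```
</pre>
<p>For names without an angle, the parsed angle field is simply undefined.</p>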
<p>Are there further track types, further alternatives I have missed? What do you think?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingertech.net/2009/10/17/dealing-with-multi-track-video-and-audio/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>URI fragments vs URI queries for media fragment addressing</title>
		<link>http://blog.gingertech.net/2009/09/08/uri-fragments-vs-uri-queries-for-media-fragment-addressing/</link>
		<comments>http://blog.gingertech.net/2009/09/08/uri-fragments-vs-uri-queries-for-media-fragment-addressing/#comments</comments>
		<pubDate>Mon, 07 Sep 2009 14:34:45 +0000</pubDate>
		<dc:creator>silvia</dc:creator>
				<category><![CDATA[Digital Media]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[open codecs]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[fragment]]></category>
		<category><![CDATA[media frag]]></category>
		<category><![CDATA[media fragments URI]]></category>
		<category><![CDATA[MFWG]]></category>
		<category><![CDATA[query]]></category>
		<category><![CDATA[W3C]]></category>

		<guid isPermaLink="false">http://blog.gingertech.net/?p=544</guid>
		<description><![CDATA[In the W3C Media Fragment Working Group (MFWG) we have had long discussions about the use of the URI query (&#8220;?&#8221;) or the URI fragment (&#8220;#&#8221;) addressing approach for addressing directly into media fragments, and the diverse new HTTP headers required to serve such URI requests, considering such side conditions as the stripping-off of fragment [...]]]></description>
			<content:encoded><![CDATA[<p>In the W3C Media Fragment Working Group (MFWG) we have had long discussions about the use of the URI query (&#8220;?&#8221;) or the URI fragment (&#8220;#&#8221;) addressing approach for addressing directly into media fragments, and the diverse new HTTP headers required to serve such URI requests, considering such side conditions as the stripping-off of fragment parameters from a URI by Web browsers, or the existence of caching Web proxies.</p>
<p>As <a href="http://blog.gingertech.net/2008/11/10/media-fragment-uri-addressing/">explained earlier</a>, URI queries request (primary) resources, while URI fragments address secondary resources, which have a relationship to their primary resource. So, in the strictest sense of their specifications, to address segments in media resources without losing the context of the primary resource, we can only use URI fragments.</p>
<p><strong>Browser-supported Media Fragment URIs</strong></p>
<p>For this reason, URI fragments are also the way in which my <a href="http://blog.gingertech.net/2009/09/02/demo-of-deep-hyperlinking-into-html5-video/">last media fragment addressing demo</a> has been implemented. For example, I would address </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingertech.net/2009/09/08/uri-fragments-vs-uri-queries-for-media-fragment-addressing/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Demo of deep hyperlinking into HTML5 video</title>
		<link>http://blog.gingertech.net/2009/09/02/demo-of-deep-hyperlinking-into-html5-video/</link>
		<comments>http://blog.gingertech.net/2009/09/02/demo-of-deep-hyperlinking-into-html5-video/#comments</comments>
		<pubDate>Wed, 02 Sep 2009 00:27:09 +0000</pubDate>
		<dc:creator>silvia</dc:creator>
				<category><![CDATA[Digital Media]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[open codecs]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[video accessibility]]></category>
		<category><![CDATA[deep hyperlinking]]></category>
		<category><![CDATA[HTML5 video]]></category>
		<category><![CDATA[media fragment]]></category>
		<category><![CDATA[media fragment URI]]></category>
		<category><![CDATA[reference implementation]]></category>

		<guid isPermaLink="false">http://blog.gingertech.net/?p=526</guid>
		<description><![CDATA[In an effort to give a demo of some of the W3C Media Fragment WG specification capabilities, I implemented an HTML5 page with a video element that reacts to fragment offset changes to the URL bar and the &#60;video> element. Demo Features The demo can be found on the Annodex Web server. It has the [...]]]></description>
			<content:encoded><![CDATA[<p>In an effort to give a demo of some of the W3C Media Fragment WG specification capabilities, I implemented an HTML5 page with a video element that reacts to changes of the fragment offset in the URL bar and on the &lt;video> element.</p>
<p><strong>Demo Features</strong></p>
<p>The demo can be found <a href="http://www.annodex.net/~silvia/itext/mediafrag.html">on the Annodex Web server</a>. It has the following features:</p>
<p>If you simply load that Web page, you will <strong>see the video jump to an offset</strong> because it is referred to as &#8220;elephants_dream/elephant.ogv#t=20&#8221;.</p>
<p>If you <strong>change or add a temporal fragment in the URL bar</strong>, the video jumps to this time offset and overrides the video&#8217;s fragment addressing. (This only works in Firefox 3.6, see below &#8211; in older Firefox versions you actually have to reload the page for this to happen.) This functionality is <a href="http://ccnmtl.columbia.edu/enhanced/noted/youtube_hd.html">similar to a time linking functionality that YouTube also provides</a>.</p>
<p>When you hit the &#8220;play&#8221; button on the video, let it play for a bit, and then hit &#8220;pause&#8221;, the <strong>second at which you hit &#8220;pause&#8221; is displayed in the page&#8217;s URL bar</strong>. In Firefox, this even leads to an addition to the browser&#8217;s history, so you can jump back to the previous pause position.</p>
<p>Three input boxes allow for experimentation with different functionality.</p>
<ul>
<li>The first one contains a <strong>link to the current Web page with the media fragment</strong> for the current video playback position. This text is displayed for cut-and-paste purposes, e.g. to send it in an email to friends.</li>
<li>The second one is an entry box which accepts float values as time offsets. Once entered, the <strong>video will jump to the given time offset</strong>. The URL of the video and the page URL will be updated.</li>
<li>The third one is an entry box which <strong>accepts a video URL that replaces the &lt;video> element&#8217;s @src attribute value</strong>. It is meant for experimentation with different temporal media fragment URLs as they get loaded into the &lt;video> element.</li>
</ul>
<p><strong>JavaScript Hacks</strong></p>
<p>You can look at the source code of the page &#8211; all the JavaScript in use is actually at the bottom of the page. Here are some of the juicy bits of what I&#8217;ve done:</p>
<p>Since Web browsers do not yet parse and react to media fragment URIs themselves, I implemented this in JavaScript. Once the video is loaded, i.e. the <strong>&#8220;loadedmetadata&#8221; event</strong> has fired on the video, I parse the video&#8217;s @currentSrc attribute and jump to a time offset if one is given. I use @currentSrc because it holds the URL that the video element is actually using after having parsed the @src attribute and all the contained &lt;source> elements (if they exist). This function is also called when the video&#8217;s @src attribute is changed through JavaScript.</p>
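<p>A minimal sketch of this parsing step could look as follows. It is not the demo&#8217;s actual code: the function name parseTimeOffset is made up for illustration, and it assumes a page with a single &lt;video> element and a fragment of the plain form &#8220;#t=&#8221; followed by seconds.</p>

```javascript
// Hypothetical helper: extract the offset in seconds from a URL ending in "#t=SECONDS".
// Returns null when no simple temporal fragment is present.
function parseTimeOffset(url) {
  var hashIndex = url.indexOf("#");
  if (hashIndex === -1) {
    return null;
  }
  var match = /^t=(\d+(?:\.\d+)?)$/.exec(url.slice(hashIndex + 1));
  return match ? parseFloat(match[1]) : null;
}

// Browser wiring (skipped outside a browser): seek once the metadata is known.
if (typeof document !== "undefined") {
  var video = document.querySelector("video");
  if (video) {
    video.addEventListener("loadedmetadata", function () {
      // currentSrc is the URL the element actually selected.
      var offset = parseTimeOffset(video.currentSrc);
      if (offset !== null) {
        video.currentTime = offset;
      }
    }, false);
  }
}
```

<p>Keeping the parsing in a pure function makes it easy to try out on strings such as &#8220;elephants_dream/elephant.ogv#t=20&#8221; without loading any video.</p>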
<p>This is the only bit from the demo that the browsers should do natively. The remaining functionality hooks up the temporal addressing for the video with the browser&#8217;s URL bar.</p>
<p>To display a URL in the URL bar that people can cut and paste to send to their friends, I hooked up the video&#8217;s <strong>&#8220;pause&#8221; event</strong> with an update to the URL bar. If you seek by setting video.currentTime from JavaScript, you will also have to update the URL bar yourself.</p>
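<p>As a rough illustration of the &#8220;pause&#8221; hook, and not the demo&#8217;s actual code, something like the following would do. The helper name timeFragment and the rounding to milliseconds are my own assumptions.</p>

```javascript
// Hypothetical helper: build the fragment for a playback position,
// rounded to milliseconds to keep the URL readable.
function timeFragment(seconds) {
  return "#t=" + (Math.round(seconds * 1000) / 1000);
}

// Browser wiring (skipped outside a browser).
if (typeof document !== "undefined") {
  var pausedVideo = document.querySelector("video");
  if (pausedVideo) {
    pausedVideo.addEventListener("pause", function () {
      // Assigning location.hash updates the URL bar without reloading the page.
      window.location.hash = timeFragment(pausedVideo.currentTime);
    }, false);
  }
}
```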
<p>Finally, I am capturing the window&#8217;s <strong>&#8220;hashchange&#8221; event</strong>, which is new in HTML5 and only implemented in <a href="https://developer.mozilla.org/en/DOM/window.onhashchange">Firefox 3.6</a>. This means that if you change the temporal offset on the page&#8217;s URL, the page gets notified, parses the new hash and jumps the video to the offset time.</p>
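<p>A sketch of such a &#8220;hashchange&#8221; handler, under the same assumptions as before (one &lt;video> element on the page, a plain &#8220;#t=&#8221; fragment in seconds), might be written like this. The name offsetFromHash is hypothetical.</p>

```javascript
// Hypothetical helper: read the offset from a location.hash value such as "#t=20".
function offsetFromHash(hash) {
  var match = /^#t=(\d+(?:\.\d+)?)$/.exec(hash);
  return match ? parseFloat(match[1]) : null;
}

// Browser wiring (skipped outside a browser): react to edits of the URL bar.
if (typeof window !== "undefined") {
  window.addEventListener("hashchange", function () {
    var target = document.querySelector("video");
    var offset = offsetFromHash(window.location.hash);
    if (target) {
      if (offset !== null) {
        target.currentTime = offset;
      }
    }
  }, false);
}
```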
<p><strong>Optimisation</strong></p>
<p>Jumping around in a video like this can be very slow when the seeking happens over the network. Firefox actually implements seeking over the network, which in the case of Ogg can require multiple jumps back and forth on the remote video file with byte range requests to locate the correct offset location.</p>
<p>To minimise the effort that Firefox has to make when seeking, I referred to <a href="https://developer.mozilla.org/en/Configuring_servers_for_Ogg_media">Mozilla&#8217;s very useful help page to speed up video</a>. It recommends delivering the X-Content-Duration HTTP header from your Web server. For Ogg media, this can be provided through the oggz-chop CGI. Since I didn&#8217;t want to install it on my Apache server, I hard-coded X-Content-Duration in a .htaccess file in the directory that serves the media file. The .htaccess file looks as follows:</p>
<p><code>&lt;Files "elephant.ogv"><br />
Header set X-Content-Duration "653.791"<br />
&lt;/Files></code></p>
<p>This should now help Firefox to avoid the extra seek necessary to determine the video&#8217;s duration and display the transport bar faster.</p>
<p>I also added the @autobuffer attribute to the &lt;video> element, which should make the complete video file available to the browser and thus speed up seeking enormously since it will not need to do any network requests and can just do it on the local file.</p>
<p><strong>ToDos</strong></p>
<p>This is only a first and very simple demo of media fragments and video. I have not made an effort to capture any errors or to parse a URL that is more complicated than simply containing &#8220;#t=&#8221;. Feel free to report any bugs to me in the comments or send me patches.</p>
<p>Also, I have not made an effort to use time ranges, which are part of the <a href="http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#fragment-dimensions">W3C Media Fragment spec</a>. This should be simple to add, since it just requires stopping the video playback at the given end time.</p>
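<p>To show how little is needed, here is a hedged sketch of range support. It assumes fragments of the form &#8220;#t=START,END&#8221;, &#8220;#t=START&#8221; or &#8220;#t=,END&#8221; in plain seconds. The names parseTimeRange and reachedEnd are invented for this example and are not part of the demo.</p>

```javascript
// Hypothetical helper: parse "#t=START,END", "#t=START" or "#t=,END" (seconds only).
// A missing start defaults to 0, a missing end is null.
// Returns null for anything else, including empty or reversed ranges.
function parseTimeRange(hash) {
  var match = /^#t=(\d+(?:\.\d+)?)?(?:,(\d+(?:\.\d+)?))?$/.exec(hash);
  if (!match) {
    return null;
  }
  var hasStart = match[1] !== undefined;
  var hasEnd = match[2] !== undefined;
  if (!hasStart) {
    if (!hasEnd) {
      return null; // a bare "#t=" carries no information
    }
  }
  var start = hasStart ? parseFloat(match[1]) : 0;
  var end = hasEnd ? parseFloat(match[2]) : null;
  if (end !== null) {
    if (start >= end) {
      return null; // empty or reversed range
    }
  }
  return { start: start, end: end };
}

// True once playback has reached the end of a range that has an end time.
function reachedEnd(currentTime, range) {
  return range.end !== null ? currentTime >= range.end : false;
}

// Browser wiring (skipped outside a browser): pause at the end of the range.
if (typeof document !== "undefined") {
  var rangedVideo = document.querySelector("video");
  var range = parseTimeRange(window.location.hash);
  if (rangedVideo) {
    if (range) {
      rangedVideo.addEventListener("timeupdate", function () {
        if (reachedEnd(rangedVideo.currentTime, range)) {
          rangedVideo.pause();
        }
      }, false);
    }
  }
}
```

<p>Note that &#8220;timeupdate&#8221; fires only a few times per second, so playback will stop slightly after the requested end time rather than exactly on it.</p>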
<p>Also, I have only implemented parsing of the simplest default time spec, in seconds and fractions of seconds. None of the more complicated npt, smpte, or clock specifications have been implemented yet.</p>
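<p>As an example of what one of these would involve, the following sketch converts an npt value to seconds. It assumes the three npt shapes &#8220;ss&#8221;, &#8220;mm:ss&#8221; and &#8220;hh:mm:ss&#8221;, each with optional fractional seconds and an optional &#8220;npt:&#8221; prefix. The draft grammar should be checked before relying on this, and the function name parseNpt is hypothetical.</p>

```javascript
// Hypothetical helper: convert an npt time such as "20", "npt:12.5",
// "01:30" or "1:02:03.5" to seconds. Returns null for anything it cannot read.
function parseNpt(value) {
  var parts = value.replace(/^npt:/, "").split(":");
  if (parts.length > 3) {
    return null;
  }
  var secondsText = parts.pop();
  if (!/^\d+(?:\.\d+)?$/.test(secondsText)) {
    return null;
  }
  var seconds = parseFloat(secondsText);
  var total = seconds;
  if (parts.length > 0) {
    // In the clock-style forms, seconds and minutes must stay below 60.
    if (seconds >= 60) {
      return null;
    }
    var minutesText = parts.pop();
    if (!/^\d+$/.test(minutesText)) {
      return null;
    }
    var minutes = parseInt(minutesText, 10);
    if (minutes >= 60) {
      return null;
    }
    total += minutes * 60;
  }
  if (parts.length > 0) {
    var hoursText = parts.pop();
    if (!/^\d+$/.test(hoursText)) {
      return null;
    }
    total += parseInt(hoursText, 10) * 3600;
  }
  return total;
}
```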
<p>The possibilities for deeper access to video and for improved video accessibility with these URLs are vast. Just imagine hooking up the caption elements of e.g. an srt file with temporal hyperlinks and you can provide deep interaction between the video content and the captions. You could even drive this to the extreme and jump between single words if you mark up each with its time relationship. Happy experimenting!</p>
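<p>To make the caption idea a bit more concrete, here is a small sketch that turns the cues of an srt file into temporal fragment links. It assumes well-formed srt input (a cue number, a timing line with &#8220;--&gt;&#8221;, then the text). The names srtToSeconds and cueLinks are made up for this illustration.</p>

```javascript
// Hypothetical helper: convert an srt timestamp "hh:mm:ss,mmm" to seconds.
function srtToSeconds(stamp) {
  var m = /^(\d{2}):(\d{2}):(\d{2}),(\d{3})$/.exec(stamp);
  if (!m) {
    return null;
  }
  return parseInt(m[1], 10) * 3600 + parseInt(m[2], 10) * 60 +
         parseInt(m[3], 10) + parseInt(m[4], 10) / 1000;
}

// Hypothetical helper: map each cue to a link target and its caption text.
// Cues whose timing line cannot be read are skipped.
function cueLinks(srtText) {
  var links = [];
  srtText.trim().split(/\r?\n\r?\n/).forEach(function (block) {
    var lines = block.split(/\r?\n/);
    if (lines.length >= 3) {
      var start = srtToSeconds(lines[1].split(" --> ")[0]);
      if (start !== null) {
        links.push({ href: "#t=" + start, text: lines.slice(2).join(" ") });
      }
    }
  });
  return links;
}
```

<p>Each returned href can then be attached to the caption text as a hyperlink, so that clicking a caption changes the page&#8217;s hash and seeks the video.</p>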
<p>UPDATE: I forgot to mention that it is really annoying that the video has to be re-loaded when the @src attribute is changed, even if only the hash changes. As support for media fragments is implemented in &lt;video> and &lt;audio> elements, it would be advantageous if the &#8220;load()&#8221; function checked whether only the hash changed and did not re-load the full resource in these cases.</p>
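<p>Until browsers do this natively, a script can at least detect the situation itself. The check below is only a sketch with hypothetical function names. It compares the two URLs as plain strings, so it assumes both are written in the same form (e.g. both absolute).</p>

```javascript
// Hypothetical helper: drop everything from the first "#" onwards.
function stripFragment(url) {
  var hashIndex = url.indexOf("#");
  return hashIndex === -1 ? url : url.slice(0, hashIndex);
}

// True when the two URLs differ, but only in their fragment part.
function onlyHashChanged(oldUrl, newUrl) {
  if (oldUrl === newUrl) {
    return false;
  }
  return stripFragment(oldUrl) === stripFragment(newUrl);
}
```

<p>When onlyHashChanged returns true, the script can seek by setting video.currentTime instead of assigning a new @src and triggering a re-load.</p>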
<p>Thanks go to Chris Double and Chris Pearce from Mozilla for their feedback and suggestions for improvement on an early version of this.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gingertech.net/2009/09/02/demo-of-deep-hyperlinking-into-html5-video/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>
