<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Dealing with multi-track video (and audio)</title>
	<atom:link href="http://blog.gingertech.net/2009/10/17/dealing-with-multi-track-video-and-audio/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.gingertech.net/2009/10/17/dealing-with-multi-track-video-and-audio/</link>
	<description>Silvia&#039;s blog</description>
	<lastBuildDate>Sat, 13 Mar 2010 01:13:36 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: silvia</title>
		<link>http://blog.gingertech.net/2009/10/17/dealing-with-multi-track-video-and-audio/comment-page-1/#comment-2497</link>
		<dc:creator>silvia</dc:creator>
		<pubDate>Sun, 22 Nov 2009 02:28:08 +0000</pubDate>
		<guid isPermaLink="false">http://blog.gingertech.net/?p=632#comment-2497</guid>
		<description>@Gabriel I wasn&#039;t referring to technical restrictions on the Web, just to what people are ready for - and by &quot;people&quot; I mean users, publishers, and tool providers. If there were good products out there that could produce your background/foreground separation automatically, and they were important players in the marketplace, then there would be a need to look at it on the Web, too. Right now we are mostly trying to solve issues that YouTube and others have had to deal with and have partially solved, but not in a manner that works across the Web. Web standards generally don&#039;t invent new technology, but rather standardise good existing practice.</description>
		<content:encoded><![CDATA[<p>@Gabriel I wasn&#8217;t referring to technical restrictions on the Web, just to what people are ready for &#8211; and by &#8220;people&#8221; I mean users, publishers, and tool providers. If there were good products out there that could produce your background/foreground separation automatically, and they were important players in the marketplace, then there would be a need to look at it on the Web, too. Right now we are mostly trying to solve issues that YouTube and others have had to deal with and have partially solved, but not in a manner that works across the Web. Web standards generally don&#8217;t invent new technology, but rather standardise good existing practice.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Gabriel Shalom</title>
		<link>http://blog.gingertech.net/2009/10/17/dealing-with-multi-track-video-and-audio/comment-page-1/#comment-1848</link>
		<dc:creator>Gabriel Shalom</dc:creator>
		<pubDate>Wed, 28 Oct 2009 12:08:59 +0000</pubDate>
		<guid isPermaLink="false">http://blog.gingertech.net/?p=632#comment-1848</guid>
		<description>@silvia

Which constraints of the web do you see as being prohibitive specifically? I&#039;ve been reading a lot lately on how the costs of storage, bandwidth and processing are predicted to decline exponentially in the near future. Do you think these factors affect the evolution of such standards? Also, what about a cross-media format which might have a rich life of use &quot;offline&quot; that could eventually be ported to the web in the future...</description>
		<content:encoded><![CDATA[<p>@silvia</p>
<p>Which constraints of the web do you see as being prohibitive specifically? I&#8217;ve been reading a lot lately on how the costs of storage, bandwidth and processing are predicted to decline exponentially in the near future. Do you think these factors affect the evolution of such standards? Also, what about a cross-media format which might have a rich life of use &#8220;offline&#8221; that could eventually be ported to the web in the future&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: silvia</title>
		<link>http://blog.gingertech.net/2009/10/17/dealing-with-multi-track-video-and-audio/comment-page-1/#comment-1831</link>
		<dc:creator>silvia</dc:creator>
		<pubDate>Wed, 28 Oct 2009 03:04:42 +0000</pubDate>
		<guid isPermaLink="false">http://blog.gingertech.net/?p=632#comment-1831</guid>
		<description>@Gabriel,

I think this is a bit too ambitious for right now. Describing such multi-layers in a simple manner has been tried in SMIL and MPEG-7 and I truly think we are not yet ready for it on the Web. We are moving only slightly in that direction with media fragments and the idea of addressing spatial objects.</description>
		<content:encoded><![CDATA[<p>@Gabriel,</p>
<p>I think this is a bit too ambitious for right now. Describing such multi-layers in a simple manner has been tried in SMIL and MPEG-7 and I truly think we are not yet ready for it on the Web. We are moving only slightly in that direction with media fragments and the idea of addressing spatial objects.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Gabriel Shalom</title>
		<link>http://blog.gingertech.net/2009/10/17/dealing-with-multi-track-video-and-audio/comment-page-1/#comment-1801</link>
		<dc:creator>Gabriel Shalom</dc:creator>
		<pubDate>Tue, 27 Oct 2009 03:13:12 +0000</pubDate>
		<guid isPermaLink="false">http://blog.gingertech.net/?p=632#comment-1801</guid>
		<description>How ambitious a request would it be to have rudimentary depth-based multi-layering per video track, for example a separate layer for the background and foreground, or for moving vs. static objects? Couldn&#039;t seam carving and motion tracking make such layers of metadata more accessible? If this type of innovation is too far ahead for this project, when do you foresee it coming into being?

I raise this question because I truly believe there are creative, knowledge and financial motivations to see this sort of &quot;object-oriented&quot; approach to the elements within the frame be included in a video&#039;s metadata. Video is needlessly and artificially flattened by modeling it on celluloid.</description>
		<content:encoded><![CDATA[<p>How ambitious a request would it be to have rudimentary depth-based multi-layering per video track, for example a separate layer for the background and foreground, or for moving vs. static objects? Couldn&#8217;t seam carving and motion tracking make such layers of metadata more accessible? If this type of innovation is too far ahead for this project, when do you foresee it coming into being?</p>
<p>I raise this question because I truly believe there are creative, knowledge and financial motivations to see this sort of &#8220;object-oriented&#8221; approach to the elements within the frame be included in a video&#8217;s metadata. Video is needlessly and artificially flattened by modeling it on celluloid.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: silvia</title>
		<link>http://blog.gingertech.net/2009/10/17/dealing-with-multi-track-video-and-audio/comment-page-1/#comment-1661</link>
		<dc:creator>silvia</dc:creator>
		<pubDate>Wed, 21 Oct 2009 22:12:15 +0000</pubDate>
		<guid isPermaLink="false">http://blog.gingertech.net/?p=632#comment-1661</guid>
		<description>I was pointed to QuickTime and how QuickTime does grouping of alternative tracks and associates presentation information.

Here is the link to the QuickTime &quot;Track Header&quot; description:
http://developer.apple.com/mac/library/documentation/QuickTime/QTFF/QTFFChap2/qtff2.html#//apple_ref/doc/uid/TP40000939-CH204-BBCEIDFA

It contains the following fields (amongst others):

Alternate group - specifies a collection of movie tracks that contain alternate data for one another; the choice between them may be based on, e.g., playback quality, language, or the capabilities of the computer.

Layer - describes the priority in which tracks overlay each other

Track width/height - pixel size of this track

Volume - loudness of this track

Further, here is the QuickTime &quot;Media Header&quot; description - &quot;Media Atoms&quot; define a track’s movie data:
http://developer.apple.com/mac/library/documentation/QuickTime/QTFF/QTFFChap2/qtff2.html#//apple_ref/doc/uid/TP40000939-CH204-BBCHCFID

It contains the following fields (amongst others), which I understand are used to choose tracks:

Language - the language code for this media

Quality - the media’s playback quality

I am not sure how this information is actually used in practice, but there is quite a bit of information that could be exposed through a JavaScript API if hooked up with the browser. Something similar is necessary for Ogg (though we should make it simpler).</description>
		<content:encoded><![CDATA[<p>I was pointed to QuickTime and how QuickTime does grouping of alternative tracks and associates presentation information.</p>
<p>Here is the link to the QuickTime &#8220;Track Header&#8221; description:<br />
<a href="http://developer.apple.com/mac/library/documentation/QuickTime/QTFF/QTFFChap2/qtff2.html#//apple_ref/doc/uid/TP40000939-CH204-BBCEIDFA" rel="nofollow">http://developer.apple.com/mac/library/documentation/QuickTime/QTFF/QTFFChap2/qtff2.html#//apple_ref/doc/uid/TP40000939-CH204-BBCEIDFA</a></p>
<p>It contains the following fields (amongst others):</p>
<p>Alternate group &#8211; specifies a collection of movie tracks that contain alternate data for one another; the choice between them may be based on, e.g., playback quality, language, or the capabilities of the computer.</p>
<p>Layer &#8211; describes the priority in which tracks overlay each other</p>
<p>Track width/height &#8211; pixel size of this track</p>
<p>Volume &#8211; loudness of this track</p>
<p>Further, here is the QuickTime &#8220;Media Header&#8221; description &#8211; &#8220;Media Atoms&#8221; define a track’s movie data:<br />
<a href="http://developer.apple.com/mac/library/documentation/QuickTime/QTFF/QTFFChap2/qtff2.html#//apple_ref/doc/uid/TP40000939-CH204-BBCHCFID" rel="nofollow">http://developer.apple.com/mac/library/documentation/QuickTime/QTFF/QTFFChap2/qtff2.html#//apple_ref/doc/uid/TP40000939-CH204-BBCHCFID</a></p>
<p>It contains the following fields (amongst others), which I understand are used to choose tracks:</p>
<p>Language &#8211; the language code for this media</p>
<p>Quality &#8211; the media’s playback quality</p>
<p>I am not sure how this information is actually used in practice, but there is quite a bit of information that could be exposed through a JavaScript API if hooked up with the browser. Something similar is necessary for Ogg (though we should make it simpler).</p>
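<p>As a rough illustration of how the alternate group, language and quality fields could drive track choice, here is a minimal Python sketch. It is not how QuickTime itself picks tracks: the Track class, the choose_tracks function and the tie-breaking rule (language match first, then higher quality) are assumptions made up for this example, as is treating alternate group 0 as "not in any group".</p>

```python
from dataclasses import dataclass


@dataclass
class Track:
    # Hypothetical, simplified view of the header fields listed above
    track_id: int
    alternate_group: int  # assumption: 0 means "not part of any group"
    language: str
    quality: int


def choose_tracks(tracks, preferred_language):
    """Pick one track per alternate group; keep ungrouped tracks as they are."""
    chosen = []
    groups = {}
    for t in tracks:
        if t.alternate_group == 0:
            chosen.append(t)
        else:
            groups.setdefault(t.alternate_group, []).append(t)
    for members in groups.values():
        # Assumed rule: prefer a language match, then the higher quality value
        best = max(
            members,
            key=lambda t: (t.language == preferred_language, t.quality),
        )
        chosen.append(best)
    return sorted(chosen, key=lambda t: t.track_id)
```

<p>A JavaScript API in the browser could expose the same fields and leave the choice to the page; the sketch only shows that the grouping information is enough to make a sensible default choice.</p>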
]]></content:encoded>
	</item>
	<item>
		<title>By: silvia</title>
		<link>http://blog.gingertech.net/2009/10/17/dealing-with-multi-track-video-and-audio/comment-page-1/#comment-1660</link>
		<dc:creator>silvia</dc:creator>
		<pubDate>Wed, 21 Oct 2009 21:36:53 +0000</pubDate>
		<guid isPermaLink="false">http://blog.gingertech.net/?p=632#comment-1660</guid>
		<description>I have had many questions related to the challenges of multi-track video, so thought I should extend this a bit.

So, let me explain the idea for how this information could be used with the HTML5 video element:

Assuming we have a multi-track video linked in the src attribute of a video element, the browser would need to identify what is in the tracks (along the lines described in the blog entry) and then do something useful with it.

By default, it would use the main audio and video track of the default language that the browser is set to and play them back.

For blind people (which would need to be a setting in the browser) it would additionally activate the audio description track in the default language. It would also probably disable all video tracks.

For deaf people (again a browser setting) it would additionally activate the caption or subtitle track in the default language and the sign language video track in the default sign language. It would also probably disable all audio tracks.

Additionally, the browser would provide a right-click menu that lets you activate/deactivate all tracks individually. If the controls attribute is set in the video element, this menu is also added to the controls bar.

Additionally, there would be a JavaScript API through which the Web developer can identify the available tracks and turn them on/off selectively.

To make this available in a container-format-independent way, an external description format such as ROE may be necessary.</description>
		<content:encoded><![CDATA[<p>I have had many questions related to the challenges of multi-track video, so thought I should extend this a bit.</p>
<p>So, let me explain the idea for how this information could be used with the HTML5 video element:</p>
<p>Assuming we have a multi-track video linked in the src attribute of a video element, the browser would need to identify what is in the tracks (along the lines described in the blog entry) and then do something useful with it.</p>
<p>By default, it would use the main audio and video track of the default language that the browser is set to and play them back.</p>
<p>For blind people (which would need to be a setting in the browser) it would additionally activate the audio description track in the default language. It would also probably disable all video tracks.</p>
<p>For deaf people (again a browser setting) it would additionally activate the caption or subtitle track in the default language and the sign language video track in the default sign language. It would also probably disable all audio tracks.</p>
<p>Additionally, the browser would provide a right-click menu that lets you activate/deactivate all tracks individually. If the controls attribute is set in the video element, this menu is also added to the controls bar.</p>
<p>Additionally, there would be a JavaScript API through which the Web developer can identify the available tracks and turn them on/off selectively.</p>
<p>To make this available in a container-format-independent way, an external description format such as ROE may be necessary.</p>
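<p>The default behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Track class, the kind labels ("audio", "video", "audio-description", "caption", "sign") and the default_active function are hypothetical names, not part of any browser or HTML5 API, and a track with no language is assumed to match any language setting.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Track:
    # Hypothetical description of one track in a multi-track resource
    name: str
    kind: str                # assumed labels, see the text above
    language: Optional[str]  # None means "no language", e.g. the main video


def default_active(tracks, language, sign_language=None, blind=False, deaf=False):
    """Return the names of the tracks a browser might activate by default."""
    wanted = {"audio", "video"}
    if blind:
        # Add audio descriptions and drop video, as suggested for blind users
        wanted |= {"audio-description"}
        wanted -= {"video"}
    if deaf:
        # Add captions and sign language video, and drop all audio
        wanted |= {"caption", "sign"}
        wanted -= {"audio", "audio-description"}
    active = []
    for t in tracks:
        if t.kind not in wanted:
            continue
        target = sign_language if t.kind == "sign" else language
        if t.language is None or t.language == target:
            active.append(t.name)
    return active
```

<p>The right-click menu and the JavaScript API would then simply add tracks to, or remove them from, the list this default produces.</p>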
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael Ressler</title>
		<link>http://blog.gingertech.net/2009/10/17/dealing-with-multi-track-video-and-audio/comment-page-1/#comment-1648</link>
		<dc:creator>Michael Ressler</dc:creator>
		<pubDate>Sun, 18 Oct 2009 22:07:15 +0000</pubDate>
		<guid isPermaLink="false">http://blog.gingertech.net/?p=632#comment-1648</guid>
		<description>I&#039;d prefer to address the tracks through something like ROE.  Naming conventions can get lengthy when overloaded with a lot of attributes.  Seems like ROE allows for some pretty verbose descriptions of tracks and their attributes.

As long as the spec allows for content negotiation with ROE descriptors, I think we&#039;re in business.</description>
		<content:encoded><![CDATA[<p>I&#8217;d prefer to address the tracks through something like ROE.  Naming conventions can get lengthy when overloaded with a lot of attributes.  Seems like ROE allows for some pretty verbose descriptions of tracks and their attributes.</p>
<p>As long as the spec allows for content negotiation with ROE descriptors, I think we&#8217;re in business.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ann O. Nymous</title>
		<link>http://blog.gingertech.net/2009/10/17/dealing-with-multi-track-video-and-audio/comment-page-1/#comment-1644</link>
		<dc:creator>Ann O. Nymous</dc:creator>
		<pubDate>Sun, 18 Oct 2009 12:24:38 +0000</pubDate>
		<guid isPermaLink="false">http://blog.gingertech.net/?p=632#comment-1644</guid>
		<description>A few notes from our IRC discussion:

- such a name, being a list of attributes, might be better off as actual attributes (eg, in Skeleton for Ogg), as it will both avoid having to parse a string into its constituent attributes, and make it easier to add new attributes should the need arise.
- Sending attribute preferences in the original request from the client (either via HTTP headers or URI parameters) will allow the server to select tracks and stream them to the client, needing only one roundtrip, as opposed to two if the client first requests a list of tracks and then, after having parsed them and worked out which ones it&#039;s interested in, requests a subset of these tracks. If names are kept as proposed, this also moves parsing complexity from client to server, a good thing I think.
- names might still be interesting as unique identifiers, for those clients that know exactly what they want (eg, editors), but do not have to carry the semantics.
- Track numbers are easy to break by remuxing a video (eg, simple editing, or even adding a subtitles track).</description>
		<content:encoded><![CDATA[<p>A few notes from our IRC discussion:</p>
<p>- such a name, being a list of attributes, might be better off as actual attributes (eg, in Skeleton for Ogg), as it will both avoid having to parse a string into its constituent attributes, and make it easier to add new attributes should the need arise.<br />
- Sending attribute preferences in the original request from the client (either via HTTP headers or URI parameters) will allow the server to select tracks and stream them to the client, needing only one roundtrip, as opposed to two if the client first requests a list of tracks and then, after having parsed them and worked out which ones it&#8217;s interested in, requests a subset of these tracks. If names are kept as proposed, this also moves parsing complexity from client to server, a good thing I think.<br />
- names might still be interesting as unique identifiers, for those clients that know exactly what they want (eg, editors), but do not have to carry the semantics.<br />
- Track numbers are easy to break by remuxing a video (eg, simple editing, or even adding a subtitles track).</p>
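<p>The single-roundtrip idea from the notes above could look roughly like this on the server side. It is a sketch under assumptions, not a proposal for actual syntax: the "role" and "lang" URI parameters, the TRACKS table and the select_tracks function are all invented for illustration, and tracks without a language are assumed to pass any language filter.</p>

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical table of tracks the server knows for one media resource
TRACKS = [
    {"id": "video-main", "role": "video", "lang": None},
    {"id": "audio-en", "role": "audio", "lang": "en"},
    {"id": "audio-de", "role": "audio", "lang": "de"},
    {"id": "caption-en", "role": "caption", "lang": "en"},
]


def select_tracks(url, tracks=TRACKS):
    """Select track ids from preferences sent as URI parameters."""
    query = parse_qs(urlsplit(url).query)
    roles = set(query.get("role", []))
    langs = set(query.get("lang", []))
    selected = []
    for t in tracks:
        # An absent parameter means "no restriction" on that attribute
        if roles and t["role"] not in roles:
            continue
        if langs and t["lang"] is not None and t["lang"] not in langs:
            continue
        selected.append(t["id"])
    return selected
```

<p>Because the tracks carry real attributes rather than one composite name, the server filters on them directly and the client never has to parse a track list before asking for what it wants.</p>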
]]></content:encoded>
	</item>
	<item>
		<title>By: silvia</title>
		<link>http://blog.gingertech.net/2009/10/17/dealing-with-multi-track-video-and-audio/comment-page-1/#comment-1642</link>
		<dc:creator>silvia</dc:creator>
		<pubDate>Sun, 18 Oct 2009 08:07:12 +0000</pubDate>
		<guid isPermaLink="false">http://blog.gingertech.net/?p=632#comment-1642</guid>
		<description>@jeremy That is exactly the point why I am defining a naming scheme for such multi-track video: the browser can then instruct the server to only deliver those streams that are actually of interest to its particular user. Think of it as content negotiation on multi-track media.

A deaf person may argue the same for any audio track of a video, and a blind person for any video track - they also don&#039;t want to waste bandwidth on something that is not necessary to them. This naming scheme is a first step in the direction of enabling content negotiation on multi-track media files through the media fragment URI scheme, which allows addressing of tracks.</description>
		<content:encoded><![CDATA[<p>@jeremy That is exactly the point why I am defining a naming scheme for such multi-track video: the browser can then instruct the server to only deliver those streams that are actually of interest to its particular user. Think of it as content negotiation on multi-track media.</p>
<p>A deaf person may argue the same for any audio track of a video, and a blind person for any video track &#8211; they also don&#8217;t want to waste bandwidth on something that is not necessary to them. This naming scheme is a first step in the direction of enabling content negotiation on multi-track media files through the media fragment URI scheme, which allows addressing of tracks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeremy</title>
		<link>http://blog.gingertech.net/2009/10/17/dealing-with-multi-track-video-and-audio/comment-page-1/#comment-1641</link>
		<dc:creator>Jeremy</dc:creator>
		<pubDate>Sun, 18 Oct 2009 01:12:57 +0000</pubDate>
		<guid isPermaLink="false">http://blog.gingertech.net/?p=632#comment-1641</guid>
		<description>Our Internet quota is already small enough as it is. Do we really want to throw it away downloading multiple video streams even if you only want to watch one of them?

Take sign language -- I have perfectly good hearing, so I don&#039;t want to waste bandwidth downloading it. On the other hand, somebody without hearing won&#039;t want to download the audio track.</description>
		<content:encoded><![CDATA[<p>Our Internet quota is already small enough as it is. Do we really want to throw it away downloading multiple video streams even if you only want to watch one of them?</p>
<p>Take sign language &#8212; I have perfectly good hearing, so I don&#8217;t want to waste bandwidth downloading it. On the other hand, somebody without hearing won&#8217;t want to download the audio track.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
