ginger's thoughts

Silvia's blog

URI fragments vs URI queries for media fragment addressing

Posted in Digital Media, Open Source, code, open codecs, standards by silvia on the September 8th, 2009

In the W3C Media Fragment Working Group (MFWG) we have had long discussions about whether to use the URI query (“?”) or the URI fragment (“#”) approach for addressing directly into media fragments, and about the new HTTP headers required to serve such URI requests. These discussions had to consider side conditions such as Web browsers stripping the fragment from a URI before sending a request, and the existence of caching Web proxies.

As explained earlier, URI queries request (primary) resources, while URI fragments address secondary resources, which have a relationship to their primary resource. So, in the strictest sense of their specifications, to address segments in media resources without losing the context of the primary resource, we can only use URI fragments.

Browser-supported Media Fragment URIs

For this reason, URI fragments are also the way in which my last media fragment addressing demo has been implemented. For example, I would address “elephants_dream/elephant.ogv#t=12”.

In this case, no extra HTTP parameters are necessary, since my JavaScript code makes use of existing functionality in Web browsers that support the HTML5 <video> tag: seeking over the network. Even when we expect the Web browser to support such URI fragment addressing schemes natively, we may still need to rely on the seeking functionality of the Web browser. In the Ogg and Firefox case, this seeking functionality is based on several cleverly calculated byte range requests on the primary resource, repeated until the server returns the Ogg packets with the required time stamps.
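
Before any seeking can happen, the time offset has to be pulled out of the fragment. The following is a minimal Python sketch of that step, not the code of my demo (which is JavaScript). It assumes the offset is given in plain seconds and that an optional end time may follow after a comma; the function name parse_time_fragment is made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def parse_time_fragment(uri):
    """Return (start, end) in seconds from a '#t=start[,end]' fragment.

    Returns None when the URI carries no 't' fragment parameter.
    Assumption: values are plain seconds, e.g. '#t=12' or '#t=50,80'.
    """
    fragment = urlsplit(uri).fragment
    values = parse_qs(fragment).get("t")
    if not values:
        return None
    start_text, _, end_text = values[0].partition(",")
    start = float(start_text) if start_text else 0.0
    end = float(end_text) if end_text else None
    return start, end


if __name__ == "__main__":
    print(parse_time_fragment("elephants_dream/elephant.ogv#t=12"))
```

The browser never sends the fragment to the server, so this parsing happens entirely on the client side.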

Server-supported Media Fragment URIs

Seeking over the network is, of course, inefficient, and it would be a lot more useful if the server provided the required byte ranges straight away. This can only happen if the server finds out which time ranges are actually required. Since the fragment part of a URI is not transferred over HTTP, the MFWG is proposing the addition of another HTTP header: a range header that can contain the temporal fragment specification. For example:

GET /elephants_dream/elephant.ogv HTTP/1.1
Host: www.annodex.net
Accept: video/*
Range: seconds=12-

If we have a clever server, it is able to do the seeking and serve bytes from the seek destination, which is the closest inclusive time range. It will then reply with this time range (and the complete resource duration) in an HTTP partial content response:

HTTP/1.1 206 Partial Content
Accept-Ranges: bytes, seconds
Content-Length: 35714370
Content-Type: video/ogg
Content-Range: seconds 11.85-21.16/3600

The user agent doesn’t need more than the byte ranges that encapsulate the requested time range. Since it has previously prepared a decoding pipeline for the video, it has already loaded the header of the file and is capable of decoding from 11.85s onwards, dropping the first 0.15s and starting playback of the video fragment at 12s.
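
The arithmetic in the example above (12s requested, 11.85s delivered, 0.15s to drop) can be sketched as follows. This is only an illustration in Python: it assumes the Content-Range value has exactly the “unit start-end/total” shape shown in the response above, and the function names are hypothetical.

```python
import re

# Matches e.g. "seconds 11.85-21.16/3600"; the total may also be "*".
_CONTENT_RANGE = re.compile(
    r"(\w+) (\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)/(\d+(?:\.\d+)?|\*)"
)


def parse_content_range(value):
    """Split a Content-Range value into (unit, start, end, total)."""
    match = _CONTENT_RANGE.fullmatch(value.strip())
    if match is None:
        raise ValueError("unsupported Content-Range: %r" % value)
    unit, start, end, total = match.groups()
    return unit, float(start), float(end), None if total == "*" else float(total)


def seconds_to_drop(requested_start, content_range):
    """How much decoded media to discard before the requested start time."""
    unit, start, _end, _total = parse_content_range(content_range)
    if unit != "seconds":
        raise ValueError("expected a time range, got unit %r" % unit)
    return max(0.0, requested_start - start)


if __name__ == "__main__":
    print(seconds_to_drop(12.0, "seconds 11.85-21.16/3600"))
```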

Now, this will work more efficiently than the previous browser-based seeking. However, the user agent will need to know which query to send, i.e. whether to query for an intelligently guessed byte range or the actual time range requested. It seems we can integrate the two without problems: the user agent can include both request ranges in one HTTP request. A server that doesn’t understand time ranges will only react to the byte ranges, while a server that understands time ranges will ignore the byte ranges and only react to the time ranges. The user agent will understand from the response whether it received a reply to the byte ranges or the time ranges and can react accordingly.
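
How the user agent tells the two kinds of reply apart could look roughly like the Python sketch below. It is an assumption on my part that the unit at the front of the Content-Range header is the distinguishing signal; the function name and the plain dictionary of headers are for illustration only, and the sketch deliberately says nothing about how both ranges are written into the request.

```python
def answered_unit(status, headers):
    """Return 'seconds' or 'bytes' for a partial reply, None otherwise.

    Assumption: 'headers' is a plain dict keyed by the exact header name
    'Content-Range'; real HTTP header names are case-insensitive.
    """
    if status != 206:
        # Not a partial response: the server ignored both ranges.
        return None
    content_range = headers.get("Content-Range", "")
    unit = content_range.split(" ", 1)[0].lower()
    return unit if unit in ("seconds", "bytes") else None


if __name__ == "__main__":
    reply = {"Content-Range": "seconds 11.85-21.16/3600"}
    print(answered_unit(206, reply))
```

With a “seconds” reply the user agent can start decoding right away; with a “bytes” reply it continues with its usual seeking.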

Web Proxy-supported Media Fragment URIs

A further optimisation that can be considered is to take caching Web proxies into account. These currently do not understand time ranges, but they may understand byte ranges. If we want all of our browser-server communication to be servable from these Web proxies, we need to make sure that the user agent only asks for byte ranges, so that it can be served from the cache.

The way to do this is to add an additional HTTP request to our previously optimised retrieval approach, in which the server tells the user agent which byte ranges a requested time range maps to. Then, the user agent can directly undertake the retrieval of the required byte ranges and receive them from the Web proxy’s cache if possible.

To this end, an additional HTTP header tells the server to resolve the requested time range. Since the Web proxies’ lack of understanding of time ranges is expected to be temporary, the proposed HTTP header is X-Accept-Range-Redirect. It tells the server to resolve the time range rather than servicing it.

GET /elephants_dream/elephant.ogv HTTP/1.1
Host: www.annodex.net
Accept: video/*
Range: seconds=12-
X-Accept-Range-Redirect: bytes

The server’s reply contains information on what time schemes it supports (Accept-Ranges), explains that the X-Accept-Range-Redirect header results in a different reply to the previous request (Vary), and provides the mapping to bytes (X-Range-Redirect):

HTTP/1.1 200 OK
Accept-Ranges: bytes, seconds
Content-Type: video/ogg
X-Accept-TimeURI: npt, smpte-25
X-Range-Redirect: bytes 1113724-2082711/35714370
Vary: X-Accept-Range-Redirect
Location: http://www.annodex.net/elephants_dream/elephant.ogv

Now, it’s easy for the user agent to go back to a normal byte range request:

GET /elephants_dream/elephant.ogv HTTP/1.1
Host: www.annodex.net
Accept: video/*
Range: bytes=1113724-2082711

And the proxy or the server can reply with the appropriate byte ranges:

HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 968988
Content-Type: video/ogg
Content-Range: bytes 1113724-2082711/35714370
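
The step from the server’s X-Range-Redirect reply to the follow-up byte range request can be sketched in Python as below. X-Range-Redirect is only a proposed header, so this assumes its value has exactly the shape shown above; the function name is hypothetical. Note that a standard HTTP/1.1 Range request header uses the form “bytes=first-last”, and that the body of the resulting 206 reply is last - first + 1 bytes long.

```python
import re


def byte_range_from_redirect(headers):
    """Turn an X-Range-Redirect value into a Range request header.

    Returns (request_headers, expected_body_length).
    Assumption: the value looks like 'bytes first-last/total'.
    """
    value = headers["X-Range-Redirect"]
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", value.strip())
    if match is None:
        raise ValueError("unsupported X-Range-Redirect: %r" % value)
    first, last, _total = (int(group) for group in match.groups())
    if last < first:
        raise ValueError("byte range ends before it starts")
    # Byte ranges are inclusive on both ends.
    return {"Range": "bytes=%d-%d" % (first, last)}, last - first + 1


if __name__ == "__main__":
    reply = {"X-Range-Redirect": "bytes 1113724-2082711/35714370"}
    print(byte_range_from_redirect(reply))
```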

URI queries for media fragments

The above-described URI fragment addressing methods only work for byte-identical segments of a media resource, since we assume a simple mapping between time (and potentially space, track and name) and bytes that each infrastructure element deals with. However, for some types of fragments it is impossible to maintain byte-identity, and some sort of transcoding of the resource is necessary instead. In these cases, the user agent is not able to resolve the fragmentation by itself and a server interaction is required. Thus, URI queries are more appropriate, since they result in a server interaction and a different (primary) resource.

Where URI queries are used, the retrieval action additionally has to make sure to create a fully valid new resource. For Ogg, for example, this implies a reconstruction of the Ogg headers to accurately describe the new resource (e.g. a non-zero start time or different encoding parameters).

No new Range headers are required to execute a URI query for media fragment retrieval. The reply will not be a partial resource, though, but a 200 OK. Addition of the Content-Range header in the reply would be advantageous since it contains information on what range was actually retrievable in comparison to the URI query request. For resources that do not maintain this information (Ogg does), the browser can then determine how much to skip to display the resource from the actually requested offset.

Further, it is possible to attach an additional response header called “Link” that relates the retrieved new resource to its primary resource. In this way the user agent is actually made aware of the relationship. The user agent could use an additional request to the primary resource to determine the full dimensions of the complete resource. In this case, the user agent is also able to choose whether to display the dimensions of the primary resource or of the one created by the query.

Again, taking the next step, queries can also be enabled to support existing caching Web proxies using the same X-Accept-Range-Redirect header as fragments.

Combining Fragments and Queries

A combination of a URI query for media fragment with a URI fragment yields a URI fragment resolution on top of the newly created resource. For example, “elephants_dream/elephant.ogv?t=50,80#t=20” will lead to the 20s fragment offset being applied to the new resource, which starts at 50s and ends at 80s of the original. Thus, the reply to this is a 10s extract, running from 70s to 80s of the original resource.
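
The arithmetic of this example can be written down as a small Python sketch. It assumes plain seconds in both the query and the fragment, and it assumes that a fragment end time, where given, is clamped to the end of the queried resource; that clamping is my own guess and not something the working group has settled. The function names are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def _parse_t(text):
    """Parse 'start[,end]' in plain seconds into (start, end-or-None)."""
    start, _, end = text.partition(",")
    return (float(start) if start else 0.0, float(end) if end else None)


def resolve_in_primary(uri):
    """Map '?t=a,b#t=c[,d]' onto the timeline of the primary resource."""
    parts = urlsplit(uri)
    query_t = parse_qs(parts.query).get("t")
    fragment_t = parse_qs(parts.fragment).get("t")
    q_start, q_end = _parse_t(query_t[0]) if query_t else (0.0, None)
    f_start, f_end = _parse_t(fragment_t[0]) if fragment_t else (0.0, None)
    # The fragment offset counts from the start of the queried resource.
    start = q_start + f_start
    end = q_end
    if f_end is not None:
        shifted = q_start + f_end
        # Assumption: never run past the end of the queried resource.
        end = shifted if q_end is None else min(q_end, shifted)
    return start, end


if __name__ == "__main__":
    print(resolve_in_primary("elephants_dream/elephant.ogv?t=50,80#t=20"))
```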

Summary

If this all looks too complicated to you, don’t worry: most of it will be hidden within browsers and the infrastructure. Also, these are my current thoughts, brought together from recent discussions I had with the Media Fragments WG, so we may not end up with exactly this model. It makes sense to me and I am keen to see some implementations or further discussions happening around this.

6 Responses to 'URI fragments vs URI queries for media fragment addressing'



  1. on September 8th, 2009 at 2:42 am

    Silvia,

    Cannot agree more with the “URI queries for media fragments” paragraph, because most of the time, redirecting from a seconds-range query to a bytes-range query will not work, especially with media containers you may find today on the web (i.e. FLV, MP4 & OGG).

    A server rewrite is necessary most of the time, as it will synthesize some headers to create a new self-contained valid A/V fragment, according to the specifics of each container type (building an FLV metadata block is fairly easy, re-creating a MOOV atom in an MP4 is quite another story).

    The response will also contain extra information pertaining to the parent document, so that media players may display the fragment at the right place in the whole document representation (either in-band using container facilities, like FLV metadata or MP4 iTunes tags (!), or out-of-band using HTTP headers, provided the media players have access to these headers).

  2. silvia said,

    on September 8th, 2009 at 9:41 am

    Hi Pierre-Yves,

    Redirecting from a seconds-range query to a bytes-range query actually does work very well with Ogg. It is implemented in oggz-chop and mod_annodex, which will help the server to re-synthesize the headers for the now smaller file. However, it may well be very complicated for other container formats, such as QuickTime or MP4, as you point out.

    I also agree: some containers will support the delivery of the extra information required to correctly display the shortened resource (Ogg with skeleton does for temporal queries), but others may not, and thus the extra HTTP headers are required. It will indeed take a while before all media players and Web browsers support these extra headers.

    The good thing is: it can all be done in stages. The most fundamental and important media fragment addressing using URI fragments has already been shown to work and is a trivial extension for Web browsers that have already implemented the HTML5 video tag. So, I am hopeful that we can get that support soon. The optimisations that I list and the different other options can be added over time. Server components will need to be written, media players/browsers/other UAs will need to be adapted, and ultimately Web proxies (such as squid) will want to learn about the new range headers and support them. Lots to do still!


  3. on September 8th, 2009 at 5:19 pm

    Hi Silvia,

    “It is implemented in oggz-chop and mod_annodex, which will help the server to re-synthesize the headers for the now smaller file.” : yes, my point exactly. There is no “direct” mapping between a time-range and a byte-range for most containers, and one needs some server assistance to re-build a different document most of the time.

    The only cases I can think about for MFWG[7.2] (the seconds-to-bytes range conversion through a Vary range referrer) are bare MP3 audio or MPEG-2 PS files, where data is organized as short closed GOP (so to speak for MP3 ^_^), with additional re-synchronization sequences at the beginning of each frame (these containers were specifically designed so hardware decoders could re-synchronize easily without relying on extra headers or index tables).

    With today’s formats found on the Internet (FLV, MP4 and OGG), it’s not actually practical to do so. So either one builds a server extension to “understand” the different containers and deal with them, or we are back to the client-side “try and guess” bytes-range requests we actually see in the QuickTime player (iPhone clients) or HTML5 video players (Firefox and Chrome do exactly that). When most of your content is VBR-encoded, it’s obviously sub-optimal and error-prone (as you pointed out above, despite any clever algorithm you may throw at it).

    We (at Dailymotion) built such a server extension from scratch, supporting the 3 above containers with full key-frames seeking and delivery throttling support, entirely at the server side (with some minor client-side tweaks (Flash player for now) to present the user with a consistent time-line while seeking through a given document). The extension primarily focused on efficiency so that we could activate the seeking and bandwidth throttling features without upgrading our current hardware (when it comes to tens of thousands of simultaneous delivered streams translating into tens of Gbps, you may find the exercise not so easy all of a sudden ^_^).

    It also integrates nicely within our streaming architecture, with multi-layer caching and stream security management. We may release the source code at some point in the (hopefully near) future, provided we can remove all Dailymotion-specific code from it. We are still trying to build an HTML5 video tag prototype with in- and out-of-buffer seeking support, and are working closely with Mozilla on this to provide some additional access to the underlying HTTP connection headers.

    Pierre-Yves


  4. on September 8th, 2009 at 9:11 pm

    Great post, it helps to see how all the HTTP headers are meant to interact!

    “It seems we can integrate the two without problems: the user agent can include both request ranges in one HTTP request.”

    I have my doubts about this. In order to make a guess about the byte range the UA will need to know the duration of the resource. At least for Ogg, that means that we already need to have done at least 2 requests: 1 for headers and 1 at the end of the resource for getting the duration. The only exception is a server that supports X-Content-Duration but not time range requests, which seems unlikely.

    I also would be surprised if there wasn’t a lot of server software that assumes that there will be at most 1 Range HTTP header and misbehaves otherwise.

    It looks like there’s a lot to do but as you say support for media fragments will happen progressively rather than as one big step.

  5. silvia said,

    on September 9th, 2009 at 10:52 am

    @pierre-yves: sounds like some awesome work that you’ve done! It would be great if that was available as open source.

    @philip: I believe the request for the headers is actually part of “setting up the video tag”, i.e. it is done anyway by the browser to set up its decoding pipeline when it encounters a video tag. For the duration bit, I would indeed expect X-Content-Duration to be available.

  6. silvia said,

    on December 14th, 2009 at 2:16 pm

    There’s now a more in-depth specification of the HTTP header exchange for media fragments at http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/ . Feedback very welcome!
