Accessibility support in Ogg and liboggplay
At the recent FOMS/LCA in Wellington, New Zealand, we talked a lot about how Ogg could support accessibility. Technically, this means support for multiple text tracks (subtitles/captions), multiple audio tracks (audio descriptions parallel to main audio track), and multiple video tracks (sign language video parallel to main video track).
Creating multitrack Ogg files
The creation of multitrack Ogg files is already possible using one of the muxing applications, e.g. oggz-merge. For example, I have my own little collection of multitrack Ogg files at http://annodex.net/~silvia/itext/elephants_dream/multitrack/. But then you are stranded with files that no player will play back.
Multitrack Ogg in Players
As Ogg is now being used in multiple Web browsers in the new HTML5 media formats, there are in particular requirements for accessibility support for the hard-of-hearing and vision-impaired. Either multitrack Ogg needs to become more of a common case, or the association of external media files that provide synchronised accessibility data (captions, audio descriptions, sign language) to the main media file needs to become a standard in HTML5.
As it turns out, both these approaches are being considered and worked on in the W3C. Accessibility data that are audio or video tracks will in the near future have to come out of the media resource itself, but captions and other text tracks will also be available from external associated elements.
The availability of internal accessibility tracks in Ogg is a new use case – something Ogg has been ready to do, but has not gone into common usage. MPEG files on the other hand have for a long time been used with internal accessibility tracks and thus frameworks and players are in place to decode such tracks and do something sensible with them. This is not so much the case for Ogg.
For example, a current VLC build installed on Windows will display captions, because Ogg Kate support is activated. A current VLC build on any other platform, however, has Ogg Kate support deactivated in the build, so captions won’t display. This will hopefully change soon, but we have to look also beyond players and into media frameworks – in particular those that are being used by the browser vendors to provide Ogg support.
Multitrack Ogg in Browsers
Hopefully gstreamer (which is what Opera uses for Ogg support) and ffmpeg (which is what Chrome uses for Ogg support) will expose all available tracks to the browser so they can expose them to the user for turning on and off. Incidentally, a multitrack media JavaScript API is in development in the W3C HTML5 Accessibility Task Force for allowing such control.
The current version of Firefox uses liboggplay for Ogg support, but liboggplay’s multitrack support has been sketchy thus far. So, Viktor Gal – the liboggplay maintainer – and I sat down at FOMS/LCA to discuss this and Viktor developed some patches to make the demo player in the liboggplay package, the glut-player, support the accessibility use cases.
I applied Viktor’s patch to my local copy of liboggplay and I am very excited to show you the screencast of glut-player playing back a video file with an audio description track and an English caption track all in sync:
Further developments
There are still important questions open: for example, how will a player know that an audio description track is to be played together with the main audio track, but a dub track (e.g. a German dub for an English video) is to be played as an alternative? Such metadata for the tracks is something that Ogg is still missing, but that Ogg can be extended with fairly easily through the use of the Skeleton track. It is something the Xiph community is now working on.
Summary
This is great progress towards accessibility support in Ogg and therefore in Web browsers. And there is more to come soon.
How to display seeked position for HTML5 video
Recently, I was asked for some help on coding with an HTML5 video element and its events. In particular the question was: how do I display the time position that somebody seeked to in a video?
Here is a code snippet that shows how to use the seeked event:
<video onseeked="writeVideoTime(this.currentTime);" src="video.ogv" controls></video>
<p>position:</p><div id="videotime"></div>
<script type="text/javascript">
// write the given time (in seconds) into the "videotime" element
function writeVideoTime(t) {
  document.getElementById("videotime").innerHTML = t;
}
</script>
Other events that can be used in a similar way are:
- loadstart: UA requests the media data from the server
- progress: UA is fetching media data from the server
- suspend: UA is on purpose idling on the server connection mid-fetching
- abort: UA aborts fetching media data from the server
- error: UA aborts fetching media because of a network error
- emptied: the media element has been reset to an empty state, e.g. because load() was called or a fatal error occurred during loading
- stalled: UA is waiting for media data from the server
- play: playback has begun after play() method returns
- pause: playback has been paused after pause() method returns
- loadedmetadata: UA has received all its setup information for the media resource, duration and dimensions and is ready to play
- loadeddata: UA can render the media data at the current playback position for the first time
- waiting: playback has stopped because the next frame is not available yet
- playing: playback has started
- canplay: playback can resume, but at risk of buffer underrun
- canplaythrough: playback can resume without estimated risk of buffer underrun
- seeking: seeking attribute changed to true (may be too short to catch)
- seeked: seeking attribute changed to false
- timeupdate: current playback position changed enough to report on it
- ended: playback stopped at media resource end; ended attribute is true
- ratechange: defaultPlaybackRate or playbackRate attribute have just changed
- durationchange: duration attribute has changed
- volumechange: volume attribute or the muted attribute has changed
Please refer to the actual event list in the specification for more details and more accurate information on the events.
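The same pattern works for any of these events by registering listeners from script instead of using inline attributes. The following is only a sketch: the helper name logMediaEvents and the choice of events in the usage part are my own illustration, not part of the specification.

```javascript
// Attach one logging listener per event name to any EventTarget
// (in a browser, pass the <video> element).
// Returns a function that detaches all listeners again.
function logMediaEvents(target, eventNames, log) {
  const handlers = eventNames.map(function (name) {
    const handler = function (evt) {
      // currentTime exists on media elements; other targets report undefined.
      log(evt.type, target.currentTime);
    };
    target.addEventListener(name, handler);
    return { name: name, handler: handler };
  });
  return function detach() {
    handlers.forEach(function (h) {
      target.removeEventListener(h.name, h.handler);
    });
  };
}

// Browser usage, guarded so the snippet also loads outside a Web page.
if (typeof document !== "undefined") {
  const video = document.getElementsByTagName("video")[0];
  if (video) {
    logMediaEvents(video, ["seeking", "seeked", "pause", "ended"], function (type, t) {
      console.log(type + " at " + t + "s");
    });
  }
}
```

Keeping the returned detach function around makes it easy to stop logging once you have seen enough.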
Audio Track Accessibility for HTML5
I have talked a lot about synchronising multiple tracks of audio and video content recently. The reason was mainly that I foresee a need for more than two parallel audio and video tracks, such as audio descriptions for the vision-impaired or dub tracks for internationalisation, as well as sign language tracks for the hard-of-hearing.
It is almost impossible to introduce a good scheme to deliver the right video composition to a target audience. Most people will prefer bare a/v, the vision-impaired would probably prefer only audio plus audio descriptions (but will probably take the video), and the hard-of-hearing will prefer video plus captions and possibly a sign language track. While it is possible to dynamically create files that contain such tracks on a server and then deliver the right composition, implementation of such a server method has not been very successful in the last years and it would likely take many years to roll out such new infrastructure.
So, the only other option we have is to synchronise completely separate media resources together as they are selected by the audience.
It is this need that this HTML5 accessibility demo is about: Check out the demo of multiple media resource synchronisation.
I created an Ogg video with only a video track (10m53s750). Then I created an audio track that is the original English audio track (10m53s696). Then I used a Spanish dub track that I found through BlenderNation as an alternative audio track (10m58s337). Lastly, I created an audio description track in the original language (10m53s706). This creates a video track with three optional audio tracks.
I took away all native controls from these elements when using the HTML5 audio and video tag and ran my own stop/play and seeking approaches, which handled all media elements in one go.
I was mostly interested in the quality of this experience. Would the different media files stay mostly in sync? They are normally decoded in different threads, so how big would the drift be?
The resulting page is the basis for such experiments with synchronisation.
The page prints the current playback position in all of the media files at a constant interval of 500ms. Note that when you pause and then play again, I am re-synching the audio tracks with the video track, but not when you just let the files play through.
I have let the files play through on my rather busy Macbook and have achieved the following interesting drift over the course of about 9 minutes:
You will see that the video was the slowest, only doing roughly 540s, while the Spanish dub did 560s in the same time.
To fix such drifts, you can always include regular re-synchronisation points into the video playback. For example, you could set a timeout on the playback to re-sync every 500ms. Within such a short time, it is almost impossible to notice a drift. Don’t re-load the video, because it will lead to visual artifacts. But do use the video’s currentTime to re-set the others. (UPDATE: Actually, it depends on your situation, which track is the best choice as the main timeline. See also comments below.)
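A periodic re-sync of this kind could look roughly like the sketch below. The function name resyncToMaster and the tolerance parameter are my own assumptions and not what the demo page does; the tolerance is there so that tracks which are still close enough are not seeked needlessly.

```javascript
// Re-align follower tracks to a master timeline when they have drifted too far.
// master and followers only need a currentTime property, so this works with
// <video>/<audio> elements as well as with plain objects.
// Returns the number of followers that were corrected.
function resyncToMaster(master, followers, toleranceSeconds) {
  let corrected = 0;
  followers.forEach(function (follower) {
    const drift = Math.abs(follower.currentTime - master.currentTime);
    if (drift > toleranceSeconds) {
      follower.currentTime = master.currentTime;
      corrected += 1;
    }
  });
  return corrected;
}

// Possible browser usage (hypothetical variable names):
//   setInterval(function () { resyncToMaster(video, audioTracks, 0.1); }, 500);
```

Which element you pass in as the master is the choice discussed in the update above.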
It is a workable way of associating arbitrary numbers of media tracks with videos, in particular in situations where the creation of merged files cannot easily be included in a workflow.
Tutorial on HTML5 open video at LCA 2010
During last week’s LCA, Jan Gerber, Michael Dale and I gave a 3 hour tutorial on how to publish HTML5 video in an open format.
We basically taught people how to create and publish Ogg Theora video in HTML5 Web pages and how to make them work across browsers, including much of the available tools and libraries. We’re hoping that some people will have learnt enough to include modules in CMSes such as Drupal, Joomla and Wordpress, which will easily support the publishing of Ogg Theora.
I have been asked to share the material that we used. It consists of:
- HTML5_Tutorial (611KB)
- the example videos (328MB), and
- HTML5 video exercises (3.4KB).
Note that if you would like to walk through the exercises, you should install the following software beforehand:
- oggz-tools
- oggvideotools
- apache2 or a Web server of your choice
- ffmpeg2theora
- firefox3.5+
- firefogg plugin
- firebug plugin
- vlc, mplayer, totem or xine
- kino or pitivi or another video editor that exports Theora, e.g. iMovie with XiphQT
You might need to look for packages of your favourite OS (e.g. Windows or Mac, Ubuntu or Debian).
The exercises include:
- creating an Ogg video from an editor
- transcoding a video using http://firefogg.org/
- creating a poster image using OggThumb
- writing a first HTML5 video Web page with Ogg Theora
- publishing it on a Web Server, with correct MIME type & Duration hint
- writing a second HTML5 video Web page with Ogg Theora & MP4 to cover Safari/Webkit
- transcoding using ffmpeg2theora in a script
- writing a third HTML5 video Web page with Cortado fallback
- writing a fourth Web page using “Video for Everybody”
- writing a fifth Web page using “mwEmbed”
- writing a sixth Web page using firefogg for transcoding before upload
- and a seventh one with a progress bar
- encoding srt subtitles into an Ogg Kate track
- writing an eighth Web page using cortado to display the Ogg Kate track
For those that would like to see the slides here immediately, a special flash embed:
Enjoy!
Best economy flight evva!
Over the years, I have flown a lot – mainly between Sydney and Frankfurt or Sydney and San Francisco. Today, for the first time in a long time, I had a flight with Qantas from Sydney to San Francisco. And I must say: it was the most productive and most comfortable economy flight I had in a long time.
This is gonna feel awkward, since it’s not one of my usual technical posts. But I just have to say “Thank you” to Qantas. When I fly to the US, I tend to catch a US airline because they usually turn up as the cheapest. This time, Qantas was the second cheapest, so I decided to spend the extra hundred bucks on getting a modern airline. Yes, get that US airlines: no matter which of you I take, I always feel like I am thrown back into the last century. Legspace is rare, seats are uncomfortable, food is crap, service is poor, oh … and have you ever heard of personal entertainment screens? Yes, I know, your planes are from the last century. But honestly: I had a personal entertainment screen on my Singapore Airlines flight when coming to Australia for the first time in 1998! Couldn’t you at least upgrade the inside of your planes?
Anyway, back to this flight. It all started with the question: would you like to sit in the centre aisle in front of the baby bassinet? Oh, I usually take a window seat to get some peace and quiet – but hey, I’m not going to say “no” to space! And, man did I use it!
I settled in with a good book and a little nap until the first meal and after that felt strengthened and awake enough to start hacking. With my new MacBook Pro, I was bound to get a few hours in before the battery would die on me. Not the 7 hours, that Apple claims, but that’s because I was going to do lots of compiles of Firefox. Anyway – without a seat in front of me, without the personal entertainment screen pulled out, and with the nice thick cushion that Qantas supply on my lap, protecting me from the laptop heat, I almost felt like I was back home in my living room.
On top of that – and unfortunately for Qantas, but fortunately for me – the plane was only two thirds full, so I had the middle seat on my left empty, which I immediately used to extend my table space. I had continuing catering service for the next 4-5 hours of compiling, applying OggK patches to the new Chris Double Firefox codebase, and fixing compile errors (all configuration based – I have yet to get to writing actual code). Ongoing catering service, no need to cook for myself, uninterrupted coding time, good music from the inflight entertainment service – I think I’ll move my office into a Qantas plane! Not been this productive in ages!
Everywhere around me the lights were out, people were watching movies, but I was working and really enjoying it. And then, the battery was empty, half way into the flight. Bummer! But I didn’t give up this easily. Thought it’d be worth asking if there was a way to recharge without occupying a toilet for two hours. And as with everything else, Qantas inflight personnel made an extra effort to please: they found me an empty seat in business class and hooked up the laptop for an hour to recharge. Totally, utterly awesome! I got it back after another nice reading break – cannot start watching movies, since that makes the brain go mash. I got another few hours of compiling in before my body forced me to catch a few hours of sleep.
Now, I’m about an hour away from San Fran and the laptop claims 40min of power left. Funnily, that number seems to go up rather than down, so I’m sure it will last until arrival (uh! It’s now at 1:24min – oh, compilation just finished!). Hopefully I will be able to find out, why some of the Ogg Theora/Vorbis/Kate videos that I created using kateenc and oggz-merge don’t play in the patched Firefox. After all, it would be awesome to be able to show it off in the upcoming HTML5 Video Accessibility workshop!
New proposal for captions and other timed text for HTML5
The first specification for how to include captions, subtitles, lyrics, and similar time-aligned text with HTML5 media elements has received a lot of feedback – probably because there are several demos available.
The feedback has encouraged me to develop a new specification that includes the concerns and makes it easier to associate out-of-band time-aligned text (i.e. subtitles stored in separate files to the video/audio file). A simple example of the new specification using srt files is this:
<video src="video.ogv" controls>
<itextlist category="CC">
<itext src="caption_en.srt" lang="en"/>
<itext src="caption_de.srt" lang="de"/>
<itext src="caption_fr.srt" lang="fr"/>
<itext src="caption_jp.srt" lang="ja"/>
</itextlist>
</video>
By default, the charset of the itext file is UTF-8, and the default format is text/srt (incidentally a mime type that still needs to be registered). Also by default the browser is expected to select for display the track that matches the set default language of the browser. This has been proven to work well in the previous experiments.
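The default track selection could be sketched as follows. This is only an illustration: the function name selectTrack and the fallback from a full language tag such as “de-AT” to its primary part “de” are my own assumptions and are not spelled out in the specification.

```javascript
// Pick the itext track that matches the browser's preferred language.
// tracks: array of { src, lang } objects mirroring the <itext> attributes.
// Returns the matching track, or null if none matches.
function selectTrack(tracks, preferredLang) {
  const wanted = preferredLang.toLowerCase();
  const primary = wanted.split("-")[0];
  // An exact match wins; otherwise compare only the primary language subtag.
  const exact = tracks.find(function (t) {
    return t.lang.toLowerCase() === wanted;
  });
  if (exact) {
    return exact;
  }
  const partial = tracks.find(function (t) {
    return t.lang.toLowerCase().split("-")[0] === primary;
  });
  return partial || null;
}

// In a browser the preferred language could come from navigator.language:
//   selectTrack(tracks, navigator.language);
```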
Check out the new itext specification, read on to get an introduction to what has changed, and leave me your feedback if you can!
The itextlist element
You will have noticed that in comparison to the previous specification, this specification contains a grouping element called “itextlist”. This is necessary because we have to distinguish between alternative time-aligned text tracks and ones that can be additional, i.e. displayed at the same time. In the first specification this was done by inspecting each itext element’s category and grouping them together, but that resulted in much repetition and unreadable specifications.
Also, it was not clear which itext elements were to be displayed in the same region and which in different ones. Now, their styling can be controlled uniformly.
The final advantage is that association of callbacks for entering and leaving text segments as extracted from the itext elements can now be controlled from the itextlist element in a uniform manner.
This change also makes it simple for a parser to determine the structure of the menu that is created and included in the controls element of the audio or video element.
Incidentally, a patch for Firefox already exists that makes this part of the browser. It does not yet support this new itext specification, but here is a screenshot that Felipe Corrêa da Silva Sanches created to demonstrate it:
If several itextlist elements are specified, that menu will receive sub-menus – one each for each itextlist. An example is the following:
<video src="video.ogv" aria-label="test video" controls>
<itextlist category="SUB" name="subtitles">
<itext src="sub_en.srt" lang="en"/>
<itext src="sub_de.srt" lang="de"/>
<itext src="sub_fr.srt" lang="fr"/>
<itext src="sub_jp.srt" lang="ja"/>
</itextlist>
<itextlist category="TAD" name="spoken transcript">
<itext id="tad_en" src="tad_en.srt" lang="en"/>
<itext id="tad_jp" src="tad_jp.srt" lang="ja"/>
</itextlist>
</video>
which will result in the following menu structure:
text
- subtitles
-- English
-- German
-- French
-- Japanese
-- none
- spoken transcript
-- English
-- Japanese
-- none
Similarly, a context menu would use the same structure.
Callbacks on timed text segments
This specification further introduces callbacks on time-aligned text segments: onenter and onleave. At this stage this is an idea I am experimenting with, but I believe it has lots of potential to allow people to do fancy things when subtitles appear or disappear. Some ideas are: to have a specific picture displayed that relates to the text segment, to have text in another area of the display change e.g. because we have moved into a different part of the full text transcript, or to display Google ads that relate to the text in that particular text segment.
I am curious about feedback on this idea. It relates closely to the idea of cue ranges that was previously part of HTML5.
It is possible to achieve this effect simply through adding a timeupdate event listener, but proper callbacks like these are much more efficient.
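For comparison, here is roughly what the timeupdate-based emulation looks like. It is a sketch under my own assumptions: the name createCueTracker and the cue shape { start, end, text } are illustrative and not part of the itext specification.

```javascript
// Emulate onenter/onleave by comparing the playback position against cue times.
// cues: array of { start, end, text } with times in seconds (end is exclusive).
// Returns an update function to call with the current playback position.
function createCueTracker(cues, onenter, onleave) {
  const active = new Set(); // indexes of the cues that are currently displayed
  return function update(currentTime) {
    cues.forEach(function (cue, index) {
      const inside = currentTime >= cue.start && currentTime < cue.end;
      if (inside && !active.has(index)) {
        active.add(index);
        onenter(cue);
      } else if (!inside && active.has(index)) {
        active.delete(index);
        onleave(cue);
      }
    });
  };
}

// Possible browser usage (hypothetical variable names):
//   const update = createCueTracker(cues, showCue, hideCue);
//   video.addEventListener("timeupdate", function () { update(video.currentTime); });
```

Every position update walks the whole cue list, and a cue shorter than the gap between two timeupdate events can be missed entirely, which is part of why native callbacks are attractive.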
Synchronisation adjustments
Another addition to the itext element is the introduction of two attributes that together allow fixing synchronisation issues in the timing between the video (or audio) and the itext track. The two attributes are “delay” and “stretch”.
“delay” allows specification of a negative or positive float value that represents the amount of seconds with which to delay the display of the itext text segments relative to the timing of the video (or audio) element.
“stretch” allows fixing a constant drift in the timing between the video (or audio) element and the text segments. It is given in percent, where 100% means no time stretch, 97% means getting the text segments 3% faster than their actual timing, and 108% means 8% slower.
These attributes are relevant since itext files are independent resources to the media resource and can therefore synchronise to a different clock than the media files. It happens frequently with srt files that are being used for differently encoded video files.
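In code, the two attributes amount to a linear mapping of each cue time. The sketch below assumes that the stretch is applied before the delay; the specification text above does not fix that order, and the function names are hypothetical.

```javascript
// Map a time from the itext file onto the media timeline.
// delaySeconds: positive or negative float, as in the "delay" attribute.
// stretchPercent: 100 means unchanged, as in the "stretch" attribute.
// Assumption: stretch is applied first, then the delay is added.
function adjustCueTime(seconds, delaySeconds, stretchPercent) {
  return (seconds * stretchPercent) / 100 + delaySeconds;
}

// Apply the same mapping to both ends of a cue, leaving the text untouched.
function adjustCue(cue, delaySeconds, stretchPercent) {
  return {
    start: adjustCueTime(cue.start, delaySeconds, stretchPercent),
    end: adjustCueTime(cue.end, delaySeconds, stretchPercent),
    text: cue.text
  };
}
```

With a stretch of 97 and no delay, a cue written for 100s is shown at 97s; with a stretch of 100 and a delay of 2, it is shown at 102s.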
Further feedback
I am currently experimenting with creating the same kind of JavaScript API for in-line annotation tracks through extending some Firefox patches. It is exciting to see it all come together.
At the same time, I am sure there is still feedback that will further improve the specification and I encourage you to contribute. I have set up a wiki page where you can leave your feedback. Also feel free to drop me an email or leave a comment on this blog post. Thanks!
UPDATE 30th Oct 2009:
There is now also a working implementation that demonstrates the approach with itextlist. Check out http://www.annodex.net/~silvia/itext/elephant_no_skin_v2.html, which will not look much different to the previous version, but does indeed behave very differently.
HTML5 audio element accessibility
As part of my experiments in video accessibility I am also looking at the audio element. I have just finished a proof of concept for parsing Lyrics files for music in lrc format.
The demo uses Tay Zonday’s “Chocolate Rain” song both as a video with subtitles and as an audio file with lyrics. Fortunately, he published these all under a creative commons license, so I was able to use this music file. BTW: I found it really difficult to find an openly licensed music file with lyrics.
While I was at it, I also cleaned up all the old demos and now have a nice list of all demos in a central file.
Updated video accessibility demo
Just a brief note to share that I have updated the video accessibility demo at http://www.annodex.net/~silvia/itext/elephant_no_skin.html.
It should now support ARIA and tab access to the menu, which I have simply put next to the video. I implemented the menu by learning from YUI. My Firefox 3.5.3 actually doesn’t tab through it, but then it also doesn’t tab through the YUI example, which I think is correct. Go figure.
Also, the textual audio descriptions are improved and should now work better with screenreaders.
I have also just prepared a recorded audio description of “Elephants Dream” (German accent warning).
You can also download the multitrack Ogg Theora video file that contains the original audio and video track plus the audio description as an extra track, created using oggz-merge.
As soon as some kind soul donates a sign language track for “Elephants Dream”, I will have a pretty complete set of video accessibility tracks for that video. This will certainly become the basis for more video a11y work!
URI fragments vs URI queries for media fragment addressing
In the W3C Media Fragment Working Group (MFWG) we have had long discussions about the use of the URI query (“?”) or the URI fragment (“#”) addressing approach for addressing directly into media fragments, and the diverse new HTTP headers required to serve such URI requests, considering such side conditions as the stripping-off of fragment parameters from a URI by Web browsers, or the existence of caching Web proxies.
As explained earlier, URI queries request (primary) resources, while URI fragments address secondary resources, which have a relationship to their primary resource. So, in the strictest sense of their specifications, to address segments in media resources without losing the context of the primary resource, we can only use URI fragments.
Browser-supported Media Fragment URIs
For this reason, URI fragments are also the way in which my last media fragment addressing demo has been implemented. For example, I would address “elephants_dream/elephant.ogv#t=12”.
In this case, no extra HTTP parameters are necessary, since my javascript code is making use of an existing functionality of Web browsers that support the HTML5 <video> tag: seeking over the network. Even when we expect the Web browser to support such URI fragment addressing schemes natively, we may still need to rely on the seeking functionality of the Web browser. This seeking functionality in the Ogg and Firefox case is based on several cleverly calculated byte range requests on the primary resource until the server returns the Ogg packets with the required time stamps.
Server-supported Media Fragment URIs
Seeking over the network is, of course, inefficient, and it would be a lot more useful if the server provided the required byte ranges straight away. This can only happen if the server finds out about which time ranges are actually required. Since the fragment part of a URI is not actually transferred over HTTP, the MFWG is proposing the addition of another HTTP header: a range header that can contain the temporal fragment specification. For example:
GET elephants_dream/elephant.ogv HTTP/1.1
Host: www.annodex.net
Accept: video/*
Range: seconds=12-
If we have a clever server, it is able to do the seeking and serve bytes from the seek destination, which is the closest inclusive time range. It will then reply with this time range (and the complete resource duration) in an HTTP partial content response:
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes, seconds
Content-Length: 35714370
Content-Type: video/ogg
Content-Range: seconds 11.85-21.16/3600
The user agent doesn’t need more than the byte ranges that encapsulate the requested time range. Since it has previously prepared a decoding pipeline for the video, it has already loaded the header of the file and is capable of decoding from 11.85s onwards, dropping the first 0.15s and starting playback of the video fragment from 12s.
Now, this will work more efficiently than the previous browser-based seeking. However, the user agent will need to know which query to send, i.e. whether to query for an intelligently guessed byte range or the actual time range requested. It seems we can integrate the two without problems: the user agent can include both request ranges in one HTTP request. A server that doesn’t understand time ranges will only react to the byte ranges, while a server that understands time ranges will ignore the byte ranges and only react to the time ranges. The user agent will understand from the response whether it received a reply to the byte ranges or the time ranges and can react accordingly.
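On the user agent side, telling the two kinds of reply apart comes down to looking at the unit in the Content-Range header. The following sketch assumes the header syntax used in the examples above; the function names are hypothetical and the “seconds” unit is only a proposal at this stage.

```javascript
// Parse a Content-Range value such as "seconds 11.85-21.16/3600"
// or "bytes 1113724-2082711/35714370". Returns null if it does not parse.
function parseContentRange(value) {
  const m = /^\s*(\w+)\s+([\d.]+)-([\d.]+)\/([\d.]+|\*)\s*$/.exec(value);
  if (!m) {
    return null;
  }
  return {
    unit: m[1].toLowerCase(),
    start: Number(m[2]),
    end: Number(m[3]),
    total: m[4] === "*" ? null : Number(m[4])
  };
}

// How many seconds of decoded media to drop before the requested start.
// Returns null when the server answered in bytes (or not parsably), which
// tells the caller to fall back to its own seeking.
function secondsToSkip(contentRange, requestedStart) {
  const range = parseContentRange(contentRange);
  if (!range || range.unit !== "seconds") {
    return null;
  }
  return Math.max(0, requestedStart - range.start);
}
```

For the reply shown earlier and a requested start of 12s, secondsToSkip yields approximately 0.15.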
Web Proxy-supported Media Fragment URIs
A further optimisation that can be considered is to take caching Web proxies into account. These currently do not understand time ranges, but they may understand byte ranges. If we wanted to enable all of our browser-server-communication to be servable from these Web proxies, we need to make sure that the user agent only asks for byte ranges, so it can be served from the cache.
The way to do this is to add an additional HTTP request to our previously optimised retrieval approach, in which the server tells the user agent which byte ranges a requested time range maps to. Then, the user agent can directly undertake the retrieval of the required byte ranges and receive them from the Web proxy’s cache if possible.
To this end, an additional HTTP header tells the server to resolve the requested time range. Since this situation with the Web proxies’ lack of understanding time ranges is expected to be a temporary one, the proposed HTTP header is X-Accept-Range-Redirect. It tells the server to resolve the time range rather than servicing it.
GET elephants_dream/elephant.ogv HTTP/1.1
Host: www.annodex.net
Accept: video/*
Range: seconds=12-
X-Accept-Range-Redirect: bytes
The server’s reply contains information on what time schemes it supports (Accept-Ranges), explains that the X-Accept-Range-Redirect header results in a different reply to the previous request (Vary), and provides the mapping to bytes (X-Range-Redirect):
HTTP/1.1 200 OK
Accept-Ranges: bytes, seconds
Content-Type: video/ogg
X-Accept-TimeURI: npt, smpte-25
X-Range-Redirect: bytes 1113724-2082711/35714370
Vary: X-Accept-Range-Redirect
Location: http://www.annodex.net/elephants_dream/elephant.ogv
Now, it’s easy for the user agent to go back to a normal byte range request:
GET elephants_dream/elephant.ogv HTTP/1.1
Host: www.annodex.net
Accept: video/*
Range: bytes=1113724-2082711
And the proxy or the server can reply with the appropriate byte ranges:
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 968988
Content-Type: video/ogg
Content-Range: bytes 1113724-2082711/35714370
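The step from the redirect reply to the plain byte range request can be sketched like this. X-Range-Redirect is the proposed header from above; the function name and the returned object shape are my own illustration.

```javascript
// Turn an X-Range-Redirect value such as "bytes 1113724-2082711/35714370"
// into the pieces needed for a follow-up byte range request.
// Returns null if the value is not a byte mapping.
function rangeRequestFromRedirect(redirectValue) {
  const m = /^\s*bytes\s+(\d+)-(\d+)\/(\d+)\s*$/.exec(redirectValue);
  if (!m) {
    return null;
  }
  const first = Number(m[1]);
  const last = Number(m[2]);
  return {
    range: "bytes=" + first + "-" + last, // value for the Range request header
    length: last - first + 1,             // body size to expect in the 206 reply
    total: Number(m[3])                   // size of the complete resource
  };
}
```

Both ends of a byte range are inclusive, which is why the expected body length is last minus first plus one.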
URI queries for media fragments
The above described URI fragment addressing methods only work for byte-identical segments of a media resource, since we assume a simple mapping between time (and potentially space, track and name) and bytes that each infrastructure element deals with. However, for some types of fragments it is impossible to maintain byte-identity and instead some sort of transcoding of the resource is necessary. In these cases, the user agent is not able to resolve the fragmentation by itself and a server interaction is required. Thus, URI queries are more appropriate, since they result in a server interaction and a different (primary) resource.
Where URI queries are used, the retrieval action has to additionally make sure to create a fully valid new resource, for example for Ogg this implies a reconstruction of Ogg headers to accurately describe the new resource (e.g. a non-zero start-time or different encoding parameters).
No new Range headers are required to execute a URI query for media fragment retrieval. The reply will not be a partial resource, though, but a 200 OK. Addition of the Content-Range header in the reply would be advantageous since it contains information on what range was actually retrievable in comparison to the URI query request. For resources that do not maintain this information (Ogg does), the browser can then determine how much to skip to display the resource from the actually requested offset.
Further it is possible to attach an additional response header called “Link” that relates the retrieved new resource to its primary resource. In this way the user agent is actually made aware of the relationship. The user agent could use an additional request to the primary resource to determine the full dimension of the complete resource. In this case, the user agent is also able to choose to display the dimensions of the primary resource or the one created by the query.
Again, taking the next step, queries can also be enabled to support existing caching Web proxies using the same X-Accept-Range-Redirect header as fragments.
Combining Fragments and Queries
A combination of a URI query for a media fragment with a URI fragment yields a URI fragment resolution on top of the newly created resource. For example, “elephants_dream/elephant.ogv?t=50,80#t=20” will lead to the 20s fragment offset being applied to the new resource, which starts at 50s and ends at 80s of the original. Thus, the reply to this is a 10s extract, covering 70s to 80s of the original resource.
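The arithmetic behind that example can be written down as a small sketch. It only handles the “t=start,end” query form and the “t=offset” fragment form used here, both in plain seconds; the function name and the placeholder base URL are assumptions for illustration.

```javascript
// Resolve a URI that carries both a temporal query and a temporal fragment
// into times on the original (primary) resource.
// Returns null if there is no "t" query parameter.
function resolveQueryThenFragment(uri) {
  // The base is a placeholder so that relative URIs can be parsed.
  const url = new URL(uri, "http://example.invalid/");
  const query = url.searchParams.get("t");
  if (query === null) {
    return null;
  }
  const parts = query.split(",");
  const queryStart = Number(parts[0]);
  const queryEnd = Number(parts[1]);
  // The fragment offset is relative to the start of the queried resource.
  const match = /(?:^|&)t=(\d+(?:\.\d+)?)/.exec(url.hash.slice(1));
  const offset = match ? Number(match[1]) : 0;
  const start = Math.min(queryStart + offset, queryEnd);
  return { start: start, end: queryEnd, duration: queryEnd - start };
}
```

For the URI from the example this gives a start of 70, an end of 80 and a duration of 10.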
Summary
If this looks all too complicated for you, don’t worry – most of this will be hidden within browsers and the infrastructure. Also, these are my current thoughts, brought together from recent discussions I had with the Media Fragments WG, so we may not end up exactly with this model. It makes sense to me and I am keen to see some implementations or further discussions happening around this.
Demo of deep hyperlinking into HTML5 video
In an effort to give a demo of some of the W3C Media Fragment WG specification capabilities, I implemented an HTML5 page with a video element that reacts to fragment offset changes to the URL bar and the <video> element.
Demo Features
The demo can be found on the Annodex Web server. It has the following features:
If you simply load that Web page, you will see the video jump to an offset because it is referred to as “elephants_dream/elephant.ogv#t=20”.
If you change or add a temporal fragment in the URL bar, the video jumps to this time offset and overrules the video’s fragment addressing. (This only works in Firefox 3.6, see below – in older Firefoxes you actually have to reload the page for this to happen.) This functionality is similar to a time linking functionality that YouTube also provides.
When you hit the “play” button on the video and let it play a bit before hitting “pause” again, the second at which you hit “pause” is displayed in the page’s URL bar. In Firefox, this even leads to an addition to the browser’s history, so you can jump back to the previous pause position.
Three input boxes allow for experimentation with different functionality.
- The first one contains a link to the current Web page with the media fragment for the current video playback position. This text is displayed for cut-and-paste purposes, e.g. to send it in an email to friends.
- The second one is an entry box which accepts float values as time offsets. Once entered, the video will jump to the given time offset. The URL of the video and the page URL will be updated.
- The third one is an entry box which accepts a video URL that replaces the <video> element’s @src attribute value. It is meant for experimentation with different temporal media fragment URLs as they get loaded into the <video> element.
Javascript Hacks
You can look at the source code of the page – all the javascript in use is actually at the bottom of the page. Here are some of the juicy bits of what I’ve done:
Since Web browsers do not support the parsing of and reaction to media fragment URIs, I implemented this in javascript. Once the video is loaded, i.e. the “loadedmetadata” event is fired on the video, I parse the video’s @currentSrc attribute and jump to a time offset if given. I use the @currentSrc, because it will be the URL that the video element is using after having parsed the @src attribute and all the contained <source> elements (if they exist). This function is also called when the video’s @src attribute is changed through javascript.
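A minimal sketch of this step could look as follows. It is not the code from the demo page: the function names parseTemporalOffset and seekToFragmentOnLoad are hypothetical, and only the simplest “#t=” form in plain seconds is handled.

```javascript
// Hypothetical helper: return the start offset in seconds from a
// "#t=" fragment, or null if the URL carries none.
function parseTemporalOffset(url) {
  const hashIndex = url.indexOf("#");
  if (hashIndex < 0) return null;
  const match = /^t=(\d+(?:\.\d+)?)/.exec(url.slice(hashIndex + 1));
  return match ? parseFloat(match[1]) : null;
}

// Seek once the metadata is available. `video` is expected to be an
// HTMLVideoElement; currentSrc is the URL the element actually chose.
function seekToFragmentOnLoad(video) {
  video.addEventListener("loadedmetadata", function () {
    const offset = parseTemporalOffset(video.currentSrc);
    if (offset !== null) {
      video.currentTime = offset;
    }
  });
}
```

In a page this would be hooked up with something like seekToFragmentOnLoad(document.querySelector("video")). Because “loadedmetadata” fires again after a new @src is loaded, the same listener also covers source changes made through javascript.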
This is the only bit from the demo that the browsers should do natively. The remaining functionality hooks up the temporal addressing for the video with the browser’s URL bar.
To display a URL in the URL bar that people can cut and paste to send to their friends, I hooked up the video’s “pause” event with an update to the URL bar. If you are jumping around through javascript calls to video.currentTime, you will also have to make these changes to the URL bar.
Finally, I am capturing the window’s “hashchange” event, which is new in HTML5 and only implemented in Firefox 3.6. This means that if you change the temporal offset on the page’s URL, the browser will parse it and jump the video to the offset time.
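The two URL bar hookups described above can be sketched roughly like this. Again this is an illustration rather than the demo’s actual source: linkVideoToUrlBar is a made-up name, and the window and video objects are passed in as parameters so that the logic can be tried outside a browser.

```javascript
// Hypothetical wiring between a video element and the page URL.
// `win` is expected to be the browser window, `video` an HTMLVideoElement.
function linkVideoToUrlBar(win, video) {
  // On pause, write the playback position into the URL bar. Assigning
  // to location.hash also creates a browser history entry.
  video.addEventListener("pause", function () {
    win.location.hash = "#t=" + video.currentTime;
  });

  // When the hash changes (edited by hand or via history), seek there.
  win.addEventListener("hashchange", function () {
    const match = /^#t=(\d+(?:\.\d+)?)/.exec(win.location.hash);
    if (!match) return;
    const offset = parseFloat(match[1]);
    // Skip the seek if the hash was just written by the pause handler.
    if (offset !== video.currentTime) {
      video.currentTime = offset;
    }
  });
}
```

In a page this would be called once as linkVideoToUrlBar(window, document.querySelector("video")). Any other javascript that sets video.currentTime directly would have to update the hash itself in the same way.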
Optimisation
Doing these kinds of jumps around on video can be very slow when the seeking is happening on the remote server. Firefox actually implements seeking over the network, which in the case of Ogg can require multiple jumps back and forth on the remote video file with byte range requests to locate the correct offset location.
To reduce as much as possible the effort that Firefox has to make with seeking, I referred to Mozilla’s very useful help page to speed up video. It is recommended to deliver the X-Content-Duration HTTP header from your Web server. For Ogg media, this can be provided through the oggz-chop CGI. Since I didn’t want to install it on my Apache server, I hard coded X-Content-Duration in a .htaccess file in the directory that serves the media file. The .htaccess file looks as follows:
<Files "elephant.ogv">
Header set X-Content-Duration "653.791"
</Files>
This should now help Firefox to avoid the extra seek necessary to determine the video’s duration and display the transport bar faster.
I also added the @autobuffer attribute to the <video> element, which should make the complete video file available to the browser and thus speed up seeking enormously since it will not need to do any network requests and can just do it on the local file.
ToDos
This is only a first and very simple demo of media fragments and video. I have not made an effort to capture any errors or to parse a URL that is more complicated than simply containing “#t=”. Feel free to report any bugs to me in the comments or send me patches.
Also, I have not made an effort to use time ranges, which are part of the W3C Media Fragment spec. This should be simple to add, since it just requires stopping the video playback at the given end time.
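One way this could be approached is sketched below. It is an untested idea rather than part of the demo: parseTimeRange and playRange are hypothetical names, times are plain seconds only, and the end time is enforced by watching the video’s “timeupdate” event, so playback may overshoot the end by a fraction of a second.

```javascript
// Hypothetical parser for "#t=start,end", "#t=start" and "#t=,end".
// Returns null for anything else.
function parseTimeRange(fragment) {
  const match = /^#?t=(\d+(?:\.\d+)?)?(?:,(\d+(?:\.\d+)?))?$/.exec(fragment);
  if (!match || (match[1] === undefined && match[2] === undefined)) {
    return null;
  }
  return {
    start: match[1] !== undefined ? parseFloat(match[1]) : 0,
    end: match[2] !== undefined ? parseFloat(match[2]) : null
  };
}

// Jump to the start of the range and pause once the end is reached.
function playRange(video, range) {
  video.currentTime = range.start;
  if (range.end === null) return; // open-ended: nothing to stop

  video.addEventListener("timeupdate", function stopAtEnd() {
    if (video.currentTime >= range.end) {
      video.pause();
      // One-shot: let the user continue playing past the end afterwards.
      video.removeEventListener("timeupdate", stopAtEnd);
    }
  });
}
```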
Also, I have only implemented parsing of the simplest default time spec in seconds and fractions. None of the more complicated npt, smpte, or clock specifications have been implemented yet.
The possibilities for deeper access to video and for improved video accessibility with these URLs are vast. Just imagine hooking up the caption elements of e.g. an srt file with temporal hyperlinks and you can provide deep interaction between the video content and the captions. You could even drive this to the extreme and jump between single words if you mark up each with its time relationship. Happy experimenting!
UPDATE: I forgot to mention that it is really annoying that the video has to be re-loaded when the @src attribute is changed, even if only the hash changes. As support for media fragments is implemented in <video> and <audio> elements, it would be advantageous if the “load()” function checked whether only the hash changed and did not re-load the full resource in these cases.
Thanks go to Chris Double and Chris Pearce from Mozilla for their feedback and suggestions for improvement on an early version of this.
