Accessibility support in Ogg and liboggplay
At the recent FOMS/LCA in Wellington, New Zealand, we talked a lot about how Ogg could support accessibility. Technically, this means support for multiple text tracks (subtitles/captions), multiple audio tracks (audio descriptions parallel to the main audio track), and multiple video tracks (sign language video parallel to the main video track).
Creating multitrack Ogg files
The creation of multitrack Ogg files is already possible using one of the muxing applications, e.g. oggz-merge. For example, I have my own little collection of multitrack Ogg files at http://annodex.net/~silvia/itext/elephants_dream/multitrack/. But then you are stranded with files that no player will play back.
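As a rough illustration, merging separately encoded tracks with oggz-merge could be scripted from Node.js as below. This is a minimal sketch, assuming oggz-merge is installed and on the PATH. Only the tool's `-o` output option is used, and the helper names and file names are hypothetical.

```javascript
// Sketch: merge separately encoded Ogg files into one multitrack Ogg file
// by running oggz-merge from Node.js. Assumes oggz-merge (part of the
// liboggz tools) is installed and on the PATH; helper names are hypothetical.
const { spawnSync } = require("child_process");

// Build the argument list: "-o <output>" followed by the input files.
function oggzMergeArgs(output, inputs) {
  if (inputs.length < 2) {
    throw new Error("need at least two input files to merge");
  }
  return ["-o", output, ...inputs];
}

// Run oggz-merge and report whether it exited successfully.
// If the tool is missing, spawnSync sets result.error and status is null.
function mergeTracks(output, inputs) {
  const result = spawnSync("oggz-merge", oggzMergeArgs(output, inputs), {
    stdio: "inherit",
  });
  return result.status === 0;
}

// Example with hypothetical file names:
// mergeTracks("multitrack.ogg", ["video.ogv", "audio_en.oga", "captions_en.ogg"]);
```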
Multitrack Ogg in Players
As Ogg is now being used by multiple Web browsers for the new HTML5 media elements, there are particular requirements for accessibility support for the hard-of-hearing and the vision-impaired. Either multitrack Ogg needs to become a more common case, or the association of external media files that provide synchronised accessibility data (captions, audio descriptions, sign language) with the main media file needs to become a standard in HTML5.
As it turns out, both of these approaches are being considered and worked on in the W3C. In the near future, accessibility data in the form of audio or video tracks will have to come from within the media resource itself, but captions and other text tracks will also be available from externally associated elements.
The availability of internal accessibility tracks in Ogg is a new use case: something Ogg has long been able to do, but which has not come into common usage. MPEG files, on the other hand, have long been used with internal accessibility tracks, so frameworks and players are in place to decode such tracks and do something sensible with them. This is not so much the case for Ogg.
For example, a current VLC build installed on Windows will display captions, because Ogg Kate support is activated. A current VLC build on any other platform, however, has Ogg Kate support deactivated in the build, so captions won’t display. This will hopefully change soon, but we also have to look beyond players into media frameworks, in particular those that are being used by the browser vendors to provide Ogg support.
Multitrack Ogg in Browsers
Hopefully gstreamer (which is what Opera uses for Ogg support) and ffmpeg (which is what Chrome uses for Ogg support) will expose all available tracks to the browser, so that the browser can offer them to the user to turn on and off. Incidentally, a multitrack media JavaScript API for this kind of control is being developed in the W3C HTML5 Accessibility Task Force.
The current version of Firefox uses liboggplay for Ogg support, but liboggplay’s multitrack support has been sketchy so far. So, Viktor Gal, the liboggplay maintainer, and I sat down at FOMS/LCA to discuss this, and Viktor developed some patches to make the demo player in the liboggplay package, the glut-player, support the accessibility use cases.
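To make the idea of turning tracks on and off a bit more concrete, here is a hypothetical sketch of the bookkeeping a script-driven track menu might do. The `{ id, kind, language, enabled }` descriptors are invented for this example and are not the API that the Task Force is drafting.

```javascript
// Hypothetical track descriptors for a script-driven track menu. They are
// invented for this example and are NOT the W3C multitrack API.
function menuLabels(tracks) {
  // One label per track, e.g. "[x] captions (en)" when enabled.
  return tracks.map((t) => `[${t.enabled ? "x" : " "}] ${t.kind} (${t.language})`);
}

function toggleTrack(tracks, id) {
  // Return a new list in which only the track with the given id is flipped.
  return tracks.map((t) => (t.id === id ? { ...t, enabled: !t.enabled } : t));
}

let tracks = [
  { id: "v0", kind: "video", language: "en", enabled: true },
  { id: "a1", kind: "audio description", language: "en", enabled: false },
  { id: "c1", kind: "captions", language: "en", enabled: false },
];
tracks = toggleTrack(tracks, "c1");
console.log(menuLabels(tracks).join("\n"));
// [x] video (en)
// [ ] audio description (en)
// [x] captions (en)
```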
I applied Viktor’s patch to my local copy of liboggplay and I am very excited to show you the screencast of glut-player playing back a video file with an audio description track and an English caption track all in sync:
Further developments
There are still important open questions: for example, how will a player know that an audio description track is to be played together with the main audio track, but a dub track (e.g. a German dub for an English video) is to be played as an alternative? Such track metadata is something that Ogg is still missing, but Ogg can be extended with it fairly easily through the Skeleton track. It is something the Xiph community is now working on.
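The distinction between additive and alternative tracks can be sketched as a small selection function. The `role` values (`main`, `dub`, `description`) are hypothetical stand-ins for whatever track metadata Skeleton ends up carrying; the point is only that a dub replaces the main audio, while a description is played on top of it.

```javascript
// Hypothetical track metadata: role "main" (original audio), "dub" (an
// alternative to the main audio) or "description" (played in addition).
function selectAudioTracks(tracks, { language, wantDescriptions }) {
  // Alternatives: exactly one of main/dub plays, preferring the user's language.
  const alternatives = tracks.filter((t) => t.role === "main" || t.role === "dub");
  const chosen =
    alternatives.find((t) => t.language === language) ||
    alternatives.find((t) => t.role === "main");
  if (!chosen) return [];
  const selected = [chosen.id];
  // Additive: a description in the chosen track's language plays on top.
  if (wantDescriptions) {
    const desc = tracks.find(
      (t) => t.role === "description" && t.language === chosen.language
    );
    if (desc) selected.push(desc.id);
  }
  return selected;
}

// Example: an English video with a Spanish dub and an English audio
// description (track ids are hypothetical).
const audio = [
  { id: "en-main", role: "main", language: "en" },
  { id: "es-dub", role: "dub", language: "es" },
  { id: "en-desc", role: "description", language: "en" },
];
console.log(selectAudioTracks(audio, { language: "en", wantDescriptions: true }).join(","));
// en-main,en-desc
console.log(selectAudioTracks(audio, { language: "es", wantDescriptions: false }).join(","));
// es-dub
```

If no track matches the preferred language, the sketch falls back to the main audio rather than playing nothing.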
Summary
This is great progress towards accessibility support in Ogg and therefore in Web browsers. And there is more to come soon.
Audio Track Accessibility for HTML5
I have talked a lot about synchronising multiple tracks of audio and video content recently. The main reason is that I foresee a need for more than the usual two parallel tracks of audio and video, such as audio descriptions for the vision-impaired or dub tracks for internationalisation, as well as sign language tracks for the hard-of-hearing.
It is almost impossible to introduce a good scheme to deliver the right video composition to every target audience. Most viewers will prefer plain audio and video, the vision-impaired would probably prefer only audio plus audio descriptions (but will probably take the video), and the hard-of-hearing will prefer video plus captions and possibly a sign language track. While it is possible to dynamically create files that contain such tracks on a server and then deliver the right composition, such server-side methods have not been very successful in recent years, and it would likely take many years to roll out the new infrastructure they require.
So, the only other option we have is to synchronise completely separate media resources as they are selected by the audience.
It is this need that this HTML5 accessibility demo is about: Check out the demo of multiple media resource synchronisation.
I created an Ogg video with only a video track (10m53.750s). Then I created an audio track that is the original English audio track (10m53.696s). Then I used a Spanish dub track that I found through BlenderNation as an alternative audio track (10m58.337s). Lastly, I created an audio description track in the original language (10m53.706s). This gives a video track with three optional audio tracks.
I removed all native controls from the HTML5 audio and video elements and implemented my own stop/play and seeking controls, which handle all media elements in one go.
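Controls of this kind can be sketched as a small group object that forwards every command to all elements. This is a minimal sketch, not the demo’s actual code: `MediaGroup` is a hypothetical name, and, as in the demo, the first element (the video) is treated as the timeline that the others are aligned to when playback (re)starts.

```javascript
// Sketch of one set of controls driving several media elements together.
// Works with <video>/<audio> elements or any object exposing play(),
// pause() and a writable currentTime. The first element is the timeline.
class MediaGroup {
  constructor(elements) {
    this.elements = elements;
  }

  play() {
    // Align the followers to the first element before starting, as the demo
    // does after a pause; play() may return a Promise in newer browsers.
    const [master, ...followers] = this.elements;
    followers.forEach((el) => {
      el.currentTime = master.currentTime;
    });
    return Promise.all(this.elements.map((el) => el.play()));
  }

  pause() {
    this.elements.forEach((el) => el.pause());
  }

  seek(seconds) {
    this.elements.forEach((el) => {
      el.currentTime = seconds;
    });
  }
}

// In a page (element ids and playButton are hypothetical):
// const group = new MediaGroup(["video", "audio_en"].map((id) => document.getElementById(id)));
// playButton.onclick = () => group.play();
```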
I was mostly interested in the quality of this experience. Would the different media files stay mostly in sync? They are normally decoded in different threads, so how big would the drift be?
The resulting page is the basis for such experiments with synchronisation.
The page prints the current playback position of all the media files at a constant interval of 500ms. Note that when you pause and then play again, I re-sync the audio tracks with the video track, but not when you just let the files play through.
I let the files play through on my rather busy Macbook and observed the following interesting drift over the course of about 9 minutes:
You will see that the video was the slowest, reaching only roughly 540s, while the Spanish dub reached 560s in the same time.
To fix such drifts, you can always include regular re-synchronisation points in the video playback. For example, you could set a timer to re-sync every 500ms. Within such a short time, it is almost impossible to notice a drift. Don’t reload the video, because that will lead to visual artifacts. But do use the video’s currentTime to reset the others. (UPDATE: Actually, which track is the best choice for the main timeline depends on your situation. See also comments below.)
It is a workable way of associating arbitrary numbers of media tracks with videos, in particular in situations where the creation of merged files cannot easily be included in a workflow.
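A minimal sketch of such a position report, plus the spread between the furthest-ahead and furthest-behind element, might look like this. `describePositions` and `maxSpread` are hypothetical helpers that accept any objects with a `currentTime` property, such as the page’s media elements.

```javascript
// Sketch of the demo's position report. `named` maps a label to any object
// with a currentTime in seconds (e.g. the page's media elements).
function describePositions(named) {
  return Object.entries(named)
    .map(([label, el]) => `${label}: ${el.currentTime.toFixed(2)}s`)
    .join(" | ");
}

// Spread between the furthest-ahead and furthest-behind element.
function maxSpread(named) {
  const times = Object.values(named).map((el) => el.currentTime);
  return Math.max(...times) - Math.min(...times);
}

// In a page this would be polled every 500 ms (element ids hypothetical):
// setInterval(() => {
//   const named = { video: document.getElementById("video"),
//                   english: document.getElementById("audio_en") };
//   console.log(describePositions(named), "spread:", maxSpread(named).toFixed(3));
// }, 500);
```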
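To put these numbers in perspective, here is the arithmetic as a small script. It assumes the elapsed wall-clock time was 540 s (the text only says "about 9 minutes") and asks how much drift could build up between two re-syncs if the page re-synced every 500ms, the interval suggested in the next paragraph.

```javascript
// Worked numbers from the drift run: after roughly 540 s of wall-clock time
// (the text says "about 9 minutes"; 540 s is an assumption), the video had
// reached about 540 s and the Spanish dub about 560 s.
function driftRate(aheadPosition, behindPosition, elapsedSeconds) {
  // Seconds of divergence per second of wall-clock time.
  return (aheadPosition - behindPosition) / elapsedSeconds;
}

function driftPerInterval(rate, intervalMs) {
  // Divergence (in ms) that can accumulate between two re-syncs.
  return rate * intervalMs;
}

const rate = driftRate(560, 540, 540);
const perResync = driftPerInterval(rate, 500);
console.log(`${(rate * 100).toFixed(1)} % drift, ${perResync.toFixed(1)} ms per 500 ms`);
// 3.7 % drift, 18.5 ms per 500 ms
```

Assuming the drift grows roughly linearly, the tracks would only be on the order of 20 ms apart by the time the next re-sync happens.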
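The re-sync timer described above could be sketched as follows. This is a hedged sketch rather than the demo’s code: the `toleranceSeconds` threshold is an assumption added here so that followers that are already close are left alone, and the `master` parameter lets you pick whichever track suits your situation as the main timeline.

```javascript
// Sketch of periodic re-synchronisation: the master's currentTime is copied
// onto any follower that has drifted by more than toleranceSeconds (an
// assumed threshold, so that close followers are not seeked needlessly).
function resyncOnce(master, followers, toleranceSeconds = 0.1) {
  let corrected = 0;
  for (const el of followers) {
    if (Math.abs(el.currentTime - master.currentTime) > toleranceSeconds) {
      el.currentTime = master.currentTime; // seek the follower, never reload it
      corrected += 1;
    }
  }
  return corrected;
}

// Re-sync every intervalMs while the master is playing; returns a stop function.
function startResync(master, followers, intervalMs = 500, toleranceSeconds = 0.1) {
  const timer = setInterval(() => {
    if (!master.paused) resyncOnce(master, followers, toleranceSeconds);
  }, intervalMs);
  return () => clearInterval(timer);
}

// In a page (element ids hypothetical):
// const stop = startResync(document.getElementById("video"),
//                          [document.getElementById("audio_en")]);
```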
Video Streaming from Linux.conf.au
You probably heard it already: Linux.conf.au is live streaming its video in a Microsoft proprietary format.
Fortunately, there is now a re-broadcast that you can get in an open format from http://stream.v2v.cc:8000/ . It comes from a server in Europe, but relies on transcoding here in New Zealand, so it may not be completely reliable.
UPDATE: A second server is now also available from the US at http://repeater.xiph.org:8000/.
Today, the down under open source / Linux conference linux.conf.au in Wellington started with the announcement that every talk and mini-conf will be live streamed to the Internet and later published online. That’s an awesome achievement!
However, minutes after the announcement, I was very disappointed to find out that the streams are actually provided in a proprietary format and through a proprietary streaming protocol: a Microsoft streaming service that provides Windows media streams.
Why stream an open source conference in a proprietary format with proprietary software? If we cannot use our own technologies for our own conferences, how will we get the rest of the world to use them?
I must say, I am personally embarrassed, because I was part of several audio/video teams of previous LCAs that have managed to record and stream content in open formats and with open media software. I would have helped get this going, but wasn’t aware of the situation.
I am also the main organiser of the FOMS Workshop (Foundations of Open Media Software) that ran the week before LCA and brought some of the core programmers in open media software to Wellington, most of whom are also attending LCA. We have the brains here and should be able to get this going.
Fortunately, the published content will be made available in Ogg Theora/Vorbis. So, it’s only the publicly available stream that I am concerned about.
Speaking with the organisers, I can somewhat understand how this came to be. They took the “easy” way of delegating the video work to an external company. Even though this company is an expert in open source and networking, their media streaming customers all use Flash or Windows Media software, which are the current de-facto standards and provide extra features such as DRM. It seems that, apart from linux.conf.au, nobody had yet asked them to stream Ogg Theora/Vorbis. Their existing infrastructure includes CDN distribution, and CDN providers typically don’t provide Ogg Theora/Vorbis support or Icecast streaming.
So, this is actually a problem rooted in setting up streaming through a professional service rather than through the community. At other events, this was done by getting together a group of volunteers who provided streaming reflectors for free. In this way, a community-created CDN is built that can deal with the streams. That there are no professional CDN providers yet that provide Icecast support is a sign that there is a gap in the market.
But phear not – a few of the FOMS folk got together to fix the situation.
It involved setting up Icecast streams for each room’s video stream. Since there is no access to the raw video stream, there is a need to transcode the video from proprietary codecs to the open Ogg Theora/Vorbis format.
To do this legally, a purchase of the codec libraries from Fluendo was necessary, which cost a whopping EURO 28 and covers all the necessary patent licenses. The glue to get the videos from mms to icecast streams is a GStreamer pipeline which I leave others to talk about.
Now that we have all the streams from the conference available as Ogg Theora/Vorbis streams, we can also publish them in HTML5 video elements. Check out this Web page, which has all the video streams together on a single page. Note that the connections may be a bit dodgy and some drop-outs may occur.
Further, let me recommend the Multimedia Miniconf at linux.conf.au, which will take place tomorrow, Tuesday 19th January. The Miniconf has decided to add a talk about “How to stream your conference with open codecs” to help educate potential future conference organisers and point out the software that helps solve these issues.
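Generating such a page is mostly string building. The sketch below assumes one Icecast mount point per room; the mount names and room titles are hypothetical, and only the server address `http://stream.v2v.cc:8000/` comes from this post.

```javascript
// Sketch: one HTML5 <video> element per conference room stream. The mount
// point names and room titles are hypothetical; only the server address
// http://stream.v2v.cc:8000/ comes from the post.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/"/g, "&quot;");
}

function streamPage(baseUrl, rooms) {
  return rooms
    .map((room) => {
      // Resolve the mount point against the Icecast server address.
      const src = new URL(room.mount, baseUrl).href;
      return `<h2>${escapeHtml(room.title)}</h2>\n` +
        `<video src="${escapeHtml(src)}" controls></video>`;
    })
    .join("\n");
}

console.log(streamPage("http://stream.v2v.cc:8000/", [
  { title: "Room 1", mount: "room1.ogv" },
  { title: "Room 2", mount: "room2.ogv" },
]));
```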
UPDATE: I should have stated that I didn’t actually do any of the technical work: it was all done by Ralph Giles, Jan Gerber, and Jan Schmidt.
FOMS and LCA Multimedia Miniconf
If you haven’t proposed a presentation yet, go ahead and register yourself for:
FOMS (Foundations of Open Media Software workshop) at
http://www.foms-workshop.org/foms2010/pmwiki.php/Main/CFP
LCA Multimedia Miniconf at
http://www.annodex.org/events/lca2010_mmm/pmwiki.php/Main/CallForP
It’s already November and there’s only Christmas between now and the conferences!
I’m personally hoping for many discussions about HTML5 <video> and <audio>, including what to do with multitrack files, with cue ranges, and captions. These should also be relevant to other open media frameworks – e.g. how should we all handle multitrack sign language tracks?
But there are heaps of other topics to discuss, and anyone doing any work with open media software will find fruitful discussions at FOMS.
First experiments with itext
My accessibility work for Mozilla is showing first results.
I have now implemented a demo for the previously proposed <itext> element. During the development process, the specification became more concrete.
I’m sure you’re keen to check out the demo.
Please note the following features of the demo:
- It experiments with four different types of time-aligned text: subtitles, captions, chapters, and textual audio annotations.
- It extends the video controls by a menu button for the time-aligned text tracks. This enables the user to switch between different languages for the different tracks.
- The textual audio annotations are mapped into an aria-live activated div element, such that they are indeed read out by screen-readers; this div sits behind the video, invisible to everyone else.
- The chapters are displayed as text on top of the video.
- The subtitles and captions are displayed as overlays at the bottom of the video.
- The display styles and positions are meant to be default display mechanisms for these kinds of tracks, which a Web developer could override in a stylesheet to place the text elsewhere on screen.
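The per-track behaviour described in the list above boils down to finding the cue that is active at the current playback time and writing its text into a region, such as the aria-live div. This is a hypothetical sketch: the `{ start, end, text }` cue shape and the helper names are assumptions, not part of the `<itext>` proposal.

```javascript
// Sketch: keep a text region in step with the video for one time-aligned
// track (captions, chapters or textual audio annotations). Cues are plain
// { start, end, text } objects in seconds; this shape is hypothetical.
function activeCueText(cues, time) {
  const cue = cues.find((c) => time >= c.start && time < c.end);
  return cue ? cue.text : "";
}

// `region` is any element-like object with textContent, e.g. the aria-live
// div behind the video. Writing only on change avoids needlessly touching
// the live region on every timeupdate event.
function updateRegion(region, cues, time) {
  const text = activeCueText(cues, time);
  if (region.textContent !== text) {
    region.textContent = text;
    return true;
  }
  return false;
}

// Browser wiring (element ids are hypothetical):
// video.addEventListener("timeupdate", () =>
//   updateRegion(document.getElementById("annotations"), cues, video.currentTime));
```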
In order to “hear” the textual audio annotations work, you will need to install a screen reader such as JAWS, NVDA, or the firevox plugin on the Mac.
As far as I am aware, this is the first demo of HTML5 video accessibility that includes support for the vision-impaired, hearing-impaired, and also for foreign language speakers.
There have been initial discussions about this proposal, the results of which are captured in the wiki page. I expect a lot more heated discussion will happen on the WHATWG mailing list when I post it soon. I am well aware that most of the JavaScript API will probably need to be changed, and also some of the HTML.
Also please note that there are still some bugs left in the software, which should not inhibit the discussion at this stage. We will definitely develop a newer and better version.
I am particularly proud that I was able to make this work in the experimental builds of Opera and Chrome, as well as in Safari with XiphQT installed, and of course in Firefox 3.5.
YouTube Ogg Theora+Vorbis & H.263/H.264 comparison
On Jun 13th 2009 Chris DiBona of Google claimed on the WhatWG mailing list:
“If were to switch to theora and maintain even a semblance of the current youtube quality it would take up most available bandwidth across the Internet.”
Everyone who has ever encoded an Ogg Theora/Vorbis file and, in parallel, encoded one with another codec will immediately protest. It is sad that even the best people fall for FUD spread by the unenlightened or by those who have their own agenda.
Fortunately, Gregory Maxwell from Wikipedia came to the rescue and did an actual “YouTube / Ogg/Theora comparison”. It’s a good read, although it compares only one video. He has posted his instructions, so anyone can repeat the comparison for themselves. You will have to start with a pretty good quality video, though, to see such differences.
Sites with Ogg in HTML5 video tag
Yesterday, somebody mentioned that the HTML5 video tag with Ogg Theora/Vorbis can be played back in Safari if you have XiphQT installed (btw: the 0.1.9 release of XiphQT is upcoming). So, today I thought I should give it a quick test. It indeed works straight through the QuickTime framework, so the player looks like a QuickTime player. So, by now, Firefox 3.5, Chrome, Safari with XiphQT, and experimental builds of Opera support Ogg Theora/Vorbis inside the HTML5 video tag. Now we just need somebody to write some ActiveX controls for the Xiph DirectShow Filters and it might even work in IE.
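Before relying on Ogg in the video tag, a page can ask the browser whether it expects to be able to play the format. `canPlayType()` is part of the HTML5 media element API and answers "probably", "maybe" or the empty string; the wrapper function here is a hypothetical helper.

```javascript
// Sketch: detect Ogg Theora/Vorbis support before relying on it.
// canPlayType() returns "probably", "maybe" or "" (empty string for "no").
function oggSupport(videoElement) {
  const answer = videoElement.canPlayType('video/ogg; codecs="theora, vorbis"');
  return answer === "probably" || answer === "maybe";
}

// In a page:
// if (!oggSupport(document.createElement("video"))) { /* offer a download link instead */ }
```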
While doing my testing, I needed to go to some sites that actually use Ogg Theora/Vorbis in HTML5 video tags. Here is a list that I came up with in no particular order:
- Chris Double’s Tinyvid
- Dailymotion’s Open Video Demo (restricted to Firefox 3.5)
- Michael Dale and Aphid’s Metavid
- Archive.org’s videos
- Wikipedia’s videos
- the FOMS workshop videos
I’m sure there’s a lot more out there – feel free to post links in the comments.
Firefox plugin to encode Ogg video
Michael Dale just posted this to theora-dev. Go to one of the given URLs to install the Firefox plugin that lets you transcode video to Ogg using your Web browser.
Firefogg is developed by Jan Gerber and lives at http://www.firefogg.org/. There is a JavaScript API available, so you can make use of Firefogg in your own Website project to allow people to upload any video and transcode it to Ogg on the fly.
Enjoy!
On Fri, Jun 5, 2009 at 7:08 AM, Michael Dale
> I mentioned it in the #theora channel a few days ago but here it is with
> a more permanent url:
>
> http://www.firefogg.org/make/advanced.html
> &
> http://www.firefogg.org/make/
>
> These will be simple links you can send people so that they can encode
> source footage to a local ogg video file with the latest and greatest
> ogg encoders (presently thusnelda and vorbis). Updates to thusnelda and
> possible other free codecs will be pushed out via firefogg updates
>
> Pass along any feedback if things break or what not.
>
> I am also doing testing with “embed” these encoder interface. For those
> familiar with jQuery: an example to rewrite all your file inputs with
> firefogg enhanced inputs: $("input[type='file']").firefogg() … Feel
> free to experiment based on those examples. The form rewrite has mostly
> only been tested in the mediaWiki context:
> http://sandbox.kaltura.com/testwiki/index.php/Special:Upload
> but with minor hacking should work elsewhere
>
> enjoy
> –michael
>
> _______________________________________________
> theora mailing list
> theora@xiph.org
> http://lists.xiph.org/mailman/listinfo/theora
>
FOMS 2009: video introductions available
In January this year we had the third Foundations of Open Media Software workshop for developers. The focus this year was on legal issues around codecs, Xiph and Web video (HTML5 video and video servers), authoring/editing software, and accessibility. Check out the complete set of areas of concern and community goals that we decided upon.
As every year, at the beginning of the workshop every participant provided a 5 min introduction about their field of speciality and the current challenges. These are video recorded and shared with the community.
The videos and accompanying slides have been available for about 2 months now, but I haven’t gotten around to blogging about it – apologies everyone! So, here are your star videos in reverse alphabetic order published using open source video software only:
- Viktor Gal, Xiph / Annodex liboggplay
- Timothy Terriberry, Xiph – Theora codec
- Silvia Pfeiffer, Annodex/Xiph – video a11y
- Shane Stephens, Google – liboggplay
- Robin Gareus, linuxaudio.org
- Rob Savoye, Gnash
- Peter Ross, Xvid & FFmpeg
- Michael Dale, Wikipedia & Metavid
- Jan Gerber, Xiph hacker
- Edward Hervey, Collabora – PiTiVi
- Conrad Parker, Annodex/Xiph hacker
- Charles McCathieNevile, Opera
- Benjamin Otte, swfdec
- Anuradha Suraparaju, BBC – Dirac codec
Enjoy!
FFMPEG release
Quick Press: the awesome guys from FFmpeg have made an official release this week. The days of pain in compiling and packaging FFmpeg have come to an end. FFmpeg is used by many Web video sites to provide backend transcoding; AFAIK that includes YouTube. I use FFmpeg for all my transcoding needs and it has never let me down. Open media software to the win!

