ginger's thoughts

Silvia's blog

HTML5 Media and Accessibility presentation

Posted in Digital Media, Open Source, open codecs, standards, video accessibility by silvia on the February 23rd, 2010

Today, I was invited to give a talk at my old workplace CSIRO about the HTML5 media elements and accessibility.

A lot of the things that have gone into Ogg and that are now being worked on in the W3C in different working groups – including the Media Fragments and HTML5 WGs – were also of concern in the Annodex project that I worked on while at CSIRO. So I was rather excited to be able to report back about the current status in HTML5 and where we’re at with accessibility features.

Check out the presentation here. It contains a good collection of links to exciting demos of what is possible with the new HTML5 media elements when combined with other HTML features.

I tried something now with this presentation: I wrote it in a tool called S5, which makes use only of HTML features for the presentation. It was quite a bit slower than I expected, e.g. reloading a page always included having to navigate to that page. Also, it’s not easily possible to do drawings, unless you are willing to code them all up in HTML. But otherwise I have found it very useful for, in particular, including all the used URLs and video element demos directly in the slides. I was inspired with using this tool by Chris Double’s slides from LCA about implementing HTML 5 video in Firefox.

Google’s challenges of freeing VP8

Posted in Digital Media, Open Source, open codecs, standards by silvia on the February 20th, 2010

Since On2 Technology’s stockholders have approved the merger with Google, there are now first requests to Google to open up VP8.

I am sure Google is thinking about it. But … what does “it” mean?

Freeing VP8
Simply open sourcing it and making it available under a free license doesn’t help. That just provides open source code for a codec where relevant patents are held by a commercial entity and any other entity using it would still need to be afraid of using that technology, even if it’s use is free.

So, Google has to make the patents that relate to VP8 available under an irrevocable, royalty-free license for the VP8 open source base, but also for any independent implementations of VP8. This at least guarantees to any commercial entity that Google will not pursue them over VP8 related patents.

Now, this doesn’t mean that there are no submarine or unknown patents that VP8 infringes on. So, Google needs to also undertake an intensive patent search on VP8 to be able to at least convince themselves that their technology is not infringing on anyone else’s. For others to gain that confidence, Google would then further have to indemnify anyone who is making use of VP8 for any potential patent infringement.

I believe – from what I have seen in the discussions at the W3C – it would only be that last step that will make companies such as Apple have the confidence to adopt a “free” codec.

An alternative to providing indemnification is the standardisation of VP8 through an accepted video standardisation body. That would probably need to be ISO/MPEG or SMPTE, because that’s where other video standards have emerged and there are a sufficient number of video codec patent holders involved that a royalty-free publication of the standard will hold a sufficient number of patent holders “under control”. However, such a standardisation process takes a long time. For HTML5, it may be too late.

Technology Challenges
Also, let’s not forget that VP8 is just a video codec. A video codec alone does not encode a video. There is a need for an audio codec and a encapsulation format. In the interest of staying all open, Google would need to pick Vorbis as the audio codec to go with VP8. Then there would be the need to put Vorbis and VP8 in a container together – this could be Ogg or MPEG or QuickTime’s MOOV. So, apart from all the legal challenges, there are also technology challenges that need to be mastered.

It’s not simple to introduce a “free codec” and it will take time!

Google and Theora
There is actually something that Google should do before they start on the path of making VP8 available “for free”: They should formulate a new license agreement with Xiph (and the world) over VP3 and Theora. Right now, the existing license that was provided by On2 Technologies to Theora (link is to an early version of On2’s open source license of VP3) was only for the codebase of VP3 and any modifications of it, but doesn’t in an obvious way apply to an independent re-implementations of VP3/Theora. The new agreement between Google and Xiph should be about the patents and not about the source code. (UPDATE: The actual agreement with Xiph apparently also covers re-implementations – see comments below.)

That would put Theora in a better position to be universally acceptable as a baseline codec for HTML5. It would allow, e.g. Apple to make their own implementation of Theora – which is probably what they would want for ipods and iphones. Since Firefox, Chrome, and Opera already support Ogg Theora in their browsers using the on2 licensed codebase, they must have decided that the risk of submarine patents is low. So, presumably, Apple can come to the same conclusion.

Free codecs roadmap
I see this as the easiest path towards getting a universally acceptable free codec. Over time then, as VP8 develops into a free codec, it could become the successor of Theora on a path to higher quality video. And later still, when the Internet will handle large resolution video, we can move on to the BBC’s Dirac/VC2 codec. It’s where the future is. The present is more likely here and now in Theora.


ADDITION:
Please note the comments from Monty from Xiph and from Dan, ex-On2, about the intent that VP3 was to be completely put into the hands of the community. Also, Monty notes that in order to implement VP3, you do not actually need any On2 patents. So, there is probably not a need for Google to refresh that commitment. Though it might be good to reconfirm that commitment.

Accessibility support in Ogg and liboggplay

Posted in Digital Media, FOMS, LCA, Open Source, code, open codecs, standards, video accessibility by silvia on the February 19th, 2010

At the recent FOMS/LCA in Wellington, New Zealand, we talked a lot about how Ogg could support accessibility. Technically, this means support for multiple text tracks (subtitles/captions), multiple audio tracks (audio descriptions parallel to main audio track), and multiple video tracks (sign language video parallel to main video track).

Creating multitrack Ogg files
The creation of multitrack Ogg files is already possible using one of the muxing applications, e.g. oggz-merge. For example, I have my own little collection of multitrack Ogg files at http://annodex.net/~silvia/itext/elephants_dream/multitrack/. But then you are stranded with files that no player will play back.

Multitrack Ogg in Players
As Ogg is now being used in multiple Web browsers in the new HTML5 media formats, there are in particular requirements for accessibility support for the hard-of-hearing and vision-impaired. Either multitrack Ogg needs to become more of a common case, or the association of external media files that provide synchronised accessibility data (captions, audio descriptions, sign language) to the main media file needs to become a standard in HTML5.

As it turn out, both these approaches are being considered and worked on in the W3C. Accessibility data that are audio or video tracks will in the near future have to come out of the media resource itself, but captions and other text tracks will also be available from external associated elements.

The availability of internal accessibility tracks in Ogg is a new use case – something Ogg has been ready to do, but has not gone into common usage. MPEG files on the other hand have for a long time been used with internal accessibility tracks and thus frameworks and players are in place to decode such tracks and do something sensible with them. This is not so much the case for Ogg.

For example, a current VLC build installed on Windows will display captions, because Ogg Kate support is activated. A current VLC build on any other platform, however, has Ogg Kate support deactivated in the build, so captions won’t display. This will hopefully change soon, but we have to look also beyond players and into media frameworks – in particular those that are being used by the browser vendors to provide Ogg support.

Multitrack Ogg in Browsers
Hopefully gstreamer (which is what Opera uses for Ogg support) and ffmpeg (which is what Chrome uses for Ogg support) will expose all available tracks to the browser so they can expose them to the user for turning on and off. Incidentally, a multitrack media JavaScript API is in development in the W3C HTML5 Accessibility Task Force for allowing such control.

The current version of Firefox uses liboggplay for Ogg support, but liboggplay’s multitrack support has been sketchy this far. So, Viktor Gal – the liboggplay maintainer – and I sat down at FOMS/LCA to discuss this and Viktor developed some patches to make the demo player in the liboggplay package, the glut-player, support the accessibility use cases.

I applied Viktor’s patch to my local copy of liboggplay and I am very excited to show you the screencast of glut-player playing back a video file with an audio description track and an English caption track all in sync:

Further developments
There are still important questions open: for example, how will a player know that an audio description track is to be played together with the main audio track, but a dub track (e.g. a German dub for an English video) is to be played as an alternative. Such metadata for the tracks is something that Ogg is still missing, but that Ogg can be extended with fairly easily through the use of the Skeleton track. It is something the Xiph community is now working on.

Summary
This is great progress towards accessibility support in Ogg and therefore in Web browsers. And there is more to come soon.

Audio Track Accessibility for HTML5

Posted in Digital Media, FOMS, Open Source, code, open codecs, standards, video accessibility by silvia on the February 12th, 2010

I have talked a lot about synchronising multiple tracks of audio and video content recently. The reason was mainly that I foresee a need for more than two parallel audio and video tracks, such as audio descriptions for the vision-impaired or dub tracks for internationalisation, as well as sign language tracks for the hard-of-hearing.

It is almost impossible to introduce a good scheme to deliver the right video composition to a target audience. Common people will prefer bare a/v, vision-impaired would probably prefer only audio plus audio descriptions (but will probably take the video), and the hard-of-hearing will prefer video plus captions and possibly a sign language track . While it is possible to dynamically create files that contain such tracks on a server and then deliver the right composition, implementation of such a server method has not been very successful in the last years and it would likely take many years to roll out such new infrastructure.

So, the only other option we have is to synchronise completely separate media resource together as they are selected by the audience.

It is this need that this HTML5 accessibility demo is about: Check out the demo of multiple media resource synchronisation.

I created a Ogg video with only a video track (10m53s750). Then I created an audio track that is the original English audio track (10m53s696). Then I used a Spanish dub track that I found through BlenderNation as an alternative audio track (10m58s337). Lastly, I created an audio description track in the original language (10m53s706). This creates a video track with three optional audio tracks.

I took away all native controls from these elements when using the HTML5 audio and video tag and ran my own stop/play and seeking approaches, which handled all media elements in one go.

I was mostly interested in the quality of this experience. Would the different media files stay mostly in sync? They are normally decoded in different threads, so how big would the drift be?

The resulting page is the basis for such experiments with synchronisation.

The page prints the current playback position in all of the media files at a constant interval of 500ms. Note that when you pause and then play again, I am re-synching the audio tracks with the video track, but not when you just let the files play through.

I have let the files play through on my rather busy Macbook and have achieved the following interesting drift over the course of about 9 minutes:

Drift between multiple parallel played media elements

You will see that the video was the slowest, only doing roughly 540s, while the Spanish dub did 560s in the same time.

To fix such drifts, you can always include regular re-synchronisation points into the video playback. For example, you could set a timeout on the playback to re-sync every 500ms. Within such a short time, it is almost impossible to notice a drift. Don’t re-load the video, because it will lead to visual artifacts. But do use the video’s currentTime to re-set the others. (UPDATE: Actually, it depends on your situation, which track is the best choice as the main timeline. See also comments below.)

It is a workable way of associating random numbers of media tracks with videos, in particular in situations where the creation of merged files cannot easily be included in a workflow.

Tutorial on HTML5 open video at LCA 2010

Posted in Digital Media, LCA, Open Source, code, open codecs, standards, video accessibility by silvia on the January 26th, 2010

During last week’s LCA, Jan Gerber, Michael Dale and I gave a 3 hour tutorial on how to publish HTML5 video in an open format.

We basically taught people how to create and publish Ogg Theora video in HTML5 Web pages and how to make them work across browsers, including much of the available tools and libraries. We’re hoping that some people will have learnt enough to include modules in CMSes such as Drupal, Joomla and Wordpress, which will easily support the publishing of Ogg Theora.

I have been asked to share the material that we used. It consists of:

Note that if you would like to walk through the exercises, you should install the following software beforehand:

You might need to look for packages of your favourite OS (e.g. Windows or Mac, Ubuntu or Debian).

The exercises include:

  • creating a Ogg video from an editor
  • transcoding a video using http://firefogg.org/
  • creating a poster image using OggThumb
  • writing a first HTML5 video Web page with Ogg Theora
  • publishing it on a Web Server, with correct MIME type & Duration hint
  • writing a second HTML5 video Web page with Ogg Theora & MP4 to cover Safari/Webkit
  • transcoding using ffmpeg2theora in a script
  • writing a third HTML5 video Web page with Cortado fallback
  • writing a fourth Web page using “Video for Everybody”
  • writing a fifth Web page using “mwEmbed”
  • writing a sixth Web page using firefogg for transcoding before upload
  • and a seventh one with a progress bar
  • encoding srt subtitles into an Ogg Kate track
  • writing an eighth Web page using cortado to display the Ogg Kate track

For those that would like to see the slides here immediately, a special flash embed:

Enjoy!

Video Streaming from Linux.conf.au

Posted in Digital Media, FOMS, LCA, Open Source, open codecs, standards by silvia on the January 18th, 2010

You probably heard it already: Linux.conf.au is live streaming its video in a Microsoft proprietary format.

Fortunately, there is now a re-broadcast that you can get in an open format from http://stream.v2v.cc:8000/ . It comes from a server in Europe, but relies on transcoding here in New Zealand, so it may not be completely reliable.

UPDATE: A second server is now also available from the US at http://repeater.xiph.org:8000/.

Today, the down under open source / Linux conference linux.conf.au in Wellington started with the announcement that every talk and mini-conf will be live streamed to the Internet and later published online. That’s an awesome achievement!

However, minutes after the announcement, I was very disappointed to find out that the streams are actually provided in a proprietary format and through a proprietary streaming protocol: a Microsoft streaming service that provides Windows media streams.

Why stream an open source conference in a proprietary format with proprietary software? If we cannot use our own technologies for our own conferences, how will we get the rest of the world to use them?

I must say, I am personally embarrassed, because I was part of several audio/video teams of previous LCAs that have managed to record and stream content in open formats and with open media software. I would have helped get this going, but wasn’t aware of the situation.

I am also the main organiser of the FOMS Workshop (Foundations of Open Media Software) that ran the week before LCA and brought some of the core programmers in open media software into Wellington, most of which are also attending LCA. We have the brains here and should be able to get this going.

Fortunately, the published content will be made available in Ogg Theora/Vorbis. So, it’s only the publicly available stream that I am concerned about.

Speaking with the organisers, I can somewhat understand how this came to be. They took the “easy” way of delegating the video work to an external company. Even though this company is an expert in open source and networking, their media streaming customers are all using Flash or Windows media software, which are current de-facto standards and provide extra features such as DRM. It seems apart from linux.conf.au there were no requests on them for streaming Ogg Theora/Vorbis yet. Their existing infrastructure includes CDN distribution and CDN providers certainly typically don’t provide Ogg Theora/Vorbis support or Icecast streaming.

So, this is actually a problem founded in setting up streaming through a professional service rather than through the community. The way in which this was set up at other events was to get together a group of volunteers that provided streaming reflectors for free. In this way, a community-created CDN is built that can deal with the streams. That there are no professional CDN providers available yet that provide Icecast support is a sign that there is a gap in the market.

But phear not – a few of the FOMS folk got together to fix the situation.

It involved setting up Icecast streams for each room’s video stream. Since there is no access to the raw video stream, there is a need to transcode the video from proprietary codecs to the open Ogg Theora/Vorbis format.

To do this legally, a purchase of the codec libraries from Fluendo was necessary, which cost a whopping EURO 28 and covers all the necessary patent licenses. The glue to get the videos from mms to icecast streams is a GStreamer pipeline which I leave others to talk about.

Now, we have all the streams from the conference available as Ogg Theora/Video streams, we can also publish them in HTML5 video elements. Check out this Web page which has all the video streams together on a single page. Note that the connections may be a bit dodgy and some drop-outs may occur.

Further, let me recommend the Multimedia Miniconf at linux.conf.au, which will take place tomorrow, Tuesday 19th January. The Miniconf has decided to add a talk about “How to stream you conference with open codecs” to help educate any potential future conference organisers and point out the software that helps solve these issues.

UPDATE: I should have stated that I didn’t actually do any of the technical work: it was all done by Ralph Giles, Jan Gerber, and Jan Schmidt.

FOMS and LCA Multimedia Miniconf

Posted in Digital Media, FOMS, LCA, Open Source, open codecs, standards, video accessibility by silvia on the November 11th, 2009

If you haven’t proposed a presentation yet, got ahead and register yourself for:

FOMS (Foundations of Open Media Software workshop) at
http://www.foms-workshop.org/foms2010/pmwiki.php/Main/CFP

LCA Multimedia Miniconf at
http://www.annodex.org/events/lca2010_mmm/pmwiki.php/Main/CallForP

It’s already November and there’s only Christmas between now and the conferences!

I’m personally hoping for many discussions about HTML5 <video> and <audio>, including what to do with multitrack files, with cue ranges, and captions. These should also be relevant to other open media frameworks – e.g. how should we all handle multitrack sign language tracks?

But there are heaps of other topics to discuss and anyone doing any work with open media software will find a fruitful discussions at FOMS.

Cortado 0.5.0 released

Posted in Digital Media, Open Source, open codecs by silvia on the October 28th, 2009

Cortado is a java applet that provides support for Ogg Theora/Vorbis to Web publishers. It’s particularly useful to publishers that want to use Ogg Theora/Vorbis in Browsers that do not yet support the HTML5 video element with Ogg.

Cortado was originally developed by Fluendo SA under a LGPL license and contains a re-implementation of Theora and Vorbis in Java (jheora and jcraft). After a few years of low maintenance, the Wikimedia Foundation took it in their hands to undust the code for their use in the Wikimedia Commons, where only unencumberd open video format are acceptable.

As Ralph states in his announcement of the new release: earlier this year, Xiph.org took over maintenance of the Cortado java applet to help concentrate interest and expertise on this important component of the free media codec infrastructure. Therefore, the official website for Cortado is as now part of the Xiph. [If somebody could update the Wikipedia article - that would be awesome!]

So, I am very happy to point to the first Cortado release in three years. Source and sample builds are available from the Xiph.org download site.

Ralph writes further:

The new version is tagged 0.5.0 to indicate both the change in hosting and the significant new support for files from the new libtheora encoder implementation and Kate embedded subtitles.

In particular, 0.5.0 has:

  • Support for files encoded with Theora 1.1
  • Faster YUV to RGB conversion with better results
  • Basic support for embedded Ogg Kate streams
  • Seeking fixed for files with an Ogg Skeleton track
  • Maintained compatibility with the Microsoft VM

This is an awesome example of the power of open source and what a group of people can achieve. Congratulations to everyone at Xiph, Wikipedia, and anyone else who contributed to the release!

MySQL, Snow Leopard and ruby

Posted in Open Source, rails, random by silvia on the October 13th, 2009

I got a shiny new MacBook Pro on the weekend, yay! After months of complaining about the slowness and the heat evaporating from my old Macbook, I’m finally off to better grounds.

But then there was the annoying task of setting up the machine with all the software that I’m using. MySQL and ruby turned out to be particular problems. I installed MySQL for 10.5, since MySQL haven’t published one for OS 10.6 yet. I ran “gem install mysql”. And then the pain started.

I got all the errors that were reported elsewhere:
uninitialized constant MysqlCompat::MysqlRes” and “undefined method `real_connect’ for Mysql:Class (NoMethodError)“. I tried all the suggestions – including:
"sudo env ARCHFLAGS="-arch x86" gem install mysql -- --with-mysql-config=/usr/local/mysql-5.1.39-osx10.5-x86/bin/mysql_config -V --debug, but just couldn’t get there.

My laptop reports in the System Software Overview: “64-bit Kernel and Extensions: No”, so I assumed I had to use the 32 bit versions. However, that was a wrong assumption. Even though my kernel seems to be 32 bit, applications seem to be 64 bit.

So, eventually I re-installed MySQL for Mac OS X 10.5 (x86_64) and ran the correct gem install command:
sudo env ARCHFLAGS="-arch x86_64" gem install mysql -- --with-mysql-config=/usr/local/mysql-5.1.39-osx10.5-x86 and things were fine.

Additionally, there was some fighting with the PrefPane and re-starting mysql. I had to kill it manually and I had to install the updated PrefPane of Swoon dot net to make it work.

Hope this helps somebody avoid the same pain!

Web Directions South 2009 talk on HTML5 video

Posted in Digital Media, Open Source, open codecs, standards, video accessibility by silvia on the October 9th, 2009

Yesterday, I gave a talk on the HTML5 video element at Web Directions South.

The title was “Taking HTML5 <video> a step further” and the abstract was provided goes as follows:

This talk focuses on the efforts engaged by W3C to improve the new HTML 5 media elements with mechanisms to allow people to access multimedia content, including audio and video. Such developments are also useful beyond accessibility needs and will lead to a general improvement of the usability of media, making media discoverable and generally a prime citizen on the Web.

Silvia will discuss what is currently technically possible with the HTML5 media elements, and what is still missing. She will describe a general framework of accessibility for HTML5 media elements and present her work for the Mozilla Corporation that includes captions, subtitles, textual audio annotations, timed metadata, and other time-aligned text with the HTML5 media elements. Silvia will also discuss work of the W3C Media Fragments group to further enhance video usability and accessibility by making it possible to directly address temporal offsets in video, as well as spatial areas and tracks.

Here are my slides:

Download the pdf from here.

There was also a video recording and I will add that here as soon as it is published.

UPDATE:
The video is available on Tinyvid:

I’m not going to try and upload this 50min long video to YouTube – with it’s 10 min limit, I won’t get very far.

Next Page »