Monthly Archives: November 2008

Whatever happened to the Innovation Review?

In September, as reported, Terry Cutler delivered an extensive review of the Australian innovation system and what should be done about it. But then the worldwide economic downturn hit, and nobody seems to be thinking about incentives for Australian innovation any longer.

Where did it go? Nothing is happening on the innovation website.

As a small business owner in particular, I ask: what happened to innovation incentives for SMEs? Before the review, the government scrapped the existing Australian grant system for innovation as a precaution – in particular the Commercial Ready grants. The idea was to have some leeway for building a new system. That seemed fair enough at the time – but it looks more and more as though there will be no replacement for this scrapped funding. Is the government simply forgetting what it promised in view of other, more pressing needs? Do we need to remind them?

In a recent interview with SmartCompany, Terry Cutler paints a dark picture of Australia’s future. He is aware that commercial innovation originates in SMEs and fears that, by dropping all support for them, the whole SME technology sector will be wiped out. That would hurt larger companies too, and Australia would fall so far behind in innovation that it may be hard to ever catch up (remember what we did with our advantage in computer hardware at the time of CSIRAC?).

The Commercial Ready grants and other innovation funding were abruptly scrapped in May 2008. It is now more than 6 months later and we haven’t seen an indication of anything new to replace them. In the midst of the global crisis, how can we make this important cause heard?

Embedding time-aligned text into Ogg

As part of my accessibility work for Mozilla and Xiph, it is necessary to define how time-aligned text such as subtitles, captions, or annotations is encapsulated into Ogg. The fansubber community distinguishes “hard subtitles”, which are rendered into the video frames themselves, from “soft subtitles”, which stay in a text file, are loaded into the media player separately from the video file, and are synchronised with the video by the player. Since text encapsulated into Ogg remains text, everything discussed here is “soft” – or also “closed”.

I can hear you ask: so how do I do subtitles/captions with Ogg now? Well, it would have been possible to simply choose one subtitling format and map that into Ogg, then ask everyone to just use that one format and be done. But which one to choose? And why prefer a simpler one over a more complex one? And why just do subtitles and not any other time-aligned text?

So, instead, I analysed the types of time-aligned text “codecs” I have come across. Each type is represented by a multitude of text formats, because it is easy to invent a new format and standardisation hasn’t really happened in this space yet.

I have come up with the following list of typical time-aligned text codecs:

  • CC: closed captions (for the deaf)
  • SUB: subtitles
  • TAD: textual audio descriptions (for the blind – to be transferred to braille or TTS)
  • KTV: karaoke
  • TIK: ticker text
  • AR: active regions
  • NB: metadata & semantic annotations
  • TRX: transcripts / scripts
  • LRC: lyrics
  • LIN: linguistic markup
  • CUE: cue points, DVD style chapter markers and similar navigational landmarks

Let me know if you can think of any other classes of video/audio-related time-aligned text.

All of these texts can be represented in text files with some kind of time marker, and possibly some header information to set up the interpretation environment. So, the simplest way of creating a representation of these inside Ogg was to define a generic mapping for time-aligned text into Ogg.

The Xiph wiki holds the current draft specification for mapping text codecs into Ogg. For anyone wanting to map a text codec into Ogg, this should provide the framework. The idea is to separate the text codec’s data into header data and timed text segments (which can carry all sorts of styling and other information). The mapping is then simple. An example for srt is described on the wiki page.
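To make the idea concrete, here is a minimal sketch in Python – not the wiki specification, and with a made-up packet layout – of how a multiplexing tool might split an srt file into a header packet and per-cue text packets before handing them to an Ogg muxer:

    import re

    SRT_TIME = re.compile(
        r"(\d+):(\d+):(\d+)[,.](\d+)\s*-->\s*(\d+):(\d+):(\d+)[,.](\d+)")

    def to_seconds(h, m, s, ms):
        return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

    def srt_cues(text):
        """Yield (start, end, text) cues from an srt document."""
        for block in re.split(r"\n\s*\n", text.strip()):
            lines = block.splitlines()
            if len(lines) < 2:
                continue
            m = SRT_TIME.match(lines[1])   # lines[0] is the cue number
            if m:
                start = to_seconds(*m.groups()[:4])
                end = to_seconds(*m.groups()[4:])
                yield start, end, "\n".join(lines[2:])

    def packets(text):
        """Hypothetical packetisation: one header packet identifying the
        text codec, then one packet per timed text segment. The real
        packet layout is defined on the Xiph wiki, not here."""
        yield b"\x80srt\x00"   # made-up header magic, for illustration only
        for start, end, body in srt_cues(text):
            yield ("%.3f %.3f %s" % (start, end, body)).encode("utf-8")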

The specification is still in draft status because we are expecting feedback. In fact, what we now need is people trying out an implementation and providing fixes to the specification.

To map your text codec of choice into Ogg, you will probably require further mapping specifications. Depending on how complex your text codec of choice is, these additional mapping specifications may be rather simple or quite complicated. In the case of srt, it should be trivial. Considering the massive amount of srt already freely available online, the srt mapping may well have a really large impact. Let me know if you’re coding up something!

My next task is to find a representation generic enough to express any of the text codecs listed above. This representation is what will need to be available to a Web browser when working with a Web video that has related text. Current contenders are OggKate and W3C TimedText, but I am not sure whether either of them is too restrictive. I am indeed looking for the next generation of captioning technology, one that will be able to provide any type of time-aligned text that relates to audio/video.

“Sorry, this video is no longer available”

Recently, I noticed that an increasing number of videos on YouTube were no longer available – even ones that had just been shared in a friend’s blog post, or that were the main video on a producer’s YouTube page, such as QuantumOfSolace.

For a while I suspected something was wrong with my browser, but when my colleague was able to play the same video from the same network and I wasn’t, something had to be done.

I am running Firefox 3.0.4 on OS X 10.5.5 with the Flash 10.0 d26 plugin. First we thought it might be blocked for au.youtube.com and not for www.youtube.com, but there was no difference. Still the same “Sorry, this video is no longer available”.

Finally, installing the latest Flash plugin, 10.0 r12, fixed the issue. So, if a large number of YouTube videos are unavailable to you for no apparent reason, you might want to upgrade your Flash plugin.

News from the open media world

There is so much news today that I can only summarise it in a short post.

The guys from Collabora have announced that they are going to support the development of PiTiVi – one of the best open source video editors around. They are even looking to hire people to help Edward Hervey, the author of PiTiVi. The plan is to have a feature-rich video editor ready by April next year that is comparable in quality to basic proprietary video editors.

The BBC Dirac team today announced ffmpeg2dirac, a software package built along the same lines as the commonly used ffmpeg2theora, which of course transcodes any media stream to Ogg Dirac/Vorbis. With Ogg Dirac/Vorbis playback already available in vlc and mplayer, this covers the much-needed creation side of Ogg Dirac/Vorbis files. Dirac is an open source, non-patent-encumbered video codec developed by the BBC. It creates higher quality video than Theora at comparable bitrates.

FOMS – the Foundations of Open Media Software hacker workshop – today announced the current list of confirmed participants for the January workshop. It seems that this year we have a big focus on open video codecs, on browser support for media, on open Flash software, and on media frameworks. It is still possible to take part in the workshop – check out the CFP page.

Finally, an important security message: Mozilla has decided to put a security measure around the HTML5 audio and video elements to stop them from being abused for cross-site scripting attacks. Chris Double explains the changes that are necessary to your setup to enable your published audio or video to be displayed on domains different from the domain on which the files are hosted.
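As a rough illustration – this is not Chris Double’s instructions, and it uses the header name that was later standardised as part of CORS rather than the exact draft spelling of the time – a server could opt its media files in to cross-domain use along these lines:

    # Minimal sketch: serve Ogg media with an access-control header so
    # that pages on other domains may embed the files in <audio>/<video>.
    # The header name below is the one later standardised in CORS; the
    # drafts Mozilla implemented at the time used earlier spellings.
    from http.server import HTTPServer, SimpleHTTPRequestHandler

    class MediaHandler(SimpleHTTPRequestHandler):
        def end_headers(self):
            if self.path.endswith((".ogg", ".ogv", ".oga")):
                # "*" allows any origin; a real site would list domains.
                self.send_header("Access-Control-Allow-Origin", "*")
            super().end_headers()

    if __name__ == "__main__":
        HTTPServer(("", 8000), MediaHandler).serve_forever()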

Media fragment URI addressing

In the media fragment working group at the W3C, we are introducing a standard means to address fragments of media resources through URIs. The idea is to define URIs such as http://example.com/video.ogv#t=24m16s-30m12s, which would only retrieve the subpart of video.ogv that is of interest to the user and thus save bandwidth. This is particularly important for mobile devices, but also for pointing out highlights in videos on the Web, bookmarking, and other use cases.
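To illustrate what such an address encodes – keeping in mind that the working group has not finalised any syntax, so the format below simply mirrors the example URI above – a sketch of parsing the temporal fragment might look like this:

    import re
    from urllib.parse import urlparse

    # Parses "24m16s"-style offsets as used in the example URI; the
    # real syntax is still under discussion in the working group.
    TIME = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$")

    def offset_seconds(token):
        h, m, s = TIME.match(token).groups()
        return int(h or 0) * 3600 + int(m or 0) * 60 + int(s or 0)

    def temporal_fragment(uri):
        """Return (start, end) in seconds from a #t=start-end fragment."""
        frag = urlparse(uri).fragment          # e.g. "t=24m16s-30m12s"
        start, end = frag[len("t="):].split("-")
        return offset_seconds(start), offset_seconds(end)

    print(temporal_fragment(
        "http://example.com/video.ogv#t=24m16s-30m12s"))  # (1456, 1812)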

I’d like to give a brief look into the state of discussion from a technical viewpoint here.

Let’s start by considering the protocols for which such a scheme could be defined. We are currently focusing on HTTP and RTSP, since they are open protocols for media delivery. P2P protocols are also under consideration; however, most of them are proprietary, and since P2P protocols are mostly used to transfer complete large files, fragment addressing may not be desired there. RTSP already has a mechanism to address temporal fragments of media resources through a range parameter of the play request as part of the protocol parameters, yet there is no URI addressing scheme for it. Our key focus, however, is HTTP, since most video content nowadays is transferred over HTTP, e.g. on YouTube.

Another topic that needs discussion is the types of fragmentation for which we will specify addressing schemes. At the moment, we are considering temporal fragmentation, spatial fragmentation, and fragmentation by tracks. In temporal fragmentation, a request asks for a time interval that is a subpart of the media resource (e.g. audio or video). In spatial fragmentation, the request is for an image region (e.g. in an image or a video). Track fragmentation addresses cases where, for example, a blind person has no need to receive the actual video data, so a user agent could request only those tracks of the resource that the user really requires.

Another concern is the syntax of URI addressing. URI fragments (“#”) were invented to create URIs that point at so-called “secondary” resources. By definition, a secondary resource may be some portion or subset of the primary resource, some view on representations of the primary resource, or some other resource defined or described by those representations. It is therefore the perfect syntax for media fragment URIs.

The only issue is that URI fragments (“#”) are not expected to be transferred from the client to the server (e.g. Apache strips them off the URI if it receives them). Therefore, in the temporal URI specification of Annodex we decided to use the query (“?”) parameter instead. This is, however, not necessary. The W3C working group is proposing to have the user agent strip off the URI fragment specification and transform it into a protocol parameter. For HTTP, the idea is to introduce new range units for the types of fragmentation that we will define. Then, the Range and Content-Range headers can be used to request and deliver the information about the fragmentation.
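To make the translation step concrete – the time-based range unit “t” below is entirely hypothetical, since the group has not yet defined any range units – the user agent might do something like this with the fragment parsed earlier:

    # Hypothetical user-agent step: strip the fragment from the URI and
    # turn it into a Range header with a made-up time-based unit "t".
    def fragment_to_range(start_s, end_s):
        # On the wire this would produce a request like:
        #   GET /video.ogv HTTP/1.1
        #   Range: t=1456-1812
        # to which a fragment-aware server could answer:
        #   HTTP/1.1 206 Partial Content
        #   Content-Range: t 1456-1812/3600
        return ("Range", "t=%d-%d" % (start_s, end_s))

    print(fragment_to_range(1456, 1812))  # ('Range', 't=1456-1812')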

The most complicated issue we are dealing with is caching in Web proxies. Existing Web proxies will not understand new range units and will therefore not cache such requests. This is unfortunate, so we are trying to devise two schemes – one for existing Web proxies and one for future, more intelligent Web proxies – to enable proxy caching. This discussion has many dimensions – such as the ability to uniquely map time to bytes for any codec format, the ability to recompose new fragment requests from existing combined fragment requests, or the need for, and feasibility of, partial re-encoding. Mostly we are dealing with the complexities and restrictions of different codecs and encapsulation formats. Possibly the idea of recomposing ranges in Web proxies is too complex to realise, and caching is best done by regarding each fragment as its own cacheable resource, but this hasn’t been decided yet.

We now have experts from the squid community, from YouTube/Google, HTTP experts, Web accessibility experts, SMIL experts, me from Annodex/Xiph, and more people with diverse media backgrounds in the team. It’s a great group and we are covering the issues from all angles. The brief update above is given from my perspective and only touches the key issues superficially; the discussions we’re having on the mailing list and in meetings go much deeper.

I am not quite expecting us to meet the deadline of having a first working draft out before the end of this month, but we should certainly have one before Christmas.

Theora 1.0 released!

While the open source codec “Theora” has had a stable format since 2004, the open source community is very careful about giving any piece of software the “1.0” stamp of quality, and so libtheora has been under scrutiny for years.

Today, libtheora 1.0 was finally released – rejoice and go ahead using it in production!

More hard-core improvements to libtheora are also in the pipeline under a version nicknamed “Thusnelda”, mostly improving quality and bit-rate.