HTML5 Media and Accessibility presentation
Today, I was invited to give a talk at my old workplace CSIRO about the HTML5 media elements and accessibility.
A lot of the things that have gone into Ogg and that are now being worked on in the W3C in different working groups – including the Media Fragments and HTML5 WGs – were also of concern in the Annodex project that I worked on while at CSIRO. So I was rather excited to be able to report back about the current status in HTML5 and where we’re at with accessibility features.
Check out the presentation here. It contains a good collection of links to exciting demos of what is possible with the new HTML5 media elements when combined with other HTML features.
I tried something new with this presentation: I wrote it in a tool called S5, which makes use only of HTML features for the presentation. It was quite a bit slower than I expected, e.g. reloading a page always included having to navigate to that page. Also, it’s not easily possible to do drawings, unless you are willing to code them all up in HTML. But otherwise I have found it very useful for, in particular, including all the used URLs and video element demos directly in the slides. I was inspired to use this tool by Chris Double’s slides from LCA about implementing HTML 5 video in Firefox.
Google’s challenges of freeing VP8
Since On2 Technology’s stockholders have approved the merger with Google, there are now first requests to Google to open up VP8.
I am sure Google is thinking about it. But … what does “it” mean?
Freeing VP8
Simply open sourcing it and making it available under a free license doesn’t help. That just provides open source code for a codec where relevant patents are held by a commercial entity and any other entity using it would still need to be afraid of using that technology, even if its use is free.
So, Google has to make the patents that relate to VP8 available under an irrevocable, royalty-free license for the VP8 open source base, but also for any independent implementations of VP8. This at least guarantees to any commercial entity that Google will not pursue them over VP8 related patents.
Now, this doesn’t mean that there are no submarine or unknown patents that VP8 infringes on. So, Google needs to also undertake an intensive patent search on VP8 to be able to at least convince themselves that their technology is not infringing on anyone else’s. For others to gain that confidence, Google would then further have to indemnify anyone who is making use of VP8 for any potential patent infringement.
I believe – from what I have seen in the discussions at the W3C – it would only be that last step that will make companies such as Apple have the confidence to adopt a “free” codec.
An alternative to providing indemnification is the standardisation of VP8 through an accepted video standardisation body. That would probably need to be ISO/MPEG or SMPTE, because that’s where other video standards have emerged and there are a sufficient number of video codec patent holders involved that a royalty-free publication of the standard will hold a sufficient number of patent holders “under control”. However, such a standardisation process takes a long time. For HTML5, it may be too late.
Technology Challenges
Also, let’s not forget that VP8 is just a video codec. A video codec alone does not encode a video. There is a need for an audio codec and an encapsulation format. In the interest of staying all open, Google would need to pick Vorbis as the audio codec to go with VP8. Then there would be the need to put Vorbis and VP8 in a container together – this could be Ogg or MPEG or QuickTime’s MOOV. So, apart from all the legal challenges, there are also technology challenges that need to be mastered.
It’s not simple to introduce a “free codec” and it will take time!
Google and Theora
There is actually something that Google should do before they start on the path of making VP8 available “for free”: They should formulate a new license agreement with Xiph (and the world) over VP3 and Theora. Right now, the existing license that was provided by On2 Technologies to Theora (link is to an early version of On2’s open source license of VP3) was only for the codebase of VP3 and any modifications of it, but doesn’t in an obvious way apply to independent re-implementations of VP3/Theora. The new agreement between Google and Xiph should be about the patents and not about the source code. (UPDATE: The actual agreement with Xiph apparently also covers re-implementations – see comments below.)
That would put Theora in a better position to be universally acceptable as a baseline codec for HTML5. It would allow, e.g., Apple to make their own implementation of Theora – which is probably what they would want for iPods and iPhones. Since Firefox, Chrome, and Opera already support Ogg Theora in their browsers using the On2 licensed codebase, they must have decided that the risk of submarine patents is low. So, presumably, Apple can come to the same conclusion.
Free codecs roadmap
I see this as the easiest path towards getting a universally acceptable free codec. Over time then, as VP8 develops into a free codec, it could become the successor of Theora on a path to higher quality video. And later still, when the Internet will handle large resolution video, we can move on to the BBC’s Dirac/VC2 codec. It’s where the future is. The present is more likely here and now in Theora.
ADDITION:
Please note the comments from Monty from Xiph and from Dan, ex-On2, about the intent that VP3 was to be completely put into the hands of the community. Also, Monty notes that in order to implement VP3, you do not actually need any On2 patents. So, there is probably not a need for Google to refresh that commitment. Though it might be good to reconfirm that commitment.
Accessibility support in Ogg and liboggplay
At the recent FOMS/LCA in Wellington, New Zealand, we talked a lot about how Ogg could support accessibility. Technically, this means support for multiple text tracks (subtitles/captions), multiple audio tracks (audio descriptions parallel to main audio track), and multiple video tracks (sign language video parallel to main video track).
Creating multitrack Ogg files
The creation of multitrack Ogg files is already possible using one of the muxing applications, e.g. oggz-merge. For example, I have my own little collection of multitrack Ogg files at http://annodex.net/~silvia/itext/elephants_dream/multitrack/. But then you are stranded with files that no player will play back.
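As a minimal sketch of that muxing step, the following builds an oggz-merge command line for a video, an audio description and a caption file. The file names are hypothetical, and I am assuming the tool's `-o` output option; the merge itself only runs when oggz-merge is actually installed.

```python
import shutil
import subprocess


def build_merge_command(output_path, input_paths):
    """Return the argv that merges several Ogg files into one multitrack file."""
    if len(input_paths) < 2:
        raise ValueError("need at least two input files to merge")
    return ["oggz-merge", "-o", output_path, *input_paths]


def merge(output_path, input_paths):
    """Run oggz-merge if it is installed; return True when the merge ran."""
    if shutil.which("oggz-merge") is None:
        return False
    subprocess.run(build_merge_command(output_path, input_paths), check=True)
    return True


# Hypothetical file names, for illustration only.
EXAMPLE_COMMAND = build_merge_command(
    "multitrack.ogv",
    ["video.ogv", "audio_description.oga", "captions_en.ogg"],
)
print(" ".join(EXAMPLE_COMMAND))
```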
Multitrack Ogg in Players
As Ogg is now being used in multiple Web browsers in the new HTML5 media formats, there are in particular requirements for accessibility support for the hard-of-hearing and vision-impaired. Either multitrack Ogg needs to become more of a common case, or the association of external media files that provide synchronised accessibility data (captions, audio descriptions, sign language) to the main media file needs to become a standard in HTML5.
As it turns out, both these approaches are being considered and worked on in the W3C. Accessibility data that are audio or video tracks will in the near future have to come out of the media resource itself, but captions and other text tracks will also be available from external associated elements.
The availability of internal accessibility tracks in Ogg is a new use case – something Ogg has been ready to do, but has not gone into common usage. MPEG files on the other hand have for a long time been used with internal accessibility tracks and thus frameworks and players are in place to decode such tracks and do something sensible with them. This is not so much the case for Ogg.
For example, a current VLC build installed on Windows will display captions, because Ogg Kate support is activated. A current VLC build on any other platform, however, has Ogg Kate support deactivated in the build, so captions won’t display. This will hopefully change soon, but we have to look also beyond players and into media frameworks – in particular those that are being used by the browser vendors to provide Ogg support.
Multitrack Ogg in Browsers
Hopefully gstreamer (which is what Opera uses for Ogg support) and ffmpeg (which is what Chrome uses for Ogg support) will expose all available tracks to the browser so they can expose them to the user for turning on and off. Incidentally, a multitrack media JavaScript API is in development in the W3C HTML5 Accessibility Task Force for allowing such control.
The current version of Firefox uses liboggplay for Ogg support, but liboggplay’s multitrack support has been sketchy thus far. So, Viktor Gal – the liboggplay maintainer – and I sat down at FOMS/LCA to discuss this and Viktor developed some patches to make the demo player in the liboggplay package, the glut-player, support the accessibility use cases.
I applied Viktor’s patch to my local copy of liboggplay and I am very excited to show you the screencast of glut-player playing back a video file with an audio description track and an English caption track all in sync:
Further developments
There are still important questions open: for example, how will a player know that an audio description track is to be played together with the main audio track, but a dub track (e.g. a German dub for an English video) is to be played as an alternative. Such metadata for the tracks is something that Ogg is still missing, but that Ogg can be extended with fairly easily through the use of the Skeleton track. It is something the Xiph community is now working on.
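To make the question concrete, here is a small sketch of how a player could choose audio tracks once each track carries a role. The role labels and the `Track` structure are purely hypothetical; they are not part of Ogg or Skeleton, they only illustrate the difference between a track that is added to the main audio and one that replaces it.

```python
from dataclasses import dataclass

# Hypothetical role labels, invented for this illustration.
MAIN_ROLE = "audio/main"
ADDITIVE_ROLES = {"audio/audiodesc"}   # played together with the chosen audio
ALTERNATIVE_ROLES = {"audio/dub"}      # played instead of the main audio


@dataclass
class Track:
    name: str
    role: str
    lang: str


def audio_tracks_to_play(tracks, want_description=False, preferred_lang=None):
    """Return the names of the audio tracks a player should mix together."""
    selected = next(t for t in tracks if t.role == MAIN_ROLE)
    # A dub in the preferred language replaces the main audio track.
    if preferred_lang and preferred_lang != selected.lang:
        for track in tracks:
            if track.role in ALTERNATIVE_ROLES and track.lang == preferred_lang:
                selected = track
                break
    playing = [selected]
    # Audio descriptions are mixed in on top, if one exists in the same language.
    if want_description:
        playing += [t for t in tracks
                    if t.role in ADDITIVE_ROLES and t.lang == selected.lang]
    return [t.name for t in playing]


EXAMPLE_TRACKS = [
    Track("main-en", "audio/main", "en"),
    Track("dub-de", "audio/dub", "de"),
    Track("ad-en", "audio/audiodesc", "en"),
]
print(audio_tracks_to_play(EXAMPLE_TRACKS, want_description=True))
```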
Summary
This is great progress towards accessibility support in Ogg and therefore in Web browsers. And there is more to come soon.
Audio Track Accessibility for HTML5
I have talked a lot about synchronising multiple tracks of audio and video content recently. The reason was mainly that I foresee a need for more than two parallel audio and video tracks, such as audio descriptions for the vision-impaired or dub tracks for internationalisation, as well as sign language tracks for the hard-of-hearing.
It is almost impossible to introduce a good scheme to deliver the right video composition to a target audience. Most people will prefer bare a/v, the vision-impaired would probably prefer only audio plus audio descriptions (but will probably take the video), and the hard-of-hearing will prefer video plus captions and possibly a sign language track. While it is possible to dynamically create files that contain such tracks on a server and then deliver the right composition, implementation of such a server method has not been very successful in the last years and it would likely take many years to roll out such new infrastructure.
So, the only other option we have is to synchronise completely separate media resources together as they are selected by the audience.
It is this need that this HTML5 accessibility demo is about: Check out the demo of multiple media resource synchronisation.
I created an Ogg video with only a video track (10m53s750). Then I created an audio track that is the original English audio track (10m53s696). Then I used a Spanish dub track that I found through BlenderNation as an alternative audio track (10m58s337). Lastly, I created an audio description track in the original language (10m53s706). This creates a video track with three optional audio tracks.
I took away all native controls from these elements when using the HTML5 audio and video tag and ran my own stop/play and seeking approaches, which handled all media elements in one go.
I was mostly interested in the quality of this experience. Would the different media files stay mostly in sync? They are normally decoded in different threads, so how big would the drift be?
The resulting page is the basis for such experiments with synchronisation.
The page prints the current playback position in all of the media files at a constant interval of 500ms. Note that when you pause and then play again, I am re-synching the audio tracks with the video track, but not when you just let the files play through.
I have let the files play through on my rather busy Macbook and have achieved the following interesting drift over the course of about 9 minutes:
You will see that the video was the slowest, only doing roughly 540s, while the Spanish dub did 560s in the same time.
To fix such drifts, you can always include regular re-synchronisation points into the video playback. For example, you could set a timeout on the playback to re-sync every 500ms. Within such a short time, it is almost impossible to notice a drift. Don’t re-load the video, because it will lead to visual artifacts. But do use the video’s currentTime to re-set the others. (UPDATE: Actually, it depends on your situation, which track is the best choice as the main timeline. See also comments below.)
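A quick back-of-the-envelope calculation shows why a 500ms re-sync interval is short enough. This sketch assumes the roughly 9 minutes of wall-clock time equal 540s and that each track drifts at a constant rate, which is a simplification of what I measured.

```python
def drift_between_resyncs(master_rate, other_rate, interval_s):
    """Seconds another track drifts from the master during one interval.

    Rates are media seconds played per wall-clock second.
    """
    return abs(other_rate - master_rate) * interval_s


WALL_CLOCK_S = 9 * 60              # assumed: "about 9 minutes" taken as 540s
video_rate = 540 / WALL_CLOCK_S    # the video played roughly 540s
dub_rate = 560 / WALL_CLOCK_S      # the Spanish dub played roughly 560s

# Without any re-sync the two tracks end up about 20s apart.
drift_unsynced = drift_between_resyncs(video_rate, dub_rate, WALL_CLOCK_S)

# With a re-sync every 500ms the drift stays below 20ms per interval.
drift_resynced = drift_between_resyncs(video_rate, dub_rate, 0.5)

print(round(drift_unsynced, 1), round(drift_resynced * 1000, 1))
```

The second number is in milliseconds. At these rates, the re-sync interval would have to grow to several seconds before the drift within one interval reached a tenth of a second.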
It is a workable way of associating arbitrary numbers of media tracks with videos, in particular in situations where the creation of merged files cannot easily be included in a workflow.
Tutorial on HTML5 open video at LCA 2010
During last week’s LCA, Jan Gerber, Michael Dale and I gave a 3 hour tutorial on how to publish HTML5 video in an open format.
We basically taught people how to create and publish Ogg Theora video in HTML5 Web pages and how to make them work across browsers, including much of the available tools and libraries. We’re hoping that some people will have learnt enough to include modules in CMSes such as Drupal, Joomla and Wordpress, which will easily support the publishing of Ogg Theora.
I have been asked to share the material that we used. It consists of:
- HTML5_Tutorial (611KB)
- the example videos (328MB), and
- HTML5 video exercises (3.4KB).
Note that if you would like to walk through the exercises, you should install the following software beforehand:
- oggz-tools
- oggvideotools
- apache2 or a Web server of your choice
- ffmpeg2theora
- firefox3.5+
- firefogg plugin
- firebug plugin
- vlc, mplayer, totem or xine
- kino or pitivi or another video editor that exports Theora, e.g. iMovie with XiphQT
You might need to look for packages of your favourite OS (e.g. Windows or Mac, Ubuntu or Debian).
The exercises include:
- creating an Ogg video from an editor
- transcoding a video using http://firefogg.org/
- creating a poster image using OggThumb
- writing a first HTML5 video Web page with Ogg Theora
- publishing it on a Web Server, with correct MIME type & Duration hint
- writing a second HTML5 video Web page with Ogg Theora & MP4 to cover Safari/Webkit
- transcoding using ffmpeg2theora in a script
- writing a third HTML5 video Web page with Cortado fallback
- writing a fourth Web page using “Video for Everybody”
- writing a fifth Web page using “mwEmbed”
- writing a sixth Web page using firefogg for transcoding before upload
- and a seventh one with a progress bar
- encoding srt subtitles into an Ogg Kate track
- writing an eighth Web page using cortado to display the Ogg Kate track
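For the MIME type step, the mapping itself is small. The tutorial exercises use a Web server such as apache2 for this; the sketch below is not taken from the exercises, it only illustrates the same mapping with Python's standard `mimetypes` module, using the Ogg media types registered in RFC 5334. The file names are hypothetical.

```python
import mimetypes

# Ogg media types as registered in RFC 5334.
OGG_MIME_TYPES = {
    ".ogv": "video/ogg",
    ".oga": "audio/ogg",
    ".ogg": "audio/ogg",
    ".ogx": "application/ogg",
}

for extension, mime_type in OGG_MIME_TYPES.items():
    mimetypes.add_type(mime_type, extension)


def content_type_for(path):
    """Return the Content-Type a server should send for the given file."""
    mime_type, _encoding = mimetypes.guess_type(path)
    return mime_type or "application/octet-stream"


print(content_type_for("talk.ogv"))
```

If the server sends Ogg video with a generic type such as text/plain, a browser may refuse to play it in the video element, which is why this step is part of the exercises.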
For those that would like to see the slides here immediately, a special flash embed:
Enjoy!
HTML5 video: 25% H.264 reach vs. 95% Ogg Theora reach
Vimeo started last week with an HTML5 beta test. They use the H.264 codec, probably because much of their content is already in this format through the Flash player.
But what really surprised me was their claim that roughly 25% of their users will be able to make use of their HTML5 beta test. The statement is that 25% of their users use Safari, Chrome, or IE with Chrome Frame. I wondered how they got to that number and what that generally means to the amount of support of H.264 vs Ogg Theora on the HTML5-based Web.
According to Statcounter’s browser market share statistics, the percentage of browsers that support HTML5 video is roughly: 31.1%, as summed up from Firefox 3.5+ (22.57%), Chrome 3.0+ (5.21%), and Safari 4.0+ (3.32%) (Opera’s recent release is not represented yet).
Out of those 31.1%, Safari and Chrome (8.53%) support H.264, and Firefox and Chrome (27.78%) support Ogg Theora.
Given these numbers, Vimeo must assume that roughly 16% of their users have Chrome Frame in IE installed. That would be quite a number, but it may well be that their audience is special.
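The arithmetic behind these figures can be retraced in a few lines. The share values are the Statcounter numbers quoted above; which browser supports which codec is my reading of the situation at the time (Safari and Chrome for H.264, Firefox and Chrome for Ogg Theora), so treat the split as an assumption.

```python
# Browser shares in percent, as quoted from Statcounter above.
share = {"Firefox 3.5+": 22.57, "Chrome 3.0+": 5.21, "Safari 4.0+": 3.32}

# Assumed codec support per browser.
h264_browsers = ["Chrome 3.0+", "Safari 4.0+"]
theora_browsers = ["Firefox 3.5+", "Chrome 3.0+"]

html5_video = round(sum(share.values()), 2)
h264 = round(sum(share[b] for b in h264_browsers), 2)
theora = round(sum(share[b] for b in theora_browsers), 2)

# Vimeo claims 25% reach; whatever native H.264 support does not cover
# would have to come from IE with Chrome Frame.
chrome_frame_implied = round(25.0 - h264, 2)

print(html5_video, h264, theora, chrome_frame_implied)
```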
So, how is Ogg Theora support doing in comparison, if we allow such browser plugins to be counted?
With an installation of XiphQT, Safari can be turned into a browser that supports Ogg Theora. The Chrome Frame installation will also turn IE into an Ogg Theora supporting browser. These could get the browser support for Ogg Theora up to 45%. Compare this to a claimed 48% of MS Silverlight support.
But we can do even better for Ogg Theora. If we use the Java Cortado player as a fallback inside the video element, we can capture all those users that have Java installed, which could be as high as 90%, taking Ogg Theora support potentially up to 95%, almost up to the claimed 99% of Adobe Flash.
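One way to arrive at a figure of that size is to treat the 45% and the 90% as independent and combine them. Independence is an assumption made only for this sketch; in reality the two groups will overlap in ways these statistics do not tell us.

```python
def combined_reach(*fractions):
    """Fraction of users covered by at least one option, assuming independence."""
    missed = 1.0
    for fraction in fractions:
        missed *= 1.0 - fraction
    return 1.0 - missed


native_or_plugin = 0.45   # Ogg Theora via browser support, XiphQT or Chrome Frame
java_installed = 0.90     # users who could fall back to the Cortado applet

theora_reach = combined_reach(native_or_plugin, java_installed)
print(round(theora_reach * 100, 1))
```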
I’m sure all these numbers are disputable, but it’s an interesting experiment with statistics and tells us that right now, Ogg Theora has better browser support than H.264.
UPDATE: I was told this article sounds aggressive. By no means am I trying to be aggressive – I am stating the numbers as they are right now, because there is a lot of confusion in the market. People believe they reach less audience if they publish in Ogg Theora compared to H.264. I am trying to straighten this view.
Video Streaming from Linux.conf.au
You probably heard it already: Linux.conf.au is live streaming its video in a Microsoft proprietary format.
Fortunately, there is now a re-broadcast that you can get in an open format from http://stream.v2v.cc:8000/ . It comes from a server in Europe, but relies on transcoding here in New Zealand, so it may not be completely reliable.
UPDATE: A second server is now also available from the US at http://repeater.xiph.org:8000/.
Today, the down under open source / Linux conference linux.conf.au in Wellington started with the announcement that every talk and mini-conf will be live streamed to the Internet and later published online. That’s an awesome achievement!
However, minutes after the announcement, I was very disappointed to find out that the streams are actually provided in a proprietary format and through a proprietary streaming protocol: a Microsoft streaming service that provides Windows media streams.
Why stream an open source conference in a proprietary format with proprietary software? If we cannot use our own technologies for our own conferences, how will we get the rest of the world to use them?
I must say, I am personally embarrassed, because I was part of several audio/video teams of previous LCAs that have managed to record and stream content in open formats and with open media software. I would have helped get this going, but wasn’t aware of the situation.
I am also the main organiser of the FOMS Workshop (Foundations of Open Media Software) that ran the week before LCA and brought some of the core programmers in open media software into Wellington, most of whom are also attending LCA. We have the brains here and should be able to get this going.
Fortunately, the published content will be made available in Ogg Theora/Vorbis. So, it’s only the publicly available stream that I am concerned about.
Speaking with the organisers, I can somewhat understand how this came to be. They took the “easy” way of delegating the video work to an external company. Even though this company is an expert in open source and networking, their media streaming customers are all using Flash or Windows media software, which are current de-facto standards and provide extra features such as DRM. It seems that, apart from linux.conf.au, they have had no requests for streaming Ogg Theora/Vorbis yet. Their existing infrastructure includes CDN distribution, and CDN providers typically don’t provide Ogg Theora/Vorbis support or Icecast streaming.
So, this is actually a problem founded in setting up streaming through a professional service rather than through the community. The way in which this was set up at other events was to get together a group of volunteers that provided streaming reflectors for free. In this way, a community-created CDN is built that can deal with the streams. That there are no professional CDN providers available yet that provide Icecast support is a sign that there is a gap in the market.
But phear not – a few of the FOMS folk got together to fix the situation.
It involved setting up Icecast streams for each room’s video stream. Since there is no access to the raw video stream, there is a need to transcode the video from proprietary codecs to the open Ogg Theora/Vorbis format.
To do this legally, a purchase of the codec libraries from Fluendo was necessary, which cost a whopping EURO 28 and covers all the necessary patent licenses. The glue to get the videos from mms to icecast streams is a GStreamer pipeline which I leave others to talk about.
Now that we have all the streams from the conference available as Ogg Theora/Vorbis streams, we can also publish them in HTML5 video elements. Check out this Web page which has all the video streams together on a single page. Note that the connections may be a bit dodgy and some drop-outs may occur.
Further, let me recommend the Multimedia Miniconf at linux.conf.au, which will take place tomorrow, Tuesday 19th January. The Miniconf has decided to add a talk about “How to stream your conference with open codecs” to help educate any potential future conference organisers and point out the software that helps solve these issues.
UPDATE: I should have stated that I didn’t actually do any of the technical work: it was all done by Ralph Giles, Jan Gerber, and Jan Schmidt.
Opera’s present for the New Year
I am a very happy camper today! Not because of the New Year – well, yes, there are new opportunities and challenges for the New Year. But I’ve just received an email from Philip Jägenstedt announcing that the New Year’s pre-alpha release of Opera 10.50 has HTML5 video support! Congratulations, Philip, congratulations Opera!
Opera’s HTML5 video support is based on using GStreamer, an open source multimedia framework used widely on Linux systems. On Linux, the Opera package will make sure you have GStreamer installed and thus provide HTML5 video support on all codecs that your GStreamer install supports. On other platforms, Opera will come packaged with a rudimentary version of GStreamer which provides only core codec support. Right now, that has only been done for Windows – I’m looking forward to the Mac version!
As core codecs, Opera has decided to support Ogg Vorbis, Ogg Theora and uncompressed WAVE PCM. This makes it the third browser to support Ogg Vorbis/Theora next to Firefox and Chrome and moves the balance in codec support in favor of open and royalty free codecs: three browsers to support Ogg Theora/Vorbis vs two browsers to support MPEG H.264/AAC.
It’s also cool to see Philip’s announcement of intending to support the W3C Media Fragments specification for directly addressing time offsets (and other fragments). This is probably related to the implementation of seeking, which is the same problem, technically. Lack of seeking is actually a bit annoying right now, since you cannot jump to time offsets in the video or find out how long the video is without having played it through completely.
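To illustrate why addressing a time offset and seeking are the same problem, here is a simplified sketch that turns a temporal fragment into a start and end time to seek to. It assumes the `t=start,end` form in plain seconds (optionally prefixed with `npt:`) from the Media Fragments work, and ignores the other time formats and fragment dimensions. The URLs are made up.

```python
from urllib.parse import urlparse


def parse_temporal_fragment(url):
    """Return (start, end) in seconds for a #t= fragment, or None if absent.

    A missing start defaults to 0.0; a missing end is returned as None,
    meaning "play to the end of the resource".
    """
    fragment = urlparse(url).fragment
    for part in fragment.split("&"):
        name, _, value = part.partition("=")
        if name != "t":
            continue
        if value.startswith("npt:"):
            value = value[len("npt:"):]
        start_text, _, end_text = value.partition(",")
        start = float(start_text) if start_text else 0.0
        end = float(end_text) if end_text else None
        return start, end
    return None


print(parse_temporal_fragment("http://example.com/video.ogv#t=10,20"))
```

A player that already supports seeking would then seek to the start time and pause once the end time is reached.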
It’s also cool to see that Opera is on board with wanting to implement caption support. It has already started accessibility support for the video element with the following:
- you can tab onto the video controls: play/pause, transport bar, volume are tabbed to separately
- space bar toggles between play and pause when keyboard focus is on it
- when focused on the volume button, up/down arrow increases/decreases volume, space bar turns it on and off
I’m sure that once Opera has seeking support implemented, the transport bar will get improved and display progress, and also provide keyboard accessibility through being able to jump forwards and backwards with arrow key combinations.
Very nice work, Opera, and an awesome New Year’s present to the world!!
HTML5 Video element discussions at TPAC meetings
Last week’s TPAC (2009 W3C Technical Plenary / Advisory Committee) meetings were my second time at a TPAC and I found myself becoming highly involved with the progress on accessibility on the HTML5 video element. There were in particular two meetings of high relevance: the Video Accessibility workshop and Friday’s HTML5 breakout group on the video element.
HTML5 Video Accessibility Workshop
The week started on Sunday with the “HTML5 Video Accessibility workshop” at Stanford University, organised by John Foliot and Dave Singer. They brought together a substantial number of people all representing a variety of interest groups. Everyone got their chance to present their viewpoint – check out the minutes of the meeting for a complete transcript.
The list of people and their discussion topics were as follows:
Accessibility Experts
- Janina Sajka, chair of WAI Protocols and Formats: represented the vision-impaired community and expressed requirements for a deeply controllable access interface to audio-visual content, preferably in a structured manner similar to DAISY.
- Sally Cain, RNIB, Member of W3C PF group: expressed a deep need for audio descriptions, which are often overlooked besides captions.
- Ken Harrenstien, Google: has worked on captioning support for video.google and YouTube and shared his experiences, e.g. http://www.youtube.com/watch?v=QRS8MkLhQmM, and automated translation.
- Victor Tsaran, Yahoo! Accessibility Manager: joined for a short time out of interest.
Practitioners
- John Foliot, professor at Stanford Uni: showed a captioning service that he set up at Stanford University to enable lecturers to publish more accessible video – it uses humans for transcription, but automated tools to time-align, and provides a Web interface to the staff.
- Matt May, Adobe: shared what Adobe learnt about accessibility in Flash – in particular that an instream-only approach to captions was a naive approach and that external captions are much more flexible, extensible, and can fit into current workflows.
- Frank Olivier, Microsoft: attended to listen and learn.
Technologists
- Pierre-Antoine Champin from Liris (France), who was not able to attend, sent a video about their research work on media accessibility using automatic and manual annotation.
- Hironobu Takagi, IBM Labs Tokyo, general chair for W4A: demonstrated a text-based audio description system combined with a high-quality, almost human-sounding speech synthesizer.
- Dick Bulterman, Researcher at CWI in Amsterdam, co-chair of SYMM (group at W3C doing SMIL): reported on 14 years of experience with multimedia presentations and SMIL (slides) and the need to make temporal and spatial synchronisation explicit to be able to do the complex things.
- Joakim Söderberg, researcher at Ericsson, co-chair of media annotation group at W3C: reported on W3C media annotations group work and wanted to find out whether there are a11y related attributes missing.
- Felix Sasaki, University of Applied Sciences in Potsdam, W3C media annotations group member: Teaching metadata.
- Eric Carlson, Apple, Engineering HTML5 media elements in Webkit: there to watch and make sure the specification is implementable.
- James Craig, Accessibility Software Quality Engineer, Apple: is interested in universal design and in solving content selection.
- Myself, as Mozilla’s video accessibility contractor: I discussed what requirements I had collected before going ahead and doing implementations (slides) – also showed my demos.
Standards Experts
- Dave Singer, Apple, head of multimedia standards: interested in building up a framework that gets better accessibility over time, and presented on how media queries could be used to improve source selection (slides).
- Michael Cooper, W3C WAI staff: needs to make sure W3C technology has accessibility baked in and existing solutions are re-used.
- Marisa DeMeglio, DAISY Consortium developer: reported on Digital Talking Book standards, the challenges with video in DAISY 4, and understanding the possibilities with HTML5.
- Ian Hickson, Google, HTML5 editor: gave an update on the state of accessibility in HTML5.
- Chris Lilley, W3C Hypertext CG co-chair, CSS and SVG group, etc.: wanted to make sure that whatever solution is created for HTML will be usable in SVG, and was also keenly interested to solve the internationalisation question.
- Philippe Le Hegaret, W3C representative for HTML, timed text, and video working groups: reported on work in Timed Text working group, including a demo of DFXP use in HTML5.
- Charles McCathieNeville, Opera, in charge of standards: interested in i18n and use of accessibility methods across different technologies without reinventing wheels.
- Geoff Freed, NCAM, joined online: keen to solve captions and audio description in HTML5 in a declarative way.
- Judy Brewer, head of WAI at W3C: interested in how the options for accessible media affect the user, especially when there are multiple options. Keen to create a proper requirements document.
- Doug Schepers, Team Contact for the SVG and WebApps Working Groups: video handling in HTML5 and SVG should be similar. Keen to re-use technology.
The workshop helped clarify some of the requirements and potential solutions. No concrete specification progress was made, but that was not the intention of the workshop. Rather, it started getting people talking and involved with the newly created HTML5 Accessibility Task Force. In the end, the creation of a requirements document for media accessibility was taken on board as one of the first tasks for the HTML5 Accessibility Task Force, which, I believe, is still looking for an editor for it.
TPAC HTML5 Video Breakout Group
The HTML Working Group at TPAC was run differently to the previous year: instead of having all topics discussed in the full group, breakout groups were organised in two parallel rooms which focused each on particular topics that still need resolution in HTML5. I proposed a breakout group on Video Accessibility and somebody else wanted to get an update on the baseline codec discussion on the HTML5 video element, so we turned it into a single Video breakout group.
I put together a list of the core issues that we had come across from Sunday and turned it into an agenda. Here are some short notes on each topic that was discussed. A full transcript is available in the minutes.
- Baseline codecs
The brief discussion about baseline codecs just stated the impasse that we are currently at with respect to the lack of a baseline codec that would satisfy all the stated requirements for a baseline codec. There is work being done behind closed doors to move towards agreeable royalty-free codecs, but there is no progress to report to date.
- Cue ranges
It has become apparent that there is a need to bring back the functionality of cue ranges, in particular for things such as activating slide changes, pause and display ads, and captions for live video. Rather than callbacks, this time a declarative and event-based solution is envisaged. It will possibly get introduced as part of the solution for captions.
- Media Element Accessibility
The discussion here focused on the means by which to provide caption/subtitle support and briefly touched upon audio descriptions. I collected the following requirements:
- while we prefer textual sources for captions/subtitles, and burnt-in captions or bitmap overlays (the DVD format) are not ideal, they should still be possible to be displayed; since most bitmaps are in gif, png or bmp format, that should not be a major issue
- we need a declarative syntax for captions/subtitles, both for in-band and external files
- if there are “conflicting” captions/subtitles available from in-band and external files, the in-band one would probably be displayed by default, but all available tracks should be user selectable
- we need a default presentation, but also a JavaScript API to allow custom display
- to deal with cross-site scripting issues for external files, CORS is probably a good solution
- baseline codecs for captions/subtitles should be DFXP, srt and probably smpteTText – a new DFXP based SMPTE standard for timed text; it was further suggested to regard srt simply as a trivial subpart of DFXP functionality
- the default presentation has to take into account authoring requirements, user preferences, and allow for interactive override – it seems we need to define new user preferences for video a11y aside from simply the browser language settings
- the idea of using text and ARIA live regions to display audio descriptions was welcomed as a useful modern means to provide a11y to vision-impaired – it provides choice between screen reader and braille use and also improves searchability and has further advantages of being automatically processable
- however, there continues to be a need to make human-created audio descriptions available, in particular for high-quality recordings (Shakespeare was used as an example)
- further, the HTML5 video element has not yet clarified how multiple encodings, e.g. for different devices/bitrates, should be made available – this also ties into making sign language video tracks in different sign languages available
- content selection could be done using media queries – needs further experimentation
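To give an idea of what an event-based take on cue ranges from the notes above could look like, here is a small sketch that derives enter and exit events from two successive playback positions. The `CueRange` structure and the event names are hypothetical and not taken from any specification draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CueRange:
    ident: str
    start: float   # seconds, inclusive
    end: float     # seconds, exclusive


def cue_events(cues, previous_time, current_time):
    """Return (event, cue id) pairs for cues entered or left between two times."""
    events = []
    for cue in cues:
        was_active = cue.start <= previous_time < cue.end
        is_active = cue.start <= current_time < cue.end
        if is_active and not was_active:
            events.append(("enter", cue.ident))
        elif was_active and not is_active:
            events.append(("exit", cue.ident))
    return events


SLIDE_CUES = [CueRange("slide-1", 0.0, 10.0), CueRange("slide-2", 10.0, 20.0)]
print(cue_events(SLIDE_CUES, 9.8, 10.3))
```

Note that this simple comparison misses a cue that starts and ends entirely between the two positions, e.g. after a long seek; a real implementation would have to decide how to report those.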
Unfortunately, there was not enough time to address the remaining topics on the agenda: the need for hierarchical navigation through audio/video elements, and the handling of multi-track video. I am sure we will get back to them in due time.
Well, the meetings have certainly widened my understanding of the issues that we are currently dealing with around the audio and video elements – in particular I believe that once we solve how to deal with multiple alternative representations of the original media file, we will also solve many of the accessibility issues. It may, however, happen that we create the accessibility solutions first and thus also solve the issues of alternate representations. I have a lot to think about.
FOMS and LCA Multimedia Miniconf
If you haven’t proposed a presentation yet, go ahead and register yourself for:
FOMS (Foundations of Open Media Software workshop) at
http://www.foms-workshop.org/foms2010/pmwiki.php/Main/CFP
LCA Multimedia Miniconf at
http://www.annodex.org/events/lca2010_mmm/pmwiki.php/Main/CallForP
It’s already November and there’s only Christmas between now and the conferences!
I’m personally hoping for many discussions about HTML5 <video> and <audio>, including what to do with multitrack files, with cue ranges, and captions. These should also be relevant to other open media frameworks – e.g. how should we all handle multitrack sign language tracks?
But there are heaps of other topics to discuss and anyone doing any work with open media software will find fruitful discussions at FOMS.
