Today we started a community group (CG) at the W3C for “Web Media Text Tracks”: http://www.w3.org/community/texttracks/.
The group was created to work on many aspects of video text tracks, of which captioning and the WebVTT format are key parts.
The main reason for creating this group is to provide a forum at the W3C for work on WebVTT, so that all browser vendors can be involved in its development and support the format.
We have not gone all the way to creating a Working Group (WG), although that was the initial intention. Some W3C members objected to going down that path, so we are using the CG path for now.
This is actually a good thing, because CGs are open for anyone to join, while WGs are only open to W3C members. The key difference is that specs coming out of WGs can become RECs (“standards”), while CG specs cannot.
If we eventually see a need to move WebVTT to a REC, that move will be straightforward, since there is a clear path for work to transition from a CG to a WG.
Curious about any new requirements the TV community may have for HTML5 video, I attended the W3C Web and TV Workshop in Hollywood last week. It was the third workshop of its kind and also the largest to date, showing the TV community’s increasing interest in converging with the Web community.
The Workshop Aim
I went into the Workshop not quite knowing what to expect. My previous contact with members of this community was restricted to email exchanges on the W3C Web and TV Interest Group (IG) mailing list. I knew there was some interest in video accessibility (well, particularly captions), but little knowledge of the existing HTML5 specifications around text tracks or of why the browsers were going with WebVTT. So I decided to attend the workshop to get a better understanding of the community, its background, needs, and issues, and hopefully to teach some of the ways of HTML5. For that reason I also submitted a WebVTT presentation/demo.
As it turned out, the workshop’s key aim was to facilitate communication between the TV and the HTML5 communities. The goal was to identify features that need to be added to the HTML5 video element to satisfy the needs of the TV community. I had obviously come to the right workshop.
The process the W3C is using in the Interest Group is to have TV community members express their needs, then have HTML5 experts explain how these needs can be satisfied with existing HTML5 features, then build trial implementations and identify any shortcomings, and finally progress these through HTML5 or HTML.next. This workshop clearly focused on the first step: expressing needs.
It was often painful for me to watch presenters defend their requirements and try to impress on the audience how important a certain feature was to them, when that feature actually already had an HTML5 specification, just not yet any browser implementations. Probably one of the biggest drawbacks of the workshop was that so few HTML5 video experts were present, and that they were given very little room to reply directly to the expressed needs and explain what is already possible (or specified to be possible).
That said, the video features of HTML5 are still changing almost daily. Much development is happening right now, for example, around real-time communication features and around the track element. So focusing on gathering further requirements for HTML5 video is probably a good thing for now.
The TV Community Approach
Before I move on to some of the topics covered by the workshop, I have to express some concern about the behaviour I observed among many of the TV community folks. Many people tried to push existing solutions from other spaces onto the Web unchanged, claiming they were not re-inventing the wheel and were paving the cowpaths, which are among the underlying design principles of HTML5. I understand where this behaviour comes from: having solved the same problems elsewhere before, people assume those solutions should apply here, too. But I would like to warn people against this approach.
I would therefore like to see more respect given to HTML5 and to the complexity of finding the best possible technical solutions for the Web. The video element does not stand alone in HTML5. It is part of a much larger picture of technical capabilities on the Web, and many of the features requested for TV applications may already be solved by existing HTML markup outside the video element.
Now let’s get to the take-aways I had from the workshop’s sessions:
Session 1 / Content Provider and Consumer Perspective:
The session’s participants postulated that we will see the creation of application stores for TV applications, similar to what we have experienced with mobile phones and tablets. People enjoy collecting apps like they collect badges. Right now, the app store domain is dominated by native apps, not Web apps. The reason is that we don’t have a standard platform for setting up Web app stores with Web apps that work in all browsers on all operating systems. Thus, developers have to re-deploy their app for many environments.
While this need is essentially orthogonal to HTML standardisation, it seems to be one of the key issues holding Web apps back from making big market inroads, and the W3C might do well to set up a new WG to define a standard Web app manifest format and JS APIs.
Session 2+3 / Multi-screen TV in the Home Network:
Several hybrid technologies that combine TV broadcast with Web content delivery to set-top boxes were presented, including the European HbbTV and the Japanese Hybridcast, the latter with an in-depth demo.
Web purists would probably say it would be simpler to deliver all content over the Web and avoid the technical challenges of synchronizing content received via two vastly different delivery mechanisms. I personally believe this development is driven by business models: we don’t yet know exactly how to earn money from TV content delivered over the Internet, but we do know how to do so with broadcast TV content. So hybrids allow existing income streams to continue while augmenting the experience with the features people enjoy from the Internet.
Should requirements that emerge from such a use case for HTML5 video be taken seriously? I think they absolutely should. What I see happening is the emergence of a new way of using the Web, one that is video-focused rather than text-focused. We receive our Web content by watching video programming online: video channels, not Web pages, are the core content that we consume in the living room. Video channels are where our browsing experience starts. Search may still be our first port of call, but it will be a search for video content or a video-centric app rather than for a Web site.
And many interconnected devices in the house will contribute to the experience: the 5.1 surround sound systems spread around the house that should receive our video’s sound, the screens in the different rooms that we move between, and the remote controls, laptops or tablets that act as remote controls and preview stations, determine our viewing experience, and provide a back-channel to the publishers.
We have barely begun to identify how such interconnected devices within a home fit with the client-server view of the Web and with the new WebSocket functionality. The Home Networking Task Force of the Web and TV IG is looking at the issues and analysing existing protocols and standards that address this picture. But I have a gnawing feeling that the best solution will be something new that is more Web-specific and fits better with the technology layers of the Web.
Session 4 / Synchronized Metadata:
The TV environment offers many data services, some of which have been legally prescribed. This session analysed TV needs and how they can be satisfied with current HTML5.
Support for subtitles and closed captions is one of the key legally prescribed requirements, giving deaf and hard-of-hearing users, as well as non-native speakers, equal access to TV content. After a demonstration of some key features defined for the HTML5 track element and the WebVTT format, it was generally accepted that HTML5 is making big progress in this space, in particular since browsers are in the process of implementing support for the track element. A concern remains about complete coverage of all CEA-608/708 features in WebVTT.
Demos were shown using WebVTT to synchronize ads with videos, to display related metadata from a user’s life log alongside videos, to display thumbnails along a video’s timeline, and to render text descriptions through screen readers. The panel generally agreed that WebVTT offers many opportunities, that this area will need further development, and that we will see new metadata capabilities on the Web that were not previously possible on TV.
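The ad and thumbnail demos rely on the same idea: a WebVTT file whose cues carry machine-readable payloads instead of caption text, each active between a start and an end time. In a browser, such a file would be loaded through a track element with kind="metadata" and its cues read from the TextTrack API (for example on the cuechange event). Purely to illustrate the cue-timing logic outside a browser, here is a minimal Python sketch. It assumes a simplified WebVTT subset (no comments, regions or styles), and the thumbnail file names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical metadata track: each cue's payload names a thumbnail image.
SAMPLE_VTT = """WEBVTT

thumb-1
00:00.000 --> 00:05.000
thumbs/frame-0001.jpg

thumb-2
00:05.000 --> 00:10.000
thumbs/frame-0002.jpg

00:00:10.000 --> 00:00:15.500 align:start
thumbs/frame-0003.jpg
"""

# WebVTT timestamps: optional hours, then mm:ss.ttt.
TIMESTAMP = r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})"
TIMING_LINE = re.compile(TIMESTAMP + r"\s+-->\s+" + TIMESTAMP)


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    payload: str  # for metadata tracks: arbitrary text, e.g. an image URL


def to_seconds(h: Optional[str], m: str, s: str, ms: str) -> float:
    """Convert matched timestamp parts to seconds (hours default to 0)."""
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_cues(text: str) -> List[Cue]:
    """Parse cue blocks; blocks without a timing line (e.g. the header) are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING_LINE.match(line)  # cue settings after the end time are ignored
            if match:
                g = match.groups()
                cues.append(Cue(to_seconds(*g[0:4]), to_seconds(*g[4:8]),
                                "\n".join(lines[i + 1:])))
                break
    return cues


def active_cues(cues: List[Cue], t: float) -> List[Cue]:
    """Cues active at playback time t: start <= t < end."""
    return [c for c in cues if c.start <= t < c.end]
```

A player would call `active_cues(cues, current_time)` whenever the playback position changes and show the returned thumbnail or trigger the returned ad. The half-open interval means a cue ending at 5.0 seconds hands over cleanly to the cue starting at 5.0 seconds.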
Session 5 / Content Format and Codecs: DASH and Codec standards
The introduction of HTTP adaptive streaming into HTML5 was one of the core issues that kept returning in the discussions. This panel focused on MPEG DASH, but also mentioned the need for programmatic implementation of adaptive streaming functionality.
The work around MPEG DASH would require specifications of how to use DASH with WebM and Ogg Theora, as well as an HTML5 profile for DASH, which would limit the functionality possible in DASH files to what is needed by an HTML5 video element. One criticism of DASH was its verbosity. Another was its unclear patent position. Panel members, who included Qualcomm, Apple and Microsoft, made very clear that they are in favour of royalty-free use of DASH.
I would personally support the creation of a Community Group at the W3C around HTTP adaptive streaming and DASH. I think it would help alleviate the perceived patent issues around DASH and allow the right members of the community to participate in preparing a specification for HTML5 without requiring them to become W3C members.
Session 6 / Content Protection and DRM
A core concern of the TV community is around content protection. The requirements in this space seem, however, very confused.
The picture is further confused by requests for authentication, authorization, encryption, obfuscation, same-origin, secure transmission, secure decryption key delivery, unique content identification and other “content protection” techniques, without a clear understanding of what is already possible on the Web and what requirements content publishers actually have for delivering their content on the Web. Matters are complicated further by the many competing DRM systems in the market, with no clear standard that all browsers could support.
A thorough analysis of the technologies and solutions available in this space, as well as of the needs for HTML5, is required before it becomes clear what solution HTML5 browsers may need to support. There seemed to be agreement in the group, though, that browsers would not need to implement DRM solutions themselves, but only pass through the functionality of the platform on which they are running (including the media frameworks and operating system functionality). How this is supposed to work, however, was unclear.
Session 7 / Web & TV: Additional Device & User Requirements
This was a catch-all session for topics that had not been addressed in other sessions. Among the topics addressed in this session were:
- Parental Guidance: how to deal with ratings in an internationally inconsistent ratings landscape, how to deliver the ratings with the content, and how to enforce the viewing restrictions
- Emergency Notifications: how to replicate on the Web the emergency notification functionality of TV by providing text overlays to alert users
- TV channels: how to detect what channels of programming are available to users
Overall, the workshop was a worthwhile experience. It seems there is a lot of work still ahead for making HTML5 video the best it can be on the Web.