ginger's thoughts

Silvia's blog

HTML5 multi-track audio or video

Posted in Digital Media, standards, video accessibility, Videos by silvia on May 1st, 2011

Over the last few months, we’ve been working hard at the WHATWG and W3C to specify new HTML markup and a JavaScript interface for dealing with audio or video content that has more than one audio or video track.

This is particularly relevant when a Web page author wants to add a sign language track to a video or audio resource for deaf people, or an audio description track (i.e. a sound track in which a speaker explains the key things that can be seen on screen) for blind people. It is also relevant when a Web page author wants to publish a video with multiple audio tracks that are each a different language dub for the video. It can also be used for less common cases, such as a director’s commentary track or different camera angles for an event.

Just to be clear: this is not a means to introduce video editing functionality into the Web browser. If you want to do edits, you’re better off with an application that will eventually render a new piece of content and includes fancy transitions etc. Similarly, this is not a means to introduce mixing functionality (as in what DJs do when they play with multiple audio recordings). You’re better off with an actual audio mixing or DJ application that will provide you all sorts of amazing effects and filters.

So, multi-track is squarely focused on synchronizing alternative or additional tracks to a single resource with a single timeline to which all tracks are slaved.

Two means of publishing such multi-track media content are possible:

  • In-band multi-track
  • Synchronized resources

1. In-band multi-track

In in-band multi-track, there is a single file that has all the tracks inside it. For this single file, there is now an API in HTML5 that allows addressing and controlling these tracks.

Of the video file formats that Web browsers support, WebM is currently not defined to contain more than one audio or video track. However, since WebM is using the Matroska container format, which supports multi-track, it is possible to extend WebM for multi-track resources. I have seen multitrack Ogg, MP4 and Matroska files in the wild and most media players support their display.

The specification that has gone into HTML5 to support in-band multi-track looks as follows:

interface HTMLMediaElement : HTMLElement {
  [...]
  // tracks
  readonly attribute MultipleTrackList audioTracks;
  readonly attribute ExclusiveTrackList videoTracks;
};

interface TrackList {
  readonly attribute unsigned long length;
  DOMString getID(in unsigned long index);
  DOMString getKind(in unsigned long index);
  DOMString getLabel(in unsigned long index);
  DOMString getLanguage(in unsigned long index);

           attribute Function onchange;
};

interface MultipleTrackList : TrackList {
  boolean isEnabled(in unsigned long index);
  void enable(in unsigned long index);
  void disable(in unsigned long index);
};

interface ExclusiveTrackList : TrackList {
  readonly attribute unsigned long selectedIndex;
  void select(in unsigned long index);
};

You will notice that every audio and video track gets an index by which it can be addressed. You can enable() and disable() individual audio tracks and you can select() a single video track for display. This means that one or more audio tracks can be active at the same time (e.g. main audio and audio description), but only one video track will be active at a time (e.g. main video or sign language).

Through the getID(), getKind(), getLabel() and getLanguage() functions you can find out more about what actual content is available in the individual tracks so as to activate/deactivate them correctly and display the right information about them.

getKind() identifies the type of content that the track exposes such as “description” (for audio description), “sign” (for sign language), “main” (for the default displayed track), “translation” (for a dubbed audio track), and “alternative” (for an alternative to the default track).

getLabel() provides a human-readable string that describes the content of the track and is intended for use in a menu.

getID() provides a short machine-readable string that can be used to construct a media fragment URI for the track. The use case for this will be discussed later.

getLanguage() provides a machine-readable language code to identify which language is spoken or signed in an audio or sign language video track.
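
As a rough sketch of how these four functions could feed a track menu, the following collects the metadata of every track into plain objects. It only relies on the TrackList draft shown above; describeTracks() and makeMockTrackList() are names I made up for illustration, and the mock merely stands in for video.audioTracks so the code can be tried outside a browser.

```javascript
// Collect the metadata of every track in a TrackList-style object
// (per the draft interface above) into plain objects for a menu.
function describeTracks(trackList) {
  var entries = [];
  for (var i = 0; i < trackList.length; i++) {
    entries.push({
      index: i,
      id: trackList.getID(i),
      kind: trackList.getKind(i),
      label: trackList.getLabel(i),
      language: trackList.getLanguage(i)
    });
  }
  return entries;
}

// Hypothetical stand-in for video.audioTracks or video.videoTracks.
// It only mimics the read-only part of the draft TrackList interface.
function makeMockTrackList(tracks) {
  return {
    length: tracks.length,
    getID: function (i) { return tracks[i].id; },
    getKind: function (i) { return tracks[i].kind; },
    getLabel: function (i) { return tracks[i].label; },
    getLanguage: function (i) { return tracks[i].language; }
  };
}
```

In a browser that implements the draft, you would presumably call describeTracks(video.audioTracks) and turn each entry's label into a menu item.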

Example 1:

The following uses a video file that has a main video track, main audio tracks in English and French, and audio description tracks in English and French. (It likely also has caption tracks, but we will ignore text tracks for now.) This code sample switches the French audio tracks on and all other audio tracks off.

<video id="v1" poster="video.png" controls>
 <source src="video.ogv" type="video/ogg">
 <source src="video.mp4" type="video/mp4">
</video>

<script type="text/javascript">
var video = document.getElementsByTagName("video")[0];

for (var i = 0; i < video.audioTracks.length; i++) {
  if (video.audioTracks.getLanguage(i) == "fr") {
    video.audioTracks.enable(i);
  } else {
    video.audioTracks.disable(i);
  }
}
</script>

Example 2:

The following uses an audio file that has a main audio track in English, no main video track, but sign language video tracks in ASL (American Sign Language), BSL (British Sign Language), and ASF (Australian Sign Language). This code sample switches the Australian sign language track on and all other video tracks off.

<video id="a1" controls>
 <source src="audio_sign.ogg" type="video/ogg">
 <source src="audio_sign.mp4" type="video/mp4">
</video>

<script type="text/javascript">
var video = document.getElementsByTagName("video")[0];

for (var i = 0; i < video.videoTracks.length; i++) {
  if (video.videoTracks.getLanguage(i) == "asf") {
    video.videoTracks.select(i);
    break;
  }
}
</script>

If you have more tracks in both examples that conflict with your intentions, you may need to further filter your activation / deactivation code using the getKind() function.
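
Here is a minimal sketch of such a filter for audio tracks, assuming the MultipleTrackList draft from above. The function name enableMatchingAudio() is my own invention; it enables only the tracks that match both a language and a kind, disables the rest, and returns the indexes it enabled.

```javascript
// Enable only the audio tracks that match both a language and a kind,
// and disable all others. `audioTracks` is expected to follow the draft
// MultipleTrackList interface (getLanguage, getKind, enable, disable).
function enableMatchingAudio(audioTracks, language, kind) {
  var enabled = [];
  for (var i = 0; i < audioTracks.length; i++) {
    var matches = audioTracks.getLanguage(i) == language &&
                  audioTracks.getKind(i) == kind;
    if (matches) {
      audioTracks.enable(i);
      enabled.push(i);
    } else {
      audioTracks.disable(i);
    }
  }
  return enabled;
}
```

For Example 1, something like enableMatchingAudio(video.audioTracks, "fr", "description") would leave only the French audio description switched on, provided the file labels its tracks that way.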

2. Synchronized resources

Sometimes the production process of media creates not a single resource with multiple contained tracks, but multiple resources that all share the same timeline. This is particularly useful for the Web, because it means the user can download only the required resources, typically saving a substantial amount of bandwidth.

For this situation, an attribute called @mediagroup can be added in markup to slave multiple media elements together. This is managed in the JavaScript API through a MediaController object, which provides events and attributes for the combined multi-track object.

The new IDL interfaces for HTMLMediaElement are as follows:

interface HTMLMediaElement : HTMLElement {
  [...]
  // media controller
           attribute DOMString mediaGroup;
           attribute MediaController controller;
};

interface MediaController {
  readonly attribute TimeRanges buffered;
  readonly attribute TimeRanges seekable;
  readonly attribute double duration;
           attribute double currentTime;

  readonly attribute boolean paused;
  readonly attribute TimeRanges played;
  void play();
  void pause();

           attribute double defaultPlaybackRate;
           attribute double playbackRate;

           attribute double volume;
           attribute boolean muted;

           attribute Function onemptied;
           attribute Function onloadedmetadata;
           attribute Function onloadeddata;
           attribute Function oncanplay;
           attribute Function oncanplaythrough;
           attribute Function onplaying;
           attribute Function onwaiting;
           attribute Function ondurationchange;
           attribute Function ontimeupdate;
           attribute Function onplay;
           attribute Function onpause;
           attribute Function onratechange;
           attribute Function onvolumechange;
};

You will notice that the MediaController replicates some of the states and events of the slave media elements. In general, the read-only attributes summarize the state of all the slaved elements, while the writable attributes, when set, are handed through to every slaved element.

Importantly, if the individual media elements have @controls activated, then the displayed controls interact with the MediaController thus allowing synchronized playback and interaction with the combined multi-track object.
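
If you want to drive the group from your own script rather than from the displayed controls, a sketch along the following lines might do. It only touches members listed in the MediaController draft above (paused, play(), pause(), duration, currentTime); the function names togglePlayback() and seekGroup() are hypothetical.

```javascript
// Toggle playback for a whole media group through its controller.
// `controller` is expected to follow the MediaController draft above.
function togglePlayback(controller) {
  if (controller.paused) {
    controller.play();
  } else {
    controller.pause();
  }
}

// Seek all slaved elements at once by setting the shared timeline position.
// The target is clamped to [0, duration] when the duration is known.
function seekGroup(controller, seconds) {
  var max = isNaN(controller.duration) ? seconds : controller.duration;
  var target = Math.max(0, Math.min(seconds, max));
  controller.currentTime = target;
  return target;
}
```

In a browser that implements the draft, the controller would be reachable through the controller attribute of any element in the group, e.g. document.getElementById("v1").controller.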

Example 3:

The following uses a video file that has a main video track and a main audio track in English. There is another video file with the ASL sign language for the video, and an audio file with the audio description in English. This code sample creates controls on the first file, which then also control the audio description and the sign language video, neither of which has controls. Since the audio description doesn’t have controls, it doesn’t get visually displayed. The sign language video will just sit next to the main video without controls.

<video id="v1" poster="video.png" controls mediagroup="a11y_vid">
 <source src="video.webm" type="video/webm">
 <source src="video.mp4" type="video/mp4">
</video>

<video id="v2" poster="sign.png" mediagroup="a11y_vid">
 <source src="sign.webm" type="video/webm">
 <source src="sign.mp4" type="video/mp4">
</video>

<audio id="a1" mediagroup="a11y_vid">
 <source src="audio.ogg" type="audio/ogg">
 <source src="audio.mp3" type="audio/mpeg">
</audio>

Example 4:

We now accompany a main video with three sign language video tracks in ASL, BSL and ASF. We could just do this in JavaScript and replace the src of a second video element with the links to BSL and ASF as required, but then we would need to write our own media controls to list the available tracks. So, instead, we create a video element for each one of the tracks and use CSS to remove the inactive ones from the page layout. The code sample activates the ASF track and deactivates the other sign language tracks.

<style>
  video.inactive { display: none; }
</style>

<video id="v1" poster="video.png" controls mediagroup="a11y_vid">
 <source src="video.webm" type="video/webm">
 <source src="video.mp4" type="video/mp4">
</video>

<video id="v2" poster="sign_asl.png" mediagroup="a11y_vid" class="active">
 <source src="sign_asl.webm" type="video/webm">
 <source src="sign_asl.mp4" type="video/mp4">
</video>

<video id="v3" poster="sign_bsl.png" mediagroup="a11y_vid" class="inactive">
 <source src="sign_bsl.webm" type="video/webm">
 <source src="sign_bsl.mp4" type="video/mp4">
</video>

<video id="v4" poster="sign_asf.png" mediagroup="a11y_vid" class="inactive">
 <source src="sign_asf.webm" type="video/webm">
 <source src="sign_asf.mp4" type="video/mp4">
</video>

<script type="text/javascript">
var videos = document.getElementsByTagName("video");

// Start at 1: videos[0] is the main video and must stay visible.
for (var i = 1; i < videos.length; i++) {
  if (videos[i].videoTracks.getLanguage(0) == "asf") {
    videos[i].setAttribute("class", "active");
  } else {
    videos[i].setAttribute("class", "inactive");
  }
}
</script>

Example 5:

In this final example we look at what to do when we have an in-band multi-track resource with multiple video tracks that should all be displayed on screen. This is not a simple problem to solve because a video element is only allowed to display a single video track at a time. Therefore for this problem you need to use both approaches: in-band and synchronized resources.

We take an in-band multitrack resource with a main video and audio track and three sign language tracks in ASL, BSL and ASF. The second resource will be built from the URI of the first resource with a media fragment address of the sign language tracks. (If required, these can be discovered using the getID() function on the first resource.) The markup will look as follows:

<video id="v1" poster="video.png" controls mediagroup="a11y_vid">
 <source src="video.ogv#track=v_main&track=a_main" type="video/ogg">
 <source src="video.mp4#track=v_main&track=a_main" type="video/mp4">
</video>

<video id="v2" poster="sign.png" controls mediagroup="a11y_vid">
 <source src="video.ogv#track=asl&track=bsl&track=asf" type="video/ogg">
 <source src="video.mp4#track=asl&track=bsl&track=asf" type="video/mp4">
</video>
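
If you would rather build those fragment URIs in script than write them by hand, the following sketch shows one way. It assumes the track IDs come from getID() on a TrackList as drafted above and that percent-encoding the IDs is appropriate; trackIDsOfKind() and trackFragmentURI() are hypothetical helper names, not part of any specification.

```javascript
// Collect the IDs of all tracks of a given kind (e.g. "sign") from a
// TrackList-style object that follows the draft interface above.
function trackIDsOfKind(trackList, kind) {
  var ids = [];
  for (var i = 0; i < trackList.length; i++) {
    if (trackList.getKind(i) == kind) {
      ids.push(trackList.getID(i));
    }
  }
  return ids;
}

// Append a "#track=...&track=..." media fragment to a resource URI.
// Returns the resource unchanged when no track IDs are given.
function trackFragmentURI(resource, trackIDs) {
  var parts = [];
  for (var i = 0; i < trackIDs.length; i++) {
    parts.push("track=" + encodeURIComponent(trackIDs[i]));
  }
  return parts.length ? resource + "#" + parts.join("&") : resource;
}

// trackFragmentURI("video.ogv", ["asl", "bsl", "asf"])
// returns "video.ogv#track=asl&track=bsl&track=asf"
```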

Note that with multiple video elements you can always style them in the way that you want them displayed on screen. E.g. if you want a picture-in-picture display, you scale the second video down and absolutely position it on top of the first one in the appropriate location. You can even grab the second video into a canvas, chroma-key your sign language speaker on a green or blue screen and remove that background through some canvas processing before popping it on top of the video.
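
The canvas processing step could look roughly like the sketch below. It works on a plain RGBA pixel array, such as the data of an ImageData object, and makes strongly green pixels transparent. The function name chromaKeyGreen() and the threshold parameters are my own illustrative assumptions; a real green screen will need tuning.

```javascript
// Make strongly green pixels fully transparent in an RGBA pixel array.
// minGreen: lowest green value that still counts as background.
// maxOther: highest red/blue value that still counts as background.
function chromaKeyGreen(pixels, minGreen, maxOther) {
  for (var i = 0; i < pixels.length; i += 4) {
    var r = pixels[i];
    var g = pixels[i + 1];
    var b = pixels[i + 2];
    if (g >= minGreen && r <= maxOther && b <= maxOther) {
      pixels[i + 3] = 0; // alpha channel: fully transparent
    }
  }
  return pixels;
}

// Possible use in a browser, once per displayed frame (not run here):
//   ctx.drawImage(signVideo, 0, 0, width, height);
//   var frame = ctx.getImageData(0, 0, width, height);
//   chromaKeyGreen(frame.data, 150, 100);
//   ctx.putImageData(frame, 0, 0);
```

The canvas is then positioned on top of the main video with CSS, and the sign language video element itself can be hidden.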

The world is all yours!

HOWEVER: There is one big caveat on all these specs – while they have all found entry into the HTML5 specification, it would be expecting a bit much to have browser support already. :-)

32 Responses to 'HTML5 multi-track audio or video'



  1. on May 3rd, 2011 at 12:30 am

    [...] Sylvia Pfiefer summarized, clarified, and provided demonstrations related the discussion about HTML5 multi-track audio or video that has threaded discussions in both the HTML WG and WHATWG email [...]

  2. sam said,

    on May 25th, 2011 at 6:36 am

    So any knowledge if any browser is working on support for this? The ability to do audio track selection on HTML5 video is really interesting to me (as are the other capabilities mentioned here)

  3. silvia said,

    on May 25th, 2011 at 9:27 am

    AFAIK not yet. Browser vendors have been involved in the specification of this, but none have yet committed to their implementation. You will need to go to the browser vendors (probably their bug trackers) and request implementation.

  4. Amitabh Arora said,

    on November 23rd, 2011 at 4:37 pm

    Hi Sylvia,
    While researching multi-track audio for karaoke music, I ended up here. My big question is: what encoders are available today for multi-track encoding in browsers and Adobe Flash? Also, how about supporting the .mogg file format? Why restrict it to only WebM?

    Cheers. -Amitabh

  5. silvia said,

    on November 23rd, 2011 at 6:33 pm

    Hi Amitabh,
    Today, no multitrack functionality has been implemented into Web browsers yet. This is all specification text waiting to be implemented. The specification is not encoding format specific, but generic – it will work for multitrack Ogg, WebM, MP4 and whatever else browsers want to support.

    As for multitrack encoding: you would probably not do that with browser capabilities but with an authoring tool. There is oggz-merge available for Ogg. WebM right now does not support multitrack.

    Also, as I understand it, the .mogg format is a non-standard multitrack Ogg Vorbis format. It uses the standard .ogg format and throws some extra headers in front of it to manage the tracks. It is unlikely browsers will support this format – they will more likely support standard multitrack Ogg.


  6. on January 2nd, 2012 at 10:32 am

    Great Article, excellent writing and reasoning

  7. Stu said,

    on March 28th, 2012 at 2:38 pm

    Hi Silvia,

    I’m looking to help a friend build a site for their band. They are trying to get multiple audio tracks in sync and include buttons to switch individual tracks on/off.

    I am wondering if you have a functional page for the example above or know of one I can look at.

    Kind Regards
    Stu

  8. silvia said,

    on March 28th, 2012 at 4:15 pm

    @Stu No browser implements this yet. But how about using https://github.com/rwldrn/mediagroup.js/ – I’ve had it work in Mozilla, but had some problems in WebKit. Not sure about other browsers. It’s a start, but you may need to improve on it.

  9. Stu said,

    on March 29th, 2012 at 8:50 am

    Thank Silvia! Will let you know how I go. Love your work :)

  10. sundaresh said,

    on April 27th, 2012 at 4:33 pm

    @Silvia what about the code in
    http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#media-element

    4.8.10.10 Media resources with multiple media tracks-example

  11. silvia said,

    on April 27th, 2012 at 4:57 pm

    @sundaresh – what about it? That’s the updated version from what I described in this article. It has changed a bit, but not fundamentally.

  12. Joe Goggins said,

    on May 6th, 2012 at 7:34 am

    @Stu — Any luck in following @Silvia ‘s suggestion for the multi-track audio thing you are working on with mediagroup.js? (didn’t work for me, w wav files in Chrome, Firefox, or Safari)

    I’m building an app with a similar use-case as you, an app that allows me to browse, search, tag, and play multi-track audio loops from my archive. Would love to find a non-flash implementation to make this happen, if anyone on this thread has suggestions, please send-me my way.

    -joe

  13. silvia said,

    on May 6th, 2012 at 1:48 pm

    @Stu @sundaresh I may be wrong about browsers not supporting @mediagroup yet. You might find it working in the latest WebKit & Chrome builds. Give it a try.

  14. Stu said,

    on May 8th, 2012 at 12:23 pm

    @Joe My project’s on hold, so sorry to say I have nothing to report yet

  15. James Brink said,

    on June 2nd, 2012 at 7:34 am

    There is a minor error in example 3. The tag ends with a video tag instead of audio.

  16. silvia said,

    on June 2nd, 2012 at 7:36 pm

    @James Thanks, fixed.


  17. on June 19th, 2012 at 9:43 pm

    [...] Explanation of the mediagroup attribute [...]

  18. BobbieB said,

    on August 10th, 2012 at 3:52 am

    We developed an app using Flash that presents a series of still pictures with voice annotation for each picture and also a background music track that loops for the entire duration of the playback. The music track ducks when there is audio for an image frame, so we’re playing two audio tracks at the same time at varying volumes. So far it doesn’t look as though this is possible as of Aug 2012 with HTML5 audio. If anyone has figured out how to implement this concurrent multi-track playback, we would love to hear about it. Thanks.

  19. silvia said,

    on August 10th, 2012 at 11:13 pm

    @BobbieB You can do that with HTML5, but you need to use JavaScript and you need a browser that supports @mediagroup on the audio and video elements.

  20. marieD said,

    on August 14th, 2012 at 7:21 am

    can you program this for me ?


  21. on October 14th, 2012 at 1:07 am

    [...] Apple employee Ted O’Connor, and the external consultant Silvia Pfieffer. The latter specializes in HTML video and has her own companies Ginger Technologies and [...]

  22. Gianluca said,

    on January 22nd, 2013 at 3:50 am

    Hello Silvia.
    I’m not an HTML5 expert, I’m a video expert.
    I have read a lot these days to solve a problem. I need to start 4 different synchronized videos in a website, and then, in some way, switch from one to another without lag.
    Is that possible with HTML5?
    Thank you very much.

  23. silvia said,

    on January 22nd, 2013 at 11:08 am

    @Gianluca Not all browsers have implemented the @mediagroup attribute. And those that have may have bugs. Your best bet is probably to write a JavaScript polyfill for @mediagroup where you do frequent seeks on the videos to get them in sync.

  24. Gianluca said,

    on January 22nd, 2013 at 4:40 pm

    Thank you so much for the answer.
    Do you know where I can find someone to do this job for me ?
    Here in Italy I don’t know no one that can do this.
    Thanks again.

  25. BobW said,

    on February 21st, 2013 at 3:16 am

    You mention seeing in-band multitrack mp4 in the wild. Any idea how it was produced? I’d like to have a video with two alternate audio tracks that would be controlled as you describe here. QuickTime Pro (Mac) seems unable to produce multitrack in any container other than .mov. Premiere Pro CS6 seems unable to produce multitrack at all.

  26. silvia said,

    on February 21st, 2013 at 2:26 pm

    @BobW I’m not an expert on MP4 but if you search for mp4 muxing, there seem to be several tools. For Ogg files we have oggz-merge, for Matroska/WebM there is mkvmerge.

  27. BobW said,

    on February 22nd, 2013 at 9:05 am

    @sylvia Thanks – looks like Subler will work for me.

  28. Kieran said,

    on April 7th, 2013 at 10:43 pm

    @Gianluca
    I think I may be able to help you with your project. We have finished a product that fits your requirements in Flash and we are hoping to move it over to HTML5 at the moment (which is why I am researching here!). Anyway if you still need some help you can get me here – kieran.walkin@plexusoft.com.

    @silvia
    Really great article and thanks for your time. (sorry for my shameless networking!)

  29. aneeshanand said,

    on May 3rd, 2013 at 6:29 pm

    hi
    we are currently migrating an e-learning portal from flash to html5. We have content videos arranged as slides in sequence for one topic, and trying to achieve onevideo:multiple audio track facility. After these 2 years, do we have any browser supporting the features you explained here? If yes, my life is saved :)
    Anyway – this is a great article and very helpful to beginners like me.
    Thank you
    Aneesh Anand

  30. silvia said,

    on May 4th, 2013 at 10:13 pm

    @aneeshanand Unfortunately, browsers haven’t all implemented @mediagroup yet. However, you can use the code that’s been used for audio descriptions in this article: http://www.sitepoint.com/accessible-audio-descriptions-for-html5-video/


  31. on May 8th, 2013 at 7:31 pm

    Hi,

    I know the article is older, but is there any news on Chrome’s, Firefox’s or Safari’s plans to integrate the audioTracks object? IE10 does and it works fine!

    Thanks,
    Christian

  32. silvia said,

    on May 9th, 2013 at 3:02 pm

    @Christian no idea – file bugs!
