WEBVTT 0.8 +2.133 Male describer: A MULTIMEDIA SLIDE PRESENTATION IS TITLED 2.933 +4.4 "HTML5 VIDEO ACCESSIBILITY AND THE WebVTT FILE FORMAT. 7.333 +3.367 MARCH 23, 2011. SILVIA PFEIFFER." 10.7 +2.9 IN AN AUDITORIUM, A WOMAN SPEAKS AT A PODIUM. 13.6 +2.533 Black: So welcome all. Thank you for coming. 16.133 +1.367 My name is Naomi Black, 17.5 +2 and I'm a member of the Accessibility Engineering Group 19.5 +1.166 here at Google. 20.666 +1.6 And today, I'm very pleased to invite 22.266 +2.134 our speaker, Dr. Silvia Pfeiffer. 24.4 +3.333 Silvia is a member and an invited expert 27.733 +3.6 on the W3C HTML Accessibility Task Force. 31.333 +1.867 And she's also the author of 33.2 +2.633 "The Definitive Guide to HTML5 Video." 35.833 +1.667 So Silvia's here to talk to us-- 37.5 +1.333 come on up. 38.833 +2.7 Silvia's here to talk to us today about WebVTT, 41.533 +1.967 which is one of the standards for timed text 43.5 +1.933 which is under consideration by the W3C. 45.433 +1.133 Thank you, Silvia. 46.566 +1.034 Pfeiffer: Thank you. 47.6 +1.566 Describer: SILVIA TAKES THE PODIUM. 49.166 +1.8 Pfeiffer: Thanks for inviting me to come and speak 50.966 +2.367 about this important topic today. 53.333 +2.3 We know there are a lot of discussions going on 55.633 +3.433 about formats for captions, 59.066 +2.434 and we want to standardize them for the web. 61.5 +2.066 But standardizing it for the web 63.566 +1.6 has a much larger impact these days 65.166 +1.667 than just on web browsers, 66.833 +1.2 just on the web itself. 68.033 +1.733 It goes into many different devices. 69.766 +2.6 So we're very interested and very keen 72.366 +3.5 to give a broad coverage of available technology, 75.866 +2.167 and this is what we're trying to do here today. 78.033 +6.567 So I'll be talking mostly about WebVTT, the file format. 
84.6 +2.266 But I'll also be talking a little bit about 86.866 +4.334 how to plug that into the web browser into HTML 91.2 +1.666 so that in the future, 92.866 +3.467 we have a very simple way of displaying captions 96.333 +2.733 in web browsers on videos 99.066 +2.9 without having to do much more than authoring a file 101.966 +4.234 and giving a link to the web browser for that file. 106.2 +2.433 So it will be very simple for people in the future 108.633 +1.467 to create more captions. 110.1 +2.3 Describer: SILVIA CUES A NEW SLIDE TITLED 112.4 +2 "REQUIREMENTS OF A WEB TEXT FORMAT." 114.4 +2.666 HEADINGS BELOW READ "CAPTIONS OR SUBTITLES" 117.066 +3.067 "TEXT VIDEO DESCRIPTIONS" "NAVIGATION OR CHAPTERS" 120.133 +1.533 AND "METADATA." 121.666 +1.667 Pfeiffer: All right, let's dig right in. 123.333 +2.9 As we were looking at requirements 126.233 +1.933 of such a text format, 128.166 +2.7 a web text format for video, 130.866 +4.067 we looked at the different types of content 134.933 +3.2 that can be time-aligned with video. 138.133 +2.633 And captions and subtitles are the obvious ones, 140.766 +4.1 but text video descriptions are also an important use. 144.866 +2 These are for blind users 146.866 +2.634 and can be read out by screen readers 149.5 +3.633 in parallel to the playback of the video. 153.133 +4.133 This may well not be the most usable way 157.266 +1.6 of doing audio descriptions, 158.866 +2.067 but it is a much easier way to publish 160.933 +3.833 audio descriptions for blind users. 164.766 +1.7 And, in fact, for a lot of blind users, 166.466 +2.167 it may well be all they need, 168.633 +2.5 because they already have their screen readers set up, 171.133 +3.4 and it works really well for some people. 174.533 +1.367 Further to that, 175.9 +2.933 we're also talking about navigation or chapters, 178.833 +2.233 which is also very important 181.066 +3.7 for blind and, in fact, any user. 
184.766 +2.267 If you want to go through a video quickly 187.033 +1.433 and find out what's in there, 188.466 +5.034 you want to jump to what we now know as chapter markers. 193.5 +1.6 We can call them navigation markers. 195.1 +4.2 This can be also covered with the same kind of format. 199.3 +3.066 And more generally, metadata. 202.366 +1.734 This is something that archives 204.1 +2 are particularly interested in, 206.1 +3.566 to attach metadata to sections in the video. 209.666 +3.334 It can also be done with such a time-aligned text format. 213 +3.8 So what we have discussed for browsers 216.8 +1.566 is a very simple format. 218.366 +3.6 It's called WebVTT, Video Text Tracks. 221.966 +2 WebVTT. 223.966 +3.9 Describer: A NEW SLIDE TITLED "WebVTT OR VIDEO TEXT TRACKS." 227.866 +3.834 A SAMPLE FILE IS HEADED "WEBVTT" WITH TWO ELEMENTS BELOW. 231.7 +2.666 EACH ELEMENT CONTAINS THREE LINES OF INFORMATION: 234.366 +1.267 CAPTION NUMBER, 235.633 +1.667 TIME CODE START AND TIME CODE END, 237.3 +1.333 AND CAPTION TEXT. 238.633 +2.9 BELOW THE TEXT, A STILL PHOTO OF A MAN SEATED IN A WORKSHOP 241.533 +2.667 IS CAPTIONED "I HEARD ABOUT THIS ARDUINO PROJECT, 244.2 +1.833 AND I SAW IT ONLINE." 246.033 +2.7 Pfeiffer: This is one of the very simple files 248.733 +1.3 that we can think about. 250.033 +2.9 Just a marker at the beginning of the file 252.933 +2.533 that identifies the file format. 255.466 +4.9 The captions or subtitles-- let's call them cues-- 260.366 +3.134 then have an individual identifier. 263.5 +2.933 In this case, it's the number one and number two. 266.433 +2.233 Could be any string, however. 268.666 +2.767 And then we've got start times and end times 271.433 +1.367 on each one of these cues, 272.8 +2.5 and a piece of text in there. 275.3 +2.266 It turns out in-- 277.566 +3.4 as we all know how captions are displayed on screen 280.966 +2.434 in something like this 283.4 +3.366 if it's automatically created by the browser. 
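A minimal sketch of the simple file layout described above: a marker line identifying the format, then cues that each carry an identifier, a start and end time, and a piece of text. The `Cue` class, the helper names, the cue times, and the split of the slide's caption into two cues are assumptions made for illustration; the `hh:mm:ss.ttt --> hh:mm:ss.ttt` timing line follows the WebVTT format as published.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    identifier: str   # any string; the slide uses "1" and "2"
    start: float      # seconds (hypothetical values below)
    end: float        # seconds
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as hh:mm:ss.ttt for a cue timing line."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def serialize(cues: list[Cue]) -> str:
    """Build a file: the WEBVTT marker, then blank-line-separated cue blocks."""
    blocks = ["WEBVTT"]
    for cue in cues:
        timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
        blocks.append(f"{cue.identifier}\n{timing}\n{cue.text}")
    return "\n\n".join(blocks) + "\n"


sample = serialize([
    Cue("1", 0.0, 2.5, "I heard about this Arduino project,"),
    Cue("2", 2.5, 4.0, "and I saw it online."),
])
print(sample)
```

Because identifiers are plain strings, the same function would work unchanged for cues labelled with names rather than numbers.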
286.766 +2.367 Describer: THE CAPTION DISPLAYS AT THE BOTTOM OF THE SCREEN 289.133 +3.933 IN WHITE LETTERING OVER A SHADED GRAY BOX. 293.066 +4.7 Pfeiffer: That was the very simplest way of doing subtitles. 297.766 +2.167 Now, we want to do more than just the simple captions. 299.933 +3.5 In particular, if we want to achieve 303.433 +1.2 all the functionality 304.633 +3.167 of, for example, the CEA-608 captions, 307.8 +2.733 then we need to do a bit more than just text. 310.533 +2.1 We also want to have some formatting in there. 312.633 +3.233 Describer: ANOTHER SLIDE, TITLED "WebVTT FORMATTED SUBTITLES," 315.866 +2.934 DISPLAYS THE SAME STILL IMAGE, WITH THE CAPTION IN GERMAN. 318.8 +3.166 THE WORD "ARDUINO" IS WRITTEN IN ALL CAPS AND COLORED RED. 321.966 +5.734 Pfeiffer: Here is an example on how to do bold. 327.7 +1.2 I'll point to it. 328.9 +4.233 There's a bold tag in here, so that will be bold text. 333.133 +2.233 Here is some italic text. 335.366 +1.134 Describer: IN THE SAMPLE FILE, 336.5 +2 THE SECOND CAPTION CONTAINS THE WORD "WOW!" 338.5 +2.233 SURROUNDED BY THE HTML TAG FOR BOLD, 340.733 +2.3 THE LETTER "b" SURROUNDED BY ANGLED BRACKETS 343.033 +1.033 BEFORE THE WORD "WOW," 344.066 +3.034 AND "/b" IN ANGLE BRACKETS AFTER THE WORD "WOW." 347.1 +1.2 ANOTHER PHRASE IS ENCLOSED 348.3 +2.866 IN THE TAG FOR ITALICS, "i" AND "/i". 351.166 +2.867 Pfeiffer: And up here we've got a general way 354.033 +6.067 to associate style or a class to a piece of text 360.1 +1.366 and give it a meaning. 361.466 +1.267 In this situation, 362.733 +3.7 we've turned a piece of text into red text 366.433 +1.5 and capitalized it. 367.933 +1.167 Describer: THE WORD "ARDUINO" 369.1 +2.966 IS PRECEDED BY THE TAG "c.red.caps" 372.066 +1.734 WITH A "/c" TO CLOSE. 373.8 +2.866 IN THE CAPTION, IT APPEARS RED AND CAPITALIZED. 
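The formatting tags described above (bold, italics, and a class tag such as "c.red.caps") can be read back out of cue text with a small tokenizer. This is a rough sketch rather than the parser any browser uses: the function name and sample text are hypothetical, and a closing tag is assumed to close the most recently opened tag.

```python
import re

# An opening tag such as <b>, <i> or <c.red.caps>, or a closing tag such as </c>.
TAG = re.compile(r"<(/?)([A-Za-z]+)((?:\.[\w-]+)*)>")


def parse_spans(cue_text: str) -> list[tuple[str, list[str]]]:
    """Return (text, active_styles) pairs; styles are tag names and class names."""
    spans = []
    stack: list[list[str]] = []   # one list of style names per open tag
    pos = 0
    for match in TAG.finditer(cue_text):
        if match.start() > pos:
            active = [name for entry in stack for name in entry]
            spans.append((cue_text[pos:match.start()], active))
        closing, name, classes = match.groups()
        if closing:
            if stack:
                stack.pop()       # simplification: closes the innermost open tag
        else:
            stack.append([name] + [c for c in classes.split(".") if c])
        pos = match.end()
    if pos < len(cue_text):
        spans.append((cue_text[pos:], [n for entry in stack for n in entry]))
    return spans


spans = parse_spans("I heard about this <c.red.caps>Arduino</c> project. <b>Wow!</b>")
for text, styles in spans:
    print(repr(text), styles)
```

A renderer could then map names such as "red" and "caps" to whatever presentation it supports.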
376.666 +3.334 Pfeiffer: Of course, if we're using this format 380 +1.466 also for subtitles, 381.466 +3.6 we need to be careful to cover internationalization issues. 385.066 +2.534 WebVTT is very clear here. 387.6 +2.966 It requires UTF-8 character encoding. 390.566 +3.667 Describer: ANOTHER SLIDE, WebVTT FOR INTERNATIONALIZATION, 394.233 +2.233 ABBREVIATED I18N. 396.466 +1.5 A SAMPLE CAPTION FILE CONTAINS 397.966 +2.167 TIME CODES FOR CAPTION START AND STOP 400.133 +2.567 AND CAPTION TEXT IN ASIAN LANGUAGE CHARACTERS. 402.7 +2.9 HEADINGS BELOW READ "UTF-8 CHARACTER ENCODING" 405.6 +1.2 "RUBY TEXT" 406.8 +3.166 AND "VERTICAL OR HORIZONTAL RENDERING AND ALIGNMENT." 409.966 +2.534 Pfeiffer: It has a ruby tag, 412.5 +4.366 which supports Asian languages in particular. 416.866 +2.3 Describer: ANNOTATED ASIAN LANGUAGE CHARACTERS ARE SHOWN 419.166 +2 BETWEEN AN OPEN AND CLOSE RUBY TAG, 421.166 +2.467 WITH THE ANNOTATIONS BETWEEN "rt" TAGS 423.633 +2.733 AND THE ANNOTATED TEXT BETWEEN "rb" TAGS. 426.366 +1.667 TO THE RIGHT OF THE START AND STOP TIMES, 428.033 +3.133 CODES READ "D:vertical" AND "A:start". 431.166 +2.534 Pfeiffer: It also does vertical and horizontal 433.7 +2.8 rendering of text. 436.5 +3.866 Again, possibly one of the most important ones 440.366 +1.434 are Asian languages, 441.8 +1.166 and I think there are a few other languages 442.966 +3.3 that are also rendered vertically. 446.266 +3 And we need to make sure we get the alignment right. 449.266 +4.934 Sometimes text is read from the right to the left, 454.2 +2.233 so therefore it needs to be aligned on the right 456.433 +1.467 rather than on the left. 457.9 +1.333 Describer: ANOTHER SLIDE IS TITLED 459.233 +2.367 "WebVTT CAPTION POSITIONING." 461.6 +4.133 Pfeiffer: Now, positioning is another requirement 465.733 +3.133 and, again, something that traditional captions, 468.866 +2.534 TV captions, are able to do. 471.4 +5.866 It's possible to position cues anywhere in WebVTT. 
477.266 +3 There are basically three important ways 480.266 +1.434 to position text. 481.7 +2.7 Describer: A STILL SHOT FROM "ANNIE HALL" FINDS DIANE KEATON 484.4 +2.1 WEARING A DRESS SHIRT, NECKTIE, AND VEST. 486.5 +2.3 CAPTIONS APPEAR IN THE CENTER OF THE VIDEO PANE 488.8 +2.2 AT BOTH THE TOP AND BOTTOM OF THE SCREEN. 491 +2.666 THE BOTTOM CAPTION TEXT WRAPS TO A SECOND LINE. 493.666 +2.734 AFTER THE LINE BREAK, THE CAPTIONS ARE CENTER ALIGNED. 496.4 +1.766 THE BOTTOM CAPTION READS 498.166 +1.7 "YEAH, I SORT OF DABBLE AROUND, YOU KNOW." 499.866 +3.9 THE TOP CAPTION: "I DABBLE? LISTEN TO ME. WHAT A JERK." 503.766 +3.034 HEADINGS NEXT TO THE STILL READ "L, LINE POSITION, 506.8 +3.066 T, TEXT POSITION, A, ALIGNMENT." 509.866 +2 Pfeiffer: There are line positions. 511.866 +6.4 So the concept of display lines exists in WebVTT. 518.266 +2.334 So the line position allows people 520.6 +3.066 to directly address a specific line. 523.666 +6.3 It can be done with a line number or a percentage. 529.966 +3.667 Then we have the text position. 533.633 +3.633 This means we're placing the text either on the left, 537.266 +4.1 in the middle, or on the right. 541.366 +2.167 No, hold on. That's the alignment, sorry. 543.533 +2.267 Alignment is left, middle, and right. 545.8 +4.3 And the text position is... 550.1 +2.133 so when we have text like this, 552.233 +4.3 it's in the middle, and it's centered. 556.533 +1.567 So we can also do a centering, 558.1 +5.5 and we can do a left alignment 563.6 +1.3 and a right alignment. 564.9 +3.4 But we can also move that whole text elsewhere. 568.3 +2.366 So the text position is where we move the text 570.666 +2.167 and the alignment is where we align it at. 572.833 +1.9 Describer: A SAMPLE FILE FEATURES POSITION CODES 574.733 +2.167 TO THE RIGHT OF THE TIME CODE MARKERS. 576.9 +2.2 FOR THE CAPTION AT THE TOP, THE CODES READ 579.1 +2.466 "A:middle" AND "L:10%". 581.566 +4.034 FOR THE BOTTOM CAPTION, "A:middle" AND "L:60%".
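The position codes read out above can be handled as simple name:value pairs placed after the cue's time codes. The sketch below uses the slide's one-letter names ("A" for alignment, "L" for line position); the WebVTT format as later published spells these settings out as `align` and `line`. The function names, the 360-pixel video height, and the 20-pixel line height are assumptions, and negative line numbers and vertical text are not handled.

```python
def parse_cue_settings(settings: str) -> dict[str, str]:
    """Split a settings string such as 'A:middle L:10%' into a name -> value map."""
    result = {}
    for token in settings.split():
        name, sep, value = token.partition(":")
        if sep and name and value:      # skip tokens that are not name:value pairs
            result[name] = value
    return result


def line_offset(value: str, video_height: int, line_height: int) -> int:
    """Convert a line setting into a pixel offset from the top of the video.

    A value ending in '%' is taken as a percentage of the video height;
    any other value is taken as a line number counted from the top.
    """
    if value.endswith("%"):
        return round(video_height * float(value[:-1]) / 100)
    return int(value) * line_height


top = parse_cue_settings("A:middle L:10%")
bottom = parse_cue_settings("A:middle L:60%")
print(top, line_offset(top["L"], video_height=360, line_height=20))
print(bottom, line_offset(bottom["L"], video_height=360, line_height=20))
```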
585.6 +3.266 A NEW SLIDE IS TITLED "WebVTT SPEAKER SEMANTICS." 588.866 +2.734 TWO FRAMES OF VIDEO FROM THE ANIMATED PROGRAM "ARTHUR" 591.6 +2.3 CAPTURE A MONKEY FROWNING AT A BUNNY. 593.9 +2.133 THE FIRST FRAME IS CAPTIONED "AHEM..." 596.033 +2.733 WITH THE CAPTION LEFT ALIGNED TO APPEAR BELOW THE MONKEY. 598.766 +2.534 THE SECOND FRAME IS CAPTIONED "WHAT'S THE MATTER?" 601.3 +2.166 AND IS RIGHT ALIGNED TO APPEAR BELOW THE BUNNY. 603.466 +1.934 Pfeiffer: We also have speaker semantics 605.4 +1.6 included into WebVTT, 607 +3 which is interesting because it allows us 610 +3.133 to put some semantic information into our markup. 613.133 +1.067 Describer: IN THE SAMPLE FILE, 614.2 +2.566 TAGS APPEAR BEFORE THE TEXT OF EACH CAPTION, 616.766 +2.334 "v.Beatrix" IN THE FIRST CAPTION, 619.1 +1.833 AND "v.Arthur" IN THE SECOND. 620.933 +1.533 THE TAGS ARE NOT CLOSED. 622.466 +2.834 Pfeiffer: Here, for example, we have two people speaking. 625.3 +4.166 We know their position on the left and on the right. 629.466 +4.667 And the speaker markup 634.133 +2.033 can tell us where we want to position it 636.166 +3.167 and can also, for example, help us always use 639.333 +3.067 the same styling for the same speaker. 642.4 +2.4 So, for example, we want to use the same font, 644.8 +1.433 the same font color, 646.233 +3.267 maybe a specific outline or something for a speaker. 649.5 +1.266 We can define that 650.766 +3.367 and then apply that always for that speaker. 654.133 +3.5 Describer: ANOTHER SLIDE, WebVTT TEXT DESCRIPTIONS. 657.633 +3.2 Pfeiffer: Now, so much for captions. 660.833 +2.4 Now we move on to a little example 663.233 +1.933 on text descriptions. 665.166 +4.734 Here is one that I've used previously, 669.9 +5.733 and we've got that as an example on the site. 
675.633 +2.367 Describer: A SAMPLE FILE CONTAINS THREE DESCRIPTIONS, 678 +1.933 EACH WITH THREE LINES OF INFORMATION: 679.933 +2.6 DESCRIPTION NUMBER, TIME CODE START AND STOP, 682.533 +2.4 AND THE TEXT TO BE READ ALOUD BY A SCREEN READER. 684.933 +2.3 THE SLIDE GIVES THREE SAMPLE DESCRIPTIONS. 687.233 +3.433 SAMPLE ONE, "THE ORANGE OPEN MOVIE PROJECT PRESENTS." 690.666 +3.6 SAMPLE TWO, "INTRODUCTORY TITLES ARE SHOWING ON THE BACKGROUND 694.266 +1.767 "OF A WATER POOL WITH FISHES SWIMMING 696.033 +3 AND MECHANICAL OBJECTS LYING ON A STONE FLOOR." 699.033 +3.467 SAMPLE THREE, "TITLE: ELEPHANTS DREAM." 702.5 +1.3 Pfeiffer: I'm not gonna go there; 703.8 +1.666 I just want to mention it, 705.466 +2.4 because we want to focus on captions today. 707.866 +4.634 But what happens here is, we've got text that's aligned 712.5 +3.2 with a start and end time as well. 715.7 +4.033 And for a typical word rate of a screen reader, 719.733 +3.433 it will fit into that space, 723.166 +2.767 and it will be read back by the screen reader 725.933 +1.4 during that time. 727.333 +3.367 Describer: A NEW SLIDE, WebVTT CHAPTERS FOR NAVIGATION. 730.7 +3.733 Pfeiffer: And here is the navigation example. 734.433 +4.633 As I mentioned, WebVTT can also be used for navigation. 739.066 +1.967 Here we have three chapters, 741.033 +4.133 and we can directly jump from chapter to chapter. 745.166 +4.8 There needs to be extra controls on videos to support this, 749.966 +1.734 but this is something we're working towards. 751.7 +2.333 Describer: A SAMPLE FILE CONTAINS THREE CHAPTERS, 754.033 +1.9 EACH WITH THREE LINES OF INFORMATION: 755.933 +4.1 CHAPTER NUMBER, TIME CODE START AND STOP, AND CHAPTER TITLE. 760.033 +3.433 TEXT BELOW THE SAMPLE READS "NAVIGATION MARKERS." 763.466 +3.7 A NEW SLIDE IS TITLED "TEXT TRACKS IN HTML5." 
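For the chapter navigation described above, the extra player controls mostly need two lookups: which chapter contains the current playback time, and where the next chapter starts. A minimal sketch, assuming the chapter cues have already been read into a sorted list; the chapter titles and start times are hypothetical stand-ins for the three chapters on the slide.

```python
from bisect import bisect_right

# (start_seconds, title) pairs, sorted by start time; values are hypothetical.
chapters = [(0.0, "Chapter 1"), (60.0, "Chapter 2"), (150.0, "Chapter 3")]


def current_chapter(chapters, time):
    """Index of the chapter containing `time`, or None before the first chapter."""
    starts = [start for start, _ in chapters]
    index = bisect_right(starts, time) - 1
    return index if index >= 0 else None


def next_chapter_start(chapters, time):
    """Start time to seek to for a 'next chapter' control, or None at the end."""
    index = current_chapter(chapters, time)
    following = 0 if index is None else index + 1
    return chapters[following][0] if following < len(chapters) else None


print(current_chapter(chapters, 75.0))     # index of the chapter playing at 75 s
print(next_chapter_start(chapters, 75.0))  # where a "next" button would seek to
```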
767.166 +1.9 Pfeiffer: Now, of course, as I'm saying, 769.066 +2.2 controls and input into web pages 771.266 +1.7 and automatic rendering, 772.966 +2.034 we need to know how we're going to do that. 775 +2.6 And there is markup in HTML5 777.6 +5.166 for associating captions and formats like this 782.766 +2.1 with videos. 784.866 +5.434 In this example, I've got all of the VTT files 790.3 +1.6 that we've used before. 791.9 +1.866 I've included them here. 793.766 +3.9 And what we're using for it is called a track element, 797.666 +1.734 and this track element is included 799.4 +4.066 underneath the video element in the HTML5 markup. 803.466 +3.5 It links through the VTT file. 806.966 +2.834 And there is some description possible 809.8 +3.766 for the type of file it is, so we have a label. 813.566 +2.767 In this case, it's an English caption. 816.333 +2.4 We have a kind, which gives us a means 818.733 +7.2 to group all the tracks of the same type together. 825.933 +1.867 Describer: A CODE SAMPLE LISTS A SOURCE VIDEO 827.8 +2.3 AND A SERIES OF TRACKS WITH THE FOLLOWING LABELS: 830.1 +3.566 ENGLISH CAPTIONS, GERMAN SUBTITLES, FRENCH SUBTITLES, 833.666 +2.7 ENGLISH DESCRIPTIONS, AND CHAPTERS. 836.366 +4.2 EACH TRACK HAS ATTRIBUTES "KIND" "SRCLANG" AND "SRC", 840.566 +3.7 WITH THE "SRC" CONTAINING A URL TO THE WebVTT FILE. 844.266 +1.534 THE SAMPLE CONTAINS THE KINDS 845.8 +3.233 CAPTIONS, SUBTITLES, DESCRIPTIONS, AND CHAPTERS. 849.033 +2.4 Pfeiffer: And we identify the language. 851.433 +3.133 Because, of course, when we have user settings in browsers, 854.566 +4.934 we want to automatically make certain tracks 859.5 +1.333 available to the user 860.833 +1.7 if the user has, for example, 862.533 +1.767 said that they always want captions 864.3 +3.8 or they always want subtitles in their language being shown. 868.1 +3.466 So the browser can look through this markup 871.566 +4.1 and identify which ones it has to turn on by default.
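The default-selection idea described above can be sketched as a lookup over the track list: for each kind the user has asked for, take the first track in the user's language. This is an illustrative simplification, not the selection algorithm any browser implements. The `Track` class, the function name, the file names, and the language given to the chapters track are assumptions; the kinds and labels follow the slide.

```python
from dataclasses import dataclass


@dataclass
class Track:
    kind: str      # "captions", "subtitles", "descriptions" or "chapters"
    srclang: str   # language code such as "en"
    label: str
    src: str       # URL of the WebVTT file (hypothetical names below)


def tracks_to_enable(tracks, preferred_language, wanted_kinds):
    """Pick, for each wanted kind, the first track in the user's language."""
    chosen = []
    for kind in wanted_kinds:
        for track in tracks:
            if track.kind == kind and track.srclang == preferred_language:
                chosen.append(track)
                break
    return chosen


tracks = [
    Track("captions", "en", "English Captions", "captions_en.vtt"),
    Track("subtitles", "de", "German Subtitles", "subtitles_de.vtt"),
    Track("subtitles", "fr", "French Subtitles", "subtitles_fr.vtt"),
    Track("descriptions", "en", "English Descriptions", "descriptions_en.vtt"),
    Track("chapters", "en", "Chapters", "chapters_en.vtt"),
]

german_user = tracks_to_enable(tracks, "de", ["captions", "subtitles"])
print([t.label for t in german_user])
```

A user who prefers German and has asked for captions or subtitles gets only the German subtitles here, because no German captions track exists in the list.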
875.666 +2.934 Describer: AFTER TEXT THAT READS "SRCLANG=" 878.6 +2 THE LETTERS "EN" ARE HIGHLIGHTED, 880.6 +2.566 INDICATING ENGLISH LANGUAGE. 883.166 +4.734 Pfeiffer: Now, in this case, I've used only WebVTT files. 887.9 +6.533 The track layout, the way that we've defined track in HTML5, 894.433 +1.767 is actually generic. 896.2 +3.2 It can be used for other types of files as well, 899.4 +5.433 TTML or SRT or any other formats 904.833 +2.3 that will be implemented. 907.133 +1.4 Describer: THE WORD "TRACK" IS HIGHLIGHTED 908.533 +3.633 IN A TAG READING: TRACK LABEL="ENGLISH CAPTIONS". 912.166 +2.7 Pfeiffer: But the generic way 914.866 +4.3 that track works is in this way. 919.166 +1.467 Describer: A NEW SLIDE IS TITLED 920.633 +2.367 "USING CSS FOR RICHER STYLING." 923 +2.2 A SAMPLE FILE FEATURES THE MAN IN HIS WORKSHOP 925.2 +2.6 CAPTIONED "I HEARD ABOUT THIS ARDUINO PROJECT 927.8 +1.566 AND SAW IT ONLINE." 929.366 +2.367 THE WORD ARDUINO IS RED AND ALL CAPS. 931.733 +3.033 THE REST OF THE CAPTION IS BLACK AND MIXED TYPECASE. 934.766 +2.7 Pfeiffer: Now, once we've got it in the browser, 937.466 +2.7 we can actually support more than what is directly possible 940.166 +2.734 as markup in the WebVTT file, 942.9 +2.8 because now we've got the text in the browser, 945.7 +1.033 and we can make use 946.733 +2.133 of all the functionality of the browser, 948.866 +2.834 which has styling 951.7 +2.4 and the cascading style sheet functionality 954.1 +1.266 available to it. 955.366 +2.734 So this kind of styling is also available 958.1 +5.966 if used in a browser, to these cues. 964.066 +5.067 And the way in which this is being done 969.133 +3 is that there's a pseudo-element in CSS 972.133 +2.4 called ::cue. 974.533 +1.9 And with that pseudo-element, 976.433 +2.767 you can address, for example, 979.2 +4.133 classes in the cue markup. 983.333 +2.933 Describer: IN THE SAMPLE FILE, A TAG POINTING TO ".arduino" 986.266 +1.734 SURROUNDS THE WORD ARDUINO.
988 +3.433 Pfeiffer: And you can override the formatting 991.433 +2.867 that by default would be given. 994.3 +1.333 You can, for example-- 995.633 +2.533 well, in this case, it's been turned red, 998.166 +3.034 uppercase, a different font family, 1001.2 +1.533 and a lighter weight. 1002.733 +2.567 Describer: IN A SEPARATE BOX, UNDER THE HEADING CSS, 1005.3 +3.1 THE ::CUE PSEUDO-ELEMENT ".arduino" 1008.4 +2.1 CONTAINS CODE SELECTING COLOR AS RED, 1010.5 +1.966 TEXT-TRANSFORM AS UPPERCASE, 1012.466 +1.934 FONT-FAMILY AS HELVETICA NEUE, 1014.4 +2.1 AND FONT-WEIGHT AS LIGHTER. 1016.5 +4.066 A NEW SLIDE IS TITLED "WebVTT DEFAULT RENDERING." 1020.566 +1.3 TWO HEADINGS READ 1021.866 +2.9 "POP-ON: BOTTOM THIRD OF VIDEO VIEWPORT, CENTERED" 1024.766 +3.4 AND "ROLL-UP: OVERLAPPING CAPTIONS ARE ADDED UNDERNEATH." 1028.166 +1.1 Pfeiffer: Now, we've spoken a lot. 1029.266 +1.934 We want to see a little bit of a demo here, 1031.2 +2.433 and I've made a little bit of a demo 1033.633 +3.733 which shows that we can do more than what's typically 1037.366 +1.567 being used for captions right now. 1038.933 +3.933 Most captions that are being used are pop-on captions, 1042.866 +2.6 which are captions that don't overlap in time. 1045.466 +1.434 There's one piece of caption, 1046.9 +1.7 one cue shown; 1048.6 +1.966 it disappears, and the next cue is brought up. 1050.566 +1.3 That's pop-on. 1051.866 +4.467 And that is the default way of rendering it. 1056.333 +3.167 But we may have a very different style 1059.5 +2.866 of providing captions as well, 1062.366 +2.1 which has traditionally been used 1064.466 +2.934 mostly in live captioning. 1067.4 +1.7 It's called roll-up. 1069.1 +3.166 So the cues will actually be added at the bottom 1072.266 +2.634 and roll up as the-- 1074.9 +1.766 and the old ones will disappear. 1076.666 +1.6 So I've made a little example 1078.266 +2.167 that shows how that can be done as well. 1080.433 +1.2 Describer: A VIDEO PLAYS. 
1081.633 +1.6 AS THE MAN IN HIS SHOP SPEAKS, 1083.233 +1.733 TWO LINES OF LEFT-ALIGNED CAPTIONS 1084.966 +2.334 APPEAR TO ROLL UP FROM THE BOTTOM OF THE SCREEN. 1087.3 +1.333 WITH EACH NEW CAPTION, 1088.633 +1.733 THE BOTTOM CAPTION MOVES UP TO THE TOP LINE, 1090.366 +2.3 AND THE TOP CAPTION DISAPPEARS. 1092.666 +1.367 Pfeiffer: Let's hope this works. 1094.033 +1.767 Man: I heard about this Arduino Project, 1095.8 +2.5 and I saw it online, and I said, "Wow, 1098.3 +1.933 "a lot of people are starting to talk about this. 1100.233 +1.433 I should check it out." 1101.666 +2.667 Second Man: 'Cause we wanted to make a tool 1104.333 +2.867 for our student that was more modern 1107.2 +2.7 than what was available on the market at the moment. 1111.7 +1.433 Third Man: For me, it was a case 1113.133 +2.467 that this is a tool that I could see using myself, 1115.6 +1.366 and therefore I could believe 1116.966 +2.4 in actually helping to get it out to a wider world. 1119.366 +2.267 Fourth Man: [speaking foreign language] 1126.5 +2.066 Pfeiffer: So you could see 1128.566 +3.134 that the captions were being pushed up 1131.7 +3 as they were being displayed. 1134.7 +3.8 This is a very simple way 1138.5 +7.833 of doing this kind of roll-up. 1146.333 +1.9 So as we're moving on, 1148.233 +2.1 the next caption gets added to the bottom. 1150.333 +3.467 This can be improved. 1153.8 +1.566 This is just a very crude demo. 1155.366 +1.2 But this can be improved 1156.566 +2.234 with a bit more CSS in the browser. 1158.8 +5 We can, for example, transition the text more slowly, 1163.8 +2 and then it would be more readable, 1165.8 +2.366 rather than it jumping there directly. 1168.166 +3.134 There's a whole swag of CSS functionality 1171.3 +1.466 available to us in the browser 1172.766 +1.967 to make this look very nice. 1174.733 +5.967 And the functionality is there and possible to be used. 1180.7 +2.966 Describer: A NEW SLIDE, "EXOTIC: PAINT-ON CAPTIONS." 
1183.666 +4.9 Pfeiffer: I've mentioned paint-on and roll-up captions. 1188.566 +3.3 I've mentioned pop-on and roll-up captions. 1191.866 +2.9 I want to briefly also mention paint-on captions, 1194.766 +2.9 even though that's a bit more of an exotic use case. 1197.666 +4.7 But it's possible to be used in CEA-608 captions, 1202.366 +3.134 so we need to make sure that it's also possible 1205.5 +2.433 to be represented in WebVTT. 1207.933 +2.933 And what we've introduced for this kind of application 1210.866 +3.234 is cue timestamps. 1214.1 +4.366 These cue timestamps are basically just a timestamp 1218.466 +4.067 that is being included into the text 1222.533 +3.467 and says when the text that comes afterwards 1226 +1.666 will be activated. 1227.666 +2.134 Describer: A SAMPLE FILE CONTAINS CAPTION NUMBER, 1229.8 +1.966 CAPTION START AND STOP TIME CODES, 1231.766 +3.667 AND INDIVIDUAL CAPTIONED WORDS SEPARATED BY TIMESTAMP CODES. 1235.433 +3.433 THE TIMESTAMPS THEMSELVES ARE ENCLOSED IN ANGLE BRACKETS. 1238.866 +2.534 BELOW, HEADINGS READ "CUE TIMESTAMPS" 1241.4 +1.966 "CHARACTER RESOLUTION POSSIBLE" 1243.366 +2.167 "STYLING THROUGH CSS PSEUDO-SELECTORS" 1245.533 +4.2 AND ":PAST, :FUTURE (E.G. KARAOKE)." 1249.733 +1.533 Pfeiffer: Here I've done it-- 1251.266 +1.367 [coughs] pardon me-- 1252.633 +1.567 at the word level. 1254.2 +4.633 I've put cue timestamps in for every word, 1258.833 +2.433 so every word would come up one after the other 1261.266 +1.7 on the screen. 1262.966 +2.334 However, the resolution is arbitrary. 1265.3 +4.3 We could do that on every character if necessary. 1269.6 +1.766 Interestingly enough, 1271.366 +4.6 that can also be used for styling through CSS. 1275.966 +7.034 There are the past and future pseudo-selectors. 1283 +3.266 And these selectors allow us, for example, 1286.266 +4.434 to do something like paint the old text in yellow, 1290.7 +2 the new text in white. 
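The cue timestamp mechanism described above can be sketched as a split of the cue text at the current playback time: text before a timestamp that has already passed counts as past, and the rest as future. The function name and the sample cue are hypothetical, and the timestamps are assumed to be written as hh:mm:ss.ttt inside angle brackets.

```python
import re

# Inline cue timestamp in angle brackets, assumed here to look like <00:00:01.000>.
STAMP = re.compile(r"<(\d+):(\d{2}):(\d{2})\.(\d{3})>")


def split_past_future(cue_start: float, cue_text: str, now: float):
    """Split cue text into the part already activated at `now` and the rest."""
    past, future = [], []
    active_from = cue_start          # text before the first timestamp starts with the cue
    pos = 0
    for match in STAMP.finditer(cue_text):
        segment = cue_text[pos:match.start()]
        (past if active_from <= now else future).append(segment)
        h, m, s, ms = (int(group) for group in match.groups())
        active_from = h * 3600 + m * 60 + s + ms / 1000
        pos = match.end()
    (past if active_from <= now else future).append(cue_text[pos:])
    return "".join(past), "".join(future)


text = "I <00:00:01.000>heard <00:00:02.000>about <00:00:03.000>this"
past, future = split_past_future(0.0, text, now=2.5)
print(repr(past), repr(future))
```

Placing a timestamp before every character instead of every word would give the character-level resolution mentioned in the talk without any change to the function.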
1292.7 +2.833 Describer: SILVIA HIGHLIGHTS THE ":PAST" PSEUDO-SELECTOR, 1295.533 +2.833 THEN POINTS OUT A CODE SAMPLE WITH THE HEADING CSS. 1298.366 +3 IT CONTAINS TWO EXPRESSIONS MAKING PAST CUE COLORS YELLOW 1301.366 +2.967 AND ADDING A TEXT SHADOW TO FUTURE CUES. 1304.333 +1.533 Pfeiffer: With a text shadow, 1305.866 +2.434 and as it goes over, everything goes yellow. 1308.3 +2.066 And we know this kind of application, obviously, 1310.366 +1.4 from karaoke, 1311.766 +5.4 which bridges into these applications as well, 1317.166 +3.8 into more modern time-aligned text applications. 1320.966 +2.767 We can cover all of these use cases 1323.733 +3.733 with the same approach. 1327.466 +4 So that brings me to the end of the presentation. 1331.466 +1.3 Describer: ANOTHER SLIDE IS TITLED 1332.766 +3.1 "WebVTT, A BRIDGE BETWEEN BROADCAST AND THE WEB." 1335.866 +3.834 POINTS BELOW READ: "FULL FEATURES OF CEA-608 CAPTIONS" 1339.7 +1.466 "SIMPLICITY OF EDITING" 1341.166 +1.667 "ABILITY TO APPLY WEB STYLING" 1342.833 +2.667 AND "OPEN AND FREELY AVAILABLE." 1345.5 +4.333 Pfeiffer: We regard WebVTT as a bridge between broadcast 1349.833 +2.133 and the web of the future. 1351.966 +5.8 We can support all of the CEA-608 captions, 1357.766 +1.9 all of the features, 1359.666 +3.4 possibly also some of the 708 features-- 1363.066 +4.367 I think most of them; I haven't analyzed in detail. 1367.433 +3.367 But most of the 708 features will be supported as well. 1370.8 +2.333 It's a simplicity of editing, 1373.133 +2.733 which we like about the WebVTT format. 1375.866 +2.1 It's readable. You can read it here on screen. 1377.966 +5.234 There's not too much busyness as you're looking at it. 1383.2 +2.166 And that means also that it's easy to edit 1385.366 +1.534 and to create. 1386.9 +3.033 We have the ability to apply web styling 1389.933 +1.8 through the track mechanism 1391.733 +3.333 that has been included into HTML5. 1395.066 +3.634 And this is an open and freely available format. 
1398.7 +1.966 If you're looking for references, 1400.666 +2.634 I've put the references on this last slide 1403.3 +1.133 to all the specs. 1404.433 +1.3 They're available for free. 1405.733 +1.833 Thank you very much. 1407.566 +2.2 Describer: NAOMI RETURNS TO THE PODIUM. 1409.766 +3.034 [applause] 1412.8 +1.366 Black: So thank you, Silvia. 1414.166 +3.534 We have a mic up if people want to ask any questions. 1417.7 +1.566 Maybe you could introduce yourself briefly. 1419.266 +1.234 We're recording this, 1420.5 +1.266 and we're gonna be posting it to YouTube, 1421.766 +1.567 so hopefully you won't be shy. 1423.333 +1.5 But please, if you have any questions for Silvia 1424.833 +4.1 about WebVTT, please step up to the mic. 1428.933 +1.833 Pfeiffer: I was going very quickly, 1430.766 +1.5 so if somebody wants to go back 1432.266 +2.8 and explore any of the features in more detail, 1435.066 +2.767 this is probably the opportunity. 1437.833 +1.8 Steinberg: Hi, I'm Daniel Steinberg at Google. 1439.633 +1.933 A couple questions. 1441.566 +3.7 When you had the line number specification, 1445.266 +4.734 how does that apply when you have vertical text? 1450 +2.7 Pfeiffer: Let me just find that. 1452.7 +1.033 Here, this one? 1453.733 +0.933 Steinberg: Yeah. 1454.666 +2.367 Describer: SILVIA CUES THE "CAPTION POSITION" SLIDE 1457.033 +2.233 THAT HIGHLIGHTS LINE POSITION, TEXT POSITION, 1459.266 +1.667 AND ALIGNMENT FEATURES. 1460.933 +5.167 Pfeiffer: So the line numbers are basically done in-- 1466.1 +1.066 for horizontal text, 1467.166 +2.1 obviously from the top to the bottom. 1469.266 +3.534 And for vertical text, they are turned around. 1472.8 +2.666 So they apply in the same way. 1475.466 +5.5 Steinberg: Okay, and you have a lot of the underpinnings 1480.966 +3.234 for interactive text, but not the actual-- 1484.2 +1.633 I didn't see anything actually there. 
1485.833 +3.4 So the ability to say this particular tag is live, 1489.233 +2.833 and clicking on it might give a link or something like that. 1492.066 +2.8 Have you considered interactivity? 1494.866 +1.7 Pfeiffer: Interactivity-- 1496.566 +3.934 you particularly talking about hyperlinks in this case? 1500.5 +1.2 Steinberg: In this case, yes. 1501.7 +1 Pfeiffer: Yes, so hyperlinks 1502.7 +3.033 is something of a bit of a controversial issue. 1505.733 +1.733 We've discussed this. 1507.466 +3.4 It is obviously something that can easily be added. 1510.866 +1.9 We've got the markup in HTML5. 1512.766 +5.667 We could easily put a "a" tag in there and a hyperlink. 1518.433 +2.933 It's not something that's currently part 1521.366 +5.134 of the specification simply because people don't believe 1526.5 +1.566 that it's a very good experience. 1528.066 +2.4 When you're watching captions, they stay on the screen 1530.466 +1.534 only for a very short amount of time. 1532 +2.5 By the time you've decided that you want to follow a link, 1534.5 +1.266 it's already gone. 1535.766 +1.767 That's the reason. 1537.533 +3.567 So I'm obviously not fully subscribed to that reason. 1541.1 +3.2 I would actually like to have hyperlinks in it as well. 1544.3 +4.2 What I look at in this format is that it's easy to extend it, 1548.5 +2.466 and if somebody was to support it, 1550.966 +3.967 then it is not a problem to put that in as well. 1554.933 +2.233 Steinberg: Yeah, something to consider with the hyperlinks 1557.166 +2.2 is that, because you have the ability 1559.366 +3.267 to author different kinds of files, 1562.633 +2.467 you could have a description that had a longer time 1565.1 +1.333 and had a longer duration. 1566.433 +1.767 It wouldn't have to disappear 1568.2 +2.533 when a caption line disappeared. 1570.733 +2.3 Pfeiffer: In fact, I should actually add something. 1573.033 +4.767 We've got, as I mentioned, we've got kinds of text 1577.8 +3.8 that we're expecting in a WebVTT file. 
1581.6 +2.266 I've talked about captions and subtitles, 1583.866 +1.867 I've talked about text descriptions, 1585.733 +1.4 and I've talked about navigation. 1587.133 +1.733 I've only mentioned metadata, 1588.866 +2.2 but metadata actually solves that problem. 1591.066 +2.767 Metadata means that 1593.833 +2.667 you're allowed to put anything into a cue: 1596.5 +1.633 any markup you want, 1598.133 +1.5 any non-markup you want, 1599.633 +1.767 any text, anything at all. 1601.4 +3.1 It just means that the browser can't do anything with it. 1604.5 +3.6 It decides--it sees that it's of the kind "metadata" 1608.1 +1.5 and goes, "I'm being hands-off. 1609.6 +2.866 "I'm just gonna hand it on to the JavaScript, 1612.466 +2.4 and the JavaScript can do with it whatever it likes." 1614.866 +2.334 So this would be one way to have interactivity in it. 1617.2 +2.7 You could just grab it through JavaScript 1619.9 +4.566 and then put it into a div on your page, 1624.466 +3.334 and then there would be a hyperlink. 1627.8 +2.4 And anything else that you can come up with 1630.2 +3.966 that's time-aligned would work in a similar way as well. 1634.166 +1.034 Steinberg: And one last question. 1635.2 +1.833 You showed the different formats-- 1637.033 +2.5 captions and descriptions and whatnot-- 1639.533 +3.4 as essentially the same VTT format, 1642.933 +4 distinguished only by the track tag. 1646.933 +4.367 And I wonder if there's enough semantic difference 1651.3 +3.866 that you'd want to be able to distinguish it in another way. 1655.166 +1.334 Like, for instance, would you build a-- 1656.5 +1.2 could you build a file 1657.7 +2.933 that had captions and descriptions in it? 1660.633 +6.067 Or might you want to have some identifier in the file say, 1666.7 +1.533 "This is a description file," 1668.233 +3.733 rather than count on the track tag? 1671.966 +4.234 Pfeiffer: Mixing content in one track 1676.2 +2.566 is of course possible. 
1678.766 +4.134 Like, you could concatenate, for example, 1682.9 +6.133 a caption file with an audio description file. 1689.033 +3.033 Then you would basically have two tracks available 1692.066 +1.5 through one file. 1693.566 +4.367 It's not a very easy way to deal with, 1697.933 +3.633 and it would require a lot of additional implementation. 1701.566 +1.467 So, for example, 1703.033 +4.533 if that concatenated case was to be handled, 1707.566 +1.867 the browsers would need to find out 1709.433 +2.167 where the second file starts and so on. 1711.6 +3.966 We actually like to keep the semantics separate. 1715.566 +4.367 And HTML markup is built on keeping semantics separate. 1719.933 +2.7 So this is why we introduced the "kind" attribute. 1722.633 +2.367 And so therefore, in one file, 1725 +2.8 you will only find captions of one type. 1730.333 +2.533 Thank you. 1732.866 +1.1 Foliot: Hi, Silvia. 1733.966 +2.067 John Foliot from Stanford University. 1736.033 +3.2 When you gave the code example of the CSS, 1739.233 +2.067 it's not clear where the CSS actually lives. 1741.3 +2.8 Is it embedded in the WebVTT file? 1744.1 +2.633 Can you have an external file and link it? 1746.733 +3.467 Can you just maybe get into that a little bit further? 1750.2 +2.366 Pfeiffer: Yeah, let me just find it. Sorry. 1752.566 +5.2 So I've deliberately just put that there as a snippet, 1757.766 +2.2 because it actually doesn't matter where it lives. 1759.966 +1.167 Describer: SILVIA PULLS UP 1761.133 +2.233 THE "USING CSS FOR RICHER STYLING" PAGE, 1763.366 +2.3 WITH THE ::CSS PSEUDO-ELEMENT 1765.666 +3.1 IN A SEPARATE TEXT BOX FROM THE WebVTT FILE. 1768.766 +4.034 Pfeiffer: At the moment, because we're doing the CSS 1772.8 +2.5 through the HTML page, 1775.3 +3.533 that includes this file up there... 1778.833 +3.967 that HTML page could either have this CSS piece 1782.8 +3.5 directly in the HTML page and address it-- 1786.3 +1.733 so an in-band CSS. 
1788.033 +2.467 Or it could be in an external CSS file 1790.5 +3.8 and be pulled into the HTML page 1794.3 +2.7 together with the WebVTT file. 1797 +1.966 Foliot: So it always sits outside of the VTT file. 1798.966 +1.434 Pfeiffer: It can. Yeah, well... 1800.4 +2.6 So we're currently under discussion 1803 +1.233 whether this is a functionality 1804.233 +2.733 that we'd want to add to WebVTT as well. 1806.966 +4 So whether you want a WebVTT file 1810.966 +2.567 that links to a style sheet. 1813.533 +4.2 We're careful about that because we-- 1817.733 +4 this comes from a very web point of view. 1821.733 +2.767 We don't really want to pollute the WebVTT file 1824.5 +2.366 with this web functionality, 1826.866 +3.234 because there are applications outside the web browser 1830.1 +2.2 that do not want to have to implement, 1832.3 +2 for example, all of CSS style sheets 1834.3 +2.1 in order to display captions properly. 1836.4 +1.566 So it probably makes more sense 1837.966 +1.9 to have this as a separate file. 1839.866 +2.934 And if people do want to have this additional functionality, 1842.8 +2.166 they can use the WebVTT file 1844.966 +2.134 and the CSS file together 1847.1 +5.4 and parse them, and use them in their style sheet engine 1852.5 +5 to come up with the proper display. 1857.5 +2.233 Black: I actually, I have a practical example 1859.733 +1.533 of why you would want the CSS. 1861.266 +1.034 Is this on? Can you hear me? 1862.3 +1.033 Pfeiffer: Yes. 1863.333 +1.367 Black: I have a practical example 1864.7 +2.2 of why you would want the CSS to be outside the VTT. 1866.9 +2.6 I work currently with people who are producing caption files 1869.5 +1.566 for the UK market 1871.066 +2.867 and who are then redistributing that same video here in the US, 1873.933 +2.633 and they have to basically redo the entire caption file 1876.566 +2.567 because UK audiences are expecting, for instance, 1879.133 +3.367 to see a particular speaker marked up in a particular color. 
1882.5 +1.9 Here in the US, we're not expecting that. 1884.4 +2.233 So you could imagine if you had one format 1886.633 +2.333 where you marked up the content semantically 1888.966 +2.7 that, depending on whether you showed it in a player 1891.666 +2.434 here for the US or there for the UK 1894.1 +1.333 with the same caption file, 1895.433 +1.533 you could display it differently 1896.966 +2.3 according to different users' regional preferences. 1899.266 +2.467 Foliot: I would also add that user style sheets, 1901.733 +2.4 the user could actually increase the font size 1904.133 +2.567 to their specific requirements. 1906.7 +1.133 Pfeiffer: Yes. 1907.833 +1.4 Foliot: A second, really minor question. 1909.233 +1.233 On the alignment, 1910.466 +3.267 you said it could be left, right, or-- 1913.733 +1.3 was it middle or center? 1915.033 +2.733 Can you actually have it justified as well? 1920.766 +1.4 Pfeiffer: No. 1922.166 +3.5 I don't think we have a means to justify the text. 1925.666 +3.067 But I also think that, actually-- 1928.733 +5.7 so from all the quality captions literature that I've read, 1934.433 +1.3 how to do quality captions, 1935.733 +4.533 I think justification has never been proposed 1940.266 +2.267 as a readable way of doing it 1942.533 +4 because it changes the spacing between characters 1946.533 +2.167 and so therefore makes the text actually harder to read. 1948.7 +1.433 Foliot: I agree. 1950.133 +1.9 Pfeiffer: I don't think we actually need that feature. 1952.033 +2.067 But it is a good point, yeah. 1954.1 +1.666 Foliot: Well, it just wasn't specifically declared 1955.766 +1 one way or the other. 1956.766 +1 I agree with you. 1957.766 +1.3 Pfeiffer: Yeah, okay, fine. 1959.066 +2.4 Excellent. 1961.466 +2.834 Have we got any more questions? 1964.3 +3.4 Well, thank you very much. 1967.7 +2.6 I suppose if anyone has any more questions, 1970.3 +2.4 we're always here to answer more questions. 
1972.7 +3.033 There's the Accessibility Group at Google, 1975.733 +3.267 through which Naomi can be reached. 1979 +3.666 And I'm very active at the W3C. 1982.666 +2.6 Feedback can be sent also to the WHAT Working Group. 1985.266 +3.7 There's plenty of ways to get to us. 1988.966 +1.334 I should have probably put 1990.3 +2.366 a contact slide in there as well. 1992.666 +2.2 I'm also on Twitter and Facebook and so on. 1994.866 +2.034 But just Google my name, Silvia Pfeiffer. 1996.9 +1.066 You'll find me. 1997.966 +1.6 Thank you very much. 1999.566 +3.3 [applause]