Monthly Archives: February 2008

Vquence: Measuring Internet Video

I have been so busy with my work as CEO of Vquence since the end of last year that I’ve neglected blogging about Vquence. It’s on my list of things to improve on this year.

I get asked frequently what it is that we actually do at Vquence. So here’s an update.

Let me start by providing a bit of history. At the beginning of 2007 Vquence was totally focused on building a social video aggregation site. The site now lives at http://www.vqslices.com/ and is useful, but lacks some of the key features we had envisaged would make it a breakthrough.

As the year grew older and we tried to build a corporate business and an income with our video aggregation, search and publication technology, we discovered that we had something of much higher value than the video handling technology: the quantitative usage information about videos on social video sites in our aggregated metadata. In addition, our “crawling” algorithms are able to supply up-to-date quantitative data instantly.

In fact, I should not simply call our data acquisition technology a “crawler”, because in the strict sense of the word it isn’t one. Bill Burnham describes in his blog post about SkyGrid the difference between the crawlers of traditional search engines and the newer “flow-based” approach that is based on RSS/ping servers. At Vquence we are embracing the new “flow-based” approach and extending it by using REST APIs where available. A limitation of the flow-based approach is that only a very small part of the Web is accessible through RSS and REST APIs. We therefore complement flow-based search with our own new types of data-discovery algorithms (or “crawlers”) as we see fit. In particular, locating the long tail of videos stored on YouTube is a challenge that we have mastered.
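
To make the distinction a little more concrete, here is a minimal Python sketch of the flow-based idea: discover new videos from a site’s RSS feed instead of crawling its pages, then refresh their view counts through a REST call. This is purely illustrative – the feed and API URLs are hypothetical placeholders, and our actual data acquisition code is considerably more involved.

    import json
    import urllib.request
    import xml.etree.ElementTree as ET

    # Both URLs are hypothetical placeholders, not real endpoints.
    FEED_URL = "https://videosite.example.com/rss/recently_added"
    STATS_URL = "https://videosite.example.com/api/videos/{id}.json"

    def discover_new_videos():
        """Flow-based discovery: read the site's feed instead of crawling its pages."""
        with urllib.request.urlopen(FEED_URL) as response:
            tree = ET.parse(response)
        # Each <item> carries a link whose last path segment we treat as the video id.
        return [item.findtext("link").rsplit("/", 1)[-1] for item in tree.iter("item")]

    def fetch_view_count(video_id):
        """REST lookup for up-to-date quantitative data about a single video."""
        with urllib.request.urlopen(STATS_URL.format(id=video_id)) as response:
            stats = json.load(response)
        return stats.get("view_count", 0)

    for vid in discover_new_videos():
        print(vid, fetch_view_count(vid))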

But I digress…

So we have all this quantitative data about social videos, which we update frequently. With it, we can create graphs of the development of view counts, comment counts, video replies and such. See, for example, the image below for a graph that compares the aggregate view count of the videos published by the main political parties in Australia during last year’s federal election. The graph shows the development of the view count over the last 2.5 months before the election in 2007.

Aggregate Viewcount Graph Federal Election Australia
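
For those curious about the mechanics, the aggregation behind such a graph is conceptually simple: sum the per-video counts for each party on each sampling day. Here is a toy Python sketch with made-up numbers; our actual pipeline is, of course, more involved.

    from collections import defaultdict

    # (date, party, video_id, view count) as sampled by the crawler; numbers are made up
    samples = [
        ("2007-09-02", "Coalition", "apec-address", 1200),
        ("2007-09-02", "Labor", "campaign-launch", 5400),
        ("2007-09-03", "Labor", "campaign-launch", 6100),
        ("2007-09-03", "Labor", "education-policy", 800),
    ]

    aggregate = defaultdict(int)  # (date, party) -> total views across all videos
    for date, party, _video, views in samples:
        aggregate[(date, party)] += views

    for (date, party), total in sorted(aggregate.items()):
        print(f"{date}  {party:10s} {total}")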

First you will notice that Labor started far above everyone else. Unfortunately we didn’t start recording view counts that early, but we assume it is due to the Kevin07 website that was launched on 7th August. In the graph, you will notice a first increase in the Coalition’s view count on 2nd September – that’s when Howard published the video for the APEC meeting of 2-9 Sept 2007. Then there’s another bend on 14th September, when Google launched its federal election site and we saw the first videos of the Nationals going up on YouTube. The dip in the curve of the Nationals a little after that is due to a software bug. Then on 14th October the Federal Election was actually announced, and you can see the massive increase in view count from there on for all parties, ending with a huge advantage for Labor over everybody else. Interestingly enough, this also mirrors the actual outcome of the election.

So, this is the kind of information that we are now collecting at Vquence and focusing our business around.

Against that background, check out a recent blog post by Judah Phillips on “Thinking about Measuring Internet Video?”. It is actually a wonderful description of the kind of things we are either offering or working on.

Using his vocabulary: we can currently provide a mix of Instream and Outstream KPIs to the video advertising market. Our larger aim is to provide exceptional outstream audience metrics, and we know how to get them regardless of where a video goes on the Internet. Our technology plan centres on a mix of a panel-based approach (through a browser plugin), a census-based approach (through a social network plugin for Facebook et al., also using OpenID), and video duplicate identification.

This information isn’t yet published on our corporate website, which still mostly focuses on our capabilities in video aggregation, search, and publication. But we have a replacement in the making. Watch this space… 🙂

Activities for a possible Web Video Working Group

The report of the recent W3C Video on the Web workshop has come out and includes recommendations to form a Video Metadata Working Group, or even more generally a Web Video Working Group.

I have had some discussions with people who have a keen interest in the space, and we have come up with a list of topics that a W3C Video Working Group should look into. I want to share this list here. It goes into somewhat more detail than the topics that the W3C Video on the Web workshop raised. Feel free to add any further concerns or suggestions that you have in the comments – I’d be curious to get feedback.

First, there are the fundamental issues:

  • Choice of royalty-free baseline codecs for audio and video
  • Choice of encapsulation format for multi-track media delivery

Both of these really require the generation of a list of requirements and use cases, then an analysis of existing formats with respect to these requirements, and finally a decision on which ones to use.

Requirements for codecs would encompass, amongst others, the need to cover different delivery and receiving devices – from mobile phones with 3G bandwidth, through Web video, to full-screen TV video over ADSL.

Here are some requirements for an encapsulation format:

  • usable for live streaming and for canned delivery,
  • the ability to easily decode from any offset in a media file,
  • usability for temporal and spatial hyperlinking, and the partial delivery that these require,
  • the ability to dynamically create multi-track media streams on a server and to deliver requested tracks only,
  • the ability to compose valid streams by composing segments from different servers based on a (play)list of temporal hyperlinks,
  • the ability to cache segments in the network,
  • and the ability to easily add a different “codec” track into the encapsulation (as a means of preparing for future improved codecs or other codec plugins).

The decisions on an encapsulation format and on a/v codecs may require a further specification of how to map specific codecs into the chosen encapsulation format.

Then we have the “Web” requirements:

The technologies that have created what is known as the World Wide Web are fundamentally a hypertext markup language (HTML), a hypertext transfer protocol (HTTP) and a resource addressing scheme (URIs). Together they define the distributed nature of the Web. We need to build an infrastructure for hypermedia that builds on the existing Web technologies so we can make video a first-class citizen on the Web.

  • Create a URI-compatible means of temporal hyperlinking directly into time offsets of media files (a sketch follows after this list).
  • Create a URI-compatible means of spatial hyperlinking directly into picture areas of video files.
  • Create an HTTP-compatible protocol for negotiating and transferring video content between a Web server and a Web client. This also includes a definition of how video can be cached in HTTP network proxies and the like.
  • Create a markup language for video that also enables hyperlinks from any time and region in a video to any other Web resource. Time-aligned annotations and metadata need to be part of this, just like HTML annotates text.
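
As a small illustration of the temporal hyperlinking item above, here is a sketch of what such a link could look like and how a client or server might interpret it. The “?t=start/end” query syntax is just one possible convention chosen for illustration; settling on the actual scheme is exactly the kind of work the working group would need to do.

    from urllib.parse import urlparse, parse_qs

    def parse_temporal_link(uri):
        """Return (resource, start_seconds, end_seconds) for a temporal hyperlink."""
        parts = urlparse(uri)
        query = parse_qs(parts.query)
        start, _, end = query.get("t", ["0/"])[0].partition("/")
        resource = parts._replace(query="").geturl()
        return resource, float(start or 0), float(end) if end else None

    # e.g. a link into seconds 90 to 150 of a video:
    print(parse_temporal_link("http://example.com/election.ogv?t=90/150"))
    # ('http://example.com/election.ogv', 90.0, 150.0)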

All of these measures together will turn ordinary media into hypermedia, ready for a distributed usage on the Web.

In addition to these fundamental Web technologies, to integrate into modern Web environments, there would need to be:

  • a standard definition of a JavaScript API to interact with the media data,
  • an event model,
  • a DOM integration of the textual markup,
  • and possibly the use of CSS or SVG to define layout, effects, transitions and other presentation issues.

Then there are the Metadata requirements:

We all know that videos have a massive amount of metadata – i.e. data about the video. There are different types of metadata and they need to be handled differently.

  • Time-aligned text, such as captions, subtitles, transcripts, karaoke and similar text.
  • Header-type metadata, such as the ID3 tags for mp3 files, or the Vorbis comments for Ogg files.
  • Manifest-type descriptions of the relationships between different media file tracks, similar to what SMIL enables, like the recent ROE format in development at Xiph.

The time-aligned text should actually be regarded as a codec, because it is time-aligned just like audio or video data. If we want to be able to do live streaming of annotated media content and receive all the data as a multiplexed stream through one connection, we need to be able to multiplex the text codec into the binary stream just like we do with audio and video. Thus, the definition of the time-aligned text codecs has to ensure that they can be multiplexed.
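
To illustrate the multiplexing point, here is a toy sketch in which text cues are just timestamped packets merged with audio and video packets. The packet layout is invented purely for illustration and is not tied to any particular container format.

    import heapq
    from dataclasses import dataclass

    @dataclass(order=True)
    class Packet:
        timestamp: float   # presentation time in seconds
        track: str         # "video", "audio" or "text"
        payload: str

    video = [Packet(t, "video", f"frame {i}") for i, t in enumerate([0.0, 0.04, 0.08])]
    audio = [Packet(t, "audio", f"samples {i}") for i, t in enumerate([0.0, 0.02, 0.04, 0.06])]
    text = [Packet(0.0, "text", "Welcome"), Packet(0.05, "text", "to the demo")]

    # Multiplexing is then just a timestamp-ordered merge of all track packets.
    for packet in heapq.merge(video, audio, text):
        print(f"{packet.timestamp:5.2f}s  {packet.track:5s}  {packet.payload}")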

Header-type metadata should be machine accessible and available for human consumption as required. They can be used to manage copyright and other rights-related information.

The manifest is important for dynamically creating multi-track media files as required through a client-server interaction, such as requesting an audio track in a specific language with the video rather than the default.

Other topics of interest:

There are two more topics that I would like to point out that require activities.

  • “DRM”: It needs to be analysed what the real need is here. Is it a need to encrypt the media file such that it can only be read by specific recipients? Maybe an encryption scheme with public and private keys could provide this functionality. Or is it a need to retain copyright and licensing information with the media data? Then the encapsulation of metadata inside the media files may already be a good solution, since this information stays with the media file when it is delivered or copied.
  • Accessibility: It needs to be ascertained that captions, sign language, video descriptions and the like can be associated with the video in a time-aligned fashion in the chosen encapsulation format. A standard time-aligned format for specifying sign language would be needed.

This list of required technologies has been built through years of experience experimenting with the seamless integration of video into the World Wide Web in the Annodex project, and through recent discussions at the W3C Video on the Web workshop and elsewhere.

This list just provides a structure for what needs to be addressed to make video a first-class citizen on the Web. There are many difficult detail problems to solve in each one of these areas. It is a challenge to understand the complexity of the problem, but I hope this structure can help break down some of that complexity and help us start attacking the issues.

Metadata and Ogg

I am really excited about the huge progress we made at FOMS with metadata and Ogg. The metadata specifications are actually not Ogg-specific – only their mapping into Ogg is. Here are the things that I expect will make for a very structured and sensible distributed handling of metadata on the Web.

At FOMS, we started improving CMML and are now specifying its next version. CMML is a timed text description language that can easily be multiplexed alongside audio or video data. It is very flexible with its fields and satisfies needs for hypermedia, captions, annotations and other time-aligned text. We took out the Ogg dependencies, so it can now be used in any media container format. The specification is now also expressed as an XML Schema rather than a DTD, which enables us to reuse modules from XHTML and makes it generally more extensible.
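
As a rough illustration of what consuming such a timed text document looks like, here is a sketch that reads a CMML-style file and turns it into a list of time-aligned annotations. The exact element and attribute names of the next CMML version are still being worked out, so treat the structure below as illustrative only.

    import xml.etree.ElementTree as ET

    document = """
    <cmml>
      <head><title>Election wrap-up</title></head>
      <clip id="intro" start="npt:0">
        <desc>Opening remarks</desc>
        <a href="http://example.com/background.html">more background</a>
      </clip>
      <clip id="debate" start="npt:95.5">
        <desc>Commentary on the leaders' debate</desc>
      </clip>
    </cmml>
    """

    root = ET.fromstring(document)
    for clip in root.iter("clip"):
        start = float(clip.get("start").replace("npt:", ""))
        desc = clip.findtext("desc", default="")
        link = clip.find("a")
        href = link.get("href") if link is not None else None
        print(f"{start:7.1f}s  {clip.get('id'):8s}  {desc}  -> {href}")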

We introduced ROE, a description language (or “manifest”) for multitrack media files. It describes media tracks and their dependencies and thus goes much further than the old stream and import elements in CMML, which have now been deprecated.

ROE can be used to author multitrack media files – in the Ogg case to author Ogg files with a Skeleton track and multiple media tracks. We are in the process of extending Skeleton to incorporate the description of dependencies between logical bitstreams. To complete this, we will be creating a description of how to map ROE into Ogg/Skeleton and vice versa.

ROE can also be used to negotiate with a Web client which media streams to send from the complete manifest that is available on the server. For example, a Web client could request the German sound track with a movie rather than the default English one, and add English subtitles. This requires a small protocol for negotiation, which can easily be built using Web infrastructure. We are introducing some new HTTP request/response parameters and specific URLs, such as http://example.com/movie.ogg?track=V1,A2,TT2.
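
Here is a small sketch of what the server side of such a negotiation could look like: parse the track parameter out of the request URL and pick the requested tracks from the manifest, which would then be remultiplexed and delivered. The manifest structure here is invented for illustration; only the V1/A2/TT2 naming mirrors the example URL above.

    from urllib.parse import urlparse, parse_qs

    # Everything the server holds for movie.ogg; in practice ROE would describe this.
    MANIFEST = {
        "V1": "video, Theora, 640x360",
        "A1": "audio, Vorbis, English",
        "A2": "audio, Vorbis, German",
        "TT1": "timed text, German subtitles",
        "TT2": "timed text, English subtitles",
    }

    def select_tracks(request_uri):
        """Pick the requested tracks out of the manifest (all tracks if none given)."""
        query = parse_qs(urlparse(request_uri).query)
        wanted = query.get("track", [",".join(MANIFEST)])[0].split(",")
        unknown = [t for t in wanted if t not in MANIFEST]
        if unknown:
            raise ValueError(f"unknown tracks requested: {unknown}")
        return {t: MANIFEST[t] for t in wanted}

    # German audio with English subtitles, as in the example URL above:
    for track, description in select_tracks(
            "http://example.com/movie.ogg?track=V1,A2,TT2").items():
        print(track, "-", description)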

The set of ROE, Skeleton, CMML, and the HTTP and URI specifications will enable a very structured means of interacting with metadata-rich video on the Web. It will be distributed and integrated into the Web infrastructure, much like the Annodex set of technologies already is today.

Since I am also a business owner as well as an open media enthusiast, let me add that I expect this to have a huge impact on online business around audio and video, enabling business processes and business models that are not possible today. Watch this space!

The greatest gathering of open media sw developers

When I started organising the first FOMS (Foundations of Open Media Software) developers’ workshop in 2007, I did it because I saw a need to have media hackers get together in a room and discuss stuff in person. Email, IRC, svn, bugzilla and wikis only get you a certain distance in collaboration. But no distance communication tool can replace the energy and creative spirit that is created through an in-person meeting and the ability to have a beer together in the evening. Discussions are more intense, impossibilities are identified faster, progress is amazing – and the energy will last and have an impact on the community for months to come after the event.

FOMS 2007 was great in that respect, because some 25 hackers got to know each other for the first time, friendships were formed, trust was built and new ideas (read: new code) were created. It was awesome and gave me the motivation to go and organise FOMS 2008. At this point let me express my gratitude to the organising committees of both FOMS 2007 and FOMS 2008 for the support they have given me in organising both workshops; I hope they will help again next year in Tasmania.

So then FOMS 2008 took place and what can I say!? It totally blew me away. For me it was a much better experience than the year before, because I wasn’t also organising the video recordings at LCA. I was therefore more relaxed, got involved in design discussions, and was able to sit down during the week after FOMS at LCA and actually interact with people. On a side note: thanks so much to Donna Benjamin, the main organiser of LCA 2008, for getting the FOMS participants a room to ourselves where we were able to gather and get a whole lot of awesome work done.

Nearly the whole Xiph community was at FOMS, and issues that had been brewing for years were tabled and discussed. A large number of audio hackers were there, too, and the issue of a standard sound API got some heated discussion. There’s a press release and the proceedings of the FOMS discussions up on the FOMS 2008 website, where you can get a complete picture of all the issues that were discussed.

In addition to FOMS, Conrad Parker and I had also organised a Multimedia Miniconf at LCA. It was a great place to communicate some of the outcomes of FOMS and to present some of the latest developments in open media software in the Linux community. Video proceedings are available on the site.

Overall I must say that January has become the highlight of my year in open media software.

“IT’s a mad men’s world”

Last night I took part in a panel organised by Rachel Slattery under the title “IT’s a mad men’s world”. There were a whole lot of really fascinating women there, both in the audience and on the panel, but also some stray men, which was good to see. With me on the panel were Sue Klose, Corporate Development Director of News Digital Media; Juliet Potter, Founder & Director of www.autochic.com.au; and Tim Batten, Head of eChannels & Payments at Westpac. The panel was moderated by Sandra Davey, Director of kcollective.

The discussion was really awesome, and the stories that each of us could tell of situations where we had to stand up for ourselves just for being a woman were shocking. But what really got me was the universal message that we all had: don’t let the morons get you down, and don’t let go of your goals. Fight the fights that are worth it. It’s OK if not everybody loves you – you have to ask yourself: do you want to be liked or respected?

I actually did a little research before the event and wasn’t able to share half of the things I learnt. So I thought I’d put some more in this blog post.

I came across this xkcd cartoon just yesterday and thought: wow, this really is the essence of why we have so few women in IT.

xkcd: how it works

You may be thinking that this cartoon represents a problem with male (or indeed societal) prejudices against women. I actually think the problem is deeper.

Imagine you’re a girl and have to decide on a career. You’re pretty good at many things and could be going into a technical career. But you have little experience, since you’ve had little exposure and no mentors in the field before. Would you take the risk of exposing yourself to looking really dumb, possibly even failing? Not only are you taking the hard road for yourself if you do; there’s also the larger impact on the perception of women. By looking dumb or failing, you will shed a bad light on all women and thus confirm the prejudice, making it even harder for other women to go into the field. Now do you start to understand why there are so few, and each year even fewer, women in technical jobs?

You think I’m taking this too far? Don’t. Women are taught from very early on not just to think about themselves, but to be cooperative and always consider their environment. While such thoughts might not be conscious, they are there and play a role.

What do I really want to say with this? It’s not just a matter of changing men’s and indeed society’s attitudes towards women. It’s also a matter of building up women’s self-confidence, teaching women how to be competitive and independent. And you have to start at school by encouraging women and introducing them to IT. Because really: “Computing is too important to be left to the men” (quote from Karen Spärck Jones).

UPDATE: I have heard from several men that they find that quote rather offensive and read it as in “we should not trust men with computing”. That is absolutely not the way I read it. I want it to be read as an encouragement to women to go into computing – it is an important field for the future of humanity and half of humanity is not taking part in shaping it. That’s just not right.


Sexier new Vquence player

I’ve been meaning to write about this for a while, but hadn’t found a good occasion yet. Today I stumbled across the videos from RailsConf2007 on Blip.tv and decided – this is it! I will show off the nice new sexy layout of the Vquence player with this content – after all, we are a Rails shop (apart from all those other programming languages that we use).

Julian reworked the design of the player in December and did an awesome job. The image pane’s scroll slows down as you reach the left or right border. It works similarly to a scrollbar: if you move to the middle of the image pane, it scrolls to the middle clip in the playlist. As you leave the image pane, it snaps back to focus on the clip that you are currently watching.

The new player also has a lot more text in it. As you mouse over the images, you get the titles of the clips. When you click on the (i) button, you get the annotations of the current clip (click (i) again to make them go away). At the beginning of each clip, there’s a small text reminder at the top that a click on the video will take you to the full video.

And finally – to give the video more space, the transport bar disappears as you keep watching and stop interacting with the player. This gives it more of a sit-back experience. The option to switch to full-screen display also adds to this experience.

Overall, I am really thrilled how far we have taken the player. Enjoy!

(But should you have any feedback or suggestions for improvement, feel free to shoot me an email or leave a comment.)