Videos will be everywhere on the Web! Yes, cope with it: soon the majority of videos won’t sit with some hosting site like youtube, but will reside on our private servers, on company servers, actually on any and all web servers. And there will be interesting stuff out there, but it will be hard to find.
Yes, history will repeat itself and finding those videos on the Web that satisfy our needs – be it for information or entertainment – will be a nightmare. Why? Because google’s pagerank (and many other ranking algorithms) relies on Web pages pointing to the videos to give them a higher rank. However, the way in which videos are currently published is by embedding them into Web pages (let’s call such a page the “embedding page”). Thus, the link analysis will actually return the pagerank for the embedding page – but not for the video itself!
Now, if the embedding page can be seen as representative of the video – because the only reason the page exists is to publish the video and its annotations – then the pagerank for the embedding page is effectively the pagerank for the video. This is the case for google video, for youtube, and for many other hosting sites.
However, you and I mostly publish our videos in blogs or on Web pages that describe more than just the video – some will even have several videos embedded. This is where the chaos for a Web search engine for videos begins. And this is where the discoverability of your videos through video search engines ends.
Here is the solution.
Just as we do with normal Web pages, we have to introduce SEO (search engine optimisation) for videos. That means we have to make it easier for search engines to find information about our videos, i.e. to index and rank them.
Because videos are binary data, a common Web search engine cannot extract information about this Web resource directly from it (let’s ignore signal analysis and automatic content analysis approaches for the moment). We have to help the search engine.
The solution is to have a text file sitting “next” to the actual video file which contains indexable text about the video. It will hold all the annotations, metadata, tags, copyright information and other textual meta information that search engines need to index and rank the video. This text file is an indexable textual representation of the video.
So, whenever a video search engine reaches a video in a crawl, it will check out this text file for its indexing work. If this text file is HTML, then people may link directly to it and it will be included in the pagerank calculations again. If it is an XML file, there should be a simple way to transcode it to HTML, e.g. via an XSLT script, so links can go there directly again.
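To sketch the transcoding idea: assuming a hypothetical metadata file with a videometa root element containing a title and a description (these element names are invented for illustration – adapt the paths to your real schema), an XSLT stylesheet along these lines would turn it into an indexable, linkable HTML page:

```xml
<?xml version="1.0"?>
<!-- Hypothetical example: transcode a simple video metadata XML file to HTML.
     The videometa, title and description element names are made up for
     illustration; adjust the match/select paths to your actual schema. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/videometa">
    <html>
      <head><title><xsl:value-of select="title"/></title></head>
      <body>
        <h1><xsl:value-of select="title"/></h1>
        <p><xsl:value-of select="description"/></p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Any XSLT 1.0 processor can run it, e.g. xsltproc metadata.xsl video.xml > video.html.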
So much for the theory: here comes the practice.
For every video file (and incidentally it would work for audio, too), you should start writing a CMML file and publish it on your Web server together with the original. Here is an XSLT script that you can use to transcode CMML to HTML. If you actually use Ogg Theora as your video publishing format, you can even publish Annodex videos and, using the Apache Annodex module, enable direct access to the clips that you defined in CMML and to time offsets. Try using it in your blog with the external embedding of the Annodex Firefox extension.
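To make this concrete, here is a minimal sketch of what such a CMML file might look like. The title, times, descriptions and URLs are invented for illustration; consult the CMML specification for the exact schema:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a CMML file for a hypothetical gardening video.
     All content values here are made up; check the CMML spec
     for the full set of elements and attributes. -->
<cmml>
  <head>
    <title>My gardening success story</title>
    <meta name="author" content="Jane Doe"/>
    <meta name="keywords" content="gardening, compost, tomatoes"/>
  </head>
  <clip id="intro" start="npt:0">
    <a href="http://example.com/gardening">Related gardening page</a>
    <desc>Introduction: why I started growing my own tomatoes.</desc>
  </clip>
  <clip id="compost" start="npt:83">
    <desc>How to set up a compost heap in a small backyard.</desc>
  </clip>
</cmml>
```

The head element carries the indexable metadata for the whole video, while each clip element annotates a time-addressable section – exactly the kind of text a video search engine could exploit.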
When we’ve done this, all that remains is to encourage the video search engines to exploit the CMML data in their crawls.
So you think we’re in the middle of an “explosion” of online video clips, in particular consumer-created video clips? Think again. How many videos have you published online so far? Compare that to the number of web pages you have written or contributed to.
It’s still only very few people who upload clips. The “masses” haven’t even decided to start yet.
The “mass” consists of all the people who see something useful in uploading, making accessible, and finding video clips (and no, that’s not just pr0n). It took the Web a few years before companies started building a Web presence and using the Web as a marketing instrument. It took private people even longer before they started keeping blogs and publishing their CVs and photo collections.
Videos can be used as a marketing instrument just as much as a Web page can. In a convergent world, video will be even more important than text because it reaches the couch potato. People will start making videos about their success story in gardening, about their home-grown cooking recipe, about the way to repair a special valve on their car, about how to train pets – or children (“be your own super-nanny”). Small companies will make videos about their products, the corner shop will advertise its services to the neighbourhood, the medical centre will present its doctors and procedures through online videos, the computer shop its software, the travel agency its best locations, etc. The video explosion on the Web hasn’t even started yet.
Flumotion is a streaming server product developed by Fluendo. Flumotion runs in a distributed environment, where the video capture, encoding, and transmission can be run on different computers, so the load can be better balanced.
I have found it rather difficult to find introductory help on how to get flumotion set up and running, so I’ll share my insights with you here.
Imagine a setup where you want machine A to capture and encode the video from a DV camera, machine B to relay the stream onto the Internet to several clients, and machine C to get the stream off machine B and write it to disk. The software that you’d need to run on each of these machines is the following:
flumotion-manager on machine B.
flumotion-manager is the central component of a flumotion streaming setup, which connects up all the components and makes sure that everything works. It has to run before anything else can happen.
flumotion-worker on every machine where you want work to be done, i.e. on machines A, B, and C. The workers are daemons that connect to the manager and wait for commands to do something.
flumotion-admin on any machine, to set up the details of the flumotion streaming setup.
So, here are the commands that I use to get it running with the default setup:
- On machine B, start the flumotion service (which will run flumotion-manager -D -n default /etc/flumotion/managers/default/planet.xml for you).
- On machines A, B and C, start a worker: flumotion-worker -u pants -p off & (yes, these are the default user name and password).
- On any machine, run flumotion-admin (and go through the GUI setup wizard).
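Laid out per machine, the session looks roughly like this. This is a sketch under the default setup; workers on machines other than B additionally need to be told where the manager runs – check the flumotion-worker man page for the relevant option in your version:

```shell
# Machine B: start the manager with the default configuration
flumotion-manager -D -n default /etc/flumotion/managers/default/planet.xml

# Machines A, B and C: start a worker with the default credentials
# (remote workers must also be pointed at the manager on machine B;
# see the flumotion-worker man page)
flumotion-worker -u pants -p off &

# Any machine: run the admin GUI and walk through the setup wizard
flumotion-admin
```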
… and you should be up and going with either your DV camera, your Webcam or your TV tuner card. Watch the cute smileys go happy! And connect to the stream using your favorite media player that can decode Ogg Theora/Vorbis, e.g. totem, vlc, xine.
I’ve found the online man pages of flumotion-admin helpful, because the flumotion package that my Ubuntu dapper installation installed did not include them. You might actually be better off using Jeff Waugh’s packages for each of the flumotion commands if you are setting up on Ubuntu Dapper. Another hint: use the theora-mmx library to get better performance.
Flumotion is an excellent solution for setting up video streaming. I have found that the following conferences have used it:
- GUADEC, June 2006, http://guadec.org/GUADEC2006/Live
- DebConf, May 2006, http://technocrat.net/d/2006/5/12/3384
- Linux Audio Conference, May 2006, http://lac.zkm.de/2006/streaming.shtml
- Washington DC LUG, http://dclug.tux.org/webcast/