Saturday, 25 October 2014

The Research Potential of Twitch TV


What is Twitch?

Twitch is a live video streaming service marketed to video game and e-sports enthusiasts. It arose from a community of dedicated players of competitive head-to-head games and "speedrunners" who complete single-player games as fast as they can. Twitch, formerly Justin.tv, allows players to broadcast a live video feed of their screen, perhaps with some overlays like a timer or a camera of the player's face or hands. Twitch earns its revenue much as traditional television does, with commercials interspersed with the play. For most players, who have 0-10 viewers at any given time, these show up for 15-30 seconds when a viewer begins watching a channel. Proven high-tier players (proven in terms of viewership, not in-game achievement) can enter into profit-sharing agreements with Twitch, and may play additional commercials to generate revenue for themselves and for Twitch. Many games have predictable moments of downtime, such as a restart of a speedrun attempt or a long loading screen, when an advertisement doesn't detract from the experience. These high-tier players can also offer monthly subscriptions that give small cosmetic benefits and the removal of ads to subscribed viewers.

Why Twitch is a big deal, and why it's different.

Recently Twitch was purchased by Amazon for nearly $1 billion USD (or $1000 million USD if you're using European notation). Aside from being the web's everything store, Amazon also sells a portfolio of large-scale computing services, so Twitch wasn't an out-of-character purchase. They have the hardware means to support the growing operation of receiving, compressing (often into multiple formats for viewers at different bandwidth levels), and sending video streams from thousands of players to, at the moment, roughly half a million viewers.

That is a LOT of bandwidth, and the senders and receivers could be anywhere and could change at any time. Logistically, that makes what Twitch does a lot more impressive than the feat of providing traditional television to 500,000 simultaneous viewers. With traditional television, a few hundred channel streams can be sent to a local station, which can copy those streams as needed for the short distance from the local station to the viewer. Bandwidth on the main lines does not increase as the number of viewers increases for traditional television. YouTube does what it can to get similar cost advantages. YouTube is a one-to-many system built around a central repository of videos that are pre-uploaded. It has the advantage of knowing which videos are popular or becoming popular, and can use that to reduce its bandwidth costs by storing copies of popular video files with local internet service providers. I assume Netflix has a similar arrangement. Even with live streaming of large events, mainline bandwidth can be saved by branching if demand can be predicted, such as with playoff hockey games, and Olympic events - in that order.

Twitch, however, with its many autonomous content providers and dispersed viewers, cannot predict where enough viewers of a given channel will be to take advantage of a local provider the way traditional television can. It also can't store its main product, which is live, in a central repository. In short, that massive amount of bandwidth has to go through the internet in an ad-hoc fashion that must make the per-viewer costs much higher than those of competing entertainment. Now that Amazon has put colossal gobs of money behind Twitch, it surely has some ideas to reduce these costs. Predictability may have made YouTube more efficient, but what could be predictable about video game streamers?

"I'm tired of dating responsible, burly, musical chef-lumberjacks. What would really seduce me is a lecture on video compression and the power law." - Nobody. Ever. :(

The popularity of games follows the power law. Lots of things follow the power law. You may have heard of this as the 80-20 rule, such as "80% of alcohol is consumed by 20% of the drinkers". For television it would be "80% of the revenue comes from 20% of the shows". Sometimes it's called the 90-10 rule for similar reasons (as I've heard it in the relative frequency of words in a language), but the basic principle is the same: There are a few things (channels, games, drunks) that far surpass all the others.

For Twitch, this works for games. It's almost 2am on a weeknight in Vancouver, Canada as I write this, which is a lull for Twitch. Three games, DOTA 2, League of Legends, and Hearthstone: Heroes of Warcraft, have 63,000, 40,000, and 33,000 viewers respectively. There are three games with 5000 - 10000 viewers presently, twelve games running between 1000 and 5000 viewers, fifty games between 100 and 1000 viewers, and hundreds of games with 1-99 people watching. The viewership of channels within a game also follows the power law. 70% of Hearthstone's 33,000 viewers are watching a single channel. 15% are watching the next two most popular channels, and so on.

Why would anyone care that viewers (and therefore, revenues and costs) follow the power law? Because it means that improving the efficiency of the streaming of only a handful of games can go a long way towards improving the efficiency of all the streaming that happens.
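To put a rough number on that, here is a back-of-the-envelope sketch in Python. The three big viewer counts are the ones quoted above; everything in `assumed_tiers` is my own guess (the midpoint of each range, and 300 standing in for "hundreds of games"), so treat the result as an illustration, not a measurement.

```python
# Viewer counts quoted in the post for the three biggest games.
top_games = {"DOTA 2": 63_000, "League of Legends": 40_000, "Hearthstone": 33_000}

# ASSUMPTION: the post only gives ranges for the other tiers, so each tier is
# (number of games, assumed viewers per game) using the midpoint of its range.
# "Hundreds of games" is assumed to mean 300.
assumed_tiers = [
    (3, 7_500),   # games with 5000-10000 viewers
    (12, 3_000),  # games with 1000-5000 viewers
    (50, 550),    # games with 100-1000 viewers
    (300, 50),    # games with 1-99 viewers
]


def top_share(top, tiers):
    """Return the fraction of all viewers who are watching the games in `top`."""
    top_total = sum(top.values())
    rest_total = sum(count * viewers for count, viewers in tiers)
    return top_total / (top_total + rest_total)


share = top_share(top_games, assumed_tiers)
games = len(top_games) + sum(count for count, _ in assumed_tiers)
print(f"{len(top_games)} of {games} games hold {share:.0%} of viewers")
```

Under those assumptions, three games out of several hundred account for a bit more than half of everyone watching, which is why tuning for a handful of titles could matter so much.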

By improving efficiency, I'm referring specifically to reducing the number of bits of information that have to be routed across the internet to deliver a stream to a single viewer at a fixed quality. The most common way of doing this is with video compression codecs*. To upload and download something that's essentially thirty pictures per second, digital videos typically use codecs, which are like books of shorthand that are shared between the sender and the receiver. In text, you can think of the acronyms**, jargon, and in-jokes that you know (BC, GLM, Old Gregg) as your personal codec, because you can use them to communicate much more quickly with other people who know the same acronyms, et cetera.*** Computers do similar things with streaming videos: they exploit common knowledge to produce a smooth video without. having. to. send. the. entire. picture. for. every. frame. Among other things, codecs exploit the fact that most frames are similar to the ones just before them. Most of these frames are for movement or animation. Even cuts within a scene share a colour palette, and there are usually transitions between scenes. This is why when things get really fast-paced in a video, or if you're skipping around it wildly, the picture goes a little crazy and you'll see ghosts of previous things or strange colours and blocks.
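As a toy illustration of "most frames are similar to the ones just before them", here is a minimal Python sketch that sends only the pixels that changed since the previous frame. The function names and the tiny 3x4 frames are made up for the example, and real codecs are far more sophisticated (they estimate motion and compress blocks, they don't list individual pixels), so this shows the principle only.

```python
def encode_delta(previous, current):
    """List (row, col, new_value) for every pixel that differs from `previous`."""
    return [
        (r, c, value)
        for r, row in enumerate(current)
        for c, value in enumerate(row)
        if previous[r][c] != value
    ]


def decode_delta(previous, changes):
    """Rebuild the current frame from the previous frame plus the changes."""
    frame = [row[:] for row in previous]  # copy so `previous` is untouched
    for r, c, value in changes:
        frame[r][c] = value
    return frame


# Hypothetical frames: a two-pixel object slides one step to the right.
frame_a = [
    [0, 0, 0, 0],
    [0, 5, 5, 0],
    [0, 0, 0, 0],
]
frame_b = [
    [0, 0, 0, 0],
    [0, 0, 5, 5],
    [0, 0, 0, 0],
]

changes = encode_delta(frame_a, frame_b)
assert decode_delta(frame_a, changes) == frame_b
print(f"{len(changes)} of {len(frame_b) * len(frame_b[0])} pixels sent")
```

The receiver already has the previous frame, so the sender only has to describe what moved. When everything changes at once (a fast cut, or skipping around), the list of changes gets as long as the frame itself and the savings disappear.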

Twitch already uses codecs, but I imagine that the ones it currently uses are designed for general video, or at best for general video games. However, we already established that Twitch is primarily used to stream a handful of games. Each of these games has its own patterns that could be used to make codecs that will work better for those specific games.

Here are two typical screen captures for a Hearthstone stream, taken half an hour apart. (From Hafu's channel: Screencaps used without permission.)

This is a card game, so there are already a lot of static elements that a codec can use. Most of the outer rim is the same, save for a shading change, so a codec needs only send "no change in this area" each frame. Likewise for most of the centre board space. Several interface elements have changed little between the two pictures and have minimal changes from frame to frame. Aside from the shading change and the movement in the live camera, the biggest changes in the play arena are the decorations in the corners. You can't tell from the static images, but the ruby in the moai statue's eye in the left screen glints regularly, and the water in the waterfall burbles down in an animated loop. Likewise, the spiky zeppelin in the right image floats around in a fixed pattern, and the smoke from the hut slowly billows.

Tor Norretranders would call this "exformation".

If the codec were specifically designed around Hearthstone, it could recognize the decoration elements like the glinting ruby and the billowing smoke. Then, instead of Twitch's servers having to send the slight changes to the ruby and the smoke over time, they could send a much simpler "carry on" message as if that part of the image was static. The receiving end, knowing it had a ruby or some smoke, could fill in the animation in the video without having to have it described to it explicitly by the server. Since a large portion of the viewers of this stream have a copy of Hearthstone and play it themselves, the codec could even draw upon the art assets of the game to create the animations semi-autonomously.
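Here is a minimal Python sketch of what that "carry on" idea might look like. Everything in it is hypothetical: `RUBY_LOOP` stands in for an animation loop both ends already know, each frame of the region is reduced to a single label instead of real pixels, and I'm assuming the loop starts in step with the first frame. It is not how Twitch or any existing codec works.

```python
# ASSUMPTION: both sender and receiver hold this looping animation, e.g. from
# the game's art assets. Each entry stands in for one frame of the region.
RUBY_LOOP = ["dim", "dim", "glint", "dim"]


def encode_region(observed, loop):
    """Send 'carry_on' when the region shows what the loop predicts, else raw data."""
    messages = []
    for tick, pixels in enumerate(observed):
        expected = loop[tick % len(loop)]
        if pixels == expected:
            messages.append(("carry_on",))
        else:
            messages.append(("raw", pixels))
    return messages


def decode_region(messages, loop):
    """Fill in the animation locally whenever the sender said 'carry_on'."""
    frames = []
    for tick, message in enumerate(messages):
        if message[0] == "carry_on":
            frames.append(loop[tick % len(loop)])
        else:
            frames.append(message[1])
    return frames


# The ruby glints on schedule, except for one frame where something covers it.
observed = ["dim", "dim", "glint", "dim", "dim", "covered", "glint"]
messages = encode_region(observed, RUBY_LOOP)
assert decode_region(messages, RUBY_LOOP) == observed
```

Only the one surprising frame needs describing; the rest of the time the viewer's machine animates the ruby by itself.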

Other object recognition could be done to leverage the sort of repetition that isn't found in general video, but is found in video games. The dark gems in the lower right corner of each image are repeated. With access to the art assets, Twitch could send "draw four [dark gems] here" instead of the much longer "draw the following thousands of blue and black pixels". Without the assets, a more general signal could be sent to draw the pixels for one gem, and simply a repeat command for the next three.
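Both variants fit in one small Python sketch. The tile representation, the `dark_gem` asset name, and the command format are all invented for illustration; the point is only that the same run of four gems collapses to a single short command either way.

```python
def encode_tiles(tiles, assets=None):
    """Turn a sequence of tiles into (kind, payload, repeat_count) commands.

    With `assets` (name -> pixels), known tiles are sent by name only.
    Without it, the pixels are sent once, followed by a repeat count.
    """
    names = {pixels: name for name, pixels in (assets or {}).items()}
    commands = []
    for tile in tiles:
        if tile in names:
            kind, payload = "asset", names[tile]
        else:
            kind, payload = "pixels", tile
        if commands and commands[-1][:2] == (kind, payload):
            # Same thing as the previous command: just bump its repeat count.
            commands[-1] = (kind, payload, commands[-1][2] + 1)
        else:
            commands.append((kind, payload, 1))
    return commands


def decode_tiles(commands, assets=None):
    """Expand the commands back into the original sequence of tiles."""
    tiles = []
    for kind, payload, count in commands:
        pixels = assets[payload] if kind == "asset" else payload
        tiles.extend([pixels] * count)
    return tiles


# Hypothetical 2x2 "dark gem" tile; real art would be thousands of pixels.
GEM = ((1, 2), (2, 1))
ASSETS = {"dark_gem": GEM}
row = [GEM, GEM, GEM, GEM]

with_assets = encode_tiles(row, ASSETS)  # "draw four dark gems here"
without_assets = encode_tiles(row)       # pixels for one gem, repeated
assert decode_tiles(with_assets, ASSETS) == row
assert decode_tiles(without_assets) == row
```

With the assets, the message carries no pixel data at all; without them, the gem is described once instead of four times.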

Finally, object recognition could be used as a graphical form of automobile amputation autocomplete. See that "Soul of the Forest" card? If you knew every card in the game, you could cover up the bottom 85% of that card and still recognize it. A server at Twitch, streaming this to the 2,400 viewers that there were for this channel, could save a lot of effort by recognizing that card, and telling the viewers with art assets to draw that card in that position, rather than describing the card pixel-by-pixel to every viewer. It helps greatly that cards have many graphical characteristics in common, like the gem with the number in the upper left, that a server could use to recognize that the object in that portion of the screen is a card, and where it should look for the rest of the card and what else to look for.
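A toy version of that autocomplete, in Python: the server sees only the top rows of a card, looks for the one card in its library whose art starts the same way, and if there is exactly one match it sends a short "draw this card" message. The library contents, the character-grid "art", and the message format are all placeholders I made up; real recognition would have to cope with scaling, lighting, and compression noise, which this exact-match sketch ignores.

```python
# ASSUMPTION: a shared card library mapping card name -> rows of "pixels".
# The art here is a made-up character grid, and the second card is a placeholder.
CARDS = {
    "Soul of the Forest": ["4###", "#ab#", "#cd#", "####"],
    "Placeholder Card": ["2###", "#xy#", "#zw#", "####"],
}


def recognize(visible_rows, library):
    """Return the only card whose top rows match `visible_rows`, else None."""
    matches = [
        name
        for name, art in library.items()
        if art[: len(visible_rows)] == visible_rows
    ]
    return matches[0] if len(matches) == 1 else None


def card_message(visible_rows, position, library):
    """Send a short draw command if the card is recognized, raw rows otherwise."""
    name = recognize(visible_rows, library)
    if name is None:
        return ("raw", position, visible_rows)
    return ("draw_card", position, name)


# Only the top row of the card is visible; the rest is covered up.
message = card_message(["4###"], (120, 300), CARDS)
print(message)
```

If nothing matches, or more than one card matches, the sketch falls back to sending the pixels as usual, so a wrong guess never ends up on the viewer's screen.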

Streaming is bandwidth intensive, and bandwidth has an electricity cost and a carbon footprint. Amazon could save a lot of money, provide a more reliable product, and offset carbon at no extra cost if it found some ways like these to take advantage of the predictable content that people are streaming on Twitch.

My conjecture is that Amazon is already working on such technology, which I'm calling Application Specific Video Codecs for now. But just in case they're not, Fabian, you know how to find me. ;)

To clarify, this is about more than just a codec for gaming; that's just the inspiration. Imagine a means of video streaming that's halfway between pre-rendering and streaming. It could also be used for sports programs with consistent overlay formats, and shows with fancy transitions that get used a lot (and typically are hard to compress because they involve large movements).

It also seems like something a third party would develop and then sell the tailoring service to various streaming systems.

* I apologize for my inexact description of video compression and codecs. This is a blog post, not a textbook, and I am not yet an expert in this particular field.
** Acronyms and initialisms. Insufferable pedant.
*** et cetera. Something you should never end a sentence with. Also see "preposition".
