The online game Path of Exile has a trick for reconciling a game that requires quick reaction times with the limitations of servers: it predicts the actions of the players. It doesn't have to predict far ahead - a couple hundred milliseconds - to keep the action running smoothly most of the time.
Prediction on this time frame isn't hard in principle; play often involves performing the same short
action repeatedly, such as firing an arrow, so the default prediction is just more of that. When the server predicts incorrectly, it usually has enough time to 'rewind' to the present and handle things as they come. The prediction is just an extra buffering layer that's in place when there is a lot of server lag.
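The predict-then-rewind idea can be sketched in a few lines. This is a toy model, not how Path of Exile actually works: the state dictionary, the "last action" default, and the function names are all invented for illustration.

```python
# Toy sketch of predict-and-rewind. The default prediction is "keep doing
# the last action"; when the authoritative state disagrees, we rewind to it.
# All names and the one-dimensional "position" state are hypothetical.

def predict(state, n_ticks):
    """Default prediction: the player keeps repeating the last action."""
    predicted = dict(state)
    for _ in range(n_ticks):
        predicted["x"] += predicted["last_dx"]
    return predicted

def reconcile(predicted, authoritative):
    """If the server's real state disagrees, discard the prediction ('rewind')."""
    if predicted["x"] != authoritative["x"]:
        return dict(authoritative)  # snap back to the true state
    return predicted

state = {"x": 0, "last_dx": 2}   # player moving right at 2 units per tick
guess = predict(state, 3)        # shown to the player before the server confirms
truth = {"x": 4, "last_dx": 2}   # server says the player actually stopped early
shown = reconcile(guess, truth)  # rewind: the display corrects itself
```

Most of the time the guess matches the truth and nobody notices the buffering layer; the rewind only shows up when the prediction was wrong.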
Does video or music streaming do this? Could it?
In a song with a repetitive bassline, could the information sent to the client computer be "repeat the sound from time x, with the following deviations included in the buffer", rather than "play the following sound"? The "sound at time x" in this case is a note from the bassline, and the deviations are the result of a human playing an instrument and not hitting the note exactly the same way every time. In a case like that, potentially less data would need to be sent to reproduce the song, allowing for longer buffers or higher sound quality.
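The "repeat from time x plus deviations" idea is essentially residual coding, and it can be sketched with toy sample lists. The eight-sample "note" and the function names below are made up for illustration.

```python
# Sketch of encoding a repeated bassline note as deviations from an
# earlier occurrence, instead of re-sending the raw samples.

def encode_repeat(reference, actual):
    """Encode a repeated note as its deviation from an earlier occurrence."""
    return [a - r for a, r in zip(actual, reference)]

def decode_repeat(reference, residual):
    """Reconstruct the note from the earlier occurrence plus the deviations."""
    return [r + d for r, d in zip(reference, residual)]

note_at_x   = [0, 40, 90, 100, 90, 40, 0, -20]  # bassline note, sent once
human_retry = [0, 41, 88, 100, 91, 40, 1, -20]  # same note, played slightly off

residual = encode_repeat(note_at_x, human_retry)
# residual is mostly zeros and small numbers, which compresses far better
# than the raw samples would
```

The client only needs the residual and a pointer back to time x to reproduce the new note exactly.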
Likewise for video. Consider a live video feed of a soccer match, in which a player is running across the field. Video prediction may determine there is an object moving at some speed in some direction and predict a few frames ahead where that object will be, and thus what pixels to draw in that spot. Then the streaming service, making the same prediction, could just send the video information that deviates from this prediction.
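A minimal sketch of that motion extrapolation, using a one-dimensional "frame" where the running player is a single lit pixel. The linear-velocity assumption and all names here are invented; real encoders use motion vectors per block, but the principle is the same: both sides make the same prediction, and only the disagreeing pixels cross the wire.

```python
# Sketch of motion-extrapolation video prediction: estimate velocity from
# two past positions, predict the next frame, transmit only deviations.

def predict_next(prev_pos, curr_pos):
    """Linear extrapolation: assume the object keeps its velocity."""
    return curr_pos + (curr_pos - prev_pos)

def render(pos, width=10):
    """Toy one-dimensional frame: a single lit pixel at the object's position."""
    return [1 if i == pos else 0 for i in range(width)]

def deviations(predicted_frame, actual_frame):
    """Only the pixels where the prediction was wrong need to be sent."""
    return {i: px for i, (pp, px) in
            enumerate(zip(predicted_frame, actual_frame)) if pp != px}

predicted = render(predict_next(3, 5))  # player at x=3, then x=5: expect x=7
perfect   = deviations(predicted, render(7))  # prediction exact: send nothing
slowed    = deviations(predicted, render(6))  # player slowed: send 2 pixels
```

When the prediction holds, the deviation set is empty and the bandwidth cost for that region is near zero.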
The same goes for repetitive patterns, like an animation of a spinning wheel or the sparkling logo of a sports team. If the wheel spins in a predictable fashion, internet bandwidth could be saved by describing the movement of the wheel as "spinning as expected", where matching prediction software on the server and client sides both recognize that that part of the screen is taken up by an object in a 10-frame loop.
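Detecting such a loop is straightforward if server and client run the same check over a region's frame history. This is a naive sketch with invented names; frames here are just labels standing in for a region's pixel contents.

```python
# Sketch of loop detection for a screen region: if the recent frames of
# that region repeat with a fixed period, the server can signal
# "still looping" instead of sending pixel data.

def find_loop(frames):
    """Return the shortest period at which the frame history repeats, or None."""
    for period in range(1, len(frames) // 2 + 1):
        if all(frames[i] == frames[i + period]
               for i in range(len(frames) - period)):
            return period
    return None

wheel_region = [0, 1, 2, 3] * 5   # a 4-frame spin animation, repeated
chaos_region = [0, 1, 2, 7, 4, 1, 0, 3]   # no repeating cycle
```

Once both sides agree the region has period 4, a single "spinning as expected" flag replaces the pixel data until the loop breaks.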
This is different from encoding only the pixels that change from frame to frame. This prediction would incorporate likely changes in the picture based on simple movement patterns or on repetitive animations.
Consider a streamed game of Hearthstone, like the last time I pretended to know about video encoding. There are certain animations that regularly hurt video quality, such as sand sweeping across the entire screen, that involve many pixels changing for a non-trivial amount of time. The video encoder does fine when the picture is mostly the same from frame to frame, but introduce one of these effects that causes a lot of pixels to change at once, and the quality of the live stream drops.
However, the sand effect is one of sand moving slowly across the screen; its movement is predictable in that any one pixel of the effect is likely to follow the same trajectory as it did in the last few frames. Predictive video encoding is more scalable than the application-specific encoding I mentioned before, but with time it could achieve the same effect if it were able to recognize frequently used pre-rendered effects, such as lightning all over the screen. A predictive video encoder could recognize the first few frames of the 'lightning storm' effect and predict the rest without having to send any information about that part of the screen.
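That recognition step can be sketched as prefix matching against a shared library of known effects. The 'lightning storm' frames and every name below are invented; in practice the "frames" would be perceptual fingerprints rather than exact labels.

```python
# Sketch of effect recognition against a library shared by server and
# client: if the first observed frames match the start of a known effect,
# both sides can play out the remainder locally, with no pixel data sent.

EFFECTS = {
    "lightning_storm": ["flash", "flash", "fade", "dark", "dark"],
    "sand_sweep":      ["dust", "dust", "wave", "wave", "clear"],
}

def recognize(observed):
    """Match observed frames against the start of each known effect.

    Returns (effect name, remaining frames to predict), or (None, [])."""
    for name, frames in EFFECTS.items():
        if observed and frames[:len(observed)] == observed:
            return name, frames[len(observed):]
    return None, []

name, rest = recognize(["flash", "flash"])  # two frames in, the rest is free
```

After a match, the encoder only needs to confirm the effect is still playing as expected, the same way the spinning-wheel loop was handled.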
I'm no expert on video encoding, so this may all be gibberish.
Previous post on video encoding in Twitch and the possibility of application-specific codecs.