What Is Transparent Caching?
This is another installment in our series of "What Is...?" articles, designed to offer definitions, history, and context around significant terms and issues in the online video industry.
With the explosive growth of video content on the web, including user-generated content that goes viral and racks up massive view counts in short timeframes, the old way of doing the business of video has shown signs of faltering. Based on projections, some of which place video at more than 90 percent of all internet traffic by 2013, service providers have begun to look in earnest at solutions that allow video viewing without tying up precious backhaul data pipes.
One proposed solution that's finding traction is the concept of transparent caching. The idea has merit, for two key reasons, and we expect it to grow in use over the next few years.
Transparent Caching: An Overview
The term caching refers to storing copies of objects close to where they will be needed, in much the same way that Amazon stores hundreds of Kindle Fire tablets in each of several warehouses around the country, anticipating that sales of the tablet will occur within a few hours' delivery of each warehouse.
In the computing world, caching is similar, but with a twist: A copy of a file is stored locally, or at least closer to the end-user device, so that it is available for re-use. The re-use part is key to understanding transparent caching. It's certainly feasible to cache all video content at the edge, very close to each user, and that is the business model for at least one large content delivery network (CDN). It works for premium content, such as television shows or movies, but the idea of edge caching all video content on the web, on the off chance that all of it will be viewed in equal percentages, is neither practical nor financially viable.
Still, there's a balance to strike between caching just a small amount of content at the edge and caching it all. That's where the transparent part comes in. The idea is to set business rules that automate the process of moving content from the network core to the network edge, without requiring human intervention for any specific video file. Doing so allows the edge cache to refresh itself based on changes in viewing preferences, at a much more granular level than could be accomplished by even a large group of human operators. To both the end user, whose video starts faster, and the network operator, who doesn't have popular video constantly traversing the network, the idea of transparent caching holds promise.
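To make the business-rules idea concrete, here is a minimal Python sketch of a popularity-driven edge cache. The threshold rule, capacity limit, and function names are illustrative assumptions; production systems use far richer rules.

```python
from collections import Counter

class EdgeCache:
    """Minimal sketch: promote a video to the edge once its request
    count crosses a threshold, and evict the least-requested title
    when the cache is full. All parameters are illustrative."""

    def __init__(self, capacity, promote_after=3):
        self.capacity = capacity              # max titles held at the edge
        self.promote_after = promote_after    # business rule: requests before caching
        self.requests = Counter()             # request counts per video ID
        self.store = {}                       # video_id -> cached content

    def fetch(self, video_id, fetch_from_core):
        self.requests[video_id] += 1
        if video_id in self.store:
            return self.store[video_id]       # hit: served from the edge
        content = fetch_from_core(video_id)   # miss: one trip across the backhaul
        if self.requests[video_id] >= self.promote_after:
            if len(self.store) >= self.capacity:
                # make room by dropping the coldest cached title
                coldest = min(self.store, key=lambda v: self.requests[v])
                del self.store[coldest]
            self.store[video_id] = content    # promote the now-popular title
        return content
```

The point of the sketch is the automation: no operator decides which title moves to the edge; the request pattern itself does.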
So what are the two key reasons that transparent caching holds merit? First, it's been around for quite some time; second, streaming is moving towards a model that enhances the benefits of transparent caching.
The History of Transparent Caching
In the early days, caching was the sole domain of website hosting and serving. Content could be cached at a local computer, in the form of cookies or images, and most users understand the idea of "clearing the cache" to make sure the most recent content is available for their browsers. Caching devices on the ISP's network also held the more popular sites, such as news sites, but those caches had to be cleared frequently to keep headline stories fresh.
The one area where caching didn't really work well was dynamic content. Think of a website like Kayak.com, which aggregates data from numerous airlines' pricing databases: seat availability and pricing change moment by moment, so it's not really practical to cache the majority of this content.
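In HTTP terms, such dynamic responses are typically marked as uncacheable. A short sketch of the relevant headers (the values illustrate common practice, not Kayak's actual responses):

```python
# A dynamic fare-search result forbids caching, since the prices it
# contains may already be stale seconds later. Illustrative values only.
dynamic_response_headers = {
    "Content-Type": "text/html",
    "Cache-Control": "no-store",  # proxies and browsers must not keep a copy
}
```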
Fortunately for the world of streaming video, where more than 95 percent of all video is on-demand and static, caching works the same way it worked for websites. Yet video files are significantly bigger, which means caching most often occurs on the network rather than at the local device.
Transparent caching has also moved into mainstream computing, beyond just websites, and is even integrated into recent operating systems. Let's use a non-video computing example to illustrate the point: a standard office server, with a remote user logging in across a thin connection.
Everyone who's a road warrior knows the pain of retrieving a large PowerPoint document from the office server to a remote laptop, making changes, and then uploading the presentation again. And it doesn't get any better if all you want to do is view the content: If the laptop leaves the VPN or network, even for a short period of time, the process has to begin all over again, wasting precious time and resources even for a recently viewed file.
When Microsoft announced Windows 7, one of the features it added was the concept of transparent caching. According to Microsoft, with Windows 7, "client computers cache remote files more aggressively, reducing the number of times a client computer might have to retrieve the same data from a server computer."
Microsoft handles its process in much the same way that a YouTube video is temporarily cached to a local desktop: The first time a user opens a file from a shared folder, the Windows 7 machine reads the file from the server computer and then stores it in a cache on the local disk. On subsequent attempts to access the content, a Windows 7 local machine will retrieve the file from the local disk instead of reading it from the server computer.
When we talk about transparent caching for streaming, the process is very similar, although the caching occurs at a device near the edge of the service provider's network rather than on the local hard disk.
One key to making transparent caching work is data integrity. Dynamic content was mentioned earlier, and it's interesting to see how Microsoft balances transparent caching and data integrity in Windows 7.
"Windows 7 always contacts the server computer to ensure the cached copy is up-to-date," the company's website notes. "The cache is never accessed if the server computer is unavailable, and updates to the file are always written directly to the server computer."
Speed (and the Delivery Protocol) Matters
For the Windows 7 example above, Microsoft notes that transparent caching is not enabled by default on fast networks. That makes sense for files of, say, less than 10MB, which can be retrieved from the server in just a few seconds, but it doesn't really work for video files that run to several hundred megabytes, or even multiple gigabytes, in size.
Video-centric transparent caching needs to work regardless of the speed at which the content is delivered to the end user. For adaptive bitrate content, this means transparently caching not just the initially requested bitrate of a particular video file but all of the available bitrates for the file.
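A sketch of that warm-every-rendition behavior, assuming the cache sees a manifest that maps each bitrate to its segment URLs and a caller-supplied download function (both names are illustrative):

```python
def cache_all_renditions(manifest, cache, download):
    """When one bitrate of a title is requested, warm the cache with
    every rendition in the manifest, so that later bitrate switches
    are also served from the edge rather than from the origin.

    manifest: {bitrate_kbps: [segment_url, ...]}
    cache:    {segment_url: content}
    download: function that fetches a URL from the origin
    """
    for bitrate, segment_urls in manifest.items():
        for url in segment_urls:
            if url not in cache:
                cache[url] = download(url)  # one origin trip per segment
```

Without this, a player that shifts bitrates mid-stream would fall back to the origin for the new rendition, defeating the purpose of the edge cache.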
In addition, a robust transparent caching system offers finer-grained control, including configuration of the amount of disk space the cache uses and the ability to block particular types of video files from being cached, whether by format, file size, or popularity.
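Those controls might look something like the following configuration sketch; the field names and defaults are assumptions for illustration, not any vendor's actual settings:

```python
from dataclasses import dataclass

@dataclass
class CacheConfig:
    """Illustrative knobs for a video-aware transparent cache."""
    max_disk_gb: int = 500                 # total disk space the cache may use
    blocked_formats: tuple = (".wmv",)     # formats excluded from caching
    max_file_gb: float = 4.0               # skip single files above this size
    min_requests_to_cache: int = 2         # popularity floor before caching

def should_cache(path: str, size_gb: float, request_count: int,
                 cfg: CacheConfig) -> bool:
    if any(path.endswith(ext) for ext in cfg.blocked_formats):
        return False                       # blocked by format
    if size_gb > cfg.max_file_gb:
        return False                       # blocked by file size
    return request_count >= cfg.min_requests_to_cache  # popularity rule
```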
Besides speed, the delivery protocol also matters, and this is an area where streaming is trending more towards traditional website caching, especially for on-demand content.
The ratification of MPEG DASH (Dynamic Adaptive Streaming over HTTP), which draws on both Adobe's and Microsoft's adaptive streaming technologies as well as Apple's HTTP Live Streaming (HLS), is a solid step in moving streaming delivery away from specialized protocols and back towards the granddaddy of web-serving protocols: HTTP, the Hypertext Transfer Protocol.
Given the almost twenty-year history of HTTP caching, the move to stream content in small fragments, or segments, delivered via HTTP servers rather than specialized video servers will only increase the benefits of transparent caching.
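That compatibility is visible in the response headers themselves: a segment is just an HTTP object, and marking it cacheable lets any standards-based HTTP proxy hold it at the edge. The values below are illustrative choices, not requirements of the DASH or HLS specifications:

```python
# A DASH/HLS media segment served as an ordinary, cacheable HTTP object.
segment_response_headers = {
    "Content-Type": "video/mp4",               # e.g., a fragmented-MP4 segment
    "Cache-Control": "public, max-age=86400",  # any proxy may keep it for a day
    "ETag": '"seg-000123"',                    # lets caches revalidate cheaply
}
```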
Other Benefits of Transparent Caching
Beyond the two key reasons discussed above, transparent caching offers two additional benefits.
Briefly, the first extra benefit is acceleration of origin caching. Some companies claim performance increases of up to 10 times for the origin cache, with transparent caching gathering music, video, and web content into a dedicated origin caching box that then negotiates with both edge servers and edge-based transparent proxy caches.
The second additional benefit is the option to place the transparent cache in the middle of the network, rather than at the edge or the core. This allows mezzanine content to be offloaded to a dedicated media server for subsequent conversion to adaptive bitrates for adaptive delivery. Granted, some encoding systems do their own segmentation, which can then be stored in a standard HTTP proxy cache, but others simply encode to a mezzanine file, which then benefits from mid-network transparent caching.
Conclusion
The primary beneficiaries of transparent caching are the end user and the internet service provider. From a financial standpoint, even if a service provider uses a CDN to serve up content, the content still must traverse a portion of the ISP's backbone, adding transport costs for every video served. In a transparent caching scenario, however, the majority of content of interest to the ISP's specific user base is already available within the ISP's own network, eliminating those transport costs for repeat views.
As more and more content becomes available, we expect transparent caches on larger ISPs' networks to take on a geo-targeted flavor, not unlike the way local affiliate television broadcasters cater to their markets. That would allow the ISP to partner with video ad networks to monetize video traversing its network, turning video delivery from a business cost into a profit center.