The caching universe

In the beginning, we had the World Wide Wait. We had browsers and Web servers, and not much else in between. While life on the Web was architecturally simple, there wasn’t too much that we could do about improving the latency of our Web connections. When we wanted to view a particularly complex Web page, or one with lots of graphics, or one on a Web server in a far-flung corner of the world, we waited.

But the days of waiting are coming to a close, thanks to an array of caching technologies that deliver data faster to the impatient browsing public. And as corporate Web sites have evolved from a single, simple Web server into more complex architectures, the caching universe has expanded as well. In this article, the second in a series on new Web technologies for IT managers and intranets, we’ll examine which of these approaches makes the most sense for different situations, and how these technologies fit into expanding Web applications infrastructures. We’ll also explain how to improve the browsing experience for corporate users as well as for Web site visitors. The first article covered Web management tools and services. The next will examine the latest e-commerce payment processing products.

Caching is a relatively simple concept: You move content closer to its ultimate destination, avoiding network latency, congested servers, and overloaded links and routers in the process. The ideal location for the best-performing cache is your own hard disk: If you had one large enough to store the entire Web (or at least the Web sites that you might visit), you would never have to wait for a page to load again. However, that isn’t really practical. We would all need huge hard disks on our own desktops to hold the terabytes of Web content. The better practice, then, is to locate the cache on the big hard disks of servers on local area networks (LANs), and share this wealth among multiple users.

What you need to know

But there are lots of subtle issues behind this relatively simple concept. For instance, what happens if the page changes between the time you cache some content and you go to view it? Are there ways to store frequently used page elements that don’t change often, such as logos, headers, and other graphics, while updating more dynamic content on the fly? Are you trying to save money by not having to buy additional outbound bandwidth from your corporation to the public Internet in addition to providing a better browsing experience for your end users?

Add to these issues the fact that many corporate Web sites have evolved from simple, single-server sites to more complex ones covering multiple sites and spanning several continents. To make matters worse, these complex sites carry a wide array of data types, such as streaming media, Shockwave, and other animations, and run applications far more sophisticated than simply serving up Web pages, such as secure credit card processing and dynamic database updates. Applications such as streaming media and e-commerce storefronts have different and more complex caching requirements than simple file services. This is because these applications have different sizes of data streams (think of a megabyte audio file next to several simple text labels) and pages with both static and dynamic elements pulled from multiple sources, including the Web, file servers, and databases.

The best caches need to understand how to implement the concept of content freshness, and keep track of when and how Web content changes. They also need to examine how pages are constructed, and refresh only objects that change frequently. Caches have to understand how to find the location of your content and where your users are connecting to your servers from around the world. Some caches also act as proxy servers to control or conserve outbound bandwidth, enabling multiple users to share a single Internet connection effectively. For example, let’s say everyone in your workgroup connects to the New York Times Web site every morning to read the day’s headlines. Rather than sending the same series of pages multiple times across the Internet from the Times’ Web site to your company’s desktops, you receive one version of these pages and share it among your workgroup.
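The freshness and sharing ideas above can be sketched in a few lines of Python. This is a minimal illustration, not how any product named in this article works: the class name SharedCache, the max_age parameter, and the URLs in the usage note are all assumptions made for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedObject:
    body: bytes
    fetched_at: float  # when the copy was retrieved from the origin server
    max_age: float     # how many seconds the copy is treated as fresh


class SharedCache:
    """Toy proxy cache shared by every user in a workgroup (hypothetical)."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes],
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch_from_origin
        self._clock = clock
        self._store: Dict[str, CachedObject] = {}
        self.origin_fetches = 0  # how often we crossed the Internet link

    def get(self, url: str, max_age: float) -> bytes:
        entry = self._store.get(url)
        now = self._clock()
        if entry is not None and now - entry.fetched_at < entry.max_age:
            return entry.body  # fresh hit: served locally, no outbound traffic
        # Miss or stale copy: go back to the origin once and remember the result.
        body = self._fetch(url)
        self.origin_fetches += 1
        self._store[url] = CachedObject(body, now, max_age)
        return body
```

In this sketch, a logo might be requested with a max_age of a day while the headlines page gets a minute or less, so only the objects that change frequently are refreshed. However many people in the workgroup ask for the same page while it is fresh, origin_fetches goes up by one.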

There are four basic kinds of caching products and services. The newest entrants to this arena are caching services that are sold by application service providers. The other kinds are products, including software-only caches that typically run on UNIX, prepackaged caching servers that combine caching software running on either UNIX or Novell Inc.’s NetWare with general-purpose server hardware, and specialty appliances that are designed just for caching.

Service providers

The first and newest caching category provides content distribution services that handle much more than caching. Service providers address Web server reliability, peak load demand, and the geographic dispersion of the Web itself. These services mirror the trend toward more complex and distributed Web sites, where content may be located on different continents and on servers maintained by separate divisions. And just as your own Web sites might be more distributed, your visitors might also be coming from many remote corners of the world.

To handle both of these trends, vendors have come up with the concept of replicating your content and distributing it around the globe closer to the origins of potential site visitors. These service providers are building large, replicated data centers and high-speed network connections to the general Internet. The idea here is to ensure the browsing public will avoid congested network routes and peering points. These service providers include tools to manage how content is distributed and how to do load balancing during peak traffic times.
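As a rough illustration of the idea, the following Python sketch picks the replica closest to a visitor while skipping any that are overloaded. It is an assumption-laden toy: the function name pick_replica, the latency figures, and the site names are hypothetical, and none of the service providers mentioned here has published this as its method.

```python
from typing import Dict, Optional, Set


def pick_replica(latency_ms: Dict[str, float],
                 overloaded: Optional[Set[str]] = None) -> str:
    """Return the replica with the lowest measured latency to the visitor.

    latency_ms maps a replica name to its round-trip time in milliseconds;
    overloaded lists replicas to skip during peak traffic.
    """
    skip = overloaded or set()
    candidates = {name: ms for name, ms in latency_ms.items() if name not in skip}
    if not candidates:
        raise ValueError("no replica available")
    # The closest replica is the one with the smallest round-trip time.
    return min(candidates, key=lambda name: candidates[name])
```

A visitor who measures 25 ms to a London data center and 180 ms to New York would be sent to London; if London is flagged as overloaded, the same call falls back to New York.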

As you can imagine, this is a very expensive undertaking. However, the cost savings in terms of overseas bandwidth charges can be quite large as well, particularly for Internet Service Providers (ISPs) outside of North America that are servicing American content abroad. Some offshore ISPs estimate that between 30% and 50% of their overall traffic has North American destinations. The major players in this area include Digital Island Inc./Sandpiper Networks Inc., Akamai Technologies Inc., and Edgix Corp.
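To get a feel for the savings, here is a back-of-the-envelope calculation in Python. The 30% to 50% share comes from the ISP estimates above; the cache hit rate and the monthly traffic volume are assumptions chosen only for illustration.

```python
def overseas_traffic_saved(total_gb: float, na_share: float, hit_rate: float) -> float:
    """Estimate the overseas traffic (in GB) that a local cache absorbs.

    total_gb  -- all traffic the ISP carries in the period (assumed figure)
    na_share  -- fraction of that traffic bound for North America (0.3 to 0.5 above)
    hit_rate  -- fraction of those requests answered from the cache (assumed)
    """
    if not (0.0 <= na_share <= 1.0 and 0.0 <= hit_rate <= 1.0):
        raise ValueError("na_share and hit_rate must be between 0 and 1")
    return total_gb * na_share * hit_rate


# Hypothetical ISP: 1,000 GB a month, with half of cacheable requests hitting the cache.
low = overseas_traffic_saved(1000, 0.30, 0.5)
high = overseas_traffic_saved(1000, 0.50, 0.5)
print(f"Estimated overseas traffic avoided: {low:.0f} to {high:.0f} GB per month")
```

The point of the sketch is only that the savings scale with both numbers: a larger share of transoceanic traffic or a better hit rate each cut the bandwidth bill proportionally.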

A variation on this theme is from SkyCache Inc. The company makes use of Inktomi Corp.’s server software but delivers content via a series of earth-orbiting satellites, bypassing the congested terrestrial Internet in the process. Users of this service need to install special rooftop dishes to communicate with SkyCache’s network. Working with these service providers means making some changes to your existing ISP relationships. In some cases, such as with Akamai and Sandpiper, users will have to co-locate their Web servers at the providers’ data centers and add management software and tools to remotely manage these servers. In the case of SkyCache, users might need to upgrade routers and other network infrastructure devices to work with its network.

Software-only products

But service providers aren’t the whole caching story. Indeed, a few years ago the first caching products to come to market were software-only and ran on UNIX. The most widely used and notable of these products is the freely available, open-source Squid project, originally funded by the U.S. National Science Foundation. Since then Squid and other software-only caches have been developed for many different operating systems, including Linux and even OS/2.

Squid served as the basis for a commercial product from Inktomi called Traffic Server. Inktomi has continued to expand its product line with a series of content distribution management tools it purchased in the fall of 1999 from WebSpective Software. This notion of managing content is an important aspect of the overall caching universe, and can help improve latency by adding intelligent storage of frequently used page objects in the cache. Inktomi also bought shares in service provider InterNAP Network Services Corp. and has announced a deal with Intel Corp. to sell its software on Intel computers running Sun Microsystems Inc.’s Solaris, moving beyond the company’s software-only roots and providing less-expensive solutions. Look for Inktomi to continue to broaden its caching product line through further acquisitions.

In addition to Squid, there are software-only caches from the major operating system and Web server vendors themselves. Novell Inc.’s Internet Caching System and the Proxy Servers from Netscape Communications Corp. and Microsoft Corp. each augment their company’s Web server software and are primarily designed to work in conjunction with it. This means more choices for your caching needs, and the ability to match the server operating system to the expertise your organization already has. It also means more competition and, hopefully, lower prices for future caching products.

The software-only caches are a good first step into the caching universe. They are relatively inexpensive (or free, in the case of Squid and its variants), easy to set up and configure, and don’t require much in the way of new hardware. If you already have a UNIX, NetWare, or NT Web server, you can run the caching server as another application on these existing servers and manage the caching part of the server as just another task for managing the entire server. For small networks or workgroups you can run them on existing file or Web servers, and you don’t have to learn your way around a new operating system. The downside is a lack of scalability and flexibility in terms of tuning these caches to specific networks and needs.

These software-only caches are best suited to improving the browsing experience for your own users. As I mentioned above, they can save local copies of frequently requested Web pages, saving both the time it takes to bring these pages across the Internet and the bandwidth of your outbound Internet connection.

Prepackaged servers

If you require scalability and flexibility, the next step up is to buy one of the prepackaged caching servers. These typically combine some version of UNIX or NetWare, caching software, and hardware to deliver better performance and easier setup. They use commonly available parts, such as Intel processors and PC-style hard disks, to keep costs down.

The prepackaged market used to be the exclusive province of UNIX. In the past few months, however, Novell has come on strong and expanded its OEM relationships to combine its caching server software with traditional PC server hardware to create prepackaged caching servers. Again, this means more choices and competition with the UNIX world as well as increased comfort for those IT managers who have stuck with Novell these many years and still have NetWare expertise in house. These products include Dell Computer Corp.’s Internet Caching servers, Compaq Computer Corp.’s TaskSmart, Quantex Microsystems Inc.’s WebXL, and Australia-based Microbits’ Intelli-App. These products resemble the traditional PCs sold by these vendors, but are exclusively caching devices. In other words, the caching servers can’t run Windows applications or function as traditional NetWare file servers. Novell has also enabled its caching server software to work with service vendors Akamai and Edgix, an indication that it intends to be a major player in this market and that users will have a choice of products to use with these service provider networks in the future. Given Novell’s strong OEM relationships, it is worth taking a careful look, particularly for low-cost caches that will serve smaller networks. Typically, Novell servers sell for under $5,000.

Not to be outdone by Novell, a number of vendors have taken versions of UNIX to create their own prepackaged servers. These include Cobalt Networks Inc.’s Cache Cube, Eolian Inc.’s InfoStorm, PacketStorm Technologies’ WebSpeed, Entera Corp.’s TeraNode, and Network Appliance Inc.’s NetCache. These servers are somewhat more expensive, ranging in price from $5,000 to $10,000, and offer the same kind of benefits as the Novell servers, only running on UNIX.

A second group of prepackaged servers bundles the Inktomi Traffic Server with a router, a server, or other network gear. This group includes products from Alteon Web Systems Inc., as well as Foundry Networks Inc.’s ServerIron. These products include something besides caching, such as load balancing or content switching, and are geared more toward larger networks and toward adding reliability and fail-safe operations to the caching equation. For example, to take advantage of the extra reliability found in Foundry’s products, users will have to replace their existing network switches or routers and connect their Web, database, and other critical servers to its switch. And given these extra features, expect to pay $10,000 and up for these products.

Like the software-only caches, these prepackaged servers are great for workgroups and small corporations looking to improve the browsing experience for their users. They are also appropriate for smaller ISPs looking to save on outbound bandwidth.

Specialty appliances

One disadvantage of the prepackaged servers is they are still running a general-purpose operating system, either UNIX or NetWare. While both OSs have lots of benefits, neither was specifically designed for delivering the optimum caching performance. To get around this problem several vendors have developed their own caching appliances, which come with specialty operating systems that just do caching and nothing else. Companies in this market include CacheFlow Inc., InfoLibria Inc., Cisco Systems Inc., and Lucent Technologies.

Suitable for both internal corporate networks and service providers, these appliances come in a wide variety of sizes and capacities to match the network load. For example, CacheFlow’s product line supports T1, 15Mb, 45Mb, and OC-3 speeds, with storage capacity ranging from four to 243 gigabytes. Specialty appliances are somewhat pricey, however, with typical models starting around $15,000. The appliances are similar to some of the more sophisticated prepackaged servers in that they do more than cache; they also do Web proxying, content management, and redundant network operations. In other words, they’re more like complete systems for delivering more reliable Web content. The main difference is that the appliances typically run their own operating systems and caching algorithms, which have been developed to work together for the best caching performance. The appliances also have additional features that would not normally be found on general UNIX or NetWare servers. For example, CacheFlow’s server can restart in seconds, which is important where power quality might be an issue.

Both the prepackaged and appliance vendors are after more serious Web server operators and take mainframe-like approaches to quality. For example, InfoLibria’s server is designed as a fail-over router, Web proxy, and cache all rolled into one. Lucent’s IPWorX WebCache line includes Web switching features from ArrowPoint Communications Inc. and Alteon, along with Lucent’s own caching software. Foundry includes load balancing and Web content switching features, and adds the ability to support redundant links in case a cache server fails. This means that users can obtain some of the same benefits from the large-scale service provider networks such as Akamai and Sandpiper but do so in house by using the Lucent and ArrowPoint products. For corporations that don’t want to outsource their Web applications to the caching service providers, either because of control, cost, or coverage issues, these are a good alternative.

Caching appliances are probably the best devices for improving the browsing experience for existing network users. They are designed to examine the packet streams and page structures and store the most frequent items locally, in a way to ensure that the freshest content is delivered to each browser.
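The idea of keeping the most frequently requested items on local storage can be sketched as follows. This is a simplified frequency-based eviction policy written for illustration; the class name FrequencyCache is hypothetical, and the appliance vendors’ actual algorithms are proprietary and not described in this article.

```python
from collections import Counter
from typing import Callable, Dict, Set


class FrequencyCache:
    """Toy cache that keeps only the most frequently requested objects."""

    def __init__(self, capacity: int, fetch: Callable[[str], bytes]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._fetch = fetch
        self._hits: Counter = Counter()     # request count for every URL seen
        self._store: Dict[str, bytes] = {}  # objects currently held locally

    def get(self, url: str) -> bytes:
        self._hits[url] += 1
        if url in self._store:
            return self._store[url]  # popular object served from local disk
        body = self._fetch(url)
        if len(self._store) < self._capacity:
            self._store[url] = body
        else:
            # Cache is full: find the least-requested object we are holding.
            coldest = min(self._store, key=lambda u: self._hits[u])
            # Replace it only if the newcomer has been requested more often.
            if self._hits[url] > self._hits[coldest]:
                del self._store[coldest]
                self._store[url] = body
        return body

    def cached_urls(self) -> Set[str]:
        return set(self._store)
```

With room for two objects, a page requested three times stays put, while a page requested once is displaced as soon as a competitor has been requested twice. A real appliance would combine a policy like this with the freshness checks described earlier, so that a popular object is also a current one.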

Caching is still an evolving area. Look to see new players, new ideas, and new products in the months to come. But it is nice to have choices, particularly as you search for the best solution to match the size of your network, your budget, and your needs, whether you wish to improve the browsing experience for your own users or for external visitors to your Web site.

About the author:

David Strom was the founding editor-in-chief of Network Computing magazine and has written over a thousand articles for dozens of computer trade publications. He publishes Web Informant, a weekly guide to new Web technologies, trends, and services and is a frequent speaker at industry events including Next Generation Networks and Networld+Interop. He can be reached at david@strom.com.

Caching products/services

The caching universe has a wide and ever-growing number of players. Here is a scorecard to identify which companies are offering which products and service types.

Company Name | Product/Service | Type
Adero Inc. | AderoWorld | Service
Akamai Technologies Inc. | FreeFlow | Service
CacheFlow Inc. | CacheFlow series | Appliance
Cisco Systems Inc. | Cache Engine 500 series | Appliance
Cobalt Networks Inc. | Cache Cube | Packaged Server
Compaq Computer Corp. | TaskSmart | Appliance (Novell Inc.)
Digital Island Inc./Sandpiper Networks Inc. | Global Cache | Service
Entera Corp. | TeraNode | Packaged Server
Eolian Inc. | InfoStorm | Packaged Server
Foundry Networks Inc. | ServerIron | Packaged Server
InfoLibria Inc. | DynaCache | Appliance
Inktomi Corp. | Traffic Server | Software
Lucent Technologies | IPWorX WebCache | Appliance
Microbits (Australia) | Intelli-App | Appliance (Novell Inc.)
Microsoft Corp. | Proxy Server | Software
Mirror Image Internet Inc. | Proxy Server | Service
Netscape Communications Corp. | Proxy Server 3.5 | Software
Network Appliance Inc. | NetCache | Packaged Server
Novell Inc. | Internet Caching System | Software
Pionex Technologies Inc. | Elite PCA | Appliance (Novell Inc.)
Quantex Microsystems Inc. | WebXL | Appliance (Novell Inc.)
Sandpiper Networks Inc./Digital Island Inc. | Footprint | Service
SkyCache Inc. | SkyCache | Service
Solid Data Systems Inc. | Excellerator 600 and Excellerator Ultra Family | Appliance
Squid | Squid | Software
WebSpective Software/Inktomi Corp. | WebSpective | Software
Workfire.com | Workfire server | Software

General caching information can be found at the following Web sites:
Brian Davison’s caching resources and product comparisons
Collaborative Research’s Internet Caching Center
Note: The Internet Cache Protocol (Squid) is designed for caches to pass objects among themselves, while the Cache Interface Protocol (Akamai) is designed to report hit statistics from caches back to host/origin servers.
Source: David Strom and vendor Web sites.
