Adaptive Web Caching

A Collaboration between Lixia Zhang (UCLA), Sally Floyd, and Van Jacobson (NRG, LBNL).
The Adaptive Web Caching project is a new DARPA-funded research project whose aim is to design and build prototype implementations of protocols for a self-configuring, highly adaptive Web caching system that will scale to the global information infrastructure. At a later stage, the project will investigate possibilities for the incremental deployment of the new protocols into the existing manually configured Web caching infrastructure.

Overview

With the exponential growth of the Internet, the World Wide Web is rapidly becoming a global-scale data dissemination system. The unprecedented success of the Web, however, has also caused traffic overload at data sources and along network paths. "Hot spots" of network load have been repeatedly observed, with the same data being transmitted over the same network links again and again to thousands of users. The problem is not entirely new: a similar overload occurred in the past at popular FTP servers. However, the old solution of manually configuring a few replication sites no longer works.

Using multicast delivery

Given that the basic problem is data dissemination to thousands or millions of users, the basic solution ought to be some form of multicast delivery. That is, the data should be fetched only once from the origin server, and then forwarded via a multicast tree to all the interested parties. Unlike multicast delivery for real-time multimedia applications, however, Web requests for the same data come asynchronously because different users surf the Web at different times. Therefore, Web "multicasting" must be done via caching: the network temporarily buffers popular Web pages at places the pages have traveled through (due to previous requests), so that future requests for those pages can be served from the cache.
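The idea of buffering pages at the places they have traveled through can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not the project's protocol: the function `fetch`, the node names, and the representation of a network path as a list of node names are all hypothetical, and cache expiry and capacity limits are ignored.

```python
from typing import Dict, List, Tuple

def fetch(path: List[str], url: str,
          caches: Dict[str, Dict[str, str]],
          origin: Dict[str, str]) -> Tuple[str, str]:
    """Walk from the client toward the origin; return (page, node that served it).

    `path` lists the (hypothetical) caching nodes between client and origin,
    nearest to the client first.
    """
    for i, node in enumerate(path):
        if url in caches.setdefault(node, {}):
            page, served_by, hit_index = caches[node][url], node, i
            break
    else:
        # No cache on the path holds the page: go to the origin server.
        page, served_by, hit_index = origin[url], "origin", len(path)
    # The response travels back toward the client and leaves a copy
    # at every node it passes through.
    for node in path[:hit_index]:
        caches[node][url] = page
    return page, served_by

origin = {"/index.html": "<html>hello</html>"}
caches: Dict[str, Dict[str, str]] = {}
# Two clients whose paths share the "regional" and "backbone" nodes.
print(fetch(["campus-a", "regional", "backbone"], "/index.html", caches, origin)[1])
print(fetch(["campus-b", "regional", "backbone"], "/index.html", caches, origin)[1])
```

In this toy run the first request is served by the origin, while the second, issued later by a different client, is served from the shared "regional" node. That is the asynchronous analogue of delivering one copy down a multicast tree.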

We propose to undertake the design and prototype implementation of a self-configuring, highly adaptive Web caching system that will enable the World Wide Web as well as other data dissemination applications to scale to the dimension of the global information infrastructure and beyond. Our design uses IP multicast as a basic building block. IP multicast serves two distinct functions: it is the most efficient way to deliver the same data to multiple receivers, and it is an information discovery vehicle, since a host can multicast a query to a relevant group when it does not know exactly whom to ask. Our caching design makes use of both features: we multicast page requests in order to locate the nearest cached copy, and multicast page responses in order to efficiently disseminate pages of common interest.

The need for a self-configuring system

In our proposed design, Web servers and cache servers are organized into multiple, overlapping multicast groups, so that a client page request can either be met by some cache server in a local group, or be forwarded to other group(s) that lie on the path towards the information source or are otherwise judged most likely to have the referenced object. In order for this caching infrastructure to be robust, scalable, and efficient, the organization of Web caches into overlapping groups must be self-configuring. We propose to develop self-organizing algorithms and protocols that allow cache groups to dynamically adjust themselves according to changing conditions in network topology, traffic load, and user demands.
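The request flow through overlapping groups can be sketched as follows. This is a simplified illustration under loud assumptions: the names `CacheServer`, `build_groups`, `multicast_query`, and `resolve` are hypothetical, a multicast query is modelled as a plain loop over group members rather than real IP multicast, and group membership and the next group toward the origin are fixed by hand. The self-organizing behaviour that the project proposes to develop is not modelled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

@dataclass
class CacheServer:
    name: str
    groups: Set[str]                       # a server may belong to several groups
    store: Dict[str, str] = field(default_factory=dict)

def build_groups(servers: List[CacheServer]) -> Dict[str, List[CacheServer]]:
    """Index servers by group; overlap arises when a server is in two groups."""
    groups: Dict[str, List[CacheServer]] = {}
    for server in servers:
        for group in sorted(server.groups):
            groups.setdefault(group, []).append(server)
    return groups

def multicast_query(members: List[CacheServer], url: str) -> Optional[CacheServer]:
    """Stand-in for multicasting a request to one group: find a member holding url."""
    for server in members:
        if url in server.store:
            return server
    return None

def resolve(url: str, local_group: str,
            groups: Dict[str, List[CacheServer]],
            next_toward_origin: Dict[str, str],
            origin: Dict[str, str]) -> Tuple[str, str, List[str]]:
    """Try the local group first, then groups on the (assumed) path to the origin."""
    visited: List[str] = []
    group: Optional[str] = local_group
    while group is not None:
        visited.append(group)
        holder = multicast_query(groups[group], url)
        if holder is not None:
            return holder.store[url], holder.name, visited
        group = next_toward_origin.get(group)
    return origin[url], "origin", visited

a = CacheServer("a", {"g1"})
b = CacheServer("b", {"g1", "g2"})         # b overlaps g1 and g2
c = CacheServer("c", {"g2", "g3"}, {"/p": "page"})
groups = build_groups([a, b, c])
print(resolve("/p", "g1", groups, {"g1": "g2", "g2": "g3"}, {"/p": "page"}))
```

Here the request misses in the local group g1 and is met in g2 by server c, without reaching the origin. The hand-written `next_toward_origin` table is exactly the kind of manual configuration that a self-configuring system would replace.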

We believe that self-configuration is an essential component of a range of loosely coupled, globally distributed systems such as the Internet. Examples include the need for self-configuring groups for scalable session message distribution in RTP, the need for self-configuring groups for session messages and for local recovery in scalable reliable multicast, and the need for self-configuring search structures for information discovery protocols. We envision that the basic approaches to self-configuration developed in this research can be further extended to other large-scale systems.

