Application security encompasses measures taken throughout the application's life-cycle to prevent exceptions in the security policy of an application or the underlying system (vulnerabilities) through flaws in the design, development, deployment, upgrade, or maintenance of the application.

Wednesday, December 25, 2013

CDNs and Cache Control

Fundamentally, a content delivery network or content distribution network (CDN) is a large distributed system of servers deployed in multiple data centers across the Internet. The goal of a CDN is to serve content to end users with high availability and high performance. CDNs sit between the end user and the web server. Each of these servers is designed to cache the web server's content according to the cache rules set in the various HTTP headers.

When configured properly, CDNs will deliver content to end users from the fastest (and typically nearest) server available. Additionally, CDNs act as a buffer between web servers and end users. The number we are most concerned with is the cache hit ratio, which describes the percentage of requests the CDN was able to answer out of its own cache without having to bother our web servers. Depending on your traffic and architecture, this number can get quite high, but even at lower figures you'll see a gain.
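To make the cache hit ratio concrete, here is a minimal sketch of how it could be computed from two counters. The function name and the request counts are made up for illustration; a real CDN will report this figure through its own dashboard or logs.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Return the fraction of requests served from the CDN cache (0.0 to 1.0)."""
    total = hits + misses
    if total == 0:
        return 0.0  # no traffic yet, so there is nothing to measure
    return hits / total


# Hypothetical numbers: 9,200 requests answered from cache, 800 sent to the origin.
ratio = cache_hit_ratio(hits=9200, misses=800)
print(f"cache hit ratio: {ratio:.1%}")  # prints "cache hit ratio: 92.0%"
```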

A word of warning, though: if you just set up your cache and fail to configure your caching headers correctly, it's quite possible to end up paying twice for every request, once for the CDN to receive it and once for your own servers to answer it anyway. Beyond their intended use for caching, CDNs also have a pleasant side effect. Provided you're dealing with a website, or a particularly well-crafted web application, your CDN may be able to cover for a momentary outage of your servers, so that your end users never even notice.

Simply put, caching allows web assets to be stored at remote points along the way to the visitors’ browsers. Of course, the browser itself also maintains an aggressive cache, which keeps clients from continually asking the web server for a resource each time it comes up.

Assets like your company logo, the favicon for your site, or your core CSS files aren't likely to change from request to request, so it is safe to tell the requester to hold onto their copy for a while. By cutting down on the requests the web server has to deal with, you free it up to handle more requests, and end users enjoy a faster browsing experience. Generally, assets like images, JavaScript files, and style sheets can all be cached fairly heavily, while assets that are dynamically generated, like dashboards, forums, or many types of web applications, benefit less, if at all. If your concern is performance, your dynamic content will be shunted to a bare minimum of AJAX-type resources, while the rest of your assets will be heavily cached.
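As a rough illustration of that split, the sketch below picks a Cache-Control value from the file extension of the requested path. The extension list, the one-week lifetime, and the function name are all assumptions chosen for the example, not recommendations for any particular site.

```python
from pathlib import PurePosixPath

# Hypothetical policy: these extensions and this lifetime are illustrative only.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".gif", ".ico", ".svg"}
ONE_WEEK = 7 * 24 * 60 * 60  # lifetime in seconds


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value based on the requested path's extension."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in STATIC_EXTENSIONS:
        # Static assets: let browsers and shared caches keep a copy.
        return f"public, max-age={ONE_WEEK}"
    # Everything else is treated as dynamic: revalidate before every reuse.
    return "private, no-cache"


print(cache_control_for("/static/logo.png"))  # public, max-age=604800
print(cache_control_for("/dashboard"))        # private, no-cache
```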

Cache Control

If there were a default super-header for caching behavior, this would be it. Typically you will see a string of settings for this header; these settings are called cache response directives.

cache-control: private, max-age=0, no-cache
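To see how a header value like the one above breaks down into individual directives, here is a small parsing sketch. The function name is hypothetical, and the parser is deliberately simplified: it does not handle quoted arguments that themselves contain commas.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control header value into a {directive: argument} dict.

    Directives without an argument map to None. Directive names are
    case-insensitive, so they are lowercased here.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray or trailing commas
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


print(parse_cache_control("private, max-age=0, no-cache"))
# {'private': None, 'max-age': '0', 'no-cache': None}
```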

public: Sets Cache-Control: public to specify that the response is cacheable by clients and shared (proxy) caches.

private (the default value in some server frameworks): Sets Cache-Control: private to specify that the response is cacheable only on the client and not by shared (proxy server) caches. Do not make the mistake of assuming that this in any way provides you with some kind of security or privacy: keep using SSL/TLS.

no-cache: The directive Cache-Control: no-cache indicates that a cached response should not be used to answer a request without first checking back with the origin server. Pragma: no-cache has the same semantics, but it is a holdover from HTTP/1.0 and only matters for older implementations. I would not generally recommend worrying about it, but for the sake of completeness, there it is. No new HTTP directives will be defined for "Pragma" going forward.

no-store: The directive Cache-Control: no-store specifies that caches should not store this response. It also ensures that no part of the request is stored. "no-store" was designed with sensitive information requirements in mind, and so is kind of like the G-Man of cache headers.

expires: Back in the day, this was the standard way to specify when an asset expired, and it is just a basic date-time stamp. It is still fairly useful for older user agents. On most modern systems, the "Cache-Control" directives "max-age" and "s-maxage" take precedence, but it's always good practice to set a matching value here for compatibility. Just make sure you format the date correctly, or it will be evaluated as an already expired date.
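Getting the date format right is easier if a library does it for you. The sketch below uses Python's standard `email.utils.format_datetime` to produce the HTTP date format; the helper name `expires_header` and the one-hour lifetime are assumptions for the example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(now: datetime, lifetime_seconds: int) -> str:
    """Return an Expires value `lifetime_seconds` after `now`, in HTTP date format."""
    # Convert to UTC first: HTTP dates are always expressed in GMT.
    expires_at = now.astimezone(timezone.utc) + timedelta(seconds=lifetime_seconds)
    return format_datetime(expires_at, usegmt=True)


# Fixed timestamp so the example is reproducible.
now = datetime(2013, 12, 25, 12, 0, 0, tzinfo=timezone.utc)
print(expires_header(now, 3600))  # Wed, 25 Dec 2013 13:00:00 GMT
```

If you also send max-age, deriving both values from the same lifetime keeps them matching.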

max-age: Traditionally, you would let caches know when an asset expires by using the "Expires" header. However, if you want to be more explicit, you may set a max-age, a lifetime in seconds, which will override the Expires header.

s-maxage: We can see some similarities between this directive and the last one. "s-" is for shared, as in "shared cache", as in CDN. This directive is explicitly for CDNs and other intermediary caches. When present, it overrides both the max-age directive and the Expires header, and most well-behaved CDNs will obey it.
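The precedence between s-maxage, max-age, and Expires can be sketched as a small function. This is a simplified model under stated assumptions: header names are expected in lowercase, the function and parameter names are hypothetical, and real caches also apply rules (such as heuristic freshness and the Age header) that are left out here.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers: dict, shared_cache: bool, now: datetime) -> int:
    """Return how many seconds a response may be reused without revalidation.

    Precedence: s-maxage (shared caches only), then max-age, then Expires.
    """
    directives = {}
    for part in headers.get("cache-control", "").split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg

    if shared_cache and directives.get("s-maxage", "").isdigit():
        return int(directives["s-maxage"])
    if directives.get("max-age", "").isdigit():
        return int(directives["max-age"])
    if "expires" in headers:
        try:
            expires_at = parsedate_to_datetime(headers["expires"])
        except (TypeError, ValueError):
            return 0  # a malformed date counts as already expired
        if expires_at.tzinfo is None:
            expires_at = expires_at.replace(tzinfo=timezone.utc)
        return max(0, int((expires_at - now).total_seconds()))
    return 0  # no explicit freshness information in this simplified model


now = datetime(2013, 12, 25, 12, 0, 0, tzinfo=timezone.utc)
headers = {
    "cache-control": "public, max-age=60, s-maxage=600",
    "expires": "Wed, 25 Dec 2013 13:00:00 GMT",
}
print(freshness_lifetime(headers, shared_cache=True, now=now))   # 600
print(freshness_lifetime(headers, shared_cache=False, now=now))  # 60
```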

must-revalidate: If your responses include this directive, you are telling the cache that once a cached asset has gone stale, it needs to revalidate that asset with the origin before using it again, and that it may not, under any circumstance, serve stale content (which is sometimes a desired behavior). Apparently this directive exists because some protocols require it, typically involving transactions.

no-transform: Some proxies will convert image formats and other documents to improve performance. If you don’t like the idea of your CDN making automated guesses about how your content should be encoded or formatted, I suggest including this directive.

proxy-revalidate: Basically the same as the "must-revalidate" directive, except it applies only to shared caches. This directive is designed for intermediary proxies and not user agents. The idea here is that you validate each end user only once between the proxy and their agent, but each new user should revalidate back to the server.

etag: Short for "entity-tag", the etag is a unique identifier for the resource being requested, typically composed of a hash of that resource, or a hash of the timestamp at which the resource was last updated. Basically, this lets a client ask smarter questions of the CDN, like "give me X if it's different from the etag I already have". There's a neat trick you can do with etags, which is to make them weak validators by prefixing the value with W/. This basically tells the client that although two versions of the resource are not byte-for-byte the same, they are functionally equivalent. Support for this feature is considered optional.
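Here is a rough sketch of that conversation from the server's side: derive an etag from a hash of the body, then answer 304 Not Modified when the client's If-None-Match value already matches. The function names, the choice of SHA-256, and the 16-character truncation are assumptions made for the example, not a required scheme.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes, weak: bool = False) -> str:
    """Derive an entity-tag from a hash of the body (one common approach)."""
    digest = hashlib.sha256(body).hexdigest()[:16]
    tag = f'"{digest}"'
    # A W/ prefix marks the tag as a weak validator.
    return f"W/{tag}" if weak else tag


def _opaque(tag: str) -> str:
    """Strip whitespace and any W/ prefix so tags can be compared weakly."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def respond(body: bytes, if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET of `body`."""
    etag = make_etag(body)
    if if_none_match is not None:
        candidates = {_opaque(c) for c in if_none_match.split(",")}
        if "*" in candidates or _opaque(etag) in candidates:
            return 304, {"ETag": etag}, b""  # client copy is current: send no body
    return 200, {"ETag": etag}, body


status, headers, _ = respond(b"hello")
print(status)                                   # 200
print(respond(b"hello", headers["ETag"])[0])    # 304
print(respond(b"changed", headers["ETag"])[0])  # 200
```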

vary: Essentially, "vary" lets the caches know which of the headers to use to figure out if they have a valid cache for a request; if a cache were a giant key-value store, adding "vary" fields appends those values to the key, thus changing which requests are considered valid matches for what exists in the cache.

You would commonly set this to something like "Accept-Encoding" to make sure your gzip'ed assets get served where appropriate, saving you all that bandwidth you might otherwise waste. Additionally, setting: 

vary: User-Agent 

will put you in the SEO good-books if you happen to be serving different versions of your HTML/CSS depending on the User-Agent of the request. Google will note the header and have the Googlebot crawl your mobile content as well.
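The key-value analogy for "vary" can be sketched directly: build the cache key from the URL plus the values of whichever request headers the Vary header names. The function name is hypothetical, and this simplified version ignores details real caches handle, such as normalizing header values and treating "Vary: *" as uncacheable.

```python
def cache_key(method: str, url: str, request_headers: dict, vary: str) -> tuple:
    """Build a cache key from the URL plus the request headers named in Vary."""
    names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    lowered = {k.lower(): v for k, v in request_headers.items()}
    # Each varied header contributes its value to the key (empty if absent).
    varied = tuple((name, lowered.get(name, "")) for name in names)
    return (method.upper(), url, varied)


gzip_req = {"Accept-Encoding": "gzip"}
plain_req = {"Accept-Encoding": "identity"}

# With Vary: Accept-Encoding the two requests land in separate cache entries.
print(cache_key("GET", "/app.js", gzip_req, "Accept-Encoding")
      == cache_key("GET", "/app.js", plain_req, "Accept-Encoding"))  # False

# Without Vary they share a single entry.
print(cache_key("GET", "/app.js", gzip_req, "")
      == cache_key("GET", "/app.js", plain_req, ""))  # True
```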
