Cripple the Google CDN’s caching with a single character
jQuery, Performance By Dave Ward. Updated October 11, 2011
It’s no secret that I’m a proponent of using a shared CDN to host jQuery. As more and more sites take advantage of public CDNs for their jQuery reference, the cross-site caching benefit is becoming almost a given. However, there are a couple of ways of using these public CDNs that even I recommend against.
With the impending policy change on hotlinking copies of jQuery hosted on jQuery.com, I expect that at least several sites will be migrating their hotlinked script references to one of the public CDNs soon. So, I think this is a good time to address one CDN-related usage mistake that I’ve seen an uptick in lately.
Firesheep and SSL
By now, you’ve probably heard about last year’s release of Firesheep, a Firefox addon that sniffs out HTTP session cookies transmitted in cleartext, and the turmoil that it caused. Seeing someone hijack a stranger’s Twitter or Facebook identity with a single click is enough to make anyone more conscious about the security of their browsing habits.
Traditionally, most sites have used encrypted connections for authentication and sensitive information, to avoid transmitting passwords or private data in cleartext, but they generally avoid forcing SSL connections on most other pages. This lack of site-wide support is usually attributed to the (perceived) server-side overhead of encrypting every connection and the tedious mixed content warnings that browsers display when HTTPS pages contain references to unencrypted content.
However, with SSL certificates available for cheaper than ever, Firesheep having raised so much awareness, and the prevalent use of unsecured WiFi networks in public settings, many sites have responded by offering SSL encryption for pages on their entire sites.
Unfortunately, this recent proliferation of SSL usage has at least one performance drawback that isn’t necessarily obvious at first glance.
HTTPS and mixed content
To ensure airtight security, pages served via SSL should contain no references to content served through unencrypted connections. The reasoning behind this rule is sound. After all, the browser has no way of knowing whether an image contains a chart with sensitive financial data or if a JavaScript include contains a JSON collection detailing the user’s medical history.
Different browsers react to mixed HTTP and HTTPS content with varying degrees of severity, with Internet Explorer being the most hostile, but every browser displays some type of warning by default. If you’re curious about how your browser handles the situation, navigate to https://encosia.com.
Visiting my site via HTTPS retrieves the page’s content via a legitimately secured connection, encrypted with a valid SSL certificate. The catch is that its HTML contains insecure references to content such as CSS and images, which triggers the dreaded mixed content warning. The result is especially painful in Internet Explorer:

It’s obvious why avoiding the mixed content warning is an absolute necessity for any serious public-facing site.
An obvious solution
In cases where a given page may potentially be served via either HTTP or HTTPS, path-relative URLs can be used to automatically choose the right protocol for on-site content. For example, references such as /images/foo.png and /css/bar.css will automatically adapt to the correct protocol based on what was used to load the underlying document containing them.
Utilizing off-site resources like advertisements, third-party widgets, and CDN hosted libraries can be a bit more frustrating though. Because their fully qualified URLs specify a protocol, it’s easy to run afoul of the mixed content rule when pages are available through both HTTP and HTTPS protocols. Even a single, innocuous HTTP reference is enough to completely break a page served via HTTPS.
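To make the mixed content rule concrete, here is a minimal Python sketch that resolves each reference against the page's own address and reports the ones that would end up being fetched over plain HTTP from a secure page. The `insecure_references` helper and the `example.com` addresses are hypothetical, made up for illustration; real browsers apply more nuanced rules than this.

```python
from urllib.parse import urljoin, urlsplit


def insecure_references(page_url, references):
    """Return the references that would load over plain HTTP from an HTTPS page."""
    if urlsplit(page_url).scheme != "https":
        return []  # mixed content is only a concern on secure pages
    return [
        ref
        for ref in references
        # Resolve the reference the way a browser would, then check its scheme.
        if urlsplit(urljoin(page_url, ref)).scheme == "http"
    ]


references = [
    "/images/foo.png",  # path-relative: inherits the page's protocol
    "/css/bar.css",     # path-relative: inherits the page's protocol
    "http://ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js",
]

print(insecure_references("https://example.com/index.html", references))
print(insecure_references("http://example.com/index.html", references))
```

Only the fully qualified HTTP reference is flagged on the secure page; the path-relative references adapt on their own.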
Since the inverse case – secure references on an unsecured page – isn’t subject to any obvious penalties, a common solution is to simply use HTTPS exclusively when linking to off-site resources that support it. On the secure pages, the HTTPS reference avoids a mixed content warning, and it still appears to work fine on HTTP pages as well. On sites where a given page might be viewed using either HTTP or HTTPS, assuming HTTPS can be the quickest, easiest remedy to the mixed content problem.
Unfortunately, that approach is burdened by a significant performance drawback on unsecured pages, which may not be readily apparent.
SSL == Super Slow Loading?
Using the secure reference everywhere seems like a workable solution, but there’s a major problem with over-using SSL for cacheable, static resources (such as jQuery). For the same reasons that browsers require those assets to be encrypted in the first place, most browsers default to not caching files to disk if they’ve been retrieved via SSL.
Worse, even if the user has a locally cached copy of jQuery sitting on disk that was requested from Google’s CDN via HTTP, their browser will not utilize that local copy when it encounters an HTTPS reference to the same resource on the same server.
In other words, this URL:
http://ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js
Is entirely different than the following one, as far as a browser is concerned, thus the two are not subject to the sizable cross-caching benefit that comes with using Google’s CDN:
https://ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js
The upshot is that HTTPS references to Google’s CDN result in under-optimized caching when used on regular HTTP pages. Though you must use a secure reference on pages that are secure themselves, you should avoid HTTPS references on pages that don’t require them.
Update: As several people have pointed out, the Google CDN does serve its assets with a Cache-Control header that allows most browsers to cache its copies of jQuery to disk. However, that doesn’t help mitigate the cross-site caching issue. A local copy that was originally requested via HTTP cannot be used as a cache hit when the browser later encounters an HTTPS reference to the same file (and vice versa). Two separate copies of the file will be stored and each treated as distinct resources.
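The cache fragmentation described above can be illustrated with a toy model. This is only a sketch that assumes a cache keyed by the full URL string, scheme included; the `UrlKeyedCache` class is hypothetical, and a real browser cache is far more involved (expiry, validation, size limits).

```python
class UrlKeyedCache:
    """Toy cache keyed by the full URL string, scheme included."""

    def __init__(self):
        self._entries = {}
        self.downloads = 0  # how many times we had to go to the network

    def fetch(self, url):
        if url not in self._entries:
            # Cache miss: pretend to download the resource.
            self.downloads += 1
            self._entries[url] = f"<body of {url}>"
        return self._entries[url]


cache = UrlKeyedCache()
path = "ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js"

cache.fetch("http://" + path)   # first site, HTTP reference: downloaded
cache.fetch("http://" + path)   # second site, same HTTP reference: cache hit
cache.fetch("https://" + path)  # third site, HTTPS reference: downloaded again

print(cache.downloads)
```

Two downloads are recorded for what is, byte for byte, the same file, because the two URLs are distinct cache keys.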
A better solution
It’s not exactly light reading, but section 4.2 of RFC 3986 provides for fully qualified URLs that omit protocol (the HTTP or HTTPS) altogether. When a URL’s protocol is omitted, the browser uses the underlying document’s protocol instead.
Put simply, these “protocol-less” URLs allow a reference like this to work in every browser you’ll try it in:
//ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js
It looks strange at first, but this “protocol-less” URL is the best way to reference third party content that’s available via both HTTP and HTTPS.
On a page loaded through regular, unencrypted HTTP, script references using that URL will be loaded via HTTP and be cached as normal. Likewise, on a secure page that was loaded via HTTPS, script references targeting that protocol-less URL will automatically load the script from Google’s CDN via HTTPS and avoid the mixed content warning.
Thus, using the protocol-less URL allows a single script reference to adapt itself to what’s most optimal: HTTP and its full caching support on HTTP pages, and HTTPS on secured pages so that your users aren’t confronted with a mixed content warning.
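If you would like to see that resolution rule in action outside of a browser, Python's standard `urllib.parse.urljoin` follows the same RFC 3986 behavior for references that begin with `//`. This is only an illustrative sketch: the `resolve` helper and the `example.com` page addresses are made up for the demonstration.

```python
from urllib.parse import urljoin

# The protocol-less reference from the article.
JQUERY = "//ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js"


def resolve(page_url, reference=JQUERY):
    """Resolve a reference against the address of the page that contains it."""
    return urljoin(page_url, reference)


# Hypothetical pages on the same site, one per protocol.
print(resolve("http://example.com/index.html"))
print(resolve("https://example.com/checkout"))
```

The first call yields the `http://` form of the CDN URL and the second yields the `https://` form, even though the reference itself never changes.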
Conclusion
I probably could have boiled this post down to a couple of sentences, but I hope you found the underlying “why” useful. Just saying that SSL resources aren’t cached isn’t nearly as interesting as understanding why, and how that relates to the mixed content issue.
Perhaps a big part of the problem is that the Google AJAX Libraries developer guide was recently updated to list only HTTPS URLs. Anyone who copy/pastes one of those URLs into their script reference without knowing better will be impacted by the loss of disk caching and cross-site caching, which is unfortunate. If anyone reading this has the right contacts, it would be great if that page could be updated to display the protocol-less URL for all of those libraries instead.
I hope if you see someone using a fixed HTTPS reference to a resource like jQuery on the Google CDN, you’ll point them here and hopefully help speed up the web just a tiny bit for everyone.
What do you think?
People who do this often complain that their pages take a while to open in Windows (on various browsers) when doing local development.
That’s cuz the // actually triggers a lookup in a network share (i believe) which is seriously slow.
So… good tip! good practice.. but might make you go wtf when you’re developing in file:// protocol.
Connecting to a computer in your local network in Windows is done by 2 backslashes such as \\computername\shareddir
It should be noted that Paul Irish is correct: this will not work with local HTML files because of the file:/// protocol. You can download this test to your PC and test: http://encosia.com/samples/protocol-less-test/
Wow, great post. I had no idea that using protocol-less urls was valid. I had implemented a workaround that examined the request, determined the protocol used and dynamically created the script tags to match, but this will be so much cleaner.
If I understand this correctly, using the “protocol-less” approach would STILL get non-cached copies of the content file for each https site visited, right? I suppose though that even if you host it on your site, it would download it again anyway, so there is no win with that.
Great explanations in the article!
That’s right. There’s nothing to be done (currently) about the situation on secure pages. The idea behind the protocol-less approach is to improve the caching situation on insecure pages, while still avoiding the mixed content warning on secure pages.
Do you know if google’s CDN supports SSL session caching?
If not, I would expect to see a big win using a local copy on SSL (though this is only workable if your content is dynamic and you’re serving SSL other than via a transparent proxy.)
The important detail is that SSL handshaking is much more costly, and makes HTTP pipelining crucial. Doing two (or more) SSL handshakes for a single page will add quite a bit of latency.
Research by Google (it’s on the Google Blog from mid-to-late last year, and I don’t have time to dig up the link) showed that the handshake process increased latency by about 3.5x
This won’t have a significant impact on sites that are already in the ~24 to ~120ms latency range that I’ve observed to be nearly universal on U.S. hosted sites that I’ve visited. We’re looking at a loading time increase of about a second.
Very interesting read.
I hadn’t heard of the protocol-less // URLs before.
I’ll be sure to use them in any future project involving SSL.
It’s very unfortunate that Google’s own advice negates perhaps the biggest advantage of using their CDN in the first place.
Hi David,
Have you tried running:
curl --head https://ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js
It clearly shows a Cache-Control public, and proper Last-Modified and Expires fields. This resource will be cached on all browsers.
You may be correct about that. The last time I did in-depth testing, browsers were rife with inconsistencies when it came to HTTPS caching (regardless of whether you sent the correct headers). Firefox wasn’t even caching HTTPS content in memory at one point. Thorough analysis of more current browser versions would be interesting.
That’s not very central to the actual point of the post though. Even if every version of every browser did respect the cache-control header for SSL content, over-referencing the SSL version of the script is still fragmenting your local cache unnecessarily. With many thousand sites already using the regular HTTP references to the Google CDN, sites that use the HTTPS reference on HTTP pages are missing out on the cross-site caching benefit, which is probably the biggest advantage of using a shared CDN to begin with.
I’ve been seeing more and more people using the fixed HTTPS reference on simple sites like WordPress blogs, thinking it’s more secure or that it allows them the flexibility to offer their site through both HTTP and HTTPS. In reality, that’s just harming their site’s performance for most visitors, whereas the protocol-less reference gives them the best of both worlds.
Note that when this technique is used for stylesheets (which would be the case for the jQuery UI CSS) in Internet Explorer 7 and 8, IE will download the file twice. It’s just for stylesheets and just when linked with a protocol-relative URL. The only good way around that is something server side, to link with HTTP or HTTPS depending on the current state.
Here’s some more info on the topic:
http://www.stevesouders.com/blog/2010/02/10/5a-missing-schema-double-download/
Firefox and IE will cache SSL/TLS resources. Under basic circumstances, the ‘in-memory’ cache will have a copy of the resource (subject to size of memory cache, and how many tabs you have open).
The other case is based upon the included headers, namely if you add ‘Cache-Control: Public’ when delivering the resource, FF & IE will persist to the on-disk cache for that object. For example, Google includes this when retrieving jQuery over HTTPS.
The real issue isn’t whether or not HTTPS content can be cached, but that the same resource requested via HTTP and HTTPS is cached separately. Using the fixed HTTPS reference to a shared CDN on unsecured pages that don’t require SSL misses out on the biggest benefit of using a shared CDN in the first place, whereas the protocol-less URL doesn’t.
But if everyone is referencing HTTPS to grab CDN content, there’s no real problem. The resource is grabbed once by the first request (which is HTTPS) and cached for everyone else referencing it as HTTPS. So you’ll get better performance than having two (HTTP and HTTPS) potential copies of the resource requested on your website, right?
At this point, trying to start a wholesale shift from HTTP references to HTTPS references across the board would be pretty painful. Even if you believe that every browser handles the caching right and that none of the other impediments like caching proxies would matter, you’d be asking everyone to throw away the cross-site caching benefit we’ve accumulated over the past 2+ years of sites adopting the HTTP reference to Google’s CDN. That’s a tough sell at this point. I’d need to see a majority of the most trafficked CDN-referencing sites switch to SSL-only before I’d be comfortable recommending the turmoil/fragmentation of trying to shift all references to HTTPS.
Interesting, this also means we can replace Google Analytics’ include code with a standard script block:
https://gist.github.com/786747
Thanks! I’d never heard of protocol-less absolute URLs.
You can’t unfortunately. There is a custom SSL subdomain to serve secure copies of the ga.js script. Grabbing it off the www host will cause a security certificate error in IE6.
Also. here is more background on the “protocol-less URLs” http://paulirish.com/2010/the-protocol-relative-url/
Well, that’s a bummer.
Good point, Paul. This is exactly what came to mind as I was reading Dave’s article. A great example would be sharethis.com, where they serve their buttons.js script using different subdomains depending on the protocol:
http://w.sharethis.com/button/buttons.js
https://ws.sharethis.com/button/buttons.js
Not sure of the reasons for this, but the only solution I could think of would be to use a conditional which would check for protocol:
('https:' == document.location.protocol ? 'https://ws' : 'http://w')
That’s quite interesting tip here ;) Thanks!
Can’t believe i’d never heard of this! Shame about IE with CSS, but an awesome tip nonetheless!
if using the .net framework, you can always render the current protocol during the rendering process.
HttpContext.Current.Request.Url.Scheme
Protocol-less URLs scare me. Yeah, it’s a standard. But if everyone followed standards we wouldn’t need jQuery so much! :)
It’s scary at first, but the protocol-less URLs are safe to use. The only browser I could find that didn’t handle them correctly is an obscure browser named Dillo.
If you want to test some for yourself, you can borrow the page I used to test with: http://encosia.com/samples/protocol-less-test. You can point a browser at that and quickly determine if it supports protocol-less URLs. If the protocol-less reference works, you’ll see an H1 saying so. Else, a blank page.
I’m sure that they are safe to use… i’m not too concerned about that. I know that IE7 and IE8 handle it, but it handles it twice. If I’m paying for bandwidth (non-CDN calls), that’s a problem for me.
All that being said… it’s a nice feature to have when I need it.
That IE7/8 issue is limited to CSS links and @imports, as far as I’m aware. They shouldn’t double-request JavaScript includes with protocol-less URLs.
I use the Google CDN for the jQuery UI theme files (JS & CSS). This tip about the IE problem is relevant for the CSS file in particular … if I were to switch to protocol-less URLs.
My site doesn’t currently offer HTTPS, so this is a moot point for me (until I add HTTPS someday). Good article, as always.
I must say that was a fire-starter for me. Recently, I was researching converting an HTTP site to HTTPS, and I was looking for what could be the bottlenecks in my conversion. This article has added a few cents for me.
Thanks.
This is eye opening. I never had the slightest idea that a ‘protocol-less’ URL was even possible. This is certainly interesting. Out of curiosity, what about the solution of testing whether the page is secure or not server side and therefore using http or https accordingly…?
Switching on the server-side is definitely a valid option too (like Paul mentioned above); so is explicitly switching on the client-side like Google Analytics’ default snippet does. This is just a bit easier, in that you can put the reference in your master/layout template’s footer once for the entire site and then never worry about it again.
Great article. Lately, I’ve been seeing a lot of encrypted login pages that are only partially encrypted. As I learned from Netscape back in 1995, unencrypted objects (files) on encrypted pages create a backdoor to the encrypted server.
Why bother going to the expense and effort of encrypting a page when you are going to provide hackers with a backdoor to the secure server? So what if the data being transmitted between the browser and server is encrypted when someone can program a packet sniffer to listen at the backdoor and capture the data after it has been decrypted by the server? The whole idea is to prevent anyone accessing the confidential data at any time. That is why the page was encrypted in the first place.
I.E. is notorious for being insecure. Microsoft has dumbed everyone down and prefers that we all play three blind mice. The sooner we all wise up and realize there are alternatives to Microsoft products that are light years ahead where security is concerned, the better off we all will be. Do not be like sheep being led to the slaughter; stop using I.E. so you will know when encrypted pages are not secure. Better still, stop using Microsoft products that have security issues. Also, make sure to contact the company whose encrypted pages are not secure and let them know you will not do business with them till the security issue is corrected.
As this article clearly shows, there is no reason for having unencrypted files on encrypted pages. Thank you for the information.
I have replaced all occurrences of “http://” with “//” and am getting CSS complaints about zoom, opacity, overflow-z, inherit and filter. Also, the @ block is now not recognized. These were generated as compile errors in VS 2008. Any ideas on eliminating them, and why the protocol-less references would cause these errors?
Thanks
Gord
I haven’t been using the protocol-less references with CSS, due to the double-download issue in IE. So, I don’t have any insight into those VS issues, sorry.
I am a total newbie in many web design respects, know a bit of HTML and CSS, and usually skip “difficult” explanations, but you seem to do it just right, as I understood this somewhat new matter in minutes. Thanks. I may be proof that one is never too old to learn. Cheers from Australia: jacob
This is simply wrong: “browsers do not cache files to disk if they’ve been retrieved via SSL.” It’s “not even wrong” given you put it in bold.
People are quoting this article back to me elsewhere. Please update this with tests & facts.
This has been changing over time. You used to need a Cache-Control header to get browsers to cache any SSL content to disk. Now, a run-of-the-mill Expires header seems to be enough for Chrome and IE9 to cache assets served via HTTPS.
So, you’re right that I was wrong to so definitively state that. I updated the post to be more accurate.
The main point of that section still stands though. Even if you get an asset cached locally for HTTPS requests, it cannot be used in a cache hit for a reference encountered later on a page served via HTTP (and vice versa). So, it’s not advisable to over-use the SSL reference when you don’t need it, which makes the protocol-less reference very handy.
I would only use HTTPS to retrieve JS libraries. Imagine what could happen if a man-in-the-middle poisoned someone’s browser cache with an evil JS library. Because many websites point to the same file and location to load libraries like jQuery, the evil cached file would be loaded for all of those websites. Only when the user deletes the browser cache will he or she be able to browse safely again.
Nice post! Like others I’d never seen that URL format before.
I’m just nervous about 3rd party hosting I guess, particularly with scripts. If they change the script to something malicious, or get hacked by someone else who does, your site is also then wide open for attack.
Also if they changed the caching policy they could get all those lovely referrers and google.com cookies. Not that I’m paranoid! *8-}
Great article. Thanks for taking the time to do an in-depth explanation even for us that understand the why.
It’s good from a performance perspective, but I doubt it’s good from a security perspective, and I’m worried about whether secure sites can take advantage of this or not.
The point of using the protocol-relative reference is that any pages served over SSL will automatically use SSL to load jQuery from the CDN too.