Intermittent Cloudfront CDN failures (monitoring) - CDN Failover
For the past 2 months I have been experiencing intermittent Amazon CloudFront failures (2-3 times a week) whereby the page loads from my web server but all the assets from the CDN stay pending for minutes at a time. I confirmed this with curl from shells in different datacenters: some requests work and some don't, depending on the edge location (London?). Once the pending requests succeed, everything goes back to normal. We have been reporting this to Amazon, but they always reply with a "Don't expect a reply from us. Only if a gazillion people complain will we consider looking into this" kind of message. Often it resumes normal operation before I'm done writing the support request.
I came to the conclusion that, given the lack of development time for migrating to another CDN, the best way to proceed is to add a script to the HTML header that lets us know whenever something similar happens. Say, in the header, try to download a tiny GIF from the CDN; if the request takes longer than N msec, call an arbitrary URL within the root domain (for monitoring).
The question: how does one reliably, across all popular browsers, request a file with a callback on timeout? For example:
- request the file from the CDN using AJAX - will this not work due to cross-domain limitations?
- setTimeout("callbackTimeout",2000) callbackTimeout(){getElementById() else ...HttpWebRequest...} - would that be blocked by the pending HttpWebRequest, or will it work?
How else?
Thanks.
This has been briefly tested in IE 7 & 8, up-to-date Firefox on Windows and OS X, as well as Chrome. I suggest you test it yourself. Minify! If you know a better way of doing this, please suggest your improvements. Using e.g. a script instead of an image was considered and decided against, probably mostly due to my ignorance.
The next version will write a cookie on timeout, and future requests will then be handled on the server side (using a relative asset path). The cookie will expire after, say, 30 minutes. Every consecutive timeout will renew that cookie. Not sure how I'll handle the first failover. It could be a redirect (not very elegant but simple). Perhaps I will figure out a smarter way (possibly more elegant but more complex too).
<script type="text/javascript">
//<![CDATA[
// Absolute path to a picture on your CDN to be monitored
cdnImagePath = "http://YOURCDNADDRESS.net/empty.gif";
//this is relative path (cross domain limitation)
//will be followed by "timeout" or "other" as a reason i.e. /cdnMonitor.php?message=timeout
cdnMonitoringPath = "/cdnMonitor.php?message=";
// Recommended 3000 for 3 second(s) timeout
cdnTimeoutMilisec = 3000;
// Set to true to be notified after timeout (provides extra information)
cdnNotifyAfterTimeout = false;
// Handler methods
cdnOK = function(){
if (!cdnTimer && cdnNotifyAfterTimeout) cdnNotify('success');
}
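cdnFail = function(reason){
// onerror passes an event object, so anything other than "timeout" is reported as "error"
var message;
if (reason != "timeout") {
if (cdnTimer) clearTimeout(cdnTimer);
message = "error";
} else {
message = reason;
}
cdnNotify(message);
}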
cdnTimeout = function() {
cdnTimer = false;
if (cdnImage.complete == false) {
cdnFail("timeout");
}
}
cdnNotify = function(message) {
var xmlhttp;
if (window.XMLHttpRequest) {
xmlhttp = new XMLHttpRequest();
} else {// code for IE6, IE5
xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
}
// Send the request in both branches (the IE6/IE5 branch previously never sent it)
xmlhttp.open("GET", cdnMonitoringPath + message, true);
xmlhttp.send(null);
}
// Load test image and define event handlers
cdnTimer = setTimeout(cdnTimeout, cdnTimeoutMilisec);
cdnImage = new Image();
cdnImage.onload = cdnOK;
cdnImage.onerror = cdnFail;
cdnImage.src = cdnImagePath + "?" + Math.floor(Math.random()*1000000);
//]]>
</script>
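The same check can also be wrapped in a single function with no globals. This is only a sketch, not a tested replacement for the script above: `probeAsset`, its `env` parameter and the status strings "ok", "error" and "timeout" are names I made up for illustration, and `env` exists only so the image and the timer can be swapped out when running outside a browser.

```javascript
// Sketch: load an image and report "ok", "error" or "timeout" exactly once.
// probeAsset and env are hypothetical names, not part of the original script.
function probeAsset(url, timeoutMs, onResult, env) {
  env = env || {};
  var makeImage = env.makeImage || function () { return new Image(); };
  var setTimer = env.setTimer || setTimeout;
  var clearTimer = env.clearTimer || clearTimeout;
  var done = false;
  var timer = null;

  // Whichever of load, error or timeout happens first wins; later ones are ignored.
  function finish(status) {
    if (done) return;
    done = true;
    if (timer !== null) clearTimer(timer);
    onResult(status);
  }

  var img = makeImage();
  img.onload = function () { finish("ok"); };
  img.onerror = function () { finish("error"); };
  timer = setTimer(function () { finish("timeout"); }, timeoutMs);
  // Random query string to bypass the browser cache, as in the script above
  img.src = url + "?" + Math.floor(Math.random() * 1000000);
  return img;
}

// Possible use in the page, reusing the names from the script above:
// probeAsset(cdnImagePath, cdnTimeoutMilisec, function (status) {
//   if (status !== "ok") cdnNotify(status);
// });
```

Unlike the script above, this variant reports only the first outcome, so a late load after a timeout is not reported.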
Also, this is what I'll use for ad hoc monitoring on the server side, in cdnMonitor.php:
error_log(date('Y-m-d H:i:s.') .next(explode('.',microtime(1))). ' - '. $_GET['message'] . ' - '. $_SERVER['HTTP_X_REAL_IP']. ' - ' . $_SERVER['HTTP_USER_AGENT'] ."\n", 3, '/tmp/cdnMonitor.log');
You will need to change "HTTP_X_REAL_IP" to REMOTE_ADDR or whatever suits your needs. I use a reverse proxy, so that's the header I read.
Lastly, I made some last-minute changes in the post editor and might have broken something. Fingers crossed.
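The cookie idea planned for the next version could look roughly like the following. This is a sketch under assumptions, not what I actually run: the cookie name `cdnFailover` and both helper functions are hypothetical, and the server would still have to check for the cookie itself and switch to relative asset paths.

```javascript
// Hypothetical cookie name; not part of the original script.
var CDN_FAILOVER_COOKIE = "cdnFailover";

// Build a cookie string that expires `minutes` after `now` (a Date).
// Calling it again on a later timeout simply renews the expiry.
function buildFailoverCookie(now, minutes) {
  var expires = new Date(now.getTime() + minutes * 60 * 1000);
  return CDN_FAILOVER_COOKIE + "=1; expires=" + expires.toUTCString() + "; path=/";
}

// True if the failover cookie is present in a document.cookie-style string.
function hasFailoverCookie(cookieString) {
  var parts = cookieString.split(";");
  for (var i = 0; i < parts.length; i++) {
    var pair = parts[i].replace(/^\s+/, "");
    if (pair.indexOf(CDN_FAILOVER_COOKIE + "=") === 0) return true;
  }
  return false;
}

// In the browser, on timeout (e.g. inside cdnFail):
// document.cookie = buildFailoverCookie(new Date(), 30);
```

Keeping the cookie string construction separate from `document.cookie` makes the expiry logic easy to check outside a browser.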