AppFabric: Large cluster slow to respond compared with small cluster
I am using AppFabric for caching on my website. We have recently added more hosts to the cache cluster, and since we are likely to add more in the medium-term future, we decided to reconfigure the cluster as a large cluster rather than a small one. We are now seeing some troubling side effects of this, namely that AppFabric takes a long time to come back up after a restart.
OK, so if you have got this far, I have got your attention and I can tell you the full story :O). AppFabric always took a long time to come back up after a restart, but we were able to configure and code for this so that our users saw no adverse effects. In web.config we have:
<dataCacheClient channelOpenTimeout="5" requestTimeout="1000">
  <!-- cache host(s) -->
  <hosts>
    <host name="localhost" cachePort="22233"/>
  </hosts>
</dataCacheClient>
Which (unless I've misunderstood the documentation) will cause the AppFabric client to throw an exception if it doesn't receive a response within 1 second. In our code, we handle this and fall back to reading the data we are attempting to read directly from the database:
public object Get(object key)
{
    if ( key == null )
    {
        return null;
    }
    try
    {
        return cache[key.ToString()];
    }
    catch ( CacheException ex )
    {
        if ( ex.ErrorCode == DataCacheErrorCode.ConnectionTerminated || ex.ErrorCode == DataCacheErrorCode.RetryLater || ex.ErrorCode == DataCacheErrorCode.Timeout )
        {
            // Calling code should try reading from the database instead
            return null;
        }
        else
        {
            throw;
        }
    }
}
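To make the fallback concrete, here is a minimal sketch of what the calling side roughly does. The names `CacheFallback`, `ReadThrough`, `cacheGet` and `loadFromDatabase` are hypothetical and not taken from our code base, and the cache is represented by a plain delegate so that the snippet compiles without the AppFabric assemblies:

```csharp
using System;

public static class CacheFallback
{
    // Hypothetical helper: try the cache first, then fall back to the database.
    // cacheGet stands in for the Get method above, which returns null both on a
    // cache miss and on the handled CacheException error codes.
    public static T ReadThrough<T>(object key, Func<object, object> cacheGet, Func<T> loadFromDatabase)
        where T : class
    {
        var cached = cacheGet(key) as T;
        if (cached != null)
        {
            return cached;
        }

        // Cache miss or cache unavailable: read directly from the database.
        return loadFromDatabase();
    }
}
```

The point is that a null from Get is treated the same as a miss, so a slow or unavailable cache should cost at most the request timeout plus one database read.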
Since we started using the large cluster configuration, it is as though the requestTimeout attribute of the dataCacheClient config entry has no effect. After issuing the Restart-CacheCluster command, our website stops responding for between 3 and 5 minutes, which is about how long it currently takes for the cluster to come back up after a restart.
To troubleshoot this further, I did some testing on my local machine to see how long it took the home page of the website to load after a full refresh (Ctrl+F5) with AppFabric in various states. The results are as follows (times are average times in seconds):
Small Cache up: 11.4462
Small Cache down: 12.4346
Small Cache restarted: 11.5794
Small Cache restarted[1]: 14.99
Large Cache up: 11.5534
Large Cache down: 16.576
Large Cache restarted: 59.4582
Large Cache restarted[1]: 62.9526
As you can see from the results above, there is a significant difference between the time it takes for the homepage to load normally and the time it takes after AppFabric is restarted.
In case you are wondering why we are restarting the cluster at all, we sometimes want to invalidate the cache so that changes to values with a very large TTL, such as settings, take effect immediately.
[1] Second restart tests are with the channelOpenTimeout and requestTimeout attributes removed from web.config
You can always enumerate over all cache items and remove them from the cache. This way you will achieve your goal without restarting the cache.
DataCache cache; // TODO: initialize
foreach (var regionName in cache.GetSystemRegions())
{
    Trace.WriteLine(string.Format("Enumerating objects in region '{0}'", regionName));
    foreach (var item in cache.GetObjectsInRegion(regionName))
    {
        Trace.WriteLine(string.Format("Removing cache item '{0}'", item.Key));
        cache.Remove(item.Key);
    }
}
Also, a Small cache cluster means 1-5 cache servers, while Large means more than 15 cache servers. Restarting the cluster means stopping AppFabricCachingService on all of those machines, waiting for all of them to stop, then starting them all again and waiting for every one of them to come back up.