Questions on the Azure scalability targets and the use of multiple Azure storage accounts?
The Windows Azure Storage Abstractions and their Scalability Targets blog post indicates there is a 5,000 entities/second transaction limit for a single storage account, and a 500 entities/second limit for a single table partition. To stay under the first limit one should use multiple accounts, and for the partition limit one should design partitions carefully.
I'd like to hear from others who have experience with the 5,000 transactions/second limit on a single storage account. I'm currently designing a community of blogs/wikis, and suppose one day the site becomes popular and attracts a lot of traffic. Should I split the user-related tables into one storage account, the blog-related tables into another, and the wiki-related tables into yet another to stay under this limit right now? Or should I add more accounts as the need arises? By the way, is there a way to transfer Azure storage tables from one account to another? The article says that when you hit the limit you will get "503 server busy" responses. Is there a way to know the limit is getting close, so I could do something in advance without actually receiving 503 errors?
I haven't hit the account limit overall, but I have hit the transaction limit on a Queue by setting the number of worker roles reading from that queue to a ridiculously high level.
As far as I know there is no "you're about to hit the limit" warning. The first time you know you've hit the limit is when you get the 503 error.
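Since the 503 only shows up after the fact, the usual approach is to treat it as a transient error and retry with exponential backoff. Here's a minimal, SDK-agnostic sketch; `ServerBusyError` is a hypothetical stand-in for whatever exception your storage client raises on a 503:

```python
import random
import time

class ServerBusyError(Exception):
    """Hypothetical stand-in for the storage client's 503 'server busy' error."""

def with_backoff(operation, max_retries=5, base_delay=0.5):
    """Call `operation`, retrying on ServerBusyError with exponential
    backoff plus a little jitter so many clients don't retry in lockstep."""
    for attempt in range(max_retries):
        try:
            return operation()
        except ServerBusyError:
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```

Backing off is important: if you keep hammering a busy partition at full speed, you just prolong the throttling.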
As for transferring data from one account to another, there is no built-in functionality that will do it for you. You either have to roll your own solution that reads through every row in the source table and writes it to the destination table, or use something like Cerebrata Cloud Storage Studio, which lets you download and upload the contents of tables, or their cmdlets, which do the same thing but are cheaper/free.
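The roll-your-own copy is conceptually just a paged read-then-write loop. A minimal sketch, using plain Python lists of entity dicts as stand-ins for the real table clients (a real implementation would page through query results and use batch inserts, which the table service only allows within a single partition):

```python
def copy_table(source, destination, batch_size=100):
    """Copy every entity from `source` to `destination` in batches.
    Here `source` and `destination` are plain lists of entity dicts
    (each with PartitionKey/RowKey); real code would substitute
    table-client query and batch-insert calls."""
    batch = []
    for entity in source:              # real code: iterate query pages
        batch.append(entity)
        if len(batch) == batch_size:
            destination.extend(batch)  # real code: one batch write per partition
            batch = []
    if batch:
        destination.extend(batch)      # flush the final partial batch
    return len(destination)
```

Note that entity group transactions are limited to 100 entities and one partition per batch, so a faithful implementation groups entities by PartitionKey before writing.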
If you're just starting out, you have logical ways of partitioning the data across storage accounts, and it doesn't make the code too complicated, then do it. But I wouldn't worry about it too much at this stage. Chances are that if your site does become popular and you start hitting the transaction limit, it will come from an area you hadn't expected, or from too many transactions against just one table. Since you said this is for a community of blogs, the area likely to get the most transactions is wherever you store comments. If you get more than 5,000 transactions a second against your comments table, you may need to partition the comments across multiple storage accounts. Of course, if the blogs are that popular, chances are you'll have other problems to deal with as well.
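If you do end up spreading one table (such as comments) across several accounts, the key is a deterministic routing function, so the same partition key always maps to the same account and point queries still know where to look. A small sketch, with hypothetical account names:

```python
import hashlib

# Hypothetical storage account names, one table of comments in each.
ACCOUNTS = ["commentsacct0", "commentsacct1", "commentsacct2"]

def account_for(partition_key, accounts=ACCOUNTS):
    """Route a partition to a storage account by hashing its key.
    Deterministic: the same key always maps to the same account,
    so reads know where to find what writes stored."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(accounts)
    return accounts[index]
```

The downside, as noted, is that adding an account later changes the mapping for most keys, so you'd have to migrate data yourself.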
If scalability is what you are after, then you might consider SQL Azure Federations instead of Azure Table Storage. The Federations feature became available in December 2011. You can find a good overview here.
With SQL Azure Federations you have better control over the amount of resources you are using. In Table Storage you are encouraged to create many partitions so that the underlying engine can, at some point, distribute your data across multiple machines and give you increased throughput. However, a partition is just a hint to the Table Storage engine. It will not necessarily move the data to a new machine. It might do so, based on usage and its internal algorithms, but you can never be sure when. With SQL Azure Federations you are the one controlling the number of instances you use. You control the balance between a small number of instances ( = low cost) and a large number of instances ( = high throughput).
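Federations shard by key ranges: each federation member owns a contiguous range of the federation key, and `USE FEDERATION ... (key = value)` routes a connection to the member covering that value. The lookup itself is just a range search over the split points, which can be sketched like this (split-point values are hypothetical):

```python
import bisect

# Hypothetical split points: member 0 covers keys below 1000, member 1
# covers [1000, 5000), member 2 covers keys from 5000 upward.
SPLIT_POINTS = [1000, 5000]

def member_for(federation_key, split_points=SPLIT_POINTS):
    """Return the index of the federation member whose range contains
    `federation_key`, mimicking how USE FEDERATION picks a member."""
    return bisect.bisect_right(split_points, federation_key)
```

Issuing an `ALTER FEDERATION ... SPLIT` corresponds to inserting a new split point, after which the engine redistributes the affected range between two members for you.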
With Federations you can still enjoy most of the benefits of relational databases: you still have transactions, joins, and indexes. In fact, you get all the functionality of a standalone SQL Azure database. The only limit is that you can only act on one federation member at a time (at the moment there is no built-in cross-member SELECT support within a federation).
It is true that you can increase throughput in Table Storage by creating multiple accounts, but you will have to manage that manually. You will be responsible for moving the data between accounts when making a split, and for implementing the application-level logic that routes to the correct account when looking up certain data. That is all managed automatically with Federations.
Probably the only reason to consider Table Storage is its price per GB, which is a lot lower than SQL Azure's (Table Storage pricing is described here, SQL Azure pricing here). So if you are considering storing huge amounts of data, then Table Storage might indeed make sense, as long as you can live with its limitations.
Strictly from a throughput perspective, a single SQL Azure instance can provide performance similar to a Table Storage account. As long as you can achieve a good distribution of requests, with Federations you can multiply the throughput of a single database by the total number of members used.
If you are interested in some numbers: a few months ago I built a benchmark and ran it against a federated database. The results can be found here.