开发者

Sharding at application level

I am designing a multi-tenant system and am considering sharding by tenant at the application layer level instead of database.

Hypothetically, the way this should work is that for incoming request a router process has a global collection of tenants containing primary attributes to determine the tenant for this request as well as the virtual shard id. This virtual shard id is further mapped to an actual shard.

The actual shard contains both the code for application as well as whole data for this tenant. These shards would be LNMP (Linux, Nginx, MySQL/MongoDB, PHP) servers.

The router process should act as proxy. It should be able to run some code to determine the target shard for incoming request based on the collection stored in some local db or files. To be able to scale this better, i am considering making the shards themselves act as routers also so that they can run a revers开发者_如何学Goe proxy that will forward the request to appropriate shard. Maybe the nginx instance running on shard can also act as that reverse proxy. But how will it execute the application logic needed to match up the request with the appropriate shard.

I will appreciate any ideas and suggestions for this router implementation.

Thanks


Another option would be to use a product such as dbShards. dbShards is the only sharding product that shards at the application level. This way you can use any RDMS (Postgres, MySQL, etc.) and still be able to shard your database without having to put some kind of proxy in-between. A lot of the other sharding products rely on a proxy to point the transactions to the correct shard, but dbShards knows where to go without having to "ask" anyone else.

Great product. dbshards


Unless you expect your tenants to generate approximately equal data volume, sharding by tenant will not be very efficient.

As to application level sharding in general, let me share my own experience:

Version 1 of our high-volume SaaS product sharded at the application level. You will find that resharding as you grow will be a major headache if you shard against a SQL type solution at the application level, or you will have to write significant tooling to automate the process.

We switched to MongoDB (after considering multiple alternatives including Cassandra) in no small part because of all of the built-in support for resharding / rebalancing as data grows.

If your application does not need the relational capabilities of MySQL, I would suggest concentrating your efforts on MongoDB (since you have already identified that as a possible data platform) if you expect more than modest data growth. Allow MongoDB to handle the data sharding.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜