Is creating many schemas in H2 a good strategy for sharding and performance?
On a mailing list, someone described the following issue:
- We have millions of users (1 to 5 MB of data per user)
- Operations on one user's data never access or modify another user's data
- How can we implement sharding using H2 while remaining performant?
Someone else answered the following:
- You could create one schema per user
- The benefit is that each user's data would live in its own set of tables
- Hence, this would improve performance when updating those tables
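To make the "one schema per user" proposal concrete, here is a minimal sketch of the DDL it implies for each new user. The `USER_<id>` schema naming and the `ITEMS` table are hypothetical, chosen only for illustration; the point is that every user gets their own schema and their own copy of every table, so millions of users means millions of schemas and table definitions.

```java
// Hypothetical sketch of the "one schema per user" idea; schema and table names are illustrative.
public class SchemaPerUser {
    // Builds the DDL statements one might run when a new user is registered.
    static String[] ddlForUser(long userId) {
        String schema = "USER_" + userId; // hypothetical naming scheme
        return new String[] {
            "CREATE SCHEMA IF NOT EXISTS " + schema,
            // Each user gets a private copy of every table (here just one hypothetical table).
            "CREATE TABLE IF NOT EXISTS " + schema + ".ITEMS (ID BIGINT PRIMARY KEY, DATA VARCHAR(255))"
        };
    }

    public static void main(String[] args) {
        for (String sql : ddlForUser(42)) {
            System.out.println(sql);
        }
    }
}
```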
My question is:
- Has anyone attempted this?
- Is this really a good strategy for sharding data while keeping (or improving) performance?
If you have millions of users with 2 MB of data per user on average, you end up with at least 2 TB. I think that's too much to store in a single database file. On the other hand, you don't want to use millions of database files either.
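The size estimate above is simple multiplication; this small sketch just spells it out, assuming decimal units (1 TB = 1,000,000 MB). One million users at 2 MB each gives 2 TB, and the upper end of the range in the question (5 million users at 5 MB) gives 25 TB.

```java
// Back-of-the-envelope storage estimate, assuming decimal units (1 TB = 1,000,000 MB).
public class StorageEstimate {
    static double totalTerabytes(long users, double avgMbPerUser) {
        return users * avgMbPerUser / 1_000_000.0;
    }

    public static void main(String[] args) {
        System.out.println(totalTerabytes(1_000_000L, 2.0)); // 2.0
        System.out.println(totalTerabytes(5_000_000L, 5.0)); // 25.0
    }
}
```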
I would use multiple databases, each database with up to 1000 users (depending on the amount of data).
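A minimal sketch of how an application might route each user to one of those databases, assuming user ids are dense, non-negative, sequentially assigned numbers. The `./data/users_N` file layout is hypothetical; the only H2-specific part is the embedded file URL prefix `jdbc:h2:`. With 1000 users per database and 1 to 5 MB per user, each database would hold roughly 1 to 5 GB, and one million users would need 1000 database files.

```java
// Range-based routing of users to H2 database files (a sketch, not a library API).
public class ShardRouter {
    // Assumption: user ids are dense, non-negative longs assigned sequentially.
    static final long USERS_PER_DATABASE = 1000;

    // Users 0..999 map to database 0, 1000..1999 to database 1, and so on.
    static long shardIndex(long userId) {
        if (userId < 0) {
            throw new IllegalArgumentException("userId must be non-negative: " + userId);
        }
        return userId / USERS_PER_DATABASE;
    }

    // Hypothetical layout: one embedded H2 database file per shard under ./data.
    static String jdbcUrl(long userId) {
        return "jdbc:h2:./data/users_" + shardIndex(userId);
    }

    // Number of database files needed for a given number of users (ceiling division).
    static long databaseCount(long totalUsers) {
        return (totalUsers + USERS_PER_DATABASE - 1) / USERS_PER_DATABASE;
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrl(1500));              // jdbc:h2:./data/users_1
        System.out.println(databaseCount(1_000_000L));  // 1000
    }
}
```

Range-based mapping means new users simply fill the newest database. Hashing the user id instead would spread users more evenly, but it makes adding databases later harder, since existing users may need to move.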
You can then either create multiple schemas (but please note that for H2, the schema metadata is kept in memory), or add a 'userId' column to each table.
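The 'userId' column alternative could look roughly like this. The `ITEMS` table and its columns are hypothetical. Putting `USER_ID` first in a composite primary key should let H2 answer `WHERE USER_ID = ?` from the primary-key index instead of scanning the whole table. The code uses only the standard `java.sql` API; the connection would come from the shard that owns the user.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Sketch of the shared-table approach: every row carries the owning user's id.
public class UserIdColumn {
    // Hypothetical table; USER_ID leads the primary key so per-user lookups can use its index.
    static final String CREATE_ITEMS =
        "CREATE TABLE IF NOT EXISTS ITEMS ("
        + "USER_ID BIGINT NOT NULL, "
        + "ITEM_ID BIGINT NOT NULL, "
        + "DATA VARCHAR(255), "
        + "PRIMARY KEY (USER_ID, ITEM_ID))";

    // Every query is scoped to a single user, so it only reads that user's rows.
    static final String SELECT_ITEMS =
        "SELECT ITEM_ID, DATA FROM ITEMS WHERE USER_ID = ? ORDER BY ITEM_ID";

    // Loads one user's items from the connection for that user's shard.
    static List<String> loadItems(Connection conn, long userId) throws SQLException {
        List<String> result = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SELECT_ITEMS)) {
            ps.setLong(1, userId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    result.add(rs.getLong("ITEM_ID") + ":" + rs.getString("DATA"));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(CREATE_ITEMS);
        System.out.println(SELECT_ITEMS);
    }
}
```

Compared with one schema per user, this keeps a single set of table definitions per database, so the amount of schema metadata does not grow with the number of users.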