开发者

Mysql one big database or small many tables

Our app would be creating dynamics forms for each users.

We are considering two approaches

a) We store all users forms data in single table as key value pair, then using pivot table technique fetch data and do filtering which works but i think might be really huge resource guzzler

b) Create / Design table for each form user creates that fits to its design, here we dont have to pivot and we get all benefits of mysql and also we wont have joins for more than couple of tables

Our major concern is scalability, in scenario 'a' if table grows really huge (that will obviously happen if i consider even 1000 users register), with plan 'b' if tomorrow our tables grow with it our traffic we might have to use mysql cluster but cluster have table limitations as i read here http://dev.mysql.com/doc/refman/5.0/en/mysql-cluster-limitations-database-objects.html

Update

A new idea struck us, why not create a single table (as key, value) that fits single form element (Text Field, Text area, etc), using join (No of joins = No of fields) based on form definition created by user we can create a horizontal data, without pivoting what you all think about this idea.

Update 9-April-2011

We tested pivoting against joins

Scenario - We have a form with 7 fields, the data is stored as (sample data for single form insert)

------------------------------
|   Key       |    Value     |
------------------------------
|   Key1      |    Value1    |
|   Key2      |    Value2    |
|   Key3      |    Value3    |
|   Key4      |    Value4    |
|   Key5      |    Value5    |
|   Key6      |    Value6    |
|   Key7      |    Value7    |
------------开发者_开发问答------------------

For pivot query it took 0.92 seconds to pivot 70000 rows into 10000 form inserts For join it took 17.63 seconds (woof.....) to show these 10000 form inserts

My table

CREATE TABLE IF NOT EXISTS `vet` (
  `id` int(11) NOT NULL,
  `form_id` int(11) NOT NULL,
  `key` varchar(255) NOT NULL,
  `value` varchar(255) NOT NULL,
  KEY `id` (`id`),
  KEY `form_id` (`form_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

Pivot Query

SELECT id, 
       GROUP_CONCAT(if(`key` = 'k1', value, NULL)) as 'key1',
       GROUP_CONCAT(if(`key` = 'k2', value, NULL)) as 'key2',
       GROUP_CONCAT(if(`key` = 'k3', value, NULL)) as 'key3',
       GROUP_CONCAT(if(`key` = 'k4', value, NULL)) as 'key4',
       GROUP_CONCAT(if(`key` = 'k5', value, NULL)) as 'key5',
       GROUP_CONCAT(if(`key` = 'k6', value, NULL)) as 'key6',
       GROUP_CONCAT(if(`key` = 'k7', value, NULL)) as 'key7'
FROM vet
WHERE form_id = 2
GROUP BY id

JOIN Query

SELECT v.id, v1.value as key1, v2.value as key2, v3.value as key3, 
       v4.value as key4, v5.value as key5, v6.value as key6, v7.value as key7
FROM vet v
LEFT JOIN vet v1 ON v1.id = v.id AND v1.`key` = "k1"
LEFT JOIN vet v2 ON v2.id = v.id AND v2.`key` = "k2"
LEFT JOIN vet v3 ON v3.id = v.id AND v3.`key` = "k3"
LEFT JOIN vet v4 ON v4.id = v.id AND v4.`key` = "k4"
LEFT JOIN vet v5 ON v5.id = v.id AND v5.`key` = "k5"
LEFT JOIN vet v6 ON v6.id = v.id AND v6.`key` = "k6"
LEFT JOIN vet v7 ON v7.id = v.id AND v7.`key` = "k7"
WHERE v.form_id = 2
GROUP BY v.id

I think we will stick to pivoting for now, please also see my queries if the huge difference is due to them.

Please suggest which solution is better or if there is another better solution

Sudesh


I suggest design it using one single database. Then you always can use sharding for horizontal scaling when needed.


You don't want to create a new table for each form that a user creates.
However you also do not want to store all data in one big table.
Best to store stuff in a few tables so that you do not repeat stuff, something like this.

First a table with user data:

Table user
id: integer autoincrement primary key
username: varchar(255)
other user data

Then a table that links the user to the form data (but does not actually hold any form data

Table UserForm
id: integer autoincrement primary key
user_id: integer index
... other fields for fixed data that always occurs only once in each form.

Then the table with the Form data
Table FormProperties
id: integer autoincrement primary key
UserForm_id: integer index
PropertyName: varchar(255)
PropertyValue: varchar(255)

Now when you want to access data from a form you use a query like

select * from FormProperties
inner join UserForm on (FormProperties.UserForm_id = UserForm.id)
inner join User on (UserForm.User_id = user.id) 
where UserForm.id = 103

That way you will not store redundant data. Note that if you never need to uniquely access a individual property you can drop the unique id on the properties table.

And don't worry about MySQL cluster, just start with vanilla MySQL (5.x) with MySAM tables and if that is not fast enough then you can start thinking about tricks, but below one million record I would not bother.


The problems with having many small databases is well-understood: - inefficient use of disk and memory when indexes and tables are small - mysql server scales poorly when #databases > 100 and #tables>10000 - admin/management nightmare But some advantages: - isolation of customer provides better security - dump/load/lock/alter an individual customer's tables without impacting others - easier to manage multiple versions and custom schema additions

Having one large database, with sharding, ssounds good but there are disadvantages: - all customers are locked into the same schema, upgrades must happen for all at once - poor security for the individual customer - requires modification of code to manage shards - very difficult to change shards once they're established - "bad neighbor" effect: one over-active customer impacts the SLA of neighbors in the shard The main advantage is that it scales well.

Full disclosure: I work at ParElastic. But I sincerely believe we have the only truly complete solution to this problem with our virtual multi-tenant database: - each customer sees a complete, isolated instance of their own database - independent admin commands for their database - scalable sharding under-the-cover to distribute data across and within a user - per-customer schema management supports rolling upgrades and custom extensions - secure, cross-customer queries using normal SQL

If you're interested, you can download a pre-packaged ParElastic environment for free in the https://aws.amazon.com/marketplace (search "parelastic"). Or check out http://parelastic.com.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜