SQL Insert performance question
I have this table phonebook SQL Server 2005:
username(PK) Serial(PK) contact_name 开发者_Python百科 contact_adr contact_email contact_phone
bob 1 Steve 12 abc street steve@bb.com 1234
bob 2 John 34 xyz street john@bb.com 5345
bob 3 Mark 98 ggs street mark@bb.com 1234
patrick 4 lily 77 fgs street lily@bb.com 1234
patrick 5 mily 76 fgs street mily@bb.com 1234
von 8 jim 6767 jsd way jim@bb.com 4564
Now you can see the phonebook stores all contacts of same user together. Storing this way has advantages which I can't avoid.
My question is: If I have 100 million entries in the table for all users, will my future insertion in the above table be very expensive?
Since SQL Engine needs to find the actual location where to enter the data (I mean under which username)
I tested with 1 million rows, I don't see noticeable issues.
I am asking if anyone has this experience or suggestions for me?
Thanks
The approach that is optimal for an address book is a NOSQL hashed-table. There's no need for an index on the PK. The algorithm returns the "page" where the row identified by the PK can be found. The address book of the user is also stored with the user, as a denormalized relation. Insert overhead is negligible. Hashed-PK is optimized for insert/retrieval when the PK is known. Excellent for OLTP systems. Now if you want to do something like figure out who knows whom, so that a given user's contacts need to be related to the contacts of all other users, then you have a different can of worms. But a straightforward address-book application, where the contacts of a given user remain "private" to that user, then a hashed primary key system is superb.
One of the first principles in db design is data non-redundancy: your db table design doesn't comply to that principle as you have same data repeated many times. A resonable solution would be to create separate table for users, a separate table for contacts and a table for realationship between users and contacts.
It depends on the underlying database. Every implementation has something different under its sleeves.
But! Performance will almost definitely suffer if you use indexes on that table and you have many, many, many, many rows inside of it.
First of all, username doesn't seem to be a primary key for your table by itself. You will probably have to use it in combination with another field if you want it to work. At this point, I would rather use your serial
column as primary key, and have an index on username
to answer the query get bob's contacts efficiently.
You insert will certainly become slower as your table grow. But I don't think it will be too slow to avoid following this approach.
You can't force the data to be stored together. Are you re-sequencing the Serial upon an insert? How are you ensuring the data is "stored together"?
If you mean putting all this data in one table, then it really depends on your index structure. The more indexes on the table, the more processing that takes place on very insert. Since user tables are usually heavily queried and rarely inserted (relatively), they are usually indexed heavily, in which case inserts can be slow. The answer, as with almost every DB question is: "It depends".
精彩评论