
SQL Insert performance question

I have this table phonebook SQL Server 2005:

username(PK) Serial(PK)  contact_name  contact_adr      contact_email  contact_phone
bob          1           Steve         12 abc street    steve@bb.com   1234          
bob          2           John          34 xyz street    john@bb.com    5345          
bob          3           Mark          98 ggs street    mark@bb.com    1234          
patrick      4           lily          77 fgs street    lily@bb.com    1234          
patrick      5           mily          76 fgs street    mily@bb.com    1234          
von          8           jim           6767 jsd way     jim@bb.com     4564          
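For reference, here is a sketch of the table I'm describing; the column types are guesses, not my actual DDL:

```sql
-- Assumed DDL for the phonebook table; types are illustrative.
CREATE TABLE phonebook (
    username      VARCHAR(50)  NOT NULL,
    Serial        INT          NOT NULL,
    contact_name  VARCHAR(100) NOT NULL,
    contact_adr   VARCHAR(200),
    contact_email VARCHAR(100),
    contact_phone VARCHAR(20),
    -- Composite primary key: contacts are grouped under each username.
    CONSTRAINT PK_phonebook PRIMARY KEY (username, Serial)
);
```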

As you can see, the phonebook stores all contacts of the same user together. Storing the data this way has advantages that I can't give up.

My question is: if the table grows to 100 million entries across all users, will future insertions become very expensive?

I ask because the SQL engine needs to find the location where each new row belongs (that is, under which username).

I tested with 1 million rows and didn't see any noticeable issues.

Has anyone had experience with this, or any suggestions for me?

Thanks


The optimal approach for an address book is a NoSQL hashed table. There is no need for an index on the PK: a hash function computes the "page" where the row identified by the PK can be found. The user's address book is also stored with the user, as a denormalized relation, so insert overhead is negligible. A hashed PK is optimized for insert and retrieval when the PK is known, which makes it excellent for OLTP systems. If you want to do something like figure out who knows whom, so that a given user's contacts must be related to the contacts of all other users, then you have a different can of worms. But for a straightforward address-book application, where the contacts of a given user remain private to that user, a hashed-primary-key scheme is superb.


One of the first principles of database design is non-redundancy of data: your table design doesn't comply with that principle, because the same data is repeated many times. A reasonable solution would be to create a separate table for users, a separate table for contacts, and a table for the relationship between users and contacts.
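A sketch of that normalized layout; the table and column names here are illustrative, not taken from your schema:

```sql
-- Hypothetical normalized schema: users, contacts, and a junction table.
CREATE TABLE users (
    user_id  INT IDENTITY PRIMARY KEY,
    username VARCHAR(50) NOT NULL UNIQUE
);

CREATE TABLE contacts (
    contact_id    INT IDENTITY PRIMARY KEY,
    contact_name  VARCHAR(100) NOT NULL,
    contact_adr   VARCHAR(200),
    contact_email VARCHAR(100),
    contact_phone VARCHAR(20)
);

-- One row per (user, contact) pair: which user's phonebook holds which contact.
CREATE TABLE user_contacts (
    user_id    INT NOT NULL REFERENCES users(user_id),
    contact_id INT NOT NULL REFERENCES contacts(contact_id),
    PRIMARY KEY (user_id, contact_id)
);
```

This way each contact's details are stored once, and a contact shared by two users costs only one extra junction row.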


It depends on the underlying database. Every implementation has its own tricks up its sleeve.

But! Performance will almost certainly suffer if the table is heavily indexed and holds a very large number of rows, because every index has to be maintained on each insert.


First of all, username by itself doesn't seem to be a valid primary key for your table; you would have to combine it with another column to make it unique. At that point, I would rather use your Serial column as the primary key and put an index on username so that a query like "get bob's contacts" is answered efficiently.
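Something along these lines; the constraint and index names are made up, and I'm assuming your existing table and column names:

```sql
-- Make Serial the clustered primary key: new rows append at the end of
-- the table, so inserts don't have to slot in under a username.
ALTER TABLE phonebook
    ADD CONSTRAINT PK_phonebook PRIMARY KEY CLUSTERED (Serial);

-- Nonclustered index for per-user lookups.
CREATE NONCLUSTERED INDEX IX_phonebook_username
    ON phonebook (username);

-- "Get bob's contacts" can then seek on the username index:
SELECT contact_name, contact_adr, contact_email, contact_phone
FROM phonebook
WHERE username = 'bob';
```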

Your inserts will certainly get slower as the table grows, but I don't think they will become slow enough to rule out this approach.


You can't force the data to be stored together. Are you re-sequencing the Serial upon an insert? How are you ensuring the data is "stored together"?

If you mean putting all this data in one table, then it really depends on your index structure. The more indexes on the table, the more processing takes place on every insert. Since user tables are usually heavily queried and (relatively) rarely inserted into, they tend to be heavily indexed, in which case inserts can be slow. The answer, as with almost every DB question, is: "It depends".
