
Int PK inner join vs. Guid PK inner join on SQL Server: execution plan

I just did some testing of joins on an Int PK versus joins on a Guid PK.

The table structure and number of records look like this:

[Image: table structure and record counts]

Performance of CRUD operations using EF4 is pretty similar in both cases.

There is a well-known claim that an Int PK performs better than strings when used in joins. And the SQL Server execution plans for the two INNER JOINs are completely different.

Here is an execution plan:

[Image: execution plans for the Int PK join and the Guid PK join]

As I understand it from the execution plan above, the Int join performs better because it takes fewer resources for the Clustered Index Scan and it runs in two branches. Am I right?

Could someone explain this execution plan in more detail?

Is this example enough to show that an Int PK performs better in joins?


Kimberly Tripp (the Queen of Indexing) has an excellent blog post on the topic:

Disk space is cheap.... that's not the point!

She nicely shows how the argument of "disk space is cheap - using GUID instead of INT doesn't hurt" is totally bogus in many ways.


I'm not totally sure I understand what you are trying to achieve or find out from this test, but here are a few random thoughts that popped into my mind as I read your question...

1) In a real life use case you are probably not going to join two entire tables together, but there will be filters on other columns etc, reducing the records to be joined in one or both tables. This will affect what type of join algorithm is most suitable/most effective.

The plans above are the results of joining two tables together, but if you were to filter one or both of the tables on some other column then the optimizer might go for a completely different join type.

2) Which type of join is best when joining GUID columns depends a lot on how the GUIDs are generated. If you're joining a lot of GUIDs that are completely random (e.g. generated with SQL Server's NewID() or CLR Guid.NewGuid()) then a hash join is probably the best choice. If however you're joining a smaller set of sequential GUIDs (newsequentialid() / UuidCreateSequential()), or even identical GUIDs, then a loop join can often be the most efficient choice.

The optimizer uses index statistics to determine what type of join to use, but sometimes, for complex queries with many GUID joins, it can be necessary to force the join type with optimizer hints.


In short, if what you're trying to do is decide whether you should use GUID or INT PKs, then a more real-world test is a better choice. Create tables matching your use case, populate them with an ample amount of somewhat realistic sample data, and run the types of queries you expect to be running down the line. Joining the entire contents of two dummy tables together doesn't really tell you anything about the I/O impact you could see from using Guid keys, or what the execution plan will look like for other queries involving int vs. guid keys.

If using Guid keys, consider the different options for generating them and keep in mind that using sequential guids is often a good way to avoid excessive page reads if you're joining a lot of records...
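To make the point about sequential GUIDs a little more concrete, here is a rough Python sketch (not SQL Server itself) that counts how many inserts land in the middle of an already sorted key list rather than at its end. It rests on two assumptions: `uuid.uuid4()` stands in for NewID(), and a plain 16-byte counter stands in for newsequentialid(), whose real byte layout and sort order in SQL Server are different. The helper name `mid_index_inserts` is made up for this illustration.

```python
import bisect
import uuid


def mid_index_inserts(keys):
    """Count inserts that land before the current end of the sorted key list.

    In a clustered index, such inserts are the ones that can force page
    splits; appends at the end only fill the last page.
    """
    ordered = []
    count = 0
    for key in keys:
        pos = bisect.bisect_left(ordered, key)
        if pos < len(ordered):
            count += 1
        ordered.insert(pos, key)
    return count


N = 2000

# Assumption: random GUIDs, comparable in spirit to NewID() / Guid.NewGuid().
random_keys = [uuid.uuid4().bytes for _ in range(N)]

# Assumption: an ever-increasing 16-byte value as a stand-in for sequential GUIDs.
sequential_keys = [i.to_bytes(16, "big") for i in range(N)]

random_mid = mid_index_inserts(random_keys)
sequential_mid = mid_index_inserts(sequential_keys)

print(f"random keys:     {random_mid} of {N} inserts landed mid-index")
print(f"sequential keys: {sequential_mid} of {N} inserts landed mid-index")
```

With increasing keys every insert is an append, while almost every random key lands somewhere in the middle. That is only a model of why random GUIDs tend to scatter reads and writes across pages; actual numbers depend on fill factor, row size and the real workload.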


If you think about how, internally, a computer compares values, it becomes obvious.

  • Comparing two integers is a fast, single operation.
  • Comparing two 16-byte GUIDs will take several instructions (or one lengthy one).

In addition, GUIDs use 4 times as much space (16 bytes versus 4 bytes for an INT), which will produce more paging, poorer cache usage and so forth.
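The space argument is simple arithmetic, sketched here in Python. The 8 KB page size is SQL Server's; the calculation deliberately ignores page headers, row overhead and the other columns, so the per-page figures are an upper bound for illustration and not what you would measure on a real table.

```python
import struct
import uuid

PAGE_BYTES = 8192  # SQL Server data pages are 8 KB; header and row overhead ignored

int_key = struct.pack("<i", 123456)  # 4-byte signed integer, same width as INT
guid_key = uuid.uuid4().bytes        # 16-byte value, same width as UNIQUEIDENTIFIER

size_ratio = len(guid_key) // len(int_key)
int_keys_per_page = PAGE_BYTES // len(int_key)
guid_keys_per_page = PAGE_BYTES // len(guid_key)

print(f"INT key: {len(int_key)} bytes, GUID key: {len(guid_key)} bytes")
print(f"GUID/INT size ratio: {size_ratio}")
print(f"Raw keys per 8 KB page: {int_keys_per_page} (INT) vs {guid_keys_per_page} (GUID)")
```

Keep in mind that the clustering key is also carried in every nonclustered index on the table, so the wider key is paid for more than once.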

Kimberly Tripp's post mentioned by Marc proves this.
