Optimal Strategy to Resolve Performance Issues in Search Operations - SQL Server 2008

I'm working on a mobile website which is growing in popularity, and this is leading to growth in some key database tables - and we're starting to see performance issues when accessing those tables. Not being database experts (nor having the money to hire any at this stage), we're struggling to understand what is causing the problems. Our tables are not that big, so SQL Server should be able to handle them fine, and we've done everything we know how to do in terms of optimising our queries. So here's the (pseudo) table structure:

[user] (approx. 40,000 rows, 37 cols):

id               INT (pk)
content_group_id INT (fk)
[username]       VARCHAR(20)
...

[content_group] (approx. 200,000 rows, 5 cols):

id    INT (pk)
title VARCHAR(20)
...

[content] (approx. 1,000,000 rows, 12 cols):

id                  INT (pk)
content_group_id    INT (fk)
content_type_id     INT (fk)
content_sub_type_id INT (fk)
...

[content_type] (2 rows, 3 cols):

id INT (pk)
...

[content_sub_type] (8 rows, 3 cols):

id              INT (pk)
content_type_id INT (fk)
...

We're expecting those row counts to grow considerably (in particular the user, content_group, and content tables). Yes, the user table has quite a few columns - we've identified some which can be moved into other tables. There are also a number of indexes we've applied to the affected tables, which have helped.

The big performance problems are in the stored procedures we use to search for users (which include joins to the content table on the content_group_id field). We have tried modifying the WHERE and AND clauses using various approaches, and we think we have got them as good as we can, but the procedures are still too slow.
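For concreteness, here's a heavily simplified, illustrative-only version of the kind of search we run (the real procedures have more filters and columns):

-- illustrative only: parameter and column choices are simplified
select u.id, u.username
from [user] u
join content c on c.content_group_id = u.content_group_id
where c.content_sub_type_id = @contentSubTypeId
and u.username like @searchTerm + '%';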

One other thing we tried which hasn't helped was to put an indexed view over the user and content tables. There was no noticeable performance gain when we did this so we've abandoned that idea due to the extra level of complexity inherent in having a view layer.

So, what are our options? We can think of a few but all come with pros and cons:

Denormalise the Table Structure

Add multiple direct foreign key constraints between the user and content tables - so there would be a different foreign key column pointing to the content table for each content sub type (a rough sketch follows the cons list below).

Pros:

  • Joins to the content table can use its primary key, which should be more efficient.

Cons:

  • There will be a lot of changes to our existing stored procedures and website code.
  • Maintaining up to 8 additional foreign keys (more realistically we'll only use 2 of these) will not be anywhere near as easy as the current single key.
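As a rough sketch, with hypothetical column names (realistically we'd only add the two we actually use):

-- hypothetical column names: one nullable fk per content sub type
alter table [user] add
    photo_content_id int null,
    video_content_id int null; -- ...and so on for the other sub types

alter table [user] add constraint fk_user_photo_content
    foreign key (photo_content_id) references content (id);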

More Denormalisation of the Table Structure

Just duplicate the fields we need from the content table into the user table directly (a sketch follows the cons below).

Pros:

  • No more joins to the content table - which significantly reduces the work SQL Server has to do.

Cons:

  • Same as above: extra fields to maintain in the user table, changes to SQL and website code.
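Again a rough sketch, with hypothetical field names; the copies would have to be kept in sync whenever the underlying content rows change:

-- hypothetical: copy the content fields we search on into [user]
alter table [user] add
    content_sub_type_id int null,      -- copied from content
    content_created_on  datetime null; -- hypothetical example field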

Create a Mid-Tier Indexing Layer

Using something like Lucene.NET, we'd put an indexing layer above the database. In theory this would improve the performance of all searches and at the same time decrease the load on the server.

Pros:

  • This is a good long-term solution. Lucene exists to improve search engine performance.

Cons:

  • There will be a much larger development cost in the short term - and we need to solve this problem ASAP.

So those are the things we've come up with, and at this stage we're thinking the second option is the best. I'm aware that denormalising has its issues, but sometimes it's best to sacrifice architectural purity for performance gains, so we're prepared to pay that cost.

Are there any other approaches which might work for us? Are there any additional pros and/or cons with the approaches I've outlined above which may influence our decisions?


"a non-clustered index seek from the content table using the content_sub_type_id. This is followed by a Hash Match on the content_group_id against the content table"

This description would indicate that your expensive query filters the content table based on fields from content_type:

select ...
from content c
join content_type ct on c.content_type_id = ct.id
where ct.<field> = <value>;

This table design, and the resulting problem you're seeing, is actually quite common. The problems arise mainly from the very low selectivity of the lookup tables: content_type has 2 rows, so the selectivity of content_type_id in content is probably around 50% - with 1,000,000 rows, each value matches roughly half a million rows, which is huge. There are several solutions you can try:

1) Organize the content table on a clustered index with content_type_id as the leading key. This would allow the join to do range scans and also avoid the key/bookmark lookups needed to complete the projection. As a clustered index change it would have implications for other queries, so it has to be carefully tested. The primary key on content would obviously have to be enforced with a non-clustered constraint.
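A minimal sketch of that change, assuming the existing clustered primary key is named PK_content:

-- drop the current clustered pk (name assumed), re-enforce it
-- as a non-clustered constraint, then re-cluster the table
alter table dbo.content drop constraint PK_content;

alter table dbo.content
    add constraint PK_content primary key nonclustered (id);

create clustered index cdx_content on dbo.content (content_type_id, id);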

2) Pre-read the content_type_id value and then formulate the query without the join between content and content_type:

select ...
from content c
where c.content_type_id = @contentTypeId;

This works only if the selectivity of content_type_id is high (many distinct values with few rows each), which I doubt is your case (you probably have very few content types, with many entries each).
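For completeness, the pre-read step itself would be something along these lines (assuming a hypothetical lookup column such as [name] on content_type):

declare @contentTypeId int;

select @contentTypeId = id
from dbo.content_type
where name = @typeName; -- hypothetical column and parameter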

3) Denormalize content_type into content. You mention denormalization, but your proposal of denormalizing content into user makes little sense to me. Instead, drop the content_type table, pull the content_type fields into the content table itself, and live with all the usual denormalization problems.
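Sketched, with hypothetical field names:

-- pull the content_type attributes into content itself
alter table dbo.content add content_type_name varchar(20) null; -- hypothetical

update c
set c.content_type_name = ct.name -- hypothetical [name] column
from dbo.content c
join dbo.content_type ct on ct.id = c.content_type_id;

-- afterwards the content_type table (and the join to it) can go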

4) Pre-join in a materialized (indexed) view. You say you already tried that, but I doubt you tried the right materialized view. You also need to understand that only Enterprise Edition uses the materialized view's index automatically; all other editions require the NOEXPAND hint:

create view vwContentType
with schemabinding
as
select c.content_type_id, c.id as content_id
from dbo.content c
join dbo.content_type ct on c.content_type_id = ct.id;

create unique clustered index cdxContentType on dbo.vwContentType (content_type_id, content_id);

select ...
from content c
join vwContentType ct with (noexpand)
on ct.content_id = c.id
where ct.content_type_id = @contentTypeId;

Solutions 2), 3) and 4) are mostly academic. Given the very low selectivity of content_type_id, the only solution that stands a chance is making it the leading key in the clustered index of content. I did not extend the analysis to content_sub_type, but with only 8 rows I'm willing to bet it has the very same problem, which would require pushing it into the clustered index as well (perhaps as the second key).
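If testing confirms that, the sketch from 1) simply gains a second key column (this assumes the cdx_content index from 1) is already in place):

create clustered index cdx_content
    on dbo.content (content_type_id, content_sub_type_id, id)
    with (drop_existing = on);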
