开发者

Best (simple and efficient) way to query distinct tag list?

There is a big Table -- Blog (assume records larger than 10 thousand), structure like:

Blog_ID  Title  Tag
----------------------
1        AAA    T1|T2|T3
2        BBB    T2|T4
3        CCC    T3|T1|T2|T6|

..................

Blog Tags 开发者_如何学编程are split with '|', and definitely COULD duplicate, I want to write a query to get all distinct Tags as well as Tag appearance, result like:

Tags   Appearance Count
------------------------
T1     14
T2     35
T3     88
T4     45

............

Then what is the best (simple and efficient) way to achieve this please? Once more limitation, SQL Server 2000 capability is highly preferred since my web app host provider is using SQL Server 2000.

Appreciate any help!!


You have to normalize your data. Meaning, you shouldn't have more than one value in a single column -- no delimiters. To normalize the data, you need to create a table to hold the tags:

TAGS

  • tag_id (primary key)
  • tag_name

Then, you need a table that links a BLOG record to a particular tag:

BLOG_TAGS

  • blog_id (primary key)
  • tag_id (primary key)

The fun part is pulling the data out of your existing format using the pipe delimiter, to store it properly.

Then, you'll be able to get your desired output using:

   SELECT t.tag_name, 
          COUNT(bt.blog_id) AS appearanceCount
     FROM TAGS t
LEFT JOIN BLOG_TAGS bt ON bt.tag_id = t.tag_id


In general OMG Ponies is definitely right! But if you really have to do this, look at this code sample (I hope it is valid for SQL Server 2000):

create table #t (Blog_ID int, Title varchar(50), Tag varchar(4000));
insert into #t select 1, 'AAA', 'T1|T2|T3';
insert into #t select 2, 'BBB', 'T2|T4';
insert into #t select 3, 'CCC', 'T3|T1|T2|T6';

select Tag, count(*) as AppearanceCount from (
select substring(t.Tag,
        a.n+b.n+c.n+d.n+e.n+f.n+g.n+h.n+i.n+j.n+k.n+l.n,
        charindex('|',t.Tag+'|',a.n+b.n+c.n+d.n+e.n+f.n+g.n+h.n+i.n+j.n+k.n+l.n)
            - (a.n+b.n+c.n+d.n+e.n+f.n+g.n+h.n+i.n+j.n+k.n+l.n)) as Tag
from    (select 0 as n union select 1) a,
        (select 0 as n union select 2) b,
        (select 0 as n union select 4) c,
        (select 0 as n union select 8) d,
        (select 0 as n union select 16) e,
        (select 0 as n union select 32) f,
        (select 0 as n union select 64) g,
        (select 0 as n union select 128) h,
        (select 0 as n union select 256) i,
        (select 0 as n union select 512) j,
        (select 0 as n union select 1024) k,
        (select 0 as n union select 2048) l,
        #t t
where substring('|'+t.Tag,a.n+b.n+c.n+d.n+e.n+f.n+g.n+h.n+i.n+j.n+k.n+l.n,1)='|'
) x
group by Tag;

drop table #t;

You could (and perhaps should) use a UDF for generating the numbers. It get much easier to read:

seelct Tag, count(*) as AppearanceCount from (
select substring(t.Tag, n.n, charindex('|',t.Tag+'|',n.n) as Tag
from dbo.fn_Numbers(4000) n cross join #t t
where substring('|'+t.Tag,n.n,1)='|') x group by Tag;

(Thanks to Itzik Ben-Gan)

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜