Count distinct co-occurrences

I have a database with a listing of documents and the words within them. Each row represents one occurrence of a term in a document. What I'm looking to do is count how many documents each word occurs in.

So, given the following:

+  doc  +  word  +
+-------+--------+
+   a   +  foo   +
+-------+--------+
+   a   +  foo   +
+-------+--------+
+   a   +  bar   +
+-------+--------+
+   b   +  bar   +
+-------+--------+

I'd get a result of

+  word  +  count  +
+--------+---------+
+  foo   +    1    +
+--------+---------+
+  bar   +    2    +
+--------+---------+

Because foo occurs in only one document (even if it occurs twice within that doc) and bar occurs in two documents.

Essentially, what (I think) I should be doing is a COUNT, per word, of the rows that the following query returns:

SELECT DISTINCT word, doc FROM table

...but I can't quite figure it out. Any hints?


You can actually use distinct inside count, like:

select  word
,       count(distinct doc)
from    YourTable
group by
        word
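
To check this against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The table name `terms` and the `doc_count` alias are placeholders I picked for illustration, not names from the question, and other databases may differ slightly in syntax. It also runs the "count over SELECT DISTINCT" form the question was reaching for, which should give the same result.

```python
import sqlite3

# In-memory SQLite database; the table name `terms` is a placeholder.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE terms (doc TEXT, word TEXT)")
conn.executemany(
    "INSERT INTO terms (doc, word) VALUES (?, ?)",
    [("a", "foo"), ("a", "foo"), ("a", "bar"), ("b", "bar")],
)

# COUNT(DISTINCT doc) counts each document at most once per word.
direct = conn.execute(
    "SELECT word, COUNT(DISTINCT doc) AS doc_count "
    "FROM terms GROUP BY word ORDER BY word"
).fetchall()

# Equivalent form: deduplicate (word, doc) pairs first, then count rows.
via_subquery = conn.execute(
    "SELECT word, COUNT(*) AS doc_count "
    "FROM (SELECT DISTINCT word, doc FROM terms) AS pairs "
    "GROUP BY word ORDER BY word"
).fetchall()

print(direct)        # [('bar', 2), ('foo', 1)]
print(via_subquery)  # [('bar', 2), ('foo', 1)]
conn.close()
```

The first form is usually the simpler one to read; the subquery form can be handy if you also need the deduplicated (word, doc) pairs for something else.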


This may be an aside, but I'm guessing this is not the best way to do this. Why are you tracking every word in every document? Take a look at Oracle Intermedia. It was built for this sort of thing (specifically text search).
