Count distinct co-occurrences
I have a database with a listing of documents and the words within them. Each row represents a term. What I'm looking to do is to count how many documents a word occurs in.
So, given the following:
+ doc   + word   +
+-------+--------+
+ a     + foo    +
+-------+--------+
+ a     + foo    +
+-------+--------+
+ a     + bar    +
+-------+--------+
+ b     + bar    +
+-------+--------+
I'd get a result of
+ word   + count   +
+--------+---------+
+ foo    + 1       +
+--------+---------+
+ bar    + 2       +
+--------+---------+
Because foo occurs in only one document (even if it occurs twice within that doc) and bar occurs in two documents.
Essentially, what I think I should be doing is a COUNT over the rows that the following query returns:
SELECT DISTINCT word, doc FROM table
...but I can't quite figure it out. Any hints?
You can actually use distinct inside count, like:
select word
, count(distinct doc)
from YourTable
group by
word
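For anyone who wants to check this against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The table name terms and the in-memory database are assumptions made for illustration; the rows are the four from the question. It also runs the variant the question was reaching for (a COUNT over the SELECT DISTINCT query, written as a derived table), which should give the same result.

```python
import sqlite3

# Assumption: a throwaway in-memory table named "terms" with the
# question's two columns. The real table name is not given in the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE terms (doc TEXT, word TEXT)")
conn.executemany(
    "INSERT INTO terms (doc, word) VALUES (?, ?)",
    [("a", "foo"), ("a", "foo"), ("a", "bar"), ("b", "bar")],
)

# The approach from the answer: count distinct documents per word.
count_distinct = conn.execute(
    "SELECT word, COUNT(DISTINCT doc) FROM terms "
    "GROUP BY word ORDER BY word"
).fetchall()

# The approach from the question: de-duplicate (word, doc) pairs first,
# then count the remaining rows per word.
count_subquery = conn.execute(
    "SELECT word, COUNT(*) FROM "
    "(SELECT DISTINCT word, doc FROM terms) AS pairs "
    "GROUP BY word ORDER BY word"
).fetchall()

print(count_distinct)  # [('bar', 2), ('foo', 1)]
print(count_subquery)  # [('bar', 2), ('foo', 1)]
```

Both queries agree on this data. The count(distinct doc) form is shorter, while the derived-table form is handy if the de-duplicated pairs are needed for something else as well.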
This may be an aside, but I'm guessing this is not the best way to do this. Why are you tracking every word in every document? Take a look at Oracle Intermedia. It was built for this sort of thing (specifically text search).