minimal checks to find repeats in a list
Given a sequence (d1, d2, ..., dn), I want to evaluate the product (1 - Dij) for all i != j where Dij = 1 if di = dj and 0 otherwise.
The code I have only checks Dij when i
prod = 1;
for (int i=1; i<n; ++i) {
for (int j=i; j<=n; ++j) {
prod *= (1 - Dij);
}
}
I know I can stop when I get Dij=1, but what I'm trying to do is get a minimal expression of the Dij's to check. This way I have one expression and then I can use difference sequences and evaluate it. So I know that I can do i<j
instead of i != j
. So I want to expand out this product and get something like this for n=3:
(1 - D12) (1 - D13) (1 - D23) = 1 - D12 - D13 - D23 + D12*D13 + D12*D23 + D13*D23 - D12*D13*D23
But there is more that I can do. This expression is actually always equal to
1 - D12 - D13 - D23 + 3 * D12*D13 - D12*D13*D23
My questions are:
Why is D12 * D13 = D12 * D23? This is always true (meaning it doesn't matter what the d sequence is) but I don't really get why because it seems to me that this means D13 = D23 which isn't always true (it depends on the d sequence). This is the relation that helps make the expression smaller.
How can I find all the relations like this 开发者_StackOverflow中文版and get a minimal expression? Is the expression above minimal? I don't even know.
You are trying to determine if D contains any duplicates or not. Ultimately, that requires you to compare every entry with each other, which is simply enumerating all unique combinations of two elements. That ends up being N*(N-1)/2
. You can do a little better by sorting D first and then searching for duplicate adjacent pairs (O(N*log(N)
), or, assuming you are sticking to a bounded range of integers, you can reduce it to linear time with a bit vector, or if you are feeling adventurous, a radix sort.
I can answer 1 for you. Consider these two cases:
Case 1: D13 = D23
Here you can just multiply by D12
on both sides to get D12 * D13 = D12 * D23
.
Case 2: D13 != D23
This means that either d1 = d3
XOR d2 = d3
but not both. Therefore we know that d1 != d2
. This implies that D12 = 0
. Therefore
D12 * D13 = 0 * D13 = 0 = 0 * D23 = D12 * D23
The problem with your logic when you think this implies D13 = D23
is that you cannot divide by 0
, and D12
might be 0
(as always happens in the second case).
Your second question is interesting, and I do not know the answer off the top of my head, but here are some observations that may be helpful.
Draw the numbers 1, 2, ..., n
in a row:
1 2 3 ... n
Given an expression D_(i1,j1) * D_(i2,j2) * ... * D_(ik,jk)
, make an arc from i1 to j1
and i2 to j2
and so on. This turns that row into a graph (vertices are numbers, edges are these arcs).
Each connected component of that graph represents a subset of the number 1, 2, ..., n
, and taken as a whole this gives us a set partition of {1, 2, ..., n}
.
Fact: Any two terms that have the same corresponding set partition are equal.
Example:
D12 * D23 = D12 * D13
---------
| |
1 -- 2 -- 3 = 1 -- 2 3
Sometimes this fact will mean the degree is the same, as in the case above, and sometimes the degree will decrease as in
D12 * D13 * D23
---------
| |
1 -- 2 -- 3
The upshot is that now you can express the product (1 - Dij) as a sum over set partitions:
\prod_{i<j} ( 1 - Dij ) = \sum_{P set partition of \{1,2,...,n\}} c_P * m_P
where the monomial term is given by
mP = mP1 * mP2 * ... * mPk
when P = P1 union P2 union ... union Pk
and if Pi = { a < b < c < ... < z }
then
m_Pi = Dab * Dac * ... * Daz
Finally, the coefficient term is just
c_P = \prod (#P1)! (#P2)! ... (#Pn)!
Having worked this out, I'm now certain this belongs on http://math.stackexchange.com rather than here.
I haven't followed the math, but isn't this something you would code using a hashtable, or possibly even a sparse bit-array if you know the size of the di are bounded? Just iterate over the list, filling in your data structure with a "1" at the position corresponding to the value of di - if it's already 1, return 0. If you complete (in n steps), return 1. It should be O(n).
精彩评论