Is this a linear programming problem?
I have been pulling my hair out on one problem... The overall problem is complicated... but let me try my best to explain the part that really matters...
I have a graph where each edge represents the correlation between the connected two node开发者_JS百科s. Each node is a time course (TC) (i.e., 400 time points), where events will occur at different time points. The correlation between two nodes is defined as the percentage of overlapped events. For the simplicity of this example, let us assume that the total number of events happening on each node is the same as $tn$. And if two TCs (nodes) have $on$ overlapped events (i.e., events that happened at exactly the same time point). Then, the correlation can be defined simply as $on$/$tn$.
Now, I have a network of 11 nodes; and I know the correlation between every two nodes. How do I generate the TCs for all 11 nodes that meet the correlation constraints???
It is easy to do this for two nodes when you know the correlation between the two. Assume TC_1 and TC_2 have a correlation value of 0.6, which means there are 60 percent overlapped events in two TCs. Also, assume that the total number of events are the same for both TC_1 and TC_2 as $tn$. A simple algorithm to place the events in the two TCs is first randomly pick 0.6*$tn$ time points, and consider those as time slots where overlapped events happened in both TCs. Next, randomly pick (1-0.6)*$tn$ time points in TC_1 to place the rest of the events for TC_1. Finally, randomly pick (1-0.6)*$tn$ time points in TC_2 where no events happened in correspondent time points in TC_1.
However, it starts to get harder, when you consider a 3-node network, where the generated three TCs need to meet all three correlation constraints (i.e., 3 edges)... It's hardly possible to do this for a 11-node network...
Does this make any sense to you? Please let me know if it's not...
I was thinking that this is just a trick computer science programing issue... but the more I think about it, it's more like a linear programing problem, isn't it?
Anyone has a reasonable solution? I am doing this in R, but any code is ok...
I think there is a simpleminded linear programming approach. Represent a solution as a matrix, where each column is a node, and each row is an event. The cells are either 0 or 1 to say that an event is or is not associated with a given node. Your correlation constraint is then a constraint fixing the number of 11s in a pair of columns, relative to the number of 1s in each of those columns, which you have in fact fixed ahead of time.
Given this framework, if you treat each possible row as a particular item, occuring X_i times, then you will have constraints of the form SUM_i X_i * P_ij = K_j, where P_ij is 0 or one depending on whether possible row i has 11 in the pair of columns counted by j. Of course this is a bit of a disaster for large numbers of nodes, but with 11 nodes there are 2048 possible rows, which is not completely unmanageable. The X_i may not be linear, but I guess they should be rational, so if you are prepared to use astounding numbers of rows/events you should be OK.
Unfortunately, you may also have to try different total column counts, because there are inequalities lurking around. If there are N rows and two columns have m and n 1s in them there must be at least m + n - N 11s in that column pair. You could in fact make the common number of 1s in each column come out as a solution variable as well - this would give you a new set of constraints in which the Q_ij are 0 and one depending on whether column row i has a 1 in column j.
There may be a better answer lurking out there. In particular, generating normally distributed random variables to particular correlations is easy (when feasible) - http://en.wikipedia.org/wiki/Cholesky_decomposition#Monte_Carlo_simulation and (according to Google R mvrnorm). Consider a matrix with 2^N rows and 2^N-1 columns filled with entries which are +/-1. Label the rows with all combinations of N bits and the columns with all non-zero columns of N bits. Fill each cell with -1^(parity of row label AND column label). Each column has equal numbers of +1 and -1 entries. If you multiple two columns together element by element you get a different column, which has equal numbers of +1 and -1 entries - so they are mutually uncorrelated. If your Cholesky decomposition provides you with matrices whose elements are in the range [-1, 1] you may be able to use it to combine columns, where you combine them by picking at random from the columns or from the negated columns according to a particular probability.
This also suggests that you might possibly get by in the original linear programming approach with for example 15 columns by choosing from amongst not the 2^15 different rows that are all possibilities, but from amongst the 16 different rows that have the same pattern as a matrix with 2^4 rows and 2^4-1 columns as described above.
If there exist solution (you can have problem whit no solution) you can represent this as system of linear equtions.
x1/x2 = b =>
x1 - b*x2 = 0
or just
a*x1 + b*x2 = 0
You should be able to transform this into solving system of linear equations or more precisely homogeneous system of linear equations since b in Ax=b is equal 0.
Problem is that since you have n nodes and n*(n-1)/2 relations (equations) you have to many relations and there will be no solution.
You might represent this problem as
Minimize Ax where x > 0 and x.x == konstant
You can represent this as a mixed integer program.
Suppose we have N
nodes, and that each node has T
total time slots. You want to find an assignment of events to these time slots. Each node has tn <= T
events. There are M
total edges in your graph. Between any pair of nodes i
and j
that share an edge, you have a coefficent
c_ij = overlap_ij/tn
where overlap_ij
is the number of overlapping events.
Let x[i,t]
be a binary variable defined as
x[i,t] = { 1 if an event occurs at time t in node i
= { 0 otherwise.
Then the constraint that tn
events occur at node i
can be written as:
sum_{t=1}^T x[i,t] == tn
Let e[i,j,t]
be a binary variable defined as
e[i,j,t] = { 1 if node i and node j share an event a time t
= { 0 otherwise
Let N(i)
denote the neighbors of node i
. Then we have that at each time t
sum_{j in N(i)} e[i,j,t] <= x[i,t]
This says that if a shared event occurs in a neighbor of node i
at time t
, then node i
must have an event at time t
. Furthermore, if nodei
has two neighbors u
and v
, we can't have e[i,u,t] + e[i,v,t] > 1
(meaning that two events would share the same time slot), because the sum over all neighbors is less than x[i,t] <= 1
.
We also know that there must be overlap_ij = tn*c_ij
overlapping events between node i
and node j
. This means that we have
sum_{t=1}^T e[i,j,t] == overlap_ij
Putting this all together you get the following MIP
minimize 0
e, x
subject to sum_{t=1}^T x[i,t] == tn, for all nodes i=1,...,N
sum_{j in N(i)} e[i,j,t] <= x[i,t],
for all nodes i=1,...,N and all time t=1,...,T
sum_{t=1}^T e[i,j,t] == overlap_ij for all edges (i,j)
between nodes
x[i,t] binary for i=1,...,N and t=1,...,T
e[i,j,t] binary for all edges (i,j) and t=1,...,T
Here the objective is zero, since your model is a pure feasibility problem. This model has a total of T*N + M*T
variables and N + N*T + M
constraints.
A MIP solver like Gurobi can solve the above problem, or prove that it is infeasible (i.e. no solution exists). Gurobi has an interface to R.
You can extract the final time series of events for the i
th node by looking at the solution vector x[i,1], x[i,2], ..., x[i,T]
.
精彩评论