How can I efficiently find the subset of activities that stay within a budget and maximizes utility?
I am trying to develop an algorithm to select a subset of activities from a larger list. If selected, each activity uses some amount of a fixed resource (i.e. the sum over the selected activities must stay under a total budget). There could be multiple feasible subsets, and the means of choosing from them will be based on calculating the opportunity cost of the activities not selected.
开发者_如何学CEDIT: There are two reasons this is not the 0-1 knapsack problem:
- Knapsack requires integer values for the weights (i.e. resources consumed) whereas my resource consumption (i.e. mass in the knapsack parlance) is a continuous variable. (Obviously it's possible to pick some level of precision and quantize the required resources, but my bin size would have to be very small and Knapsack is
O(2^n)
in W. - I cannot calculate the opportunity cost a priori; that is, I can't evaluate the fitness of each one independently, although I can evaluate the utility of a given set of selected activities or the marginal utility from adding an additional task to an existing list.
The research I've done suggests a naive approach:
Define the powerset
For each element of the powerset, calculate it's utility based on the items not in the set Select the element with the highest utility
However, I know there are ways to speed up execution time and required memory. For example:
- fully enumerating a powerset is
O(2^n)
, but I don't need to fully enumerate the list because once I've found a set of tasks that exceeds the budget I know that any set that adds more tasks is infeasible and can be rejected. That is if{1,2,3,4}
is infeasible, so is{1,2,3,4} U {n}
, where n is any one of the tasks remaining in the larger list. - Since I'm just summing duty the order of tasks doesn't matter (i.e. if
{1,2,3}
is feasible, so are{2,1,3}
,{3,2,1}
, etc.). - All I need in the end is the selected set, so I probably only need the best utility value found so far for comparison purposes.
- I don't need to keep the list enumerations, as long as I can be sure I've looked at all the feasible ones. (Although I think keeping the duty sum for previously computed feasible sub-sets might speed run-time.)
I've convinced myself a good recursion algorithm will work, but I can't figure out how to define it, even in pseudo-code (which probably makes the most sense because it's going to be implemented in a couple of languages--probably Matlab for prototyping and then a compiled language later).
The knapsack problem is NP-complete, meaning that there's no efficient way of solving the problem. However there's a pseudo-polynomial time solution using dynamic programming. See the Wikipedia section on it for more details.
However if the maximum utility is large, you should stick with an approximation algorithm. One such approximation scheme is to greedily select items that have the greatest utility/cost. If the budget is large and the cost of each item is small, then this can work out very well.
EDIT: Since you're defining the utility in terms of items not in the set, you can simply redefine your costs. Negate the cost and then shift everything so that all your values are positive.
As others have mentioned, you are trying to solve some instance of the Knapsack problem. While theoretically, you are doomed, in practice you may still do a lot to increase the performance of your algorithm. Here are some (wildly assorted) ideas:
- Be aware of Backtracking. This corresponds to your observation that once you crossed out
{1, 2, 3, 4}
as a solution,{1, 2, 3, 4} u {n}
is not worth looking at. - Apply Dynamic Programming techniques.
- Be clear about your actual requirements:
- Maybe you don't need the best set? Will a good one do? I am not aware if there is an algorithm which provides a good solution in polynomial time, but there might well be.
- Maybe you don't need the best set all the time? Using randomized algorithms you can solve some
NP
-Problems in polynomial time with the risk of failure in 1% (or whatever you deem "safe enough") of all executions.
(Remember: It's one thing to know that the halting problem is not solvable, but another to build a program that determines whether "hello world" implementations will run indefinetly.)
I think the following iterative algorithm will traverse the entire solution set and store the list of tasks, the total cost of performing them, and the opportunity cost of the tasks not performed.
It seems like it will execute in pseudo-polynomial time: polynomial in the number of activities and exponential in the number of activities that can fit within the budget.
ixCurrentSolution = 1
initialize empty set solution {
oc(ixCurrentSolution) = opportunity cost of doing nothing
tasklist(ixCurrentSolution) = empty set
costTotal(ixCurrentSolution) = 0
}
for ixTask = 1:cActivities
for ixSolution = 1:ixCurrentSolution
costCurrentSolution = costTotal(ixCurrentSolution) + cost(ixTask)
if costCurrentSolution < costMax
ixCurrentSolution++
costTotal(ixCurrentSolution) = costCurrentSolution
tasklist(ixCurrentSolution) = tasklist(ixSolution) U ixTask
oc(ixCurrentSolution) = OC of tasks not in tasklist(ixCurrentSolution)
endif
endfor
endfor
精彩评论