Maximize sum of table where each number must come from unique row and column
Suppose we have a table of numbers like this (we can assume it is a square table):
20 2 1 3 4
5 1 14 8 9
15 12 17 17 11
16 1 1 15 18
20 13 15 5 11
Your job is to calculate the maximum sum of n numbers where n is the number of rows or columns in the table. The catch is each number must come from a unique row and column.
For example, selecting the numbers at (0,0), (1,1), (2,2), (3,3), and (4,4) is acceptable, but (0,0), (0,1), (2,2), (3,3), and (4,4) is not because the first two numbers were pulled from the same row.
My (laughable) solution to this problem is iterating through all the possible permutations of the rows and columns. This works for small grids but, of course, it is incredibly slow as n gets big. It has O(n!) time complexity, if I'm not mistaken (sample Python code below).
I really think this can be solved in better time, but I'm not coming up with anything sufficiently clever.
So my question is, what algorithm should be used to solve this?
If it helps, this problem seems similar to the knapsack problem.
import itertools
import re
grid = """20 2 1 3 4
5 1 14 8 9
15 12 17 17 11
16 1 1 15 18
20 13 15 5 11"""
grid = [[int(x) for x in re.split("\s+", line)] for line in grid.split("\n")]
possible_column_indexes = itertools.permutations(range(len(grid)))
max_sum = 0开发者_开发百科
max_positions = []
for column_indexes in possible_column_indexes:
current_sum = 0
current_positions = []
for row, col in enumerate(column_indexes):
current_sum += grid[row][col]
current_positions.append("(%d, %d)" % (row, col))
if current_sum > max_sum:
max_sum = current_sum
max_positions = current_positions
print "Max sum is", max_sum
for position in max_positions:
print position
This is the maximum cost bipartite matching problem. The classical way to solve it is by using the Hungarian algorithm.
Basically you have a bipartite graph: the left set is the rows and the right set is the columns. Each edge from row i
to column j
has cost matrix[i, j]
. Find the matching that maximizes the costs.
For starters, you can use dynamic programming.
In your straightforward approach, you are doing exactly the same computation many, many times.
For example, at some point you answer the question: "For the last three columns with rows 1 and 2 already taken, how do I maximize the sum?" You compute the answer to this question twice, once when you pick row 1 from column 1 and row 2 from column 2, and once when you pick them vice-versa.
So don't do that. Cache the answer -- and also cache all similar answers to all similar questions -- and re-use them.
I do not have time right now to analyze the running time of this approach. I think it is O(2^n) or thereabouts. More later maybe...
精彩评论