Approximate algorithm for minimum raggedness word wrap
I'm looking for an efficient algorithm (ideally, C-like pseudocode) to give an approximate solution to the following partition problem. Given a sequence S = {a_i : i=1,...,n} and a bound B, determine a partition of S into some number m of contiguous subsequences as follows. For each k, let s_k be the sum of the elements of k-th subsequence. The partition must satisfy:
- s_k ≤ B for each k (assume that the values of B and the a_i are such that this is always possible)
- m is minimal (no smaller partition s开发者_如何学运维atisfies #1);
- some measure of dispersion (for example, the variance or the maximum pair-wise difference among the s_k) is minimal among all partitions of size m.
I know that this is closely related to the minimum raggedness word wrap algorithm. I am looking for something that can give a "good enough" solution for small values of n (less than 15) without pulling out heavy ammunition like dynamic programming, but also something a little faster than brute force.
Let S denote the sum of all the items and let n be the number of items. If you fit the items on m lines, you will have at least S/m weight on every line. Because S/m ≤ B, you get m ≥ S/B. I would start from ceiling(S/B) as the value of m and then increase m by one until a solution is found.
When m is set and n is given, it's just a matter of recursively searching for the correct boundaries. You guess the boundaries between lines one by one (recursively), and backtrack when solution becomes infeasible. If you find a solution, you store it for reference because it might be the best dispersion-wise. Eventually you choose the best solution. If there are no solutions, then you increase m by one and redo.
I ended up doing a simple backtracking search. First I calculated m as described by antti.huima (that part was very easy and fixed m). I then allocated items greedily (packing each partition in order as much as possible). Finally I ran a backtracking algorithm to assign a delta to the start index for each partition boundary, starting with the last partition. Each delta_j represents how far back to move the start of partition j from the greedy start position. It's not hard to show that 0 <= delta_j < (size of partition j-1) for each j > 1 (otherwise the greedy algorithm would have behaved differently). This greatly cuts down on the search space as compared to a search to fill the partitions from 1 to m.
I suppose some sort of heuristic variant of this search is possible and would be more efficient, but I didn't try very hard to find one.
精彩评论