Divide a list of numbers into smaller list with "sum" approximately same

2022-12-18 07:50 问答作者：

I execute around 2000 tests on grid, each test being run as separate task on grid. The tests do have rather big startup time. Total execution takes 500 hours, finishes in less than 10 hours on 60 node SunGridEngine. Runtime of tests vary from 5 minutes to 90 minutes. Combining tests without much intelligence gave some performance gain. I would like to create "tasks" that are approximately equal size. How can I do so?

(what we do now: Sorting all tests and keep adding till the sum of execution time is approximately开发者_Go百科 5 hours. Looking for some thing better )

Doing this optimally is NP-complete. This is a variation of the partition problem, which is a special case of the subset sum problem, which itself a special case of the knapsack problem.

In your case you probably don't need an exact solution, so you can probably use some heuristics to get something "good enough" in a reasonable amount of time. See the Methods section of the partition problem page for a description of some approaches.

What you are looking for is the Partition problem for k sets.

There is som literature about k=3, called the 3-partition problem. This is NP complete in the strong sense.

There are many heuristics that should give an approximate result quickly.

I suggest you start here: http://en.wikipedia.org/wiki/Partition_problem

Hope this helps.

This is a version of the subset-sum problem, and is NP-complete. Your best bet is to employ some subset-sum heuristics.

Your problem sounds a little like a shop scheduling problem. There are all kinds of different sequencing approaches some of which are described here. Sorting in increasing order of processing time, for instance, will minimized the mean waiting time and a whole bunch of other measures. If you elaborate a bit more on the objective, the setup times, the processing time, and any interdependence that would help.

Looking at the links Laurence posted I thought I would try whipping something up. The algorithm is to assign the longest test to the shortest task list (repeat until all the tests are assigned). Using your examples and random test times the std deviation was pretty low, less than 2 minutes in running it several times (code in C#, but nothing that wouldn't be trivial to convert):

    private static void BuildJobs()
    {
        PriorityQueue<Task> tasks = new PriorityQueue<Task>();

        //create a task list for each node
        for (int i = 0; i < 60; i++)
        {
            Task t = new Task();
            tasks.Enqueue(t);
        }

        //get the list of tests, in order from longest to shortest
        int[] testList = new int[2000];

        for (int i = 0; i < testList.Length; i++)
        {
            testList[i] = random.Next(5, 90);
        }

        Array.Sort<int>(testList);
        Array.Reverse(testList);

        // add the longest running test to the current shortest task list
        foreach (int time in testList)
        {
            Task t = tasks.Dequeue();
            t.addTest(time);
            tasks.Enqueue(t);
        }

        Debug.WriteLine(CalculateStdDev(tasks));

    }

继续阅读：algorithm graph grid

Divide a list of numbers into smaller list with "sum" approximately same

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？