Better way to copy several std::vectors into 1? (multithreading)

2023-01-06 17:13 问答作者：

Here is what I'm doing:

I'm taking in bezier points and running bezier interpolation then storing the result in a std::vector<std::vector<POINT>.

The bezier calculation was slowing me down so this is what I did.

I start with a std::vector<USERPOINT> which is a struct with a point and 2 other points for bezier handles.

I divide these up into ~4 groups and assign each thread to do 1/4 of the work. To do this I created 4 std::vector<std::vector<POINT> > to store the results from each thread.In the end all the points have to be in 1 continuous vector, before I used multithreading I accessed this directly but now I reserve the size of the 4 vectors produced by the threads and insert them into the original vector, in the correct order. This works, but unfortunatly t开发者_如何学运维he copy part is very slow and makes it slower than without multithreading. So now my new bottleneck is copying the results to the vector. How could I do this way more efficiently?

Thanks

Have all the threads put their results into a single contiguous vector just like before. You have to ensure each thread only accesses parts of the vector that are separate from the others. As long as that's the case (which it should be regardless -- you don't want to generate the same output twice) each is still working with memory that's separate from the others, and you don't need any locking (etc.) for things to work. You do, however, need/want to ensure that the vector for the result has the correct size for all the results first -- multiple threads trying (for example) to call resize() or push_back() on the vector will wreak havoc in a hurry (not to mention causing copying, which you clearly want to avoid here).

Edit: As Billy O'Neal pointed out, the usual way to do this would be to pass a pointer to each part of the vector where each thread will deposit its output. For the sake of argument, let's assume we're using the std::vector<std::vector<POINT> > mentioned as the original version of things. For the moment, I'm going to skip over the details of creating the threads (especially since it varies across systems). For simplicity, I'm also assuming that the number of curves to be generated is an exact multiple of the number of threads -- in reality, the curves won't divide up exactly evenly, so you'll have to "fudge" the count for one thread, but that's really unrelated to the question at hand.

std::vector<USERPOINT> inputs; // input data   
std::vector<std::vector<POINT> > outputs; // space for output data

const int thread_count = 4;

struct work_packet {           // describe the work for one thread
    USERPOINT *inputs;         // where to get its input
    std::vector<POINT> *outputs;   // where to put its output
    int num_points;                // how many points to process
    HANDLE finished;               // signal when it's done.
};

std::vector<work_packet> packets(thread_count); // storage for the packets.
std::vector<HANDLE> events(thread_count);       // storage for parent's handle to events

outputs.resize(inputs.size);                    // can't resize output after processing starts.

for (int i=0; i<thread_count; i++) {
    int offset = i * inputs.size() / thread_count;
    packets[i].inputs = &inputs[0]+offset;
    packets[i].outputs = &outputs[0]+offset;
    packets[i].count = inputs.size()/thread_count;
    events[i] = packets[i].done = CreateEvent();

    threads[i].process(&packets[i]);
}


// wait for curves to be generated (Win32 style, for the moment).
WaitForMultipleObjects(&events[0], thread_count, WAIT_ALL, INFINITE);

Note that although we have to be sure that the outputs vector doesn't get resized while be operated on by multiple threads, the individual vectors of points in outputs can be, because each will only ever be touched by one thread at a time.

If the simple copy in between things is slower than before you started using Mulitthreading, it's entirely likely that what you're doing simple isn't going to scale to multiple cores. If it's something simple like bezier stuff I suspect that's going to be the case.

Remember that the overhead of making the threads and such has an impact on total run time.

Finally.. for the copy, what are you using? Is it std::copy?

Multithreading is not going to speed up your process. Processing the data in different cores, could.

继续阅读：multithreading vector

Better way to copy several std::vectors into 1? (multithreading)

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？