Flaws in Shared Memory of Massively Multi-Threaded Designs
I am trying to create my first multi-threaded application, one that scales to multi-core hardware. Its inspiration comes from the concept of an event-driven spiking neural network.
The design is a little like this: the data structure of the algorithm is stored in one location in memory, as instances of classes. An example of a task performed on this structure is a neuron spiking: it modifies several values in the neuron and in connected neurons, and identifies any future tasks that may need to be performed. Tasks to be performed are added to a queue. There are several threads whose only function is to pull a task from the queue, perform the task, and lather, rinse, repeat. Updates to values can be performed in any order, as long as they are all eventually performed; the small but rare errors that result from this parallelism would have a statistically insignificant effect on the behavior of the system.
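A minimal sketch of that worker/queue loop might look like the following. C++ with std::thread is assumed here, since the post names no language, and the names Task, task_queue, worker, and enqueue are illustrative, not from the post:

    // Minimal sketch of the described worker/queue pattern (C++17).
    #include <algorithm>
    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    using Task = std::function<void()>;

    std::queue<Task> task_queue;       // shared queue of pending work
    std::mutex queue_mutex;            // guards task_queue
    std::condition_variable queue_cv;  // wakes waiting workers
    bool done = false;                 // set once to stop the workers

    // Each worker loops forever: pull a task, run it, repeat.
    void worker() {
        for (;;) {
            Task task;
            {
                std::unique_lock<std::mutex> lock(queue_mutex);
                queue_cv.wait(lock, [] { return done || !task_queue.empty(); });
                if (done && task_queue.empty()) return;
                task = std::move(task_queue.front());
                task_queue.pop();
            }
            task();  // may enqueue follow-up tasks (e.g. downstream spikes)
        }
    }

    void enqueue(Task t) {
        {
            std::lock_guard<std::mutex> lock(queue_mutex);
            task_queue.push(std::move(t));
        }
        queue_cv.notify_one();
    }

    int main() {
        unsigned n = std::max(1u, std::thread::hardware_concurrency());
        std::vector<std::thread> workers;
        for (unsigned i = 0; i < n; ++i)
            workers.emplace_back(worker);

        enqueue([] { /* spike a neuron, update connected neurons... */ });

        {
            std::lock_guard<std::mutex> lock(queue_mutex);
            done = true;
        }
        queue_cv.notify_all();
        for (auto& w : workers) w.join();
    }

Note that this baseline serializes access to the queue with a mutex; once the queue itself becomes the bottleneck, the usual next step is a lock-free or work-stealing queue so the workers contend less.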
This design uses no memory other than shared memory (except possibly a small amount of dedicated memory used for calculations). I've recently watched a few lectures in which the speaker implied that the use of shared memory in multi-core and GPU applications is very slow. Even though I have a few ideas as to why that might be the case, I'd like to hear from people who have experience with this sort of thing, and perhaps be pointed to a useful resource.
Accessing shared state from multiple threads in a multicore system can be slow because of the CPU cache coherency protocol: every change to the shared state must be reflected in the cache lines of every core that has it cached, typically by invalidating those lines and forcing a reload from memory or another core's cache.
http://msdn.microsoft.com/en-us/magazine/cc163715.aspx#S2 gives a good explanation of why accessing shared data from multiple threads can be slow and what can be done about it.
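As a concrete illustration of the effect (a hedged sketch of my own, not taken from the article): when two counters updated by different threads happen to share a cache line, the line ping-pongs between cores even though the threads never touch each other's data ("false sharing"). Padding each counter onto its own cache line removes the contention; the 64-byte line size used below is a common but hardware-dependent assumption:

    // Illustrative false-sharing sketch (C++17).
    #include <atomic>
    #include <thread>

    struct Plain {
        std::atomic<long> a{0};
        std::atomic<long> b{0};  // likely shares a cache line with 'a'
    };

    struct Padded {
        alignas(64) std::atomic<long> a{0};  // 64 = assumed cache-line size
        alignas(64) std::atomic<long> b{0};  // each counter on its own line
    };

    // Two threads hammer two logically independent counters.
    template <typename Counters>
    void bump(Counters& c) {
        std::thread t1([&] { for (int i = 0; i < 10'000'000; ++i) c.a++; });
        std::thread t2([&] { for (int i = 0; i < 10'000'000; ++i) c.b++; });
        t1.join();
        t2.join();
    }

    int main() {
        Plain plain;    // timing this typically shows the slowdown...
        Padded padded;  // ...and this the fix; exact numbers vary by hardware
        bump(plain);
        bump(padded);
    }

The same reasoning applies to the neuron objects in the design above: if hot, frequently written fields of different neurons end up packed into the same cache line, the worker threads will contend on that line even when they never race on the same value.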