Suppose I declare a local variable in a CUDA kernel function, so that each thread has its own copy: float f = ...; // some calculations here
The only remark I found is that local memory is slower than register memory; these are the two per-thread memory types. Shared memory is supposed to be fast, but is it faster than a thread's local memory?
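To make the question concrete, here is a minimal sketch of a kernel that touches all three memory spaces. The kernel name `memorySpaces`, the helper `runOnce`, the array sizes and the arithmetic are all made up for illustration. Where each variable ends up is the compiler's decision, so the comments describe what usually happens, not a guarantee. As far as I understand the CUDA documentation, local memory resides in off-chip device memory (cached on recent GPUs), while registers and shared memory are on-chip.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel; assumes it is launched with at most 256 threads per block.
__global__ void memorySpaces(const float *in, float *out, int n)
{
    // Shared memory: on-chip, one copy per thread block.
    __shared__ float tile[256];

    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // A per-thread scalar like this is normally kept in a register.
    float f = (i < n) ? in[i] * 2.0f : 0.0f;

    // A per-thread array read with a runtime index is a typical
    // candidate for local memory (the compiler decides).
    float scratch[32];
    for (int k = 0; k < 32; ++k)
        scratch[k] = f + k;

    // Publish f to the block, then wait until every thread has written.
    tile[threadIdx.x] = f;
    __syncthreads();

    if (i < n) {
        // Read the value written by the left neighbour in the block.
        int left = (threadIdx.x + blockDim.x - 1) % blockDim.x;
        out[i] = tile[left] + scratch[i % 32];
    }
}

// Host helper (hypothetical): runs the kernel on n ones and returns out[idx].
float runOnce(int n, int idx)
{
    float *in = nullptr, *out = nullptr;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i)
        in[i] = 1.0f;

    memorySpaces<<<(n + 255) / 256, 256>>>(in, out, n);
    cudaDeviceSynchronize();

    float result = out[idx];
    cudaFree(in);
    cudaFree(out);
    return result;
}

int main()
{
    printf("out[0] = %f\n", runOnce(1024, 0));
    return 0;
}
```

Compiling with `nvcc -Xptxas -v` prints the per-kernel register count along with any stack frame and spill bytes, which is one way to check whether something like `f` or `scratch` actually ended up in local memory on a given architecture.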
Does anyone know of any good solutions (Eclipse plugins presumably) for using Eclipse to develop in ActionScript 3?