Copying a tree to GPU memory
I have a tree of nodes that I want to copy to GPU memory. The Node looks like this:
struct Node
{
char *NodeName;
int NumberOfChildren;
Node *Children;
Node *Parent;
};
Every node has a dynamic number of children but a single parent. How can I copy this tree to CUDA global memory? Should I BFS through the tree and allocate/copy data to the GPU? Or can I use a single CUDA memory copy instruction?
I am not sure of the source of your data, but you could use a flat memory space and store index offsets into that space in place of pointers.
That is, Node would be defined as:
struct Node
{
unsigned int name;
unsigned int number_of_children;
unsigned int parent;
unsigned int children;
};
You would malloc one big block of memory and build your tree in there, keeping a counter of the last place you inserted an item.
You do the same for the strings.
This way you have one contiguous lump of memory, and copying it is a single memory copy. To access an item, add its offset to the base address and cast.
This does mean rewriting the tree and string code, but it keeps everything consistent. If you don't know how big your memory is going to be, you can do this in pages and change each reference to a pair of ints (page and offset), which makes allocating memory easier.
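A minimal host-side sketch of the flat layout described above, written in plain C++ so it runs without a GPU. The names FlatNode, FlatTree, kNoParent and flatten are made up for illustration, NodeName is declared const char* for convenience, and the breadth-first ordering (which keeps each node's children contiguous) is one possible choice rather than something the answer prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Pointer-based node from the question (NodeName made const for this sketch).
struct Node {
    const char* NodeName;
    int NumberOfChildren;
    Node* Children;
    Node* Parent;
};

// Index-based node: every field is an offset, so the same bytes are
// meaningful on the host and on the device.
struct FlatNode {
    std::uint32_t name;                // byte offset of the name in FlatTree::names
    std::uint32_t number_of_children;
    std::uint32_t parent;              // index into FlatTree::nodes, kNoParent for the root
    std::uint32_t children;            // index of the first child; siblings are contiguous
};

constexpr std::uint32_t kNoParent = 0xFFFFFFFFu;

struct FlatTree {
    std::vector<FlatNode> nodes;  // one contiguous block of nodes
    std::vector<char> names;      // all NUL-terminated names, back to back
};

// Breadth-first walk that rewrites pointers as indices.
FlatTree flatten(const Node& root) {
    assert(root.NodeName != nullptr);
    FlatTree t;
    std::vector<const Node*> src;  // src[i] is the host node stored at t.nodes[i]
    src.push_back(&root);
    t.nodes.push_back({0, 0, kNoParent, 0});
    for (std::size_t i = 0; i < src.size(); ++i) {  // src grows while we iterate
        const Node* n = src[i];
        const std::uint32_t name_off = static_cast<std::uint32_t>(t.names.size());
        t.names.insert(t.names.end(), n->NodeName,
                       n->NodeName + std::strlen(n->NodeName) + 1);
        const std::uint32_t first_child = static_cast<std::uint32_t>(t.nodes.size());
        for (int c = 0; c < n->NumberOfChildren; ++c) {
            src.push_back(&n->Children[c]);
            t.nodes.push_back({0, 0, static_cast<std::uint32_t>(i), 0});
        }
        t.nodes[i].name = name_off;
        t.nodes[i].number_of_children = static_cast<std::uint32_t>(n->NumberOfChildren);
        t.nodes[i].children = first_child;
    }
    return t;
}
```

With the tree in this form, the node array could be sent with one call such as cudaMemcpy(d_nodes, t.nodes.data(), t.nodes.size() * sizeof(FlatNode), cudaMemcpyHostToDevice), where d_nodes is assumed to come from a prior cudaMalloc. The names buffer would need a second copy, or could be appended to the same block to keep it to one transfer.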
Peter.
PS: I'm an embedded engineer, not a CUDA programmer, but I have encountered similar problems moving trees across processors without having to do parsing.
In general you want to use a single memory copy, as multiple small copies will kill performance. Probably the correct thing to do is to keep track of the total size needed while inserting into the tree (or walk the tree to compute it), allocate that (or a larger) amount and then do a single data transfer. If you later need to copy a larger tree than was allocated, free that memory and allocate a new chunk.
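As a rough illustration of the "walk the tree to compute it" option, the sketch below sums the bytes a tree would need. The helper names subtree_bytes and needs_realloc are hypothetical, NodeName is declared const char* for convenience, and alignment padding between names and nodes is ignored, so treat the result as a lower bound.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Pointer-based node from the question (NodeName made const for this sketch).
struct Node {
    const char* NodeName;
    int NumberOfChildren;
    Node* Children;
    Node* Parent;
};

// Bytes needed for the subtree rooted at n: one Node struct plus one
// NUL-terminated name per node. Padding is not accounted for.
std::size_t subtree_bytes(const Node& n) {
    assert(n.NodeName != nullptr);
    std::size_t total = sizeof(Node) + std::strlen(n.NodeName) + 1;
    for (int c = 0; c < n.NumberOfChildren; ++c) {
        total += subtree_bytes(n.Children[c]);
    }
    return total;
}

// Reallocation policy from the text: keep the existing device buffer
// unless the new tree is larger than what was allocated.
bool needs_realloc(std::size_t capacity, std::size_t required) {
    return required > capacity;
}
```

On the device side, the value returned by subtree_bytes would be the size passed to cudaMalloc, and needs_realloc would decide whether to cudaFree and allocate again before the next transfer.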
Unfortunately, all of the host pointers will be invalid on the GPU, so you may need to expand your structure to something like:
struct Node
{
char *NodeName;
int NumberOfChildren;
Node *Children; /* children on host */
Node *Parent; /* parent on host */
Node *d_children; /* children on device */
Node *d_parent; /* parent on device */
};
and then walk the tree after the allocation, assigning to the new nodes.
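One way the walk could look, as a host-only sketch. It assumes the nodes are staged in a contiguous host array in breadth-first order and that d_base is the address of the matching device array (for example from cudaMalloc); the function name stage_for_device is hypothetical and NodeName is declared const char* for convenience. The device addresses are only computed here, never dereferenced on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    const char* NodeName;
    int NumberOfChildren;
    Node* Children;    // children on host
    Node* Parent;      // parent on host
    Node* d_children;  // children on device
    Node* d_parent;    // parent on device
};

// Copies the tree into a contiguous staging array (breadth-first) and
// fills in d_children / d_parent as addresses relative to d_base.
std::vector<Node> stage_for_device(const Node& root, Node* d_base) {
    assert(d_base != nullptr);
    std::vector<Node> staged;
    staged.push_back(root);
    staged[0].d_parent = nullptr;  // the root has no parent
    for (std::size_t i = 0; i < staged.size(); ++i) {  // staged grows while we iterate
        // Read these before push_back, which may reallocate the vector.
        const int n = staged[i].NumberOfChildren;
        Node* const kids = staged[i].Children;
        const std::size_t first_child = staged.size();
        for (int c = 0; c < n; ++c) {
            staged.push_back(kids[c]);
            staged.back().d_parent = d_base + i;
        }
        staged[i].d_children = (n > 0) ? d_base + first_child : nullptr;
    }
    return staged;
}
```

The staged array could then go over in one call, for example cudaMemcpy(d_base, staged.data(), staged.size() * sizeof(Node), cudaMemcpyHostToDevice). NodeName still holds a host address in this sketch, so the strings would need the same treatment before a kernel can read them.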
In terms of performance, you definitely want to avoid multiple small data transfers.