How can I find all possible combinations of a string? Using CUDA
I am trying to speed up my algorithm by using CUDA to find all possible combination of a string. What is the best way I can achieve this?
example:abc
gives:
a
b
c
ab
ac
bc
i have nothing so far. i am not asking for code. i am just asking for the best way to do i开发者_高级运维t? an algorithm? a pseudocode? maybe a discussion?
The advantage to using CUDA is massive parallelism with potentially thousands of threads with little overhead. To that end, you have to figure out a way to divide the problem into small chunks without relying too much on communication between the threads. In this problem you have n
characters and each can be either present or absent in each output string. This yields 2^n
total output strings. (You've left off the empty string and the original string from your list...if that's the desired result then you have 2^n - 2
total output strings.)
In any event, one way you can divide up the work of creating the strings is to assign each potential output string a number and have each thread compute the output strings for a certain range of numbers. The mapping from number to output string is easy if you look at the binary representation of each number. Each binary digit in an n
-bit number corresponds to a character in the string of length n
. Thus, for your example, the number 5 or 101
in binary maps to the string "ac"
. The strings you listed would be created by computing the mappings for numbers from 1 to 6 as follows:
1 c
2 b
3 bc
4 a
5 ac
6 ab
You could compute 7
to get abc
or 0
to get the empty string if desired.
Unless you're doing this for words longer than a dozen or so characters, I'm not sure this will be that much faster though. If you're doing it for words longer than 25 or so characters you might start running into memory constraints since you'll be wrangling hundreds of megabytes.
I will be very, very surprised if CUDA is the right solution to this problem.
However, I would write a kernel to find all substrings of length n, and launch the kernel in a loop for each value of n from 0 to the length of the string. Thus, each thread in a kernel will have exactly the same instructions (no threads will sit around idle while others finish).
Each thread will "find" one substring, so you might as well have thread i
find the substring starting at index i in the string. Note that each substring length requires a different number of threads.
so, for n=1:
thread 0: a
thread 1: b
thread 2: c
and for n=2:
thread 0: ab
thread 1: bc
精彩评论