Find shortest combination of words that contain every character from a specified set
I got an array (let's call it a1
) of words (like dog
, fish
, run
, programming
anything) that's really huge.
I can combine any of the words out of a1
with any other of the words (e.g. you could combine dog
and programming
into dog-programming
), and then again and again, until the strings gets really big.
I also got an array (let's call it a2
) of strings (such as de
, s
, x?
, umh
, again they could be pretty much anything). It is guaranteed that there are no strings in a2
that cannot be found in any string of a1.
What I'm looking for is the shortest string (shortest in terms of how many combinations it takes to create that string, not how many characters that string contains) that contains all of the strings inside a2
. If there are more than one string that all feature that minimum length, I can just pick any, and bail out of the program.
Now, I don't think I can just brute-force this because even if the arrays are relative small, there are virtually endless options of combining the words, but I'd love to be proven wrong!
Is there any good way to get the shortest string that will surely yield the shortest result or do I have to use heuristical algorithms to just search for pretty short strings?
I tried开发者_开发技巧 just picking a string from a1 that covered most strings from a2, then removing those items from a2, and starting again, but it doesn't work! It yields pretty good results, but not the best.
if you combine the words with a dash like in your example, e.g.
dog + programming + sky = dog-programming-sky
AND the words in A2 do not contain dashes, then it's a just SET-COVER in disguise, an NP-complete optimization problem. You can then solve the problem using any solution strategy available for SET-COVER. There is a fast approximation algorithm for SET-COVER, but if you want to have the absolute smallest solution, you'd need to resort to a worst-case exponential algorithm.
If you combine the words without a dash, e.g.
dog + programming + sky = dogprogrammingsky
then the problem is more difficult, because now e.g. "ogpro" is found in the combined word even if it isn't a substring of any of the constituent strings.
精彩评论