Find common chars in array of strings, in the right order

I spent days working on a function to get the common chars in an array of strings, in the right order, to create a wildcard.

Here is an example to explain my problem. I made about 3 functions, but I always have a bug when the absolute position of each letter is different.

Let's assume "+" is the "wildcard char":

Array(
0 => '48ca135e0$5',
1 => 'b8ca136a0$5',
2 => 'c48ca13730$5',
3 => '48ca137a0$5');

Should return :

$wildcard='+8ca13+0$5';

In this example, the tricky thing is that $array[2] has one more char than the others.

Another example:

Array(
0 => "case1b25.occHH&FmM",
1 => "case11b25.occHH&FmM",
2 => "case12b25.occHH&FmM",
3 => "case20b25.occHH&FmM1");

Should return :

$wildcard='case+b25.occHH&FmM+';

In this example, the tricky parts are :

- Repeating chars, such as 1 -> 11 in the "to delete" part, and c -> cc in the common part

- The "2" char in $array[2] & [3] in the "to delete" part is not in the same position

- The "1" char at the end of the last string

I really need help because I can't find a solution to this function and it is a main part of my application.

Thanks in advance, don't hesitate to ask questions, I will answer as fast as possible.

Mykeul


It seems you want to create something like a regular expression out of a set of example strings. This can be quite tricky in general. I found this link, not sure if it's relevant: http://scholar.google.com/scholar?hl=en&rlz=1B3GGGL_enEE351EE351&q=%22regular%20expression%20by%20example%22&oq=&um=1&ie=UTF-8&sa=N&tab=ws

On the other hand, if you only need one specific wildcard meaning "0 or more characters", it should be much easier. The Levenshtein distance algorithm computes the edit distance between two strings. Normally only the resulting number is needed, but in your case the positions of the differences matter. You also need to adapt it to N strings.

So I recommend studying this algorithm; hopefully you'll get some ideas on how to solve your problem (at the very least you'll get some practice with text algorithms and dynamic programming).
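To make the "positions of the differences" idea concrete, here is a rough sketch for two strings. Note that it swaps Levenshtein for its close relative, the longest common subsequence (LCS), since the LCS table is filled with the same kind of dynamic programming and walking it back gives the matching characters directly. It is written in Python purely as an illustration; the name lcs_wildcard is made up, and the code assumes the wildcard char ("+") never occurs in the input strings.

```python
def lcs_wildcard(a: str, b: str, wildcard: str = "+") -> str:
    """Hypothetical helper: wildcard pattern for two strings via an LCS table."""
    n, m = len(a), len(b)
    # lcs[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    out = []
    in_gap = False  # True while we are inside a run of unmatched chars
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            # Common char: keep it and advance in both strings.
            out.append(a[i])
            in_gap = False
            i += 1
            j += 1
            continue
        # Unmatched char: emit one wildcard per run, then skip on the side
        # that keeps the remaining common subsequence as long as possible.
        if not in_gap:
            out.append(wildcard)
            in_gap = True
        if lcs[i + 1][j] >= lcs[i][j + 1]:
            i += 1
        else:
            j += 1
    # Leftover chars at the end of either string also become a wildcard.
    if (i < n or j < m) and not in_gap:
        out.append(wildcard)
    return "".join(out)


print(lcs_wildcard("48ca135e0$5", "c48ca13730$5"))  # +48ca13+0$5
```

This only covers the two-string case; handling N strings still needs some way of combining the pairwise results.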

Here's the algorithm in PHP: http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#PHP

You might also want to search for PHP implementations of "diff": http://paulbutler.org/archives/a-simple-diff-algorithm-in-php/


Main code:
Step 1: Sort strings by length, shortest to longest, into array[]
Step 2: Compare string in array[0] and array[1] to get $temp_wildcard
Step 3: Compare string in array[2] with $temp_wildcard to create new $temp_wildcard
Step 4: Continue comparing each string with $temp_wildcard - the last $wildcard is your $temp_wildcard
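
Steps 1 to 4 could be sketched roughly like this. It is written in Python only as an illustration. The pairwise comparison here is a stand-in built on difflib.SequenceMatcher from the standard library, which matches longest common blocks and is not guaranteed to find the best possible alignment. The names merge_pair and build_wildcard are made up, and the code assumes "+" never occurs in the input strings, so a "+" already in the temporary wildcard can never match a real char.

```python
from difflib import SequenceMatcher
from functools import reduce

WILDCARD = "+"  # assumption: this char does not occur in the input strings


def merge_pair(pattern: str, s: str) -> str:
    """Hypothetical helper: combine the current wildcard pattern with one more string."""
    matcher = SequenceMatcher(None, pattern, s, autojunk=False)
    out = []
    pos_p = pos_s = 0  # first position not yet consumed in pattern / s
    for block in matcher.get_matching_blocks():
        # Anything skipped in either input since the previous block
        # (including an earlier wildcard in the pattern) becomes a wildcard.
        if block.a > pos_p or block.b > pos_s:
            out.append(WILDCARD)
        out.append(pattern[block.a:block.a + block.size])
        pos_p, pos_s = block.a + block.size, block.b + block.size
    # The final block returned by difflib is a zero-length sentinel at the
    # end of both inputs, so trailing leftovers are handled by the loop too.
    return "".join(out)


def build_wildcard(strings):
    if not strings:
        return ""
    ordered = sorted(strings, key=len)  # Step 1: shortest to longest
    return reduce(merge_pair, ordered)  # Steps 2-4: fold into one pattern


print(build_wildcard(["48ca135e0$5", "b8ca136a0$5", "c48ca13730$5", "48ca137a0$5"]))
# +8ca13+0$5
print(build_wildcard(["case1b25.occHH&FmM", "case11b25.occHH&FmM",
                      "case12b25.occHH&FmM", "case20b25.occHH&FmM1"]))
# case+b25.occHH&FmM+
```

Folding the strings in one at a time is a heuristic: it gives the expected result on both examples from the question, but the result can depend on the order of the strings.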

OK, so now we're down to the problem of how to compare two strings to return your wildcard string.

Subroutine code: Compare strings character-by-character, substituting wildcards into your return value when the comparison doesn't match.

To handle the problem of different lengths, run this comparison once for each possible offset, from 0 up to the difference in length between the two strings. (Compare string1[x] to string2[x+offset].) For each returned string, count the number of wildcard characters. The subroutine should return the answer with the fewest wildcard characters.
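
A literal reading of that subroutine might look like the sketch below (Python, as an illustration only). The function names are made up, "+" is assumed not to occur in the inputs, and runs of adjacent wildcards are collapsed into one at the end, which is an extra step not described above.

```python
import re


def compare_at_offset(short, long, offset, wildcard="+"):
    """Compare short[x] to long[x + offset], one wildcard per mismatch."""
    chars = [wildcard] * offset  # chars of `long` skipped before the overlap
    for x, ch in enumerate(short):
        chars.append(ch if ch == long[x + offset] else wildcard)
    # chars of `long` left over after the overlap
    chars.extend([wildcard] * (len(long) - len(short) - offset))
    return "".join(chars)


def offset_wildcard(s1, s2, wildcard="+"):
    short, long = sorted((s1, s2), key=len)
    candidates = [
        compare_at_offset(short, long, offset, wildcard)
        for offset in range(len(long) - len(short) + 1)
    ]
    # Keep the candidate with the fewest wildcard characters.
    best = min(candidates, key=lambda c: c.count(wildcard))
    # Collapse runs such as "++" into a single wildcard.
    return re.sub(f"(?:{re.escape(wildcard)})+", wildcard, best)


print(offset_wildcard("48ca135e0$5", "c48ca13730$5"))  # +48ca13+0$5
```

A single offset only covers the case where one string is shifted as a whole relative to the other. When the extra chars sit in the middle of the string, as with "case1b25..." and "case11b25..." in the second example, no offset lines up both the part before and the part after the insertion.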

Good luck!
