Phrase split algorithm in PHP

I'm not sure how to explain this, so let me use an example. Say I want to split the sentence

"Today is a great day."

into

today
today is
today is a
today is a great
today is a great day
is
is a
is a great
is a great day
a
a great
a great day
great
great day
day

The idea is to get every sequence of consecutive words in a sentence.

I have been wondering what the best way to do this in PHP is. Any ideas are welcome.


Here's an example:

$sentence = 'Today is a great day.';

// Only leave "word" characters and whitespace
$sentence = preg_replace('/[^\w\s]+/', '', strtolower($sentence));

// Tokenize
$tokens = explode(' ', $sentence);

for($i = 0; $i < count($tokens); $i++) {
    for($j = 1; $j <= count($tokens) - $i; $j++) {
        echo implode(' ', array_slice($tokens, $i, $j)) . "<br />";
    }
}

Output:

today
today is
today is a
today is a great
today is a great day
is
is a
is a great
is a great day
a
a great
a great day
great
great day
day


Split the sentence into an array of words using the PHP function explode. Then use two nested loops. The outer loop (i) runs over the array indices (0..count(array)-1) and selects the first word of each output line. The inner loop (j) runs from i+1 to the length of the array. Inside the inner loop, output the words from i to j-1: take that subarray with array_slice (offset i, length j-i) and join it with implode.
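The description above could be sketched roughly like this. The function name contiguousPhrases is made up for illustration, and the sketch assumes the words are separated by single spaces and that punctuation has already been stripped:

```php
<?php
// Hypothetical helper name; follows the i / j loop description above.
// Assumes single spaces between words and no punctuation in the input.
function contiguousPhrases(string $sentence): array
{
    $words = explode(' ', strtolower($sentence));
    $count = count($words);
    $phrases = [];

    for ($i = 0; $i < $count; $i++) {            // index of the first word
        for ($j = $i + 1; $j <= $count; $j++) {  // one past the last word
            // array_slice takes (offset, length), so the length is $j - $i.
            $phrases[] = implode(' ', array_slice($words, $i, $j - $i));
        }
    }

    return $phrases;
}

print_r(contiguousPhrases('Today is a great day'));
```

Returning an array instead of echoing each line leaves the caller free to format or filter the phrases.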


$phrase = 'Today is a great day';
$pieces = explode(' ', strtolower($phrase));
$sets = array();
// For each start index $i, collect every word from $i to the end
for ($i=0; $i<count($pieces);$i++) {
    for ($j=0; $j<count($pieces);$j++) {
        if ($i<=$j)
            $sets[$i][] = $pieces[$j];
    }
}
print "<ul>";
foreach($sets as $set) {
    // Print the longest phrase first, then drop the last word and repeat
    while(count($set) > 0) {
        print "<li>" . implode(' ', $set) . "</li>\n";
        array_pop($set);
    }
}
print "</ul>";

Result:

  • today is a great day
  • today is a great
  • today is a
  • today is
  • today
  • is a great day
  • is a great
  • is a
  • is
  • a great day
  • a great
  • a
  • great day
  • great
  • day


Recursive approach:

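function iterate($words) {
    // Stop recursing once every word has been used as a starting point
    if(($total = count($words)) > 0) {
        $str = '';
        for($i = 0; $i < $total; $i++ ) {
            // Grow the phrase one word at a time, without a leading space
            $str .= ($i === 0 ? '' : ' ') . $words[$i];
            echo $str . PHP_EOL;
        }
        // Drop the first word and repeat for the remaining words
        array_shift($words);
        iterate($words);
    }
}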

$text = "Today is a great day.";
$words = str_word_count($text, 1);
iterate($words);

The above only considers words. It does not lowercase them and it does not remove duplicates. Numbers are not counted as words, and neither is punctuation. With the given test sentence of five words, the recursive approach is only negligibly faster than the array_slice solution. However, the gap grows significantly with each additional word. In a quick benchmark on my machine with a ten-word sentence, it finished in almost half the time.
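To illustrate what "only considers words" means in practice, here is a small sketch of how str_word_count tokenizes. The input sentence is made up for the example:

```php
<?php
// Illustrative input, not taken from the question.
$text = 'Today is day 2 of a great week.';

// Format 1 returns the words as a list; digits and punctuation are dropped.
$words = str_word_count($text, 1);
// $words is ['Today', 'is', 'day', 'of', 'a', 'great', 'week']

// Case is preserved, so lowercase explicitly if the output should match
// the lowercase phrases shown in the question.
$lower = array_map('strtolower', $words);

print_r($lower);
```

If numbers should be kept as tokens, a different tokenizer (for example the preg_replace plus explode approach from the first answer) may be a better fit.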


Disclaimer: Isolated benchmarks depend on a number of factors and may produce different results on different machines. At best, they give an indication of code performance (often in the realm of micro-optimizations), but nothing more.
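For anyone who wants to repeat the comparison, a rough timing harness might look like the sketch below. It is not the benchmark used above. The helper timeIt and the ten-word test sentence are made up, the second variant is an iterative stand-in for the recursive function, and both variants collect phrases into an array so that output is not part of the measurement:

```php
<?php
// Hypothetical helper: runs $fn $runs times and returns elapsed seconds.
function timeIt(callable $fn, int $runs = 1000): float
{
    $start = microtime(true);
    for ($n = 0; $n < $runs; $n++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Made-up ten-word input.
$words = explode(' ', 'one two three four five six seven eight nine ten');

// Variant A: nested loops with array_slice + implode.
$slice = function () use ($words) {
    $out = [];
    $count = count($words);
    for ($i = 0; $i < $count; $i++) {
        for ($len = 1; $len <= $count - $i; $len++) {
            $out[] = implode(' ', array_slice($words, $i, $len));
        }
    }
    return $out;
};

// Variant B: grow each phrase by string concatenation.
$concat = function () use ($words) {
    $out = [];
    $count = count($words);
    for ($i = 0; $i < $count; $i++) {
        $str = '';
        for ($j = $i; $j < $count; $j++) {
            $str = $str === '' ? $words[$j] : $str . ' ' . $words[$j];
            $out[] = $str;
        }
    }
    return $out;
};

printf("array_slice:   %.4fs\n", timeIt($slice));
printf("concatenation: %.4fs\n", timeIt($concat));
```

The absolute numbers will vary between machines and PHP versions, so only the relative difference is worth looking at.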
