Decode pages line by line in PHP?

I would like to check whether every word in a text file exists in any of the "LINES" of another large dictionary text file.

Every way I have tried this has failed, or worked only briefly.

How can I do this without a million nested loops?


I'm answering this way too often. But a regex would avoid much of the looping.

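// get words (runs of two or more letters)
preg_match_all(':\p{L}{2,}:u', $text_file, $words);
$words = array_unique(end($words));

// make a search regex  "abc|foobar|xyz|text|.."
$rx_words = implode("|", $words);

// find all words that exist on a line of their own
// (double quotes so $rx_words is interpolated; the m flag makes ^ and $ match per line)
preg_match_all(":^($rx_words)$:mu", file_get_contents("LINES"), $cmp);

// everything found if no word from the text is left over:
$found_all = !array_diff($words, $cmp[1]);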

Reading in the whole LINES file can be avoided with some extra coding. But I wanted to keep it simple here.
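
The "extra coding" mentioned above might look roughly like the following. It is only a sketch under a few assumptions: `findMissingWords` is a hypothetical helper name, `$words` is the word list extracted from the text, and a word counts as found only when it makes up a whole line of the dictionary file. The file is read one line at a time with `fgets`, so it never has to fit in memory.

```php
<?php
/**
 * Returns the words from $words that never appear as a whole line of the file.
 * Hypothetical helper, not part of the original answer.
 */
function findMissingWords(array $words, string $dictionaryPath): array
{
    // Use the words as array keys so each check is a hash lookup.
    $pending = array_fill_keys($words, true);

    $handle = fopen($dictionaryPath, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $dictionaryPath");
    }

    // Read one line at a time; stop early once every word has been seen.
    while ($pending && ($line = fgets($handle)) !== false) {
        unset($pending[trim($line)]);
    }
    fclose($handle);

    // Whatever is still pending was never found.
    return array_keys($pending);
}
```

With this, `$found_all` would simply be `!findMissingWords($words, "LINES")`.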


Pseudocode, if you have enough memory:

for each line in text file:
   break line into words
   for each word in line:
       $wordMap[lowercase($word)] = 1;

for each line in dictionary file:
   break line into words
   for each word:
       if $wordMap[lowercase($word)] == 1:
          line has word $word
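
One possible PHP rendering of the pseudocode above, as a sketch rather than a definitive implementation. The function name `linesContainingWords` is made up for illustration, words are taken to be runs of letters, and `strtolower` only folds ASCII case (swap in `mb_strtolower` if the mbstring extension is available and the text is not ASCII).

```php
<?php
/**
 * Maps each word of $text to the 1-based dictionary line numbers containing it.
 * Hypothetical helper; an empty list means the word was not found on any line.
 */
function linesContainingWords(string $text, iterable $dictionaryLines): array
{
    // Pass 1: collect the lowercased words of the text as array keys.
    preg_match_all('/\p{L}+/u', strtolower($text), $m);
    $wordMap = array_fill_keys($m[0], []);

    // Pass 2: scan the dictionary once, noting the lines that contain each word.
    $lineNo = 0;
    foreach ($dictionaryLines as $line) {
        $lineNo++;
        preg_match_all('/\p{L}+/u', strtolower($line), $m);
        foreach (array_unique($m[0]) as $word) {
            if (isset($wordMap[$word])) {
                $wordMap[$word][] = $lineNo;
            }
        }
    }
    return $wordMap;
}
```

Passing `new SplFileObject("LINES")` as the second argument should stream the dictionary line by line, so only the words of the text file are held in memory.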

If you don't have enough memory for $wordMap, then make $wordMap some sort of database. You might also try a Bloom filter (http://code.google.com/p/php-bloom-filter/, http://en.wikipedia.org/wiki/Bloom_filter).
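
To give an idea of what a Bloom filter does, here is a minimal sketch. The class below is purely illustrative and is not the API of the php-bloom-filter project linked above; the bit-array size, the number of hashes, and the md5-based double hashing are all assumptions chosen for brevity. A Bloom filter can answer "definitely not present" or "possibly present", so a positive answer may still need confirming against the real data.

```php
<?php
/**
 * Minimal Bloom filter: may report false positives, never false negatives.
 * Illustrative only; parameters are not tuned.
 */
final class BloomFilter
{
    private string $bits;
    private int $size;
    private int $hashes;

    public function __construct(int $sizeInBits, int $hashes)
    {
        $this->size = $sizeInBits;
        $this->hashes = $hashes;
        // One byte stores eight bits; start with all bits cleared.
        $this->bits = str_repeat("\0", intdiv($sizeInBits + 7, 8));
    }

    /** Derives the bit positions for an item from two base hashes. */
    private function positions(string $item): array
    {
        $md5 = md5($item);
        $h1 = hexdec(substr($md5, 0, 7));
        $h2 = hexdec(substr($md5, 7, 7)) | 1; // keep the step non-zero
        $positions = [];
        for ($i = 0; $i < $this->hashes; $i++) {
            $positions[] = ($h1 + $i * $h2) % $this->size;
        }
        return $positions;
    }

    public function add(string $item): void
    {
        foreach ($this->positions($item) as $p) {
            $byte = intdiv($p, 8);
            $this->bits[$byte] = chr(ord($this->bits[$byte]) | (1 << ($p % 8)));
        }
    }

    public function mightContain(string $item): bool
    {
        foreach ($this->positions($item) as $p) {
            if ((ord($this->bits[intdiv($p, 8)]) & (1 << ($p % 8))) === 0) {
                return false; // at least one bit is clear: definitely absent
            }
        }
        return true; // all bits set: possibly present
    }
}
```

In this setting one would `add()` each lowercased word while building the set, then call `mightContain()` where the pseudocode looks a word up in `$wordMap`.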
