Decode pages line by line in PHP?
I would like to check if every word in a text file exists in any "LINES" of another large dictionary text file.
Every way I have tried this has failed, or worked only briefly.
How can I do this without a million nested loops?
I'm answering this way too often. But a regex would avoid much of the looping.
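// get words (runs of 2+ Unicode letters) from the text file's contents
preg_match_all(':\p{L}{2,}:u', $text_file, $words);
$words = array_unique($words[0]);
// make a search regex "abc|foobar|xyz|text|..", escaping each word
$rx_words = implode("|", array_map(fn($w) => preg_quote($w, ':'), $words));
// find all words that appear as a whole line; the m modifier makes ^ and $ match at every line break
preg_match_all(":^($rx_words)\r?$:mu", file_get_contents("LINES"), $cmp);
// everything found if no word is missing from the matches:
$found_all = !array_diff($words, $cmp[1]);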
Reading the whole LINES file into memory can be avoided with some extra coding, but I wanted to keep it simple here.
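A minimal sketch of that extra coding, assuming LINES holds one dictionary entry per line and that matching is case-sensitive, as in the regex version. Instead of building one giant alternation (which, for very large word lists, can run into PCRE's compiled-pattern size limit), it keeps the words as array keys and crosses them off while streaming the file with fgets(). The function name allWordsInDictionary() is hypothetical.

```php
<?php
// Sketch: check that every word of $text appears as a whole line of the
// dictionary file at $dictPath, reading the file one line at a time.
// Assumption: one dictionary entry per line; allWordsInDictionary() is a
// hypothetical helper name, not an existing API.
function allWordsInDictionary(string $text, string $dictPath): bool
{
    // Same word extraction as the regex answer: runs of 2+ Unicode letters.
    preg_match_all('/\p{L}{2,}/u', $text, $m);
    // Words become array keys, so each lookup/removal is a hash operation.
    $pending = array_flip($m[0]);
    if (!$pending) {
        return true; // no words to look for
    }

    $fh = fopen($dictPath, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $dictPath");
    }
    while (($line = fgets($fh)) !== false) {
        // Drop the trailing "\n" or "\r\n" before comparing.
        $entry = rtrim($line, "\r\n");
        unset($pending[$entry]);
        if (!$pending) {
            break; // every word found, stop reading early
        }
    }
    fclose($fh);

    // Any keys left over are words that never appeared as a line.
    return !$pending;
}
```

Only the set of words is held in memory; the dictionary is never loaded as a whole, and the loop exits as soon as the last word has been seen.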
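If you have enough memory, load the words into a hash map first and look up each word of every line in it:
// Build a set of every (lowercased) word in the text file ('words.txt' is a hypothetical path).
$wordMap = [];
foreach (file('words.txt') as $line) {
    foreach (preg_split('/[^\p{L}]+/u', $line, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        $wordMap[mb_strtolower($word)] = 1;
    }
}
// Read LINES one line at a time and report which words each line contains.
$fh = fopen('LINES', 'r');
while (($line = fgets($fh)) !== false) {
    foreach (preg_split('/[^\p{L}]+/u', $line, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        if (isset($wordMap[mb_strtolower($word)])) {
            echo "line has word $word\n";
        }
    }
}
fclose($fh);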
If you don't have enough memory for $wordMap, then make $wordMap some sort of database. You might also try a bloom filter (http://code.google.com/p/php-bloom-filter/, http://en.wikipedia.org/wiki/Bloom_filter).
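To show what the Bloom filter idea amounts to, here is a minimal sketch in plain PHP. It is not the API of the php-bloom-filter project linked above; the class BloomFilter and its methods add() and mightContain() are hypothetical names. It stores membership in a fixed-size bit array, so memory stays constant no matter how many words are added, at the cost of occasional false positives (never false negatives). It assumes 64-bit PHP 8.

```php
<?php
// Minimal Bloom filter sketch (hypothetical class, not a library API).
// Bits live in a binary string: bit $p is bit ($p & 7) of byte ($p >> 3).
final class BloomFilter
{
    private string $bits;

    public function __construct(private int $size = 1 << 20, private int $hashes = 4)
    {
        // One byte holds 8 bits; round up to cover $size bits.
        $this->bits = str_repeat("\0", intdiv($size + 7, 8));
    }

    // Derive $hashes bit positions via double hashing (h1 + i*h2),
    // taking two unsigned 32-bit values from one MD5 digest.
    private function positions(string $item): array
    {
        $h = unpack('N2', md5($item, true)); // keys 1 and 2
        $out = [];
        for ($i = 0; $i < $this->hashes; $i++) {
            $out[] = ($h[1] + $i * $h[2]) % $this->size;
        }
        return $out;
    }

    public function add(string $item): void
    {
        foreach ($this->positions($item) as $p) {
            $byte = $p >> 3;
            $this->bits[$byte] = chr(ord($this->bits[$byte]) | (1 << ($p & 7)));
        }
    }

    // false means "definitely not added"; true means "probably added".
    public function mightContain(string $item): bool
    {
        foreach ($this->positions($item) as $p) {
            if ((ord($this->bits[$p >> 3]) & (1 << ($p & 7))) === 0) {
                return false;
            }
        }
        return true;
    }
}
```

In this use you would add() every (lowercased) word from the text file, then stream LINES and call mightContain() on each of its words. A false answer can be trusted as-is; a true answer may need confirming against a slower exact store, such as the database mentioned above, if false positives matter.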