How to search a multidimensional array using GET
Hey guys, I've had a lot of help from everyone here and I really appreciate it! I'm trying to create a text file search engine and I think I'm on the final stretch now. All I need to do now is search the multidimensional array I've created for a word submitted by a form and read from GET, and return the results in order from highest to lowest (TF-IDF will come later). I can perform a simple search on the content variable ($new_content in the code), which is not really what I want, but not on the $index array.
Here is my code:
<?php
$starttime = microtime();
$startarray = explode(" ", $starttime);
$starttime = $startarray[1] + $startarray[0];
if(isset($_GET['search']))
{
$searchWord = $_GET['search'];
}
else
{
$searchWord = null;
}
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Untitled Document</title>
</head>
<body>
<div id="wrapper">
<div id="searchbar">
<h1>PHP Search</h1>
<form name='searchform' id='searchform' action='<?php echo $_SERVER['PHP_SELF']; ?>' method='get'>
<input type='text' name='search' id='search' value='<?php echo $_GET['search']; ?>' />
<input type='submit' value='Search' />
</form>
<br />
<br />
</div><!-- close searchbar -->
<?php
include "commonwords.php";
$index = array();
$words = array();
// All files with a .txt extension
// Alternate way would be "/path/to/dir/*"
foreach (glob("./files/*.txt") as $filename) {
// Includes the file based on the include_path
$content = file_get_contents($filename, true);
$pat[0] = "/^\s+/";
$pat[1] = "/\s{2,}/";
$pat[2] = "/\s+\$/";
$rep[0] = "";
$rep[1] = " ";
$rep[2] = "";
$new_content = preg_replace("/[^A-Za-z0-9\s\s+]/", "", $content);
$new_content = preg_replace($pat, $rep, $new_content);
$new_content = strtolower($new_content);
preg_match_all('/\S+/',$new_content,$matches,PREG_SET_ORDER);
foreach ($matches as $match) {
if (!isset($words[$filename][$match[0]]))
$words[$filename][$match[0]]=0;
$words[$filename][$match[0]]++;
}
foreach ($commonWords as $value)
if (isset($words[$filename][$value]))
unset($words[$filename][$value]);
$results = 0;
$totalCount = count($words[$filename]);
// And another item to the list
$index[] = array(
'filename' => $filename,
'word' => $words[$filename],
'all_words_count' => $totalCount
);
}
echo '<pre>';
print_r($index);
echo '</pre>';
if(isset($_GET['search']))
{
$endtime = microtime();
$endarray = explode(" ", $endtime);
$endtime = $endarray[1] + $endarray[0];
$totaltime = $endtime - $starttime;
$totaltime = round($totaltime,5);
echo "<div id='timetaken'><p>This page loaded in $totaltime seconds.</p></div>";
}
?>
</div><!-- close wrapper -->
</body>
</html>
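// Look up the search word in each file's word counts.
foreach ($index as $result) {
    if (!is_null($searchWord) && array_key_exists($searchWord, $result['word'])) {
        echo "Found " . $searchWord . " in " . $result['filename'] . " " . $result['word'][$searchWord] . " times\r\n";
    }
}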
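If you also want the results ordered from highest to lowest, as the question asks, one option is to collect the matches first and sort them before printing. This is only a rough sketch: the function name searchIndex and the sample data are made up for illustration, and it assumes $index has the shape built in the question.

```php
<?php
// Hypothetical helper: returns filename => count for every file that
// contains $searchWord, sorted from the highest count to the lowest.
function searchIndex(array $index, $searchWord)
{
    // The index stores lowercased words, so normalise the term the same way.
    $searchWord = strtolower(trim((string) $searchWord));
    $hits = array();
    if ($searchWord === '') {
        return $hits; // nothing submitted, nothing to search for
    }
    foreach ($index as $result) {
        if (isset($result['word'][$searchWord])) {
            $hits[$result['filename']] = $result['word'][$searchWord];
        }
    }
    arsort($hits); // sort by count, descending, keeping the filename keys
    return $hits;
}

// Made-up sample data in the same shape as $index from the question.
$index = array(
    array('filename' => './files/a.txt', 'word' => array('php' => 2, 'array' => 1), 'all_words_count' => 2),
    array('filename' => './files/b.txt', 'word' => array('php' => 5), 'all_words_count' => 1),
    array('filename' => './files/c.txt', 'word' => array('search' => 3), 'all_words_count' => 1),
);

foreach (searchIndex($index, 'PHP') as $filename => $count) {
    echo "Found php in $filename $count times\n";
}
```

arsort keeps the filename keys attached to their counts, which is why the matches are stored as filename => count rather than as a plain list.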
As an aside, I would highly recommend only searching the files if a search term has actually been submitted, rather than searching on every refresh of the page.
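Something along these lines would do it. Treat it as a sketch: getSearchWord is a hypothetical helper name, and the body of the if is only a placeholder for the file loop from the question.

```php
<?php
// Hypothetical helper: read and normalise the search term from a query
// array such as $_GET, or return null when nothing usable was submitted.
function getSearchWord(array $query)
{
    if (!isset($query['search']) || !is_string($query['search'])) {
        return null;
    }
    $word = strtolower(trim($query['search']));
    return $word === '' ? null : $word;
}

$searchWord = getSearchWord($_GET);

if (!is_null($searchWord)) {
    // Only now do the expensive part: glob the files, build $index, search it.
    echo "Searching for " . htmlspecialchars($searchWord, ENT_QUOTES, 'UTF-8') . "\n";
}
```

With this in place, a plain page load without ?search=... skips the file processing entirely.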
Also, some other things to keep in mind:
- Make sure you declare variables before using them (for example your $pat and $rep variables: write $pat = Array(); before assigning to it).
- You do the right thing at the top by checking for the search term and setting $searchWord, but then you keep referencing $_GET['search']. I would advise using $searchWord and checking is_null($searchWord) throughout the page instead of using $_GET. It's good practice not to output those variables on the page without an integrity check.
- It may also be more useful to check whether the $searchWord (or words) is in $commonWords before processing the files. That could take some time off the search if there are a lot of files or big files with a lot of words.
I also don't fully understand why you're storing all words when you are only looking for keywords; if the index gets too big you'll be hitting a memory limit in the near future.
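To make the last two points a bit more concrete, here is a minimal sketch. The helper names isSearchable and e are made up, and the $commonWords list here is only a stand-in for whatever commonwords.php defines.

```php
<?php
// Made-up stand-in for the list defined in commonwords.php.
$commonWords = array('the', 'and', 'of');

// Hypothetical helper: true if the term is set and is not a common word.
function isSearchable($searchWord, array $commonWords)
{
    return !is_null($searchWord)
        && $searchWord !== ''
        && !in_array($searchWord, $commonWords, true);
}

// Hypothetical helper: escape a value before echoing it into HTML.
function e($value)
{
    return htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
}

$searchWord = (isset($_GET['search']) && is_string($_GET['search']))
    ? strtolower(trim($_GET['search']))
    : null;

// Echo the escaped $searchWord instead of the raw $_GET value.
echo "<input type='text' name='search' id='search' value='" . e($searchWord) . "' />\n";

if (!isSearchable($searchWord, $commonWords)) {
    // Common words were removed from the index anyway, so skip the files.
    echo "<p>Nothing to search for.</p>\n";
}
```

Since the common words are unset from the index in your loop, a search for one of them can never match, so checking first avoids reading the files for nothing.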