Search Engine Keywords Parser
Here is what I want to do:
I need to create a search engine parser that uses the following operators:
- Apples AND Oranges (AND operator)
- Apples OR Oranges (OR operator)
- Apples AND NOT Oranges (AND NOT operator)
- " Apples " (Quotes operator)
- Apples AND ( Oranges OR Pears ) (Parentheses operator)
- Appl* (Star operator)
With some preg_replace calls, I managed to convert the string into an array, and then I parsed this array to build a MySQL query. But I don't like that approach, and it's very unstable!
I searched the web for a script that does this, with no luck.
Can someone please help me implement this?
Thanks
Ok, this is going to be a large answer.
I think what you need is a parser generator: a piece of software that generates code to parse text according to a given grammar. These parsers usually have two main components, a lexer and a parser. The lexer identifies tokens (words); the parser checks whether the order of the tokens is valid according to your grammar.
In the lexer, you should declare the following tokens:
TOKENS ::= (AND, OR, NOT, WORD, WORDSTAR, LPAREN, RPAREN, QUOTE)
WORD ::= '/\w+/'
WORDSTAR ::= '/\w+\*/'
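As a rough illustration of the lexer step, here is a minimal PHP sketch that uses the token names above. The function name `tokenize` and the `[type, text]` pair layout are assumptions made for this example, not part of any parser generator. Characters that match no token are silently skipped, which a real lexer would probably report as an error.

```php
<?php
// Hypothetical helper: turns a query string into a list of [type, text] tokens.
function tokenize(string $query): array
{
    // Match parentheses, double quotes, and words with an optional trailing star.
    preg_match_all('/[()"]|\w+\*?/u', $query, $matches);

    $tokens = [];
    foreach ($matches[0] as $text) {
        if ($text === '(') {
            $type = 'LPAREN';
        } elseif ($text === ')') {
            $type = 'RPAREN';
        } elseif ($text === '"') {
            $type = 'QUOTE';
        } elseif (in_array($text, ['AND', 'OR', 'NOT'], true)) {
            // Assumption: operators are recognised only in upper case.
            $type = $text;
        } elseif (substr($text, -1) === '*') {
            $type = 'WORDSTAR';
        } else {
            $type = 'WORD';
        }
        $tokens[] = [$type, $text];
    }
    return $tokens;
}
```

Calling `tokenize('Appl* AND ( Oranges OR Pears )')` yields a flat token list that a parser can then walk from left to right.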
The grammar should be defined like this:
QUERY ::= word
QUERY ::= wordstar
QUERY ::= lparen QUERY rparen
QUERY ::= QUERY and QUERY
QUERY ::= QUERY or QUERY
QUERY ::= QUERY and not QUERY
QUERY ::= quote MQUERY quote
MQUERY ::= word MQUERY
MQUERY ::= word
This grammar defines a language with all the features you need. Depending on the software you use, you can define functions to handle each rule. That way, you can transform your text query into an SQL WHERE clause.
I'm not really into PHP, but I searched the web for a parser generator and PHP_ParserGenerator came up.
Keep in mind that as your database grows, these queries may become a problem for a structured storage system.
You may want to try a full-text search engine, which lets you do this and offers many other text-search features. This is how IndexTank works:
First, you add (or 'index', in search jargon) all your DB records (or documents) to IndexTank.
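If you would rather not pull in a generator, the same grammar can be handled by a small hand-written recursive-descent parser. The sketch below is only one possible reading of the rules: the class name `QueryParser`, the column name `field`, and the choice to let AND bind tighter than OR are all assumptions (the grammar as written is ambiguous about precedence). Words become `= ?` conditions and starred words become `LIKE ?` conditions, with the values collected separately for a prepared statement.

```php
<?php
// Hypothetical sketch: parses a query into a SQL condition with "?" placeholders.
final class QueryParser
{
    private array $tokens;
    private int $pos = 0;
    private string $column = '`field`'; // assumed column name
    public array $params = [];          // values to bind, in placeholder order

    public function __construct(string $query)
    {
        preg_match_all('/[()"]|\w+\*?/u', $query, $m);
        $this->tokens = $m[0];
    }

    public function parse(): string
    {
        $sql = $this->parseOr();
        if ($this->peek() !== null) {
            throw new InvalidArgumentException('Unexpected token: ' . $this->peek());
        }
        return $sql;
    }

    private function peek(): ?string
    {
        return $this->tokens[$this->pos] ?? null;
    }

    private function expect(string $text): void
    {
        if ($this->peek() !== $text) {
            throw new InvalidArgumentException("Expected $text");
        }
        $this->pos++;
    }

    private function condition(string $operator, string $value): string
    {
        $this->params[] = $value;
        return "{$this->column} $operator ?";
    }

    // QUERY ::= QUERY or QUERY (assumed lowest precedence)
    private function parseOr(): string
    {
        $left = $this->parseAnd();
        while ($this->peek() === 'OR') {
            $this->pos++;
            $left = "($left OR " . $this->parseAnd() . ')';
        }
        return $left;
    }

    // QUERY ::= QUERY and QUERY | QUERY and not QUERY
    private function parseAnd(): string
    {
        $left = $this->parseTerm();
        while ($this->peek() === 'AND') {
            $this->pos++;
            $negate = $this->peek() === 'NOT';
            if ($negate) {
                $this->pos++;
            }
            $right = $this->parseTerm();
            $left = "($left AND " . ($negate ? "NOT $right" : $right) . ')';
        }
        return $left;
    }

    // QUERY ::= word | wordstar | lparen QUERY rparen | quote MQUERY quote
    private function parseTerm(): string
    {
        $token = $this->peek();
        if ($token === null || in_array($token, [')', 'AND', 'OR', 'NOT'], true)) {
            throw new InvalidArgumentException('Expected a word, a phrase, or "("');
        }
        $this->pos++;
        if ($token === '(') {
            $inner = $this->parseOr();
            $this->expect(')');
            return $inner;
        }
        if ($token === '"') {
            $words = [];
            while ($this->peek() !== null && $this->peek() !== '"') {
                $words[] = $this->tokens[$this->pos++];
            }
            $this->expect('"');
            return $this->condition('=', implode(' ', $words));
        }
        if (substr($token, -1) === '*') {
            // Escape LIKE metacharacters in the prefix, then add the wildcard.
            return $this->condition('LIKE', addcslashes(substr($token, 0, -1), '%_\\') . '%');
        }
        return $this->condition('=', $token);
    }
}
```

For example, parsing `Apples AND ( Oranges OR Pears )` gives a nested condition with three placeholders, and `$parser->params` holds the three words in the same order. Malformed input such as a missing closing parenthesis raises an `InvalidArgumentException` instead of producing a broken query.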
$api = new ApiClient(...);
$index = $api->get_index('my_index');
foreach ($dbRows as $row) {
    $index->add_document($row->id, array('text' => $row->text));
}
After that, you can search the index with all the operators you want:
$index = $api->get_index('my_index');
$search_result = $index->search('Apples AND Oranges');
$search_result = $index->search('Apples OR Oranges');
$search_result = $index->search('Apples AND NOT Oranges');
$search_result = $index->search('"apples oranges"');
$search_result = $index->search('Apples AND ( Oranges OR Pears )');
$search_result = $index->search('Appl*');
I hope I answered your question.
Also, this is not exactly what you're looking for, but maybe close: MySQL Full-text searching.
- http://devzone.zend.com/article/1304
- http://www.artfulcode.net/articles/full-text-searching-mysql/
- http://jeremy.zawodny.com/blog/archives/000576.html
Did you look at ANTLR?
You could homebrew something like the following (IMPORTANT: the $search string must be sanitized first, or you will be open to SQL injection):
if ($search[0] == '*' and substr($search, -1) == '*') {
    // *ppl*
    $query = "SELECT * FROM `table` WHERE `field` LIKE '%" . str_replace('*', '', $search) . "%'";
} elseif (substr($search, -1) == '*') {
    // Appl*
    $query = "SELECT * FROM `table` WHERE `field` LIKE '" . str_replace('*', '', $search) . "%'";
} elseif ($search[0] == '*') {
    // *Appl
    $query = "SELECT * FROM `table` WHERE `field` LIKE '%" . str_replace('*', '', $search) . "'";
} elseif (substr_count($search, '"') == 2) {
    // " Apples " ... just remove the " and the surrounding spaces
    $query = 'SELECT * FROM `table` WHERE `field` = "' . trim(str_replace('"', '', $search)) . '"';
} elseif (strpos($search, ')') !== false or strpos($search, '(') !== false) {
    // uh ... something more complex here
    $query = '#idunno';
} else {
    // the rest
    $query = 'SELECT * FROM `table` WHERE `field` = "' . $search . '"';
    // strtr() tries the longest key first and never rescans replaced text,
    // so ' AND NOT ' is not consumed by ' AND '.
    $query = strtr($query, array(
        ' AND NOT ' => '" AND `field` != "',
        ' AND '     => '" AND `field` = "',
        ' OR '      => '" OR `field` = "'
    ));
}
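One way to sidestep the sanitizing problem for the wildcard cases is to keep the user's text out of the SQL string entirely and bind it as a parameter. The following is a minimal sketch of that idea; the function name `wildcardCondition` and its return shape are made up for this example, and it covers only a single term with optional leading or trailing stars.

```php
<?php
// Hypothetical helper: maps one wildcard term to [SQL condition, bound values].
function wildcardCondition(string $search, string $column = '`field`'): array
{
    $startsWithStar = $search !== '' && $search[0] === '*';
    $endsWithStar = substr($search, -1) === '*';

    if (!$startsWithStar && !$endsWithStar) {
        // No wildcard: plain equality.
        return ["$column = ?", [$search]];
    }

    // Drop the outer stars, then escape LIKE metacharacters in what is left.
    $text = addcslashes(trim($search, '*'), '%_\\');
    $pattern = ($startsWithStar ? '%' : '') . $text . ($endsWithStar ? '%' : '');

    return ["$column LIKE ?", [$pattern]];
}
```

With PDO, the result could be used as `[$sql, $params] = wildcardCondition('Appl*');` followed by `$stmt = $pdo->prepare("SELECT * FROM `table` WHERE $sql");` and `$stmt->execute($params);`, assuming `$pdo` is an existing connection. The column name still ends up in the SQL text, so it should come from your own code, never from user input.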
Try this: http://www.isearchthenet.com/isearch/index.php
From readme:
- Searches are normally performed with "may contain" words. A match requires any of the words entered to be present on the page.
- You can search for pages which contain a specific word by prefixing it with a plus (+) sign. Only pages which contain that word will be shown.
- You can ignore all pages which contain a specific word by prefixing it with a minus (-) sign. Any page that contains that word will not be displayed in the search results.
- You can search for a specific phrase by enclosing it in double quotes ("). Only pages that contain that exact phrase will be shown.
It's easy to install and use. Also take a look at http://sphinxsearch.com/ - the most powerful engine, but not for newbies.