MySQL: Search multiple rows for a string, and order the results based on how often the string appears

EDIT: As can be seen, I decided to go with MySQL's MATCH. That said, if someone knows of a clean method to do what I wanted within a SELECT statement, I would appreciate the information (knowledge for knowledge's sake and all that).

I'm currently developing a local search engine for a website I'm designing, and one way I determine the relevance of an article is the number of times the search terms appear in the article itself. I'm therefore looking for an SQL query that pulls the rows (articles) containing the search term, and then orders them by how many times the search term appears in each row (article).

In other words, I need something like this...

SELECT article_id FROM articles_table WHERE article_content LIKE '%Search Terms%' ORDER BY COUNT(number of times string appears in article_content);

So if a user were to search for "The Empire" and pulled up the following three articles...

  1. The Empire is The Empire.
  2. The Empire is the name of a position in baseball.
  3. The Empire The Empire The Empire.

It would sort them like so:

  1. The Empire The Empire The Empire
  2. The Empire is The Empire
  3. The Empire is the name of a position in baseball.

I am working in PHP, and although ideally I would like to perform this operation with nothing more than one SQL query, I'm open to PHP solutions if this is not possible.

Any and all help is greatly appreciated.


You should really consider a full-text search solution. Either use MyISAM tables and MySQL's native full-text search, or go the external way and use something like Sphinx or Lucene.


I totally agree with the other answers. Theoretically, you could do something like this

select (char_length('The Empire The Empire The Empire') - 
       char_length(replace(lower('The Empire The Empire The Empire'),lower('empire'),''))) / char_length('empire') as occurrences

to find how often a search term occurs in your string, but this is a terrible method.
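For readers who want to check the arithmetic outside the database, here is a minimal sketch of the same length-difference idea in Python rather than the asker's PHP. The function name count_occurrences is hypothetical and only for illustration. It lowercases both strings before comparing, counts non-overlapping matches, and rejects an empty term to avoid dividing by zero.

```python
def count_occurrences(text: str, term: str) -> int:
    """Count non-overlapping, case-insensitive occurrences of term in text.

    Mirrors the SQL expression: (length before - length after removal) / length of term.
    """
    if not term:
        raise ValueError("term must not be empty")
    lowered = text.lower()
    needle = term.lower()
    # Every removed occurrence shortens the string by len(needle) characters.
    removed_chars = len(lowered) - len(lowered.replace(needle, ""))
    return removed_chars // len(needle)


# 32 characters shrink to 14 once 'empire' is removed; 18 / 6 gives 3.
print(count_occurrences("The Empire The Empire The Empire", "empire"))  # prints 3
```

Python's built-in str.count would give the same result for lowercased input; the longer form is shown only because it matches the SQL expression step by step.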


Not strictly an answer, but have you considered a full-text search engine such as Lucene?

Rather than building your own, which will not be as good, I mean.


Here is a clever approach that does not use FULLTEXT searching:

use test
DROP TABLE IF EXISTS articles_table;
CREATE TABLE articles_table
(
article_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
article_content TEXT
) ENGINE=MyISAM;
INSERT INTO articles_table (article_content) VALUES
('The Empire is The Empire'),
('The Empire is the name of a position in baseball.'),
('The Empire The Empire The Empire');
SELECT * FROM articles_table;

lwdba@localhost (DB test) :: SELECT * FROM articles_table;
+------------+---------------------------------------------------+
| article_id | article_content                                   |
+------------+---------------------------------------------------+
|          1 | The Empire is The Empire                          |
|          2 | The Empire is the name of a position in baseball. |
|          3 | The Empire The Empire The Empire                  |
+------------+---------------------------------------------------+
3 rows in set (0.00 sec)

SELECT article_content,
REPLACE(article_content,'The Empire','') newstring,
LENGTH(article_content) origlen,
LENGTH(REPLACE(article_content,'The Empire','')) newlen,
FLOOR((LENGTH(article_content) - LENGTH(REPLACE(article_content,'The Empire','')))/(LENGTH('The Empire'))) score
FROM articles_table;

+---------------------------------------------------+-----------------------------------------+---------+--------+-------+
| article_content                                   | newstring                               | origlen | newlen | score |
+---------------------------------------------------+-----------------------------------------+---------+--------+-------+
| The Empire is The Empire                          |  is                                     |      24 |      4 |     2 |
| The Empire is the name of a position in baseball. |  is the name of a position in baseball. |      49 |     39 |     1 |
| The Empire The Empire The Empire                  |                                         |      32 |      2 |     3 |
+---------------------------------------------------+-----------------------------------------+---------+--------+-------+

The score is the number of times the search string was deleted from the original string: the difference in length divided by the length of the search string.

Trim the query to show only the original text and the score, sorted by score in descending order:

SELECT * FROM (SELECT article_content,FLOOR((LENGTH(article_content) - LENGTH(REPLACE(article_content,'The Empire','')))/(LENGTH('The Empire'))) score FROM articles_table) AA ORDER BY score DESC;

Here is the final product

lwdba@localhost (DB test) :: SELECT * FROM (SELECT article_content,FLOOR((LENGTH(article_content) - LENGTH(REPLACE(article_content,'The Empire','')))/(LENGTH('The Empire'))) score FROM articles_table) AA ORDER BY score DESC;
+---------------------------------------------------+-------+
| article_content                                   | score |
+---------------------------------------------------+-------+
| The Empire The Empire The Empire                  |     3 |
| The Empire is The Empire                          |     2 |
| The Empire is the name of a position in baseball. |     1 |
+---------------------------------------------------+-------+
3 rows in set (0.06 sec)

Just insert any desired string into the two places in the query!
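When the string comes from user input, it is safer to bind it as a parameter than to paste it into the SQL text. The following is a rough sketch of that idea, not a drop-in for the asker's setup: it is written in Python instead of PHP, and it uses the standard-library sqlite3 module as a stand-in for MySQL so that it runs anywhere. The helper name rank_articles, the :term placeholder, the extra INSTR filter, and the fourth sample row are assumptions added for illustration. Note that integer division in SQLite already truncates, whereas the MySQL query above needs FLOOR.

```python
import sqlite3

# The search term is bound once by name and reused in every place it is needed.
SCORE_QUERY = """
    SELECT article_content,
           (LENGTH(article_content) - LENGTH(REPLACE(article_content, :term, '')))
               / LENGTH(:term) AS score
    FROM articles_table
    WHERE INSTR(article_content, :term) > 0   -- assumed filter: skip rows without the term
    ORDER BY score DESC
"""


def rank_articles(conn: sqlite3.Connection, term: str) -> list:
    """Return (article_content, score) rows, highest score first (hypothetical helper)."""
    if not term:
        raise ValueError("search term must not be empty")
    return conn.execute(SCORE_QUERY, {"term": term}).fetchall()


# In-memory SQLite database standing in for the MySQL table from the answer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles_table ("
    "article_id INTEGER PRIMARY KEY, article_content TEXT)"
)
conn.executemany(
    "INSERT INTO articles_table (article_content) VALUES (?)",
    [
        ("The Empire is The Empire",),
        ("The Empire is the name of a position in baseball.",),
        ("The Empire The Empire The Empire",),
        ("Nothing relevant here.",),  # extra row, not in the original data
    ],
)

for content, score in rank_articles(conn, "The Empire"):
    print(score, content)
```

With the sample rows, the three matching articles come back with scores 3, 2 and 1, and the extra row is filtered out. In PHP the same shape would presumably be a prepared statement with the term bound once per placeholder.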

Give it a try!

UPDATE: Oh well, I tried!
