How to get related posts using PHP and MySQL
What is the best way to get related posts using PHP and MySQL? My second question is: how would I get the top 5 related posts by comparing the tags and categories of each post? My MySQL tables are listed below.
CREATE TABLE categories (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
parent_id INT UNSIGNED NOT NULL DEFAULT 0,
category VARCHAR(255) NOT NULL,
url VARCHAR(255) NOT NULL,
PRIMARY KEY (id),
INDEX parent (parent_id),
UNIQUE KEY(parent_id, url)
);
CREATE TABLE posts_tags (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
tag_id INT UNSIGNED NOT NULL,
users_posts_id INT UNSIGNED NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE tags (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
tag VARCHAR(255) NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE users_posts (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
user_id INT UNSIGNED NOT NULL,
title TEXT NOT NULL,
posts_content LONGTEXT NOT NULL,
PRIMARY KEY (id)
);
Post relevance is a large research area with no single neat solution. A simple starting point is to score each candidate post: add 0.1 points for each matching tag and 0.4 for a matching category, then sort by that score. Later you may consider post content too.
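The scoring idea above can be sketched in plain PHP. This is only a minimal illustration under some assumptions: the function name `rankRelatedPosts`, the array shape, and the `category_id` key are all hypothetical, and the schema in the question does not actually link posts to categories, so that part would need an extra column or join table. The tag lists are assumed to have been loaded beforehand from `posts_tags` (one list of unique `tag_id` values per `users_posts_id`).

```php
<?php
// Weights taken from the answer above; tune them to taste.
const TAG_WEIGHT = 0.1;
const CATEGORY_WEIGHT = 0.4;

/**
 * Rank candidate posts by tag and category overlap with a target post.
 *
 * Each post is an array: ['id' => int, 'tags' => int[], 'category_id' => ?int].
 * 'category_id' is a hypothetical field, not part of the schema in the question.
 *
 * Returns [post id => score], highest score first, at most $limit entries.
 */
function rankRelatedPosts(array $target, array $candidates, int $limit = 5): array
{
    $scores = [];
    foreach ($candidates as $post) {
        if ($post['id'] === $target['id']) {
            continue; // a post is not related to itself
        }
        // Number of tag ids the two posts have in common.
        $sharedTags = count(array_intersect($target['tags'], $post['tags']));
        $score = $sharedTags * TAG_WEIGHT;
        if ($target['category_id'] !== null
            && $post['category_id'] === $target['category_id']) {
            $score += CATEGORY_WEIGHT;
        }
        if ($score > 0) {
            $scores[$post['id']] = $score; // skip posts with nothing in common
        }
    }
    arsort($scores); // sort by score, descending, keeping post ids as keys
    return array_slice($scores, 0, $limit, true);
}

// Made-up sample data for illustration.
$target = ['id' => 1, 'tags' => [10, 11, 12], 'category_id' => 3];
$candidates = [
    ['id' => 1, 'tags' => [10, 11, 12], 'category_id' => 3], // the target itself
    ['id' => 2, 'tags' => [10, 11],     'category_id' => 3], // 2 tags + category
    ['id' => 3, 'tags' => [10, 11, 12], 'category_id' => 5], // 3 tags only
    ['id' => 4, 'tags' => [99],         'category_id' => 3], // category only
    ['id' => 5, 'tags' => [98],         'category_id' => 7], // nothing shared
];

$related = rankRelatedPosts($target, $candidates);
echo implode(', ', array_keys($related)), "\n"; // post ids in ranked order: 2, 4, 3
```

Scoring in PHP keeps the weights easy to change, but it means loading candidate posts into memory first. For a large table you would probably want to narrow the candidates in SQL (for example, only posts sharing at least one tag) before ranking them this way.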
This is not something you can easily do in one SQL query.
SQL is for data retrieval, and is useful for retrieving data based on objective criteria, where there is a right or wrong answer. There is no objective measure of what makes a post a "related post", so it's not something that you can effectively do with SQL alone.
Document clustering, which means grouping related documents, is a large and active research area, so that's a good place to start, but implementing something yourself will be very difficult. Depending on the language you're using, you might look at clustering libraries. For example, if you're using Java (or anything that runs on the JVM, or you can set up a web service to do the clustering), you could look at using Weka.
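If a full clustering library is more than you need, a much simpler stand-in (not clustering itself, just a pairwise building block) is to compare the text of two posts by the cosine similarity of their word counts. The sketch below is an assumption-laden illustration in PHP: the function names are hypothetical, the tokenizer only handles ASCII letters and digits, and there is no stop-word removal or TF-IDF weighting, all of which a real implementation would likely want.

```php
<?php
/**
 * Split text into lowercase ASCII words and count how often each occurs.
 * Returns [word => count].
 */
function termFrequencies(string $text): array
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_count_values($words);
}

/**
 * Cosine similarity between two [word => count] vectors.
 * Returns a value between 0.0 (no shared words) and 1.0 (same word proportions).
 */
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    foreach ($a as $term => $count) {
        if (isset($b[$term])) {
            $dot += $count * $b[$term];
        }
    }
    $normA = sqrt(array_sum(array_map(fn($c) => $c * $c, $a)));
    $normB = sqrt(array_sum(array_map(fn($c) => $c * $c, $b)));
    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0; // an empty text is similar to nothing
    }
    return $dot / ($normA * $normB);
}

// Made-up post bodies for illustration; in practice these would come
// from users_posts.posts_content.
$similarity = cosineSimilarity(
    termFrequencies('Getting related posts with PHP and MySQL'),
    termFrequencies('How to query related posts in MySQL')
);
```

A content score like this could be combined with the tag and category score as one more weighted term, rather than replacing it.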