MySQL query to check for certain phrases (duplicate article, plagiarism)
Is there a way to check for multiple phrases in mysql?
I need to check whether an article has a duplicate version stored in MySQL.
This is the algorithm:
1. Create an array of the sentences that need to be checked (with all non-alphanumeric characters removed).
2. Build the query (how?).
3. Once I get the result, check whether at least 50% of the sentences are duplicates; if so, I consider the article to be a duplicate.
The articles in the table are stored with all non-alphanumeric characters removed. Example:
iamdevelopingatooltocheckduplicatearticlesstoredinmysqldatabasehoweveriveencountered
Any suggestions?
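A minimal sketch of the three steps in the question, under stated assumptions. The table name `articles` and the columns `id` and `body` are hypothetical. `body` is assumed to hold the article already normalized the same way as the example above (lowercase, only `a-z0-9`). The sentence splitter is a naive regex, and the 20-character minimum length is an arbitrary choice, not part of the original question. Each `body LIKE %s` test evaluates to 1 or 0 in MySQL, so summing the tests counts how many sentences each stored article contains.

```python
import math
import re

def normalize(text: str) -> str:
    # Lowercase and keep only ASCII letters/digits, assumed to match
    # how articles are stored (e.g. "iamdevelopingatool...").
    return re.sub(r"[^a-z0-9]", "", text.lower())

def split_sentences(article: str, min_len: int = 20) -> list[str]:
    # Naive splitter: break on ., ! or ? followed by whitespace or end of text.
    parts = re.split(r"[.!?]+(?:\s+|$)", article)
    sentences = [normalize(p) for p in parts]
    # Drop empty or very short fragments (min_len is a hypothetical cutoff),
    # since they would match almost any stored article.
    return [s for s in sentences if len(s) >= min_len]

def build_query(sentences: list[str], threshold: float = 0.5,
                table: str = "articles", column: str = "body"):
    # Hypothetical schema: `articles(id, body)`. table/column are formatted
    # directly into the SQL, so never pass user input for them.
    if not sentences:
        raise ValueError("need at least one sentence")
    # Each comparison yields 1 or 0 in MySQL; the sum is the number of
    # sentences that occur somewhere inside the stored body.
    hits = " + ".join(f"({column} LIKE %s)" for _ in sentences)
    sql = (
        f"SELECT id, hits FROM (SELECT id, {hits} AS hits FROM {table}) AS t "
        "WHERE hits >= %s ORDER BY hits DESC"
    )
    # Wildcards go into the parameters, not the SQL text. Normalized sentences
    # contain only [a-z0-9], so no LIKE escaping of % or _ is needed.
    params = [f"%{s}%" for s in sentences]
    params.append(math.ceil(len(sentences) * threshold))
    return sql, params

def duplicate_ratio(sentences: list[str], stored_body: str) -> float:
    # Same check done in Python against one stored (normalized) article.
    if not sentences:
        return 0.0
    found = sum(1 for s in sentences if s in stored_body)
    return found / len(sentences)

def is_duplicate(article: str, stored_body: str, threshold: float = 0.5) -> bool:
    return duplicate_ratio(split_sentences(article), stored_body) >= threshold
```

With a DB-API driver that uses the `%s` parameter style (for example PyMySQL or mysqlclient), the result of `build_query` can be passed as `cursor.execute(sql, params)`. A `LIKE` pattern that starts with `%` cannot use an ordinary index, so every row is scanned; this is only practical for small tables, which is where the grouping/clustering approaches mentioned in the answer become relevant.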
Yes. Look into the book "Programming Collective Intelligence" (Toby Segaran, O'Reilly) to learn about these algorithms; finding duplicate or near-duplicate articles is essentially a grouping (clustering) problem.