Finding sentences similar to a given string by keywords, where each keyword has its own 'power'
This question is a challenge for me. My friend can't tell me how to do it, even though he is a really good programmer (I think).
Users can add sentences to the database. When a user submits a sentence, it is saved in the sentences table.
Next, the sentence is split into words, and the soundex of each word is saved in the tags table together with the id of the sentence it came from.
Finally, the soundex of each word is put into the weights table; if the same soundex is already there, the function adds 1 to the counter of that soundex.
(For those who don't know: soundex is a function that returns a phonetic representation of a string, i.e. the way it sounds.)
Structure of the database:
The first table, sentences, contains two columns: id and sentence.
The second table, tags, contains id (which is the id of a sentence) and tag (which is one word from the sentence). The tag isn't really the plain word, but the soundex of that word.
The last table, weights, contains tag and weight (which is a number that tells us how many tags like this there are in the tags table).
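To make the insert step and the three tables concrete, here is a minimal sketch in Python. It is only an illustration under some assumptions: an in-memory SQLite database stands in for MySQL, the function names (soundex, create_schema, add_sentence) are made up for this example, and the soundex shown is a simplified American Soundex, so MySQL's built-in SOUNDEX() may return different or longer codes.

```python
import re
import sqlite3

# Letter groups of the classic American Soundex (assumption: not MySQL's variant).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return a 4-character Soundex code, or "" if the word has no a-z letters."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are skipped and do not separate equal codes
        code = CODES.get(ch, "")  # vowels and y map to ""
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]


def create_schema(conn):
    # Same three tables as described in the question.
    conn.executescript("""
        CREATE TABLE sentences (id INTEGER PRIMARY KEY, sentence TEXT);
        CREATE TABLE tags (id INTEGER, tag TEXT);
        CREATE TABLE weights (tag TEXT PRIMARY KEY, weight INTEGER);
    """)


def add_sentence(conn, sentence):
    """Save the sentence, one tag row per word, and bump each tag's counter."""
    cur = conn.execute("INSERT INTO sentences (sentence) VALUES (?)", (sentence,))
    sentence_id = cur.lastrowid
    for word in re.findall(r"[A-Za-z]+", sentence):
        tag = soundex(word)
        conn.execute("INSERT INTO tags (id, tag) VALUES (?, ?)", (sentence_id, tag))
        updated = conn.execute(
            "UPDATE weights SET weight = weight + 1 WHERE tag = ?", (tag,))
        if updated.rowcount == 0:  # first time this soundex is seen
            conn.execute("INSERT INTO weights (tag, weight) VALUES (?, 1)", (tag,))
    return sentence_id
```

For example, after create_schema(conn) on sqlite3.connect(":memory:"), calling add_sentence(conn, "Robert met Rupert") stores three tag rows, and "Robert" and "Rupert" share one row in weights because they sound alike.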
My question is: how can I write a function which returns sentences similar to a given string?
It should use tags (the soundex of each word), and each tag should have its own power based on the weights table.
Tags that are used often are more important than more original tags. Can it be done in just one MySQL query?
Next question: I think this way of looking for similar sentences is good, but what about the speed of this function? I need to use it very, very often on my site.
Well, instead of having a weights table, why don't you have a table that relates tags to sentences? So have a table called sentence_tags with a sentence_id and a tag_id column. Then you can compute the weights by doing a join on those two tables, and still reference back to the sentence that contains the tag. You may as well store both the tag and the soundex in the tags table while you're at it.
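A rough sketch of how that layout could answer the question in a single query, again only as an illustration. The assumptions here are mine: SQLite in memory stands in for MySQL, the column names word and soundex in tags are made up, a tag's weight is taken to be the number of sentence_tags rows that share its soundex, and a sentence's score is the sum of the weights of its matching tags.

```python
import sqlite3

SCHEMA = """
CREATE TABLE sentences (id INTEGER PRIMARY KEY, sentence TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, word TEXT, soundex TEXT);
CREATE TABLE sentence_tags (sentence_id INTEGER, tag_id INTEGER);
"""

# One query: weights are computed on the fly in the subquery "w",
# then summed per sentence for the tags that match the search.
SIMILAR_SQL = """
SELECT s.id, s.sentence, SUM(w.weight) AS score
FROM sentence_tags st
JOIN tags t ON t.id = st.tag_id
JOIN (
    SELECT t2.soundex AS soundex, COUNT(*) AS weight
    FROM sentence_tags st2
    JOIN tags t2 ON t2.id = st2.tag_id
    GROUP BY t2.soundex
) w ON w.soundex = t.soundex
JOIN sentences s ON s.id = st.sentence_id
WHERE t.soundex IN ({placeholders})
GROUP BY s.id, s.sentence
ORDER BY score DESC, s.id
"""


def similar_sentences(conn, soundex_codes):
    """Return (id, sentence, score) rows, best match first."""
    if not soundex_codes:
        return []
    placeholders = ", ".join("?" for _ in soundex_codes)
    sql = SIMILAR_SQL.format(placeholders=placeholders)
    return conn.execute(sql, list(soundex_codes)).fetchall()


def build_demo(conn):
    """Load a tiny hand-made data set (soundex codes precomputed)."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO sentences VALUES (?, ?)", [
        (1, "Robert met Rupert"), (2, "Ann met Robert"), (3, "Ann")])
    conn.executemany("INSERT INTO tags VALUES (?, ?, ?)", [
        (1, "robert", "R163"), (2, "met", "M300"),
        (3, "rupert", "R163"), (4, "ann", "A500")])
    conn.executemany("INSERT INTO sentence_tags VALUES (?, ?)", [
        (1, 1), (1, 2), (1, 3), (2, 4), (2, 2), (2, 1), (3, 4)])
```

Note that a sentence containing the same soundex twice is counted twice in its score here; whether that is wanted depends on how "power" should behave. For speed, indexes on sentence_tags(tag_id) and tags(soundex) would probably be the first thing to try.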
Perhaps the Levenshtein distance is what you are looking for. It calculates the minimum number of single-character edits (insertions, deletions, or substitutions) needed to turn one word into another.
Do realize that this is a costly operation.
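For reference, a minimal sketch of the usual dynamic-programming way to compute it, written in Python only for illustration (the function name levenshtein is my own choice here). It also shows where the cost comes from: the work grows with len(a) * len(b) for every pair compared, so checking a search string against every stored sentence adds up quickly.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]
```

Only two rows of the table are kept at a time, so memory stays proportional to the shorter string.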
Joe K's suggestion seems spot on for good database design.
Do not store information that can be derived from data you already have.
That is, use a join and PHP to calculate the weight at run time.
I understand this may not be the right solution for your design, but a few moments spent on smart database structure design will often make everything work that much better.