Algorithm to give more weight to the first word
Right now, I'm trying to create an algorithm that gives a score to a user, depending on his input in a text field.
This score is supposed to encourage the user to add more text to his personal profile.
The algorithm should assign a certain weight to the first word, slightly less weight to the second word, slightly less again to the third word, and so on.
The goal is to encourage users to expand their texts, but to avoid spam in general as well. For instance, the added value of the 500th word shouldn't be much at all. The difference between a text of 100 words and a text of 500 words should be substantial.
Am I making any sense so far?
Right now, I don't know where to begin with this problem. I've tried multiple Google queries but didn't find anything of the sort. I suppose such an algorithm (or at least the general idea) must already exist somewhere, but I can't find any help on the subject.
Can anyone point me in the right direction? I'd really appreciate any help you can give me.
Thanks a lot.
// word count in user description
double word_count = ...;
// word limit over which words do not improve score
double word_limit = ...;
// use it to change score progression curve
// if factor = 1, progression is linear
// if factor < 1, progression is steeper at the beginning
// if factor > 1, progression is steeper at the end
double factor = ...;
double score = pow(min(word_count, word_limit) / word_limit, factor);
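For anyone who wants to experiment with this curve, here is a minimal Python sketch of the same formula. The function name `profile_score` and the default values (a 500-word limit and a factor of 0.5) are assumptions picked for illustration; they are not values from the answer above.

```python
def profile_score(word_count: int, word_limit: int = 500, factor: float = 0.5) -> float:
    """Return a score in [0, 1] computed as (min(count, limit) / limit) ** factor."""
    # Words beyond word_limit do not improve the score.
    capped = min(max(word_count, 0), word_limit)
    return (capped / word_limit) ** factor


if __name__ == "__main__":
    # Print the score for a few sample word counts.
    for n in (0, 1, 100, 500, 1000):
        print(n, round(profile_score(n), 3))
```

With a factor below 1 the early words count the most: under these assumed defaults, 125 words already earn half of the maximum score, and anything past 500 words earns nothing extra.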
It depends on how complex you want/need it to be, and whether or not you want a constant reduction in the weight applied to each successive word.
The simplest would possibly be to apply a relatively high weight (say 1000) to the first word, and then give each subsequent word a weight one less than the previous word; so the second word has a weight of 999, the third word has a weight of 998, etc. That has the "drawback" that the weight reaches zero after the 1000th word, so the sum of the weights doesn't increase past the 1000 word mark - you'll have to decide for yourself whether or not that's bad for your particular situation. It may not be exactly what you need, though.
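A rough Python sketch of this linear scheme, assuming weights are clamped at zero once they run out (the function name `linear_score` and the `first_weight` parameter are hypothetical):

```python
def linear_score(word_count: int, first_weight: int = 1000) -> int:
    """Sum the weights first_weight, first_weight - 1, ... with weights clamped at zero."""
    # Only the first `first_weight` words carry a positive weight.
    counted = max(0, min(word_count, first_weight))
    # Arithmetic series with `counted` terms, from first_weight
    # down to first_weight - counted + 1.
    return counted * (2 * first_weight - counted + 1) // 2
```

With the example weight of 1000, one word scores 1000, two words score 1999, and the total stops growing at 500500 once the text reaches 1000 words.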
If you don't want a linear reduction, it could be something simple such as the first word has a weight of X, the second word has a weight equal to Y% of X, the third word has a weight equal to Y% of Y% of X, etc. The difference between the first and second word is going to be larger than the difference between the second and third word, and by the time you reach the 500th word, the difference is going to be far smaller. It's also not difficult to implement, since it's not a complex formula.
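As a sketch of this percentage scheme in Python: the total score is a geometric series, so it can be computed in closed form instead of looping over the words. The name `geometric_score` and the sample values (X = 10, Y = 99%) are assumptions for illustration only.

```python
def geometric_score(word_count: int, first_weight: float = 10.0, ratio: float = 0.99) -> float:
    """Sum first_weight * ratio**k for k = 0 .. word_count - 1 (assumes 0 < ratio < 1)."""
    if word_count <= 0:
        return 0.0
    # Closed form of a geometric series; the total approaches
    # first_weight / (1 - ratio) from below and never exceeds it.
    return first_weight * (1.0 - ratio ** word_count) / (1.0 - ratio)
```

With these assumed values the score can never exceed 1000. A 100-word text scores roughly 634 and a 500-word text roughly 993, so there is still a clear gap between the two, while each word near the 500 mark adds very little on its own.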
Or, if you really need to, you could use a more complex mathematical function to calculate the weight - try googling 'exponential decay' and see if that's of any use to you.
It is not very difficult to implement a custom scoring function. Here is one in pseudo code:
function GetScore( word_count )
    // no points for the lazy user
    if word_count == 0
        return 0
    // 20 points for the first word and then up to 90 points linearly
    // (dividing by 99 so that 100 words give exactly 90 points):
    else if word_count >= 1 and word_count <= 100
        return 20 + 70 * (word_count - 1) / (99)
    // 90 points for the first 100 words and then up to 100 points linearly:
    else if word_count >= 101 and word_count <= 1000
        return 90 + 10 * (word_count - 100) / (900)
    // 100 points is the maximum for 1000 words or more:
    else
        return 100
end function
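For reference, a runnable Python sketch of this piecewise function might look as follows. It uses a divisor of 99 in the first segment so that 100 words yield exactly 90 points, which is what the comments describe; the name `get_score` is just a Python-style rendering of `GetScore`.

```python
def get_score(word_count: int) -> float:
    """Piecewise-linear score: 0 for no text, 20..90 for 1..100 words, 90..100 up to 1000 words."""
    if word_count <= 0:
        return 0.0  # no points for an empty profile
    if word_count <= 100:
        # 20 points for the first word, rising linearly to 90 at 100 words
        return 20 + 70 * (word_count - 1) / 99
    if word_count <= 1000:
        # rising linearly from 90 at 100 words to 100 at 1000 words
        return 90 + 10 * (word_count - 100) / 900
    return 100.0  # maximum score for 1000 words or more
```

The breakpoints (100 and 1000 words) and the point values are the ones from the pseudocode; they can be tuned to make the reward for longer texts larger or smaller.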
I would go with something like result = 2*sqrt(words_count). Anyway, you can use any increasing function whose derivative drops below 1 and keeps shrinking as the word count grows, e.g. log.
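To compare the two suggestions, here is a small Python sketch. The helper names are hypothetical, and using `log1p` (that is, ln(1 + n)) instead of a plain logarithm is an assumption made so that an empty text scores 0 rather than raising an error.

```python
import math


def sqrt_score(word_count: int) -> float:
    """Score that grows with the square root of the word count."""
    return 2.0 * math.sqrt(max(word_count, 0))


def log_score(word_count: int) -> float:
    """Score that grows logarithmically; log1p(n) = ln(1 + n) is 0 for an empty text."""
    return math.log1p(max(word_count, 0))


def marginal_gain(score, word_count: int) -> float:
    """Extra score earned by the word_count-th word under the given score function."""
    return score(word_count) - score(word_count - 1)
```

Both functions keep growing without an upper bound, but each additional word is worth less than the one before it. With the square root, 100 words give 20 points and 400 words give 40, so doubling the score takes four times as much text.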