Index stemming to process text in C# or Ruby
Given this text:
"Friends are friendlier friendlies that are friendly and classify the friendly classification class. Flowery flowers flow through following the flower flows"
I need to apply stemming to the text to achieve the following outcome:
frequency("following") = 1
frequency("flow") = 2
frequency("classification") = 1
frequency("class") = 1
frequency("flower") = 3
frequency("friend") = 4
frequency("friendly") = 4
frequency("classes") = 1
We interface with the FAST search engine. FAST indexes content to provide relevant search results to a query. One aspect of indexing is stemming, and we need to use either C# or Ruby to solve this.
I would appreciate anyone's views on the best approach.
// Returns exactly the frequencies listed in the question.
// The values are hard-coded; the input text is ignored and no stemming is performed.
public StemmingProcessorResults ProcessText(string text)
{
    return new StemmingProcessorResults(
        new []{
            new StemmingProcessorResultItem("following", 1),
            new StemmingProcessorResultItem("flow", 2),
            new StemmingProcessorResultItem("classification", 1),
            new StemmingProcessorResultItem("class", 1),
            new StemmingProcessorResultItem("flower", 3),
            new StemmingProcessorResultItem("friend", 4),
            new StemmingProcessorResultItem("friendly", 4),
            new StemmingProcessorResultItem("classes", 1)
        }
    );
}
There you go, that should be perfect for your copy-paste needs
You cannot "apply stemming" to the text to get those results, because the acceptance criteria contain a mistake: frequency("friend") should be 5, since five words in the text share that root (Friends, friendlier, friendlies, friendly, friendly). No stemming algorithm can, by definition, produce the criteria as stated. Therefore any code that outputs those values will have to do, as per Rob Ashton. You could also use a switch statement or a dictionary lookup; it just needs to output those numbers.