Index stemming to process text in C# or Ruby

Given this text:

"Friends are friendlier friendlies that are friendly and classify the friendly classification class. Flowery flowers flow through following the flower flows"

I need to apply stemming to the text to achieve the following outcome:

frequency("following")                = 1
frequency("flow")                     = 2
frequency("classification")           = 1
frequency("class")                    = 1
frequency("flower")                   = 3
frequency("friend")                   = 4
frequency("friendly")                 = 4
frequency("classes")                  = 1

We interface with the FAST search engine. FAST indexes content to provide relevant search results for a query. One aspect of indexing is stemming, and we need to use either C# or Ruby to solve this.

I would appreciate anyone's views on the best approach.


    // Returns the hard-coded frequencies listed in the question.
    public StemmingProcessorResults ProcessText(string text)
    {
        return new StemmingProcessorResults(
            new[]
            {
                new StemmingProcessorResultItem("following", 1),
                new StemmingProcessorResultItem("flow", 2),
                new StemmingProcessorResultItem("classification", 1),
                new StemmingProcessorResultItem("class", 1),
                new StemmingProcessorResultItem("flower", 3),
                new StemmingProcessorResultItem("friend", 4),
                new StemmingProcessorResultItem("friendly", 4),
                new StemmingProcessorResultItem("classes", 1)
            });
    }

There you go, that should be perfect for your copy-paste needs.


You cannot "apply stemming" to the text to get those results, because the acceptance criteria contain a mistake: frequency("friend") should be 5. No stemming algorithm can, by definition, produce the acceptance criteria as given. Therefore any algorithm that outputs those values will have to do, as per Rob Ashton. You could also use a switch statement or a dictionary lookup; it just needs to output those numbers.
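
For anyone who wants to check the count for "friend", here is a minimal Ruby sketch. It assumes a token belongs to a stem when it simply begins with that stem, which is a crude stand-in for a real stemmer and not what FAST does. The helper name `prefix_frequency` is hypothetical.

```ruby
# The sample text from the question.
TEXT = "Friends are friendlier friendlies that are friendly and classify " \
       "the friendly classification class. Flowery flowers flow through " \
       "following the flower flows"

# Hypothetical helper (an assumption, not a real stemmer): counts the
# tokens that begin with the given stem, ignoring case and punctuation.
def prefix_frequency(text, stem)
  tokens = text.downcase.scan(/[a-z]+/)
  tokens.count { |token| token.start_with?(stem) }
end

puts prefix_frequency(TEXT, "friend")  # => 5
```

Under that assumption, "friends", "friendlier", "friendlies" and the two occurrences of "friendly" all count towards "friend", which gives 5 rather than the 4 in the acceptance criteria. Prefix matching over-counts elsewhere (for example, it lumps "flower" in with "flow"), so treat it only as a sanity check.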
