
Find words and score them based on their positions

Hey guys, I have a text file that I have divided into 4 parts. I want to search each part for the words it contains and give each word a score.

Example:

welcome to the national basketball finals,the basketball teams here today have come a long way. without much delay lets play basketball.

I want to return national = 1, as it appears in only one part, etc.

I am working on determining text context using word position.

I am working with C# and am not very good at text processing. Basically, if a word appears in all 4 sections it scores 4, if it appears in 3 sections it scores 3, if it appears in 2 sections it scores 2, and if it appears in 1 section it scores 1.
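The scoring rule above amounts to counting, for each word, how many distinct parts contain it. Below is a minimal C# sketch. It assumes each part has already been split into words. The `WordScorer` class and its `ScoreWords` method are hypothetical names. The case-insensitive comparison is also an assumption, since the question does not say whether "Basketball" and "basketball" count as the same word.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordScorer
{
    // Hypothetical helper: score = number of distinct parts a word appears in.
    // Each part is reduced to its distinct words first, so a word repeated
    // inside one part still adds only 1 for that part.
    // Assumption: words are compared case-insensitively.
    public static Dictionary<string, int> ScoreWords(IEnumerable<IEnumerable<string>> parts) =>
        parts.SelectMany(part => part.Distinct(StringComparer.OrdinalIgnoreCase))
             .GroupBy(word => word, StringComparer.OrdinalIgnoreCase)
             .ToDictionary(g => g.Key, g => g.Count(), StringComparer.OrdinalIgnoreCase);
}
```

Suppose the sample sentence is split into consecutive runs of 5, 6, 5 and 6 words. This gives national = 1, the = 2 and basketball = 3. Every other word scores 1.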

Thanks in advance.

So far I have this:

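using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

var s = "welcome to the national basketball finals,the basketball teams here today have come a long way. without much delay lets play basketball. ";

var numberOfParts = 4;
var eachPartLength = s.Length / numberOfParts;
var parts = new List<string>();

// Split on non-word characters and remove empty strings.
// ToList() runs the regex once instead of re-enumerating it on every Count()/ElementAt() call.
var words = Regex.Split(s, @"\W").Where(w => w.Length > 0).ToList();
var wordsIndex = 0;

for (int i = 0; i < numberOfParts; i++)
{
    var sb = new StringBuilder();

    // Add whole words until this part reaches about a quarter of the text's length.
    while (sb.Length < eachPartLength && wordsIndex < words.Count)
    {
        sb.AppendFormat("{0} ", words[wordsIndex]);
        wordsIndex++;
    }

    // here you have the part
    Response.Write(string.Format("[{0}] ", sb));
    parts.Add(sb.ToString());
}

// Compare whole words, not substrings: string.Contains("the") would also match "there".
// RemoveEmptyEntries drops the empty string left by each part's trailing space.
var partWords = parts
    .Select(p => new HashSet<string>(p.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)))
    .ToList();
var allwords = partWords.SelectMany(p => p).Distinct();
var wordsInAllParts = allwords.Where(w => partWords.All(p => p.Contains(w)));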
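The loop above cuts parts by character length. A part only closes after it passes its share, so long words make the parts uneven. One alternative divides the text by word count instead, so part sizes differ by at most one word. This sketch assumes a "part" means a contiguous run of words. `WordParts`, `Tokenize` and `SplitIntoParts` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordParts
{
    // Hypothetical helper: lowercase the text and return every run of word characters.
    public static List<string> Tokenize(string text) =>
        Regex.Matches(text.ToLowerInvariant(), @"\w+")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToList();

    // Hypothetical helper: cut the word list into numberOfParts contiguous chunks.
    // Part i covers words [i * n / k, (i + 1) * n / k), so chunk sizes differ by at most one.
    public static List<List<string>> SplitIntoParts(IReadOnlyList<string> words, int numberOfParts)
    {
        var parts = new List<List<string>>();
        for (int i = 0; i < numberOfParts; i++)
        {
            int start = i * words.Count / numberOfParts;
            int end = (i + 1) * words.Count / numberOfParts;
            parts.Add(words.Skip(start).Take(end - start).ToList());
        }
        return parts;
    }
}
```

On the sample sentence, `Tokenize` yields 22 words. `SplitIntoParts(words, 4)` then returns parts of 5, 6, 5 and 6 words.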


This question is very difficult to interpret. I don't fully understand your goal and it is my suspicion that you might not either.

In the absence of a clear requirement, there is no way to give a specific answer, so I will give a generic one:

Try writing a test that clearly specifies the exact behavior you want. You've got the beginnings of one with your sample string and the result you want, but it's still ambiguous what you are looking for.

Make a test that, when it passes, demonstrates that one of the required behaviors is there. If that doesn't help you get a solution to the problem, come back and edit this question or make a new one that includes the test.
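Here is a sketch of what such a test could look like for this question. It fixes the four parts up front, so the expectations do not depend on how the text is split. That particular split is an assumption made for illustration. `WordScoreSpec.Check` is a hypothetical name. It takes any scoring function and returns the expectations that function violates, so an empty list means the test passes. The expected values follow the question's rule: a word's score is the number of parts it appears in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordScoreSpec
{
    // Assumed split: the parts are fixed so the test does not depend on how the text is divided.
    static readonly IList<string> Parts = new List<string>
    {
        "welcome to the national basketball",
        "finals, the basketball teams here today",
        "have come a long way.",
        "without much delay lets play basketball.",
    };

    // Expected scores: the number of parts each word appears in.
    static readonly IDictionary<string, int> Expected = new Dictionary<string, int>
    {
        ["national"] = 1,   // part 1 only
        ["the"] = 2,        // parts 1 and 2
        ["basketball"] = 3, // parts 1, 2 and 4
    };

    // Runs the scorer on the fixed parts and describes every expectation it violates.
    // An empty list means the test passes.
    public static List<string> Check(Func<IList<string>, IDictionary<string, int>> scorer)
    {
        var actual = scorer(Parts);
        var failures = new List<string>();
        foreach (var pair in Expected)
        {
            bool found = actual.TryGetValue(pair.Key, out int score);
            if (!found || score != pair.Value)
                failures.Add($"{pair.Key}: expected {pair.Value}, got {(found ? score.ToString() : "no score")}");
        }
        return failures;
    }
}
```

A scorer that counts how many parts contain each word passes with an empty list. A scorer that gives every word 1 reports two failures, for the and basketball.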

At the very least, you will be able to harvest better answers from this site.

