
Searching a string for a specific word (C#)

I would like to search a string for a specific word that the user types in, and then output the percentage of the text that this word makes up. I'm wondering what the best method for this would be, and whether you could help me out, please.


I suggest using the String.Equals overload that takes a StringComparison, for better performance.

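// requires: using System; using System.Linq;
var separators = new [] { ' ', ',', '.', '?', '!', ';', ':', '\"' };
// RemoveEmptyEntries drops the empty strings produced by ", " or ". "
var words = sentence.Split (separators, StringSplitOptions.RemoveEmptyEntries);
var matches = words.Count (w =>
    w.Equals (searchedWord, StringComparison.OrdinalIgnoreCase));
var percentage = matches / (float) words.Length;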

Note that percentage will be a float, e.g. 0.5 for 50%.
You can format it for display using the ToString overload that takes a format string:

var formatted = percentage.ToString ("P0"); // 0.1234 => 12 %

You can also change the format specifier to show decimal places:

var formatted = percentage.ToString ("P2"); // 0.1234 => 12.34 %

Please keep in mind that this approach is inefficient for large strings, because Split creates a new string instance for every word it finds. You might want to use a StringReader instead and read the text word by word manually.
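The answer suggests StringReader; the sketch below uses a closely related technique instead, scanning the string by index and comparing each word in place with `string.Compare(string, int, string, int, int, StringComparison)`, so no substring is allocated per word. `WordScanner` and `CountWord` are hypothetical names, and it assumes a "word" is a maximal run of letters or digits (`char.IsLetterOrDigit`), which differs slightly from the explicit separator list above.

```csharp
using System;

static class WordScanner
{
    // Hypothetical helper: counts whole-word occurrences of `word` in `text`
    // without allocating a substring per word. Assumption: a word is a maximal
    // run of letters or digits; every other character acts as a separator.
    // Returns both the number of matches and the total number of words.
    public static (int Matches, int Total) CountWord(string text, string word)
    {
        int matches = 0, total = 0, i = 0;

        while (i < text.Length)
        {
            // Skip separators until the start of the next word.
            while (i < text.Length && !char.IsLetterOrDigit(text[i])) i++;
            if (i >= text.Length) break;

            // Advance to the end of the current word.
            int start = i;
            while (i < text.Length && char.IsLetterOrDigit(text[i])) i++;
            int length = i - start;
            total++;

            // Compare in place: same length, then an ordinal case-insensitive match.
            if (length == word.Length &&
                string.Compare(text, start, word, 0, length,
                               StringComparison.OrdinalIgnoreCase) == 0)
            {
                matches++;
            }
        }

        return (matches, total);
    }
}
```

The percentage is then `result.Matches * 100.0 / result.Total` (guarding against `Total == 0`). Because "Thisis" is a single run of letters, it does not count as "is".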


The easiest way is to use LINQ:

char[] separators = new char[] {' ', ',', '.', '?', '!', ':', ';'};
var count =
    (from word in sentence.Split(separators)       // get all the words
     where word.ToLower() == searchedWord.ToLower() // find the words that match
     select word).Count();                          // count them

This only counts the number of times the word appears in the text. You could also count how many words there are in the text:

var totalWords = sentence.Split(separators, StringSplitOptions.RemoveEmptyEntries).Length;

and then just get the percentage as:

var result = count * 100.0 / totalWords; // use a double: int / int would truncate to 0
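Because `count` and `totalWords` are both `int`, writing `count / totalWords * 100` performs integer division first and yields 0 whenever the word is not the whole text. A minimal illustration with hypothetical helper names:

```csharp
using System;

static class PercentageDemo
{
    // Integer division truncates before the multiplication: 1 / 4 == 0, so 0 * 100 == 0.
    public static int IntegerPercentage(int count, int total) => count / total * 100;

    // Promoting one operand to double keeps the fraction: 1 * 100.0 / 4 == 25.0.
    public static double DoublePercentage(int count, int total) => count * 100.0 / total;
}
```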


My suggestion is a complete class.

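using System;
using System.Text;

class WordCount {
    // Characters treated as separators; each one is replaced by a space.
    const string Symbols = ",;.:-()\t\r\n!¡¿?\"[]{}&<>+*/=#'";

    public static string normalize(string str)
    {
        var toret = new StringBuilder();

        for(int i = 0; i < str.Length; ++i) {
            if ( Symbols.IndexOf( str[ i ] ) > -1 ) {
                toret.Append( ' ' );
            } else {
                toret.Append( char.ToLower( str[ i ] ) );
            }
        }

        return toret.ToString();
    }

    private string word;
    public string Word {
        get { return this.word; }
        set { this.word = value; }
    }

    private string str;
    public string Str {
        get { return this.str; }
    }

    private string[] words = null;
    public string[] Words {
        get {
            // Split lazily, only once; skip the empty entries left by repeated spaces.
            if ( this.words == null ) {
                this.words = this.Str.Split( new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries );
            }

            return this.words;
        }
    }

    public WordCount(string str, string w)
    {
        // Pad with spaces so the first and last words also match as " word ".
        this.str = ' ' + normalize( str ) + ' ';
        this.word = w;
    }

    public int Times()
    {
        return this.Times( this.Word );
    }

    public int Times(string word)
    {
        int times = 0;

        // Normalize the search word the same way as the text.
        word = ' ' + normalize( word ).Trim() + ' ';

        int wordLength = word.Length;
        int pos = this.Str.IndexOf( word, StringComparison.Ordinal );

        while( pos > -1 ) {
            ++times;

            // Resume on the trailing space, which may also start the next match.
            pos = this.Str.IndexOf( word, pos + wordLength - 1, StringComparison.Ordinal );
        }

        return times;
    }

    public double Percentage()
    {
        return this.Percentage( this.Word );
    }

    public double Percentage(string word)
    {
        // Multiply by 100.0 first so the division is done in floating point.
        return this.Words.Length == 0 ? 0.0 : 100.0 * this.Times( word ) / this.Words.Length;
    }
}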

Advantages: the string splitting is cached, so it is never done more than once. Everything is packaged in one class, so it is easily reusable. It does not need LINQ. Hope this helps.
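If you need percentages for many different words from the same text, one possible variation on the caching idea is to split once and count every word into a dictionary, so each later query is a single lookup. This is a hypothetical sketch (`WordFrequencies` is not part of the answer above), and its separator list is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variation: split the text once, count every word,
// then answer any number of queries with a dictionary lookup.
class WordFrequencies
{
    // Assumed separator set; adjust to taste.
    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', ',', '.', '?', '!', ';', ':', '"', '(', ')' };

    // Case-insensitive keys, so "This" and "this" share one counter.
    private readonly Dictionary<string, int> counts =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

    public int TotalWords { get; }

    public WordFrequencies(string text)
    {
        string[] words = text.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        TotalWords = words.Length;

        foreach (string w in words)
        {
            counts.TryGetValue(w, out int n); // n is 0 when the word is new
            counts[w] = n + 1;
        }
    }

    public int Times(string word) =>
        counts.TryGetValue(word, out int n) ? n : 0;

    // Share of all words, as a percentage (0 for empty text).
    public double Percentage(string word) =>
        TotalWords == 0 ? 0.0 : Times(word) * 100.0 / TotalWords;
}
```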


using System;
using System.Linq;
using System.Text.RegularExpressions;

// The words you want to search for
var words = new string[] { "this", "is" };

// Build a regular expression query: \b(this|is)\b
var wordRegexQuery = new System.Text.StringBuilder();
wordRegexQuery.Append("\\b(");
for (var wordIndex = 0; wordIndex < words.Length; wordIndex++)
{
  // Escape each word in case it contains regex metacharacters
  wordRegexQuery.Append(Regex.Escape(words[wordIndex]));
  if (wordIndex < words.Length - 1)
  {
    wordRegexQuery.Append('|');
  }
}
wordRegexQuery.Append(")\\b");

// Find matches and return them as a string[]
var regex = new Regex(wordRegexQuery.ToString(), RegexOptions.IgnoreCase);
var someText = "This is some text which is quite a good test of which word is used most often. Thisis isthis athisisa.";
var matches = (from Match m in regex.Matches(someText) select m.Value).ToArray();

// Display results; each percentage is the word's share of all matches,
// not of every word in the text
foreach (var word in words)
{
    var wordCount = matches.Count(w => w.Equals(word, StringComparison.InvariantCultureIgnoreCase));
    Console.WriteLine("{0}: {1} ({2:f2}%)", word, wordCount, wordCount * 100f / matches.Length);
}
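The loop above reports each word's share of the matched words only. If you want the share of all words in the text, which is what the question asks for, one option is to count every word with a second regex. A minimal sketch, assuming a "word" is whatever `\w+` matches; `RegexWordShare` is a hypothetical name:

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexWordShare
{
    // Hypothetical helper: percentage of all words in `text` that equal `word`.
    // Assumption: words are \w+ runs; \b keeps "Thisis" from counting as "this".
    public static double Percentage(string text, string word)
    {
        int total = Regex.Matches(text, @"\b\w+\b").Count;
        if (total == 0) return 0.0;

        // Regex.Escape guards against words containing regex metacharacters.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        int hits = Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;

        return hits * 100.0 / total;
    }
}
```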
