How to remove words based on a word count

2022-12-31 16:36

Here is what I'm trying to accomplish. I have an object coming back from the database with a string description. The description can be up to 1000 characters long, but we only want to display a short preview of it. I coded up the following, but I'm having trouble actually removing the extra words once the regular expression has found the total word count. Does anyone have a good way of displaying only the words up to a given count from the Regex.Matches result?

Thanks!

if (!string.IsNullOrEmpty(myObject.Description))
{
    string original = myObject.Description;
    MatchCollection wordColl = Regex.Matches(original, @"[\S]+");
    if (wordColl.Count < 70) // 70 words?
    {
        uxDescriptionDisplay.Text = 
             string.Format("<p>{0}</p>", myObject.Description);
    }
    else
    {                        
        string shortendText = original.Remove(200); // 200 characters?
        uxDescriptionDisplay.Text = 
              string.Format("<p>{0}</p>", shortendText);
    }
}

EDIT:

So this is what I got working on my own:

else 
{
    int count = 0;
    StringBuilder builder = new StringBuilder();
    string[] workingText = original.Split(' ');
    foreach (string word in workingText)
    {
        if (count < 70)
        {
            builder.AppendFormat("{0} ", word);
        }
        count++;
    }
    string shortendText = builder.ToString();
}

It's not pretty, but it worked. I would call it a pretty naive way of doing this. Thanks for all of the suggestions!

I would opt to go by a strict character count rather than a word count because you might happen to have a lot of long words.

I might do something like this (pseudocode):

if text.Length > someLimit
   find first whitespace after someLimit (or perhaps last whitespace immediately before)
   display substring of text 
else 
   display text

Possible code implementation:

string TruncateText(string input, int characterLimit)
{
    if (input.Length > characterLimit)
    {
        // find last whitespace immediately before limit
        int whitespacePosition = input.Substring(0, characterLimit).LastIndexOf(" ");

        // or find first whitespace after limit (what is spec?)
        // int whitespacePosition = input.IndexOf(" ", characterLimit); 

        if (whitespacePosition > -1)
            return input.Substring(0, whitespacePosition);
    }
    return input;
}

One method, if you're using at least C# 3.0, would be a LINQ query like the following. This assumes you're going strictly by word count, not character count.

if (wordColl.Count > 70)
{
    foreach (var subWord in wordColl.Cast<Match>().Select(r => r.Value).Take(70))
    {
        //Build string here out of subWord
    }
}

I did a test using a simple Console.WriteLine with your Regex and your question body (which is over 70 words, it turns out).
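The loop body above is left as a placeholder. One way to fill it in, as a minimal sketch, is to join the first matches back together with single spaces. The class name `WordLimiter` and the method `TakeWords` are hypothetical names chosen for illustration, and note that this collapses whatever whitespace originally separated the words.

    using System;
    using System.Linq;
    using System.Text.RegularExpressions;

    static class WordLimiter
    {
        // Hypothetical helper: returns the text unchanged when it has
        // maxWords words or fewer, otherwise the first maxWords words
        // joined by single spaces.
        public static string TakeWords(string text, int maxWords)
        {
            MatchCollection wordColl = Regex.Matches(text, @"\S+");
            if (wordColl.Count <= maxWords)
                return text;

            // ToArray() keeps this compatible with older frameworks where
            // string.Join only accepts a string[].
            string[] firstWords = wordColl.Cast<Match>()
                                          .Select(m => m.Value)
                                          .Take(maxWords)
                                          .ToArray();
            return string.Join(" ", firstWords);
        }
    }

Calling `WordLimiter.TakeWords(myObject.Description, 70)` would then give the shortened text to pass to `string.Format`.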

You can use Regex Capture Groups to hold the match and access it later.
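The answer does not spell out a pattern, so the following is only one possible reading of the capture-group idea: a single regular expression whose first group captures at most a given number of words, with the original spacing between them preserved. `CaptureGroupPreview` and `FirstWords` are hypothetical names.

    using System;
    using System.Text.RegularExpressions;

    static class CaptureGroupPreview
    {
        // Hypothetical helper: returns at most maxWords words from the
        // start of the text, keeping the whitespace between them as-is.
        public static string FirstWords(string text, int maxWords)
        {
            if (string.IsNullOrEmpty(text) || maxWords < 1)
                return string.Empty;

            // Group 1 captures up to (maxWords - 1) "word + whitespace"
            // pairs followed by one final word.
            string pattern = @"^\s*((?:\S+\s+){0," + (maxWords - 1) + @"}\S+)";
            Match match = Regex.Match(text, pattern);

            // No match means the text contains only whitespace.
            return match.Success ? match.Groups[1].Value : string.Empty;
        }
    }
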

For your application, I'd recommend instead simply splitting the string by spaces and returning the first n elements of the array:

if (!string.IsNullOrEmpty(myObject.Description))
{
    string original = myObject.Description;
    string[] words = original.Split(' ');
    if (words.Length < 70)
    {
        uxDescriptionDisplay.Text = 
             string.Format("<p>{0}</p>", original);
    }
    else
    {
        // Join the first 70 words back together with single spaces.
        string shortDesc = string.Join(" ", words, 0, 70);
        uxDescriptionDisplay.Text = 
             string.Format("<p>{0}</p>", shortDesc.Trim());
    }
}

Do you want to remove 200 characters, or truncate starting at the 200th character? When you call original.Remove(200), you are telling it to start removing at index 200 and delete everything after that. To remove a specific number of characters, use the two-argument overload:

string shortendText = original.Remove(0,200);

This starts at the first character and removes 200 characters from there. I imagine that's not what you're trying to do, since you're shortening a description; it's simply how Remove() works when you give it a count.
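To make the difference between the two overloads concrete, here is a small sketch using a short string. `RemoveDemo`, `KeepFirst`, and `DropFirst` are hypothetical names. The length checks are there because Remove throws an ArgumentOutOfRangeException when the index or count reaches past the end of the string.

    using System;

    static class RemoveDemo
    {
        // Remove(startIndex) deletes everything from startIndex to the end,
        // so what is left is the first `count` characters.
        public static string KeepFirst(string text, int count)
        {
            return text.Length > count ? text.Remove(count) : text;
        }

        // Remove(0, count) deletes the first `count` characters and keeps the rest.
        public static string DropFirst(string text, int count)
        {
            return text.Length > count ? text.Remove(0, count) : string.Empty;
        }
    }

With the input "Hello, world" and a count of 5, KeepFirst gives "Hello" and DropFirst gives ", world". For shortening a description, the first form is the one that matters.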

Instead of using a Regex MatchCollection, why not just split the string? It's a lot easier and more straightforward. You can set the delimiter to a space character and split that way. I'm not sure what the data in your description looks like, so this may not completely cover your needs, but it just might. You would split like this:

String[] wordArray = original.Split(' ');

From there you can determine the word count with wordArray's Length property value.
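One caveat: splitting on a single space counts an empty entry for every repeated space, and it does not treat tabs or line breaks as separators. If the descriptions may contain those, a sketch like the following gives a count closer to what the \S+ pattern in the question produces. `WordSplitter` and `SplitWords` are hypothetical names. Passing a null separator array tells Split to break on any whitespace.

    using System;

    static class WordSplitter
    {
        // Hypothetical helper: splits on any whitespace and drops the
        // empty entries that consecutive separators would otherwise create.
        public static string[] SplitWords(string text)
        {
            return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        }
    }

For example, "one  two   three".Split(' ') yields six entries, three of them empty, while SplitWords returns just the three words.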

If I were you I would go by characters, as you may have many one-letter words or many long words in your text.

Go through the text until the character count reaches your limit, then either find the next space and copy everything up to it into a new string (possibly using the Substring method), or cut at the limit and append a few full stops to make the new string. The latter could look unprofessional, I suppose.
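The two options can also be combined: cut at the next space after the limit and then append the full stops. This is a minimal sketch of that combination, not something the answer prescribes. `TextPreview` and `TruncateWithEllipsis` are hypothetical names, and the limit is assumed to be non-negative.

    using System;

    static class TextPreview
    {
        // Hypothetical helper: cuts at the first space at or after
        // characterLimit and appends "..." to show that text was dropped.
        public static string TruncateWithEllipsis(string input, int characterLimit)
        {
            if (string.IsNullOrEmpty(input) || input.Length <= characterLimit)
                return input;

            // Search from the limit onward so no word is cut in half.
            int cut = input.IndexOf(' ', characterLimit);
            if (cut < 0)
                return input; // no later space: the rest is one word, keep it whole

            return input.Substring(0, cut) + "...";
        }
    }

Because the cut lands on a word boundary, the result can run slightly past the limit; searching backwards with LastIndexOf would keep it under the limit instead.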
