How to remove words based on a word count

Here is what I'm trying to accomplish. I have an object coming back from the database with a string description. This description can be up to 1000 characters long, but we only want to display a short view of it. So I coded up the following, but I'm having trouble actually removing the extra words after the regular expression finds the total word count. Does anyone have a good way of displaying only the words that fall under the limit counted by Regex.Matches?

Thanks!

if (!string.IsNullOrEmpty(myObject.Description))
{
    string original = myObject.Description;
    MatchCollection wordColl = Regex.Matches(original, @"[\S]+");
    if (wordColl.Count < 70) // 70 words?
    {
        uxDescriptionDisplay.Text = 
             string.Format("<p>{0}</p>", myObject.Description);
    }
    else
    {                        
        string shortendText = original.Remove(200); // 200 characters?
        uxDescriptionDisplay.Text = 
              string.Format("<p>{0}</p>", shortendText);
    }
 }

EDIT:

So this is what I got working on my own:

else 
{
    int count = 0;
    StringBuilder builder = new StringBuilder();
    string[] workingText = original.Split(' ');
    foreach (string word in workingText)
    {
        if (count < 70)
        {
            builder.AppendFormat("{0} ", word);
        }
        count++;
    }
    string shortendText = builder.ToString();
}

It's not pretty, but it worked. I would call it a pretty naive way of doing this. Thanks for all of the suggestions!


I would opt to go by a strict character count rather than a word count because you might happen to have a lot of long words.

I might do something like (pseudocode)

if text.Length > someLimit
   find first whitespace after someLimit (or perhaps last whitespace immediately before)
   display substring of text 
else 
   display text

Possible code implementation:

string TruncateText(string input, int characterLimit)
{
    if (input.Length > characterLimit)
    {
        // find last whitespace immediately before limit
        int whitespacePosition = input.Substring(0, characterLimit).LastIndexOf(" ");

        // or find first whitespace after limit (what is spec?)
        // int whitespacePosition = input.IndexOf(" ", characterLimit); 

        if (whitespacePosition > -1)
            return input.Substring(0, whitespacePosition);
    }
    return input;
}


One approach, if you're using at least C# 3.0, is a LINQ query like the following. This assumes you're going strictly by word count, not character count.

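if (wordColl.Count > 70)
{
    // Take the first 70 matched words and join them back together with spaces.
    // ToArray() is needed because string.Join only accepted arrays before .NET 4.
    string shortendText = string.Join(" ",
        wordColl.Cast<Match>().Select(r => r.Value).Take(70).ToArray());
}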

I did a test using a simple Console.WriteLine with your Regex and your question body (which is over 70 words, it turns out).


You can use Regex Capture Groups to hold the match and access it later.

For your application, I'd recommend instead simply splitting the string by spaces and returning the first n elements of the array:

if (!string.IsNullOrEmpty(myObject.Description))
{
    string original = myObject.Description;
    string[] words = original.Split(' ');
    if (words.Length < 70)
    {
        uxDescriptionDisplay.Text = 
             string.Format("<p>{0}</p>", original);
    }
    else
    {                        
        // Join the first 70 words back together with single spaces
        string shortDesc = string.Join(" ", words, 0, 70);
        uxDescriptionDisplay.Text = 
             string.Format("<p>{0}</p>", shortDesc.Trim());
    }
}


Do you want to remove 200 characters, or start truncating at the 200th character? Calling original.Remove(200) starts the truncation at index 200: everything from that position to the end of the string is removed. To remove a specific number of characters, pass both a start index and a count:

string shortendText = original.Remove(0,200);

This starts at the first character and removes 200 characters from there. I imagine that's not what you're trying to do, since you're shortening a description. It's merely how Remove() is used to delete a given number of characters.
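
To make the difference between the two overloads concrete, here is a small sketch using a short sample string (the string and the class name are just for illustration). Note that these calls throw an ArgumentOutOfRangeException when the index lies beyond the end of the string, so a length check is still needed for short descriptions.

```csharp
using System;

static class RemoveDemo
{
    static void Main()
    {
        string original = "The quick brown fox";

        // Remove(startIndex) deletes everything from startIndex to the end,
        // so the first startIndex characters are kept.
        Console.WriteLine(original.Remove(9));        // The quick

        // Remove(startIndex, count) deletes count characters beginning at startIndex.
        Console.WriteLine(original.Remove(0, 4));     // quick brown fox

        // Substring(0, length) keeps the first length characters,
        // which gives the same result as Remove(length).
        Console.WriteLine(original.Substring(0, 9));  // The quick
    }
}
```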

Instead of using a Regex MatchCollection, why not just split the string? It's a lot easier and more straightforward. You can set the delimiter to a space character and split that way. I'm not sure what the data in your description looks like, so this may not completely cover your need, but it just might. You split this way:

String[] wordArray = original.Split(' ');

From there you can determine the word count with wordArray's Length property value.
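
One caveat with Split(' ') is that runs of spaces, tabs, or newlines produce empty entries that get counted as words. A minimal sketch of a word-limited truncation that avoids this is below; the helper name TakeWords and the class name are made up for illustration, and it assumes that collapsing whitespace to single spaces in the shortened text is acceptable.

```csharp
using System;

static class WordTruncation
{
    // Hypothetical helper: returns the text unchanged if it has at most
    // maxWords words, otherwise the first maxWords words joined by spaces.
    public static string TakeWords(string text, int maxWords)
    {
        if (string.IsNullOrEmpty(text) || maxWords <= 0)
            return string.Empty;

        // A null separator array makes Split treat any whitespace as a delimiter;
        // RemoveEmptyEntries drops the empty strings produced by repeated whitespace.
        string[] words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        if (words.Length <= maxWords)
            return text;

        // Join only the first maxWords elements of the array.
        return string.Join(" ", words, 0, maxWords);
    }
}
```

With this in place, the display code could call something like TakeWords(myObject.Description, 70) before formatting the paragraph.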


If I were you I would go by characters, as you may have many one-letter words or many long words in your text.

Go through the text until you reach your character limit, then either find the next space and copy everything up to it into a new string (possibly using the Substring method), or cut at the limit and append a few full stops to make the new string. The latter could look unprofessional, I suppose.
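
As a rough sketch of that idea (not a definitive implementation), the helper below cuts at the last space at or before the limit and appends three full stops. The name TruncateWithEllipsis is hypothetical, and it assumes that cutting before the limit rather than after it is acceptable; the appended dots are not counted against the limit.

```csharp
using System;

static class CharTruncation
{
    // Hypothetical helper: shortens text to roughly characterLimit characters,
    // breaking on a space where possible, and appends "...".
    public static string TruncateWithEllipsis(string text, int characterLimit)
    {
        if (string.IsNullOrEmpty(text) || text.Length <= characterLimit)
            return text;

        if (characterLimit <= 0)
            return string.Empty;

        // Search backwards from the limit for the nearest space.
        int cut = text.LastIndexOf(' ', characterLimit);

        // No usable space found: fall back to a hard cut at the limit.
        if (cut <= 0)
            cut = characterLimit;

        return text.Substring(0, cut).TrimEnd() + "...";
    }
}
```

For example, TruncateWithEllipsis("The quick brown fox", 12) gives "The quick...", while a string shorter than the limit is returned unchanged.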
