C# Regex: remove words of 3 letters or fewer?

Any ideas on the regex needed to remove words of 3 letters or fewer? So it would find "ii it was bbb cat rat hat" etc. but not "four, three, twos".


A regex that matches words of length 1 to 3 is \b\w{1,3}\b. Replace those matches with an empty string:

Regex re = new Regex(@"\b\w{1,3}\b");
var result = re.Replace(input, "");

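To also avoid leaving double spaces, match the surrounding whitespace as well and replace each run of short words with a single space:

Regex re = new Regex(@"(?:\s*\b\w{1,3}\b\s*)+");
var result = re.Replace(input, " ");

(Although it will leave a space at the beginning/end of the string when the input starts or ends with a short word. The group is repeated so that several short words in a row collapse to one space rather than one space each.)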
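
The leftover space at the start or end can be trimmed away afterwards. Here is a minimal sketch of that idea; the class and method names are made up for illustration, and the repeated group (?:...)+ is used so that a run of adjacent short words becomes one space instead of one space per word.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ShortWordRemover
{
    // Hypothetical helper, not part of the original answer.
    // Matches one or more consecutive words of 1 to 3 word characters,
    // together with the whitespace around them.
    private static readonly Regex ShortWords =
        new Regex(@"(?:\s*\b\w{1,3}\b\s*)+");

    public static string RemoveShortWords(string input)
    {
        // Each run of short words becomes a single space,
        // then stray spaces at either end are trimmed.
        return ShortWords.Replace(input, " ").Trim();
    }
}
```

For example, ShortWordRemover.RemoveShortWords("ii it was bbb cat rat hat four, three, twos") should give "four, three, twos". Note that \w also matches digits and underscores, so "42" counts as a short word here.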


You don't necessarily need a regex for this; it can be done with a simple LINQ selection.

string[] words = inputString.Split(' ');

var longWords = words.Where(x => x.Length > 3);

string outputString = String.Join(" ", longWords.ToArray());

Hell, you could even do it in one line of code:

outputString = String.Join(" ", inputString.Split(' ').Where(x => x.Length > 3).ToArray());
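
One thing to watch with Split(' '): punctuation stays attached to the word, so "cat," has Length 4 and survives the Length > 3 filter, unlike with the \b\w{1,3}\b regex. If that matters, a rough variant could count only letters and digits. This is just a sketch under that assumption; the class, method and parameter names are hypothetical, and the string.Join overload taking an IEnumerable needs .NET 4 or later.

```csharp
using System;
using System.Linq;

public static class LinqShortWordFilter
{
    // Hypothetical helper: keeps tokens that contain more than
    // maxShortLength letters or digits; attached punctuation is not counted.
    public static string RemoveShortWords(string input, int maxShortLength = 3)
    {
        var kept = input
            // RemoveEmptyEntries skips the empty strings produced by repeated spaces.
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(token => token.Count(char.IsLetterOrDigit) > maxShortLength);

        return string.Join(" ", kept);
    }
}
```

With this, LinqShortWordFilter.RemoveShortWords("ii it was bbb cat, rat hat four, three, twos") should give "four, three, twos".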


I'm going to go out on a limb here and throw a non-regex solution at you:

public static string StripWordsWithLessThanXLetters(string input, int x)
{
    var inputElements = input.Split(' ');
    var resultBuilder = new StringBuilder();
    foreach (var element in inputElements)
    {
        if (element.Length >= x)
        {
            resultBuilder.Append(element + " ");
        }
    }
    return resultBuilder.ToString().Trim();
}

This is more verbose than the other solutions, but I think the performance cost of the LINQ solution might outweigh its benefit, and a regex incurs the same cost (potentially with more maintenance complexity). Note that the check is element.Length >= x, so pass x = 4 to remove words of 3 letters or fewer.


string qText = "Long or not long sample text";
var inputWords = qText.Split(' ').ToList();
var rem = (from c in inputWords
           where c.Length > 3
           select c).ToList();


If performance is an issue, here's another implementation that involves neither regex nor LINQ.

It's about 20% to 25% faster than the other solutions:

public static string StripWordsByLength(this string str, int minLength)
{
    bool addSpace = false;
    StringBuilder resultBuilder = new StringBuilder();
    foreach (string word in str.Split(' '))
    {
        if (word.Length >= minLength)
        {
            if (addSpace)
                resultBuilder.Append(' ');
            else
                addSpace = true;

            resultBuilder.Append(word);
        }
    }
    return resultBuilder.ToString();
}
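
Timing differences like this depend on the input and the runtime, so it is worth measuring on your own data. A rough way to do that with Stopwatch is sketched below; the TimingSketch class and TimeIt method are hypothetical names, and this is a crude harness rather than a proper benchmark.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Hypothetical helper: calls `strip` on `input` `iterations` times
    // and returns the elapsed wall-clock time in milliseconds.
    public static long TimeIt(Func<string, string> strip, string input, int iterations)
    {
        strip(input); // warm-up call so JIT compilation is not included in the timing

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            strip(input);
        }
        stopwatch.Stop();

        return stopwatch.ElapsedMilliseconds;
    }
}
```

Each candidate can then be passed in as a lambda, for instance TimingSketch.TimeIt(s => s.StripWordsByLength(4), sampleText, 100000), where sampleText is whatever representative string you want to test with, and the resulting numbers compared across several runs.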