Building a smart string trimming function in C#
I am trying to build a string extension method that trims a string to a certain length without breaking a word. I wanted to check whether there is anything built into the framework, or a more clever method than mine. Here's mine so far (not thoroughly tested):
    public static string SmartTrim(this string s, int length)
    {
        StringBuilder result = new StringBuilder();
        if (length >= 0)
        {
            if (s.IndexOf(' ') > 0)
            {
                string[] words = s.Split(' ');
                int index = 0;
                while (index < words.Length - 1 && result.Length + words[index + 1].Length <= length)
                {
                    result.Append(words[index]);
                    result.Append(" ");
                    index++;
                }
                if (result.Length > 0)
                {
                    result.Remove(result.Length - 1, 1);
                }
            }
            else
            {
                result.Append(s.Substring(0, length));
            }
        }
        else
        {
            throw new ArgumentOutOfRangeException("length", "Value cannot be negative.");
        }
        return result.ToString();
    }
I'd use string.LastIndexOf, at least if we only care about spaces. Then there's no need to create any intermediate strings...
As yet untested:
    public static string SmartTrim(this string text, int length)
    {
        if (text == null)
        {
            throw new ArgumentNullException("text");
        }
        if (length < 0)
        {
            throw new ArgumentOutOfRangeException();
        }
        if (text.Length <= length)
        {
            return text;
        }
        int lastSpaceBeforeMax = text.LastIndexOf(' ', length);
        if (lastSpaceBeforeMax == -1)
        {
            // Perhaps define a strategy here? Could return empty string,
            // or the original
            throw new ArgumentException("Unable to trim word");
        }
        return text.Substring(0, lastSpaceBeforeMax);
    }
Test code:
    public class Test
    {
        static void Main()
        {
            Console.WriteLine("'{0}'", "foo bar baz".SmartTrim(20));
            Console.WriteLine("'{0}'", "foo bar baz".SmartTrim(3));
            Console.WriteLine("'{0}'", "foo bar baz".SmartTrim(4));
            Console.WriteLine("'{0}'", "foo bar baz".SmartTrim(5));
            Console.WriteLine("'{0}'", "foo bar baz".SmartTrim(7));
        }
    }
Results:
    'foo bar baz'
    'foo'
    'foo'
    'foo'
    'foo bar'
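The comment in the code above leaves open what should happen when there is no space before the limit. One way to make that choice explicit is to pass it in. This is only a sketch: the `NoSpaceBehavior` enum and its member names are made up for illustration and are not part of the framework or of the answer above.

```csharp
using System;

// Hypothetical enum: what to do when no space is found before the limit.
public enum NoSpaceBehavior { Throw, ReturnEmpty, ReturnOriginal, HardCut }

public static class SmartTrimExtensions
{
    public static string SmartTrim(this string text, int length, NoSpaceBehavior behavior)
    {
        if (text == null) throw new ArgumentNullException("text");
        if (length < 0) throw new ArgumentOutOfRangeException("length");
        if (text.Length <= length) return text;

        // Search backwards from index 'length' for the last space.
        int lastSpace = text.LastIndexOf(' ', length);
        if (lastSpace >= 0) return text.Substring(0, lastSpace);

        switch (behavior)
        {
            case NoSpaceBehavior.ReturnEmpty: return string.Empty;
            case NoSpaceBehavior.ReturnOriginal: return text;
            case NoSpaceBehavior.HardCut: return text.Substring(0, length); // breaks the word
            default: throw new ArgumentException("Unable to trim word");
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        Console.WriteLine("'{0}'", "foobar baz".SmartTrim(3, NoSpaceBehavior.HardCut));     // 'foo'
        Console.WriteLine("'{0}'", "foobar baz".SmartTrim(3, NoSpaceBehavior.ReturnEmpty)); // ''
        Console.WriteLine("'{0}'", "foobar baz".SmartTrim(8, NoSpaceBehavior.Throw));       // 'foobar'
    }
}
```

Which default is right depends on the caller: a hard cut suits UI labels that must fit a fixed width, while throwing suits code that treats an untrimmable word as a data error.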
How about a Regex-based solution? You will probably want to test it some more and add some bounds checking, but this is what springs to mind:
    using System;
    using System.Text.RegularExpressions;
    namespace Stackoverflow.Test
    {
        static class Test
        {
            private static readonly Regex regWords = new Regex("\\w+", RegexOptions.Compiled);
            static void Main()
            {
                Console.WriteLine("The quick brown fox jumped over the lazy dog".SmartTrim(8));
                Console.WriteLine("The quick brown fox jumped over the lazy dog".SmartTrim(20));
                Console.WriteLine("Hello, I am attempting to build a string extension method to trim a string to a certain length but with not breaking a word. I wanted to check to see if there was anything built into the framework or a more clever method than mine".SmartTrim(100));
            }
            public static string SmartTrim(this string s, int length)
            {
                var matches = regWords.Matches(s);
                foreach (Match match in matches)
                {
                    if (match.Index + match.Length > length)
                    {
                        int ln = match.Index + match.Length > s.Length ? s.Length : match.Index + match.Length;
                        return s.Substring(0, ln);
                    }
                }
                return s;
            }
        }
    }
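Note that the method above returns the text up to the end of the first word that crosses the limit, so `SmartTrim(8)` on the fox sentence gives "The quick" (9 characters). If the result must never exceed the limit, the same regex can be used to remember the end of the last word that still fits. A rough sketch with the bounds checks added; the name `SmartTrimWithin` is made up here, and punctuation after the last kept word is dropped because `\w+` does not match it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexTrimExtensions
{
    private static readonly Regex Words = new Regex(@"\w+", RegexOptions.Compiled);

    // Hypothetical variant: keeps only whole words that end at or before 'length'.
    public static string SmartTrimWithin(this string s, int length)
    {
        if (s == null) throw new ArgumentNullException("s");
        if (length < 0) throw new ArgumentOutOfRangeException("length");
        if (s.Length <= length) return s;

        int end = 0; // end index (exclusive) of the last word that fits
        foreach (Match match in Words.Matches(s))
        {
            int wordEnd = match.Index + match.Length;
            if (wordEnd > length) break; // this word would cross the limit
            end = wordEnd;
        }
        return s.Substring(0, end);
    }
}

public static class Demo
{
    public static void Main()
    {
        string fox = "The quick brown fox jumped over the lazy dog";
        Console.WriteLine(fox.SmartTrimWithin(8));  // The
        Console.WriteLine(fox.SmartTrimWithin(20)); // The quick brown fox
    }
}
```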
Try this out. It's null-safe, won't break if length is longer than the string, and involves less string manipulation.
Edit: Per recommendations, I've removed the intermediate string. I'll leave the answer up as it could be useful in cases where exceptions are not wanted.
    public static string SmartTrim(this string s, int length)
    {
        if (s == null || length < 0 || s.Length <= length)
            return s;
        // Edit à la Jon Skeet. Removes unnecessary intermediate string. Thanks!
        // string temp = s.Length > length + 1 ? s.Remove(length+1) : s;
        // Search backwards from index 'length'. Starting at length + 1 would throw
        // when length == s.Length - 1 and could return length + 1 characters.
        int lastSpace = s.LastIndexOf(' ', length);
        return lastSpace < 0 ? string.Empty : s.Remove(lastSpace);
    }
    string strTemp = "How are you doing today";
    int nLength = 12;
    strTemp = strTemp.Substring(0, strTemp.Substring(0, nLength).LastIndexOf(' '));
I think that should do it. When I ran that, it ended up with "How are you".
So your function would be:
    public static string SmartTrim(this string s, int length)
    {
        return s.Substring(0, s.Substring(0, length).LastIndexOf(' '));
    }
I would definitely add some exception handling though, such as making sure the integer length is no greater than the string length and not less than 0.
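The checks mentioned above might look something like the following. It is only a sketch: the name `SmartTrimGuarded` is made up, and cutting at the limit when the first `length` characters contain no space is an assumption on my part (without some fallback, `LastIndexOf` returns -1 and `Substring` throws).

```csharp
using System;

public static class GuardedTrimExtensions
{
    public static string SmartTrimGuarded(this string s, int length)
    {
        if (s == null) throw new ArgumentNullException("s");
        if (length < 0) throw new ArgumentOutOfRangeException("length");
        if (length >= s.Length) return s; // already short enough

        int lastSpace = s.Substring(0, length).LastIndexOf(' ');
        // Assumption: with no space in range, cut at the limit instead of throwing.
        return s.Substring(0, lastSpace < 0 ? length : lastSpace);
    }
}

public static class Demo
{
    public static void Main()
    {
        Console.WriteLine("How are you doing today".SmartTrimGuarded(12));  // How are you
        Console.WriteLine("How are you doing today".SmartTrimGuarded(100)); // How are you doing today
        Console.WriteLine("Howdy".SmartTrimGuarded(3));                     // How
    }
}
```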
Obligatory LINQ one-liner, if you only care about whitespace as a word boundary:
    return new String(s.TakeWhile((ch,idx) => (idx < length) || (idx >= length && !Char.IsWhiteSpace(ch))).ToArray());
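For context, here is the one-liner wrapped in a complete extension method. The method name `TrimAtWordEnd` is made up for this example, and the redundant `idx >= length` test is dropped. Keep in mind that this approach finishes the word that crosses the limit, so the result can be longer than `length`.

```csharp
using System;
using System.Linq;

public static class LinqTrimExtensions
{
    // Takes the first 'length' characters, then keeps going until whitespace is hit.
    public static string TrimAtWordEnd(this string s, int length)
    {
        return new string(s.TakeWhile((ch, idx) => idx < length || !char.IsWhiteSpace(ch)).ToArray());
    }
}

public static class Demo
{
    public static void Main()
    {
        // The limit of 9 falls inside "you", so the whole word is kept.
        Console.WriteLine("How are you doing today".TrimAtWordEnd(9)); // How are you
    }
}
```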
Use it like this:
    var substring = source.GetSubstring(50, new string[] { " ", "." });
This method returns the substring that ends at the first of one or more separator strings found at or after the given length:
    public static string GetSubstring(this string source, int length, params string[] options)
    {
        if (string.IsNullOrWhiteSpace(source))
        {
            return string.Empty;
        }
        if (source.Length <= length)
        {
            return source;
        }
        var indices =
            options.Select(
                separator => source.IndexOf(separator, length, StringComparison.CurrentCultureIgnoreCase))
                .Where(index => index >= 0)
                .ToList();
        if (indices.Count > 0)
        {
            return source.Substring(0, indices.Min());
        }
        return source;
    }
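Because GetSubstring searches forward from `length`, its result can be longer than the requested length. If the cut has to stay within the limit, searching backwards is one alternative. A minimal sketch, assuming single-character separators; the name `GetSubstringWithin` is hypothetical, and it relies on `string.LastIndexOfAny`.

```csharp
using System;

public static class SeparatorTrimExtensions
{
    // Hypothetical variant: cuts at the last separator found at or before 'length'.
    public static string GetSubstringWithin(this string source, int length, params char[] separators)
    {
        if (string.IsNullOrWhiteSpace(source)) return string.Empty;
        if (length < 0) throw new ArgumentOutOfRangeException("length");
        if (source.Length <= length) return source;

        int cut = source.LastIndexOfAny(separators, length);
        // Like the original, fall back to the whole string when no separator is found.
        return cut >= 0 ? source.Substring(0, cut) : source;
    }
}

public static class Demo
{
    public static void Main()
    {
        string text = "Hello world. Next sentence";
        Console.WriteLine(text.GetSubstringWithin(14, ' ', '.')); // Hello world.
        Console.WriteLine(text.GetSubstringWithin(11, ' ', '.')); // Hello world
    }
}
```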
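I'll toss in some LINQ goodness even though others have answered this adequately:
    public string TrimString(string s, int maxLength)
    {
        // Position of the last whitespace character at or before maxLength.
        // LastOrDefault is needed here: SingleOrDefault throws as soon as
        // more than one whitespace character matches.
        var pos = s.Select((c, idx) => new { Char = c, Pos = idx })
            .Where(item => char.IsWhiteSpace(item.Char) && item.Pos <= maxLength)
            .Select(item => item.Pos)
            .LastOrDefault();
        return pos > 0 ? s.Substring(0, pos) : s;
    }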
I left out the parameter checking that others have merely to accentuate the important code...