开发者

String to dictionary using regex (want to optimize)

I have string on the format "$0Option one$1$Option two$2$Option three" (etc) that I want to convert into a dictionary where each number corresponds to an option. I currently have a working solution for this problem, but since this method is called for every entry I'm importing (few thousand) I want it to be as optimized as possible.

public Dictionary<string, int> GetSelValsDictBySelValsString(string selectableValuesString)
{
    // Get all numbers in the string. 
    var correspondingNumbersArray = Regex.Split(selectableValuesString, @"[^\d]+").Where(x => (!String.IsNullOrWhiteSpace(x))).ToArray();

    List<int> correspondingNumbers = new List<int>();

    int number;
    foreach (string s in correspondingNumbersArray)
    {
        Int32.TryParse(s, out number);
        correspondingNumbers.Add(number);
    }

    selectableValuesString = selectableValuesString.Replace("$", "");

    var selectableStringValuesArray = Regex.Split(selectableValuesString, @"[\d]+").Where(x => (!String.IsNullOrWhiteSpace(x))).ToArray();

    var selectableValues = new Dictionary<string, int>();

    for (int i = 0; i < selectableStringValuesArray.Count(); i++)
    {
        selectableValues.Add(selectableStringValuesArray.ElementAt(i开发者_运维百科), correspondingNumbers.ElementAt(i));
    }

    return selectableValues;
}


The first thing that caught my attention in your code is that it processes the input string three times: twice with Split() and once with Replace(). The Matches() method is a much better tool than Split() for this job. With it, you can extract everything you need in a single pass. It makes the code a lot easier to read, too.

The second thing I noticed was all those loops and intermediate objects. You're using LINQ already; really use it, and you can eliminate all of that clutter and improve performance. Check it out:

public static Dictionary<int, string> GetSelectValuesDictionary(string inputString)
{
  return Regex.Matches(inputString, @"(?<key>[0-9]+)\$*(?<value>[^$]+)")
    .Cast<Match>()
    .ToDictionary(
        m => int.Parse(m.Groups["key"].Value),
        m => m.Groups["value"].Value);
}

notes:

  • Cast<Match>() is necessary because MatchCollection only advertises itself as an IEnumerable, and we need it to be an IEnumerable<Match>.
  • I used [0-9] instead of \d on the off chance that your values might contain digits from non-Latin writing systems; in .NET, \d matches them all.
  • Static Regex methods like Matches() automatically cache the Regex objects, but if this method is going to be called a lot (especially if you're using a lot of other regexes, too), you might want to create a static Regex object anyway. If performance is really critical, you can specify the Compiled option while you're at it.
  • My code, like yours, makes no attempt to deal with malformed input. In particular, mine will throw an exception if the number turns out to be too large, while yours just converts it to zero. This probably isn't relevant to your real code, but I felt compelled to express my unease at seeing you call TryParse() without checking the return value. :/
  • You also don't make sure your keys are unique. Like @Gabe, I flipped it around used the numeric values as the keys, because they happened to be unique and the string values weren't. I trust that, too, is not a problem with your real data. ;)


Your selectableStringValuesArray is not actually an array! This means that every time you index into it (with ElementAt or count it with Count) it has to rerun the regex and walk through the list of results looking for non-whitespace. You need something like this instead:

var selectableStringValuesArray = Regex.Split(selectableValuesString, @"[\d]+").Where(x => (!String.IsNullOrWhiteSpace(x))).ToArray();

You should also fix your correspondingNumbersString because it has the same problem.

I see you're using C# 4, though, so you can use Zip to combine the lists and then you wouldn't have to create an array or use any loops. You could create your dictionary like this:

return correspondingNumbersString.Zip(selectableStringValuesArray,
       (number, str) => new KeyValuePair<int, string>(int.Parse(number), str))
      .ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜