Collection of strings to dictionary
Given an ordered collecti开发者_开发问答on of strings:
var strings = new string[] { "abc", "def", "def", "ghi", "ghi", "ghi", "klm" };
Use LINQ to create a dictionary of string to number of occurrences of that string in the collection:
IDictionary<string,int> stringToNumOccurrences = ...;
Preferably do this in a single pass over the strings collection...
var dico = strings.GroupBy(x => x).ToDictionary(x => x.Key, x => x.Count());
Timwi/Darin's suggestion will perform this in a single pass over the original collection, but it will create multiple buffers for the groupings. LINQ isn't really very good at doing this kind of counting, and a problem like this was my original motiviation for writing Push LINQ. You might like to read my blog post on it for more details about why LINQ isn't terribly efficient here.
Push LINQ and the rather more impressive implementation of the same idea - Reactive Extensions - can handle this more efficiently.
Of course, if you don't really care too much about the extra efficiency, go with the GroupBy
answer :)
EDIT: I hadn't noticed that your strings were ordered. That means you can be much more efficient, because you know that once you've seen string x and then string y, if x and y are different, you'll never see x again. There's nothing in LINQ to make this particularly easier, but you can do it yourself quite easily:
public static IDictionary<string, int> CountEntries(IEnumerable<string> strings)
{
var dictionary = new Dictionary<string, int>();
using (var iterator = strings.GetEnumerator())
{
if (!iterator.MoveNext())
{
// No entries
return dictionary;
}
string current = iterator.Current;
int currentCount = 1;
while (iterator.MoveNext())
{
string next = iterator.Current;
if (next == current)
{
currentCount++;
}
else
{
dictionary[current] = currentCount;
current = next;
currentCount = 1;
}
}
// Write out the trailing result
dictionary[current] = currentCount;
}
return dictionary;
}
This is O(n), with no dictionary lookups involved other than when writing the values. An alternative implementation would use foreach
and a current
value starting off at null... but that ends up being pretty icky in a couple of other ways. (I've tried it :) When I need special-case handling for the first value, I generally go with the above pattern.
Actually you could do this with LINQ using Aggregate
, but it would be pretty nasty.
The standard LINQ way is this:
stringToNumOccurrences = strings.GroupBy(s => s)
.ToDictionary(g => g.Key, g => g.Count());
If this is actual production code, I'd go with Timwi's response.
If this is indeed homework and you're expected to write your own implementation, it shouldn't be too tough. Here are just a couple of hints to point you in the right direction:
Dictionary<TKey, TValue>
has aContainsKey
method.- The
IDictionary<TKey, TValue>
interface'sthis[TKey]
property is settable; i.e., you can dodictionary[key] = 1
(which means you can also dodictionary[key] += 1
).
From those clues I think you should be able to figure out how to do it "by hand."
If you are looking for a particularly efficient (fast) solution, then GroupBy
is probably too slow for you. You could use a loop:
var strings = new string[] { "abc", "def", "def", "ghi", "ghi", "ghi", "klm" };
var stringToNumOccurrences = new Dictionary<string, int>();
foreach (var str in strings)
{
if (stringToNumOccurrences.ContainsKey(str))
stringToNumOccurrences[str]++;
else
stringToNumOccurrences[str] = 1;
}
return stringToNumOccurrences;
This is a foreach version like the one that Jon mentions that he finds "pretty icky" in his answer. I'm putting it in here, so there's something concrete to talk about.
I must admit that I find it simpler than Jon's version and can't really see what's icky about it. Jon? Anyone?
static Dictionary<string, int> CountOrderedSequence(IEnumerable<string> source)
{
var result = new Dictionary<string, int>();
string prev = null;
int count = 0;
foreach (var s in source)
{
if (prev != s && count > 0)
{
result.Add(prev, count);
count = 0;
}
prev = s;
++count;
}
if (count > 0)
{
result.Add(prev, count);
}
return result;
}
Updated to add a necessary check for empty source - I still think it's simpler than Jon's :-)
精彩评论