Accessing empty Linq result is very slow
I'm getting my feet wet with LINQ. I am trying to determine the distinct values contained across four DataColumns. So I start with:
var c1types = (from DataRow row in dtSource.Select("hasreq")
    where row["m"].ToInt() > 0
    select new { col = row["m"] }).Distinct();
var c2types = (from DataRow row in dtSource.Select("hasreq")
    where row["w"].ToInt() > 0
    select new { col = row["w"] }).Distinct();
var c3types = (from DataRow row in dtSource.Select("hasreq")
    where row["ag"].ToInt() > 0
    select new { col = row["ag"] }).Distinct();
var c4types = (from DataRow row in dtSource.Select("hasreq")
    where row["aq"].ToInt() > 0
    select new { col = row["aq"] }).Distinct();
foreach (var type in c1types.Union(c2types).Union(c3types).Union(c4types).Distinct())
{
...
}
This works, but it is very slow (4-5 seconds). So I put the following before the foreach:
MessageBox.Show(c1types.Count().ToString()); // 1 - immediate display
MessageBox.Show(c2types.Count().ToString()); // 1 - immediate display
MessageBox.Show(c3types.Count().ToString()); // 1 - immediate display
MessageBox.Show(c4types.Count().ToString()); // 0 - 4-5 seconds to display
With my sample data, each of the first three Selects returns a single distinct value (Count() == 1). The fourth returns no values (Count() == 0). What I don't understand is why it displays the first three counts instantaneously, but the fourth takes 4-5 seconds to display. It would appear the empty result is the cause of the slowdown. What is going on here, and what is the best workaround?
From your code, I'm guessing that ToInt() is an extension method. I'm pretty sure the indexer on DataRow returns objects, and I don't remember Object defining ToInt().
If this is true, ToInt might be doing something performance-sapping. Maybe it's doing something like:
try { return Int32.Parse(arg); }
catch { return 0; }
If Int32.Parse can't handle most of the values from row["aq"], it could be causing the slowness - check your debug window for exceptions. If this is the problem you can speed it up using Int32.TryParse which won't throw an exception.
If I'm wrong can you provide some more information? What is ToInt()?
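To make the guess above concrete, here is a minimal sketch of the two styles side by side. Both method names are hypothetical, and the first one is only my reconstruction of what ToInt() might look like if it took a string; it is not the asker's actual code.

```csharp
using System;

public static class ParseDemo
{
    // Hypothetical reconstruction of the guessed ToInt():
    // every value that cannot be parsed costs a thrown and caught exception.
    public static int ToIntWithCatch(string text)
    {
        try { return Int32.Parse(text); }
        catch { return 0; }
    }

    // Falls back to 0 as well, but failure is reported through the
    // bool return value of TryParse instead of an exception.
    public static int ToIntWithTryParse(string text)
    {
        int value;
        return Int32.TryParse(text, out value) ? value : 0;
    }

    public static void Main()
    {
        Console.WriteLine(ToIntWithCatch("42"));
        Console.WriteLine(ToIntWithCatch("not a number"));
        Console.WriteLine(ToIntWithTryParse("42"));
        Console.WriteLine(ToIntWithTryParse("not a number"));
        Console.WriteLine(ToIntWithTryParse(null));
    }
}
```

If most of the values in a column fail to parse, the first version throws once per row, which is the kind of cost that tends to show up as a multi-second delay, especially with a debugger attached.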
EDIT:
I should have known that it would be Convert.ToInt32, as Int32.Parse takes a String for a parameter. My proposed solution is as follows.
Criticism of the use of Convert.ToInt32:
You should not catch exceptions from Convert.ToInt32. It throws what Eric Lippert would call 'boneheaded' exceptions: exceptions that should never happen. If you get an exception from Convert.ToInt32, your program is wrong: you have tried to convert something to an Int32 that does not represent an Int32. Consider what the unit tests would look like for the ToInt extension method. You could call myPrizeSheep.ToInt() and get 0. Does converting a sheep to a number make sense? Swallowing exceptions from Convert.ToInt32 will usually lead to trouble down the road. In your case it was a performance issue, but often it is worse: a correctness issue.
There is no Convert.TryConvertInt32(Object). There is an Int32.TryParse(String, out Int32) though. This is because it's very common to want to parse a string input by a user to an Int32. You expect that they may have entered something that's not an Int32. It's not an exceptional case if they do that; you can just tell the user to correct it. It's part of the normal flow of program execution.
If you have an object, you must know it represents an Int32 in order to want to try and convert it. If what you pass to Convert.ToInt32 does not represent an Int32, then that's an exceptional case. I can't think of a single instance where you'd want to 'try' to convert any old Object to an Int32 - and obviously neither can the BCL devs.
I don't think ToInt is a good use of an extension method. I typically use extension methods so I can chain together calls using a nice pipe-forward style syntax. There are few instances where you'd want to chain ToInt() to other method calls.
Because all ToInt does is call Convert.ToInt32 and wrongly swallow any exceptions, you're much better off just using Convert.ToInt32 in your examples.
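As a rough illustration of which objects Convert.ToInt32 accepts, the sketch below probes a few values and reports the exception type, if any. The Probe helper is a hypothetical name introduced only for this demonstration; it catches exceptions purely to display them, which is exactly what production code should not do.

```csharp
using System;

public static class ConvertDemo
{
    // Returns the name of the exception type thrown by Convert.ToInt32(value),
    // or "none" when the conversion succeeds. For demonstration only.
    public static string Probe(object value)
    {
        try
        {
            Convert.ToInt32(value);
            return "none";
        }
        catch (Exception ex)
        {
            return ex.GetType().Name;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Probe(7));            // a boxed int converts cleanly
        Console.WriteLine(Probe(null));         // a null reference converts to 0
        Console.WriteLine(Probe(DBNull.Value)); // DBNull refuses the conversion
        Console.WriteLine(Probe("sheep"));      // a non-numeric string fails to parse
    }
}
```

If the "aq" column is mostly DBNull, which is only an assumption about the asker's data, every row in that query would throw and be swallowed, which would fit the delay described in the question.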
The solution:
Consider how to deal with things that aren't ints in your Linq query. In your case they're likely to be nulls or DBNulls. It is likely that you'll want to exclude those rows, in which case you can write something along the lines of:
var c4types = (from DataRow row in dtSource.Select("hasreq")
    where row["aq"] != null && row["aq"] != DBNull.Value && row["aq"].ToInt() > 0
    select new { col = row["aq"] }).Distinct();
That expression has the drawback that you end up with a list of Objects rather than Int32s. You could do a conversion in the final select, but you'd then be doing the conversion twice. To be honest, my preferred way (if you want a collection of Int32s rather than Objects) would be:
var c4types = dtSource.Select("hasreq")
    .Where(row => row["aq"] != null && row["aq"] != DBNull.Value)
    .Select(row => Convert.ToInt32(row["aq"]))
    .Where(i => i > 0)
    .Distinct();
In the above, we first get rid of rows with null values. We then convert the rows we know have ints. Then we get rid of ints less than 1, and finally get a unique collection of integers.
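For anyone who wants to try this end to end, here is a self-contained sketch of the same pipeline against a small in-memory table. The sample data, the class name, and the DistinctPositiveAcross helper are all assumptions made for illustration; only the column names "hasreq", "m" and "aq" come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DistinctDemo
{
    // Hypothetical sample table: a Boolean "hasreq" column plus two nullable int columns.
    public static DataTable BuildSample()
    {
        var table = new DataTable();
        table.Columns.Add("hasreq", typeof(bool));
        table.Columns.Add("m", typeof(int));
        table.Columns.Add("aq", typeof(int));
        table.Rows.Add(true, 3, DBNull.Value);
        table.Rows.Add(true, 3, 5);
        table.Rows.Add(true, 0, DBNull.Value);
        table.Rows.Add(false, 9, 9); // excluded by the "hasreq" filter
        return table;
    }

    // Same steps as the query above: drop nulls, convert, keep positives, de-duplicate.
    public static List<int> DistinctPositive(DataTable table, string column)
    {
        return table.Select("hasreq")
            .Where(row => row[column] != null && row[column] != DBNull.Value)
            .Select(row => Convert.ToInt32(row[column]))
            .Where(i => i > 0)
            .Distinct()
            .ToList();
    }

    // Combines several columns into one distinct list, like the Union chain in the question.
    public static List<int> DistinctPositiveAcross(DataTable table, params string[] columns)
    {
        return columns
            .SelectMany(column => DistinctPositive(table, column))
            .Distinct()
            .ToList();
    }

    public static void Main()
    {
        DataTable table = BuildSample();
        Console.WriteLine(string.Join(", ", DistinctPositive(table, "aq")));
        Console.WriteLine(string.Join(", ", DistinctPositiveAcross(table, "m", "aq")));
    }
}
```

Because the DBNull rows are filtered out before the conversion, no exception is thrown for them, so an empty or mostly empty column should cost no more than a populated one.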