My LINQ is really inefficent
I'm using LINQPad to learn LINQ and I've run into a stumbling block.
The goal is to get a list of Network Ids, Network Names and how many Stations each has.
Here is my original SQL:
SELECT n.iStationId AS NetworkID, n.sPrettyName AS NetworkName, COUNT(s.iStationID) AS StationCount
FROM T_StationInfo AS s, T_StationInfo as n
WHERE s.iNetworkId = n.iStationId
GROUP BY n.sPrettyName, n.iStationId
ORDER BY COUNT(s.iStationID) DESC
Here is my LINQ:
from s in T_stationInfo
from n in T_stationInfo
where s.INetworkID == n.IStationID
group s by s.INetworkID into stations
orderby stations.Count(x => x.INetworkID == stations.Key) descending
select new {
NetworkId = stations.Key,
NetworkName = T_stationInfo.Single(x => x.IStationID == stations.Key).SPrettyName,
StationCount = stations.Count(x => x.INetworkID == stations.Key)
};
LINQ takes 5 times longer to execute. I'm looking at the SQL that the linq statement generates and it pulls in the t_stationInfo table 7 times.
I believe this is because I am misusing LINQ but I don't see where or how.
What LINQ statement would create equivalent SQL or, at least, SQL that isn't so poor performing?
A couple notes:
- The structure of the table/database can not be changed.
- This question is more about learning to use LINQ than getting the list of ids, names, and counts.
- I do appreciate it! :)
--EDIT--
Just to clarify the structure: Each row in the table is an entity that has various information (name, contact, etc) and can have a parent. Those parents are also in the table. In this case parents can't have parents. Their parent field is NULL or 0.
So to get the Name of the Parent of a Station(called Network in the table), I pull the station info table in twice and join the parent id (network id) to the entity id (station id) so that on a single row I have the station's info and the parent's info. Hence the two froms of the same table.
Did that make sense?
--EDIT2--
This is the sql generated by the original LINQ query:
SELECT [t2].[iNetworkID] AS [NetworkId], (
SELECT [t5].[sPrettyName]
FROM [t_stationInfo] AS [t5]
WHERE (CONVERT(Decimal(29,4),[t5].[iStationID])) = [t2].[iNetworkID]
) AS [开发者_运维技巧NetworkName], (
SELECT COUNT(*)
FROM [t_stationInfo] AS [t6], [t_stationInfo] AS [t7]
WHERE ([t6].[iNetworkID] = [t2].[iNetworkID]) AND ([t2].[iNetworkID] = [t6].[iNetworkID]) AND ([t6].[iNetworkID] = (CONVERT(Decimal(29,4),[t7].[iStationID])))
) AS [StationCount]
FROM (
SELECT [t0].[iNetworkID]
FROM [t_stationInfo] AS [t0], [t_stationInfo] AS [t1]
WHERE [t0].[iNetworkID] = (CONVERT(Decimal(29,4),[t1].[iStationID]))
GROUP BY [t0].[iNetworkID]
) AS [t2]
ORDER BY (
SELECT COUNT(*)
FROM [t_stationInfo] AS [t3], [t_stationInfo] AS [t4]
WHERE ([t3].[iNetworkID] = [t2].[iNetworkID]) AND ([t2].[iNetworkID] = [t3].[iNetworkID]) AND ([t3].[iNetworkID] = (CONVERT(Decimal(29,4),[t4].[iStationID])))
) DESC
I don't how big of an impact this will have on your performance, if any. But when I look at your query I see one function declared twice:
stations.Count(s => s.INetworkID == stations.Key)
Does using a let
clause improve performance at all?
from station in T_stationInfo
from network in T_stationInfo
where station.INetworkID == network.IStationID
group station by station.INetworkID into stations
let stationCount = stations.Count(x => x.INetworkID == stations.Key)
orderby stationCount descending
select new
{
NetworkId = stations.Key,
NetworkName = T_stationInfo.First(x => x.IStationID == stations.Key).sPrettyName,
StationCount = stationCount
};
I feel like there should also be a better way to assign the NetworkName property, but I'm not sure.
Oh, and sorry for renaming the variables. I changed s
to station
and n
to network
to help me follow it a little better.
Thanks to Alexander Taran's comment, I reinvestigated the group syntax and finally understood how to group by more than one field. That led me to this:
from s in T_stationInfo
from n in T_stationInfo
where s.INetworkID == n.IStationID
group s by new { s.INetworkID, n.SPrettyName } into stations
orderby stations.Count() descending
select new {
NetworkId = stations.Key.INetworkID,
NetworkName = stations.Key.SPrettyName,
StationCount = stations.Count()
};
That generates the following SQL:
SELECT [t2].[iNetworkID] AS [NetworkId], [t2].[sPrettyName] AS [NetworkName], [t2].[value2] AS [StationCount]
FROM (
SELECT COUNT(*) AS [value], COUNT(*) AS [value2], [t0].[iNetworkID], [t1].[sPrettyName]
FROM [t_stationInfo] AS [t0], [t_stationInfo] AS [t1]
WHERE [t0].[iNetworkID] = (CONVERT(Decimal(29,4),[t1].[iStationID]))
GROUP BY [t0].[iNetworkID], [t1].[sPrettyName]
) AS [t2]
ORDER BY [t2].[value] DESC
The outer select looks like simple renaming and reordering. If I move the ORDER BY to the inside select and strip off the outer select, it runs at the same speed as my hand made SQL.
As a bonus, the new query seems much easier to understand.
-- EDIT --
In accordance with Justin Rusbatch's answer I pulled stations.Count() out into a let and it did have a small performace increase (about 10%). It also cleaned up the rendered sql somewhat.
As of now this is the best I have accomplished:
from station in T_stationInfo
from network in T_stationInfo
where station.INetworkID == network.IStationID
group station by new { station.INetworkID, network.SPrettyName } into stations
let stationCount = stations.Count()
orderby stationCount descending
select new
{
NetworkId = stations.Key.INetworkID,
NetworkName = stations.Key.SPrettyName,
StationCount = stationCount
};
And that creates the following SQL:
SELECT [t2].[iNetworkID] AS [NetworkId], [t2].[sPrettyName] AS [NetworkName], [t2].[value] AS [StationCount]
FROM (
SELECT COUNT(*) AS [value], [t0].[iNetworkID], [t1].[sPrettyName]
FROM [t_stationInfo] AS [t0], [t_stationInfo] AS [t1]
WHERE [t0].[iNetworkID] = (CONVERT(Decimal(29,4),[t1].[iStationID]))
GROUP BY [t0].[iNetworkID], [t1].[sPrettyName]
) AS [t2]
ORDER BY [t2].[value] DESC
Which is a massive improvement over where I started.
精彩评论