sql: Group by x,y,z; return grouped by x,y with lowest f(z)
This is for http://cssfingerprint.com
I collect timing stats about how fast the different methods I use perform on different browsers, etc., so that I can optimize the scraping speed. Separately, I have a report about what each method returns for a handful of URLs with known-correct values, so that I can tell which methods are bogus on which browsers. (Each is different, alas.)
The related tables look like this:
CREATE TABLE `browser_tests` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`bogus` tinyint(1) DEFAULT NULL,
`result` tinyint(1) DEFAULT NULL,
`method` varchar(255) DEFAULT NULL,
`url` varchar(255) DEFAULT NULL,
`os` varchar(255) DEFAULT NULL,
`browser` var开发者_如何学运维char(255) DEFAULT NULL,
`version` varchar(255) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`user_agent` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=33784 DEFAULT CHARSET=latin1
CREATE TABLE `method_timings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`method` varchar(255) DEFAULT NULL,
`batch_size` int(11) DEFAULT NULL,
`timing` int(11) DEFAULT NULL,
`os` varchar(255) DEFAULT NULL,
`browser` varchar(255) DEFAULT NULL,
`version` varchar(255) DEFAULT NULL,
`user_agent` varchar(255) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=28849 DEFAULT CHARSET=latin1
(user_agent is broken down pre-insert into browser, version, and os from a small list of recognized values using regex; I keep the original user-agent string just in case.)
I have a query like this that tells me the average timing for every non-bogus browser / version / method tuple:
select c, avg(bogus) as bog, timing, method, browser, version
from browser_tests as b inner join (
select count(*) as c, round(avg(timing)) as timing, method,
browser, version from method_timings
group by browser, version, method
having c > 10 order by browser, version, timing
) as t using (browser, version, method)
group by browser, version, method
having bog < 1
order by browser, version, timing;
Which returns something like:
c bog tim method browser version
88 0.8333 184 reuse_insert Chrome 4.0.249.89
18 0.0000 238 mass_insert_width Chrome 4.0.249.89
70 0.0400 246 mass_insert Chrome 4.0.249.89
70 0.0400 327 mass_noinsert Chrome 4.0.249.89
88 0.0556 367 reuse_reinsert Chrome 4.0.249.89
88 0.0556 383 jquery Chrome 4.0.249.89
88 0.0556 863 full_reinsert Chrome 4.0.249.89
187 0.0000 105 jquery Chrome 5.0.307.11
187 0.8806 109 reuse_insert Chrome 5.0.307.11
123 0.0000 110 mass_insert_width Chrome 5.0.307.11
176 0.0000 231 mass_noinsert Chrome 5.0.307.11
176 0.0000 237 mass_insert Chrome 5.0.307.11
187 0.0000 314 reuse_reinsert Chrome 5.0.307.11
187 0.0000 372 full_reinsert Chrome 5.0.307.11
12 0.7500 82 reuse_insert Chrome 5.0.335.0
12 0.2500 102 jquery Chrome 5.0.335.0
[...]
I want to modify this query to return only the browser/version/method with the lowest timing - i.e. something like:
88 0.8333 184 reuse_insert Chrome 4.0.249.89
187 0.0000 105 jquery Chrome 5.0.307.11
12 0.7500 82 reuse_insert Chrome 5.0.335.0
[...]
How can I do this, while still returning the method that goes with that lowest timing?
I could filter it app-side, but I'd rather do this in mysql since it'd work better with my caching.
Get all the min time into a temp table
SELECT
ID
,MIN(Time) AS MinTime
INTO
#tempMinTimes
FROM
TABLE_NAME or (Sub Query)
GROUP BY
ID
Then Inner Join onto this temp table on the ID field.
EDIT I only saw now there are duplicate ID's, you need to get rid of duplicate ID's or group on the Method Name as well and then in your join use the ID and Method Name
精彩评论