开发者

Beginner SQL question: querying gold and silver tag badges in Stack Exchange Data Explorer

I'm using the Stack Exchange Data Explorer to learn SQL, but I think the fundamentals of the question is applicable to other databases.

I'm trying to query the Badges table, which according to Stexdex (that's what I'm going to call it from now on) has the following schema:

  • Badges
    • Id
    • UserId
    • Name
    • Date

This works well for badges like [Epic] and [Legendary] which have unique names, but the silver and gold tag-specific badges seems to be mixed in together by having the same exact name.

Here's an example query I wrote for [mysql] tag:

SELECT
  UserId as [User Link],
  Date
FROM
  Badges
Where
  Name = 'mysql'
Order By
  Date ASC

The (slightly annotated) output is: as seen on stexdex:

User Link       Date                    
--------------- -------------------     // all for silver except where noted
Bill Karwin     2009-02-20 11:00:25     
Quassnoi        2009-06-01 10:00:16     
Greg            2009-10-22 10:00:25     
Quassnoi        2009-10-31 10:00:24     // for gold
Bill Karwin     2009-11-23 11:00:30     // for gold
cletus          2010-01-01 11:00:23    
OMG Ponies      2010-01-03 11:00:48     
Pascal MARTIN   2010-02-17 11:00:29 
Mark Byers      2010-04-07 10:00:35     
Daniel Vassallo 2010-05-14 10:00:38 

This is consistent with the current list of silver and gold earners at the moment of this writing, but to speak in more timeless terms, as of the end of May 2010 only 2 users have earned the gold [mysql] tag: Quassnoi and Bill Karwin, as evidenced in the above result by their names being the only ones that appear twice.

So this is the way I understand it:

  • The first time an Id appears (in chronological order) is for the silver badge
  • The second time is for the gold

Now, the above result mixes the silver and gold entries together. My questions are:

  • Is this a typical design, or are there much friendlier schema/normalization/whatever you call it?
  • In the current design, how would you query the silver and gold badges separately?
      开发者_如何学Go
    • GROUP BY Id and picking the min/max or first/second by the Date somehow?
    • How can you write a query that lists all the silver badges first then all the gold badges next?
      • Imagine also that the "real" query may be more complicated, i.e. not just listing by date.
      • How would you write it so that it doesn't have too many repetition between the silver and gold subqueries?
    • Is it perhaps more typical to do two totally separate queries instead?
    • What is this idiom called? A row "partitioning" query to put them into "buckets" or something?

Requirement clarification

Originally I wanted the following output, essentially:

User Link       Date                    
--------------- -------------------     
Bill Karwin     2009-02-20 11:00:25     // result of query for silver
Quassnoi        2009-06-01 10:00:16     // :
Greg            2009-10-22 10:00:25     // :
cletus          2010-01-01 11:00:23     // :
OMG Ponies      2010-01-03 11:00:48     // :
Pascal MARTIN   2010-02-17 11:00:29     // :
Mark Byers      2010-04-07 10:00:35     // :
Daniel Vassallo 2010-05-14 10:00:38     // :
------- maybe some sort of row separator here? can SQL do this? -------
Quassnoi        2009-10-31 10:00:24     // result of query for gold
Bill Karwin     2009-11-23 11:00:30     // :

But the answers so far with a separate column for silver and gold is also great, so feel free to pursue that angle as well. I'm still curious how you'd do the above, though.


Is this a typical design, or are there much friendlier schema/normalization/whatever you call it?

Sure, you could add a type code to make it more explicit. But when you consider that one can not get a gold badge before a silver one, the date stamp makes a lot of sense to differentiate between them.

In the current design, how would you query the silver and gold badges separately? GROUP BY Id and picking the min/max or first/second by the Date somehow?

Yes - joining onto a derived table (AKA inline view) that is a list of users & the minimum date would return the silver badges. Using HAVING COUNT(*) >= 1 would work too. You'd have to use a combination of GROUP BY and HAVING COUNT(*) = 2` to get gold badges - the max date doesn't ensure that there are more than one record for a userid...

How can you write a query that lists all the silver badges first then all the gold badges next?

Sorry - by users, or all silvers first and then golds? The former might be done simply by using ORDER BY t.userid, t.date; the latter I'd likely use analytic functions (IE: ROW_NUMBER(), RANK())...

Is it perhaps more typical to do two totally separate queries instead?

See above about how vague your requirements are, to me anyways...

What is this idiom called? A row "partitioning" query to put them into "buckets" or something?

What you're asking about is referred to by the following synonyms: Analytic, Windowing, ranking...


You'd do something like this and rely only on date or count in an aggregate.

Arguably, it also makes no sense to query silver followed by gold, but rather get data side by side like this:

Unfortunately, you haven't really specified what you want, but a good starting point for aggregates is to express it in plain English

Example: "Give me dates of silver and gold badge awards per user for tag mysql". Which this does:

SELECT
  UserId as [User Link],
  min(Date) as [Silver Date],
  case when count(*) = 1 THEN NULL ELSE max(date) END
FROM
  Badges
Where
  Name = 'mysql'
group by
  UserId
Order By
  case when count(*) = 1 THEN NULL ELSE max(date) END DESC, min(Date)

Edit, after update:

Your desired output is not really SQL: it's 2 separate recordsets. The separator is a no-go. As a setb based operation, there is no "natural" order so this introduces one:

SELECT
  UserId as [User Link],
  min(Date) as [Date],
  0 as dummyorder
FROM
  Badges
Where
  Name = 'mysql'
group by
  UserId
union all
select
  UserId as [User Link],
  max(Date) as [Date],
  1 as dummyorder
FROM
  Badges
Where
  Name = 'mysql'
group by
  UserId
having
  count(*) = 2
Order By
  dummyorder, Date
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜