
A database of questions with unambiguous numeric answers

My co-hackers and I are building a sort of trivia game inspired by this blog post: http://messymatters.com/calibration. The idea is to give confidence intervals and learn how to be calibrated (when you're "90% sure", you should be right 90% of the time).
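
To make the calibration idea concrete, here is a minimal Python sketch of how such a game might score a player, assuming each answer is stored as a lower bound, an upper bound, and the true value. The `Answer` type and the `hit_rate`/`calibration_gap` names are hypothetical; the linked post does not prescribe an implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Answer:
    low: float    # lower end of the player's interval
    high: float   # upper end of the player's interval
    truth: float  # the question's correct numeric answer


def hit_rate(answers: list[Answer]) -> float:
    """Fraction of the player's intervals that contain the true value."""
    if not answers:
        raise ValueError("need at least one answer")
    hits = sum(1 for a in answers if a.low <= a.truth <= a.high)
    return hits / len(answers)


def calibration_gap(answers: list[Answer], confidence: float = 0.9) -> float:
    """Hit rate minus stated confidence.

    Negative: intervals too narrow (overconfident).
    Positive: intervals too wide (underconfident).
    """
    return hit_rate(answers) - confidence


# Hypothetical session: 2 of 4 intervals contain the truth.
session = [
    Answer(1940, 1950, 1942),
    Answer(100, 200, 250),
    Answer(0, 10, 5),
    Answer(1, 2, 3),
]
print(hit_rate(session))  # 0.5
```

A player who keeps landing well below 0.9 while claiming "90% sure" is overconfident and should widen their intervals.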

We're thus looking for, ideally, thousands of questions with unambiguous numerical answers. They also shouldn't be too boring. There are a lot of random statistics out there -- e.g., enclosed water area in different countries -- that would make the game mind-numbing. Things like release dates of classic movies are more interesting (to most people).

Other interesting ones we've found include Olympic records, median incomes for different professions, dates of famous inventions, and celebrity ages. Scraping things like the above, by the way, was my reason for asking this question: Scrape HTML tables from a given URL into CSV
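
For the table-scraping step, a stdlib-only sketch might look like the following. It assumes the page HTML has already been downloaded and that tables use explicit `<tr>`/`<td>`/`<th>` tags (omitted closing tags are tolerated; nested tables are not). `TableParser` and `table_html_to_csv` are hypothetical names. If pandas is available, `pandas.read_html` followed by `DataFrame.to_csv` does much the same job.

```python
from __future__ import annotations

import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every <th>/<td> cell, row by row, from all tables."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None   # cells of the row being read
        self._cell: list[str] | None = None  # text chunks of the cell being read

    def _flush_cell(self) -> None:
        # Close the open cell, if any (also covers HTML that omits </td>).
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _flush_row(self) -> None:
        # Close the open row, if any (also covers HTML that omits </tr>).
        self._flush_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._flush_row()
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._flush_cell()
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag in ("tr", "table"):
            self._flush_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_html_to_csv(html: str) -> str:
    """Convert every table row found in `html` into one CSV line."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()
```

Feed it the downloaded page source (for example `urllib.request.urlopen(url).read().decode()`) and write the returned string to a `.csv` file; the `csv` module takes care of quoting cells that contain commas.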

So, if you know of other sources of interesting numerical facts (in a parsable form) I'm eager for pointers to them. Thanks!


Video game category

vgchartz.com has various charts for video game titles and hardware performance.

Sample queries:

  • Worldwide total sales of video game titles of all time
  • Hardware sales from 01/03/2010 to 05/22/2010: Wii-PS3-X360 in America, Japan, UK, and Australia

There's enough data for questions like:

  • How many hardware/title X were sold in Year Y/first week of sales?
  • Title X outsells Title Y (in their respective first N weeks of sales) by how much/what ratio?

Popular music category

billboard.com is all you need.

Wikipedia links

  • Billboard charts
  • Billboard Hot 100
  • Billboard 200
  • Billboard Hot 100 50th Anniversary Charts
  • List of best-charting U.S. music artists
  • List of best-selling music artists
  • Best-selling albums in the United States since Nielsen SoundScan tracking began

In addition to sales figures, you can also ask queries about chart positions, e.g.:

  • In Category Y of Chart Z, where does song X place/how many songs does artist X have?

Making the most out of your data

You can make unambiguous numeric Q/A out of most lists. Take, for example, a list like TIME.com's All Time 100 Novels.

Some generic questions that can be asked are:

  • How many were written in a given time period?
    • Decade, year, during the presidency of George Bush, before 9/11, etc.
  • What's the gap in rank between Title X and Title Y?
    • Pairwise queries like this really make the most of your data!
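
The two generic question types above can be sketched in a few lines of Python, assuming the list has already been scraped into titles in rank order plus (title, year) pairs. The function names and sample data are hypothetical. The pairwise version is what multiplies the data: a 100-item list yields 100 × 99 / 2 = 4,950 rank-gap questions.

```python
from collections import Counter
from itertools import combinations


def decade_counts(entries):
    """entries: (title, year) pairs. Returns a Counter {decade_start: count}."""
    return Counter((year // 10) * 10 for _title, year in entries)


def rank_gap_questions(ranked_titles):
    """ranked_titles: titles in rank order (index 0 is rank 1).

    Yields (question, answer) for every pair; with n titles that is
    n * (n - 1) / 2 questions.
    """
    ranked = enumerate(ranked_titles, start=1)
    for (rank_a, a), (rank_b, b) in combinations(ranked, 2):
        # combinations keeps input order, so rank_b > rank_a and the gap is positive
        yield f"How many places apart are {a!r} and {b!r}?", rank_b - rank_a


# Hypothetical placeholder data, not the real TIME list.
top = ["Novel A", "Novel B", "Novel C"]
for question, answer in rank_gap_questions(top):
    print(question, answer)
```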

You can do this with any Top 100 list:

  • Time 100
  • Time 100: The Most Important People of the Century
  • Bravo's 100 Greatest TV Characters
  • TV Guide's 100 Greatest Episodes of All Time
  • List of most-watched television broadcasts

History category

historyorb.com is just one example. Its URLs and HTML are very scrape-friendly.

  • Calendar of Famous Birthdays, Deaths, Events

There are many similar sites, e.g. brainyhistory.com.

You can also use these dates to "cross" with the other data (e.g. the Top 100 Novels example above).
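
One way to "cross" two scraped sources, sketched under the assumption that the list data is a set of (title, author, year) triples and the history site yields author birth years. Both the function name and the data are hypothetical. Asking for the difference in years, rather than the author's age, keeps the answer unambiguous because it does not depend on the birthday's month and day.

```python
def years_after_birth_questions(works, birth_years):
    """works: (title, author, year) triples, e.g. scraped from a Top 100 list.
    birth_years: {author: birth year}, e.g. scraped from a birthday calendar.

    Yields (question, answer) only where both dates are known.
    """
    for title, author, year in works:
        born = birth_years.get(author)
        if born is None:
            continue  # no matching birth date, so no question
        question = f"How many years after {author} was born was {title!r} published?"
        yield question, year - born


# Hypothetical placeholder data.
works = [("Book A", "Author A", 1950), ("Book B", "Author B", 1960)]
births = {"Author A": 1900}
for question, answer in years_after_birth_questions(works, births):
    print(question, answer)
```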


Movie category

The Internet Movie Database is of course... the internet movie database!

  • IMDb/USA Video Rentals Archive Calendar, All-Time World Wide Box Office
    • "How much did movies X, Y, and Z gross in total?"
  • The plain text data files (available via FTP, read copyright/license)


All the stats you'll ever need...


There are several "open" databases available online.

http://unstats.un.org/unsd/databases.htm

Just pull your data from them, and you're up and running!

NOTE: You might want to cache each question once you pull it, for future reuse (by a different user).
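
A minimal in-memory version of that caching idea, assuming questions are identified by a string key and answers are numbers. The class, the key format, and the fetch function are all hypothetical. For reuse across server restarts, the dictionary could be swapped for something persistent such as the standard library's `shelve` module.

```python
from typing import Callable, Dict


class QuestionCache:
    """Keeps each answer pulled from a data source so later users reuse it."""

    def __init__(self, fetch: Callable[[str], float]) -> None:
        self._fetch = fetch                # hypothetical: queries the remote database
        self._store: Dict[str, float] = {}
        self.fetches = 0                   # how many times the source was actually hit

    def answer(self, question_id: str) -> float:
        if question_id not in self._store:
            self.fetches += 1
            self._store[question_id] = self._fetch(question_id)
        return self._store[question_id]


# Stand-in fetch function; replace it with whatever download code you use.
cache = QuestionCache(lambda qid: float(len(qid)))
cache.answer("population:FR")
cache.answer("population:FR")  # served from the cache
print(cache.fetches)  # 1
```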

Good luck!

CVS @ 2600Hertz


Box Office Mojo is a great one for how much famous movies have grossed. I think people find that interesting.


You can try knocking at the front door:

Pioneer Grants: Pioneer Grants are available for startups and other developers building innovative applications with the Wolfram|Alpha API.

(http://products.wolframalpha.com/api/pricing.html)


Well, if you'd like to ask questions like "what's the population of country X?" or "how high is the highest mountain in Europe?", then this could be your choice:

http://www.dbis.informatik.uni-goettingen.de/Mondial/

The MONDIAL database has been compiled from geographical Web data sources listed below:

  • CIA World Factbook,
  • a predecessor of Global Statistics, collected by Johan van der Heijden,
  • additional textual sources for coordinates,
  • the International Atlas by Kümmerly & Frey, Rand McNally, and Westermann,
  • and some geographical data of the Karlsruhe TERRA database.


Sports trivia would lend itself pretty well to this, as you can come up with a ton of questions that 1) have unambiguous numerical answers and 2) some people actually care about. I know a downloadable database of baseball statistics is out there, and I'd be surprised if you couldn't find similar databases for other major (and not-so-major) sports as well. You'll still have to pick and choose, as there's such a thing as too much minutiae even for die-hard sports fans ("How many strikeouts did [obscure pitcher] compile in 1923?"), but it should be a rich environment to mine.


Wikipedia has a number of numbers that show up repeatedly (often in a sidebar). For instance, many if not most TV show pages link to a list of episodes, and that link comes with an episode count.


The questions in this game are perfect for what we have in mind:

http://en.wikipedia.org/wiki/Wits_and_Wagers

I wonder how the creators of Wits & Wagers collected those questions...


World Facts (Crime, Economy, Food etc...)

http://www.nationmaster.com/facts.php

Did you know? (Facts | Fast Facts | Animals | History | Lists | News | Phobias)

http://didyouknow.org/


Cricket statistics. Popular with millions of people around the world, and all accessible from the incredible database at http://www.cricinfo.com. Highly recommended.

Also the CIA factbook: https://www.cia.gov/library/publications/the-world-factbook/

has all sorts of useful numerical facts about countries and the like.


WolframAlpha might be a good place to look for numerical data in all sorts of categories.
