Google Code Jam 2010 Large DataSets Take Too Long to Submit

I'm participating in the 2010 Code Jam and I solved two of the problems for the small data sets, but I'm not even close to solving the large data sets within the 8-minute time frame.

I'm wondering if anyone out there has solved the large data set:

  1. What hardware were you running on?
  2. What language did you use?
  3. What performance-tuning techniques did you apply to make your code run as fast as possible?

I'm writing the solutions in Ruby, which is not my day-to-day language, and executing them on my MacBook Pro.

My solutions for problem A and problem C are on github at http://github.com/tjboudreaux/codejam2010.

I'd appreciate any suggestions that you may have.

FWIW, I have a lot of experience in C++ from college, my primary language is PHP, and my "sandbox" language is Ruby.

Was I just a bit ambitious taking a shot at this in Ruby without knowing where the language struggles for performance, or does anyone see anything that's a red flag as to why I can't complete the large data set in time to submit?


Usually the small data sets are designed to be solvable with simple algorithms, but the large data sets require some clever ideas to reduce the computation time. The language shouldn't be a big issue here if you have the right algorithm.


First, look at the algorithm you are using and try to work out its complexity. Afterwards, look into the language you are using to implement it. Ruby is known to be slower than other languages, which might have an impact, especially if the data set is very large and the time limit is short.

Have a look at this website: Language Benchmarks. It compares various languages in terms of speed, performance, and memory consumption.


You need the right algorithm to be successful in GCJ. I'd also argue that, given the limited coding time allowed, how fast you can code a program in a language matters much more than how fast the language runs.

I used Python for GCJ and did not have a case in which the language speed "failed" me. One can say that Python is 2x faster than Ruby (per the language benchmarks shootout), and when I used Psyco (a JIT compiler module) I got about a 5x speed-up - but that's small beer, since the choice of language can bring only a linear speed increase. Say 10 times - big whoops.
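
For reference, turning Psyco on was just a couple of lines at the top of the script - roughly this (Psyco only ever worked on 32-bit x86 CPython 2.x, so treat it as historical):

# A minimal sketch of enabling the Psyco JIT; falls back to plain CPython.
try:
    import psyco
    psyco.full()  # JIT-compile everything - the ~5x speed-up mentioned above
except ImportError:
    pass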

The problems in GCJ, on the other hand, are designed with combinatorial explosion in mind, and the larger inputs lead to a much larger increase in the time (or memory) needed. Take for example GCJ 2010-1C, "Making Chess Boards". Assuming a square board for simplicity, the naive implementation has complexity O(n^4). The fast yet complicated judge implementation was described as O(n^2 log(n^2)). My simpler solution - which unfortunately came to me well after the round ended - is O(n^3). The difference between the third and fourth power might not seem significant, but the large input had a 512x512 table to process, for which the three algorithms have to do roughly the following numbers of iterations:

naive   68,719,476,736
mine       134,217,728
judge's      4,718,592
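
Those magnitudes simply come from plugging n = 512 into each complexity (taking the judge's log base 2); a quick sanity check in Python:

import math

n = 512
print(n ** 4)                              # naive:   68,719,476,736
print(n ** 3)                              # mine:       134,217,728
print(int(n ** 2 * math.log(n ** 2, 2)))   # judge's:      4,718,592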

So my implementation on that input would be roughly 30x slower than the judge's solution and ~500x faster than the naive code. On my oldish desktop (1.2GHz Athlon) my Python code runs the large input in slightly under 4 minutes. I can speculate that the optimal solution would have run in under 10 seconds - but who cares, as long as you fit under 8 minutes?

On the other hand, the n^4 algorithm would take ~500 * 4 min = 33 hours to run, which is very much not acceptable, and no optimizing compiler or over-clocked CPU is likely to save us from that morass.

Sure, some optimizations are possible - just adding psyco.full() decreased my runtime 5x, to 46 sec. Also, running the code on my faster laptop (2GHz dual core) sped it up 3 times. That's "only" 15 times - but never mind, say we sped it up 50x: that is still about 10 times too slow for the naive algorithm to finish the large input.

So if you have a bad algorithm, no optimization, compiler, or hardware will help. On the other hand, if you have the best algorithm in mind, you can use a computer 30 times slower than my nine-year-old PC and still get results on time with a scripting language of the rank of Python/Ruby. Which, by the way, is the main goal of the GCJ problem writers: contestants distinguishing themselves on the basis of programming skill and not compiler/hardware/network connection.


I made the same mistake. I write in Python, which is inherently slower than other languages due to its interpreted nature. I tried compiling it to C++ code with ShedSkin, but my algorithm was still too slow.

My solution to the Snapper Chain simply "acted out" the scenario. Roughly, in Python:

def snapper_chain_light_on(n, k):
    # Each snapper is a [powered, on] pair; the first is always powered.
    snappers = [[i == 0, False] for i in range(n)]
    for _ in range(k):
        # Process changes in on-off-ness: every powered snapper toggles.
        for s in snappers:
            if s[0]:
                s[1] = not s[1]
        # Process changes in powered-ness: a snapper (other than the first,
        # which stays powered) is powered iff the previous one is powered and on.
        for i in range(1, n):
            snappers[i][0] = snappers[i - 1][0] and snappers[i - 1][1]
    # The light at the end of the chain is on iff the last snapper is powered and on.
    return snappers[-1][0] and snappers[-1][1]

I later realized a solution based on the fact that (2^n)-1 is a binary number with n right-justified 1's, but by then I had long since run out of time on the large set.
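
That realization boils down to a constant-time check: the light is on exactly when the lowest N bits of K are all 1, i.e. K mod 2^N == 2^N - 1. A rough sketch of the idea (just an illustration, not code I submitted):

def light_is_on(n, k):
    # The lowest n bits of k must all be 1, i.e. k mod 2^n == 2^n - 1.
    mask = (1 << n) - 1
    return (k & mask) == mask

# e.g. light_is_on(2, 3) -> True (3 = 0b11), light_is_on(2, 2) -> False (2 = 0b10)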

EDIT: There's a way better solution in the Contest Analysis page on the Code Jam Dashboard.


The language is not the problem; the problem is the algorithm itself. You have to find something better than the brute-force approach, such as dynamic programming or divide and conquer.


Because you are using Ruby, you are going to have to find the most efficient algorithms possible. I would also try using a profiler to find the spots that are using up most of your time. There are a few available; one open-source option is http://ruby-prof.rubyforge.org/.
