开发者

Delayed_Job scraper works in development but not on Heroku

Here's the code for the scraper

class Scrape
  def perform
    url = "# a long url"

    agent = Mechanize.new
    agent.get(url)

    while(agent.page.link_with(:text => "Next Page \u00BB")) do
      agent.page.search(".content").each do |item|
        puts "."
        House.create!({
          # attributes...
        })
      end

      agent.page.link_with(:text => "Next Page \u00BB").click
    end
  end
end

On my local environment I c开发者_JAVA百科an run it in the rails console just by typing

Scrape.new.delay.perform # to queue the job
rake jobs:work

and it works perfectly.

However running the analogous (with a worker running instead of rake jobs:work) in the Heroku console doesn't seem to do anything. I tried logging some lines in the Heroku log and I can get the url variable to log (so the method is at least getting called) but the "." which is there to show each time we run the while loop never appears and no Houses are created in the database.

Anyone any ideas what might be wrong?


Solved this problem myself, pretty obscure bug though. I was using ruby 1.9.2 in my local environment but I had the app deployed on a ruby 1.8.7 stack.

The important difference being the change in character encoding between the two ruby versions which meant that Mechanize couldn't find a link with the unicode encoded character "\u00BB" and thus didn't do any scraping.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜