Build array of flashvars using hpricot

I have used Hpricot before to grab content that sits inside HTML tags on websites. Now, however, I am trying to build an array of all the flashvars found on this page: http://view-source:http://megavideo.com/?v=014U2YO9

require 'hpricot'
require 'open-uri'

flashvars = Array.new
doc = Hpricot(open("http://megavideo.com/?v=014U2YO9"))

for flashvar in (doc/"/param[@name='flashvars']") do
  flashvars << flashvar
end

I have been trying with the code snippet above and hope I am on the right track. Would anyone be able to help me further?

Thank you


Your syntax suggests that you are trying to fetch attributes from <param> elements, but no such markup exists on that page. What the page does contain is a large number of JavaScript assignments to properties of a flashvars object. Assuming that these are what you want, you don't need Hpricot, just a regex for the JS. This seems to work:

require 'open-uri'
# On Ruby 3.0+ open-uri no longer patches Kernel#open, so call URI.open instead
html = URI.open("http://megavideo.com/?v=014U2YO9").read

flashvars = Hash[ html.scan( /flashvars\.(\w+)\s*=\s*["']?(.+?)["']?;/ ) ]

require 'pp' # Just for pretty output here
pp flashvars

#=> {"logintxt"=>"Login",
#=>  "registertxt"=>"Register",
#=>  "searchtxt"=>"Search videos",
#=>  "searchrestxt"=>"\"",
#=>  "useSystemFont"=>"0",
#=>  "size"=>"17",
#=>  "loginAct"=>"?c=login%26next%3Dv%253D014U2YO9",
#=>  "registerAct"=>"?c=signup",
#=>  "userAct"=>"?c=account",
#=>  "signoutAct"=>"javascript:signout()",
#=>  "myvideostxt"=>"My Videos",
#=>  "videosAct"=>"?c=myvideos",
#=>  "added"=>"2011-04-14",
#=>  "username"=>"beenerkeekee19952",
#=>  etc.

Note that this leaves every value as a string in Ruby, even values that were numbers in JavaScript. Because the regex strips the leading and trailing quote marks from JavaScript strings, you cannot tell flashvars.foo = 42; apart from flashvars.bar = "42";.
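
If that distinction matters, one possible variation is to capture quoted and unquoted values in separate groups and convert only the unquoted ones. The following is a rough sketch, not a full JavaScript parser: SAMPLE_JS is made-up input rather than the real page, parse_flashvars and FLASHVAR_RE are hypothetical names, and escaped quotes inside strings are not handled. It also returns an empty JavaScript string ("") as an empty Ruby string, which the .+? pattern above cannot do because it needs at least one character.

```ruby
# Made-up sample input for illustration only (not taken from the real page).
SAMPLE_JS = <<~JS
  flashvars.foo = 42;
  flashvars.bar = "42";
  flashvars.searchrestxt = "";
  flashvars.title = 'My Videos';
JS

# Double-quoted, single-quoted and bare (unquoted) values each get their own
# capture group, so we can tell afterwards which kind of value was matched.
FLASHVAR_RE = /flashvars\.(\w+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^;]+?))\s*;/

# Hypothetical helper: returns a Hash of name => value, keeping quoted values
# as strings and converting bare numeric literals to Integer or Float.
def parse_flashvars(js)
  js.scan(FLASHVAR_RE).each_with_object({}) do |(name, dq, sq, bare), vars|
    vars[name] =
      if bare.nil?
        dq || sq            # it was a quoted JavaScript string
      elsif bare =~ /\A-?\d+\z/
        bare.to_i           # bare integer literal
      elsif bare =~ /\A-?\d*\.\d+\z/
        bare.to_f           # bare decimal literal
      else
        bare                # anything else is left as raw text
      end
  end
end

require 'pp'
pp parse_flashvars(SAMPLE_JS)
```

With this approach flashvars.foo comes back as the Integer 42 while flashvars.bar stays the String "42". To run it against the live page, pass the html string from the snippet above instead of SAMPLE_JS.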
