Build array of flashvars using hpricot
I have used Hpricot before to grab content that sits inside particular HTML tags on a website. Now I am trying to build an array of all the flashvars found on this page: view-source:http://megavideo.com/?v=014U2YO9
require 'hpricot'
require 'open-uri'
flashvars = Array.new
doc = Hpricot(open("http://megavideo.com/?v=014U2YO9"))
for flashvar in (doc/"/param[@name='flashvars']") do
flashvars << flashvar
end
I have been trying with the code snippet above and hope I am on the right track. Would anyone be able to help me further?
Thank you
The syntax you used indicates that you are trying to select <param> elements by their name attribute, but no such markup exists on that page. Instead, the page contains a plethora of JavaScript assignments to properties of a flashvars object. Assuming that these are what you want, you don't need Hpricot, just a regex for the JS. This seems to work:
require 'open-uri'
html = URI.open("http://megavideo.com/?v=014U2YO9").read # plain open() no longer accepts URLs as of Ruby 3.0
flashvars = Hash[ html.scan( /flashvars\.(\w+)\s*=\s*["']?(.+?)["']?;/ ) ]
require 'pp' # Just for pretty output here
pp flashvars
#=> {"logintxt"=>"Login",
#=> "registertxt"=>"Register",
#=> "searchtxt"=>"Search videos",
#=> "searchrestxt"=>"\"",
#=> "useSystemFont"=>"0",
#=> "size"=>"17",
#=> "loginAct"=>"?c=login%26next%3Dv%253D014U2YO9",
#=> "registerAct"=>"?c=signup",
#=> "userAct"=>"?c=account",
#=> "signoutAct"=>"javascript:signout()",
#=> "myvideostxt"=>"My Videos",
#=> "videosAct"=>"?c=myvideos",
#=> "added"=>"2011-04-14",
#=> "username"=>"beenerkeekee19952",
#=> etc.
Note that this leaves all values as strings in Ruby, even values that were numbers in JavaScript. Because the regex strips the leading and trailing quote marks from JavaScript strings, you cannot tell flashvars.foo = 42; apart from flashvars.bar = "42"; in the result.
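If that distinction matters, one possible variation is to capture quoted and unquoted values in separate regex groups and convert only the unquoted ones. The following is a rough sketch, not part of the answer's tested code: SAMPLE_JS, FLASHVAR_RE and parse_flashvars are made-up names for illustration, and the sample text is a stand-in for the page source rather than real data from the site.

```ruby
# Hypothetical sample standing in for the page's inline JavaScript.
SAMPLE_JS = <<~JS
  flashvars.foo = 42;
  flashvars.bar = "42";
  flashvars.title = 'My Videos';
JS

# Group 2: double-quoted value, group 3: single-quoted value,
# group 4: unquoted (bare) value such as a number.
FLASHVAR_RE = /flashvars\.(\w+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^;]+?))\s*;/

# Assumed helper: returns a Hash whose values keep the string/number distinction.
def parse_flashvars(js)
  js.scan(FLASHVAR_RE).each_with_object({}) do |(name, dq, sq, bare), vars|
    vars[name] =
      if bare.nil?
        dq || sq                       # quoted in JS: keep as a String
      elsif bare =~ /\A-?\d+\z/
        bare.to_i                      # bare integer literal
      elsif bare =~ /\A-?\d+\.\d+\z/
        bare.to_f                      # bare decimal literal
      else
        bare                           # anything else is left as raw text
      end
  end
end

typed_flashvars = parse_flashvars(SAMPLE_JS)
```

With the sample above, typed_flashvars["foo"] comes back as the Integer 42 while typed_flashvars["bar"] stays the String "42". Values containing escaped quotes or semicolons are not handled, so this would need adjusting against the real page.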