How to unpack Javascript in Python
I would like to retrieve the contents of a javascript script instead of executing it upon requesting it.
EDIT: I understand that Python is not executing the javascript code. The issue is that when I request this online JS script it gets executed. I'm unable to retrieve the contents of the script. Maybe what I want is to decode the script like so http://jsunpack.jeek.org/dec/go
This is the code I use to request the JS file:
import urllib2

def request(self, uri):
    data = None  # no request body, so this is sent as a GET
    req = urllib2.Request(uri, data, self.header)
    response = urllib2.urlopen(req)
    html_text = response.read()
    return html_text.decode()
I know approximately what the insides of the script look like but all I get after the request is issued is a 'loaded' message. My guess is that the JS code gets executed. Is there any way to just request the code?
There is no HTML or JavaScript interpreter in urllib2. This module does nothing but fetch the resource and return it to you raw; it certainly will not attempt to execute any JavaScript code it receives. If you are not receiving the response you expect, check the URL with a tool like wget or monitor the network connection with Wireshark or Fiddler to see what the server is actually returning.
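The same check can also be done from Python itself. The following is only a minimal sketch, assuming Python 3, where urllib2 was split into urllib.request and urllib.error; fetch_raw is a hypothetical helper name, and the data: URL is a stand-in for the real script URL so the example runs without a network connection.

```python
import urllib.request


def fetch_raw(uri, headers=None, timeout=10):
    """Fetch uri and return (status, content_type, body) without interpreting the body."""
    req = urllib.request.Request(uri, None, headers or {})
    with urllib.request.urlopen(req, timeout=timeout) as response:
        # Not every response type carries a status code, hence getattr.
        status = getattr(response, "status", None)
        content_type = response.headers.get("Content-Type", "")
        body = response.read()  # raw bytes, exactly as the server sent them
    return status, content_type, body


if __name__ == "__main__":
    # Stand-in URL (an assumption); replace it with the script's real URL.
    status, content_type, body = fetch_raw("data:text/javascript,alert(1)%3B")
    print(status, content_type, len(body))
    print(repr(body[:200]))  # repr() shows the bytes without rendering them
```

If the printed bytes are only a short 'loaded' message, then that is what the server sent for this request, and the next thing to compare is the request headers against those a browser sends.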
(decode() here only converts the bytes of the HTTP response body to Unicode characters, using the default character encoding, which probably isn't a good idea.)
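One way to avoid relying on the default encoding is to read the charset from the Content-Type response header and fall back to an explicit choice when none is declared. This is a sketch rather than the only approach: decode_body and charset_from_content_type are hypothetical helper names, and the UTF-8 fallback is an assumption.

```python
from email.message import Message


def charset_from_content_type(content_type, fallback="utf-8"):
    """Return the charset named in a Content-Type header value, or fallback."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or fallback


def decode_body(body, content_type, fallback="utf-8"):
    """Decode raw response bytes using the charset declared by the server."""
    charset = charset_from_content_type(content_type, fallback)
    # errors="replace" keeps undecodable bytes from raising an exception.
    return body.decode(charset, errors="replace")
```

For example, decode_body(raw_bytes, "application/javascript; charset=ISO-8859-1") decodes the bytes as ISO-8859-1 instead of the interpreter's default.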
ETA:
"I guess what I want is to decode the Javascript like so jsunpack.jeek.org/dec/go"
Ah, well that's a different game entirely. You can get the source for that here, though you'll also need to install SpiderMonkey, the JavaScript engine from Mozilla, to allow it to run the downloaded JavaScript.
There's no way to automatically ‘unpack’ obfuscated JavaScript without running it, since the packing code can do anything at all and JS is a Turing-complete language. All this tool does is run it with some wrapper code for functions like eval, which packers/obfuscators typically use. Unfortunately, this sabotage is easily detectable, so if it's malware you're trying to unpack you'll find this fails as often as it succeeds.
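To illustrate the general idea of such wrapper code (this is not the tool's actual source), here is a sketch in which Python only prepares the text that would be handed to a JavaScript engine. The prelude, the __captured variable, and wrap_for_capture are hypothetical names, and the print function is assumed to be provided by the JavaScript shell. The hook works only in non-strict JavaScript, where the global eval may be reassigned.

```python
# Hypothetical JavaScript prelude: replaces the global eval with a function
# that records (and, if the shell offers print, prints) its argument instead
# of executing it.
EVAL_HOOK_PRELUDE = """\
var __captured = [];
eval = function (code) {
    __captured.push(String(code));
    if (typeof print === 'function') { print(String(code)); }
};
"""


def wrap_for_capture(js_source):
    """Return js_source with the eval hook placed in front of it."""
    return EVAL_HOOK_PRELUDE + "\n" + js_source


if __name__ == "__main__":
    packed = "eval('al' + 'ert(1)');"  # toy stand-in for a packed script
    print(wrap_for_capture(packed))
```

Running the wrapped text in a JavaScript engine would then show the string the packer tried to pass to eval. A script can notice that eval has been replaced, which is why this approach is easy to detect.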
I'm not sure I understand. If I do a simplified version of your code and run it on a URI that's sure to have some javascript:
>>> import urllib2
>>> res = urllib2.urlopen("http://stackoverflow.com/questions/6946867/how-to-unpack-javascript-in-python")
and then print res.read() (or res.read().decode()), the JavaScript is intact.
Doing urlopen should retrieve whatever character stream the source provides. It's up to you to do something with it (render it as html, interpret it as javascript, etc).