How to evaluate JavaScript code in Python

I need to fetch some results from a web page that uses JavaScript to generate the part I am interested in, like the following:

eval(function(p,a,c,k,e,d){e=function(c){return c};if(!''.replace(/^/,String)){while(c--)d[c]=k[c]||c;k=[function(e){return d[e]}];e=function(){return'\\w+'};c=1;};while(c--)if(k[c])p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c]);return p;}('5 11=17;5 12=["/3/2/1/0/13.4","/3/2/1/0/15.4","/3/2/1/0/14.4","/3/2/1/0/7.4","/3/2/1/0/6.4","/3/2/1/0/8.4","/3/2/1/0/10.4","/3/2/1/0/9.4","/3/2/1/0/23.4","/3/2/1/0/22.4","/3/2/1/0/24.4","/3/2/1/0/26.4","/3/2/1/0/25.4","/3/2/1/0/18.4","/3/2/1/0/16.4","/3/2/1/0/19.4","/3/2/1/0/21.4"];5 20=0;',10,27,'40769|54|Images|Files|png|var|imanhua_005_140430179|imanhua_004_140430179|imanhua_006_140430226|imanhua_008_140430242|imanhua_007_140430226|len|pic|imanhua_001_140429664|imanhua_003_140430117|imanhua_002_140430070|imanhua_015_140430414||imanhua_014_140430382|imanhua_016_140430414|sid|imanhua_017_140430429|imanhua_010_140430289|imanhua_009_140430242|imanhua_011_140430367|imanhua_013_140430382|imanhua_012_140430367'.split('|'),0,{}))

The result of eval() is what I need. I am writing a Python script; is there a library I can use to run this piece of JavaScript code virtually and get the output?

Thanks


PyV8 is a set of Python bindings for the V8 JavaScript engine (the one used in Google Chrome).


Use a spidermonkey binding:

from spidermonkey import Runtime
rt = Runtime()
cx = rt.new_context()
# whatyoupostedabove: a string holding the JavaScript source from the question
result = cx.eval_script(whatyoupostedabove)


You can use PyQt with the WebKit module :) It has a JS engine and can evaluate JS within the context of an (X)HTML document.


I suppose you have solved the problem by now, but I wanted to share another (in my opinion much more viable) option. When you only need to evaluate one known JavaScript function, it may be easier to implement that function in Python than to pull in a huge tool built to parse and run every imaginable piece of JavaScript.

So I would suggest writing a Python version of the JavaScript unpacker function, which solves most of the problem. I did in fact do that, and here is an example. The int2base function is Alex Martelli's implementation, which can be found here.

import re

def unpack(p, a, c, k, e=None, d=None):
    ''' unpack
    Unpacker for the popular Javascript compression algorithm.

    @param  p  template code
    @param  a  radix for variables in p
    @param  c  number of variables in p
    @param  k  list of c variable substitutions
    @param  e  not used
    @param  d  not used
    @return p  decompressed string
    '''
    # Paul Koppen, 2011
    # Walk the indices from high to low, as the JavaScript while(c--) loop does.
    for i in range(c - 1, -1, -1):
        if k[i]:
            # A function replacement keeps any backslashes in k[i] literal.
            p = re.sub(r'\b' + int2base(i, a) + r'\b', lambda m: k[i], p)
    return p
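
The unpack function above calls int2base, which the answer links to but does not show. A minimal sketch of such a helper follows. It is an assumed stand-in written for illustration, not necessarily Alex Martelli's exact code, and the DIGITS constant only covers radixes up to 36.

```python
import string

# Assumed digit set: 0-9 then a-z, which covers radixes 2 through 36.
DIGITS = string.digits + string.ascii_lowercase

def int2base(x, base, digs=DIGITS):
    """Return the non-negative integer x written in the given base."""
    if not 2 <= base <= len(digs):
        raise ValueError("unsupported base: %d" % base)
    if x == 0:
        return digs[0]
    out = []
    while x:
        # Peel off the least significant digit on each pass.
        x, r = divmod(x, base)
        out.append(digs[r])
    return "".join(reversed(out))

print(int2base(26, 10))  # 26
print(int2base(35, 36))  # z
print(int2base(0, 10))   # 0
```

With the radix of 10 used in the question, int2base simply returns the decimal string of each index, which is why the packed template contains plain numbers such as 5, 11 and 12.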

Finally you need to do a tiny bit of parsing to extract the four function arguments. Just for the sake of a simple illustration though, I use eval here to let Python do that for me.

s  = '''eval(function(p,a,c,k,e,d){e=function(c){return c};if(!''.replace(/^/,String)){while(c--)d[c]=k[c]||c;k=[function(e){return d[e]}];e=function(){return'\\w+'};c=1;};while(c--)if(k[c])p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c]);return p;}('5 11=17;5 12=["/3/2/1/0/13.4","/3/2/1/0/15.4","/3/2/1/0/14.4","/3/2/1/0/7.4","/3/2/1/0/6.4","/3/2/1/0/8.4","/3/2/1/0/10.4","/3/2/1/0/9.4","/3/2/1/0/23.4","/3/2/1/0/22.4","/3/2/1/0/24.4","/3/2/1/0/26.4","/3/2/1/0/25.4","/3/2/1/0/18.4","/3/2/1/0/16.4","/3/2/1/0/19.4","/3/2/1/0/21.4"];5 20=0;',10,27,'40769|54|Images|Files|png|var|imanhua_005_140430179|imanhua_004_140430179|imanhua_006_140430226|imanhua_008_140430242|imanhua_007_140430226|len|pic|imanhua_001_140429664|imanhua_003_140430117|imanhua_002_140430070|imanhua_015_140430414||imanhua_014_140430382|imanhua_016_140430414|sid|imanhua_017_140430429|imanhua_010_140430289|imanhua_009_140430242|imanhua_011_140430367|imanhua_013_140430382|imanhua_012_140430367'.split('|'),0,{}))'''
js = eval('unpack' + s[s.find('}(')+1:-1])

Result:

'var len=17;var pic=["/Files/Images/54/40769/imanhua_001_140429664.png","/Files/Images/54/40769/imanhua_002_140430070.png","/Files/Images/54/40769/imanhua_003_140430117.png","/Files/Images/54/40769/imanhua_004_140430179.png","/Files/Images/54/40769/imanhua_005_140430179.png","/Files/Images/54/40769/imanhua_006_140430226.png","/Files/Images/54/40769/imanhua_007_140430226.png","/Files/Images/54/40769/imanhua_008_140430242.png","/Files/Images/54/40769/imanhua_009_140430242.png","/Files/Images/54/40769/imanhua_010_140430289.png","/Files/Images/54/40769/imanhua_011_140430367.png","/Files/Images/54/40769/imanhua_012_140430367.png","/Files/Images/54/40769/imanhua_013_140430382.png","/Files/Images/54/40769/imanhua_014_140430382.png","/Files/Images/54/40769/imanhua_015_140430414.png","/Files/Images/54/40769/imanhua_016_140430414.png","/Files/Images/54/40769/imanhua_017_140430429.png"];var sid=40769;'
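
The "tiny bit of parsing" mentioned above could look something like the sketch below, which pulls the four arguments out with a regular expression instead of handing the page's text to eval. The pattern, the names ARGS_RE and extract_args, and the shortened sample string are all assumptions made for illustration. The pattern expects the exact tail layout seen in the question and does not handle escaped single quotes inside the payload.

```python
import re

# Assumed tail layout: }('<p>',<a>,<c>,'<k>'.split('|'),0,{}))
ARGS_RE = re.compile(
    r"\}\('(.*)',\s*(\d+),\s*(\d+),\s*'(.*)'\.split\('\|'\)", re.S
)

def extract_args(source):
    """Return (p, a, c, k) parsed from a packed script, without using eval."""
    m = ARGS_RE.search(source)
    if m is None:
        raise ValueError("source is not in the expected packed format")
    p, a, c, k = m.groups()
    return p, int(a), int(c), k.split("|")

# Hypothetical packed script, much shorter than the one in the question.
sample = "eval(function(p,a,c,k,e,d){return p}('2 1=0;',10,3,'7|len|var'.split('|'),0,{}))"
print(extract_args(sample))  # ('2 1=0;', 10, 3, ['7', 'len', 'var'])
```

The returned tuple can be passed straight on, for example unpack(*extract_args(s)).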

Additional note: it was brought to my attention that Alex's int2base function breaks if the radix is greater than 36. The solution is to add uppercase characters to its digit set, like so: digs = string.digits + string.lowercase + string.uppercase (on Python 3 the names are string.ascii_lowercase and string.ascii_uppercase).
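
As a rough illustration of that change, the sketch below uses the extended digit set under its Python 3 names. The helper is a hypothetical stand-in for the modified int2base, not the original code, and it assumes non-negative input and a radix of at most 62.

```python
import string

# Extended digit set: 0-9, a-z, then A-Z, giving 62 symbols in total.
DIGS62 = string.digits + string.ascii_lowercase + string.ascii_uppercase

def int2base(x, base, digs=DIGS62):
    """Return the non-negative integer x in the given base (2 to 62)."""
    if x < base:
        return digs[x]
    # Convert the leading digits first, then append the last one.
    return int2base(x // base, base, digs) + digs[x % base]

print(int2base(35, 62))  # z
print(int2base(36, 62))  # A
print(int2base(61, 62))  # Z
print(int2base(62, 62))  # 10
```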


This seems to suit my needs: http://code.google.com/p/python-spidermonkey/


When importing a JavaScript module is not an option, I use this:

import re

def baseN(num,b,numerals="0123456789abcdefghijklmnopqrstuvwxyz"):
    return ((num == 0) and numerals[0]) or (baseN(num // b, b, numerals).lstrip(numerals[0]) + numerals[num % b])

def unpack(p, a, c, k, e=None, d=None):
    while (c):
        c-=1
        if (k[c]):
            p = re.sub("\\b" + baseN(c, a) + "\\b",  k[c], p)
    return p

encrypted = r'''eval(function(p,a,c,k,e,d){e=function(c){return c};if(!''.replace(/^/,String)){while(c--)d[c]=k[c]||c;k=[function(e){return d[e]}];e=function(){return'\\w+'};c=1;};while(c--)if(k[c])p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c]);return p;}('5 11=17;5 12=["/3/2/1/0/13.4","/3/2/1/0/15.4","/3/2/1/0/14.4","/3/2/1/0/7.4","/3/2/1/0/6.4","/3/2/1/0/8.4","/3/2/1/0/10.4","/3/2/1/0/9.4","/3/2/1/0/23.4","/3/2/1/0/22.4","/3/2/1/0/24.4","/3/2/1/0/26.4","/3/2/1/0/25.4","/3/2/1/0/18.4","/3/2/1/0/16.4","/3/2/1/0/19.4","/3/2/1/0/21.4"];5 20=0;',10,27,'40769|54|Images|Files|png|var|imanhua_005_140430179|imanhua_004_140430179|imanhua_006_140430226|imanhua_008_140430242|imanhua_007_140430226|len|pic|imanhua_001_140429664|imanhua_003_140430117|imanhua_002_140430070|imanhua_015_140430414||imanhua_014_140430382|imanhua_016_140430414|sid|imanhua_017_140430429|imanhua_010_140430289|imanhua_009_140430242|imanhua_011_140430367|imanhua_013_140430382|imanhua_012_140430367'.split('|'),0,{}))'''

encrypted = encrypted.split('}(')[1][:-1]

print(eval('unpack(' + encrypted))

output:

var len=17;var pic=["/Files/Images/54/40769/imanhua_001_140429664.png","/Files/Images/54/40769/imanhua_002_140430070.png","/Files/Images/54/40769/imanhua_003_140430117.png","/Files/Images/54/40769/imanhua_004_140430179.png","/Files/Images/54/40769/imanhua_005_140430179.png","/Files/Images/54/40769/imanhua_006_140430226.png","/Files/Images/54/40769/imanhua_007_140430226.png","/Files/Images/54/40769/imanhua_008_140430242.png","/Files/Images/54/40769/imanhua_009_140430242.png","/Files/Images/54/40769/imanhua_010_140430289.png","/Files/Images/54/40769/imanhua_011_140430367.png","/Files/Images/54/40769/imanhua_012_140430367.png","/Files/Images/54/40769/imanhua_013_140430382.png","/Files/Images/54/40769/imanhua_014_140430382.png","/Files/Images/54/40769/imanhua_015_140430414.png","/Files/Images/54/40769/imanhua_016_140430414.png","/Files/Images/54/40769/imanhua_017_140430429.png"];var sid=40769;
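
Once the script is unpacked, the image list still has to be read out of the resulting JavaScript string. One possible way, sketched here under the assumption that the array literal is also valid JSON (double-quoted strings, as in the output above), is a regular expression plus json.loads. The helper name js_array and the shortened js string are hypothetical.

```python
import json
import re

# Shortened, hypothetical stand-in for the unpacked output shown above.
js = 'var len=2;var pic=["/Files/a.png","/Files/b.png"];var sid=40769;'

def js_array(source, name):
    """Return the array assigned to `var <name>` in source as a Python list."""
    pattern = r"\bvar\s+" + re.escape(name) + r"\s*=\s*(\[.*?\]);"
    m = re.search(pattern, source, re.S)
    if m is None:
        raise KeyError(name)
    # The captured literal must be JSON-compatible for this to work.
    return json.loads(m.group(1))

pics = js_array(js, "pic")
print(pics)  # ['/Files/a.png', '/Files/b.png']
```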
