How do I Extract a Javascript Value using Regular Expressions?

I'm trying to extract the ProductValue from the following bit of JavaScript:

<script language="javascript" type="text/javascript">
lpAddVars('page','Section','womens');
lpAddVars('page','CartTotal','0.00');

    lpAddVars('page','ProductID','43577');
    lpAddVars('page','ProductValue','128.00');  

</script>

I don't think Beautiful Soup parses JavaScript, so I think the best way to do this may be a regular expression, but I'm very new to re and so far nothing I've tried seems to work. Any advice or help on how to accomplish this?

Thanks!


This should work:

import re

javascript_text = '''
    <script language="javascript" type="text/javascript">
    lpAddVars('page','Section','womens');
    lpAddVars('page','CartTotal','0.00');

        lpAddVars('page','ProductID','43577');
        lpAddVars('page','ProductValue','128.00');  

    </script>
'''

product_value = re.findall(r"ProductValue.*,['|\"](.*)['|\"]", javascript_text)

# at this point, product_value = ['128.00']

So what is "ProductValue.*,['|\"](.*)['|\"]" even doing?

"ProductValue.*,['|\"](.*)['|\"]"

ProductValue -- just a literal string that you're searching for

.* -- any number of characters (spaces, single quotes, whatever), but not newlines, so the match stays on one line

, -- a literal comma; because ".*" is greedy, this ends up being the last comma on the line that is followed by a quote

['|\"] -- we want to match either a single quote or a double quote (inside [] the "|" is a literal pipe rather than "or", so ['\"] would be enough)

(.*) -- this is the bit we're actually interested in, which can be any characters

['|\"] -- the closing single or double quote; again ".*" is greedy, so this is the last quote on the line
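One thing to watch for: the greedy ".*" only behaves because each lpAddVars call sits on its own line. Here is a rough sketch of what could happen if the page were minified so that two calls share a line (the sample string and the 'Other' variable are made up for illustration), next to a tighter pattern that uses a negated character class instead of ".*". The tighter pattern assumes the key is always written with single quotes.

```python
import re

# Made-up input: two calls on the same line, as a minifier might produce.
sample = "lpAddVars('page','ProductValue','128.00');  lpAddVars('page','Other','x');"

# The pattern from the answer: greedy ".*" runs to the last comma on the line.
loose = re.findall(r"ProductValue.*,['|\"](.*)['|\"]", sample)

# Tighter variant: anchor on the quoted key, then capture everything up to
# the next quote character rather than the last one.
strict = re.findall(r"'ProductValue'\s*,\s*['\"]([^'\"]*)['\"]", sample)

print(loose)   # ['x']
print(strict)  # ['128.00']
```

On the input from the question both patterns give ['128.00'], so this only matters if the page layout changes.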

From this point on, I would do something like:

product_values = []
for value in product_value:
    value = value.strip() # get rid of any excess whitespace
    value = float(value) # ProductValue appears to be a float of some sort
    product_values.append(value) # store the value
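If it helps, the search and the conversion could be wrapped up together along these lines. This is only a sketch: the function name extract_product_values is hypothetical, the pattern assumes the key is single-quoted as in the question, and silently skipping values that are not numeric is my own choice rather than anything the page requires.

```python
import re

# Hypothetical helper; the name and the skip-on-error behaviour are assumptions.
def extract_product_values(page_text):
    """Return every ProductValue found in page_text as a float."""
    pattern = r"'ProductValue'\s*,\s*['\"]([^'\"]*)['\"]"
    values = []
    for raw in re.findall(pattern, page_text):
        try:
            values.append(float(raw.strip()))
        except ValueError:
            # Not numeric (for example an empty string); skip it rather than crash.
            continue
    return values

page = """
lpAddVars('page','ProductID','43577');
lpAddVars('page','ProductValue','128.00');
lpAddVars('page','ProductValue','');
"""
print(extract_product_values(page))  # [128.0]
```

Since this looks like a price, decimal.Decimal may be a better fit than float if you plan to do arithmetic on it.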


/'ProductValue'\s*,\s*(.*?)\s*\)/
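For reference, a pattern of this shape might be used from Python roughly as follows. Note that this sketch uses \s* before the closing parenthesis so that it still matches when there is no whitespace there, as in the question's input. The captured group keeps its surrounding quotes, so they are stripped afterwards; the variable names are just for illustration.

```python
import re

line = "lpAddVars('page','ProductValue','128.00');"

# The lazy (.*?) stops at the first point where optional whitespace
# and a closing parenthesis follow.
match = re.search(r"'ProductValue'\s*,\s*(.*?)\s*\)", line)

raw = match.group(1) if match else None        # still quoted: "'128.00'"
value = raw.strip("'\"") if raw else None      # quotes removed: "128.00"

print(value)
```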