How do I extract a JavaScript value using regular expressions?
I'm trying to extract the ProductValue
from the following bit of JavaScript:
<script language="javascript" type="text/javascript">
lpAddVars('page','Section','womens');
lpAddVars('page','CartTotal','0.00');
lpAddVars('page','ProductID','43577');
lpAddVars('page','ProductValue','128.00');
</script>
I don't think Beautiful Soup parses JavaScript, so I think the best way to do this may be a regular expression, but I'm very new to re and nothing I've tried so far has worked. Any advice or help on how to accomplish this?
Thanks!
This should work:
import re
javascript_text = '''
<script language="javascript" type="text/javascript">
lpAddVars('page','Section','womens');
lpAddVars('page','CartTotal','0.00');
lpAddVars('page','ProductID','43577');
lpAddVars('page','ProductValue','128.00');
</script>
'''
product_value = re.findall(r"ProductValue.*,['|\"](.*)['|\"]", javascript_text)
# at this point, product_value = ['128.00']
So what is "ProductValue.*,['|\"](.*)['|\"]" even doing?
"ProductValue.*,['|\"](.*)['|\"]"
ProductValue -- just a literal string that you're searching for
.* -- any number of characters (anything except a newline), so spaces, single quotes, whatever
, -- ".*" is greedy, so it runs up to the last "," on the line that is followed by a quote character
['|\"] -- matches a single quote or a double quote (the "|" inside the brackets is a literal pipe, not "or", so ['\"] would be enough)
(.*) -- this is the bit we're actually interested in, which can be any characters
['|\"] -- the closing single or double quote, which ends the captured ".*"
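One caveat about the breakdown above: because ".*" is greedy, the pattern can grab the wrong value if two calls end up on the same line. Here is a sketch of a tighter variant that captures only up to the next quote. The names sample and PATTERN are made up for illustration, and the two-calls-on-one-line input is a hypothetical case, not something from the question.

```python
import re

# Hypothetical input: two calls on one line, which trips up the greedy ".*"
sample = "lpAddVars('page','ProductValue','128.00');lpAddVars('page','Other','1');"

# ['"] matches a single or a double quote (no "|" needed inside the brackets);
# [^'"]* captures everything up to the next quote instead of the last one
PATTERN = re.compile(r"""ProductValue['"]\s*,\s*['"]([^'"]*)['"]""")

tight = PATTERN.findall(sample)
loose = re.findall(r"ProductValue.*,['|\"](.*)['|\"]", sample)

print(tight)  # ['128.00']
print(loose)  # ['1'] -- the greedy version ran on to the last call
```

On the original input, where each call sits on its own line, both patterns return ['128.00'].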
From this point on, I would do something like:
product_values = []
for value in product_value:
    value = value.strip()  # get rid of any excess whitespace
    value = float(value)  # ProductValue appears to be a float of some sort
    product_values.append(value)  # store the value
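If you want the extraction and the conversion in one place, a small helper along these lines might do. This is only a sketch: extract_product_values is a hypothetical name, and skipping values that float() rejects is an assumption about what you'd want to happen.

```python
import re

def extract_product_values(text):
    """Return every ProductValue found in text as a float (hypothetical helper)."""
    values = []
    for raw in re.findall(r"ProductValue.*,['\"](.*)['\"]", text):
        try:
            values.append(float(raw.strip()))
        except ValueError:
            # Assumption: quietly skip anything that is not numeric
            continue
    return values

page = "lpAddVars('page','ProductValue','128.00');"
result = extract_product_values(page)
print(result)  # [128.0]
```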
/'ProductValue'\s*,\s*(.*?)\s*\)/
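The slash-delimited pattern above is written in Perl/JavaScript literal style. A rough Python equivalent, as a sketch, is shown here. Two things differ from the pattern as posted and are my own adjustments: the quotes sit outside the capture group so that only the number is captured, and "\s*" is used before the closing parenthesis because the sample has no whitespace there.

```python
import re

line = "lpAddVars('page','ProductValue','128.00');"

# Assumption: the value is always wrapped in single quotes, as in the question
match = re.search(r"'ProductValue'\s*,\s*'(.*?)'\s*\)", line)

# Fall back to None if the call is not present on the line
value = float(match.group(1)) if match else None
print(value)  # 128.0
```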