Regex to extract an attribute from an HTML element

Text from stream:

<option value=\"1999\">1999</option>\r\n    \r\n \r\n\r\n  \r\n\r\n    
<option value=\"2000\">2000</option>\r\n    \r\n \r\n\r\n  \r\n\r\n    
<option value=\"2001\">2001</option>\r\n    \r\n \r\n\r\n  \r\n\r\n    
<option value=\"2002\">2002</option>\r\n    \r\n \r\n\r\n  \r\n\r\n    
<option value=\"2003\">2003</option>\r\n    \r\n \r\n\r\n  \r\n\r\n    
<option value=\"2004\">2004</option>\r\n    \r\n \r\n\r\n  \r\n\r\n    
<option value=\"2005\">2005</option>\r\n    \r\n \r\n\r\n  \r\n\r\n    
<option value=\"2006\">2006</option>\r\n    \r\n \r\n\r\n  \r\n\r\n    
<option value=\"2007\">2007</option>\r\n    \r\n \r\n\r\n  \r\n\r\n    
<option value=\"2008\">2008</option>\r\n    \r\n \r\n\r\n  \r\n\r\n    
<option value=\"2009\">2009</option>

Regex: (?si:<option value=\\\"(?<year>.*?)\\) Shouldn't this be the right way to get the year? My intent for the year group is to capture every character up to, but not including, the next \


Try:

\<option\svalue\=\\\"(\d*)\\\"

This is the plain regex. Adapt the escaping to whichever language you are using.
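As one illustration of adapting the plain regex, here is a minimal Python sketch. It assumes the backslashes shown in the question are literal characters in the stream (rather than debugger escaping); the sample string and the name YEAR_PATTERN are made up for the example.

```python
import re

# Hypothetical sample: assumes the backslashes are literally present in the text.
sample = r'<option value=\"1999\">1999</option>\r\n    <option value=\"2000\">2000</option>'

# A raw string keeps the regex readable: \\ means "one literal backslash"
# to the regex engine, and the quote needs no escaping inside r'...'.
YEAR_PATTERN = re.compile(r'<option\svalue=\\"(\d*)\\"')

# With a single capturing group, findall returns a list of the captured strings.
years = YEAR_PATTERN.findall(sample)
print(years)
```

Without a raw string, each backslash in the pattern would have to be doubled again for the Python string literal, which is where most of the confusion over escaping tends to come from.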


It depends on exactly which language you're using, but try <option\s+value=(\\\")?(\d+)(\\\")?>.

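Here it is working in Python, with the optional quotes in non-capturing groups so that findall returns only the years:

>>> import re
>>> re.findall(r'<option\s+value=(?:\\?")?(\d+)(?:\\?")?>', text)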
['1999', '2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009']
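For the named-group approach from the question, a rough Python equivalent might look like the sketch below. Python's re module spells named groups (?P<year>...), and the helper name extract_years is hypothetical. The pattern tolerates both \" and plain " around the value, since it is not clear from the question whether the backslashes are really in the stream.

```python
import re

# \\?" accepts an optional literal backslash before each quote.
# The lazy .*? stops at the first closing quote it can reach.
OPTION_VALUE = re.compile(
    r'<option\s+value=\\?"(?P<year>.*?)\\?"',
    re.IGNORECASE | re.DOTALL,
)

def extract_years(text):
    """Return the value attribute of every <option> tag found in text."""
    return [match.group("year") for match in OPTION_VALUE.finditer(text)]

# Hypothetical input mixing escaped and unescaped quotes.
sample = r'<option value=\"1999\">1999</option>' + '<option value="2000">2000</option>'
print(extract_years(sample))
```

The lazy quantifier is what makes the "capture until the next delimiter" idea work: a greedy .* would run on to the last quote in the input instead of the first.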