
regular expression help

text:

<span id="p_code_">WHATIWANT</span>

code:

objRegExp.IgnoreCase = True
objRegExp.Global = True
objRegExp.Pattern = "\<(span\s+id=""(p_code_.*)[^\>]+)</span>"

I'm trying to extract the string WHATIWANT.


Don't parse (x)html with regex! That's what the DOM is for.

http://www.uv.tietgen.dk/staff/mlha/pc/web/script/vbscript/object/index.htm
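A minimal sketch of the DOM approach in VBScript, assuming Windows Script Host with the MSHTML "htmlfile" COM object available (neither is named in the answer above). SpanTextByIdPrefix is a hypothetical helper: it lets MSHTML parse the markup, then returns the text of the first span whose id starts with the given prefix.

```vbscript
' Sketch only: assumes Windows Script Host and the MSHTML "htmlfile" object.
' SpanTextByIdPrefix is a hypothetical helper name used for illustration.
Function SpanTextByIdPrefix(html, prefix)
    Dim doc, el
    ' Let MSHTML parse the markup instead of matching it with a regex
    Set doc = CreateObject("htmlfile")
    doc.write html
    SpanTextByIdPrefix = ""
    ' Walk every span and compare the start of its id with the prefix
    For Each el In doc.getElementsByTagName("span")
        If Left(el.id, Len(prefix)) = prefix Then
            SpanTextByIdPrefix = el.innerText
            Exit Function
        End If
    Next
End Function

WScript.Echo SpanTextByIdPrefix("<span id=""p_code_"">WHATIWANT</span>", "p_code_")
```

Comparing the id prefix rather than calling getElementById keeps the lookup working when the id carries a suffix such as d567a356.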


I think what you're looking for is the following:

objRegExp.Pattern = "\<span id=""p_code_""\>(.*?)\<\/span\>"
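A sketch of how that pattern might be wired up and the captured text read back in VBScript. ExtractPCode is a hypothetical helper name; the pattern here drops the backslash escapes because <, > and / are not special in VBScript patterns, and a double quote inside a VBScript string literal is written as "" (VBScript has no \" escape).

```vbscript
' Sketch: return the text inside <span id="p_code_">...</span>, or "".
' ExtractPCode is a hypothetical helper name used for illustration.
Function ExtractPCode(text)
    Dim re, matches
    Set re = New RegExp
    re.IgnoreCase = True
    re.Global = False          ' only the first match is needed
    ' "" inside the string literal stands for a single double quote
    re.Pattern = "<span id=""p_code_"">(.*?)</span>"
    Set matches = re.Execute(text)
    If matches.Count > 0 Then
        ' SubMatches(0) holds capture group 1: the span's inner text
        ExtractPCode = matches(0).SubMatches(0)
    Else
        ExtractPCode = ""
    End If
End Function

WScript.Echo ExtractPCode("<span id=""p_code_"">WHATIWANT</span>")
' prints WHATIWANT
```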

It's often helpful to test your regex against sample strings. I mostly use TextMate's find function for this, but here's a great web resource: http://rubular.com/ (it uses Ruby's regex flavor, which differs from VBScript's in places).

EDIT: based on the comment below, it looks like you need something more like:

objRegExp.Pattern = "\<span id=""p_code_d\d{3,}a\d{3,}""\>(.*?)\<\/span\>"

to match the "d567a356" part of the span's id. This assumes that the id always ends with something of the form d (followed by three or more digits) a (followed by three or more digits), and that the id is the span's last attribute, since the pattern expects "> immediately after the digits.
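A sketch of the stricter pattern in VBScript. Wrapping the suffix in its own group (an addition to the pattern above) actually captures the d567a356 part as SubMatches(0), next to the inner text in SubMatches(1). ParseStrict is a hypothetical helper that joins the two with a | for easy inspection.

```vbscript
' Sketch: match ids of the form p_code_d<3+ digits>a<3+ digits>.
' The extra group around the suffix is an addition, not part of the answer.
Function ParseStrict(text)
    Dim re, matches
    Set re = New RegExp
    re.IgnoreCase = True
    re.Pattern = "<span id=""p_code_(d\d{3,}a\d{3,})"">(.*?)</span>"
    Set matches = re.Execute(text)
    If matches.Count > 0 Then
        ' Group 1 = id suffix, group 2 = inner text
        ParseStrict = matches(0).SubMatches(0) & "|" & matches(0).SubMatches(1)
    Else
        ParseStrict = ""
    End If
End Function

WScript.Echo ParseStrict("<span id=""p_code_d567a356"">WHATIWANT</span>")
' prints d567a356|WHATIWANT
WScript.Echo ParseStrict("<span id=""p_code_d567a356"" class=""blaf"">WHATIWANT</span>")
' prints an empty line: a space, not ">, follows the id's closing quote
```

The second call shows why a more general pattern is needed once other attributes follow the id.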

EDIT 2:

Actually, this is more general:

objRegExp.Pattern = "\<span id=""p_code_.+?\b""\>(.*?)\<\/span\>"

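This will match both of the following (in the first, the lazy .+? runs past the id's closing quote and on through class="blaf until it reaches the "> that closes the tag):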

<span id="p_code_d567a356" class="blaf">WHATIWANT</span>

and

<span id="p_code_d567a3dsfasfdsaf56">WHATIWANT</span>
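To check the EDIT 2 pattern against both sample strings in VBScript, a sketch along these lines could be used. The second pattern is an alternative, not part of the answer: [^"]* stops the id at its own closing quote and [^>]* skips any further attributes, so the match no longer relies on .+? wandering into other attributes. InnerText is a hypothetical helper.

```vbscript
' Sketch: run a pattern and return capture group 1 of the first match, or "".
' InnerText is a hypothetical helper name used for illustration.
Function InnerText(pattern, text)
    Dim re, matches
    Set re = New RegExp
    re.IgnoreCase = True
    re.Pattern = pattern
    Set matches = re.Execute(text)
    If matches.Count > 0 Then
        InnerText = matches(0).SubMatches(0)
    Else
        InnerText = ""
    End If
End Function

Dim loosePattern, tightPattern, a, b
' EDIT 2 pattern, written with VBScript's "" quote escaping
loosePattern = "<span id=""p_code_.+?\b"">(.*?)</span>"
' Alternative (an assumption): id ends at its quote, other attributes skipped
tightPattern = "<span id=""p_code_[^""]*""[^>]*>(.*?)</span>"

a = "<span id=""p_code_d567a356"" class=""blaf"">WHATIWANT</span>"
b = "<span id=""p_code_d567a3dsfasfdsaf56"">WHATIWANT</span>"

' WScript.Echo separates multiple arguments with spaces
WScript.Echo InnerText(loosePattern, a), InnerText(loosePattern, b)
WScript.Echo InnerText(tightPattern, a), InnerText(tightPattern, b)
' both lines print: WHATIWANT WHATIWANT
```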