JavaScript RegEx won't work, but works in C# (atomic subexpression)

2023-01-11 13:00 Q&A author:

I have a regex that I tested in Expresso, where it works like a charm. But when I try to use it in JavaScript, it gives an error. Firebug says:

invalid quantifier ?><div\b[^>]*>(?<DEPTH>)|<\/div>(?<-DEPTH>)|.?)*(?(DEPTH)(?!))<\/div>

the regex:

<div\b[^>]*>(?><div\b[^>]*>(?<DEPTH>)|</div>(?<-DEPTH>)|.?)*(?(DEPTH)(?!))</div>

The regex matches nested html-divs such as:

<div id="foo"><div>blubb</div><div foobar>blubb</div></div>

Is the javascript regex only a subset?

Edit: I have to strip away the divs, including the text between them.

<div id="foo"><div>blubb</div><div foobar>blubb</div></div>some non html...

Only the "some non html..." should stay. So I think I can't use any HTML parser?

Is the javascript regex only a subset?

No, they are different - there are a variety of Regular Expression engines out there, and they each have different features/quirks.

C#'s regex engine has more features than JavaScript's, but the JS one is not derived from C#'s, so it isn't a subset.

Here are a couple of pages that document the differences:

http://www.regular-expressions.info/refflavors.html
http://www.regular-expressions.info/refext.html

And that whole website (regular-expressions.info) is well worth browsing to learn more about regex.

The regex matches nested html-divs

It probably doesn't, not in all cases.

And it certainly won't be possible with a single JS regex, since JS doesn't support that depth tracking (balancing groups), among other things.
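To see which parts of the question's pattern the JavaScript engine rejects, one option is to try compiling each construct separately. This is only a sketch: the names `fragments` and `compiles` are made up for illustration, and the fragments are cut from the regex in the question.

```javascript
// Constructs taken from the question's regex (.NET syntax).
var fragments = {
  atomicGroup: '(?>a|b)',         // atomic group
  balancingPop: '(?<-DEPTH>)',    // balancing group (pop from DEPTH)
  conditional: '(?(DEPTH)(?!))'   // conditional on DEPTH
};

// Hypothetical helper: true if this engine can compile the pattern.
function compiles(pattern) {
  try {
    new RegExp(pattern);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false; // pattern rejected by the engine
    throw e;                                    // anything else is unexpected
  }
}

// Collect the names of the constructs that fail to compile.
var unsupported = Object.keys(fragments).filter(function (name) {
  return !compiles(fragments[name]);
});
console.log(unsupported);
```

The try/catch is needed because `new RegExp(...)` throws a SyntaxError for a pattern it cannot parse, which is the same kind of failure Firebug reported as "invalid quantifier".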

You're using the wrong tool for this job - parsing HTML should be done with a proper HTML parser/selector, then analysing the DOM to find the nested divs.

Anything that implements Sizzle or a comparable selector engine should do (e.g. jQuery, Dojo Toolkit, and others).

For example, something like jQuery('div:has(div)') or dojo.query('div:has(div)') or similar, should find nested divs (i.e. select all divs which have a div nested inside them), and will correctly cope with assorted quirks which can be complex if not impossible with a single regex.

Edit: I have to strip away the divs, including the text between them.
<div id="foo"><div>blubb</div><div foobar>blubb</div></div>some non html...
Only the "some non html..." should stay. So I think I can't use any HTML parser?

No - that is even more reason to use an HTML parser, and not attempt a messy regex hack.

jQuery('#foo div').remove()

That will remove all child DIVs, and leave the HTML text node in place.

Depending on your precise requirements, the selector might need changing, but this is absolutely a task for a tool that is designed to understand HTML.

Of course, today's JavaScript doesn't support atomic groups or recursive regexes, but you could easily build a quick and dirty solution by stripping tags from the HTML source piece by piece. If other solutions are too complicated and the structure of the documents is predictable, you could do something like:

 function stripme(tag, code)
{
 // \b stops 'div' from also matching tags that merely start with "div".
 // The greedy (.*) runs from the first opening tag to the last closing tag
 // on the same line (involves backtracking; '.' does not match newlines).
 var regexp = new RegExp('<'+tag+'\\b[^>]*>(.*)</'+tag+'>');
 var strp = code;
 while( regexp.test(strp) )             // repeat while a <tag>...</tag> pair remains
    strp = strp.replace(regexp, '');    // the removed contents are in capture
 return strp;                           // group 1 of the match (if needed)
}

This will work with, for example:

 var html ='<div id="foo"><div>blubb</div><div foobar>blubb</div></div>some non html...';

when invoked with, e.g.:

 window.onload = function() { var stripped=stripme('div', html); alert(stripped); }
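One limitation of the greedy pattern in stripme is that text sitting between two sibling top-level divs is removed along with them. A sketch that counts nesting depth explicitly, much like the DEPTH counter in the .NET regex, avoids that. The function name `stripBalanced` is hypothetical, and this is still not a real HTML parser (it ignores comments, scripts, and a ">" inside attribute values).

```javascript
// Hypothetical helper: removes every balanced <tag>...</tag> block, nested or
// not, and keeps the text that lies outside of those blocks.
function stripBalanced(tag, code) {
  // Matches an opening or a closing tag; group 1 is '/' for a closing tag.
  var tagRe = new RegExp('<(/?)' + tag + '\\b[^>]*>', 'gi');
  var out = '';
  var depth = 0;      // current nesting level
  var lastIndex = 0;  // position just after the previous tag
  var m;
  while ((m = tagRe.exec(code)) !== null) {
    if (depth === 0) {
      // Text before this tag is outside any block, so keep it.
      out += code.slice(lastIndex, m.index);
    }
    if (m[1] === '/') {
      if (depth > 0) depth--;   // stray closing tags are dropped
    } else {
      depth++;
    }
    lastIndex = tagRe.lastIndex;
  }
  // Keep the trailing text only if every opened tag was closed.
  if (depth === 0) out += code.slice(lastIndex);
  return out;
}

var sample = '<div id="foo"><div>blubb</div><div foobar>blubb</div></div>some non html...';
console.log(stripBalanced('div', sample));
```

The regex here only finds individual tags; the nesting logic lives in the loop, which is why no atomic or balancing groups are needed.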

BTW, if possible, always use a DOM parser or JavaScript library, as recommended by Peter Boughton.

Regards

rbo
