
RegEx to get attributes and body of script tags

I'm using this regex to find <script> tags:

<script (.|\n)*>(.|\n)*?</script>

The problem is, it matches the ENTIRE string below, not just each tag separately:

<script src="crap2.js"></script><script src="crap2.js"></script>


You really would be better off using the DOM to process HTML for this reason and all sorts of others.


Change your first * to *?

This is the non-greedy 'match all', so it will match the smallest set of characters before the next '>'.
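
To see the difference, here is a small sketch in Python (the question doesn't say which regex engine is in use, so Python's re module is an assumption). It runs the original pattern and the one with the first * changed to *? against the sample string from the question:

```python
import re

# Sample input from the question: two adjacent script tags.
html = '<script src="crap2.js"></script><script src="crap2.js"></script>'

# Original pattern: the first (.|\n)* is greedy and runs past the first '>'.
greedy = re.compile(r'<script (.|\n)*>(.|\n)*?</script>')

# Same pattern with the first quantifier made non-greedy.
lazy = re.compile(r'<script (.|\n)*?>(.|\n)*?</script>')

greedy_matches = [m.group(0) for m in greedy.finditer(html)]
lazy_matches = [m.group(0) for m in lazy.finditer(html)]

print(len(greedy_matches))  # 1 (the whole string)
print(len(lazy_matches))    # 2 (one per tag)
```

The greedy version backtracks only as far as it has to, so it settles on the last '>' that still leaves a </script> to match, which swallows the first tag entirely.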


I don't think anything else needs to be said other than RegEx match open tags except XHTML self-contained tags.


Also see this week's Coding Horror: Parsing Html The Cthulhu Way, inspired by the epic answer by @bobince that @JS Bangs links to.


I'll keep posting links to my previous answers until this question type has been wiped from this planet's surface (hopefully in 10 years or so): Don't use regular expressions for irregular languages like HTML or XML. Use a parser instead.
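
As a rough illustration of the parser route, here is a minimal sketch using Python's standard html.parser module. The language choice is an assumption, and ScriptCollector and extract_scripts are hypothetical names made up for this example. It collects the attributes and body of each script tag:

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collects (attributes, body) pairs for every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []     # finished (attrs_dict, body_text) pairs
        self._current = None  # script element being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._current = {"attrs": dict(attrs), "body": []}

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate it.
        if self._current is not None:
            self._current["body"].append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._current is not None:
            body = "".join(self._current["body"])
            self.scripts.append((self._current["attrs"], body))
            self._current = None


def extract_scripts(html):
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    return parser.scripts
```

Calling extract_scripts on a page returns one tuple per script tag. Because the parser treats script content as raw text, a '<' inside the JavaScript does not confuse it the way it can confuse a regex.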


<script[\s\S]*?>[\s\S]*?</script>

This matches most common situations, but it's very important to consider JS Bangs's answer.
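
If you do go the regex route, a sketch along these lines pulls out the attributes and the body. It assumes Python, and the two capture groups and the find_scripts helper are additions for illustration, not part of the pattern above:

```python
import re

# Same pattern as above, with capture groups added around the
# attribute text (group 1) and the script body (group 2).
SCRIPT_RE = re.compile(r'<script([\s\S]*?)>([\s\S]*?)</script>', re.IGNORECASE)


def find_scripts(html):
    """Return a list of (raw_attribute_text, body) tuples."""
    return [(m.group(1).strip(), m.group(2)) for m in SCRIPT_RE.finditer(html)]
```

This still breaks on inputs such as a '>' inside an attribute value or the text </script> inside a JavaScript string, which is exactly the kind of case the parser answers are warning about.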


Try excluding any '<' from the content:

<script\b[^>]*>[^<]*</script>