
Regular expression to extract an HTML tag without a certain attribute (to fix a WordPress plugin)

I need help from some regex experts to fix a bug in a WordPress plugin that is no longer maintained by its author.

Inside the plugin, the following PHP pattern is used to find included scripts:

'/(\\s*)(<script\\b[^>]*?>)([\\s\\S]*?)<\\/script>(\\s*)/i'

This pattern matches scripts no matter which media they are written for. To fix the bug, it must be changed so that script tags with the attribute media="print" are not extracted.

How must this pattern be changed so that script tags with the attribute media="print" are not affected?

See here for the topic in the WordPress-Support-Forum.


The preg_* functions are not meant to parse HTML tags. You can never know where and how the attributes will be written:

<script media="print">
<script media=print>
<script type="text/javascript" media="print">
<script media="print" type="text/javascript">

Basically, you cannot handle that reliably with the preg_* functions. I'd suggest loading the HTML you want to clean into a DOM (or even SimpleXML) object and selecting all script tags whose media attribute is "print" with an XPath query:

//script[@media="print"]
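
As a rough sketch of that suggestion, the snippet below loads some markup into a DOMDocument and queries it with DOMXPath. The sample markup and the variable names are made up for illustration; in the plugin, the input would be whatever HTML it is currently running the regex over. Note that the attribute test needs the @ prefix, and that not(...) selects the scripts that should still be extracted.

```php
<?php
// Hypothetical sample input, not taken from the plugin.
$html = <<<'HTML'
<html><head>
<script type="text/javascript">var a = 1;</script>
<script media="print" type="text/javascript">var b = 2;</script>
<script media=print>var c = 3;</script>
</head><body></body></html>
HTML;

// Collect libxml warnings internally instead of emitting them,
// since real-world HTML is rarely strictly valid.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Script elements whose media attribute is exactly "print".
// The parser normalizes quoting, so media=print is found as well.
$printScripts = $xpath->query('//script[@media="print"]');

// Every other script element, i.e. the ones to keep extracting.
$otherScripts = $xpath->query('//script[not(@media="print")]');

$printCount = $printScripts->length;
$otherCount = $otherScripts->length;

echo "print scripts: $printCount, other scripts: $otherCount\n";
```

Because the parser resolves the attribute syntax, the quoting and ordering variations listed above do not need separate handling here.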


A pretty simple approach would be:

'#<script\b(?:\s+(?!media="print")[^\s>]+)*\s*>(.*?)</script>#i'

It uses a (?!...) negative lookahead assertion to check each whitespace-separated part of the tag. This does not exactly parse HTML attributes, but it is sufficient to detect this single case. You might need to add alternatives, though (media=print or media='print'), because preg_match looks at the raw string and does not interpret HTML-equivalent expressions. Also note that without the s modifier, (.*?) does not match across line breaks, so a script body spanning several lines would not be matched. (Using DOM, however, would certainly be overkill for this task.)
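
One way the alternatives could be added is sketched below. This is an untested-in-the-plugin variation on the pattern above, not a drop-in replacement: the sample input and variable names are invented, the s modifier is added so that multi-line script bodies match, and a trailing (?=[\s>]) is an extra assumption so that a value such as media=printable is not excluded by accident.

```php
<?php
// Variation of the pattern above (illustrative, not the plugin's code):
// - excludes media="print", media='print' and media=print
// - (?=[\s>]) requires the value to end there
// - s modifier lets (.*?) span line breaks
$pattern = '#<script\b(?:\s+(?!media=(?:"print"|\'print\'|print)(?=[\s>]))[^\s>]+)*\s*>(.*?)</script>#is';

// Hypothetical sample input.
$html = <<<'HTML'
<script type="text/javascript">var a = 1;</script>
<script media="print">var b = 2;</script>
<script media=print>var c = 3;</script>
<script type="text/javascript" media='print'>var d = 4;</script>
<script media="screen">
var e = 5;
</script>
HTML;

preg_match_all($pattern, $html, $matches);

// $matches[1] holds the script bodies of the tags that were matched,
// i.e. the ones without a print media attribute.
$bodies = array_map('trim', $matches[1]);

print_r($bodies);
```

With this input, only the first and the last script are captured; the three print variants are skipped. The approach still works on raw text, so an attribute value containing > would break it.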


To remove tags, use strip_tags according to your needs.
