Regular expression to remove div styles/classes in PHP

I want to parse out HTML from a string selectively. I have used strip_tags to allow divs, but I don't want to keep the div styles/classes from the string. That is, I want:

<div class="something">text</div>
<div style="something">text</div>

to simply become:

<div>text</div>

in both cases.

Can anyone help? Thanks!


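Replace matches of the following regex with nothing (note that it relies on variable-length lookbehind, which PHP's PCRE engine does not support; it works in engines such as .NET):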

(?<=<div.*?)(?<!=\t*?"?\t*?)(class|style)=".*?"


Here is an example:

preg_replace('`<div (style="[^"]*"|class="[^"]*")>([^<]*)</div>`i', "<div>$2</div>", $str);

Basically, this matches the content of a div with a style or a class attribute. Then, you remove everything to keep only <div>content</div>.

It's longer than J V's version, but it won't replace something like <div style="blablabla" color="blablabla">content</div>, for instance. May or may not be what you want.
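A quick way to check this behaviour is the sketch below. The function name stripSingleDivAttribute is hypothetical, and the attribute alternation is written as a non-capturing group here so that $1 refers to the div's text.

```php
<?php
// Hypothetical helper: strips a lone style or class attribute from a div
// whose content is plain text (no nested tags).
function stripSingleDivAttribute(string $str): string
{
    return preg_replace(
        // (?:...) does not capture, so group 1 is the text between the tags.
        '`<div (?:style="[^"]*"|class="[^"]*")>([^<]*)</div>`i',
        '<div>$1</div>',
        $str
    );
}

echo stripSingleDivAttribute('<div class="something">text</div>'), "\n";
echo stripSingleDivAttribute('<div style="something">text</div>'), "\n";
// A div with two attributes does not match and is returned unchanged.
echo stripSingleDivAttribute('<div style="a" color="b">content</div>'), "\n";
```

The first two calls print <div>text</div>; the third prints its input unchanged, which is the limitation described above.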


As an alternative to regexp (which always freaks me out), I'd suggest using xml_parse_into_struct.

See php.net and its first example.
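For illustration, here is one way the parser output could be turned back into markup without class and style attributes. This is only a sketch: the function name rebuildWithoutClassAndStyle is hypothetical, and it assumes the input is well-formed XML with a single root element, which arbitrary HTML often is not.

```php
<?php
// Hypothetical helper: parses well-formed markup with xml_parse_into_struct()
// and rebuilds it, skipping class and style attributes.
function rebuildWithoutClassAndStyle(string $xml): string
{
    $parser = xml_parser_create();
    // Keep tag and attribute names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_parse_into_struct($parser, $xml, $values);

    $out = '';
    foreach ($values as $node) {
        // Re-escape text and attribute values, since the parser decodes entities.
        $text = htmlspecialchars($node['value'] ?? '', ENT_NOQUOTES);
        $attrs = '';
        foreach ($node['attributes'] ?? [] as $name => $value) {
            if (!in_array(strtolower($name), ['class', 'style'], true)) {
                $attrs .= sprintf(' %s="%s"', $name, htmlspecialchars($value, ENT_COMPAT));
            }
        }
        $tag = $node['tag'];
        switch ($node['type']) {
            case 'open':     $out .= "<{$tag}{$attrs}>{$text}"; break;
            case 'complete': $out .= "<{$tag}{$attrs}>{$text}</{$tag}>"; break;
            case 'cdata':    $out .= $text; break;
            case 'close':    $out .= "</{$tag}>"; break;
        }
    }
    return $out;
}

echo rebuildWithoutClassAndStyle('<div class="something">text</div>'), "\n";
```

Because this is an XML parser, a string holding several top-level divs would first need to be wrapped in a single root element.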


I found that it is very difficult to build a single regex that removes both the class and style attributes from a tag in a single pass. That's because we don't know where these attributes will appear relative to the other attributes in the tag (assuming we want to preserve the others). However, we can achieve this by splitting the task into two simpler search-and-replace operations: one for the class attribute and another for the style attribute.

To capture the first part of a div containing a class attribute, with one or more values enclosed in double quotes, the regex is as follows:

(<div\s+)([^>]*)(class\s*=\s*\"[^\">]*\")(\s|/|>)

The same code modified for single quotes:

(<div\s+)([^>]*)(class\s*=\s*\'[^\'>]*\')(\s|/|>)

Or no quotes:

(<div\s+)([^>]*)(class\s*=\s*[^\"\'=/>\s]+)(\s|/|>)

The captured string must then be replaced by the first, second and fourth capture group which, in PHP preg_replace() code, is represented by the string $1$2$4.

To eliminate a style attribute instead of a class one, just replace the substring class with the substring style in the regex. To eliminate these attributes in any tag (not only divs), replace the substring div with the substring [a-z][a-z0-9]* in the regex.

Note: the regex above will not eliminate class or style attributes with syntax errors. Examples: class="xxxxx (missing a quote after the value), class='xxxxx'' (excess of quotes after the value), class="xxxx"title="yyyy" (no space between attributes), and so on.
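The two-pass replacement described above can be sketched in PHP as follows. The helper name removeDivAttribute is hypothetical, and only the double-quoted variant of the pattern is used; the single-quoted and unquoted variants would need their own passes.

```php
<?php
// Hypothetical helper: removes one double-quoted attribute, given by name,
// from div tags using the pattern shown above.
function removeDivAttribute(string $html, string $attribute): string
{
    $name = preg_quote($attribute, '~');
    $pattern = '~(<div\s+)([^>]*)(' . $name . '\s*=\s*"[^">]*")(\s|/|>)~';
    // Keep groups 1, 2 and 4; group 3 (the attribute itself) is dropped.
    return preg_replace($pattern, '$1$2$4', $html);
}

$html = '<div id="a" class="box" style="color:red">text</div>';
$html = removeDivAttribute($html, 'class'); // first pass: class
$html = removeDivAttribute($html, 'style'); // second pass: style
echo $html, "\n";
```

The id attribute survives both passes. The whitespace that surrounded the removed attributes is left in place (for example, <div class="something"> becomes <div > with a trailing space), which HTML parsers ignore but which could be tidied in a further step if needed.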

Short explanation:

<div\s+                  # beginning of the div tag, followed by one or more whitespaces
[^>]*                    # any set of attributes before the class (optional)
class\s*=\s*\"[^\">]*\"  # class attribute, with optional whitespaces
\s|/|>                   # one of these characters always follows the end of an attribute