
PHP: regexp and specific tags stripping

I am looking for a way to strip all anchor tags. I also want everything from the ',' up to the <br> removed, but the <br> itself should remain there.

dirty input:

Abstractor HLTH<br>
Account Representative, Major <a href="#P">P</a><br>
Accountant <a href="#NP">NP</a>, <a href="#M">M</a>, <a href="#REA">REA</a>, <a href="#SKI">SKI</a><br>

it should be like:

Abstractor HLTH<br>
Account Representative<br>
Accountant <br>

please help!

-- following is the dirty text:

$str = sprintf('

Abstractor HLTH<br>
Account Representative, Major <a href="#P">P</a><br>

Accountant <a href="#NP">NP</a>, <a href="#M">M</a>, <a href="#REA">REA</a>, <a href="#SKI">SKI</a><br>
Accountant, Cost I & II (See Cost Accountant I, II) <a href="#FR">FR</a><br>
Accountant, General <a href="#G">G</a><br>
Accountant, General I (Junior) (See General Accountant) <a href="#FR">FR</a>, <a href="#O/G">O/G</a>, <a href="#W">W</a><br>

Accountant, General II (Intermediate) (See General Accountant) <a href="#FR">FR</a>, <a href="#O/G">O/G</a>, <a href="#W">W</a>, <a href="#HA">HA</a> <br>
Accountant, General III (Senior) (See General Accountant) <a href="#FR">FR</a>, <a href="#O/G">O/G</a>, <a href="#W">W</a> <br>

');


Normally it's a bad idea to use regex on HTML strings, but assuming all your links are formed like that, using preg_replace() shouldn't pose problems. Try this:

// Removes all links
$str = preg_replace("/<a href=\"#([A-Z\\/]+?)\">\\1<\\/a>(?:, )?/i", "", $str);

// Strip the comma and everything from the comma
// to the next <br> in the line
$str = preg_replace("/,(.*?)(?=<br>)/i", "", $str);
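To see the two replacements working together, here is a minimal sketch that wraps them in a helper and runs it on three of the sample lines. The function name clean_titles() is made up for illustration, and the patterns are the same as above, only rewritten with ~ delimiters and single quotes to cut down on escaping. It assumes every link has the form <a href="#X">X</a>.

```php
<?php
// Hypothetical helper (not from the original answer) wrapping both steps.
function clean_titles(string $str): string
{
    // Step 1: drop every <a href="#X">X</a> link and an optional trailing ", ".
    $str = preg_replace('~<a href="#([A-Z/]+?)">\1</a>(?:, )?~i', '', $str);

    // Step 2: drop everything from the first comma on a line
    // up to, but not including, the next <br>.
    return preg_replace('~,.*?(?=<br>)~i', '', $str);
}

$dirty = 'Abstractor HLTH<br>' . "\n"
       . 'Account Representative, Major <a href="#P">P</a><br>' . "\n"
       . 'Accountant <a href="#NP">NP</a>, <a href="#M">M</a><br>';

echo clean_titles($dirty);
// Abstractor HLTH<br>
// Account Representative<br>
// Accountant <br>
```

The order matters: if the comma step ran first, it would cut through the middle of the link list on the third line and take the link text after the first comma with it, leaving only the first link for the link step.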

To the other answers suggesting strip_tags(): it won't erase text contained by a pair of HTML tags that it strips. For example

Accountant <a href="#NP">NP</a>

becomes

Accountant NP

which isn't quite what the OP wants.
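A short check of that behaviour, using one of the sample lines (the variable names are only for illustration):

```php
<?php
$line = 'Accountant <a href="#NP">NP</a>, <a href="#M">M</a><br>';

// strip_tags() removes the markup but keeps the text between the tags.
$all_stripped = strip_tags($line);
echo $all_stripped, "\n";   // Accountant NP, M

// Allowing <br> keeps the line break, but the link text still survives.
$br_kept = strip_tags($line, '<br>');
echo $br_kept, "\n";        // Accountant NP, M<br>
```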


I would strongly advise using HTML Purifier http://htmlpurifier.org/

It is fairly simple to set up, has an excellent reputation, and is extremely powerful.


strip_tags() for the tags, str_replace() with strpos() for the other thing.


HTML Purifier is your friend. It has flexible options, and is very sophisticated. Doing such things with str_replace or regular expressions is wrong.


strip_tags has a second argument which allows you to supply a string of allowable tags. It will strip all tags except the ones you supply:

$string = strip_tags($string, '<br>'); // will leave <br>-tags in place


$clean_string = strip_tags($original_string, '<br>');

This will strip everything apart from br tags.

As KingCrunch says, str_replace and strpos for the rest.
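One possible reading of that suggestion, as a rough sketch: strip everything but <br>, then cut each line at its first comma. The helper name strip_and_cut() is hypothetical, and it uses substr() with strpos() rather than str_replace(), which seemed simpler for cutting a range. It assumes one entry per line, each ending in <br>.

```php
<?php
// Hypothetical helper: keep <br> tags, then cut each line at its first comma.
function strip_and_cut(string $html): string
{
    $out = [];
    foreach (explode("\n", strip_tags($html, '<br>')) as $line) {
        $comma = strpos($line, ',');
        $br    = strpos($line, '<br>');
        if ($comma !== false && $br !== false && $comma < $br) {
            // Drop the text between the first comma and the <br>.
            $line = substr($line, 0, $comma) . substr($line, $br);
        }
        $out[] = $line;
    }
    return implode("\n", $out);
}

echo strip_and_cut('Account Representative, Major <a href="#P">P</a><br>'), "\n";
// Account Representative<br>
echo strip_and_cut('Accountant <a href="#NP">NP</a>, <a href="#M">M</a><br>'), "\n";
// Accountant NP<br>
```

Note the second result: link text that appears before the first comma (here "NP") is kept, because strip_tags() leaves the text inside the tags it removes. If that text must go too, the links have to be removed some other way first.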
