PHP: regexp and specific tags stripping
I am looking for a way to strip all anchor tags, together with their link text. I also want everything from ',' to <br> removed, but the <br> itself should remain.
dirty input:
Abstractor HLTH<br>
Account Representative, Major <a href="#P">P</a><br>
Accountant <a href="#NP">NP</a>, <a href="#M">M</a>, <a href="#REA">REA</a>, <a href="#SKI">SKI</a><br>
it should be like:
Abstractor HLTH<br>
Account Representative<br>
Accountant <br>
please help!
-- following is the dirty text:
$str = sprintf('
Abstractor HLTH<br>
Account Representative, Major <a href="#P">P</a><br>
Accountant <a href="#NP">NP</a>, <a href="#M">M</a>, <a href="#REA">REA</a>, <a href="#SKI">SKI</a><br>
Accountant, Cost I & II (See Cost Accountant I, II) <a href="#FR">FR</a><br>
Accountant, General <a href="#G">G</a><br>
Accountant, General I (Junior) (See General Accountant) <a href="#FR">FR</a>, <a href="#O/G">O/G</a>, <a href="#W">W</a><br>
Accountant, General II (Intermediate) (See General Accountant) <a href="#FR">FR</a>, <a href="#O/G">O/G</a>, <a href="#W">W</a>, <a href="#HA">HA</a> <br>
Accountant, General III (Senior) (See General Accountant) <a href="#FR">FR</a>, <a href="#O/G">O/G</a>, <a href="#W">W</a> <br>
');
Normally it's a bad idea to use regex on HTML strings, but assuming all your links are formed like that, preg_replace() shouldn't pose problems. Try this:
// Removes all links
$str = preg_replace("/<a href=\"#([A-Z\\/]+?)\">\\1<\\/a>(?:, )?/i", "", $str);
// Strip the comma and everything from the comma
// to the next <br> in the line
$str = preg_replace("/,(.*?)(?=<br>)/i", "", $str);
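As a quick check, the two replacements can be run against the first three lines from the question. This is only a sketch: the function name cleanTitles is made up for illustration, and the patterns use ~ as the delimiter and single-quoted strings so that fewer backslashes are needed. They are otherwise intended to match the same text as the patterns above.

```php
<?php
// Hypothetical helper wrapping the two preg_replace() calls.
function cleanTitles(string $html): string
{
    // Step 1: remove each anchor whose text equals its fragment id,
    // plus a trailing ", " if one follows.
    $html = preg_replace('~<a href="#([A-Z/]+?)">\1</a>(?:, )?~i', '', $html);
    // Step 2: remove from the first comma on a line up to (not including) <br>.
    return preg_replace('~,.*?(?=<br>)~i', '', $html);
}

// Sample lines copied from the question.
$str = <<<'HTML'
Abstractor HLTH<br>
Account Representative, Major <a href="#P">P</a><br>
Accountant <a href="#NP">NP</a>, <a href="#M">M</a>, <a href="#REA">REA</a>, <a href="#SKI">SKI</a><br>
HTML;

echo cleanTitles($str);
// Abstractor HLTH<br>
// Account Representative<br>
// Accountant <br>
```

Because . does not match a newline by default, the lazy match in step 2 stays within a single line, which is what keeps each title separate.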
To the other answers suggesting strip_tags(): it won't erase the text contained by a pair of HTML tags that it strips. For example
Accountant <a href="#NP">NP</a>
becomes
Accountant NP
which isn't quite what the OP wants.
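The behaviour is easy to confirm with a short snippet. The variable names here are illustrative only.

```php
<?php
$line = 'Accountant <a href="#NP">NP</a><br>';

// strip_tags() drops the tags themselves but keeps the text between them.
$allStripped = strip_tags($line);          // "Accountant NP"

// Passing '<br>' as the allowed tags keeps the line break, but "NP" still survives.
$brKept = strip_tags($line, '<br>');       // "Accountant NP<br>"

echo $allStripped, "\n", $brKept, "\n";
```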
I would strongly advise using HTML Purifier: http://htmlpurifier.org/
It is fairly simple to set up, has an excellent reputation, and is extremely powerful.
strip_tags() for the tags, str_replace() with strpos() for the other thing.
HTML Purifier is your friend. It has flexible options, and is very sophisticated. Doing such things with str_replace or regular expressions is wrong.
strip_tags has a second argument which allows you to supply a string of allowable tags. It will strip all tags except the ones you supply:
$string = strip_tags($string, '<br>'); // will leave <br>-tags in place
$clean_string = strip_tags($original_string, '<br>');
This will strip everything apart from br tags.
As KingCrunch says, str_replace and strpos for the rest.
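One way the strip_tags-plus-strpos idea might look for a single line is sketched below. The helper name cutAtComma is hypothetical, and it uses substr() rather than str_replace() to do the cutting. Note that the link text is left behind whenever an anchor appears before the first comma, so the result differs from the output requested in the question for lines such as the "Accountant NP, M, ..." one.

```php
<?php
// Hypothetical helper: trims one line at its first comma, keeping a trailing <br>.
function cutAtComma(string $line): string
{
    $line = strip_tags($line, '<br>');     // anchors go, their text stays
    $comma = strpos($line, ',');
    if ($comma === false) {
        return $line;                      // no comma, nothing to cut
    }
    $br = strpos($line, '<br>', $comma);   // first <br> after the comma
    // Keep the text before the comma, then everything from <br> onwards (if any).
    return substr($line, 0, $comma) . ($br === false ? '' : substr($line, $br));
}

echo cutAtComma('Account Representative, Major <a href="#P">P</a><br>'), "\n";
// Account Representative<br>
echo cutAtComma('Accountant <a href="#NP">NP</a>, <a href="#M">M</a><br>'), "\n";
// Accountant NP<br>
```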