
Regex: clean up HTML before saving it to the database

Before saving content to the database I need to:

  1. delete all tags
  2. collapse every run of more than one whitespace character
  3. collapse every run of more than one newline

For this I do the following:

  1. $content = preg_replace('/<[^>]+>/', "", $content);
  2. $content = preg_replace('/\n/', "NewLine", $content);

     This replaces each newline with a "NewLine" marker, so the newlines are not lost when runs of whitespace are squeezed:

    $content = preg_replace('/(\&nbsp\;){1,}/', " ", $content);

    $content = preg_replace('/[\s]{2,}/', " ", $content);

  3. Finally, I need to collapse repeated "NewLine" markers into one.

After the first two steps I get text in this format:

NewLineWordOfText
NewLine
NewLine
NewLine NewLine WordOfText &quot;WordOfText WordOfText&quot; WordOfText NewLine&quot;WordOfText
...

How can I delete repeated newlines from such content?

Thanks
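A minimal sketch of how step 3 of the question's own placeholder approach could be finished, assuming the text has already gone through steps 1 and 2. `collapseNewLineMarkers` is a hypothetical helper name. In the sample output the markers are separated by spaces and by stray line breaks (presumably `\r` characters left over from Windows `\r\n` line endings, which is an assumption), so the pattern also absorbs the whitespace around each marker:

```php
<?php
// Hypothetical helper: finishes step 3 of the placeholder approach by turning
// any run of "NewLine" markers back into a single real "\n".
function collapseNewLineMarkers(string $content): string
{
    // \s* before the first marker and after each marker swallows the spaces
    // and stray "\r" characters between consecutive markers.
    return preg_replace('/\s*(?:NewLine\s*)+/', "\n", $content);
}

echo collapseNewLineMarkers("Word NewLine NewLine\rNewLine more"), PHP_EOL;
```

One caveat of the placeholder itself: a genuine occurrence of the word "NewLine" in the text would also become a line break. A character class that excludes `\n` when squeezing whitespace avoids the placeholder altogether.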


First of all, HTML is not a regular language, so parsing it with regular expressions is a bad idea. PHP already has a function that removes tags for you: strip_tags().
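A short illustration of `strip_tags()` on input similar to the question's. Note that it only removes tags: entities such as `&nbsp;` and `&quot;` (both visible in the question's sample) are left in place and still need separate handling.

```php
<?php
// strip_tags() removes the <p> and <b> tags but leaves the &nbsp; entity as-is.
$html = '<p>Hello&nbsp;<b>world</b></p>';
$text = strip_tags($html);
echo $text, PHP_EOL; // Hello&nbsp;world
```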

To squeeze spaces while preserving newlines:

$content = preg_replace('/[^\n\S]{2,}/', " ", $content);
$content = preg_replace('/\n{2,}/', "\n", $content);

The first line squeezes every run of two or more whitespace characters other than \n into a single space ([^\n\S] matches any character that is neither \n nor a non-whitespace character, i.e. any whitespace except \n). The second line squeezes every run of newlines into a single newline.
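These two `preg_replace()` calls can be combined with `strip_tags()` into one clean-up function. This is a sketch: `cleanForDatabase` is a hypothetical name, and two steps are assumptions added here rather than part of the answer. First, `\r\n` and `\r` line endings are normalised to `\n`, because `\n{2,}` never matches `\r\n\r\n`. Second, the final pattern `/\n(?:[^\S\n]*\n)+/` also treats lines holding only spaces or tabs as empty, since `\n{2,}` leaves `"\n \n"` untouched.

```php
<?php
// Hypothetical helper built from the answer's two regexes plus two assumed
// extras: line-ending normalisation and collapsing space-only lines.
function cleanForDatabase(string $html): string
{
    // 1. Remove tags, then turn &nbsp; into a plain space as the question does.
    $text = str_replace('&nbsp;', ' ', strip_tags($html));

    // Assumed extra: normalise Windows ("\r\n") and old Mac ("\r") line endings.
    $text = str_replace(["\r\n", "\r"], "\n", $text);

    // 2. Squeeze runs of 2+ whitespace characters other than "\n" into one space.
    $text = preg_replace('/[^\n\S]{2,}/', ' ', $text);

    // 3. Collapse a newline followed by any number of empty or space-only
    //    lines into a single "\n".
    return preg_replace('/\n(?:[^\S\n]*\n)+/', "\n", $text);
}
```

For example, `cleanForDatabase("<p>Hello&nbsp;&nbsp; world</p>\r\n\r\n \n<p>Bye</p>")` should return `"Hello world\nBye"`. A single tab or space between words is left alone, because the squeeze only targets runs of two or more.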


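Why not use nl2br(), then preg_replace runs of consecutive <br /> tags with a single <br />, and finally turn the <br />s back into \n? Keep in mind that nl2br() inserts <br /> before each newline but keeps the newline itself, so two consecutive breaks look like <br />\n<br />\n rather than <br /><br />, and the pattern has to allow for that.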
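A minimal sketch of the nl2br() idea, assuming tags have already been stripped (a literal `<br />` left in the input would be removed as well). `collapseNewlinesViaBr` is a hypothetical helper name. Because nl2br() keeps the original line breaks, removing the tags at the end is enough to get plain newlines back.

```php
<?php
// Hypothetical helper sketching the nl2br() approach.
function collapseNewlinesViaBr(string $text): string
{
    // "a\n\nb" becomes "a<br />\n<br />\nb": each break keeps its newline.
    $withBr = nl2br($text);

    // Two or more <br /> tags, together with the whitespace that follows each
    // one (the kept line breaks and any indentation), become one <br />\n.
    $collapsed = preg_replace('#(?:<br />\s*){2,}#', "<br />\n", $withBr);

    // The newlines are still there, so dropping the tags restores plain text.
    return str_replace('<br />', '', $collapsed);
}
```

Compared with squeezing `\n` directly, this takes a detour through HTML, but it does handle `\r\n` line endings, since nl2br() recognises them and `\s` absorbs the `\r`.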
