Regex: clean up HTML before saving to the database
Before saving to the database I need to
- delete all tags
- replace every run of more than one whitespace character with a single space
- replace every run of more than one newline with a single newline
To do this I use the following:
$content = preg_replace('/<[^>]+>/', "", $content);
$content = preg_replace('/\n/', "NewLine", $content); // so that newlines are not lost when collapsing runs of whitespace
$content = preg_replace('/(\ \;){1,}/', " ", $content);
$content = preg_replace('/[\s]{2,}/', " ", $content);
Finally, I must delete runs of more than one "NewLine" word.
After the first two points I get text in this format:
NewLineWordOfText
NewLine
NewLine
NewLine NewLine WordOfText "WordOfText WordOfText" WordOfText NewLine"WordOfText
...
How do I delete more than one newline from such content?
Thanks
First of all, HTML is not a regular language, so parsing it with regular expressions is a bad idea. PHP has a built-in function that removes tags for you: strip_tags
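As a minimal sketch of the strip_tags suggestion (the sample string below is made up for illustration and is not from the question):

```php
<?php
// Hypothetical sample input; any HTML string is handled the same way.
$content = "<p>Hello <b>world</b></p>\n<p>Second line</p>";

// strip_tags() removes HTML and PHP tags and returns the remaining text.
$text = strip_tags($content);

echo $text;
```

Note that strip_tags only removes the tags themselves; whitespace and newlines between them are left as they are, so the squeezing steps are still needed afterwards.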
To squeeze spaces while preserving newlines:
$content = preg_replace('/[^\n\S]{2,}/', " ", $content);
$content = preg_replace('/\n{2,}/', "\n", $content);
The first line squeezes all whitespace other than \n into one space ([^\n\S] matches any character that is neither \n nor a non-whitespace character). The second squeezes multiple newlines into a single newline.
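Putting the pieces together, a rough sketch of the whole clean-up might look like the following. The function name clean_for_storage is hypothetical, and the line-ending normalization is an added assumption (it matters only if the input can contain \r\n):

```php
<?php
// clean_for_storage() is a hypothetical helper name, not a PHP built-in.
function clean_for_storage(string $html): string
{
    // 1. Remove tags with the built-in function instead of a regex.
    $text = strip_tags($html);

    // 2. Assumption: normalize Windows/old-Mac line endings to \n so the
    //    newline regex below sees consecutive \n characters.
    $text = str_replace(["\r\n", "\r"], "\n", $text);

    // 3. Squeeze runs of two or more non-newline whitespace characters.
    $text = preg_replace('/[^\n\S]{2,}/', ' ', $text);

    // 4. Squeeze runs of two or more newlines into a single newline.
    $text = preg_replace('/\n{2,}/', "\n", $text);

    return $text;
}

echo clean_for_storage("<p>Word   of\t\ttext</p>\n\n\n<p>Next  line</p>");
```

One limitation of this sketch: newlines separated by a single space (for example "\n \n") are not merged, because neither pattern matches that sequence. If that case occurs in your data, trimming spaces around newlines first would be one way to handle it.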
Why don't you use nl2br(), then preg_replace every <br /><br /> with just <br />, and then turn all <br /> back into \n?