PHP regex: wrap text between a line start and an empty line in <p> tags, skipping lines that start with an HTML tag
Hello, I need to wrap every line without HTML tags in paragraph tags, in this format:
<p>lorem ipsum</p>
For example,
hello world
<h2>lol</h2>
lorem ipsum
dolor sit
amet
consetetur
should be parsed into:
<p>hello world</p>
<h2>lol</h2>
<p>lorem ipsum
dolor sit
amet</p>
<p>consetetur</p>
I tried this with PHP's preg_replace() function.
Can anyone help?
P.S. I'm trying to convert this syntax into HTML:
# header 1 // <h1>header 1</h1>
## header 2 // <h2>header 2</h2>
and all lines without a header should be parsed into <p>...</p>. My headers are parsed, but the paragraphs are not.
This is a bit verbose, but it should be solid. It uses DOMDocument rather than regex:
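$dom = new DOMDocument;
// loadXML() requires $yourContent to be well-formed XML (closed tags, no HTML-only entities)
$dom->loadXML('<root>' . $yourContent . '</root>');
$xpath = new DOMXPath($dom);
// Only top-level text nodes; text inside <h2> etc. is left alone
$nodes = $xpath->query('/root/text()');

// Replace $node with <p>$node</p>
function wrapnode(DOMNode $node) {
    $p = $node->ownerDocument->createElement('p');
    $node->parentNode->replaceChild($p, $node);
    $p->appendChild($node);
}

foreach ($nodes as $node) {
    if ($node->nodeType !== XML_TEXT_NODE) {
        continue;
    }
    $node->nodeValue = trim($node->nodeValue);
    // Skip whitespace-only text, e.g. the newline between two tags
    if ($node->nodeValue === '') {
        continue;
    }
    // Each blank line ("\n\n") starts a new paragraph; splitText() takes a
    // character offset, so use mb_strpos() rather than the byte-based strpos()
    while (($location = mb_strpos($node->nodeValue, "\n\n", 0, 'UTF-8')) !== false) {
        $newnode = $node->splitText($location);
        wrapnode($node);
        $node = $newnode;
        $node->nodeValue = trim($node->nodeValue);
    }
    wrapnode($node);
}
echo $dom->saveXML();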
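This works in Java:
input.replaceAll("(?<=\\n\\n)(?=\\w)", "<p>").replaceAll("(?<=\\w)(?=\\n\\n)", "</p>");
However, it's a bit brittle: it does two independent replacements, so an opening <p> is not guaranteed a matching </p>. For example, text at the very start of the input never gets a <p> (no blank line precedes it), text at the very end never gets a </p>, and a paragraph ending in punctuation such as a period never gets its </p> either.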
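Since the question asks about preg_replace(), here is a minimal PHP sketch that avoids the pairing problem by matching a whole paragraph in one pass, so every <p> is emitted together with its </p>. It assumes that a line whose first non-blank character is '<' is HTML and must be left alone, that blank lines separate paragraphs, and that line endings are "\n" (CRLF is normalised first). The function name wrapParagraphs() is hypothetical.

```php
<?php
// Hypothetical helper (sketch): wrap runs of consecutive plain-text lines in <p>...</p>.
// Assumptions: a line starting (after optional spaces/tabs) with '<' is HTML and is
// left untouched; blank lines end a paragraph; "\n" line endings after normalising.
function wrapParagraphs(string $text): string
{
    // Normalise Windows line endings so "\r" never ends up inside a paragraph
    $text = str_replace("\r\n", "\n", $text);

    // A "plain" line: optional indentation, then a first character that is
    // neither whitespace nor '<', then the rest of the line.
    $plainLine = '[ \t]*[^\s<][^\n]*';

    // One plain line plus every directly following plain line = one paragraph,
    // so each match is wrapped as a unit and <p>/</p> are always paired.
    $pattern = '/^' . $plainLine . '(?:\n' . $plainLine . ')*/m';

    return preg_replace($pattern, '<p>$0</p>', $text);
}

$input = "hello world\n<h2>lol</h2>\nlorem ipsum\ndolor sit\namet\n\nconsetetur";
echo wrapParagraphs($input), "\n";
```

With the sample input this yields <p>hello world</p>, the untouched <h2>lol</h2>, one <p> spanning "lorem ipsum" to "amet", and <p>consetetur</p>; the blank line itself is kept. Note that any line starting with a tag, including inline markup such as "<b>bold</b> text", is treated as HTML under this rule.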
As far as valid HTML 2.0 is concerned, <p> does not need a closing tag. So to turn the input into HTML with a new paragraph at every double line break, it's very simple:
$html = str_replace("\n\n", '<p>', $html);
Keep in mind that this solution is very specific to the given input and output, so it may only solve part of the scenario in your question; your question didn't give me enough information for a better answer.
If you need HTML 4, you can easily generate it from that output:
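// Insert an (unclosed) <p> at every blank line
$html = str_replace("\n\n", "<p>", $yourContent);
// Parse with the HTML parser, then serialize the resulting document
$dom = new DOMDocument;
$dom->loadHTML($html);
echo $dom->saveHTML();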
DOMDocument converts the HTML 2.0 markup into HTML 4 and adds the required elements such as the doctype, <html> and <body>; only <head> and <title> are missing.
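For the P.S. in the question (turning "# header 1" into <h1>header 1</h1> and "## header 2" into <h2>header 2</h2>), a small preg_replace_callback() sketch is enough, because the heading level is simply the number of '#' characters. It assumes "\n" line endings and that a header is 1 to 6 '#' characters followed by at least one space or tab; convertHeaders() is a hypothetical name. If it runs before any paragraph wrapping, the converted header lines start with '<', so a wrapper that skips lines starting with a tag leaves them alone.

```php
<?php
// Hypothetical helper (sketch): convert "# text" ... "###### text" lines into <h1>..<h6>.
// Assumptions: "\n" line endings; 1-6 '#' followed by at least one space or tab.
function convertHeaders(string $text): string
{
    return preg_replace_callback(
        '/^(#{1,6})[ \t]+(.+?)[ \t]*$/m',
        function (array $m): string {
            $level = strlen($m[1]); // number of '#' characters = heading level
            return "<h{$level}>{$m[2]}</h{$level}>";
        },
        $text
    );
}

echo convertHeaders("# header 1\n## header 2\nplain text"), "\n";
```

Lines such as "#hashtag" (no space after the '#') or seven or more '#' characters are left unchanged.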