
PHP regex: find the text between a line start and an empty line, skipping lines that start with any HTML tag

Hello, I need to wrap every block of lines that does not start with an HTML tag in this format:

<p>lorem ipsum</p>

e.g.

hello world

<h2>lol</h2>

lorem ipsum
dolor sit
amet

consetetur

should be parsed to

<p>hello world</p>

<h2>lol</h2>

<p>lorem ipsum
dolor sit
amet</p>

<p>consetetur</p>

I tried this with the PHP function preg_replace().

Can someone help?

P.S. I'm trying to convert this syntax into HTML:

# header 1 // <h1>header 1</h1>
## header 2 // <h2>header 2</h2>

and all lines without a header should be parsed into <p>...</p>.

My headers get parsed, but the paragraphs do not.


This is a bit verbose, but it should be solid. It uses DOMDocument rather than regex:

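$dom = new DOMDocument;
// loadXML() needs well-formed markup, so the content is wrapped in a temporary root element.
$dom->loadXML('<root>' . $yourContent . '</root>');
$xpath = new DOMXPath($dom);

// Select only the text that sits directly under the root, i.e. text not already inside a tag.
$nodes = $xpath->query('/root/text()');

// Wrap a single text node in a new <p> element.
function wrapnode($node) {
    $p = $node->ownerDocument->createElement('p');
    $node->parentNode->replaceChild($p, $node);
    $p->appendChild($node);
}

foreach ($nodes as $node) {
    if ($node->nodeType === XML_TEXT_NODE) {
        $node->nodeValue = trim($node->nodeValue);

        // Split at each blank line. Compare against false explicitly, and use a
        // character offset (mb_strpos) because splitText() counts characters, not bytes.
        while (($location = mb_strpos($node->nodeValue, "\n\n", 0, 'UTF-8')) !== false) {
            $newnode = $node->splitText($location);
            wrapnode($node);

            $node = $newnode;
            $node->nodeValue = trim($node->nodeValue);
        }

        // Skip whitespace-only nodes so that no empty <p> is emitted.
        if ($node->nodeValue !== '') {
            wrapnode($node);
        }
    }
}

echo $dom->saveXML();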


This works in Java:

input.replaceAll("(?<=\\n\\n)(?=\\w)", "<p>").replaceAll("(?<=\\w)(?=\\n\\n)", "</p>");

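However, it's a bit brittle: it performs two independent replacements, so nothing guarantees that every inserted <p> gets a matching </p>. For example, a paragraph at the very start of the input gets a closing </p> but no opening <p>, because no blank line precedes it.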
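
Since the question asks for PHP, here is a minimal sketch of a single-pass alternative that keeps each opening and closing tag together. It is not taken from any of the answers: the function name wrapParagraphs is hypothetical, and it assumes that blocks are separated by blank lines and that any block starting with "<" is already HTML and should be left alone.

```php
<?php
// Hypothetical helper (not from the original answers): wraps every
// blank-line-separated block in <p>...</p> unless it already starts with a tag.
function wrapParagraphs(string $text): string
{
    // Normalise line endings, then split on blank lines
    // (a newline, optional whitespace, and another newline).
    $text = str_replace(["\r\n", "\r"], "\n", trim($text));
    $blocks = preg_split('/\n\s*\n/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $out = [];
    foreach ($blocks as $block) {
        // Assumption: a block that begins with "<" is already markup.
        $out[] = preg_match('/^\s*</', $block)
            ? $block
            : '<p>' . $block . '</p>';
    }

    // Re-join the blocks with a blank line between them.
    return implode("\n\n", $out);
}

// The sample input from the question.
$input = "hello world\n\n<h2>lol</h2>\n\nlorem ipsum\ndolor sit\namet\n\nconsetetur";
echo wrapParagraphs($input), "\n";
```

Because each block is wrapped as a unit, an opening <p> can never end up without its closing </p>. If the "#" headers are converted to <h1>/<h2> first, those blocks already start with "<" and are skipped here.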


As far as valid HTML 2.0 is concerned, the closing </p> tag is optional, so <p> does not need to come as a pair. To turn the input into HTML with a new paragraph at every double line break, a simple replacement is enough:

$html = str_replace("\n\n", '<p>', $html);

Keep in mind that this solution is very specific to the input and the output, so it might solve only part of the scenario in your question. However, I could not get enough information from your question to give a better answer.

HTML 4.01 can easily be generated from that result:

$html = str_replace("\n\n", "<p>", $yourContent);
$dom = new DOMDocument;
$dom->loadHTML($html);
echo $dom->saveHTML();

DOMDocument can convert the HTML 2.0 into HTML 4.01 and will add all the needed HTML elements, such as the doctype, html and body. Only the head and title are missing.
