How do I remove these tags with JavaScript?

I'm still learning regex (obviously) and I can't figure this out, and I want to do it the right way rather than the long way. How can I:

Find every <p> or </p> and replace it with a \n, except for the first <p> and the last </p>, which should simply be removed, and also replace <br>, <br /> and <br/> with \n.

With Regex OR something else. I'm getting this from a jQuery $.get() return. So, please don't flame me about it, I just don't know how to do it.


JavaScript has rather nice tools for dealing with an XML (or XHTML) DOM. Use those.


From a regex perspective, to make the first <p> an exception, you must identify a pattern that makes the first <p> fail to match. For example, if the text before the first <p> is abcxyz, that is, abcxyz<p>, then you search for every <p> that is not preceded by abcxyz, so that the first <p> doesn't match. As a regex, this becomes: (?<!abcxyz)<p>

To make the last </p> an exception, you must identify a pattern that makes the last </p> fail to match. For example, if the text after the last </p> is abcxyz, that is, </p>abcxyz, then you search for every </p> that is not followed by abcxyz, so that the last </p> doesn't match. As a regex, this becomes: </p>(?!abcxyz)
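
As a rough sketch of the negative look-ahead idea, assume the last </p> is the final thing in the string (apart from whitespace), so "end of string" plays the role of abcxyz. The function name replaceAllButLastClosingP is made up for this illustration:

```javascript
// Hypothetical helper: replaces every </p> with "\n" EXCEPT one that is
// followed only by optional whitespace and the end of the string.
// Assumption: the last </p> really is at the end of the input.
function replaceAllButLastClosingP(html) {
    // (?!\s*$) is a negative look-ahead: the match fails when only
    // whitespace and end-of-string follow the </p>.
    return html.replace(/<\/p>(?!\s*$)/g, "\n");
}

var sample = "<p>one</p><p>two</p>";
var replaced = replaceAllButLastClosingP(sample);
// replaced is "<p>one\n<p>two</p>"
```

The last </p> is left in place here; removing it is a separate step.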

JavaScript supports both positive and negative look-ahead, but its regex engine originally supported neither positive nor negative look-behind. Look-behind was only added in ECMAScript 2018, so older engines will reject a pattern like the one above. There are some dirty tricks to mimic look-behind in JavaScript, but not every look-behind construct can be mimicked.

Thus, if possible, try to identify a pattern that makes the first <p> fail to match using negative look-ahead instead.

To replace the first <p> and the last </p> with nothing, you can invert the logic used above, and you have to do this in a separate step.

To replace <br>, <br />, <br/> with \n, search for: <br\s*\/?>, and replace with \n.
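
Putting the steps together, here is a minimal sketch that avoids look-behind entirely by anchoring on the start and end of the string. It assumes the first <p> opens the string, the last </p> closes it, and the tags carry no attributes. The function name paragraphsToNewlines is hypothetical:

```javascript
// Sketch only: assumes simple, attribute-free <p> and <br> tags, with the
// first <p> at the very start and the last </p> at the very end of the input.
function paragraphsToNewlines(html) {
    return html
        .replace(/^\s*<p>/i, "")         // step 1: remove the first <p>
        .replace(/<\/p>\s*$/i, "")       // step 2: remove the last </p>
        .replace(/<\/?p>/gi, "\n")       // step 3: every remaining <p> or </p> becomes \n
        .replace(/<br\s*\/?>/gi, "\n");  // step 4: <br>, <br/> and <br /> become \n
}

var text = paragraphsToNewlines("<p>one</p><p>two<br />three</p>");
// text is "one\n\ntwo\nthree"
```

Note that adjacent paragraphs end up separated by two newlines, because both the </p> and the following <p> are replaced.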


One way to do this would be to allow the browser to do it for you. In IE and WebKit, you could assign your HTML as the innerHTML of a <div> and get its innerText. However, that won't work in Firefox or Opera. Here's a slightly bizarre use of the Selection object that will do it:

function getInnerText(html) {
    var text = "";
    var div = document.createElement("div");
    div.innerHTML = html;

    document.body.appendChild(div);
    if (typeof window.getSelection != "undefined") {
        var sel = window.getSelection();
        sel.removeAllRanges();
        var range = document.createRange();
        range.selectNodeContents(div);
        sel.addRange(range);
        text = sel.toString();
        sel.removeAllRanges();
    } else if (typeof document.body.createTextRange != "undefined") {
        var range = document.body.createTextRange();
        range.moveToElementText(div);
        text = range.text;
    }
    document.body.removeChild(div);
    return text.replace(/\r\n/g, "\n").replace(/\r/g, "\n");
}