Regular expression to remove endline space patterns

2023-04-10 03:00 · Q&A author:

I have a website updater that converts each p element to a textarea. The user types in the content, then each textarea is converted back to a p, and I grab the resulting HTML and store it in my SQL database.

My problem: in Internet Explorer, when I grab the HTML back, IE has slightly changed it. For example:

// From this originally
<img id="headingpic"/><div id="myContent">  

// To this
<img id="headingpic"/>
<div id="myContent">

This matters because the page now displays a vertical gap between the img and the div below it.

Sometimes IE inserts "\n ", sometimes it's " \n", and sometimes it's just "\n". I am trying to come up with a regular expression that removes these line breaks (and the surrounding spaces) no matter which pattern appears. I have a lot of difficulty coming up with regular expressions; they seem so cryptic to me.

If I explain my algorithm, can you suggest the regular-expression "characters" that achieve it?

For every ">" character: ignoring any whitespace or line-break characters, if the next character is a "<", then proceed.
Delete every character between that ">" and the "<" (i.e. replace it with "").
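The two steps above map almost token-for-token onto a regular expression. It is a literal >, then \s* ("zero or more whitespace characters, line breaks included"), then a literal <. The whole match is replaced by "><".

Below is a minimal Python sketch written with re.VERBOSE so that each piece of the pattern carries a comment. The names GAP_BETWEEN_TAGS and collapse_gaps and the sample strings are hypothetical. They were chosen to mirror the three patterns IE produces.

```python
import re

# Each line of the pattern corresponds to one step of the algorithm.
GAP_BETWEEN_TAGS = re.compile(
    r"""
    >       # a closing angle bracket...
    \s*     # ...then any run of spaces, tabs, CR or LF characters (possibly none)...
    <       # ...then an opening angle bracket
    """,
    re.VERBOSE,
)


def collapse_gaps(html):
    """Hypothetical helper: delete whitespace that sits between '>' and '<'."""
    return GAP_BETWEEN_TAGS.sub("><", html)


# The three variants described in the question: "\n ", " \n" and "\n".
for gap in ["\n ", " \n", "\n"]:
    html = '<img id="headingpic"/>' + gap + '<div id="myContent">'
    print(repr(collapse_gaps(html)))
```

All three iterations print the same joined string. Text inside an element, such as <p>Hello world</p>, is left alone because there is no "<" directly after its ">".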

I am trying to do this in either JavaScript or Python:

# Python: should I use replace for this? Would my regular expression look something like this?
HTML_CONTENT.replace( "^[ \t\n\r]" ) # this removes all whitespace as far as I know
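On the "should I use replace?" question: Python's str.replace() matches its first argument literally and needs a second argument for the replacement text. A regex passed to it is therefore just searched for as plain characters. Regular expressions go through the re module instead.

Here is a short comparison. The sample string is made up for illustration.

```python
import re

sample = "<img/>\n <div>"

# str.replace: the pattern is taken literally, so the six characters
# "[", space, tab, newline, carriage return, "]" must appear in that order.
# They never do here, so nothing changes.
literal = sample.replace("[ \t\n\r]", "")

# re.sub: inside [...] the same characters form a character class, so every
# space, tab, carriage return or newline is replaced with "".
regex = re.sub(r"[ \t\n\r]", "", sample)

print(repr(literal))  # '<img/>\n <div>'
print(repr(regex))    # '<img/><div>'
```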

I would go about this a different way:

First, split by line:

var html_content_list = HTML_CONTENT.split("\n"); // split into lines

Then remove the whitespace at both ends of each line with .trim() (assuming we are dealing with strings, one line each; test for null first):

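for (var i = 0; i < html_content_list.length; i++) // index loop; for...in is not meant for arrays
{
    html_content_list[i] = html_content_list[i].trim(); // strip leading/trailing whitespace
}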

Finally, join the lines back together. Joining with "" drops the line breaks entirely; only join with "\n" if a line break really is needed:

HTML_CONTENT = html_content_list.join("");
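Since the question also allows Python, here is a sketch of the same split/trim/join approach there. The name strip_lines and its keep_newlines parameter are hypothetical.

One caveat is worth knowing. Trimming every line and joining with "" also glues together words that were separated only by a line break inside ordinary text. The tag-only regex approach does not do that.

```python
def strip_lines(html, keep_newlines=False):
    r"""Hypothetical helper: trim each line and join the lines back together.

    splitlines() handles "\n", "\r\n" and "\r" endings; strip() removes
    whitespace from both ends of each line, like JavaScript's trim().
    """
    lines = [line.strip() for line in html.splitlines()]
    separator = "\n" if keep_newlines else ""
    return separator.join(lines)


print(strip_lines('<img id="headingpic"/> \n <div id="myContent">'))
print(strip_lines("<p>first line\nsecond line</p>"))  # words get glued together
```

The first call prints <img id="headingpic"/><div id="myContent">. The second prints <p>first linesecond line</p>, which shows the caveat.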

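Your regex needs a few more characters, or the \s shorthand. Also note that in Python str.replace() matches plain text only (and needs a replacement argument), so use re.sub(). Drop the leading ^ as well, because it anchors the match to the start of the string:

import re
re.sub(r"[ \t\n\r\f\v]", "", HTML_CONTENT)

re.sub(r"\s", "", HTML_CONTENT)

Either line deletes every whitespace character, including the space inside <img id="headingpic"/>, so on its own it is too aggressive for HTML.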

\v Matches a vertical tab \u000B.

\f Matches a form feed \u000C.
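In Python, \s and [ \t\n\r\f\v] are interchangeable for ASCII text. For str patterns, though, \s also matches other Unicode whitespace, such as the non-breaking space U+00A0, unless the re.ASCII flag is given.

Here is a quick check. The sample characters are chosen for illustration.

```python
import re

explicit = re.compile(r"[ \t\n\r\f\v]")
shorthand = re.compile(r"\s")                  # Unicode-aware by default for str patterns
ascii_shorthand = re.compile(r"\s", re.ASCII)  # restricted to the six ASCII whitespace characters

# Columns: character, explicit class, \s, \s with re.ASCII
for ch in [" ", "\t", "\n", "\r", "\f", "\v", "\u00a0"]:
    print(repr(ch),
          bool(explicit.match(ch)),
          bool(shorthand.match(ch)),
          bool(ascii_shorthand.match(ch)))
```

Every row prints True three times except the last one (the non-breaking space). There only the default, Unicode-aware \s matches.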

I misunderstood the question at first, but here is how you would do it in Python:

import re
HTML_CONTENT = """\
<img id="headingpic"/> abcdef
qwerty..??,ksjhe173((:$
<div id="myContent">
"""

print(re.sub(r">[^<]*<", "><", HTML_CONTENT))  # Python 3 print function

Outputs:

<img id="headingpic"/><div id="myContent">
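Note that [^<]* matches any characters other than <, not just whitespace. This first pattern therefore also deletes genuine content between tags, here abcdef and the qwerty line.

Here is a small check on ordinary paragraphs, using a made-up sample.

```python
import re

paragraph = "<p>Hello, world</p>\n<p>Second</p>"

# ">[^<]*<" removes everything between a '>' and the next '<'...
print(re.sub(r">[^<]*<", "><", paragraph))  # <p></p><p></p>

# ...whereas ">\s*<" only removes runs made entirely of whitespace.
print(re.sub(r">\s*<", "><", paragraph))    # <p>Hello, world</p><p>Second</p>
```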

Or, if you only want to remove whitespace and newlines (keeping any text between the tags):

import re
HTML_CONTENT = """\
<img id="headingpic"/>

<div id="myContent">
"""

print(re.sub(r">\s*<", "><", HTML_CONTENT))  # raw string avoids the invalid "\s" escape warning

Outputs:

<img id="headingpic"/><div id="myContent">
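One further refinement rests on an assumption about what you want. >\s*< also removes a lone space between inline elements (for example <b>bold</b> <i>italic</i>), and that space is visible on the page. Every variant IE inserted in the question contains a line break. A pattern that requires at least one newline in the gap therefore leaves such single spaces alone.

Here is a sketch. LINE_BREAK_GAP and remove_line_break_gaps are hypothetical names.

```python
import re

# A gap between '>' and '<' that is all whitespace AND contains at least one "\n".
LINE_BREAK_GAP = re.compile(r">\s*\n\s*<")


def remove_line_break_gaps(html):
    """Hypothetical helper: drop whitespace gaps between tags that contain a line break."""
    return LINE_BREAK_GAP.sub("><", html)


print(remove_line_break_gaps('<img id="headingpic"/> \n<div id="myContent">'))
print(remove_line_break_gaps("<b>bold</b> <i>italic</i>"))  # single space is kept
```

The trade-off: a gap that IE fills with spaces only (no line break) would survive this pattern. Whether that matters depends on what your pages actually contain.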
