
How to remove line breaks from a file?

How to remove:

<p> (break line!!!)
text...
</p> (break line!!!)

from a file with regex?

I tried:

find . -type f -exec perl -p -i -e "s/SEARCH_REGEX/REPLACEMENT/g" {} \;


This stuff can really blow up in your face so be careful; try it with test data in a test dir etc.

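The -0 switch sets the input record separator ($/) to the null character, so a text file with no null bytes is read as a single record and the pattern can span multiple lines (-0777 slurps the file unconditionally). The s modifier lets . match newlines, and +? makes the match lazy so it stops at the first "TERRANO". Try this test on one of your files.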

perl -0 -p -e 's/<p>.+?TERRANO[^<]*<\/p>//gs'

If that works, you can add it to your original.

find . -type f -exec perl -0 -pi -e "s/<p>.+?TERRANO[^<]*<\/p>//gs" {} \;

As mentioned in a comment, if the content is HTML, you should probably be using an HTML parser.


Several ways to do it.

First is to undef $/ (the input record separator), so the whole file is read as one string. Then you match something like

/\<p\>\nTERRANO.*\n\<\/p\>/

which may depend on whether your files use CR/LF line endings or just LF.

Second is to use a loop to concatenate the lines (plus whatever is in $/) and match that in one regex, including matching whatever is in $/.

Third would be to use File::Slurp.

Fourth is to use several regexes and a loop to match each line, and if all three are satisfied, do your substitution.


You may also use the Unix text editor ed to remove a range of lines with regex:

str='
BEFORE MULTILINE PATTERN 1
<p> (break line!!!)
text...
</p> (break line!!!)
AFTER MULTILINE PATTERN 1
BEFORE MULTILINE PATTERN 2 
<p> (break line!!!)
text...
</p> (break line!!!)
AFTER MULTILINE PATTERN 2
'

# for in-place file editing use "ed -s file" and replace ",p" with "w"
# cf. http://wiki.bash-hackers.org/howto/edit-ed

cat <<-'EOF' | sed -e 's/^ *//' -e 's/ *$//' -e '/^ *#/d' | ed -s <(echo "$str")
  H
  # only remove the first match
  #/<p>/,/<\/p>/d
  # remove all matches
  g/<p>/+0,/<\/p>/+0d
  ,p
  q
EOF


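You may also want the multi-line modifier:

s/regexp/replacement/m

Note that /m only changes ^ and $ so they match at the start and end of each line; the pattern can still only span lines if the input is read as more than one line at a time, for example with -0 as shown above.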
