How to remove line breaks from a file?
How to remove:
<p> (break line!!!)
text...
</p> (break line!!!)
from a file with regex?
I tried:
find . -type f -exec perl -p -i -e "s/SEARCH_REGEX/REPLACEMENT/g" {} \;
This stuff can really blow up in your face so be careful; try it with test data in a test dir etc.
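The -0 switch changes the input record separator ($/) from newline to the null character, so a text file is normally read as one record and the regex can span multiple lines at once (-0777 is the explicit "slurp the whole file" form). The s modifier lets . match across newlines, and the +? makes the match lazy, so it takes as little as possible before "TERRANO". Try this test on one of your files: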
perl -0 -p -e 's/<p>.+?TERRANO[^<]*<\/p>//gs'
If that works, you can add it to your original.
find . -type f -exec perl -0 -pi -e "s/<p>.+?TERRANO[^<]*<\/p>//gs" {} \;
As mentioned in a comment, if the content is HTML, you should probably be using an HTML parser.
There are several ways to do it.
First is to undef $/ (the input record separator), so the whole file is read as one string.
Then you match something like
/\<p\>\nTERRANO.*\n\<\/p\>/
which may depend upon whether you are using CR/LF line endings or just LF.
Second is to use a loop to concatenate the lines (each still ending in whatever is in $/) and match that in one regex, including matching whatever is in $/.
Third would be to use File::Slurp.
Fourth is to use several regexes and a loop to match each line, and if all three are satisfied, do your substitution.
You may also use the Unix text editor ed to remove a range of lines with regex:
str='
BEFORE MULTILINE PATTERN 1
<p> (break line!!!)
text...
</p> (break line!!!)
AFTER MULTILINE PATTERN 1
BEFORE MULTILINE PATTERN 2
<p> (break line!!!)
text...
</p> (break line!!!)
AFTER MULTILINE PATTERN 2
'
# for in-place file editing use "ed -s file" and replace ",p" with "w"
# cf. http://wiki.bash-hackers.org/howto/edit-ed
cat <<-'EOF' | sed -e 's/^ *//' -e 's/ *$//' -e '/^ *#/d' | ed -s <(echo "$str")
H
# only remove the first match
#/<p>/,/<\/p>/d
# remove all matches
g/<p>/+0,/<\/p>/+0d
,p
q
EOF
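You may want to use a multi-line regexp:
s/regexp/replacement/m
Note that /m only makes ^ and $ match at every line boundary. For . to match across line breaks you also need /s, and the input has to be read as more than one line at a time.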