Bash, perl regexp help

I have a text file (utf8):

http://d.pr/1d6T+

Please help me with a regexp. I want to replace

<p>
TERRANO...
</p>

with: empty space. :)

And:

<td width="20%" align="left" class="thead">Rám:</td>

With:

<td width="20%" align="left" class="thead">Something else:</td>

Replacing just the word "Rám" is also OK.

I found this command, but I don't know how to use it:

find . -type f -exec perl -p -i -e "s/SEARCH_REGEX/REPLACEMENT/g" {} \;


Assuming you want to replace text in HTML files:

cd /path/to/my/project
find . -iname '*.html' -exec perl -p -i -e "s/Rám:/Something else:/g" {} \;
# -0777 reads each file whole, so the pattern can span the <p> ... </p> lines
find . -iname '*.html' -exec perl -0777 -p -i -e 's/<p>\s*TERRANO.*?<\/p>//gs' {} \;


If you do not mind converting your regular .txt files into .(x)html files and have HTML Tidy and xmlstarlet available, you can do it without regex!

tidy -v                   # HTML Tidy for Mac OS X released on 25 March 2009
xmlstarlet --version      # 1.0.6

curl -L -o utf8file 'http://d.pr/1d6T+'

# convert HTML to XHTML with tidy
tidy -h
tidy -i -q -c -wrap 0 -numeric -asxml -utf8 --merge-divs yes --merge-spans yes utf8file > utf8file.xhtml

xmlstarlet el -a utf8file.xhtml
xmlstarlet el -v utf8file.xhtml
xmlstarlet edit --help

# edit file in-place
xmlstarlet edit -L -u "//*[local-name()='p']" -v 'EMPTY SPACE IS HERE' utf8file.xhtml 

# remove <p> ... </p> completely
xmlstarlet edit -L -d "//*[local-name()='p']" utf8file.xhtml  

xmlstarlet edit -L -u "//*[local-name()='td'][@width='20%' and @align='left' and @class='thead' and .='Rám:']" -v 'SOMETHING ELSE:' utf8file.xhtml

open -a Safari utf8file.xhtml

# convert XHTML to HTML with tidy
tidy -i -q -c -wrap 0 -numeric -ashtml -utf8 --merge-divs yes --merge-spans yes utf8file.xhtml > utf8file.html
open -a Safari utf8file.html


To extract just the table from utf8file.xhtml after the in-place editing steps you may use the "print copy of XPATH expression" feature of xmlstarlet:

xmlstarlet sel --help

# test
xmlstarlet sel -I -t -c "//*[local-name()='table'][@id='model-table-specifikacia']" utf8file.xhtml

xmlstarlet sel -I -t -c "//*[local-name()='table'][@id='model-table-specifikacia']" utf8file.xhtml > utf8file


Old topic, but useful: for mass search-and-replace jobs, I tend to use a Perl peewee (name based on the arguments used) rather than relying on find and then executing Perl code.

That is, I use the following:

# -0777 slurps each file, so the \n in the pattern can match across lines
perl -0777 -pi -w -e 's/<p>\nTERRANO.+?\n<\/p>/<p>\n\n<\/p>/gs;' ./*.html

and

perl -pi -w -e 's/<td (.+?) class=\"thead\">Rám:<\/td>/<td $1 class="thead">Something else:<\/td>/g;' ./*.html

Hope that helps somebody!
