How to use sed to delete a string with wildcards

File1:

<a>hello</b> <c>foo</d>
<a>world</b> <c>bar</d>

This is an example of the file the command should work on. How can I use sed to remove every string of the form <c>*</d>, where * stands for any text?


The following line will remove all text from <c> to </d> inclusive:

sed -e 's/<c>.*<\/d>//'

The part between the slashes in s/...// is a regular expression, not a wildcard in the sense the shell uses, so anything that is valid in a sed regular expression can go there.
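
One caveat with the command above: .* is greedy, so on a line that holds more than one <c>...</d> pair it removes everything from the first <c> to the last </d>. The following is a minimal sketch of a narrower pattern, assuming the text between <c> and </d> never contains a '<' character. The function name strip_c_spans is hypothetical and only there for illustration.

```shell
# Hypothetical helper: strip every <c>...</d> span from the lines on stdin.
# Assumption: the text between <c> and </d> contains no '<' character.
strip_c_spans() {
  # [^<]* stops at the next '<', so each pair is matched separately;
  # the g flag repeats the substitution for every pair on the line.
  sed -e 's/<c>[^<]*<\/d>//g'
}

# Sample input taken from File1 in the question.
printf '%s\n' '<a>hello</b> <c>foo</d>' '<a>world</b> <c>bar</d>' | strip_c_spans
```

Note that the space in front of <c> is left in place, so each output line ends with a trailing space.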


If all your data looks like the example, you can also split each line on " <c>" with gawk and print only the first field:

$ gawk 'BEGIN{FS=" <c>"}{print $1}' file
<a>hello</b>
<a>world</b>


Great Swiss-Army knife!

I modified it to pull header information out of emails for an archiving script. The script renames the IMAP emails using both the date and the sender (otherwise IMAP just numbers them 1, 2, 3, etc.). Here are the two modifications:

for i in $mailarray; do date -d "$(less -f "$i" | grep -im 1 'Date: ' | sed -e 's_^.*\(ate: \)__')" +%F_%T%Z; done

for i in $mailarray; do less -f "$i" | grep -iEm 1 'From: ' | sed -e 's_^.*\(rom\).*<\|^.*\(rom:\).__' | sed -e 's_@.*$__'; done

They saved a great deal of extraneous coding. Thank you.
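
For anyone adapting the sender extraction, here is a rough sketch of the same idea written as three simpler substitutions, which avoids the \| alternation (a GNU sed extension in basic regular expressions). It assumes a single "From:" header line on stdin, in either "Name <user@host>" or bare "user@host" form. The function name sender_local_part is hypothetical.

```shell
# Hypothetical helper: reduce a "From:" header line to the sender's local part.
# Assumption: one header line on stdin, "Name <user@host>" or "user@host".
sender_local_part() {
  # 1. drop the leading "From:" label and any whitespace after it
  # 2. drop everything up to and including the last '<', if there is one
  # 3. drop the '@' and everything after it
  sed -e 's/^[Ff]rom:[[:space:]]*//' \
      -e 's/^.*<//' \
      -e 's/@.*$//'
}

printf '%s\n' 'From: Jane Doe <jane@example.com>' | sender_local_part
printf '%s\n' 'From: jane@example.com' | sender_local_part
```

Both sample lines should reduce to the part of the address before the @ sign. Real mail headers can be folded across lines or encoded, which this sketch does not handle.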
