How to use sed to delete a string with wildcards
File1:
<a>hello</b> <c>foo</d>
<a>world</b> <c>bar</d>
This is an example of the file this should work on. How can one remove every string of the form <c>*</d> using sed?
The following command removes all text from <c> to </d>, inclusive:
sed -e 's/<c>.*<\/d>//' file
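A small sketch to try this out, assuming a POSIX sed and a file named file1.txt holding the sample lines (the name is made up for illustration). It also shows one thing to watch for: `.*` is greedy, so on a line with two `<c>...</d>` pairs it deletes everything from the first `<c>` to the last `</d>`. The `[^<]*` variant is one common workaround; it assumes the text between the tags never contains a `<`.

```sh
#!/bin/sh
# Recreate the sample from the question (the name file1.txt is assumed).
printf '%s\n' '<a>hello</b> <c>foo</d>' '<a>world</b> <c>bar</d>' > file1.txt

# The answer's command: prints "<a>hello</b> " and "<a>world</b> "
# (the space before <c> is left behind).
sed -e 's/<c>.*<\/d>//' file1.txt

# .* is greedy, so with two pairs on one line it also deletes the text between them.
echo 'x <c>1</d> keep <c>2</d> y' | sed -e 's/<c>.*<\/d>//'        # prints "x  y"

# [^<]* stops at the next "<", so each pair is removed on its own;
# " *" also eats the space before <c>, and g repeats the substitution.
echo 'x <c>1</d> keep <c>2</d> y' | sed -e 's/ *<c>[^<]*<\/d>//g'  # prints "x keep y"
sed -e 's/ *<c>[^<]*<\/d>//g' file1.txt
```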
The part inside the s/...// is a regular expression, not a wildcard in the sense the shell uses, so anything you can put in a regular expression you can put in there.
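To see the difference between the two kinds of `*`, here is a quick comparison (a sketch; any POSIX sh and sed should behave the same way):

```sh
#!/bin/sh
# In a shell glob, * means "any characters"; in a sed regex it means
# "zero or more of the previous item". So <c>*</d> in sed means "<c",
# any number of ">", then "</d>", which does not match <c>foo</d>.
echo '<c>foo</d>' | sed -e 's/<c>*<\/d>//'    # prints "<c>foo</d>" (unchanged)

# The regex equivalent of the glob * is .* (any character, repeated).
echo '<c>foo</d>' | sed -e 's/<c>.*<\/d>//'   # prints an empty line

# For comparison, the glob form does work as a shell case pattern.
case '<c>foo</d>' in
  '<c>'*'</d>') echo 'glob matches' ;;
esac
```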
If all your data looks like the example, gawk can split each line at " <c>" and print only the part before it:
$ gawk 'BEGIN{FS=" <c>"}{print $1}' file
<a>hello</b>
<a>world</b>
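The same idea with a portable `awk -F`, sketched under the assumption that the sample lives in file1.txt (a hypothetical name). A multi-character field separator is treated as an extended regular expression, so this should not need gawk specifically. The second command is an alternative rather than part of the answer above: it deletes only the `<c>...</d>` pairs instead of everything after the first " <c>".

```sh
#!/bin/sh
# Recreate the sample from the question (the name file1.txt is assumed).
printf '%s\n' '<a>hello</b> <c>foo</d>' '<a>world</b> <c>bar</d>' > file1.txt

# Split on " <c>" and keep the first field; lines without " <c>" print unchanged.
awk -F ' <c>' '{ print $1 }' file1.txt

# Variant: delete only the <c>...</d> pairs and keep any text after them.
echo 'p <c>x</d> q' | awk '{ gsub(/ *<c>[^<]*<\/d>/, ""); print }'   # prints "p q"
```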
Great Swiss-Army knife!
I modified it to pull header info out of emails for an archiving script. The script renames the IMAP email files with both date and sender info (otherwise IMAP just numbers them 1, 2, 3, etc.). Here are the two modifications:
for i in $mailarray; do date -d "$(grep -im 1 '^Date: ' "$i" | sed -e 's_^.*\(ate: \)__')" +%F_%T%Z; done
for i in $mailarray; do grep -iEm 1 '^From: ' "$i" | sed -e 's_^.*\(rom\).*<\|^.*\(rom:\).__' -e 's_@.*$__'; done
These saved me a great deal of extra code. Thank you.
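For readers adapting the idea, here is one way the two one-liners could be folded into a single helper. This is a hypothetical sketch, not the poster's script: the function name mail_name, the array mails, and the file names are invented, and it assumes GNU date (for -d) and a grep that supports -m. It anchors the header matches with ^, strips CR characters left by CRLF mail files, and replaces the \| alternation (a GNU sed extension) with three plain substitutions.

```sh
#!/bin/bash
# Hypothetical helper: prints "<date>_<sender>" for one message file.
# Assumes GNU date (-d) and a grep with -m (GNU or BSD).
# Headers are anchored with ^ so that, e.g., "Subject: Update: ..." is not
# mistaken for the Date header by the case-insensitive match.
mail_name() {
  local f=$1 d s
  # Strip the header name, then any CR left by CRLF line endings.
  d=$(grep -im 1 '^Date: ' "$f" | sed -e 's/^[^:]*: *//' | tr -d '\r')
  # "From: Name <user@host>" or "From: user@host" -> "user".
  s=$(grep -im 1 '^From: ' "$f" | tr -d '\r' |
      sed -e 's/^[^:]*: *//' -e 's/^.*<//' -e 's/@.*$//')
  printf '%s_%s\n' "$(date -d "$d" +%F_%T%Z)" "$s"
}

# Example use (not run here): rename each file, refusing to overwrite.
# for f in "${mails[@]}"; do mv -n -- "$f" "$(mail_name "$f")"; done
```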