Sed removing duplicate characters and certain characters in beginning/end of string
I need help with sed. I want to remove duplicate underscores, as well as any underscores at the beginning and end of a string.
For example:
echo '[Lorem] ~ ipsum *dolor* sit metus !!!' | sed 's/[^ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789._()-]/_/g'
Produces:
_Lorem____ipsum__dolor__sit_metus____
But I need to further format this string to: Lorem_ipsum_dolor_sit_metus
In other words, remove any underscores from the beginning and end of the string, and reduce multiple consecutive underscores to just one, preferably using another pipe.
Do you have any idea how to do that?
Thank you.
Just add ;s/__*/_/g;s/^_//;s/_$// right after the g in your sed command.
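Put together, the whole thing might look like the sketch below. The function name sanitize is only a hypothetical wrapper for illustration; the sed script itself is the command from the question with the three extra expressions appended.

```shell
# sanitize: hypothetical helper name, not part of the original question.
# 1. replace every character outside the allowed set with "_"
# 2. s/__*/_/g  squeezes each run of underscores into one
# 3. s/^_//     drops a leading underscore, if any
# 4. s/_$//     drops a trailing underscore, if any
sanitize() {
  printf '%s\n' "$1" | sed 's/[^ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789._()-]/_/g;s/__*/_/g;s/^_//;s/_$//'
}

sanitize '[Lorem] ~ ipsum *dolor* sit metus !!!'
# prints: Lorem_ipsum_dolor_sit_metus
```

Because the leading and trailing underscores are removed by separate expressions, this also works when only one of the two is present.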
All you need to do is add a "+" after your bracket expression to eliminate runs of multiple underscores. Then you can delete the beginning and ending ones. Also, as ladenedge suggested, you can use a character class to shorten your list.
sed 's/[^[:alnum:].()-]\+/_/g;s/^_\(.*\)_$/\1/'
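Two caveats about the one-liner above, with a possible variant. The \+ operator is not specified by POSIX for basic regular expressions, so it may not work in every sed, and s/^_\(.*\)_$/\1/ only matches when the string has both a leading and a trailing underscore. A sketch that avoids both issues, assuming a POSIX sed (the function name sanitize_portable is hypothetical):

```shell
# sanitize_portable: hypothetical helper name, used only for this example.
# \{1,\} is the POSIX BRE spelling of "one or more", replacing \+.
# The leading and trailing underscores are stripped by separate expressions,
# so either one may be absent.
sanitize_portable() {
  printf '%s\n' "$1" | sed 's/[^[:alnum:].()-]\{1,\}/_/g;s/^_//;s/_$//'
}

sanitize_portable '[Lorem] ~ ipsum *dolor* sit metus !!!'
# prints: Lorem_ipsum_dolor_sit_metus
```

Note that [:alnum:] depends on the locale, so in some locales it may accept letters beyond A-Z and a-z.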