Remove Repeating and Control Characters in sed

Let's say I have a word at the beginning of a line, HHEELLLLOO for example. How can I replace repeated characters with a single character? The output should be HELLO.

Also, does anyone know how to remove or specify control characters in sed, ^H for example?


Question 1

Yes, regex can handle that. In sed:

$ echo HHEELLLLOO | sed 's/\(.\)\1/\1/g'
HELLO

This will do it: each pair of identical adjacent characters is replaced with a single copy, so LLLL becomes LL.
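To see how the pair-based expression differs from one that collapses whole runs, here is a small sketch. The function names squeeze_pairs and squeeze_runs are made up for illustration; only the sed expressions matter.

```shell
# Collapse each adjacent pair of identical characters into one (LLLL -> LL).
squeeze_pairs() {
    sed 's/\(.\)\1/\1/g'
}

# Collapse each run of identical characters into one (LLLL -> L).
# \1* is plain POSIX BRE, so this does not rely on the GNU-only \+.
squeeze_runs() {
    sed 's/\(.\)\1*/\1/g'
}

echo HHEELLLLOO  | squeeze_pairs   # HELLO
echo HHEELLLLOO  | squeeze_runs    # HELO
echo HHHEELLLLOO | squeeze_pairs   # HHELLO (an odd-length run leaves one extra)
```

The pair version only gives HELLO because every letter in the sample input happens to be doubled exactly.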

Question 2

It may vary depending on your system. Here (BSD) you can type ctrl-v ctrl-h to insert a literal backspace character to be interpreted by sed. Give it a try.

$ cat file
H^HE^HL^HL^HO^H
$ sed 's/^H//g' file > new_file
$ cat new_file
HELLO
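If typing a literal control character is awkward (or the script has to survive copy and paste), the backspace can be generated instead. A minimal sketch, assuming a POSIX shell; the function names are hypothetical:

```shell
# Delete every backspace (ASCII 8, octal 010) from standard input.
strip_backspaces() {
    tr -d '\010'
}

# The same with sed: printf produces the literal backspace character,
# so nothing unprintable has to be typed into the script itself.
strip_backspaces_sed() {
    bs=$(printf '\010')
    sed "s/${bs}//g"
}

printf 'H\010E\010L\010L\010O\010\n' | strip_backspaces       # HELLO
printf 'H\010E\010L\010L\010O\010\n' | strip_backspaces_sed   # HELLO
```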


See "limiting repetition" from this site: http://www.regular-expressions.info/repeat.html

An actual script, as inspired by chown and that site (note that \+ is a GNU sed extension; in POSIX sed, \1\+ can be written as \1\1*):

sed 's/\([a-zA-Z]\)\1\+/\1/g'

However, you won't be able to get HELLO this way; you would only get HELO. A regex alone cannot know that the word should contain two L's. For that, you would need to match the word against a dictionary. You could still use a regex for that lookup, though: H+E+L+O+ and so on.
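The dictionary idea could be sketched roughly like this. The helper names and the word list are hypothetical, and the sketch assumes the input word contains only letters (no regex metacharacters):

```shell
# Build an anchored ERE from a stuttered word: HHEELLLLOO -> ^H+E+L+O+$
stutter_regex() {
    printf '^%s$\n' "$(printf '%s\n' "$1" | sed 's/\(.\)\1*/\1+/g')"
}

# Print the entries of a word list (one word per line) that the stuttered
# word could stand for, ignoring case.
lookup_word() {
    grep -E -i -e "$(stutter_regex "$1")" "$2"
}

# Example with a throwaway word list; any one-word-per-line file would do.
words=$(mktemp)
printf '%s\n' hello help halo > "$words"
lookup_word HHEELLLLOO "$words"    # hello
rm -f "$words"
```

If the list contains more than one candidate (say both "hello" and "helo"), all of them are printed and the choice is still up to you.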

For the control characters, GNU sed accepts \oNNN (octal), \xHH (hexadecimal), and \dNNN (decimal) escapes for arbitrary characters. ^H is the backspace character, ASCII 8 (octal 010, hex 08).
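To work out which code a caret-notation character such as ^H stands for, flip bit 6 of the letter's ASCII code (H is 72, and 72 XOR 64 is 8). A small sketch, assuming a POSIX shell; caret_to_octal is a made-up helper name:

```shell
# Print the three-digit octal code of the control character written ^X.
caret_to_octal() {
    # printf '%d' "'H" yields the character code of H (72);
    # XOR with 64 maps it to the control code (8, printed as 010).
    printf '%03o\n' $(( $(printf '%d' "'$1") ^ 64 ))
}

caret_to_octal H      # 010
caret_to_octal '?'    # 177 (DEL)

# The result can be fed straight to tr as an octal escape.
printf 'H\010I\n' | tr -d "\\$(caret_to_octal H)"    # HI
```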


Try this for removing duplicates: sed 's/\([a-zA-Z]\)\1\+/\1/g'. Note that it will produce 'HELO', not 'HELLO'; see the other answer for the reason.


$ echo BookKeeper | perl -pe 's/(.)\1+/$1/gi'
Bokeper

$ perl -le 'print "\cSome \cEvil \cControl \cMess\c?"' | perl -ple 's/\pC//g'
ome vil ontrol ess

Technically, \pC matches the whole Unicode "Other" category (control, format, unassigned, and so on); control characters alone are \p{Cc}.
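Since the question asked about sed, the closest equivalent there is probably the POSIX character class [[:cntrl:]]. A minimal sketch, assuming plain ASCII input in the C locale; strip_controls is a hypothetical name:

```shell
# Remove every control character from each input line.
# sed works line by line, so the newline that ends each line is kept.
strip_controls() {
    LC_ALL=C sed 's/[[:cntrl:]]//g'
}

# Same bytes as the perl example: ^S, ^E, ^C, ^M and DEL.
printf '\023ome \005vil \003ontrol \015ess\177\n' | strip_controls
# prints: ome vil ontrol ess
```

Keep in mind that tabs are control characters too, so this also removes them.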
