Replace mixed escape sequences, control characters, and literals with sed?

I have a weird mix of literal, visible, and escaped control characters in a data dump that I need to clean (preferably with sed): for example ^A, ^B, \N (literally a backslash followed by N), and visible newlines. I need to clean the file so that the visible newlines remain intact, every ^A is replaced with a tab character, and every ^B\N^B\N is stripped (it follows every unix time value in the data, e.g. 13068505731812510).

This is what the contents look like using less in a shell command (in the shell, the ^A and ^B characters have a dark background to denote control chars):

^A guid ^A unix-time ^B\N^B\N^A 4 ^A 192.168.21.136 ^A 7.0 ^A IE ^A 8 ^A guid ^A WinNT ^A ... (visible newline)

Or a literal example...

... ^A40C4595C-0B9D-46B7-8214-3D9CE2B5F057^A13071154505579551^B\N^B\N^A4^A192.168.21.136^A7.0^AIE^A8^AE6979203-F58B-4D20-9D66-7F5369BF9E32^AWinXP^A ...

So far the escape sequences I've been feeding sed have not been producing the expected output. Does anyone know the magic escapes needed to make all this happen in as few passes as possible? (There are many gigs of files, and time counts.) Thanks! Bonus points if I can convert the unix time digits into human-readable times in the same pass.


Change ^A to tabs (note that \^A as written matches a literal caret followed by A, not the control byte 0x01):

sed 's/\^A/'"$(printf '\011')"'/g'

Strip out ^B\N^B\N:

sed -e 's/\^B\\N\^B\\N//g'
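If less highlights ^A and ^B, they are most likely the real control bytes 0x01 and 0x02 rather than a caret followed by a letter, in which case the patterns need the bytes themselves. A minimal sketch under that assumption, doing both substitutions in a single sed process (one pass over the data); clean_dump is a hypothetical helper name, and \N is assumed to be a literal backslash followed by N:

```shell
#!/bin/sh
# Build the actual control bytes once; printf octal escapes are POSIX.
SOH=$(printf '\001')   # the byte less displays as ^A (assumed real 0x01)
STX=$(printf '\002')   # the byte less displays as ^B (assumed real 0x02)
TAB=$(printf '\011')

# clean_dump (hypothetical name): reads the dump on stdin, writes the
# cleaned text on stdout. Newlines are untouched because sed works line
# by line and neither pattern contains one.
clean_dump() {
    # Inside double quotes, \\\\N reaches sed as \\N, which matches a
    # literal backslash followed by N.
    # LC_ALL=C makes sed treat the input as plain bytes.
    LC_ALL=C sed \
        -e "s/${STX}\\\\N${STX}\\\\N//g" \
        -e "s/${SOH}/${TAB}/g"
}
```

Usage would be something like clean_dump < dump.txt > dump.tsv, where both file names are placeholders. Running cat -v or od -c on a few lines first is a quick way to confirm which bytes are really in the file.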