format output from Unix "script" command: remove backspaces, linefeeds and deleted chars?

2023-03-30 12:46

I'm trying to use the script command to record an interactive shell session so that I can use it to prepare documentation.

According to the man page:

Script places everything in the log file, including linefeeds and
backspaces. This is not what the naive user expects.

I am the naive user (don't usually get a shout out in man pages, this is rather exciting!), and I'd like to process the output so that backspaces, linefeeds and deleted characters and so on are removed.

For example, I run a script session:

stew:~> script -f scriptsession.log
Script started, file is scriptsession.log
stew:~> date
Mon Aug 22 15:00:37 EDT 2011
stew:~> #extra chars: that
stew:~> exit
exit
Script done, file is scriptsession.log

Then I use cat to read the session log:

stew:~> cat scriptsession.log
Script started on Mon 22 Aug 2011 03:00:35 PM EDT
stew:~> date
Mon Aug 22 15:00:37 EDT 2011
stew:~> #extra chars: that
stew:~> exit
exit

Script done on Mon 22 Aug 2011 03:01:01 PM EDT

But when I use less, I see evidence of the unwanted characters that are invisible using cat:

stew:~> less scriptsession.log
Script started on Mon 22 Aug 2011 03:00:35 PM EDT
stew:~> date
Mon Aug 22 15:00:37 EDT 2011
stew:~> #extra chars: thiESC[ESC[ESC[ESC[Kthat
stew:~> exit
exit

Script done on Mon 22 Aug 2011 03:01:01 PM EDT
scriptsession.log lines 1-8/8 (END)

When I use cat, I understand that it doesn't remove the invisible chars; it just doesn't represent them visibly, as less does. So if I pipe the cat output to a file, it still has the unwanted characters.

The output format I'd like is a copy of what cat displays. Thanks!

(apologies if this is a duplicate, searching "unix script output format" returns lots of noise results with respect to the question at hand!)

The col command will do some, but not all, of the filtering you're looking for. (It doesn't seem to recognize the control sequences for bold and underlining, for example.)
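As a minimal sketch of the col approach, assuming col is installed (the helper name clean_with_col and the output file name are made up for illustration):

```shell
# clean_with_col: hypothetical helper; reads a script log on stdin.
# "col -b" emits no backspaces and keeps only the last character
# written to each column position. It does not interpret ANSI escape
# sequences, so some residue can remain in the output.
clean_with_col() {
    col -b
}

# Usage (not run here):
#   clean_with_col < scriptsession.log > scriptsession.clean
```

This handles text that was corrected with backspaces; escape sequences such as the ESC[K seen in the question need a separate filter.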

An approach I've used in the past is to (a) change my shell prompt so it doesn't do any highlighting (it normally does), and/or (b) set $TERM to "dumb" so various commands won't try to use certain control sequences.

I solved the problem by running scriptreplay in a screen session and then dumping the scrollback buffer to a file.

The following expect script does this for you.

It has been tested on log files of up to 250,000 lines. The working directory must contain your script log, a file called "time" consisting of 10,000,000 copies of the line "1 10", and the script itself. It takes the name of your script log as its command-line argument, like ./name_of_script name_of_scriptlog.

#!/usr/bin/expect -f

# Name of the script log to convert, taken from the first command-line argument.
set logfile [lindex $argv 0]

if {$logfile == ""} {puts "Usage: ./script_to_readable.exp \$logfile."; exit}

# The timestamp names the screen session and also marks the end of the replay.
set timestamp [clock format [clock sec] -format %Y-%m-%d,%H:%M:%S]
set pwd [exec pwd]
if {! [file exists ${pwd}/time]} {puts "ERROR: time file not found.\nYou need a file named time with 10.000.000 times the line \"1 10\" in the working directory for this script to work. Please provide it."; exit}
# Size the scrollback buffer to hold the whole log plus some slack.
set wc [exec cat ${pwd}/$logfile | wc -l]
set height [ expr "$wc" + "100" ]
# Replay a copy with the timestamp appended, so seeing it means the replay is done.
system cp $logfile ${logfile}.tmp
system echo $timestamp >> ${logfile}.tmp
set timeout -1
spawn screen -h $height -S $timestamp
send "scriptreplay -t time -s ${logfile}.tmp 100000 2>/dev/null\r"
expect ${timestamp}
# Ctrl-A followed by a colon opens the screen command line, and hardcopy -h dumps the scrollback buffer.
send "\x01:hardcopy -h readablelog.${timestamp}\r"

send "exit\r"

# Drop empty lines and the last two lines, then remove the temporary files.
system sed '/^$/d' readablelog.$timestamp >> readablelog2.$timestamp
system head -n-2 readablelog2.$timestamp >> ${logfile}.readable.$timestamp
system rm -f readablelog.$timestamp readablelog2.$timestamp ${logfile}.tmp

The time file can be generated with:

for i in $(seq 1 10000000); do echo "1 10" >> time; done
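The loop above reopens the file once per line, which can be slow for ten million lines. A possibly quicker sketch, assuming a POSIX awk is available (the function name make_time_file and its two arguments are hypothetical):

```shell
# make_time_file FILE COUNT
# Hypothetical helper: writes COUNT copies of the line "1 10" to FILE,
# opening the file only once.
make_time_file() {
    awk -v n="$2" 'BEGIN { for (i = 0; i < n; i++) print "1 10" }' > "$1"
}

# Usage (not run here):
#   make_time_file time 10000000
```

The resulting file has the same content as the one produced by the loop.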

As mentioned by Keith, col does part of the job (the control characters).

You can further use ansifilter to remove any ANSI escape sequences that you don't want: http://www.andre-simon.de/zip/download.html#ansifilter
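If ansifilter is not at hand, a rough stand-in can be sketched with sed. This is not how ansifilter works internally; it only strips sequences of the form ESC [ parameters letter (such as ESC[K or ESC[1m), and the function name strip_ansi is made up for illustration:

```shell
# strip_ansi: hypothetical helper; reads text on stdin and removes
# CSI-style escape sequences: ESC, "[", optional digits/";"/"?",
# then a single letter. Other kinds of escape sequences are left alone.
strip_ansi() {
    esc=$(printf '\033')    # a literal ESC byte, built portably
    sed "s/${esc}\[[0-9;?]*[A-Za-z]//g"
}

# Usage (not run here):
#   strip_ansi < scriptsession.log > scriptsession.clean
```

Note that this deletes the escape sequences without applying their effect, so characters the user erased on screen (like the "thi" in the question) stay in the text.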

Or you can use the "more" command, which will interpret those characters and display exactly what you typed, received as output, etc, as if you scrolled back in your buffer.

# awk script
{
    gsub(/\033\[[CK]/, "")
    while (sub(/.\b/, "")) ;
    print
}

The script first removes any 'ESC [ C' and 'ESC [ K' sequences. It then repeatedly deletes each 'c BS' pair, where c stands for any character and BS is a backspace.
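To try the awk script from the shell, it can be wrapped in a small function and fed a sample line. This is only a sketch: the name clean_line is hypothetical, and it assumes an awk that treats \b in a regular expression as a backspace (gawk does).

```shell
# clean_line: hypothetical wrapper around the awk script above.
# Reads lines on stdin, removes ESC[C and ESC[K, then deletes each
# character that is immediately followed by a backspace.
clean_line() {
    awk '{
        gsub(/\033\[[CK]/, "")
        while (sub(/.\b/, "")) ;
        print
    }'
}

# Sample input: "thi", three backspaces, then "that".
printf 'thi\b\b\bthat\n' | clean_line
```

Each pass of the loop removes one character together with the backspace that erased it, so the sample line is reduced to "that".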
