
Retaining the filename with xargs for batch HTML-to-text conversion

I'm converting some HTML files to text using html2text and want to retain each file's name, so that charliesheenwinning.html becomes charliesheenwinning.txt or even charliesheenwinning.html.txt.

find ./ -not -regex ".*\(png\|jpg\|gif\)$" -print0 | xargs -0 -L10 {} max-process=0 html2text {} -o ../potistotallywinning/{}.txt

Of course, the last part (-o) is completely wrong. How do I reuse the filename beyond the first argument to html2text? I could use a for loop or find's -exec, but how can I do it with xargs?

Update

I ended up doing:

find path/to/dir -type f -not -regex ".*\(gif\|png\|jpg\|jpeg\|mov\|pdf\|txt\)$" -print0 | xargs -0 -L10 --max-procs=0 -I {} html2text -o {}.txt {}
mkdir dir/w/textfiles
cp -r path/to/dir dir/w/textfiles
find dir/w/textfiles -type f -not -regex ".*txt$" -print0 | xargs -0 -L10 --max-procs=0 -I {} rm {}

Not the best, but whatever. (In case you were wondering why it isn't just a simple -name '*html' in the find arguments: this was a wget of a MediaWiki site.)


You should try to use basename:

$ man basename
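
As a rough sketch of how basename could fit into the xargs pipeline: each filename is handed to a small sh -c script as a positional parameter, and basename strips the directory and the .html suffix. The temporary directories and sample filenames are made up for illustration, and cp stands in for html2text so the snippet runs without it installed; the commented line shows where the html2text call would go, using -o as in the question's update.

```shell
# Hypothetical demo directories and files (not from the original post).
src=$(mktemp -d)
out=$(mktemp -d)
touch "$src/charliesheenwinning.html" "$src/tiger blood.html"

# xargs appends each filename after "$out", so inside the script:
#   $0 = sh, $1 = output directory, $2 = the file found by find
find "$src" -type f -name '*.html' -print0 |
  xargs -0 -n 1 sh -c '
    # strip the leading directory and the .html suffix
    name=$(basename "$2" .html)
    # stand-in for: html2text -o "$1/$name.txt" "$2"
    cp "$2" "$1/$name.txt"
  ' sh "$out"

ls "$out"
```

Passing the filename as an argument, rather than substituting {} into the script text, should also keep names with spaces or quotes intact.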


I was facing the same problem. For the record, here's what I came up with to get substitution into xargs:

seq 100 | xargs -I % -n 1 -P 16 bash -c 'echo % `sed "s/1/X/" <<< %`'

It will print something like this:

10 X0
3 3
12 X2
4 4
11 X1
1 X
15 X5
0
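
A possible variation on the same idea, offered as a sketch rather than a correction: the value can be passed to bash as a positional parameter and rewritten with bash's ${parameter/pattern/string} expansion instead of sed. The range 12, the -P 4 setting, and the final sort -n (added only to make the parallel output order predictable) are arbitrary choices for this example.

```shell
# Each number becomes $1 inside the script; "_" fills the $0 slot.
# ${1/1/X} replaces the first "1" in $1 with "X", like sed "s/1/X/".
result=$(seq 12 | xargs -n 1 -P 4 bash -c 'echo "$1 ${1/1/X}"' _ | sort -n)
printf '%s\n' "$result"
```

Because the value never gets pasted into the script text, this form avoids combining -I with -n, which GNU xargs treats as mutually exclusive.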
