Retaining the filename with xargs for batch HTML-to-text conversion
I'm converting some HTML files to text using html2text and want to retain the file name, so that charliesheenwinning.html becomes charliesheenwinning.txt or even charliesheenwinning.html.txt.
find ./ -not -regex ".*\(png\|jpg\|gif\)$" -print0 | xargs -0 -L10 {} max-process=0 html2text {} -o ../potistotallywinning/{}.txt
Of course the last part, the -o argument, is wrong. How do I reuse the filename beyond the first argument to html2text? I could use a for loop in -exec, but how can I do it with xargs?
Update
I ended up doing:
find path/to/dir -type f -not -regex ".*\(gif\|png\|jpg\|jpeg\|mov\|pdf\|txt\)$" -print0 | xargs -0 -L10 --max-procs=0 -I {} html2text -o {}.txt {}
mkdir dir/w/textfiles
cp -r path/to/dir dir/w/textfiles
find dir/w/textfiles -type f -not -regex ".*txt$" -print0 | xargs -0 -L10 --max-procs=0 -I {} rm {}
Not the best, but whatever. (Just in case you were wondering why it isn't just a simple -name '*html' in the find argument: this was a wget of a MediaWiki.)
You should try to use basename:
$ man basename
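To make the basename suggestion concrete, here is a minimal sketch of how it might be combined with xargs. The scratch directory, the sample file names, and the in/out layout are made up for the demo; the html2text -o usage is taken from the question as-is. The idea is to let xargs hand each path to a small sh -c wrapper, which can then compute the output name before calling the converter.

```shell
#!/bin/sh
# Hypothetical demo layout in a scratch directory, so nothing real is touched.
demo=$(mktemp -d)
mkdir -p "$demo/in/sub" "$demo/out"
touch "$demo/in/charliesheenwinning.html" "$demo/in/sub/tigerblood.html" "$demo/in/logo.png"

# For each file, sh receives the output directory as $1 and the input path as $2.
# basename strips the directory part and the .html suffix, leaving the bare name.
# The leading "echo" only prints the command; drop it to actually run html2text.
planned=$(
  find "$demo/in" -type f -name '*.html' -print0 |
    xargs -0 -n 1 sh -c 'echo html2text -o "$1/$(basename "$2" .html).txt" "$2"' _ "$demo/out" |
    sort
)
printf '%s\n' "$planned"
```

Note that basename only removes the suffix when the file actually ends in .html, so for a wget mirror with extensionless pages the output would simply be the original name plus .txt. Files with the same name in different subdirectories would also collide in a single flat output directory.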
I was facing the same problem. For the record, here's what I came up with to get substitution into xargs:
seq 100 | xargs -I % -n 1 -P 16 bash -c 'echo % `sed "s/1/X/" <<< %`'
It will print something like this:
10 X0
3 3
12 X2
4 4
11 X1
1 X
15 X5
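A variation on the same trick, offered only as a sketch: instead of splicing the item into the script text with -I %, the item can be passed to the shell as a positional argument ($1). That way an input containing quotes, spaces, or $ is treated as data rather than as shell code. The seq input, the -P 4 parallelism, and the final sort -n (added only to make the order predictable) are choices for this demo; -P is available in GNU and BSD xargs but is not part of POSIX.

```shell
#!/bin/sh
# Each number arrives in the wrapper as $1; the trailing "_" fills $0.
# sed replaces only the first "1" in the value, as in the example above.
result=$(
  seq 15 |
    xargs -n 1 -P 4 sh -c 'echo "$1 $(echo "$1" | sed "s/1/X/")"' _ |
    sort -n
)
printf '%s\n' "$result"
```

Applied to the original question, the same pattern would let the wrapper compute the .txt name from "$1" before invoking html2text.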