
Adding TREC format tags to thousands of files

I need to add tags around the text of thousands of files in a directory. I tried doing it with cat, building each result in a temporary file:

for file in *
do
    cat ../gau > temp        # ../gau contains the opening tags I need to add to each file
    echo "$file" >> temp
    cat ../gau_ >> temp      # contains </DOCID>
    cat "$file" >> temp
    cat ../gau1 >> temp      # contains the closing tag </DOC>
    cat temp > "$file"
done

But this is very slow. Can you tell me a better, more efficient way to do it? Is it possible to do it in C? How could we open files in batches, process them and write them back? I suppose opening and writing the files is the bottleneck, so that might speed things up.

Is there a premade program (efficient and fast) for this job? We are short on time.
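For reference, the loop above builds each output file in this order: the contents of ../gau, the filename plus a newline (from echo), ../gau_, the original file, then ../gau1. A minimal Python sketch of that assembly follows. wrap_trec is a hypothetical helper, and the default tag strings are assumptions: the question says ../gau_ holds </DOCID> and ../gau1 holds </DOC>, but never shows what ../gau contains.

```python
def wrap_trec(docid: str, body: str,
              header: str = "<DOC>\n<DOCID>",  # assumed contents of ../gau
              middle: str = "</DOCID>\n",      # assumed contents of ../gau_
              footer: str = "</DOC>\n") -> str:  # assumed contents of ../gau1
    """Assemble one TREC-style document in the same order as the shell loop."""
    # echo "$file" writes the filename followed by a newline, so mirror that here.
    return header + docid + "\n" + middle + body + footer


print(wrap_trec("doc001.txt", "Some text.\n"))
```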


This is a quick Python script. Try it; it should run faster than your shell script:

import os

for dirname, dirnames, filenames in os.walk('/MY_DIRECTORY/'):
    for filename in filenames:
        with open(os.path.join(dirname, filename), "r+") as f:
            content = f.read()  # read everything in the file
            f.seek(0)  # rewind to the start
            f.write("Prepended text tags" + content)  # write the tags before the original text
            # no f.close() needed: the with block closes the file

I haven't tried it though.
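A sketch of the same in-place idea adapted to the question's layout, assuming the tag fragments are the files ../gau, ../gau_ and ../gau1 and that only the top level of the directory should be processed (like for file in *, unlike os.walk). prepend_tags_in_place is a hypothetical name. The fragments are read once by the caller instead of once per file, and files are opened in binary mode so nothing has to be decoded.

```python
import os


def prepend_tags_in_place(directory: str, header: bytes, middle: bytes, footer: bytes) -> int:
    """Wrap every regular file directly inside `directory`, rewriting it in place.

    header, middle and footer play the role of ../gau, ../gau_ and ../gau1.
    Returns the number of files rewritten.
    """
    count = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue  # skip subdirectories and other non-files
            # Binary mode: no decoding cost and no errors on non-UTF-8 input.
            with open(entry.path, "r+b") as f:
                body = f.read()
                f.seek(0)  # rewind and overwrite from the start
                f.write(header + os.fsencode(entry.name) + b"\n" + middle + body + footer)
                f.truncate()  # no-op here since the file only grows, but safe if the tags change
            count += 1
    return count
```

To use it from inside the data directory, read each fragment once with open(path, "rb").read() and call prepend_tags_in_place(".", gau, gau_, gau1). An in-place rewrite is not atomic: if the process dies mid-write, that file is left partially rewritten.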


Don't cat temp > $file; just mv temp $file. You don't need to copy the data back into the original file, only rename the temp file over it. That extra rewrite is certainly one of the causes of the bad performance.

for file in *; do
  { cat ../gau; echo "$file"; cat ../gau_ "$file" ../gau1; } > temp
  mv temp "$file"
done

You might want to choose more descriptive filenames than "gau", "gau_" and "gau1".
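The write-to-a-temporary-file-then-rename approach from this answer can be sketched in Python too. wrap_via_rename is a hypothetical helper, and header, middle and footer again stand in for the contents of ../gau, ../gau_ and ../gau1. tempfile.mkstemp creates the file in the same directory so that os.replace is a rename within one filesystem rather than a copy; shutil.copymode restores the original permission bits, because mkstemp creates files readable only by their owner.

```python
import os
import shutil
import tempfile


def wrap_via_rename(path: str, header: bytes, middle: bytes, footer: bytes) -> None:
    """Write the wrapped document beside the original, then rename it into place.

    Python counterpart of: { cat ../gau; echo "$file"; ...; } > temp; mv temp "$file"
    """
    directory, name = os.path.split(path)
    with open(path, "rb") as src:
        body = src.read()
    # Same directory as the original, so the final step is a rename, not a copy.
    fd, tmp_path = tempfile.mkstemp(dir=directory or ".")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(header + os.fsencode(name) + b"\n" + middle + body + footer)
        shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)  # rename over the original, like mv
    except BaseException:
        os.unlink(tmp_path)  # don't leave a stray temp file behind
        raise
```

Take a snapshot of the names before looping, for example for name in sorted(os.listdir(".")), so the temporary files the function creates are never picked up as inputs.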
