How do I loop over several files, keeping the base name for further processing?
I have multiple text files that need to be tokenised, POS-tagged and run through NER. I am using the C&C taggers and have run their tutorial, but I am wondering if there is a way to tag multiple files in one go rather than one by one.
At the moment I am tokenising the files as follows:
bin/tokkie --input working/tutorial/example.txt --quotes delete --output working/tutorial/example.tok
and then Part of Speech tagging:
bin/pos --input working/tutorial/example.tok --model models/pos --output working/tutorial/example.pos
and lastly Named Entity Recognition:
bin/ner --input working/tutorial/example.pos --model models/ner --output working/tutorial/example.ner
I am not sure how to write a loop that does this and keeps each output file name the same as the input, but with an extension that reflects the tagging stage. I was thinking of a bash script, or perhaps Perl to open the directory, but I am not sure how to write the C&C commands so that the script understands them.
At the moment I am doing it manually and it's pretty time consuming to say the least!
Untested, likely needs some directory mangling.
use strict;
use warnings;
# autodie's :all tag makes a failed system() die; it needs IPC::System::Simple installed.
use autodie qw(:all);
use File::Basename qw(basename);

for my $text_file (glob 'working/tutorial/*.txt') {
    # Strip the directory and the .txt suffix, e.g. "example".
    my $base_name = basename($text_file, '.txt');

    # Tokenise.
    system 'bin/tokkie',
        '--input'  => "working/tutorial/$base_name.txt",
        '--quotes' => 'delete',
        '--output' => "working/tutorial/$base_name.tok";

    # Part of Speech tagging.
    system 'bin/pos',
        '--input'  => "working/tutorial/$base_name.tok",
        '--model'  => 'models/pos',
        '--output' => "working/tutorial/$base_name.pos";

    # Named Entity Recognition.
    system 'bin/ner',
        '--input'  => "working/tutorial/$base_name.pos",
        '--model'  => 'models/ner',
        '--output' => "working/tutorial/$base_name.ner";
}
In Bash:
#!/bin/bash
dir='working/tutorial'
for file in "$dir"/*.txt
do
    # Strip the trailing .txt; the directory and base name are kept.
    noext=${file%.txt}
    bin/tokkie --input "$file" --quotes delete --output "$noext.tok"
    bin/pos --input "$noext.tok" --model models/pos --output "$noext.pos"
    bin/ner --input "$noext.pos" --model models/ner --output "$noext.ner"
done
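If you want to check the generated file names before running the taggers for real, a dry run along these lines may help. This is only a sketch: the tag_file function and the RUN variable are names I made up for illustration, not part of C&C. With RUN left unset, each command is printed instead of executed. The && chaining is an addition of mine so that a file whose tokenising fails is not passed on to the later stages.

```shell
#!/bin/bash
# RUN is a hypothetical switch: unset means "echo" (dry run, print only);
# call the script with RUN= (empty) to actually execute the taggers.
RUN=${RUN-echo}

# tag_file is a hypothetical helper that runs the three stages on one file.
tag_file() {
    local file="$1"
    local noext="${file%.txt}"   # strip the trailing .txt, keep directory and base name
    # && stops the chain as soon as one stage fails.
    $RUN bin/tokkie --input "$file" --quotes delete --output "$noext.tok" &&
    $RUN bin/pos --input "$noext.tok" --model models/pos --output "$noext.pos" &&
    $RUN bin/ner --input "$noext.pos" --model models/ner --output "$noext.ner"
}

dir='working/tutorial'
for file in "$dir"/*.txt
do
    # If nothing matches, the pattern is left as a literal string; skip it.
    [ -e "$file" ] || continue
    tag_file "$file" || echo "tagging failed for $file" >&2
done
```

Once the printed commands look right, running the same script as RUN= ./script.sh executes them instead of echoing them.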