Delete words that are not in a list of important words
I have a file of words, importantwords.txt (multiple lines of space-separated words). Example:
ALMOST
APPARENTLY
COULD
DEPEND
.
.
.
I also have text files 01news.txt, ..., 10news.txt (news articles as plain text). Example:
During the short period of time between acquisition and allocation, the executive directors of the Company are deemed to be interested in those shares. The Company announces that the following transactions took place in relation to the SIP on Tuesday.
Now I want to delete from 01news.txt, ..., 10news.txt all the words that are not in importantwords.txt.
How could I do that? I tried it with sed, but I am a newbie. Can you help, please?
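for file in *news.txt
do
    # First file (importantwords.txt): store every word as a key of impt.
    awk 'FNR==NR { for (i = 1; i <= NF; i++) impt[$i]; next }
    {
        # Remaining input: print only the words whose upper-case form is in impt.
        for (j = 1; j <= NF; j++) {
            if (toupper($j) in impt) {
                printf "%s ", $j
            }
        }
        print ""
    }' importantwords.txt "$file" > tmp && mv tmp "$file"
done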
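One thing to watch for: the lookup above compares whole whitespace-separated fields, so a word followed by punctuation (for example "shares." or "could,") will not match its entry in importantwords.txt. The following is only a sketch of one way around that, assuming you want punctuation ignored for the comparison but kept in the output. The function name filter_words is made up for this example and is not part of the original answer.

```shell
# Hypothetical helper: print only the words of text file "$2" that appear
# in word list "$1". Punctuation is stripped before the lookup only;
# the word is printed exactly as it appears in the text.
filter_words() {
    awk '
        # First file: remember every listed word in upper case.
        FNR == NR { for (i = 1; i <= NF; i++) impt[toupper($i)]; next }
        {
            out = ""
            for (j = 1; j <= NF; j++) {
                key = toupper($j)
                gsub(/[^A-Z0-9]/, "", key)   # "COULD," becomes "COULD"
                if (key in impt)
                    out = out (out == "" ? "" : " ") $j
            }
            print out                        # one output line per input line
        }
    ' "$1" "$2"
}

# Example use with the same file names as in the question (not run here):
#   for file in *news.txt; do
#       filter_words importantwords.txt "$file" > tmp && mv tmp "$file"
#   done
```

With the four example words from the question in the list, the line "It could, apparently, depend on it." would come out as "could, apparently, depend". Building the line in a variable also avoids the trailing space that printf "%s " leaves behind.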