How to count occurrences of a word in all the files of a directory?

2023-03-08 17:10 Q&A author:

I’m trying to count the occurrences of a particular word across a whole directory. Is this possible?

Say, for example, there is a directory with 100 files, any of which may contain the word “aaa”. How would I count the number of occurrences of “aaa” in all the files under that directory?

I tried something like:

 zegrep "xception" `find . -name '*auth*application*' | wc -l

But it’s not working.

grep -roh aaa . | wc -w

This greps recursively through all files and directories under the current directory for aaa and outputs only the matches (-o), not the entire line, with the file names suppressed (-h). Then wc just counts how many words were printed.
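To see what that pipeline actually counts, here is a small sketch on a throwaway directory. The file names and contents are made up for illustration. Since -o prints each match on its own line, wc -l gives the same number as wc -w here, and substrings such as the aaa inside xaaax are counted as well.

```shell
# Build a temporary directory tree (hypothetical file names) to try the command on.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
printf 'aaa bbb aaa\nccc\n' > "$demo_dir/one.txt"
printf 'xaaax aaa\n' > "$demo_dir/sub/two.txt"

# -r: recurse, -o: print every match on its own line, -h: omit file names.
# The pattern is not anchored, so "xaaax" contributes one match too.
# tr -d ' ' strips the padding that some wc implementations add.
total=$(grep -roh aaa "$demo_dir" | wc -l | tr -d ' ')
echo "$total"

rm -rf "$demo_dir"
```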

Another solution based on find and grep.

find . -type f -exec grep -o aaa {} \; | wc -l

Should correctly handle filenames with spaces in them.
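A quick way to check the claim about spaces is to try it on a scratch directory. This is only a sketch: the file names are invented, and the second command is a variant I am assuming you might prefer, where "{} +" batches many file names into one grep call instead of starting one grep per file.

```shell
demo_dir=$(mktemp -d)
printf 'aaa aaa\n' > "$demo_dir/my report.txt"
printf 'bbb aaa\n' > "$demo_dir/plain.txt"

# One grep process per file. find passes each name as a single argument,
# so the space in "my report.txt" needs no quoting tricks.
per_file=$(find "$demo_dir" -type f -exec grep -o aaa {} \; | wc -l | tr -d ' ')

# Batched variant: "{} +" hands many names to one grep; -h drops the
# file-name prefix grep would otherwise print when given several files.
batched=$(find "$demo_dir" -type f -exec grep -oh aaa {} + | wc -l | tr -d ' ')

echo "$per_file $batched"
rm -rf "$demo_dir"
```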

Use grep in its simplest way. Try grep --help for more info.

To get the count of a word in a particular file (note that -c counts matching lines, not individual occurrences):

grep -c <word> <file_name>

Example:

grep -c 'aaa' abc_report.csv

Output:

To get the per-file count of a word in the whole directory:

grep -c -R <word>

Example:

grep -c -R 'aaa'

Output:

abc_report.csv:445
lmn_report.csv:129
pqr_report.csv:445
my_folder/xyz_report.csv:408
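If a single total is wanted rather than one number per file, the per-file output can be summed. The sketch below assumes made-up .csv files in a temporary directory. It also shows that grep -c counts matching lines, so a line containing the word twice is counted once, whereas grep -o counts every occurrence.

```shell
demo_dir=$(mktemp -d)
printf 'aaa aaa\naaa\nbbb\n' > "$demo_dir/a.csv"
printf 'aaa\n' > "$demo_dir/b.csv"

# grep -c prints "path:count" with the number of matching LINES per file.
# Summing the last ":"-separated field gives the total over all files.
line_total=$(grep -c -R 'aaa' "$demo_dir" | awk -F: '{ sum += $NF } END { print sum + 0 }')

# grep -o prints each match separately, so this counts occurrences instead.
match_total=$(grep -o -R -h 'aaa' "$demo_dir" | wc -l | tr -d ' ')

echo "lines=$line_total matches=$match_total"
rm -rf "$demo_dir"
```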

Let's use AWK!

$ function wordfrequency() { awk 'BEGIN { FS="[^a-zA-Z]+" } { for (i=1; i<=NF; i++) { word = tolower($i); words[word]++ } } END { for (w in words) printf("%3d %s\n", words[w], w) } ' | sort -rn; }
$ cat your_file.txt | wordfrequency

This lists the frequency of each word occurring in the provided file. If you want to see the occurrences of your word, you can just do this:

$ cat your_file.txt | wordfrequency | grep yourword

To find occurrences of your word across all files in a directory (non-recursively), you can do this:

$ cat * | wordfrequency | grep yourword

To find occurrences of your word across all files in a directory (and its subdirectories), you can do this:

$ find . -type f -exec cat {} + | wordfrequency | grep yourword

Source: AWK-ward Ruby
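One thing to watch with the frequency-table approach: filtering the table with grep also keeps longer words that merely contain yours. If only one word matters, a variation that compares whole fields inside awk avoids that. This is a sketch, not part of the cited source. count_word is a hypothetical helper name, and the sample files are invented.

```shell
# count_word (hypothetical helper): count case-insensitive whole-word
# occurrences of its first argument in the text read from standard input.
count_word() {
    awk -v target="$1" '
        BEGIN { FS = "[^a-zA-Z]+" }   # same word splitting as wordfrequency
        { for (i = 1; i <= NF; i++) if (tolower($i) == target) n++ }
        END { print n + 0 }           # print 0 when nothing matched
    '
}

demo_dir=$(mktemp -d)
printf 'Your word, your WORD.\n' > "$demo_dir/a.txt"
printf 'yourself is not your\n' > "$demo_dir/b.txt"

# "yourself" is a different field value, so it is not counted.
your_count=$(cat "$demo_dir"/*.txt | count_word your)
echo "$your_count"

rm -rf "$demo_dir"
```

Pass the target word in lower case, since the fields are lower-cased before the comparison.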

find . -type f | xargs perl -pe 's/ /\n/g' | grep aaa | wc -l

Cat the files together and grep the output:

cat $(find /usr/share/doc/ -name '*.txt') | zegrep -ic '\<exception\>'

If you want 'exceptional' to match as well, don't use the '\<' and '\>' around the word.
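The difference the word boundaries make can be seen on a two-line sample. In this sketch I use grep -w, which in GNU and BSD grep has roughly the same effect as wrapping a plain word in '\<' and '\>', together with -o so that occurrences are counted rather than lines. The sample text is made up.

```shell
sample=$(mktemp)
printf 'An exception is exceptional.\nException again\n' > "$sample"

# No boundaries: the "exception" inside "exceptional" is matched too.
loose=$(grep -oi 'exception' "$sample" | wc -l | tr -d ' ')

# -w only accepts matches that form a whole word.
strict=$(grep -oiw 'exception' "$sample" | wc -l | tr -d ' ')

echo "loose=$loose strict=$strict"
rm -f "$sample"
```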

How about starting with:

cat * | sed 's/ /\n/g' | grep '^aaa$' | wc -l

as in the following transcript:

pax$ cat file1
this is a file number 1

pax$ cat file2
And this file is file number 2,
a slightly larger file

pax$ cat file[12] | sed 's/ /\n/g' | grep 'file$' | wc -l
4

The sed converts spaces to newlines (you may want to include other space characters as well such as tabs, with sed 's/[ \t]/\n/g'). The grep just gets those lines that have the desired word, then the wc counts those lines for you.

Now there may be edge cases where this script doesn't work but it should be okay for the vast majority of situations.
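One of those edge cases is whitespace other than single spaces. A possible variation, offered as a sketch rather than a drop-in replacement, swaps sed for tr, which turns every run of spaces, tabs, and newlines into one newline and does not depend on sed understanding \n in the replacement. The sample file is invented.

```shell
sample=$(mktemp)
printf 'file\tone file\n  another file here\n' > "$sample"

# tr -s maps every whitespace character to a newline and squeezes runs of
# them, leaving one word per line. grep -x keeps only lines that equal the
# pattern exactly, and -c counts those lines.
word_count=$(tr -s '[:space:]' '\n' < "$sample" | grep -cx 'file')
echo "$word_count"

rm -f "$sample"
```

Punctuation still sticks to words with this approach, so "file," would not be counted as "file".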

If you wanted a whole tree (not just a single directory level), you can use something like:

( find . -name '*.txt' -exec cat {} ';' ) | sed 's/ /\n/g' | grep '^aaa$' | wc -l

There's also a grep regex syntax for matching words only:

# based on Carlos Campderrós's solution posted in this thread
man grep | less -p '\<'
grep -roh '\<aaa\>' . | wc -l

For a different word matching regex syntax see:

man re_format | less -p '\[\[:<:\]\]'
