Bash script to find the frequency of every letter in a file

2023-01-21 11:59 问答作者：

I am trying to find out the frequency of appearance of every letter in the english alphabet in an input开发者_开发问答 file. How can I do this in a bash script?

My solution using grep, sort and uniq.

grep -o . file | sort | uniq -c

Ignore case:

grep -o . file | sort -f | uniq -ic

Just one awk command

awk -vFS="" '{for(i=1;i<=NF;i++)w[$i]++}END{for(i in w) print i,w[i]}' file

if you want case insensitive, add tolower()

awk -vFS="" '{for(i=1;i<=NF;i++)w[tolower($i)]++}END{for(i in w) print i,w[i]}' file

and if you want only characters,

awk -vFS="" '{for(i=1;i<=NF;i++){ if($i~/[a-zA-Z]/) { w[tolower($i)]++} } }END{for(i in w) print i,w[i]}' file

and if you want only digits, change /[a-zA-Z]/ to /[0-9]/

if you do not want to show unicode, do export LC_ALL=C

A solution with sed, sort and uniq:

sed 's/\(.\)/\1\n/g' file | sort | uniq -c

This counts all characters, not only letters. You can filter out with:

sed 's/\(.\)/\1\n/g' file | grep '[A-Za-z]' | sort | uniq -c

If you want to consider uppercase and lowercase as same, just add a translation:

sed 's/\(.\)/\1\n/g' file | tr '[:upper:]' '[:lower:]' | grep '[a-z]' | sort | uniq -c

Here is a suggestion:

while read -n 1 c
do
    echo "$c"
done < "$INPUT_FILE" | grep '[[:alpha:]]' | sort | uniq -c | sort -nr

Similar to mouviciel's answer above, but more generic for Bourne and Korn shells used on BSD systems, when you don't have GNU sed, which supports \n in a replacement, you can backslash escape a newline:

sed -e's/./&\
/g' file | sort | uniq -c | sort -nr

or to avoid the visual split on the screen, insert a literal newline by type CTRL+V CTRL+J

sed -e's/./&\^J/g' file | sort | uniq -c | sort -nr

继续阅读：bash frequency letters

Bash script to find the frequency of every letter in a file

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？