Linux command or script for counting duplicated lines in a text file?
If I have a text file with the following content:
red apple
green apple
green apple
orange
orange
orange
Is there a Linux command or script that I can use to get the following result?
1 red apple
2 green apple
3 orange
Send it through sort (to put adjacent items together), then uniq -c to give counts, i.e.:
sort filename | uniq -c
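To see what this pipeline does step by step, here is a rough Python sketch (an illustration, not how coreutils implements it): sorted() plays the role of sort, and itertools.groupby counts runs of identical adjacent lines the way uniq -c does. The function name sort_uniq_c and the sample data are hypothetical. Note that sorted() compares by Unicode code point, whereas sort follows your locale, so the order may differ for mixed-case or non-ASCII text.

```python
from itertools import groupby

def sort_uniq_c(lines):
    # sorted() stands in for `sort`: identical lines end up next to each other.
    # groupby() then yields one (line, run) pair per run of equal adjacent lines,
    # and len(list(run)) is the count that `uniq -c` would print.
    return [(len(list(run)), line) for line, run in groupby(sorted(lines))]

sample = ["red apple", "green apple", "green apple", "orange", "orange", "orange"]
for count, line in sort_uniq_c(sample):
    print(count, line)
```

With the sample data this prints 2 green apple, 3 orange and 1 red apple, one per line, without the leading padding that uniq -c adds to its counts.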
And to get that list in sorted order (by frequency), you can use:
sort filename | uniq -c | sort -nr
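For comparison, a rough Python sketch of the same "most frequent first" listing. It swaps in collections.Counter (a hash-based tally) instead of sorting before counting, then sorts the (count, line) pairs in descending order like sort -nr. The function name counts_by_frequency is hypothetical, and breaking ties by line text in descending order is a choice made here, not something guaranteed to match sort exactly in every locale.

```python
from collections import Counter

def counts_by_frequency(lines):
    # Counter tallies every distinct line in one pass; no pre-sorting needed.
    counts = Counter(lines)
    # Build (count, line) pairs and sort them descending, like `sort -nr`.
    # Ties are broken by the line text, also descending (a choice made here).
    return sorted(((n, line) for line, n in counts.items()), reverse=True)

sample = ["red apple", "green apple", "green apple", "orange", "orange", "orange"]
for count, line in counts_by_frequency(sample):
    print(count, line)
```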
Almost the same as borribles' answer, but if you add the -d param to uniq, it only shows duplicates.
sort filename | uniq -cd | sort -nr
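The effect of -d can be sketched in Python as a filter on the tally: keep only lines whose count is above 1, then order them by count, highest first. This is a hypothetical illustration (duplicates_only is not a real library function) that again uses collections.Counter in place of sort and uniq.

```python
from collections import Counter

def duplicates_only(lines):
    counts = Counter(lines)
    # Like `uniq -d`: keep only lines that occur more than once,
    # then order by count, highest first, like the trailing `sort -nr`.
    return sorted(((n, line) for line, n in counts.items() if n > 1), reverse=True)

sample = ["red apple", "green apple", "green apple", "orange", "orange", "orange"]
print(duplicates_only(sample))
```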
uniq -c file
And in case the file is not already sorted:
sort file | uniq -c
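The reason uniq -c on its own needs sorted input is that uniq compares each line only with the one directly before it. A rough Python sketch of that behavior (the name uniq_c is hypothetical) uses itertools.groupby, which likewise groups only adjacent equal items:

```python
from itertools import groupby

def uniq_c(lines):
    # Like plain `uniq -c`: only *adjacent* identical lines are merged into one count.
    return [(len(list(run)), line) for line, run in groupby(lines)]

unsorted = ["orange", "red apple", "orange"]
print(uniq_c(unsorted))          # "orange" is counted as two separate runs
print(uniq_c(sorted(unsorted)))  # sorting first brings the two "orange" lines together
```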
cat <filename> | sort | uniq -c
Can you live with an alphabetical, ordered list?
echo "red apple
green apple
green apple
orange
orange
orange" | sort -u
green apple
orange
red apple
or
sort -u FILE
-u stands for unique; sort detects duplicates by sorting, which places identical lines next to each other.
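That is how sort -u works; a hash-based approach gets uniqueness without sorting and only sorts to order the output. A rough Python equivalent (sort_u is a hypothetical name; sorted() uses code-point order rather than your locale's collation):

```python
def sort_u(lines):
    # set() drops duplicates by hashing, so no sorting is needed for uniqueness;
    # sorted() only puts the surviving lines in order.
    return sorted(set(lines))

sample = ["red apple", "green apple", "green apple", "orange", "orange", "orange"]
for line in sort_u(sample):
    print(line)
```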
A solution that preserves the order:
echo "red apple
green apple
green apple
orange
orange
orange" | { old=""; while IFS= read -r line; do if [[ $line != "$old" ]]; then printf '%s\n' "$line"; old=$line; fi; done; }
red apple
green apple
orange
And with a file:
old=""
while IFS= read -r line
do
  if [[ $line != "$old" ]]
  then
    printf '%s\n' "$line"
    old=$line
  fi
done < file
The last two only remove duplicates that immediately follow each other, which fits your example.
echo "red apple
green apple
lila banana
green apple
" ...
will print green apple twice, separated by the banana.
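For comparison, here is a rough Python sketch of the same adjacent-only rule as the while-read loop, plus a hypothetical variant (drop_all_duplicates, not part of the original answer) that also removes non-adjacent repeats while keeping the first occurrence of each line in its original position. Both function names are illustrative only.

```python
def drop_adjacent_duplicates(lines):
    # Same rule as the while-read loop: emit a line only if it differs from the previous one.
    result = []
    previous = None  # None never equals a str, so the first line is always kept
    for line in lines:
        if line != previous:
            result.append(line)
            previous = line
    return result

def drop_all_duplicates(lines):
    # Keep the first occurrence of every line; dicts preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(lines))

banana = ["red apple", "green apple", "lila banana", "green apple"]
print(drop_adjacent_duplicates(banana))  # both green apples survive
print(drop_all_duplicates(banana))       # the second green apple is removed
```

One small design difference: because previous starts as None, a leading empty line is kept here, whereas the shell loop's old="" would skip it.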
Try this:
cat myfile.txt | sort | uniq
To get a count of each word:
$> grep -Eo '\w+' fruits.txt | sort | uniq -c
3 apple
2 green
1 oragen
2 orange
1 red
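A rough Python sketch of the same word count, using re.findall with the \w+ pattern and collections.Counter. The name word_counts is hypothetical, and since the answer's fruits.txt is not shown, the sample text below is the question's file instead. Python's \w is Unicode-aware for str patterns, while grep's \w depends on your locale, so results can differ on non-ASCII text.

```python
import re
from collections import Counter

def word_counts(text):
    # Roughly `grep -Eo '\w+' file | sort | uniq -c`: find every run of word
    # characters, tally them, and list the words in sorted order.
    return sorted(Counter(re.findall(r"\w+", text)).items())

text = "red apple\ngreen apple\ngreen apple\norange\norange\norange\n"
for word, count in word_counts(text):
    print(count, word)
```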
To get a sorted count:
$> grep -Eo '\w+' fruits.txt | sort | uniq -c | sort -nk1
1 oragen
1 red
2 green
2 orange
3 apple
EDIT
Aha, the question is about whole lines, not words; my bad. Here's the command to use for full lines:
$> cat fruits.txt | sort | uniq -c | sort -nk1
1 oragen
1 red apple
2 green apple
2 orange
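The ascending variant can be sketched in Python the same way: tally with collections.Counter, then sort the (count, line) tuples so the least frequent lines come first. The name counts_ascending is hypothetical, and breaking ties by line text in code-point order is a choice made here rather than a guaranteed match for sort's locale-dependent tie handling.

```python
from collections import Counter

def counts_ascending(lines):
    # Roughly `sort | uniq -c | sort -nk1`: least frequent lines first.
    counts = Counter(lines)
    # Sorting (count, line) tuples orders by count, then by line text for ties.
    return sorted((n, line) for line, n in counts.items())

sample = ["red apple", "green apple", "green apple", "orange", "orange", "orange"]
for count, line in counts_ascending(sample):
    print(count, line)
```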
Here is a simple Python script using the Counter type. The benefit is that it does not require sorting the file; memory use grows with the number of distinct lines, not with the total number of lines:
import collections
import fileinput
import json
print(json.dumps(collections.Counter(map(str.strip, fileinput.input())), indent=2))
Output:
$ cat filename | python3 script.py
{
"red apple": 1,
"green apple": 2,
"orange": 3
}
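If you want the exact "count line" layout from the question instead of JSON, a small variation on the same Counter idea could format the rows directly. This is a sketch: format_counts is a hypothetical helper, and it strips only the trailing newline (unlike the script above, which uses str.strip), so lines that differ only in surrounding spaces are counted separately. Counter remembers insertion order (Python 3.7+), so rows come out in the order each line first appears.

```python
from collections import Counter

def format_counts(lines):
    # Strip only the trailing newline so leading/trailing spaces inside a line
    # still distinguish it.
    counts = Counter(line.rstrip("\n") for line in lines)
    # Counter keeps first-seen order, so rows come out in file order.
    return [f"{n} {line}" for line, n in counts.items()]

sample = ["red apple\n", "green apple\n", "green apple\n", "orange\n", "orange\n", "orange\n"]
print("\n".join(format_counts(sample)))
```

To read a real file, you could pass fileinput.input() as lines, as the script above does.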
Or you can use a one-liner:
$ cat filename | python3 -c 'print(__import__("json").dumps(__import__("collections").Counter(map(str.strip, __import__("fileinput").input())), indent=2))'