removing redundant data
I have a file which looks like this (3 columns and n number of rows)
chr8 101999980 102031975
chr8 101999980 102033533
chr8 101999980 102033533
chr8 101999980 102032736
chr8 101999980 102034799
chr8 101999980 102034799
chr8 101999980 102034397
chr8 101999980 102032736
and from this data I want to remove the redundant lines using a bash script; these exact repeated lines could be present anywhere in the dataset.
If maintaining the order is important:
awk '!c[$0]++' filename
This can be read as follows:
- uses each line as an array key (c[$0]),
- post-increments (++) the value to keep a count of such lines, and
- performs the default action only if the line has never been seen before (!).
n++ returns 0, or false, if n is unset, and the default action is {print}, so each distinct line is printed only the first time it appears.
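For illustration, assuming the sample data above is saved in a file called yourfile (a hypothetical name), the one-liner keeps the first occurrence of each line and preserves the original order:
$ awk '!c[$0]++' yourfile
chr8 101999980 102031975
chr8 101999980 102033533
chr8 101999980 102032736
chr8 101999980 102034799
chr8 101999980 102034397
Redirect to a different file (awk '!c[$0]++' yourfile > deduped) to save the result; do not redirect back onto yourfile itself, or the shell will truncate it before awk reads it.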
You can pipe your file through sort and uniq:
$ sort yourFile | uniq > newFile
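Assuming yourFile holds the sample data above, the sorted, deduplicated result looks like this; note that sort reorders the lines, so the output is in sorted order rather than input order:
$ sort yourFile | uniq
chr8 101999980 102031975
chr8 101999980 102032736
chr8 101999980 102033533
chr8 101999980 102034397
chr8 101999980 102034799
sort can also deduplicate on its own with sort -u yourFile > newFile, which gives the same result without the extra uniq process.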
If order does not matter:
sort yourfile | uniq > outputfile
uniq only removes adjacent identical rows, which is why you need sort first. In your sample file the duplicates are not all adjacent (chr8 101999980 102032736 appears on two non-adjacent lines), so sorting is required:
$ uniq yourfile | wc -l
6
$ sort yourfile | uniq | wc -l
5
Without sort, the non-adjacent duplicate survives and 6 lines remain; after sorting, all duplicates are adjacent and uniq leaves the 5 distinct lines. Since the question says repeated lines can appear anywhere in the dataset, sort the file before piping it to uniq.
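If you want to inspect the redundant lines rather than just drop them, uniq -d on sorted input prints one copy of each repeated line, e.g. for the sample data:
$ sort yourfile | uniq -d
chr8 101999980 102032736
chr8 101999980 102033533
chr8 101999980 102034799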