removing redundant data
I have a file that looks like this (3 columns and n rows):
chr8    101999980   102031975
chr8    101999980   102033533 
chr8    101999980   102033533 
chr8    101999980   102032736 
chr8    101999980   102034799 
chr8    101999980   102034799 
chr8    101999980   102034397
chr8    101999980   102032736
From this data I want to remove the redundant lines with a bash script; these exact duplicate lines could appear anywhere in the dataset.
If maintaining the order is important:
awk '!c[$0]++' filename
This can be read as follows:
- uses each line as an array key (c[$0]),
- post-increments (++) the value to keep a count of how often that line has been seen, and
- performs the default action only if the line has never been seen before (!): c[$0]++ evaluates to 0, i.e. false, while the counter is still unset
- the default action is {print}
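As a quick sanity check, here is the same one-liner on a tiny made-up input (not your file), showing that the first occurrence of each line is kept and the original order is preserved:

$ printf 'b\na\nb\nc\na\n' | awk '!c[$0]++'
b
a
c

To apply it to your data, redirect into a new file, e.g. awk '!c[$0]++' yourFile > newFile.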
 
You can pipe your file through sort and uniq:
$ sort yourFile | uniq > newFile
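If your sort supports it (POSIX sort does), the -u flag gives the same result without the pipe:

$ sort -u yourFile > newFile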
If order does not matter:
sort yourfile | uniq > outputfile
uniq only collapses adjacent identical rows; that is why you need sort first. In your sample file you don't strictly need sort, because the duplicates happen to sit right next to each other. If that is not guaranteed, sort the file first.
$ uniq yourfile | wc -l
6
$ sort yourfile | uniq | wc -l
6
With and without sort, both return 6 lines here, but you did not say that adjacent duplicates are the standard case, so sorting first is the safer choice.
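To see why adjacency matters to uniq, here is a tiny synthetic illustration (made-up data, not your file): the duplicate "a" lines are only collapsed once sort has placed them next to each other.

$ printf 'a\nb\na\n' | uniq | wc -l
3
$ printf 'a\nb\na\n' | sort | uniq | wc -l
2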
 