What is the most efficient way to subtract one list from another?
I am trying to subtract List_1 (50k lines) from List_2 (100k lines), i.e. remove every item in List_2 that is an exact match for an item in List_1. I am using grep, specifically:
grep -v -f List_1.csv List_2.csv > Magic_List.csv
I know this is not the most efficient way to do this, but what is? sed? awk? comm? SQL? How might I accomplish this in the most efficient way possible?
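This is one of the most efficient ways, IMHO, but you need to add -F so that the lines of List_1.csv are treated as fixed strings rather than regular expressions, and -x so that only exact whole-line matches are removed (without it, a pattern such as "foo" would also remove "foobar"):
grep -Fxvf List_1.csv List_2.csv > Magic_List.csv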
The most efficient way is to load one list into a trie or a hash table and then look up each item of the other list in it, so that each item costs a single lookup instead of a comparison against every line of the other list.
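As a rough sketch of the hash-table idea, awk's associative arrays are hash tables, so the lines of List_1 can be stored as keys and each line of List_2 checked against them. The function name subtract_lists is hypothetical, and the sketch assumes one item per line and that "exact match" means the whole line is identical.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of file $2 that do not appear,
# as exact whole lines, in file $1. Output keeps the order of $2.
subtract_lists() {
  # While reading the first file (FILENAME == ARGV[1]), store each line
  # as a key in the associative array "seen" (a hash table).
  # For the second file, print only lines that are not keys in "seen".
  # FILENAME == ARGV[1] is used instead of the common NR == FNR idiom
  # because NR == FNR would misfire if $1 were an empty file.
  awk 'FILENAME == ARGV[1] { seen[$0]; next } !($0 in seen)' "$1" "$2"
}
```

Usage: `subtract_lists List_1.csv List_2.csv > Magic_List.csv`. This reads each file once and keeps only List_1 in memory, which should be small for 50k lines; it also needs no sorting.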
You'd have to benchmark it to find the most efficient method. This is, however, exactly what comm is for, so I'd guess it would be pretty fast. Note that comm expects both files to be sorted in the same order:
comm -13 List_1.csv List_2.csv > Magic_List.csv
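Because comm only works on sorted input, unsorted lists can be sorted into temporary files first. The sketch below assumes `mktemp -d` is available (it is on GNU and BSD systems) and uses a hypothetical function name, subtract_sorted; setting LC_ALL=C makes sort and comm agree on the collation order.

```shell
#!/bin/sh
# Hypothetical helper: lines of $2 that are not in $1, using sort + comm.
# The function body runs in a subshell ( ... ) so tmpdir stays local.
subtract_sorted() (
  tmpdir=$(mktemp -d) || exit 1
  # Sort both inputs with the same (byte-wise) collation.
  LC_ALL=C sort "$1" > "$tmpdir/a"
  LC_ALL=C sort "$2" > "$tmpdir/b"
  # -1 hides lines only in $1, -3 hides lines in both,
  # leaving column 2: lines that occur only in $2.
  LC_ALL=C comm -13 "$tmpdir/a" "$tmpdir/b"
  rm -r "$tmpdir"
)
```

Usage: `subtract_sorted List_1.csv List_2.csv > Magic_List.csv`. Unlike the grep and awk approaches, the output comes out sorted rather than in the original order of List_2.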