Compare columns of unequal length for matches and differences
I will explain this in excel terms just so it will probably be clearer.
I have an excel sheet with 2 columns.
Column A has 69,000 rows. Column B has 49,000 rows.
Column A has our complete product list Column B has product list from Manufacturer 1
There are only certain/some rows which开发者_开发知识库 are common between 2 columns. and also, column B is not a subset of column A. Column A has extra entries and so does column B.
I need to know, which rows from Column B, are common with Column A which rows from Column B are not common with Column A
How would I acheive this? I am trying excel but the vlookup is taking forever and hanging up. Are there any other windows/office utilities which can help me? If its a macro, can you please give me scripts and suggestions to execute it?
I have access to linux machine aswell and I am familiar with those tools.
I can transfer this info to a text file/s, can I run some sed or awk script to print the output?
Any help would be great.
Use the MATCH()
function, it'll give you a number if there is a result, and #NA if there isn't.
I always work in Tables in Excel 2007 and newer, but will give both syntaxes:
Assuming you have a table, with things to compare in columns "column1" and "column2", checking whether the value in Column2 is present in Column1
=ISNUMBER(MATCH(Table1[[#This Row],[Column2]],[Column1],0))
Or if you have an old school array with data in columns A and B, looking for the value in B in A:
=ISNUMBER(MATCH(Sheet1!$B2,Sheet1!$A$2:$A$11,0))
What's going on - you are looking for an exact match (the 0 parameter), of the value on the current row in one column, in the other column and checking whether you get a numeric value (yes there is a match), or not (no match)
This is dead simple on Unix or Linux. Start by putting all of your company's products in one file, and all of the other company's products in another. I'll call them FileA and FileB.
Sort them.
$ sort -u FileA > temp_file
$ mv temp_file FileA
$ sort -u FileB > temp_file
$ mv temp_file FileB
The products that are common to both files . . .
$ comm -12 FileA FileB
The products that are unique to FileB . . .
$ comm -13 FileA FileB
I'm surprised the VLOOKUP is slow/unreliable, 70'000 rows is nothing. Are you sure you've got the formulas correct?
Seeing as you have Excel, you might have MS Access. Loading the columns into an Access table and resolving with JOINs would be very quick
Sort the two lists and use Approximate Vlookup (last argument True): this will be extremely fast (binary search) but you need to handle the NoMatch case: something like this in column C
=IF(B1=Vlookup(B1,$A$1:$A$69000,1,True),"Match","NoMatch")
and copy down
精彩评论