Problems comparing two arrays in Ruby
In Ruby, I am trying to compare a list of URLs with a previous list of URLs, and get only the new ones.
I put the old list in a text file with one URL per line. I am reading the text file into an array like so:
oldLines = File.open('logfile.txt', 'r').readlines
I have an array of new values called 'newLines', populated using the exact same method as the old list, which will probably have some overlap with the old list. I am trying to get only the values that don't match anything in the old list. Let's say newList.length = 100 and oldList.length = 95, and I know through visual inspection that something like 90 elements overlap between them. Things I have tried:
newList = newList - oldList
#(newList | oldList) returns 195
#(newList & oldList) returns 0
newList.delete_if { |x| oldList.include?(x) }
In both scenarios, nothing gets deleted from newList. I know I am missing something here. Thanks.
I did the following:
a.txt
http://yahoo.com
http://google.com
http://bing.com
b.txt
http://bing.com
http://yahoo.com
test.rb
# File.readlines closes the file for us; chomp strips each trailing "\n"
a = File.readlines('a.txt').map(&:chomp)
b = File.readlines('b.txt').map(&:chomp)
p a - b #=> ["http://google.com"]
Without the chomp it fails, because in a.txt I have http://yahoo.com\n while in b.txt I simply have http://yahoo.com, without the \n at the end.
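The effect can be reproduced without any files. Here is a minimal sketch that uses hand-written arrays in place of the readlines output; the variable names (with_newlines, without_newlines, unchanged, only_new) are made up for illustration:

```ruby
# Lines as readlines returns them: each one keeps its trailing "\n".
with_newlines = ["http://yahoo.com\n", "http://google.com\n", "http://bing.com\n"]
# Lines that have already lost their trailing "\n".
without_newlines = ["http://bing.com", "http://yahoo.com"]

# No string is equal across the two arrays, so nothing is removed.
unchanged = with_newlines - without_newlines

# Chomping both sides first makes the comparison behave as expected.
only_new = with_newlines.map(&:chomp) - without_newlines.map(&:chomp)

p unchanged
p only_new
```

Chomping both arrays, rather than only one of them, keeps the comparison symmetric no matter which file ends with a newline.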
All you need to do is invoke the subtract method for arrays, which you did.
['1', '2', '3', '4', '5'] - ['2', '3', '4']
# => ["1", "5"]
Not sure why this isn't working for you. Post some sample URL data from your two arrays; the problem probably lies there, and I'll update my answer accordingly.
I couldn't figure out what's wrong with your code, so I tried it in irb, and I still don't have an answer. What are newList and oldList? How are these data structures populated? Are they arrays?
irb(main):003:0> oldLines = File.open('/Users/pprakash/old', 'r').readlines
=> ["http://www.google.com\n", "http://yahoo.com\n", "http://slideshare.net\n"]
irb(main):004:0> newLines = File.open('/Users/pprakash/new', 'r').readlines
=> ["http://www.google.com\n", "http://yahoo.com\n", "http://slideshare.net\n", "http://great.com\n", "http://example.com\n"]
irb(main):005:0> x = newLines - oldLines
=> ["http://great.com\n", "http://example.com\n"]
irb(main):006:0> newLines
=> ["http://www.google.com\n", "http://yahoo.com\n", "http://slideshare.net\n", "http://great.com\n", "http://example.com\n"]
irb(main):007:0> oldLines
=> ["http://www.google.com\n", "http://yahoo.com\n", "http://slideshare.net\n"]
irb(main):008:0> newLines = newLines - oldLines
=> ["http://great.com\n", "http://example.com\n"]
irb(main):009:0> newLines
=> ["http://great.com\n", "http://example.com\n"]
irb(main):010:0>
I was not able to reproduce your problem. Here is what I did:
urls.txt
http://www.google.com
http://www.digg.com
http://www.slashdot.com
http://www.yahoo.com
urls2.txt
http://www.google.com
http://www.digg.com
http://www.slashdot.com
http://www.yahoo.com
http://www.dzone.com
http://www.digit.com
http://www.digitaldreams.com
Code
first = File.open('urls.txt', 'r').readlines
second = File.open('urls2.txt', 'r').readlines
disjoint = second - first
Update: While trying a few other things, I flubbed my code by chomping the '\n' off only some of the URLs. I then subtracted URLs with '\n' from URLs without '\n', and nothing was removed. So I imagine the reason you aren't seeing anything removed is some sort of error like that. Try printing out the two lists of URLs before you subtract them.
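One way to do that check, as a rough sketch: p and inspect show escape characters such as \n, which puts on its own hides. The sample arrays and variable names below are invented for illustration and are not the asker's data:

```ruby
# Hypothetical data: one list already chomped, one straight from readlines.
old_list = ["http://www.google.com", "http://www.digg.com"]
new_list = ["http://www.google.com\n", "http://www.dzone.com\n"]

# inspect (which p uses) makes the trailing "\n" visible.
puts old_list.first.inspect
puts new_list.first.inspect

# Stripping surrounding whitespace on both sides before subtracting
# avoids the mismatch.
fresh = new_list.map(&:strip) - old_list.map(&:strip)
p fresh
```

Using strip instead of chomp also removes stray spaces or "\r" characters, which may matter if the files come from different editors or platforms.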