Problems comparing two arrays in Ruby

In Ruby, I am trying to compare a list of URLs with a previous list of URLs and get only the new ones.

I put the old list in a text file with one URL per line. I am reading the text file into an array like so:

oldLines = File.open('logfile.txt', 'r').readlines

I have an array of new values called 'newLines', populated using the exact same method as the old list, which will probably have some overlap with the old list. I am trying to get only the values that don't match the old list. Let's say 'newList'.length = 100 and 'oldList'.length = 95, and I know through visual inspection that something like 90 elements overlap between them. Things I have tried:

newList = newList - oldList
#(newList | oldList) returns 195
#(newList & oldList) returns 0


newList.delete_if { |x| oldList.include?(x) }

In both scenarios, nothing gets deleted from newList. I know I am missing something here. Thanks.


I did the following:

a.txt

http://yahoo.com
http://google.com
http://bing.com

b.txt

http://bing.com
http://yahoo.com

test.rb

a = File.open('a.txt', 'r').readlines.map!(&:chomp)
b = File.open('b.txt', 'r').readlines.map!(&:chomp)
p a-b #=> ["http://google.com"]

Without the chomp it fails because a.txt contains http://yahoo.com\n, while b.txt simply contains http://yahoo.com without the \n at the end.
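As a minimal sketch of the same idea, the normalization can be pulled into one helper so both lists are always read the same way. The helper name read_urls and the temporary files are made up for illustration, and the chomp: true keyword assumes Ruby 2.4 or later:

```ruby
require 'tempfile'

# Hypothetical helper: read a file into an array of lines with the line
# endings and any surrounding whitespace removed, skipping blank lines.
def read_urls(path)
  File.readlines(path, chomp: true).map(&:strip).reject(&:empty?)
end

# Throwaway sample files. The old one has no newline after its last line,
# which is the situation that makes a raw readlines comparison fail.
old_file = Tempfile.new('old')
old_file.write("http://bing.com\nhttp://yahoo.com")
old_file.close

new_file = Tempfile.new('new')
new_file.write("http://yahoo.com\nhttp://google.com\nhttp://bing.com\n")
new_file.close

old_urls = read_urls(old_file.path)
new_urls = read_urls(new_file.path)

# Array#- keeps the elements of new_urls that do not appear in old_urls.
added = new_urls - old_urls
p added
```

File.readlines also opens and closes the file itself, so no handle is left open the way it is with File.open(...).readlines.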


All you need to do is invoke the subtract method for arrays, which you did.

['1', '2', '3', '4', '5'] - ['2', '3', '4']

# => ["1", "5"]

Not sure why this isn't working for you. Post some url sample data for your two arrays, problem probably lies there, and I'll update my answer accordingly.


I couldn't figure out what's wrong with your code, so I tried it in irb, and I still don't have an answer. What are newList and oldList? How are these data structures populated? Are they arrays?

irb(main):003:0> oldLines = File.open('/Users/pprakash/old', 'r').readlines
=> ["http://www.google.com\n", "http://yahoo.com\n", "http://slideshare.net\n"]
irb(main):004:0> newLines = File.open('/Users/pprakash/new', 'r').readlines
=> ["http://www.google.com\n", "http://yahoo.com\n", "http://slideshare.net\n", "http://great.com\n", "http://example.com\n"]
irb(main):005:0> x = newLines - oldLines
=> ["http://great.com\n", "http://example.com\n"]
irb(main):006:0> newLines
=> ["http://www.google.com\n", "http://yahoo.com\n", "http://slideshare.net\n", "http://great.com\n", "http://example.com\n"]
irb(main):007:0> oldLines
=> ["http://www.google.com\n", "http://yahoo.com\n", "http://slideshare.net\n"]
irb(main):008:0> newLines = newLines - oldLines
=> ["http://great.com\n", "http://example.com\n"]
irb(main):009:0> newLines
=> ["http://great.com\n", "http://example.com\n"]
irb(main):010:0> 


I was not able to reproduce your problem. Here is what I did:

urls.txt

http://www.google.com
http://www.digg.com
http://www.slashdot.com
http://www.yahoo.com

urls2.txt

http://www.google.com
http://www.digg.com
http://www.slashdot.com
http://www.yahoo.com
http://www.dzone.com
http://www.digit.com
http://www.digitaldreams.com

Code

first = File.open('urls.txt', 'r').readlines
second = File.open('urls2.txt', 'r').readlines

disjoint = second - first

Update: While trying a few other things, I flubbed my code by chomping the '\n' off some of the URLs, then subtracted URLs with '\n' from URLs without '\n', and nothing was removed. So I imagine the reason you aren't seeing anything removed is an error like that. Try printing out the two lists of URLs before you subtract them.
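One way to do that inspection, sketched with made-up sample arrays (old_list and new_list here are illustrative, not the asker's data): print with p rather than puts, since p shows escape sequences such as \n that puts renders invisibly.

```ruby
# Illustrative data: the old list kept its newlines, the new list did not.
old_list = ["http://yahoo.com\n", "http://bing.com\n"]
new_list = ["http://yahoo.com", "http://bing.com", "http://google.com"]

# p prints the inspected form, so a trailing "\n" is visible in the output.
old_list.each { |url| p url }
new_list.each { |url| p url }

# Collect every entry that carries leading or trailing whitespace.
suspicious = (old_list + new_list).select { |url| url != url.strip }
p suspicious

# Subtracting the raw arrays removes nothing, because no strings are equal.
raw_diff = new_list - old_list
p raw_diff

# After stripping both sides, only the new URL remains.
clean_diff = new_list.map(&:strip) - old_list.map(&:strip)
p clean_diff
```

If suspicious comes back non-empty for only one of the two lists, the mismatch is most likely in how that list was read or cleaned.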
