Ruby: Using a csv as a database

Posted 2023-02-11 13:05

I think I may not have done a good enough job explaining my question the first time.

I want to open a bunch of text and binary files and scan them with my regular expression. What I need from the CSV is the data in the second column, which holds the paths to all the files, to tell me which file to open.

Once a file is opened and scanned with the regexp, any match is displayed on the screen. I am sorry for the confusion, and thank you so much for everything!

Hello,

I am sorry for asking what is probably a simple question. I am new to Ruby and would appreciate any guidance.

I am trying to use a csv file as an index to leverage other actions.

In particular, I have a csv file that looks like:

 id, file, description, date
 1, /dir_a/file1, this is the first file, 02/10/11
 2, /dir_b/file2, this is the second file, 02/11/11

I want to open every file defined in the "file" column and search for a regular expression.

I know that you can define the headers in each column with the CSV class

require 'rubygems'
require 'csv'
require 'pp'

index = CSV.read("files.csv", :headers => true)

index.each do |row|
  puts row['file']
end

I know how to create a loop that opens every file and searches for a regexp in each file, and if there is a match, displays it:

regex = /[0-9A-Za-z]{8,8}-[0-9A-Za-z]{4,4}-[0-9A-Za-z]{4,4}-[0-9A-Za-z]{4,4}-[0-9A-Za-z]{12,12}/

Dir.glob('/home/Bob/**/*').each do |file|
  next unless File.file?(file)
  File.open(file, "rb") do |f|
    f.each_line do |line|
      unless (pattern = line.scan(regex)).empty?
        puts "#{pattern}"
      end
    end
  end
end

Is there a way I can use the contents of the second column in my CSV file as the variable to open each of the files, search for the regexp and, if there is a match in the file, output the row in the CSV that had the match to a new CSV?

Thank you in advance!!!!

At a quick glance it looks like you could reduce it to:

index.each do |row|
  File.foreach(row['file']) do |line|
    # line[regex] returns the matched text, or nil when there is no match
    match = line[regex]
    puts match if match
  end
end

A CSV file shouldn't be binary, so you can drop the 'rb' when opening the file. That lets us reduce the file read to File.foreach, which iterates over the file and yields it line by line.
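Putting the pieces together, here is a minimal sketch of the whole task: read the index, scan each listed file, and copy the rows that matched into a new CSV. The method name copy_matching_rows and its parameters are my own assumptions, not something from your code. It strips whitespace from headers and fields because your sample has a space after each comma, which would otherwise make row['file'] come back nil, and it reads the target files in binary mode since you mentioned some of them may be binary.

```ruby
require 'csv'

# Hypothetical helper: copies every row of index_path whose 'file' column
# points at a file containing a match for regex into out_path.
def copy_matching_rows(index_path, out_path, regex)
  # Strip surrounding whitespace; leave nil fields untouched.
  strip = ->(value) { value.is_a?(String) ? value.strip : value }
  index = CSV.read(index_path, headers: true,
                   header_converters: strip, converters: strip)

  CSV.open(out_path, 'w') do |out|
    out << index.headers
    index.each do |row|
      path = row['file']
      next unless path && File.file?(path)

      # Binary mode avoids encoding errors on bytes that are not valid text.
      matched = File.foreach(path, mode: 'rb').any? { |line| line.match?(regex) }
      out << row.fields if matched
    end
  end
end
```

With the regex from your question, calling copy_matching_rows('files.csv', 'matches.csv', regex) would write the header plus each matching row to matches.csv (a file name picked only for illustration). Rows whose path is missing or does not point at a regular file are skipped silently; you may prefer to log them.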

Based on your sample code, the depth of the files in your directory hierarchy is an open question. It's not really clear what's going on there.

EDIT:

You wrote: "it tells me that 'regex' is an undefined variable"

In your question you defined it as:

regex = /[0-9A-Za-z]{8,8}-[0-9A-Za-z]{4,4}-[0-9A-Za-z]{4,4}-[0-9A-Za-z]{4,4}-[0-9A-Za-z]{12,12}/

You also wrote: "the files I open to do the search on may be a binary."

According to the CSV spec (RFC 4180):

Common usage of CSV is US-ASCII, but other character sets defined by IANA for the "text" tree may be used in conjunction with the "charset" parameter.

It goes on to say:

Security considerations:

CSV files contain passive text data that should not pose any risks. However, it is possible in theory that malicious binary data may be included in order to exploit potential buffer overruns in the program processing CSV data. Additionally, private data may be shared via this format (which of course applies to any text data).

So, if you're seeing binary data, you shouldn't be, because it's not CSV according to the spec. Unfortunately the spec has been abused over the years, so it's possible you are seeing binary data in the file. If so, continue to use 'rb' as the file mode, but do it cautiously.
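If it is the files being searched, rather than the index itself, that may be binary, one option is to read them as raw bytes so the regexp never trips over an invalid byte sequence. A rough sketch, where scan_binary is a hypothetical helper name of my own:

```ruby
# Hypothetical helper: returns every match of regex in the file at path,
# reading the raw bytes instead of decoding them as text.
def scan_binary(path, regex)
  # File.binread returns an ASCII-8BIT (binary) string, so bytes that are
  # not valid UTF-8 do not raise when the regexp is applied.
  File.binread(path).scan(regex)
end
```

This loads the whole file into memory, which is fine for small files; for large ones, iterating with File.foreach(path, mode: 'rb') is probably the better trade-off.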

An important question to ask is whether you can read the file using Ruby's CSV library. If you can, a lot of this discussion is moot.
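One quick way to answer that is to let the CSV library try to parse every row and see whether it complains. This is only a sketch; csv_readable? is a name I made up for illustration:

```ruby
require 'csv'

# Hypothetical helper: true if Ruby's CSV library can parse the whole file.
def csv_readable?(path)
  CSV.foreach(path, headers: true) { |_row| } # parse every row, discard it
  true
rescue CSV::MalformedCSVError, ArgumentError
  # ArgumentError is what Ruby's string methods raise for invalid byte
  # sequences; it is rescued here as a precaution.
  false
end
```

If this returns true for files.csv, the index itself is fine and only the files it points to need the binary treatment.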
