Better way to parse "Description (tag)" to "Description, tag"

2023-03-12 00:08

I have a text file with thousands of lines like this, which are category descriptions with the keyword enclosed in parentheses:

Chemicals (chem) 
Electrical (elec)

I need to convert these lines to comma separated values like so:

Chemicals, chem
Electrical, elec

What I am using is this:

lines = line.gsub!('(', ',').gsub!(')', '').split(',')

I would like to know if there is a better way to do this.

For posterity, this is the full code (based on the answers):

require 'rubygems'
require 'csv'

csvfile = CSV.open('output.csv', 'w')
File.open('c:/categories.txt') do |f|
  f.readlines.each do |line|
    (desc, cat) = line.split('(')
    desc.strip!
    cat.strip!
    csvfile << [desc, cat[0,cat.length-1]]
  end
end

csvfile.close  # flush buffered rows and close the output file

Try something like this:

line.sub!(/ \((\w+)\)$/, ', \1')

The \1 is replaced with the first capture group of the regexp (in this case, always the category keyword), so " (chem)" becomes ", chem".
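To make the backreference behaviour concrete, here is a minimal sketch using the non-mutating `sub`. The helper name `to_csv_line` is hypothetical, not part of the answer. It also shows one thing worth keeping in mind: `sub!` returns nil when nothing was replaced, while `sub` returns the string unchanged.

```ruby
# Hypothetical helper: rewrite "Description (tag)" as "Description, tag".
def to_csv_line(line)
  # \1 in the replacement refers to the first capture group, (\w+).
  line.sub(/ \((\w+)\)$/, ', \1')
end

puts to_csv_line("Chemicals (chem)")   # prints: Chemicals, chem
puts to_csv_line("No tag here")        # prints: No tag here (no match, string unchanged)

# sub! mutates the receiver and returns nil when nothing was replaced.
p "No tag here".dup.sub!(/ \((\w+)\)$/, ', \1')   # prints: nil
```

The nil return is why chaining bang methods, as in `gsub!(...).gsub!(...)`, can fail on lines that lack one of the characters.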

Let's create an example using a text file:

lines = []
File.open('categories.txt', 'r') do |file|
  while line = file.gets 
    lines << line.sub(/ \((\w+)\)$/, ', \1')
  end
end

Based on the question updates I can propose this:

require 'csv'

csv_file = CSV.open('output.csv', 'w')

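File.open('c:/categories.txt') do |f|
  # scan returns an array of match arrays, e.g. [["Chemicals", "chem"]];
  # flatten turns it into a single row of fields for the CSV writer
  f.each_line {|c| csv_file << c.scan(/^(.+) \((\w+)\)$/).flatten}
end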

csv_file.close

Starting with Ruby 1.9, you can do it in one method call:

str = "Chemicals (chem)\n"
mapping = { ' (' => ', ',
            ')'  => ''}

str.gsub(/ \(|\)/, mapping)  #=> "Chemicals, chem\n"

In Ruby, a cleaner, more efficient way to do it would be:

# split(' ', 2) returns a two-element array: all characters up to the first
# space, and all characters after it. Multiple assignment then stores each
# element in its own local variable.
description, tag = line.split(' ', 2)
tag = tag[1, tag.length - 2]           # extract the inside characters (not first or last) of the string
new_line = description << ", " << tag  # rejoin the parts into a new string

This is intended to be computationally faster (if you have a lot of rows) because it uses direct string operations instead of regular expressions. It assumes a one-word description and a line with no trailing newline.
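If descriptions can contain spaces (for example "Dyes & Intermediates (dyes)"), splitting on the first space no longer works. A minimal regex-free sketch, assuming the tag is always the last parenthesised part of the line, uses `String#rpartition` instead. The helper name `split_description` is hypothetical.

```ruby
# Hypothetical helper: split on the LAST " (" so multi-word descriptions survive.
# If the line contains no " (", description comes back empty and tag holds the whole line.
def split_description(line)
  description, _separator, tag = line.chomp.rpartition(' (')
  # chomp(')') removes one trailing ")" if present.
  [description, tag.chomp(')')]
end

p split_description("Chemicals (chem)\n")             # prints: ["Chemicals", "chem"]
p split_description("Dyes & Intermediates (dyes)\n")  # prints: ["Dyes & Intermediates", "dyes"]
```

The call to `chomp` strips the line ending first, so the same helper works on lines read straight from a file.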

No need to manipulate the string. Just grab the data and output it to the CSV file. Assuming that you have something like this in the data:

Chemicals (chem)
Electrical (elec)
Dyes & Intermediates (dyes)

This should work:

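require 'csv'

CSV.open('output.csv', 'w') do |csvfile|
  File.foreach('categories.txt') do |line|
    # match with a block returns the block's value, or nil if the line does not match
    row = line.match(/^(.+)\s\((.+)\)$/) { |m| [m[1], m[2]] }
    csvfile << row if row  # skip blank or malformed lines
  end
end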
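One practical reason to hand the fields to the CSV library rather than building the line by string substitution is quoting: a description that itself contains a comma has to be quoted to stay a single field. A small sketch with made-up sample rows (the second row is hypothetical data, not from the question):

```ruby
require 'csv'

# Hypothetical sample rows; the second description contains a comma.
rows = [
  ["Chemicals", "chem"],
  ["Paints, Coatings", "paint"]
]

# generate_line returns one CSV-formatted line, including the line ending.
rows.each { |row| print CSV.generate_line(row) }
# Output:
# Chemicals,chem
# "Paints, Coatings",paint
```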

Benchmarks relevant to discussion in @hundredwatt's answer:

require 'benchmark'

line = "Chemicals (chem)"

# @hundredwatt
puts Benchmark.measure {
  100000.times do
    description, tag = line.split(' ', 2)
    tag = tag[1, (tag.length - 1) - 1]
    new_line = description << ", " << tag
  end
} # => 0.18

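# NeX
# Note: sub! modifies line in place, so only the first iteration performs a
# substitution; later iterations (and the gsub benchmark below) see "Chemicals, chem".
puts Benchmark.measure {
  100000.times do
    line.sub!(/ \((\w+)\)$/, ', \1')
  end
} # => 0.08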

# steenslag
mapping = { ' (' => ', ',
  ')'  => ''}
puts Benchmark.measure {
  100000.times do
    line.gsub(/ \(|\)/, mapping)
  end
} # => 0.08
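Because `sub!` mutates its receiver, a like-for-like comparison is easier to set up with non-mutating calls on a frozen input string. The following is only a sketch: the helper names `via_split`, `via_sub`, and `via_gsub` are hypothetical, and no timings are quoted because they depend on the machine and Ruby version.

```ruby
require 'benchmark'

# Each helper returns a new string and leaves its argument untouched.
def via_split(line)
  description, tag = line.split(' ', 2)
  "#{description}, #{tag[1, tag.length - 2]}"
end

def via_sub(line)
  line.sub(/ \((\w+)\)$/, ', \1')
end

def via_gsub(line, mapping)
  line.gsub(/ \(|\)/, mapping)
end

line = "Chemicals (chem)".freeze   # frozen, so any accidental mutation raises an error
mapping = { ' (' => ', ', ')' => '' }
n = 100_000

Benchmark.bm(8) do |x|
  x.report('split') { n.times { via_split(line) } }
  x.report('sub')   { n.times { via_sub(line) } }
  x.report('gsub')  { n.times { via_gsub(line, mapping) } }
end
```

All three helpers produce "Chemicals, chem" for this input, so each iteration does the same amount of work from the first to the last.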

I know nothing about Ruby, but it is easy in PHP:

 preg_match('~(.+) \((.+)\)~', 'Chemicals (chem)', $m);

$result = $m[1] . ', ' . $m[2];
