
Need to format a CSV file differently using regex

I have a CSV file that I need to revise; here is a snippet from it:

1.1.1,"1, 8, 11, 13"
1.1.2,"10, 11, 12"
1.1.3,"2, 3, 10, 11, 13"

I want to format it like this:

1.1.1,1
1.1.1,8
1.1.1,11
1.1.1,13
1.1.2,10
1.1.2,11
1.1.2,12
1.1.3,2
1.1.3,3
1.1.3,10
1.1.3,11
1.1.3,13

I am using the search-and-replace function of a text editor, with the regular expression option enabled.


I can't think of a way to match a varying number of values inside the quoted part, as your data has. If there aren't too many variations, though, you could start with something like the pattern below and extend it a few times to catch every case: each time, add ,\s*(\d+) to the Find part and \n\1,\N to the Replace part, where \N is the new group number (\5, then \6, and so on).

Find:

([\d\.]+),"(\d+),\s*(\d+),\s*(\d+)"

Replace:

\1,\2\n\1,\3\n\1,\4

This works in Notepad++ for the second line of your example.
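If the editor lets you run Replace All repeatedly, a variable number of values can also be handled by peeling off one value per pass instead of writing one pattern per count. The sketch below is an assumption: it uses Python's re module to mimic the editor (not Notepad++'s own engine), repeating the first substitution until nothing matches and then unwrapping the last quoted value.

```python
import re

# Sample input from the question.
text = '''1.1.1,"1, 8, 11, 13"
1.1.2,"10, 11, 12"
1.1.3,"2, 3, 10, 11, 13"
'''

# Pass 1 (repeat until no matches): move the first quoted value onto its own line.
peel = re.compile(r'^([\d.]+),"(\d+),\s*', re.MULTILINE)
# Pass 2 (run once): unwrap the single value still left inside the quotes.
last = re.compile(r'^([\d.]+),"(\d+)"', re.MULTILINE)


def expand(text):
    while True:
        # subn returns (new_text, number_of_replacements)
        text, n = peel.subn(r'\1,\2\n\1,"', text)
        if n == 0:
            break
    return last.sub(r'\1,\2', text)


print(expand(text), end='')
```

In the editor, the equivalent would be Find ^([\d.]+),"(\d+),\s* with Replace \1,\2\n\1," (clicking Replace All until it finds nothing), followed once by Find ^([\d.]+),"(\d+)" with Replace \1,\2. On a file with Windows line endings the replacement may need \r\n instead of \n.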


Regex will only work on the file if you are reading it into the program and operating on it in memory. Why not just write a simple converter that translates the file into what you want?

In Python:

with open("your.csv") as file, open("your_converted.csv", "w") as out:
    for line in file:
        if not line.strip():
            continue  # skip blank lines
        fields = line.strip().split(",")  # split on the commas
        val1 = fields[0]
        # every field after the first is one value from the quoted list
        for value in fields[1:]:
            value = value.replace('"', "").strip()  # drop the quotes and spaces
            out.write(val1 + "," + value + "\n")

The with statement closes both files automatically when the block ends.


I don't see the need for a regex here; regexes are not always the solution to a problem.

You can do it even without a CSV parser, since your file is very simple.

Just put this in a test.py file:

#!/usr/bin/env python3
import sys


def main():
    for line in sys.stdin:
        if line.strip():
            # split only on the first comma: the id, then the quoted list
            fields = line.split(',', 1)
            for s in fields[1].split(','):
                print(','.join([fields[0], s.replace('"', '').strip()]))


if __name__ == '__main__':
    main()

Then simply do:

$ cat yourfile.csv | ./test.py > newfile.csv

PS: you may need to run chmod +x test.py before executing it.
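For comparison, the same conversion can be done with Python's standard csv module, which takes care of the quoted field so no manual quote stripping is needed. This is a minimal sketch assuming Python 3; the convert function name is hypothetical, and the sample input is the snippet from the question.

```python
import csv
import io
import sys


def convert(src, dst):
    """Write one id,value row to dst for each value in src's quoted lists."""
    reader = csv.reader(src)  # yields e.g. ['1.1.1', '1, 8, 11, 13']
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        if len(row) < 2:  # skip blank or malformed lines
            continue
        key, values = row[0], row[1]
        for value in values.split(","):
            writer.writerow([key, value.strip()])


# Example run on the snippet from the question, using an in-memory file.
sample = io.StringIO('1.1.1,"1, 8, 11, 13"\n1.1.2,"10, 11, 12"\n')
convert(sample, sys.stdout)
```

To process a real file, call convert(sys.stdin, sys.stdout) and redirect input and output as in the shell command above; when opening files directly, the csv documentation recommends passing newline='' to open().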
