Need to reformat a CSV file using regex
I have a CSV file that I need to revise. Here is a snippet from it:
1.1.1,"1, 8, 11, 13"
1.1.2,"10, 11, 12"
1.1.3,"2, 3, 10, 11, 13"
I want to format it like this:
1.1.1,1
1.1.1,8
1.1.1,11
1.1.1,13
1.1.2,10
1.1.2,11
1.1.2,12
1.1.3,2
1.1.3,3
1.1.3,10
1.1.3,11
1.1.3,13
I am using the search replace function within a text editor, w/ the regular expression option enabled.
I can't think of a single pattern that matches when the number of values in the quoted part varies, as it does in your data. If there aren't too many variations, though, you could use something like the pattern below and extend it for each longer row by adding ,\s*(\d+)
to the Find part (before the closing quote) and \n\1,\5
to the Replace part (incrementing the last group number each time), repeating this until every row length is covered.
Find:
([\d\.]+),"(\d+),\s*(\d+),\s*(\d+)"
Replace:
\1,\2\n\1,\3\n\1,\4
This works in Notepad++ for the second line of your example.
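If a scripting language is an option, the varying number of values can be handled with one pattern plus a replacement function, which a plain editor search-and-replace cannot do. The following is only a minimal sketch in Python, assuming each row is a dotted key followed by one quoted list of integers; the names ROW, expand and expand_rows are made up for illustration.

```python
import re

# Assumed row shape: a dotted key, a comma, then a quoted list of integers.
ROW = re.compile(r'^([\d.]+),"([\d,\s]+)"$', re.MULTILINE)

def expand(match):
    """Turn one matched row into one 'key,value' line per value."""
    key = match.group(1)
    values = [v.strip() for v in match.group(2).split(",")]
    return "\n".join(f"{key},{v}" for v in values)

def expand_rows(text):
    """Apply the expansion to every matching row in the text."""
    return ROW.sub(expand, text)

sample = '1.1.1,"1, 8, 11, 13"\n1.1.2,"10, 11, 12"\n'
print(expand_rows(sample), end="")
```

Rows that do not match the assumed shape are left unchanged, so it is worth checking the output for any remaining quotes.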
Regex will only work on the file if you are reading it into the program and operating on it in memory. Why not just write a simple converter that translates the file into what you want?
In Python (the same idea as pseudo-code, made runnable):
with open("your.csv") as infile, open("your_converted.csv", "w") as out:
    for line in infile:
        parts = line.strip().split(",")  # split on the commas
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        key = parts[0]
        # every remaining item is one value; drop the quotes and spaces
        for value in parts[1:]:
            value = value.replace('"', "").strip()
            out.write(key + "," + value + "\n")
The with statement closes both files once the loop finishes.
I don't see the need for regex here: regexes are not always the solution to a problem.
You can do it even without a csv parser, since your file is very simple.
Just put this in a test.py file:
#!/usr/bin/env python
import sys

def main():
    for line in sys.stdin:
        if line.strip():
            # split off the key; the rest is the quoted list of values
            fields = line.split(',', 1)
            for s in fields[1].split(','):
                print(','.join([fields[0], s.replace('"', '').strip()]))

if __name__ == '__main__':
    main()
Then simply do:
$ cat yourfile.csv | ./test.py > newfile.csv
PS: you may need to chmod +x the Python file before executing.
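If the file ever becomes less regular (for example, quoted keys or escaped quotes), the standard csv module can take over the quote handling. This is a hedged sketch rather than a drop-in replacement: the function name explode is made up for illustration, and it assumes the second column always holds the comma-separated list.

```python
import csv
import io

def explode(infile, outfile):
    """Write one 'key,value' row per value found in the second column."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or single-column lines
        key, values = row[0], row[1]
        for value in values.split(","):
            writer.writerow([key, value.strip()])

# In-memory demonstration using a row from the question.
src = io.StringIO('1.1.3,"2, 3, 10, 11, 13"\n')
dst = io.StringIO()
explode(src, dst)
print(dst.getvalue(), end="")
```

With real files, open both with newline="" as the csv documentation recommends and pass the file objects to explode.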