Regular Expression to break row with comma separated values into distinct rows
I have a file with many rows. Each row has a column which may contain comma separated values. I need each row to be distinct (i.e. no comma separated values).
Here is an example row:
AB AB10,AB11,AB12,AB15,AB16,AB21,AB22,AB23,AB24,AB25,AB99 ABERDEEN Aberdeenshire
The columns are tab separated (Postcode area, Postcode districts, Post town, Former postal county); only the Postcode districts column holds comma separated values.
So the above row would get turned into:
AB AB10 ABERDEEN Aberdeenshire
AB AB11 ABERDEEN Aberdeenshire
AB AB12 ABERDEEN Aberdeenshire
...
...
I tried the following but it didn't work...
(.+)\t(([0-9A-Z]+),)+\t(.+)\t(.+)
I agree that regexes are not the best tool for this, but the following should work if that's all you have available to you. Run the find-and-replace repeatedly until there are no more matches; each pass splits one value off the front of the comma separated list.
Edit
Updated with the OP's final solution from the comments.
Find: (.+)\t([^,\s]+),([^\t]+)\t(.+)
Replace: \1\t\2\t\4\r\1\t\3\t\4
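If you want to script the "repeat until there are no more matches" step rather than click through it in an editor, here is a minimal Python sketch of the same find/replace. The function name `expand_rows` is made up for illustration, and I've used `\n` instead of `\r` as the row separator, which is an assumption about your line endings.

```python
import re

# Same pattern as the Find expression above.
PATTERN = re.compile(r"(.+)\t([^,\s]+),([^\t]+)\t(.+)")
# Same as the Replace expression, but with \n instead of \r (assumption).
REPLACEMENT = r"\1\t\2\t\4\n\1\t\3\t\4"


def expand_rows(text):
    """Apply the substitution repeatedly until nothing matches any more."""
    count = 1
    while count:
        # subn returns the new text and the number of replacements made.
        text, count = PATTERN.subn(REPLACEMENT, text)
    return text


row = "AB\tAB10,AB11,AB12\tABERDEEN\tAberdeenshire"
print(expand_rows(row))
```

Each pass peels the first district off every row that still contains a comma, so a row with N districts needs N-1 passes.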
I agree with stakx that this doesn't sound like a good place for regexes.
I would instead write a small program that reads each line, splits the line into columns, splits each relevant column into a list of values, and then iterates over all combinations of those, outputting a line each time.
Assuming it's only that one column which can have multiple tokens, it would basically look like this:
while not InputFile.EndOfFile:
    line = InputFile.readline();
    columns = line.split('\t'); //Assuming 1-based array, so indexes 1-4
    col2values = columns[2].split(',');
    for each value in col2values:
        OutputFile.WriteLine(columns[1]+'\t'+value+'\t'+columns[3]+'\t'+columns[4]);
If multiple columns can have multiple values, simply put another loop inside the for each.
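For what it's worth, here is roughly how that could look as runnable Python. This is only a sketch: the function name `expand_line` and the `multi_value_columns` parameter are my own invention, columns are 0-based here (unlike the pseudocode), and I use `itertools.product` in place of hand-written nested loops so the same code covers the "multiple columns with multiple values" case.

```python
import itertools


def expand_line(line, multi_value_columns=(1,)):
    """Return one output row per combination of comma separated values.

    multi_value_columns holds the 0-based indexes of the columns that may
    contain comma separated values (hypothetical parameter, default: column 1).
    """
    columns = line.rstrip("\n").split("\t")
    # Every column becomes a list of alternatives; plain columns have just one.
    choices = [
        col.split(",") if i in multi_value_columns else [col]
        for i, col in enumerate(columns)
    ]
    # product() yields every combination, like nested for-each loops would.
    return ["\t".join(combo) for combo in itertools.product(*choices)]


for row in expand_line("AB\tAB10,AB11,AB12\tABERDEEN\tAberdeenshire"):
    print(row)
```

To process a whole file, you would call `expand_line` on each line read from the input and write the returned rows to the output.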