Need a shell script to change the comma delimiter to a pipe delimiter
My input looks like "$130.00","$2,200.00","$1,230.63"
and so on
My question is: how can I change the comma delimiter to a | delimiter without losing the commas inside the actual values?
Just to clarify, this input is in a CSV file with 40 columns and 9500 rows.
I want my output to look like
"$130.00"|"$2,200.00"|"$1,230.63"
To do this reliably, you have to use a state to keep track of whether you are inside a quoted string or not. The following Perl script should work:
#!/usr/bin/perl -w
use strict;
use warnings;

my $state_outside_string = 0;
my $state_inside_string  = 1;
my $state = $state_outside_string;

while (my $line = <>) {
    my @chars = split(//, $line);
    foreach my $char (@chars) {
        if ($char eq '"') {
            # A double quote toggles between inside and outside a string
            if ($state == $state_outside_string) {
                $state = $state_inside_string;
            } else {
                $state = $state_outside_string;
            }
        } elsif ($char eq ',') {
            # Only commas outside a quoted string are delimiters
            if ($state == $state_outside_string) {
                print '|';
                next;
            }
        }
        print $char;
    }
}
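For comparison, the same state-tracking idea can be sketched in Python. This is only an illustration of the approach described above, not a translation the answer's author provided; the function name commas_to_pipes is made up for this example.

```python
def commas_to_pipes(text):
    """Replace commas that fall outside double-quoted strings with pipes."""
    inside_string = False
    out = []
    for char in text:
        if char == '"':
            # Every double quote toggles the inside/outside state
            inside_string = not inside_string
        elif char == ',' and not inside_string:
            # A comma outside a quoted string is a delimiter
            char = '|'
        out.append(char)
    return ''.join(out)


print(commas_to_pipes('"$130.00","$2,200.00","$1,230.63"'))
```

Because a doubled quote inside a field toggles the state twice, escaped quotes of the form "" should leave the state unchanged.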
Does 'having the shell run a Perl script' count?
If so, I'd look at Perl's Text::CSV module. You'd have two CSV handles: one for reading the file with the sep_char attribute set to comma (the standard default), the other for writing the file with the sep_char attribute set to pipe.
Working script
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;

die "Usage: $0 in_file out_file\n" unless scalar @ARGV == 2;

my $in = Text::CSV->new({ binary => 1, blank_is_undef => 1 })
    or die "Horribly";
my $out = Text::CSV->new({ binary => 1, sep_char => '|',
                           always_quote => 1, eol => "\n" })
    or die "Horribly";

open my $fh_in, '<', $ARGV[0]
    or die "Failed to open $ARGV[0] for reading ($!)";
open my $fh_out, '>', $ARGV[1]
    or die "Failed to open $ARGV[1] for writing ($!)";

while (my $fields = $in->getline($fh_in))
{
    $out->print($fh_out, $fields);
}

close $fh_in or die "Failed to close input ($!)";
close $fh_out or die "Failed to close output ($!)";
Sample input
"$130.00","$2,200.00","$1,230.63"
"EUR1.300,00",,
"GBP1,300.00","$2,200.00",
Sample output
"$130.00"|"$2,200.00"|"$1,230.63"
"EUR1.300,00"||
"GBP1,300.00"|"$2,200.00"|
If you have no other commas in your file, you can use:
sed "s/,/|/g" filename > outputfilename
If the delimiting commas always sit between double quotes (that is, they appear as ","), then:
sed 's/","/"|"/g' filename > outputfilename
Works like this:
sh-3.1$ echo '"123,456","123,454"' |sed 's/","/"|"/g'
"123,456"|"123,454"
If a quoted field in your input can itself contain the sequence "," and you don't want to change that, then it gets a bit more complicated, I think :)
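To make that edge case concrete, here is a small Python sketch (an illustration only, with a made-up input line) comparing the plain "," substitution with a real CSV parse. The first field of the sample line contains the sequence "," with its quotes doubled, as CSV escaping requires.

```python
import csv
import io
import re

# Hypothetical line: the first field's value is  a","b  and the second is  c
line = '"a"",""b","c"\n'

# The plain substitution also rewrites the comma inside the first field
naive = re.sub('","', '"|"', line)

# Parsing the line as CSV keeps the field contents intact
rows = list(csv.reader(io.StringIO(line)))
buf = io.StringIO()
csv.writer(buf, delimiter='|', quoting=csv.QUOTE_ALL,
           lineterminator='\n').writerows(rows)
parsed = buf.getvalue()

print(naive, end='')
print(parsed, end='')
```

In the substituted line the comma inside the first field has become a pipe, so the data itself changed; the parsed version only changes the delimiter.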
Another solution, in Python, using the dedicated csv module; probably the best in terms of safety and amount of code needed:
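import csv

inFilename = 'input.csv'
outFilename = 'output.csv'

# newline='' lets the csv module handle line endings itself (Python 3)
with open(inFilename, newline='') as src, open(outFilename, 'w', newline='') as dst:
    r = csv.reader(src)
    # The reader yields every field as a string, so QUOTE_NONNUMERIC quotes them all
    w = csv.writer(dst, delimiter='|', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)
    w.writerows(r)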
Safe and simple. You can easily tweak this for other formats; the parameters are fairly straightforward.
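As one possible way to tweak it, the delimiters can be turned into parameters. The sketch below is an assumption about how one might generalize the script, not part of the original answer; the function name convert_delimiter is hypothetical, and it works on any pair of file-like objects.

```python
import csv
import io


def convert_delimiter(src, dst, in_delim=',', out_delim='|'):
    """Copy CSV rows from src to dst, changing only the field delimiter."""
    reader = csv.reader(src, delimiter=in_delim)
    # QUOTE_ALL quotes every field, including empty ones
    writer = csv.writer(dst, delimiter=out_delim, quoting=csv.QUOTE_ALL,
                        lineterminator='\n')
    writer.writerows(reader)


# In-memory demonstration using sample rows from this thread
source = io.StringIO('"$130.00","$2,200.00","$1,230.63"\n"EUR1.300,00",,\n')
target = io.StringIO()
convert_delimiter(source, target)
print(target.getvalue(), end='')
```

Note that with QUOTE_ALL empty fields are written as "", which differs from the Perl output above, where blank fields stay unquoted.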
Ruby's CSV library was replaced with FasterCSV in 1.9; in earlier versions you can use the fastercsv gem.
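#!/usr/bin/env ruby
require "csv"

output = CSV.read("test.csv").map do |row|
  # force_quotes keeps the quotes around every field, as in the desired output
  row.to_csv(:col_sep => "|", :force_quotes => true)
end
puts output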