Grep -> How to replace contents of a text file
I have a text file that has a lot of lines and it is laid out like
zzzzz | id@host.tld |
yyyyy | id@host.tld |
with one of these per line, for about 10 million lines.
Using a grep expression, how can I do a replace to get just
zzzzz
yyyyy
and so on for each line in the file?
Maybe using Perl to rewrite the file would be fine too, I just don't know a lot of Perl.
UPDATE 1: Sometimes the export gets run to produce:
id@host.tld | zzzzz
id@host.tld | yyyyy
UPDATE 2: Sometimes they leave row numbers in as:
a variable digit row number | zzzzz | id@host.tld |
a variable digit row number | yyyyy | id@host.tld |
UPDATE 3: This file can contain lines with formats like:
zzzzz | id@host.tld |
yyyyy | id@host.tld |
id@host.tld | zzzzz
id@host.tld | yyyyy
variable digit row number | zzzzz | id@host.tld |
variable digit row number | yyyyy | id@host.tld |
It can be done using (GNU) grep, too:
grep -o '^[^|]*'
Edit:
If you don't want trailing spaces, but want to allow leading spaces or spaces in the middle of the first field, you could change the command to:
grep -o '^[^|]*[^| ]'
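To see the difference between the two patterns, it may help to run both on a couple of sample lines. The sample data below is made up for illustration, and the sketch assumes a grep that supports -o (GNU grep does).

```shell
# Sample input (hypothetical); the first field of the second line has an inner space.
sample='zzzzz | id@host.tld |
yy yyy | id@host.tld |'

# First pattern: everything up to the first pipe, trailing space included.
with_space=$(printf '%s\n' "$sample" | grep -o '^[^|]*')

# Second pattern: the match must end on a character that is neither a pipe nor a space.
trimmed=$(printf '%s\n' "$sample" | grep -o '^[^|]*[^| ]')

# Brackets make the trailing spaces visible.
printf '[%s]\n' "$with_space" "$trimmed"
```

The second pattern drops the trailing space because the greedy [^|]* has to give back characters until the final [^| ] can match.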
This looks like a job for sed:
sed 's/\(.*\) |.*| \(.*\) |.*|/\1 \2/' filename
or
sed 's/ |[^|]*|//g' filename
EDIT:
The revised question is even easier:
sed 's/ |.*//' filename
You might even be able to get away with
sed 's/ .*//' filename
but that's really pushing it.
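A small sketch of why the shorter command is risky: it cuts at the first space, so it only gives the right answer when the first field contains no spaces. The sample line is hypothetical.

```shell
# Hypothetical sample line whose first field contains a space.
line='foo bar | id@host.tld |'

# Cut at the first " |": keeps the whole first field.
safe=$(printf '%s\n' "$line" | sed 's/ |.*//')

# Cut at the first space: truncates the field at its inner space.
risky=$(printf '%s\n' "$line" | sed 's/ .*//')

printf '%s\n%s\n' "$safe" "$risky"
```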
It seems the question got edited, or maybe I am losing it :) If all you need is the first part up to the "|", something like the following should work:
sed 's/\([^|]*\).*/\1/' filename.txt
With Perl, for huge files:
use Tie::File;
tie @array, 'Tie::File', 'file.path/file.name' or die "Cannot tie file: $!";
for (@array) {
s/^([^\|]+).*/$1/;
}
untie @array;
Perl one-liner:
perl -e 'while(<>) { /^(.+?) \|/ && print "$1\n" }' input.txt > output.txt
Should work flawlessly, unless the first entry may contain |.
It would be pretty simple in perl.
You can do a split on " | " to get an array for each line. Then open a file to write, and write "$array[0]\n"
Your program would look something like:
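open my $in, '<', 'someFile.txt' or die "Cannot open someFile.txt: $!";
open my $out, '>', 'outfile.txt' or die "Cannot open outfile.txt: $!";
# Read one line at a time so a 10-million-line file never sits in memory.
while (my $line = <$in>) {
    chomp $line;
    # Split on a pipe with optional whitespace on either side.
    my @array = split /\s*\|\s*/, $line;
    print $out "$array[0]\n";
}
close $in;
close $out or die "Cannot close outfile.txt: $!";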
For your updates:
split is a function that takes a pattern and an expression and returns a list of strings. In the example above, the pattern is a regular expression: \s matches a whitespace character and \| matches a literal "|". So it says to split on zero or more whitespace characters (\s*), a pipe (\|), and zero or more whitespace characters (\s*).
Update 1 would look like:
@array = (
    'id@host.com',   # [0]
    'zzzzzzzzzz',    # [1]
);
Update 2 would look like:
@array = (
    'some Number',   # [0]
    'zzzzzzzzzz',    # [1]
    'id@host.com',   # [2]
);
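For the mixed file in UPDATE 3, position alone is not enough, because the wanted value is field 0 in one layout and field 1 in the others. One possible approach, sketched here with awk rather than Perl, is to split on the same pipe-with-optional-spaces pattern and print the first field that is neither a row number nor an e-mail address. The assumptions that the wanted value never contains "@" and is never made up of digits only are mine, not the asker's, and the function name extract_value is hypothetical.

```shell
# extract_value: hypothetical helper; reads lines on stdin, prints one field per line.
# Assumptions: the wanted value contains no "@" and is not made up of digits only.
extract_value() {
  awk -F' *[|] *' '{
    for (i = 1; i <= NF; i++) {
      f = $i
      gsub(/^ +| +$/, "", f)    # trim stray spaces at the line edges
      if (f == "" || f ~ /@/ || f ~ /^[0-9]+$/) continue
      print f                   # first field that is not empty, e-mail, or number
      break
    }
  }'
}

# Sample lines in the three layouts from the question (values are made up).
result=$(printf '%s\n' \
  'zzzzz | id@host.tld |' \
  'id@host.tld | yyyyy' \
  '42 | xxxxx | id@host.tld |' | extract_value)

printf '%s\n' "$result"
```

To run it on a real file, something like extract_value < input.txt > output.txt should do; awk streams the input, so the 10 million lines are never held in memory at once.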