Grep -> How to replace contents of a text file

I have a text file that has a lot of lines and it is laid out like

zzzzz | id@host.tld |
yyyyy | id@host.tld |

one of these per line for about 10 million lines.

Using a grep expression, how can I do a replace to get just

zzzzz
yyyyy

and so on for each line in the file?

Maybe using Perl to rewrite the file would be fine too; I just don't know a lot of Perl.

UPDATE 1: Sometimes the export gets run to produce:

id@host.tld | zzzzz
id@host.tld | yyyyy

UPDATE 2: Sometimes they leave row numbers in as:

a variable digit row number | zzzzz | id@host.tld |
a variable digit row number | yyyyy | id@host.tld |

UPDATE 3: This file can contain lines with formats like:

zzzzz | id@host.tld |
yyyyy | id@host.tld |
id@host.tld | zzzzz
id@host.tld | yyyyy
variable digit row number | zzzzz | id@host.tld |
variable digit row number | yyyyy | id@host.tld |


It can be done using (GNU) grep, too:

grep -o '^[^|]*'

Edit:
If you don't want trailing spaces, but do want to allow leading spaces and spaces in the middle of the first field, you could change the command to:

grep -o '^[^|]*[^| ]'
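To make the effect of the second pattern concrete, here is a small sketch that feeds the question's sample lines through it. The sample data and the variable name `result` are only for illustration, and `-o` is not required by POSIX, so this assumes a grep that supports it (GNU grep does).

```shell
# Sample lines in the question's original layout (illustrative data).
sample='zzzzz | id@host.tld |
yyyyy | id@host.tld |'

# -o prints only the matched text: everything before the first "|",
# ending on a character that is neither a space nor a pipe.
result=$(printf '%s\n' "$sample" | grep -o '^[^|]*[^| ]')

printf '%s\n' "$result"
```

This prints zzzzz and yyyyy on separate lines. Note that the pattern always takes the first field, so a line in the Update 1 layout would yield the e-mail address instead.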


This looks like a job for sed:

sed 's/\(.*\) |.*| \(.*\) |.*|/\1 \2/' filename

or

sed 's/ |[^|]*|//g' filename

EDIT:
The revised question is even easier:

sed 's/ |.*//' filename

You might even be able to get away with

sed 's/ .*//' filename

but that's really pushing it.
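The commands above print to standard output and leave the file itself untouched. One portable way to actually replace the file's contents is to write to a temporary file and rename it over the original. A minimal sketch, in which the directory, the file name `data.txt`, and its contents are made up for the example:

```shell
# Create a throwaway input file (hypothetical name and data).
workdir=$(mktemp -d)
file="$workdir/data.txt"
printf '%s\n' 'zzzzz | id@host.tld |' 'yyyyy | id@host.tld |' > "$file"

# Write the stripped lines to a temporary file, and replace the
# original only if sed exited successfully.
sed 's/ |.*//' "$file" > "$file.tmp" && mv "$file.tmp" "$file"

cat "$file"
```

GNU sed also has an `-i` option that edits in place, but BSD sed's `-i` expects a suffix argument, so the temporary-file form is the safer choice if the script has to run on both.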


It seems the question got edited, or maybe I am losing it :) If all you need is the first part up to the "|", something like the following should work:

sed 's/\([^|]*\).*/\1/' filename.txt 


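With Perl, for huge files:

use strict;
use warnings;
use Tie::File;

# Tie the file's lines to an array; changing an element rewrites that line.
# "or" is needed here: with "||" the die would bind to the file name string
# and never run.
tie my @array, 'Tie::File', 'file.path/file.name'
    or die "Cannot tie file: $!";

for (@array) {
    s/\s*\|.*//;    # drop the first "|", the spaces before it, and the rest
}

untie @array;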


Perl one-liner:

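perl -ne 'print "$1\n" if /^(.+?) \|/' input.txt > output.txt

The pipe has to be escaped, because a bare | in a regex means alternation. This works as long as the first entry does not itself contain " |"; lines without a " |" are skipped.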


It would be pretty simple in perl.

You can do a split on " | " to get an array for each line. Then open a file to write, and write "$array[0]\n"

Your program would look something like:

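use strict;
use warnings;

# Read line by line so a 10-million-line file never has to fit in memory.
open my $in,  '<', 'someFile.txt' or die "Cannot read someFile.txt: $!";
open my $out, '>', 'outfile.txt'  or die "Cannot write outfile.txt: $!";

while (my $line = <$in>) {
    chomp $line;
    my @array = split /\s*\|\s*/, $line;   # fields without surrounding spaces
    next unless @array;                    # skip empty lines
    print {$out} "$array[0]\n";
}

close $in;
close $out or die "Cannot close outfile.txt: $!";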

For your updates:

split is a function that takes a pattern and an expression and returns a list of strings. In the example above, the pattern is a regular expression: \s matches a whitespace character and \| matches a literal "|". So it says to split on zero or more whitespace characters (\s*), a pipe (\|), and zero or more whitespace characters (\s*).

Update 1 would look like:

@array = (
           'id@host.com',    # [0]
           'zzzzzzzzzz',     # [1]
         );

Update 2 would look like:

@array = (
           'some Number',    # [0]
           'zzzzzzzzzz',     # [1]
           'id@host.com',    # [2]
         );
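
For the mixed file in Update 3, the wanted value sits at a different index depending on the layout. One possible approach, sketched here with awk rather than Perl, is to split each line the same way and print the first field that is neither all digits nor contains an "@". This assumes the wanted value is never purely numeric and never contains "@", which the question does not confirm; the sample rows, including the row number 42, are invented for the example.

```shell
# One line per layout from the question (illustrative data).
sample='zzzzz | id@host.tld |
id@host.tld | yyyyy
42 | xxxxx | id@host.tld |'

# Split on a pipe with optional spaces around it, then pick the first
# field that is not empty, not a row number, and not an e-mail address.
result=$(printf '%s\n' "$sample" | awk -F' *[|] *' '{
    for (i = 1; i <= NF; i++) {
        if ($i == "" || $i ~ /^[0-9]+$/ || $i ~ /@/) continue
        print $i
        next    # done with this line, move on to the next one
    }
}')

printf '%s\n' "$result"
```

For these three lines it prints zzzzz, yyyyy and xxxxx. Lines where every field is filtered out produce no output, so comparing the line counts of input and output is a cheap sanity check.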