Best way to copy and duplicate each line of input file while regex modifying the duplicated line

2022-12-18 20:58 Q&A author:

This question has two parts: one for "single line match" and one for "multi line region matching". I also have a semi-working solution, but I am looking for something more robust and elegant.

Single Line Match: I would like to duplicate each line of an input file so that the second line is a regex modification of the first. E.g.:

File.txt

YY BANANA, YYZ, ABC YHZ YY1
YY APPLE , YYZ, ABC YHZ YY1
YY ORANGE, YYZ, ABC YHZ YY1
YZ GRAPE , YZZ, ABC YHZ YZ1

Would BECOME:

YY BANANA, YYZ, ABC YHZ YY1
XY BANANA, XYZ, ABC YHZ XY1
YY APPLE , YYZ, ABC YHZ YY1
XY APPLE , XYZ, ABC YHZ XY1
YY ORANGE, YYZ, ABC YHZ YY1
XY ORANGE, XYZ, ABC YHZ XY1
YZ GRAPE , YZZ, ABC YHZ YZ1
XZ GRAPE , XZZ, ABC YHZ XZ1

Keep in mind that the real file is large, and that the example of YY -> XY and YZ -> XZ is exactly correct. In other words, in my file YY, YH, YZ, Y1, Y2, Y3 are the symbols that I would like to change to XY, XH, XZ, X1, X2, X3.

I have done something in Perl that is very raw (linked below as a starting point to show what I was thinking), but the Perl script I wrote is not elegant or general and requires multiple passes over the file.

My raw stab, in Perl: http://www.quantprinciple.com/invest/index.php/docs/tipsandtricks/perl-sed-awk/conditional-duplicate/

Usage of my raw stab:

MatchDuplicate.pl  INPUT.txt YY XY > INPUT2.txt
MatchDuplicate.pl  INPUT2.txt YH XH > INPUT3.txt
MatchDuplicate.pl  INPUT3.txt Y1 X1 > INPUT4.txt
MatchDuplicate.pl  INPUT4.txt Y2 X2 > INPUT5.txt

INPUT5.txt is used...

Multi Line Match: Exactly the same as above, but each "record" of the input spans multiple lines:

File.txt

< some starting marker...startRecord:>
data
data
YY data
YY BANANA, YYZ, ABC YHZ YY1
<some ending record marker>
< some starting marker...startRecord:>
data
data
YY data
YY APPLE , YYZ, ABC YHZ YY1
<some ending record marker>
< some starting marker...startRecord:>
data
data
YY data
YY ORANGE, YYZ, ABC YHZ YY1
<some ending record marker>
< some starting marker...startRecord:>
data
data
YZ data
YZ GRAPE , YZZ, ABC YHZ YZ1
<some ending record marker>

Would BECOME:

< some starting marker...startRecord:>
data
data
YY data
YY BANANA, YYZ, ABC YHZ YY1
<some ending record marker>
< some starting marker...startRecord:>
data
data
XY data
XY BANANA, XYZ, ABC YHZ XY1
<some ending record marker>
< some starting marker...startRecord:>
data
data
YY data
YY APPLE , YYZ, ABC YHZ YY1
<some ending record marker>
< some starting marker...startRecord:>
data
data
XY data
XY APPLE , XYZ, ABC YHZ XY1
<some ending record marker>
< some starting marker...startRecord:>
data
data
YY data
YY ORANGE, YYZ, ABC YHZ YY1
<some ending record marker>
< some starting marker...startRecord:>
data
data
XY data
XY ORANGE, XYZ, ABC YHZ XY1
<some ending record marker>
< some starting marker...startRecord:>
data
data
YZ data
YZ GRAPE , YZZ, ABC YHZ YZ1
<some ending record marker>
< some starting marker...startRecord:>
data
data
XZ data
XZ GRAPE , XZZ, ABC YHZ XZ1
<some ending record marker>

My Raw Stab: http://www.quantprinciple.com/invest/index.php/docs/tipsandtricks/perl-sed-awk/multi-line-conditional-duplicate/

For 1:

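while (<>) {
    # $_ still ends with its newline, so print (not say) avoids blank lines
    print;
    # print the modified copy only if the substitution changed something
    print if s/$pattern/$replacement/g;
}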

Add file handles and other boilerplate as appropriate.

EDIT: Let's go for something a bit more general then.

First, we'll parse out our command-line arguments, and put our replacements into a hash:

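my $filename = shift @ARGV;
my %patterns;
# the remaining arguments are pattern/replacement pairs
while (@ARGV) {
    my $pattern     = shift @ARGV;
    my $replacement = shift @ARGV;
    $patterns{$pattern} = $replacement;
}
# put the file name back so that <> reads the file rather than STDIN
@ARGV = ($filename);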

Then for each line in the file, we'll output the line verbatim, and then see if it matches any of our patterns.

while (<>) {
    print;    # original line; $_ still ends with its newline
    while (my ($pattern, $replacement) = each %patterns) {
        # duplicate only lines that start with the pattern
        s/$pattern/$replacement/g and print if /^$pattern/;
    }
}
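
The same single-pass idea can also be sketched with one combined alternation built from the hash keys, so each line is tested once instead of once per pattern. This is only a sketch: the helper name duplicate_line is made up for illustration, it treats the symbols as literal strings (via quotemeta) rather than as regexes, it assumes the hash is non-empty, and the sample lines are taken from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the original line, followed by a modified copy
# when the line starts with one of the symbols in %$map.
sub duplicate_line {
    my ($line, $map) = @_;
    # Longest symbols first, so a longer symbol wins over a shorter prefix of it
    my $alt = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) } keys %$map;
    return ($line) unless $line =~ /^($alt)/;
    my $key = $1;
    # Replace every occurrence of the leading symbol in the copy only
    (my $copy = $line) =~ s/\Q$key\E/$map->{$key}/g;
    return ($line, $copy);
}

# Demo on two lines from the question, plus one line that should not be duplicated
my %map = (YY => 'XY', YH => 'XH', YZ => 'XZ');
for my $line ("YY BANANA, YYZ, ABC YHZ YY1\n",
              "YZ GRAPE , YZZ, ABC YHZ YZ1\n",
              "data\n") {
    print duplicate_line($line, \%map);
}
```

With real input, the demo loop would become a while (<>) loop over the file, and %map would be filled from the command line as shown above.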

This will solve your 1st question:

use strict;
use warnings;

die "usage..." unless @ARGV == 3;
my ($file, $src, $dst) = @ARGV;

open my $fh, '<', $file or die "Can not open $file: $!";
while (<$fh>) {
    print;
    if (/^$src\b/) {
        s/$src/$dst/g;
        print;
    }
}
close $fh;

Looking at your linked scripts... you could easily convert your block comments to POD so that they effectively become a manpage for your code. Then you could use Pod::Usage to print usage info when the user does something stupid.

If the end-of-record marker is the same for all records, you can set the $/ variable so that <FILE> will read in one record at a time.

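# $/ is the input record separator ($\ is the output one)
$/ = "<some ending record marker>\n";
while (<FILE>) {
    print $_;
    # $_ is a multi-line string, so use the /m modifier;
    # /g replaces every match in the record, not just the first
    print $_ if s/$pattern/$replacement/mg;
}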
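
A slightly fuller sketch of the record-at-a-time approach, under some assumptions: the helper name duplicate_records and the <start> / <end> markers are placeholders made up for illustration, the symbols are treated as literal strings, the hash is non-empty, and a record is duplicated when any of its lines starts with a known symbol. The demo reads from an in-memory file so that it runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: reads $fh one record at a time and returns each record,
# followed by a modified copy when some line in it starts with a known symbol.
sub duplicate_records {
    my ($fh, $end_marker, $map) = @_;
    my $alt = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) } keys %$map;
    local $/ = $end_marker;    # one "line" read from $fh is now a whole record
    my @out;
    while (my $record = <$fh>) {
        push @out, $record;
        # /m lets ^ match at the start of every line inside the record
        next unless $record =~ /^($alt)/m;
        my $key = $1;
        (my $copy = $record) =~ s/\Q$key\E/$map->{$key}/g;
        push @out, $copy;
    }
    return @out;
}

# Demo on an in-memory file holding two records; only the first is duplicated
my $input = <<'END';
<start>
data
YY data
YY BANANA, YYZ, ABC YHZ YY1
<end>
<start>
data
other
<end>
END
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
print duplicate_records($fh, "<end>\n", { YY => 'XY', YZ => 'XZ' });
close $fh;
```

Using local keeps the change to $/ confined to the helper, so line-oriented reads elsewhere in the script are unaffected.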
