How can I read a block that may contain internal blank lines when the block itself ends with a blank line?

My text file contains several blocks, each structured as shown below.

How can I read each block by its keywords (Keyword1, Keyword2, Keyword3, Keyword4)?

I have two questions.

1. Is there an efficient way to get the line that follows each keyword?

2. I don't know how to skip the internal blank lines between Keyword3 and Keyword4. The key point is that a block is defined as ending with a blank line.

**block start**

    Keyword1
    Single Line  # I need work on the line
    Keyword2
    Single or Multiple lines  # I need work on the lines
    Keyword3
    (There may be one or more blank lines)
    Single or Multiple lines  # I need work on the lines
    (There may be one or more blank lines)
    Keyword4
    Single or Multiple lines  # I need work on the lines
    One or more blank lines

**block end**


If I understand your data, blank lines are not a reliable indicator, because they can appear before a keyword's text begins, after the text, or not at all. If that's the case, I don't think it will help to read the text in "paragraph mode" (by setting $/ to an empty string). Similarly, the blank lines do not help -- at least not in a simple way -- to identify the start and end of the keyword sections or the "blocks".

You are going to have to parse the text in a more fine-grained way, but you haven't given us enough information to provide a detailed answer. Here's an example that simply stores the non-blank lines by keyword:

use strict;
use warnings;

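my (%data, $keyword);

while (my $line = <DATA>){
    next unless $line =~ /\S/;      # skip blank lines
    chomp $line;
    if ($line =~ /^Keyword/){
        $keyword = $line;           # a keyword line starts a new section
    }
    elsif (defined $keyword){
        # store the line under the most recent keyword;
        # lines seen before any keyword are ignored
        push @{$data{$keyword}}, $line;
    }
}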

__DATA__
Keyword1
data1 a
Keyword2
data2 a
data2 b
data2 c
Keyword3


data3 a
data3 b


Keyword4
data4 a
data4 b
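
If the file holds several such blocks, one possible extension of the loop above is to start a new record whenever the first keyword appears. This is only a sketch: it assumes every block begins with `Keyword1` and that keyword lines start with the literal text `Keyword`, neither of which the question confirms. The sample text and the `@blocks` array are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input with two blocks, held in a string.
my $text = <<'END';
Keyword1
one a
Keyword2
one b

one c
Keyword1
two a
Keyword2
two b

END

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my (@blocks, $keyword);
while (my $line = <$fh>) {
    next unless $line =~ /\S/;      # skip blank lines wherever they occur
    chomp $line;
    if ($line =~ /^Keyword/) {
        $keyword = $line;
        # Assumption: Keyword1 marks the start of a new block.
        push @blocks, {} if $keyword eq 'Keyword1';
    }
    elsif (@blocks and defined $keyword) {
        # Append the line to the current keyword of the current block.
        push @{ $blocks[-1]{$keyword} }, $line;
    }
}
close $fh;

printf "%d blocks\n", scalar @blocks;
```

Each element of `@blocks` is then a hash of keyword to array of lines, so `$blocks[0]{Keyword2}` holds the lines for the second keyword of the first block.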


Do you know about setting $/ to the empty string for "paragraph mode"?

Every call to <> or readline then returns a multiline record that ends at one or more blank lines, and chomp removes all of the trailing newlines from that record.
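
A minimal sketch of paragraph mode, assuming the input is available on a filehandle. Here an in-memory string stands in for the real file, and the sample text and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input; a real script would open the text file instead.
my $text = "Keyword1\nline a\n\n\nKeyword2\nline b\nline c\n\n";

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = '';                  # paragraph mode: a record ends at one or more blank lines
    while (my $record = <$fh>) {
        chomp $record;              # in paragraph mode this strips all trailing newlines
        push @records, $record;
    }
}
close $fh;

print scalar(@records), " records\n";
```

Note that this splits at every run of blank lines, so a block with blank lines between Keyword3 and Keyword4 would come back as more than one record.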


Can't you just do a multiline match and use the keywords as anchors like this:

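# List assignment leaves $block undefined if the match fails,
# instead of picking up a stale $1 from an earlier match.
my ($block) = $data =~ /(Keyword1.*?Keyword2.*?Keyword3.*?Keyword4.*?)\n$/sm;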

Actually, you could do this as well and get the data from each block:

my @keys = $data =~ /Keyword1(.*?)Keyword2(.*?)Keyword3(.*?)Keyword4(.*?)\n$/sm;

and then you could just strip out blank lines in each group.
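
Stripping the blank lines from each captured group could look roughly like this. It is a sketch that assumes the whole text is already in `$data` and that the keyword names never occur inside the data lines. The sample text and the `@sections` array are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample block, with blank lines around the Keyword3 data.
my $data = <<'END';
Keyword1
data1 a
Keyword2
data2 a
data2 b
Keyword3


data3 a

Keyword4
data4 a
data4 b

END

my @sections;
if (my @groups = $data =~ /Keyword1(.*?)Keyword2(.*?)Keyword3(.*?)Keyword4(.*?)\n$/sm) {
    for my $group (@groups) {
        # Split the capture into lines and keep only those with visible text.
        my @lines = grep { /\S/ } split /\n/, $group;
        push @sections, \@lines;
    }
}

print scalar(@$_), " line(s)\n" for @sections;
```

With /m, the trailing `\n$` matches a newline that is followed by another newline or by the end of the string, which is why the last group stops at the blank line that closes the block.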
