Split a line into two parts

I have George Michael's DVD track listing, copied and pasted from Amazon, in $str, followed by code that tries to split it into each two-digit track number and the title that follows:

$str = "20 Fastlove 21 Jesus To A Child 22 Spinning the Wheel 23 Older 24 Outside 25 As (with Mary J. Blige) 26 Freeek! 27 Amazing 28 John and Elvis are Dead 29 Flawless (Go To The City) 30 Shoot The Dog 31 Roxanne 32 An Easier Affair 33 If I Told You That (with Whitney Houston) 34 Waltz Away Dreaming 35 Somebody To Love 36 I Can’t Make You Love Me 37 Star People '97 38 You Have Been Loved 39 Killer/ Papa Was A RollIn Stone 40 Round Here";

while ($str =~ /(\d{2}) (\S+)/g) {
        print "$1 $2\n";
}

Result:

20 Fastlove
21 Jesus
22 Spinning
23 Older
24 Outside
25 As
26 Freeek!
27 Amazing
28 John
29 Flawless
30 Shoot
31 Roxanne
32 An
33 If
34 Waltz
35 Somebody
36 I
37 Star
97 38
39 Killer/
40 Round

The above kind of works, but it captures only the first word of each track name (and it mistakes the 97 in "Star People '97" for a track number). Any advice on how to get the result I'd like? The result I want is:

20 Fastlove
21 Jesus To A Child
22 Spinning the Wheel
[etc.]


As Ignacio said, this can't really be done with 100% accuracy, because the track names can contain digits. But since you can probably assume that the track numbers will be consecutive, you can come pretty close to 100%:

my $str = "20 Fastlove 21 Jesus To A Child 22 Spinning the Wheel 23 Older 24 Outside 25 As (with Mary J. Blige) 26 Freeek! 27 Amazing 28 John and Elvis are Dead 29 Flawless (Go To The City) 30 Shoot The Dog 31 Roxanne 32 An Easier Affair 33 If I Told You That (with Whitney Houston) 34 Waltz Away Dreaming 35 Somebody To Love 36 I Cant Make You Love Me 37 Star People '97 38 You Have Been Loved 39 Killer/ Papa Was A RollIn Stone 40 Round Here";

my ($track) = ($str =~ /^(\d+)/) or die "No initial track number";

my $next;
while ($next = $track + 1 and
       $str =~ s/^\s*             # optional initial whitespace
                 $track \s+       # track number followed by whitespace
                 (\S.*?)          # title begins with non-whitespace
                 (?= \s+ $next \s # title stops at next track #
                     | $ )        # or end-of-string
                //x) {
  print "$track $1\n";
  $track = $next;
}

die "$str left over" if $str =~ /\S/; # sanity check

This modifies $str, so make a copy if necessary.

This will fail if the title of a track contains the next track number, but that should be fairly uncommon. It will also fail if there are missing tracks or the track numbers are otherwise nonconsecutive.
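
To make the "make a copy" advice concrete, here is a minimal sketch that wraps the same consecutive-number idea in a subroutine. The name split_tracks and the shortened sample listing are hypothetical, not part of the answer above; the point is only that the substitution runs on the subroutine's private copy, so the caller's string is untouched.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not from the answer above): it works
# on a private copy of the listing and returns [number, title] pairs.
sub split_tracks {
    my ($str) = @_;                                # copy of the caller's string
    my ($track) = $str =~ /^\s*(\d+)/ or return;   # no leading track number
    my @pairs;
    while (1) {
        my $next = $track + 1;
        # Remove "<track> <title>" from the front of the copy; the title
        # stops before the next consecutive number or at end of string.
        $str =~ s/^\s* $track \s+ (\S.*?) (?= \s+ $next \s | \s* \z )//x
            or last;
        push @pairs, [ $track, $1 ];
        $track = $next;
    }
    return @pairs;
}

# Shortened sample taken from the listing in the question.
my $listing = "36 I Cant Make You Love Me 37 Star People '97 38 You Have Been Loved";
my @pairs   = split_tracks($listing);
print "$_->[0] $_->[1]\n" for @pairs;
```

Because the substitution only ever touches the copy, $listing can still be reused afterwards.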


A variant of cjm's answer that scans the input string nondestructively:

if ($str =~ /^(\d+)/) {
    my ($current, $next) = ($1, $1 + 1);
    while ($str =~ /\G *$current ((?:(?! *$next).)+)/g) {
        print "$current $1\n";
        ($current, $next) = ($next, $next + 1);
    }
}


Here's another approach (also on ideone.com):

while ($str =~ /(?<!\S)(\d+)\s+((?!\d+\s)\S+(?:\s+(?!\d+\s)\S+)*)/g) {
    print "$1 $2\n";
}

This assumes any sequence of one or more digits that's followed by whitespace and not preceded by non-whitespace is a track number. That eliminates the '97 in track #37's title, but there's nothing stopping a song title from having a bare number in it.
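
To illustrate that last caveat, here is a small sketch that runs the same regex on a made-up listing (the string in $demo is hypothetical, not from the question) in which one title ends with a bare number.

```perl
use strict;
use warnings;

# Hypothetical listing: track 2's title ends in a bare number.
my $demo = "1 Intro 2 Route 66 3 Outro";

my @found;
while ($demo =~ /(?<!\S)(\d+)\s+((?!\d+\s)\S+(?:\s+(?!\d+\s)\S+)*)/g) {
    push @found, "$1 $2";
}
print "$_\n" for @found;
# Track 2 comes back as "2 Route": the bare 66 ends the title early
# and is then dropped, because nothing title-like follows it.
```

Nothing in the pattern can tell that 66 belongs to the title, which is why the consecutive-numbers check is the more robust heuristic.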

In general, I think @cjm's consecutive-numbers idea is probably your best bet.


I've upvoted one of the answers here since I think it answers your specific question quite well, other than the "this track name contains the track number of the next track" problem. Albums with this property will be few and far between.

But I've got to say it, your problem really stems from having $str in that format in the first place. If you have a look at the source for this page for example, you could quite easily extract the track names from the HTML itself without regard for the names of the tracks.

That's because the HTML clearly delineates the tracks. Now I don't know if that information is available but you may want to rethink how you're getting that data in the first place. It may make your life a lot easier. Or, if not easier, at least more accurate :-)


You're so darn close:

$str = "20 Fastlove 21 Jesus To A Child 22 Spinning the Wheel 23 Older 24 Outside 25 As (with Mary J. Blige) 26 Freeek! 27 Amazing 28 John and Elvis are Dead 29 Flawless (Go To The City) 30 Shoot The Dog 31 Roxanne 32 An Easier Affair 33 If I Told You That (with Whitney Houston) 34 Waltz Away Dreaming 35 Somebody To Love 36 I Can’t Make You Love Me 37 Star People '97 38 You Have Been Loved 39 Killer/ Papa Was A RollIn Stone 40 Round Here";

while ($str =~ /(\d{2}[^\d]*)/g) {
    print "$1\n";
}

Note the regular expression: [^...] is a negated character class, which matches any character not listed inside it. So [^\d] means "not a digit" (the same as \D), and the asterisk on the end means zero or more.

By specifying that the match continues until the next digit, I can capture the rest of the name (that is, until Star People '97). Darn it. So close...

If you need the number and title in two separate variables, you could use parentheses.

$str = "20 Fastlove 21 Jesus To A Child 22 Spinning the Wheel 23 Older 24 Outside 25 As (with Mary J. Blige) 26 Freeek! 27 Amazing 28 John and Elvis are Dead 29 Flawless (Go To The City) 30 Shoot The Dog 31 Roxanne 32 An Easier Affair 33 If I Told You That (with Whitney Houston) 34 Waltz Away Dreaming 35 Somebody To Love 36 I Can’t Make You Love Me 37 Star People '97 38 You Have Been Loved 39 Killer/ Papa Was A RollIn Stone 40 Round Here";

while ($str =~ /(\d{2})([^\d]*)/g) {
    my $number = $1;
    my $title = $2;

    print "$number: $title\n";
}

Still trying to figure out how to get Star People '97 to work. The 97 is itself two digits, so [^\d]* stops in front of it and \d{2} then matches it as if it were a track number. All real track numbers are preceded by a space or are at the beginning of the line. I wonder if that could be used?
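
One way that observation might be used, as a rough sketch rather than a definitive fix: require that a track number is not preceded by a non-space character and is followed by whitespace, and let the title run lazily up to the next such number. The listing below is shortened from the question, with a plain ASCII apostrophe in "Can't".

```perl
use strict;
use warnings;

# Shortened listing from the question.
my $str = "36 I Can't Make You Love Me 37 Star People '97 38 You Have Been Loved "
        . "39 Killer/ Papa Was A RollIn Stone 40 Round Here";

my @tracks;
# (?<!\S)  : number is at the start or preceded by whitespace
# (.*?)    : title, as short as possible
# (?=...)  : stop before " NN " (next track number) or at end of string
while ($str =~ /(?<!\S)(\d{2})\s+(.*?)(?=\s+\d{2}\s|\s*\z)/g) {
    push @tracks, [ $1, $2 ];
}
print "$_->[0]: $_->[1]\n" for @tracks;
```

The '97 survives here only because the quote in front of it means it is not preceded by whitespace; a title containing a bare two-digit number would still be cut short.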


As Ignacio Vazquez-Abrams said, song names containing numbers will be a problem, but this should work for every track except "Star People '97":

/(\d{2}) (\D+)/g

Note: I'm not a Perl coder, but the regex works correctly in rubular.com (except for the " '97 " case mentioned.)


Your best bet is something like the following. But even it has a problem if one of the track titles contains the next track's number as a separate word.

#!/usr/bin/perl

use strict;
use warnings;

my $str = "20 Fastlove 21 Jesus To A Child 22 Spinning the Wheel 23 Older 24 Outside 25 As (with Mary J. Blige) 26 Freeek! 27 Amazing 28 John and Elvis are Dead 29 Flawless (Go To The City) 30 Shoot The Dog 31 Roxanne 32 An Easier Affair 33 If I Told You That (with Whitney Houston) 34 Waltz Away Dreaming 35 Somebody To Love 36 I Can’t Make You Love Me 37 Star People '97 38 You Have Been Loved 39 Killer/ Papa Was A RollIn Stone 40 Round Here";

my @parts = split " ", $str;

my %songs;
my $track     = shift @parts;
my $new_track = $track + 1;
my $song      = "";
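while (@parts) {
    my $part = shift @parts;
    unless ($part eq $new_track) {
        # append the word, without a leading space on the first one
        $song .= length $song ? " $part" : $part;
        next;
    }
    $songs{$track} = $song;
    $song          = "";
    $track         = $new_track;
    $new_track     = $track + 1;
}
$songs{$track} = $song;    # store the last track, which has no successor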

for my $track (sort { $a <=> $b } keys %songs) {
    print "$track\t$songs{$track}\n";
}