How do I make .+ stop at the first instance of a character, not the last, in a Perl regular expression?
I want to replace:
'''<font size="3"><font color="blue"> SUMMER/WINTER CONFIGURATION FILES</font></font>'''
With:
='''<font color="blue"> SUMMER/WINTER CONFIGURATION FILES</font>'''=
Now my existing code is:
$html =~ s/\n(.+)<font size=\".+?\">(.+)<\/font>(.+)\n/\n=$1$2$3=\n/gm
However this ends up with this as the result:
=''' SUMMER/WINTER CONFIGURATION FILES</font>'''=
Now I can see what is happening: it is matching from <font size="... all the way up to the end of <font color="blue">,
which is not what I want. I want it to stop at the first instance of ", not the last. I thought that is what putting the ? there would do; however, I've tried .+, .+?, .* and .*? with the same result each time.
Anyone got any ideas what I am doing wrong?
Write .+? in all four places to make each match non-greedy:
$html =~ s/\n(.+?)<font size=\".+?\">(.+?)<\/font>(.+?)\n/\n=$1$2$3=\n/gm;
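To make the difference concrete, here is a minimal sketch that runs a greedy and a non-greedy version of the substitution side by side. It assumes the sample line from the question sits between two newlines, as the pattern requires; the variable names $line, $greedy and $lazy are made up for the illustration.

```perl
#!/usr/bin/env perl
use strict; use warnings;

# Sample line from the question, wrapped in newlines because the pattern expects them.
my $line = qq{\n'''<font size="3"><font color="blue"> SUMMER/WINTER CONFIGURATION FILES</font></font>'''\n};

# Greedy: the .+ inside the size attribute runs to the LAST "> on the line,
# so it swallows the opening <font color="blue"> tag as well.
(my $greedy = $line) =~ s/\n(.+)<font size=".+">(.+)<\/font>(.+)\n/\n=$1$2$3=\n/g;

# Non-greedy: each .+? stops at the FIRST place where the rest of the pattern can match.
(my $lazy = $line) =~ s/\n(.+?)<font size=".+?">(.+?)<\/font>(.+?)\n/\n=$1$2$3=\n/g;

print $greedy;  # =''' SUMMER/WINTER CONFIGURATION FILES</font>'''=
print $lazy;    # ='''<font color="blue"> SUMMER/WINTER CONFIGURATION FILES</font>'''=
```

The greedy run reproduces the unwanted output shown in the question, so if the non-greedy version still gives that result it is worth checking that the edited pattern is the one actually being run.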
Also try to avoid using regular expressions to parse HTML. Use an HTML parser if possible.
You could change .+ to [^"]+ (instead of "match anything", it means "match anything that isn't a double quote").
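As a rough sketch of that suggestion, the attribute value below is matched with [^"]+ while the rest of the pattern from the question is left as it was. The surrounding newlines in the sample string are an assumption about how the line appears in the real input.

```perl
#!/usr/bin/env perl
use strict; use warnings;

# Sample line from the question, wrapped in newlines because the pattern expects them.
my $html = qq{\n'''<font size="3"><font color="blue"> SUMMER/WINTER CONFIGURATION FILES</font></font>'''\n};

# [^"]+ cannot cross a double quote, so the size value ends at its own closing quote
# no matter how greedy the quantifier is.
$html =~ s/\n(.+)<font size="[^"]+">(.+)<\/font>(.+)\n/\n=$1$2$3=\n/g;

print $html;  # ='''<font color="blue"> SUMMER/WINTER CONFIGURATION FILES</font>'''=
```

A negated character class states the intent directly (stop at the quote), which tends to be more predictable than relying on backtracking with .+?.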
As Mark said, just use an HTML parser from CPAN for this.
#!/usr/bin/env perl
use strict; use warnings;
use HTML::TreeBuilder;
my $s = q{<font size="3"><font color="blue"> SUMMER/WINTER CONFIGURATION FILES</font></font>};
my $tree = HTML::TreeBuilder->new;
$tree->parse( $s );
$tree->eof;  # signal end of input so the parser finishes building the tree
print $tree->find_by_attribute( color => 'blue' )->as_HTML;
# => <font color="blue"> SUMMER/WINTER CONFIGURATION FILES</font>
That said, a plain regex also works for your specific case:
#!/usr/bin/env perl
use strict; use warnings;
my $s = q{<font size="3"><font color="blue"> SUMMER/WINTER CONFIGURATION FILES</font></font>};
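print $s =~ m{
    < .+? >     # the outer opening tag, matched non-greedily
    ( .+ )      # greedy capture: everything up to the last closing tag
    </ .+? >    # the final closing tag
}x;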
# => <font color="blue"> SUMMER/WINTER CONFIGURATION FILES</font>