Regex to pull out quoted text
I would like to use a regex to find quoted passages in a string and extract the words between the quotes. It should handle both double quotes and single quotes.
For example, if I had the string:
The "cat and the hat" sat on a rat. The 'mouse ran' up the clock.
Then it would identify the following:
cat and the hat
mouse ran
What would the regex be?
(["']).*?\1
Works for me, assuming that the same kind of quote can't appear inside the quoted text.
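As a rough illustration of that pattern in use, here is a minimal Python sketch. The second capturing group and the helper name quoted_parts are additions for illustration only; they are not part of the answer above.

```python
import re

# Group 1 captures the opening quote; \1 requires the same character to close.
# Group 2 (added for this sketch) captures the text in between.
QUOTED = re.compile(r"""(["'])(.*?)\1""", re.DOTALL)


def quoted_parts(text):
    """Return the text found between matching single or double quotes."""
    return [m.group(2) for m in QUOTED.finditer(text)]


sample = """The "cat and the hat" sat on a rat. The 'mouse ran' up the clock."""
print(quoted_parts(sample))  # ['cat and the hat', 'mouse ran']
```

Note that apostrophes in ordinary words (as in don't) can be mistaken for single quotes, so real prose may need a stricter pattern.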
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;
my $quoted_rx = qr{
(?<quote> ['"] ) # SO highlight bug "'
(?<guts>
(?: (?! \k<quote> ) . ) *
)
\k<quote>
}sx;
my $string = <<'END_OF_STRING';
The "cat and the hat" sat on a rat. The 'mouse ran' up the clock.
END_OF_STRING
while ($string =~ /$quoted_rx/g) {
say $+{guts};
}
Each time you match, the quote type will be in $+{quote} and the text between the quotes will be in $+{guts}.
That only works for U+0027 (APOSTROPHE) and U+0022 (QUOTATION MARK). If you want it to work for things like ‘this’ and “this”, you’ll have to be fancier. There is a \p{Quotation_Mark} property for any sort of quotation mark, \p{Pi} for initial punctuation, and \p{Pf} for final punctuation.
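To make the idea of paired opening and closing quotes concrete, here is a hedged Python sketch. Python's built-in re module does not support \p{...} properties, so this version lists a few pairs explicitly. The QUOTE_PAIRS table is a hypothetical, deliberately incomplete list chosen for illustration, and quoted_parts is an invented helper name.

```python
import re

# Hypothetical, incomplete table of opening -> closing quote characters.
QUOTE_PAIRS = {
    '"': '"',
    "'": "'",
    "\u2018": "\u2019",  # left/right single quotation marks
    "\u201c": "\u201d",  # left/right double quotation marks
    "\u00ab": "\u00bb",  # left/right-pointing double angle quotation marks
}

# One alternative per pair: the opener, lazily anything, then its closer.
PATTERN = re.compile(
    "|".join(
        f"{re.escape(opener)}(.*?){re.escape(closer)}"
        for opener, closer in QUOTE_PAIRS.items()
    ),
    re.DOTALL,
)


def quoted_parts(text):
    """Return the text between any of the quote pairs in QUOTE_PAIRS."""
    results = []
    for m in PATTERN.finditer(text):
        # Exactly one group takes part in each match; the rest are None.
        results.append(next(g for g in m.groups() if g is not None))
    return results


print(quoted_parts("He said \u201chello\u201d and \u2018bye\u2019."))
```

Unlike the backreference approach, each alternative here names its own closing character, which is what makes asymmetric pairs workable. If a full property lookup is needed, the standard unicodedata.category function reports 'Pi' and 'Pf' for initial and final quote characters.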
$s = 'The "cat and the hat" sat on a rat. The \'mouse ran\' up the clock.';
preg_match_all('~([\'"])(.*?)\1~s', $s, $result);
print_r($result[2]);
output (as seen on ideone):
Array
(
    [0] => cat and the hat
    [1] => mouse ran
)
preg_match_all saves all the match results in an array of arrays. You can change how the results are arranged, but by default the first array contains the overall matches ($0 or $&), the second array contains the contents of the first capturing group ($1), the third contains the contents of the second group ($2), and so on.
In this case $result[0] holds the complete quoted strings from all of the matches, $result[1] holds the quote characters, and $result[2] holds whatever was between the quotes.
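The group-by-group arrangement described above can be mimicked in Python to see what each sub-array holds. This is only an analogue for illustration, not how PHP implements it; group_columns is an invented helper name.

```python
import re


def group_columns(pattern, text):
    """Return one list per group: index 0 is the whole match, 1 is group 1, etc."""
    rows = [
        (m.group(0),) + m.groups()
        for m in re.finditer(pattern, text, re.DOTALL)
    ]
    # Transpose per-match rows into per-group columns.
    return [list(column) for column in zip(*rows)]


s = """The "cat and the hat" sat on a rat. The 'mouse ran' up the clock."""
result = group_columns(r"""(['"])(.*?)\1""", s)
print(result[2])  # ['cat and the hat', 'mouse ran']
```

Here result[0] holds the full quoted strings including their quotes, result[1] the quote characters, and result[2] the text between them.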