Ruby 1.8 regexp: index of match in utf string
I'm trying to search a text for a match and return the match with a snippet of surrounding text. To do this, I want to find the match with a regex, then cut the string using the match index +- a snippet radius (text.mb_chars[start..finish]).
However, I cannot get Ruby's (1.8) regex to return a match index that is multi-byte aware.
I understand that regex is the one place in 1.8 which is supposed to be UTF-aware, but it doesn't seem to work despite the /u switch:
"Résumé" =~ /s/u
=> 3
"Resume" =~ /s/u
=> 2
The result should be the same if the regex were really working in multibyte mode (/u), but it's returning the byte index.
How do you get the match index in characters, not bytes?
Or is there maybe some other way to get a snippet around (each) match?
Not a real answer, but too long for a comment.
The code
print "Résumé" =~ /s/u
print "\n"
print "Resume" =~ /s/u
on Windows (Ruby 1.8.6, release 26.) prints:
2
2
And on Linux (ruby 1.8.7 (2009-06-12 patchlevel 174) [i486-linux]) it prints:
3
2
How about using this jindex function I wrote, which corresponds to the other methods in the jcode library:
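require 'jcode'    # provides String#jlength (Ruby 1.8 only)
require 'English'  # provides $PREMATCH
$KCODE = 'u'       # assumed: makes split(//) and jlength work per character

class String
  # Character-based slice: split into characters, slice, re-join.
  def jslice *args
    split(//)[*args].join rescue ""
  end

  # Character index of the first match at or after character offset start,
  # or nil if there is no match.
  def jindex match, start=0
    if match.is_a? String
      match = Regexp.new(Regexp.escape(match))
    end
    if self.jslice(start..-1) =~ match
      $PREMATCH.jlength + start
    else
      nil
    end
  end
end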
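For the snippet part of the question, here is a minimal sketch of the same idea that does not depend on jcode: it counts the characters in the text before each match by scanning that text with /./mu, then slices an array of characters. The names char_index and snippets are hypothetical helpers made up for this illustration, not part of any library, and the sketch assumes the text is valid UTF-8.

```ruby
# Hypothetical helpers for illustration; not part of jcode or ActiveSupport.
# Assumes the text is valid UTF-8.

# Character (not byte) index of the first match, or nil if there is none.
def char_index(text, pattern)
  m = pattern.match(text)
  # /./mu matches one whole UTF-8 character at a time, newlines included.
  m && m.pre_match.scan(/./mu).size
end

# One snippet per match: the match plus up to `radius` characters each side.
def snippets(text, pattern, radius)
  chars = text.scan(/./mu)                # the text as an array of characters
  results = []
  text.scan(pattern) do
    m = Regexp.last_match
    start  = m.pre_match.scan(/./mu).size # characters before the match
    length = m[0].scan(/./mu).size        # characters in the match itself
    from = [start - radius, 0].max        # clamp at the start of the text
    to   = start + length - 1 + radius    # slicing past the end is truncated
    results << chars[from..to].join
  end
  results
end
```

With these definitions, char_index("Résumé", /s/u) and char_index("Resume", /s/u) should give the same index, and snippets("Résumé", /s/u, 1) returns the match with one character on either side. Splitting the whole text into an array costs memory on large inputs, so for long documents it may be better to slice only the region near each match.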