Getting some elements in a string using a regex
Context
Using Ruby I am parsing strings looking like this:
A type with an ID...
[Image=4b5da003ee133e8368000002]
[Video=679hfpam9v56dh800khfdd32]
...with between 0 and n additional options separated with @...
[Image=4b5da003ee133e8368000002@size:small]
[Image=4b5da003ee133e8368000002@size:small@media:true]
In this example:
[Image=4b5da003ee133e8368000002@size:small@media:true]
I want to retrieve:
- [Image=4b5da003ee133e8368000002@size:small@media:true]
- Image
- 4b5da003ee133e8368000002
- size:small
- media:true
Problem
Right now using this regex:
(\[([a-zA-Z]+)=([a-zA-Z0-9]+)(@[a-zA-Z]+:[a-zA-Z]+)*\])
I get...
- [Image=4b5da003ee133e8368000002@size:small@media:true]
- Image
- 4b5da003ee133e8368000002
- @media:true
What am I doing wrong? How can I get what I want?
PS: All the results are copied from http://rubular.com/, which is handy for debugging regexes. Please use it if it can help you help me :)
Edit : if it's impossible to get all options separated, how could I get this:
- [Image=4b5da003ee133e8368000002@size:small@media:true]
- Image
- 4b5da003ee133e8368000002
- @size:small@media:true
Edit:
Ruby's regex engine, like most others, keeps only the last capture when a capturing group is repeated. Therefore, you'll have to do it in two steps: first capture all the @*:* options as one string, then split that string.
To get all of them, this should work:
(\[([a-zA-Z]+)=([a-zA-Z0-9]+)((?:@[a-zA-Z]+:[a-zA-Z]+)*)\])
To get the "tail" of options, you could fetch it from $4 with
/(\[([a-zA-Z]+)=([a-zA-Z0-9]+)((@[a-zA-Z]+:[a-zA-Z]+)*)\])/
and then split it on at-signs.
For example:
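#!/usr/bin/ruby
str = "[Image=4b5da003ee133e8368000002@size:small@media:true]"
if /(\[([a-zA-Z]+)=([a-zA-Z0-9]+)((@[a-zA-Z]+:[a-zA-Z]+)*)\])/.match(str)
  # $1 = whole tag, $2 = type, $3 = ID, $4 = option tail ("@size:small@media:true")
  puts $1, $2, $3, $4
  # Splitting on "@" leaves an empty leading element; drop it.
  # (Unlike $4[1..-1], this also works when there are no options.)
  $4.split(/@/).reject(&:empty?).each do |s|
    puts s
  end
end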
Output:
[Image=4b5da003ee133e8368000002@size:small@media:true]
Image
4b5da003ee133e8368000002
@size:small@media:true
size:small
media:true
(\[([a-zA-Z]+)=([a-zA-Z0-9]+)(?:@([a-zA-Z]+:[a-zA-Z]+))*\])
will give you media:true. Note that media:true is overwriting the previous size:small match. I don't think there's a way to get exactly what you want in a single match call.
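To make the overwriting visible, here is a minimal sketch that runs the regex above and then collects the options with a second call. Using String#scan for the second step is my own suggestion rather than part of the answer above, and the variable names are only illustrative.

```ruby
str = "[Image=4b5da003ee133e8368000002@size:small@media:true]"

# A repeated capture group keeps only the text from its final iteration,
# so group 3 ends up holding the last option only.
m = str.match(/\[([a-zA-Z]+)=([a-zA-Z0-9]+)(?:@([a-zA-Z]+:[a-zA-Z]+))*\]/)
last_option = m[3]

# String#scan returns every match of the pattern. With one capture group,
# each match is a one-element array, so flatten yields a list of strings.
all_options = str.scan(/@([a-zA-Z]+:[a-zA-Z]+)/).flatten

puts last_option
puts all_options.inspect
```

Note that the scan here does not check that the options sit inside a well-formed tag; it simply picks up every @key:value it finds in the string.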
It looks like the regex only keeps the last match. I think getting the full list of matches will require a different approach, for example:
"a=b@c:d@e:f".split(/=|@/)
which creates a list:
["a", "b", "c:d", "e:f"]
which is close to what you want...
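Applied to one of the bracketed strings from the question, the same split might look like the sketch below. It assumes the input is already known to be a single well-formed tag, since split does no validation; stripping the brackets with a slice is my own addition.

```ruby
str = "[Image=4b5da003ee133e8368000002@size:small@media:true]"

# Drop the surrounding "[" and "]", then split on "=" or "@".
# The splat collects everything after the first two fields.
type, id, *options = str[1..-2].split(/=|@/)

puts type
puts id
puts options.inspect
```

If there are no options, the splat simply leaves options as an empty array.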
Although it can be tricky to do it purely within a regexp, it's not too hard to split it out as a two-step operation:
while (line = DATA.gets)
  line.chomp!
  if (m = line.match(/\[([a-zA-Z]+)=([a-zA-Z0-9]+)((?:@[a-zA-Z]+:[a-zA-Z]+)*)\]/))
    (type, hash, options) = m.to_a[1, 3]
    options = options.split(/@/).reject { |s| s.empty? }
    puts [ type, hash, options.join(',') ].join(' / ')
  end
end
__END__
[Image=4b5da003ee133e8368000002]
[Video=679hfpam9v56dh800khfdd32]
[Image=4b5da003ee133e8368000002@size:small]
[Image=4b5da003ee133e8368000002@size:small@media:true]
[Image=4b5da003ee133e8368000002@size:small@media:true@foo:bar]
This produces the output:
Image / 4b5da003ee133e8368000002 /
Video / 679hfpam9v56dh800khfdd32 /
Image / 4b5da003ee133e8368000002 / size:small
Image / 4b5da003ee133e8368000002 / size:small,media:true
Image / 4b5da003ee133e8368000002 / size:small,media:true,foo:bar
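If you need the options as key/value pairs rather than a joined string, the same two-step idea can be wrapped in a small helper. This is only a sketch: parse_tag and TAG are hypothetical names, and returning a Hash is an assumption about what would be convenient for the caller.

```ruby
# Same pattern as above: type, ID, then the whole option tail in one group.
TAG = /\[([a-zA-Z]+)=([a-zA-Z0-9]+)((?:@[a-zA-Z]+:[a-zA-Z]+)*)\]/

# Hypothetical helper: returns nil when the line contains no tag.
def parse_tag(line)
  m = line.match(TAG)
  return nil unless m

  type, id, tail = m.captures
  # tail looks like "@size:small@media:true" (or "" with no options).
  # scan with two groups yields [key, value] pairs, which to_h turns into a Hash.
  options = tail.scan(/@([a-zA-Z]+):([a-zA-Z]+)/).to_h
  { type: type, id: id, options: options }
end

p parse_tag("[Image=4b5da003ee133e8368000002@size:small@media:true]")
p parse_tag("[Video=679hfpam9v56dh800khfdd32]")
```

A Hash drops duplicate keys (the last one wins), so keep the array form if the same option can legitimately appear twice.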