
Getting some elements in a string using a regex

Context


Using Ruby I am parsing strings looking like this:

A type with an ID...

[Image=4b5da003ee133e8368000002]
[Video=679hfpam9v56dh800khfdd32]

...with between 0 and n additional options separated with @...

[Image=4b5da003ee133e8368000002@size:small]
[Image=4b5da003ee133e8368000002@size:small@media:true]

In this example:

[Image=4b5da003ee133e8368000002@size:small@media:true]

I want to retrieve:

  1. [Image=4b5da003ee133e8368000002@size:small@media:true]
  2. Image
  3. 4b5da003ee133e8368000002
  4. size:small
  5. media:true

Problem


Right now using this regex:

(\[([a-zA-Z]+)=([a-zA-Z0-9]+)(@[a-zA-Z]+:[a-zA-Z]+)*\])

I get...

  1. [Image=4b5da003ee133e8368000002@size:small@media:true]
  2. Image
  3. 4b5da003ee133e8368000002
  4. @media:true

What am I doing wrong? How can I get what I want?

PS: All the results are copied from http://rubular.com/, which is handy for debugging regexes. Please use it if it helps you help me :)


Edit : if it's impossible to get all options separated, how could I get this:

  1. [Image=4b5da003ee133e8368000002@size:small@media:true]
  2. Image
  3. 4b5da003ee133e8368000002
  4. @size:small@media:true


Edit:

Ruby's regex engine, like most others, keeps only the last capture of a repeated group (.NET, which exposes every capture, is a notable exception). Therefore you'll have to do it in two steps: first capture all the @key:value options as one string, then split that string.

To get all of them, this should work:

(\[([a-zA-Z]+)=([a-zA-Z0-9]+)((?:@[a-zA-Z]+:[a-zA-Z]+)*)\])
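
To see the difference between the question's regex and the one above, here is a minimal sketch (the variable names are only illustrative). It compares what group 4 holds in each case:

```ruby
str = "[Image=4b5da003ee133e8368000002@size:small@media:true]"

# Question's pattern: group 4 is itself repeated, so each iteration overwrites it.
repeated = /(\[([a-zA-Z]+)=([a-zA-Z0-9]+)(@[a-zA-Z]+:[a-zA-Z]+)*\])/

# Pattern above: group 4 wraps the whole repetition; the inner group is non-capturing.
wrapped = /(\[([a-zA-Z]+)=([a-zA-Z0-9]+)((?:@[a-zA-Z]+:[a-zA-Z]+)*)\])/

last_only = str.match(repeated)[4]
tail      = str.match(wrapped)[4]

puts last_only  # @media:true
puts tail       # @size:small@media:true
```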


To get the "tail" of options, you could fetch it from $4 with

/(\[([a-zA-Z]+)=([a-zA-Z0-9]+)((@[a-zA-Z]+:[a-zA-Z]+)*)\])/

and then split on at-signs.

For example:

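#! /usr/bin/ruby

str = "[Image=4b5da003ee133e8368000002@size:small@media:true]"
if /(\[([a-zA-Z]+)=([a-zA-Z0-9]+)((@[a-zA-Z]+:[a-zA-Z]+)*)\])/.match(str)
  # $1 = whole tag, $2 = type, $3 = ID, $4 = option tail ("@size:small@media:true")
  puts $1, $2, $3, $4

  # scan is used instead of $4[1..-1].split(/@/): with no options $4 is "",
  # $4[1..-1] is nil, and calling split on nil raises NoMethodError.
  $4.scan(/[^@]+/).each do |s|
    puts s
  end
end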

Output:

[Image=4b5da003ee133e8368000002@size:small@media:true]
Image
4b5da003ee133e8368000002
@size:small@media:true
size:small
media:true


(\[([a-zA-Z]+)=([a-zA-Z0-9]+)(?:@([a-zA-Z]+:[a-zA-Z]+))*\])

will give you media:true. Note that media:true is overwriting the previous size:small match. I don't think there's a way to get exactly what you want in a single match call.
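
One possible workaround is sketched below. It is not taken from the answer above and its variable names are illustrative. It matches the tag once, then runs String#scan over the option tail, because scan returns every match rather than only the last one:

```ruby
# Group 3 captures the whole option tail; the repeated inner group is non-capturing.
tag_re = /\[([a-zA-Z]+)=([a-zA-Z0-9]+)((?:@[a-zA-Z]+:[a-zA-Z]+)*)\]/

str = "[Image=4b5da003ee133e8368000002@size:small@media:true]"
m = tag_re.match(str)
type, id, tail = m.captures

# scan with a single capture group returns [["size:small"], ["media:true"]].
options = tail.scan(/@([a-zA-Z]+:[a-zA-Z]+)/).flatten

result = [m[0], type, id, *options]
puts result
```

This prints the five values listed in the question, one per line. A tag without options gives an empty options array.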


It looks like the regex only keeps the last match. I think getting the full list of matches will require a different approach, for example:

"a=b@c:d@e:f".split(/=|@/)

which creates a list:

["a", "b", "c:d", "e:f"]

which is close to what you want...
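
Applied to one of the question's tags, the split idea might look like the sketch below. It assumes the string is exactly one well-formed tag, since split performs no validation, and the variable names are only illustrative:

```ruby
str = "[Image=4b5da003ee133e8368000002@size:small@media:true]"

# Drop the surrounding brackets, then split on "=" or "@".
type, id, *options = str[1..-2].split(/=|@/)

# Optionally turn each "key:value" string into a Hash entry.
option_hash = options.map { |pair| pair.split(":", 2) }.to_h

p type     # "Image"
p id       # "4b5da003ee133e8368000002"
p options  # ["size:small", "media:true"]
```

Here option_hash maps "size" to "small" and "media" to "true". All values stay strings.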


Although it can be tricky to do it purely within a regexp, it's not too hard to split it out as a two-step operation:

while (line = DATA.gets)
  line.chomp!

  if (m = line.match(/\[([a-zA-Z]+)=([a-zA-Z0-9]+)((?:@[a-zA-Z]+:[a-zA-Z]+)*)\]/))
    (type, hash, options) = m.to_a[1, 3]
    options = options.split(/@/).reject { |s| s.empty? }
    puts [ type, hash, options.join(',') ].join(' / ')
  end
end

__END__
[Image=4b5da003ee133e8368000002]
[Video=679hfpam9v56dh800khfdd32]
[Image=4b5da003ee133e8368000002@size:small]
[Image=4b5da003ee133e8368000002@size:small@media:true]
[Image=4b5da003ee133e8368000002@size:small@media:true@foo:bar]

This produces the output:

Image / 4b5da003ee133e8368000002 / 
Video / 679hfpam9v56dh800khfdd32 / 
Image / 4b5da003ee133e8368000002 / size:small
Image / 4b5da003ee133e8368000002 / size:small,media:true
Image / 4b5da003ee133e8368000002 / size:small,media:true,foo:bar
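
If the tags are embedded in longer text rather than sitting one per line, the same two-step idea could be wrapped in a helper. This is only a sketch: parse_tags is a hypothetical name, and the sample text is made up from the tags in the question.

```ruby
# Hypothetical helper: returns one Hash per tag found anywhere in the text.
def parse_tags(text)
  tag_re = /\[([a-zA-Z]+)=([a-zA-Z0-9]+)((?:@[a-zA-Z]+:[a-zA-Z]+)*)\]/
  # With capture groups, scan yields one [type, id, tail] array per tag.
  text.scan(tag_re).map do |type, id, tail|
    { type: type, id: id, options: tail.split("@").reject(&:empty?) }
  end
end

text = "intro [Video=679hfpam9v56dh800khfdd32] and " \
       "[Image=4b5da003ee133e8368000002@size:small@media:true]"
tags = parse_tags(text)

tags.each do |t|
  puts [t[:type], t[:id], t[:options].join(",")].join(" / ")
end
```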
