
Regex - repeating matches

Another regexp question

I have input text like this:

test start first end start second end start third end

and I need matches like this:

test first
test second 
test third

I've tried something like this:

start(.*?)end

but how to add "test"?

Thanks for any suggestion

Lennyd

(edited - there was a mistake in the input text)


Using another programming language is not an option; it has to be regexp only. I need this to parse a web page whose syntax looks (in part) like this:

Season 1
    Episode 1
    Episode 2
    Episode 3
Season 2
    Episode 1
    Episode 2
...etc

and with this regexp I need output like:


<episodeslist>
  <episode season="1" episode="1">
  <episode season="1" episode="2">
.. etc

In more detail: it is for an xbmc.org media scraper.


Am I the only one who didn't understand what lennyd wants in the first example?

Now for this one

input

Season 1
  Episode 1
  Episode 2
  Episode 3

output

<episodeslist>
  <episode season="1" episode="1">
  <episode season="1" episode="2">

assuming you're using a regex tool that can match across multiple lines

catch
/Season[^0-9]*([0-9]+)[^\n]*[\s]+Episode[^0-9]*([0-9]+)\n/gs
append one `[\s]+Episode[^0-9]*([0-9]+)\n` group per additional episode, as many as needed

return

<list>
<episode season="$1" episode="$2">
<episode season="$1" episode="$3">
<episode season="$1" episode="$4">
<episode season="$1" episode="$5">

I'm just not sure about `[^\n]`; use `[^E]` instead if the input is really that clean.

If the number of episodes varies between 24 and 26, just run three regexes, one for each episode count.
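
As a rough illustration of the fixed-count approach above, here is a minimal Python sketch for a season with exactly three episodes. Python itself, the names `PATTERN`, `TEMPLATE` and `convert_three_episode_season`, and the sample input are assumptions made for illustration; in a regex-only tool the same pattern and replacement would simply be entered directly.

```python
import re

# One season header followed by exactly three episode lines (fixed count).
PATTERN = re.compile(
    r"Season[^0-9]*([0-9]+)[^\n]*"
    r"\s+Episode[^0-9]*([0-9]+)\n"
    r"\s+Episode[^0-9]*([0-9]+)\n"
    r"\s+Episode[^0-9]*([0-9]+)\n"
)

# \1 is the season number, \2..\4 are the episode numbers.
TEMPLATE = (
    r'<episode season="\1" episode="\2">\n'
    r'<episode season="\1" episode="\3">\n'
    r'<episode season="\1" episode="\4">\n'
)

def convert_three_episode_season(text):
    """Replace each three-episode season block with its <episode> tags."""
    return PATTERN.sub(TEMPLATE, text)

sample = "Season 1\n  Episode 1\n  Episode 2\n  Episode 3\n"
print(convert_three_episode_season(sample), end="")
```

A season with a different number of episodes is left untouched by this pattern, which is why a separate regex per episode count is needed.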

If you want something more flexible, you'll need a more powerful tool, such as grep on Linux or one of its GUI clones for other operating systems, that can do "regex inside regex".

If it's a scripting language calling regex functions, you could easily wrap the following in a loop that runs until the input no longer matches anything:
{

1 - Match only `Season[^0-9]*([0-9]+)`, strip it off the input, store the season # in a variable,  
2 - Match a block of episodes `([\s]+Episode[^0-9]*[0-9]+\n)+`  
3 - Then inside that block match single lines `[\s]+Episode[^0-9]*[0-9]+`  
4 - Using the season variable, output the appropriate XML  

}
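
The numbered steps above could be sketched in Python roughly as follows. This is only an illustration under a few assumptions: a scripting language is available at all, the function name `episode_tags` is made up, `finditer` walks the input instead of stripping matched text off it, and `[^0-9\n]` is used in place of `[^0-9]` so that a match cannot run past the end of a line.

```python
import re

# Steps 1-2: a season header, then the block of episode lines that follows it.
SEASON_BLOCK = re.compile(
    r"Season[^0-9\n]*([0-9]+)[^\n]*\n"
    r"((?:\s*Episode[^0-9\n]*[0-9]+[^\n]*\n?)+)"
)
# Step 3: a single episode line inside such a block.
EPISODE = re.compile(r"Episode[^0-9\n]*([0-9]+)")

def episode_tags(text):
    """Return one <episode> tag per episode line, in input order."""
    tags = []
    for block in SEASON_BLOCK.finditer(text):
        season = block.group(1)  # the stored season number
        for episode in EPISODE.finditer(block.group(2)):
            # Step 4: combine the season number with each episode number.
            tags.append(
                f'<episode season="{season}" episode="{episode.group(1)}">'
            )
    return tags

page = (
    "Season 1\n    Episode 1\n    Episode 2\n    Episode 3\n"
    "Season 2\n    Episode 1\n    Episode 2\n"
)
print("<episodeslist>")
for tag in episode_tags(page):
    print("  " + tag)
```

Because the inner loop runs once per episode line, the number of episodes per season does not have to be known in advance.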


A very primitive regex will be:

echo "test start first end start second end test third end" |
     perl -ne 'print "$1 -> $2\n" while (/(\w+).*?(\w+) end/g);'
test -> first
start -> second
test -> third

but I agree with Alan Moore that your sample output is a bit weird.
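
If the edited input from the question is taken literally ("test" appears once, followed by several start ... end pairs), one possible reading is that the leading word should be repeated in front of every match. A minimal Python sketch of that reading follows; it assumes a host language is available, which the question rules out, and `prefixed_matches` is a hypothetical helper name.

```python
import re

def prefixed_matches(text):
    """Pair the leading word with every 'start ... end' capture after it."""
    head = re.match(r"\s*(\w+)", text)
    if head is None:
        return []
    prefix = head.group(1)
    rest = text[head.end():]
    # Same lazy capture as start(.*?)end, with word boundaries added.
    captures = re.findall(r"\bstart\b(.*?)\bend\b", rest)
    return [f"{prefix} {capture.strip()}" for capture in captures]

line = "test start first end start second end start third end"
for match in prefixed_matches(line):
    print(match)
```

The prefix is captured in a separate step because a single `start(.*?)end` match only sees the text between its own delimiters.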
