Regex - repeating matches
Another regexp question
I have input text like this:
test start first end start second end start third end
and I need matches like this:
test first
test second
test third
I've tried something like this:
start(.*?)end
but how to add "test"?
Thanks for any suggestion
Lennyd
(edited - there was a mistake in the input text)
There is no chance to use another programming language; it has to be just a regexp. I need this for parsing a web page whose syntax looks (in part) like this:
Season 1 Episode 1 Episode 2 Episode 3 Season 2 Episode 1 Episode 2 ...etc
and with this regexp I need output like
<episodeslist>
<episode season="1" episode="1">
<episode season="1" episode="2">
.. etc
.. in more detail - it is for the xbmc.org media scraper
Am I the only one who didn't understand what lennyd wants in the first example?
Now for this one
input
Season 1
Episode 1
Episode 2
Episode 3
output
<episodeslist>
<episode season="1" episode="1">
<episode season="1" episode="2">
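assuming you're using a regex tool that can match across multiple lines
catch
/Season[^0-9]*([0-9]+)[^\n]*[\s]+Episode[^0-9]*([0-9]+)/gs
add as many [\s]+Episode[^0-9]*([0-9]+)
as needed, one per episode (the leading [\s]+ already consumes the line break, so a trailing \n after each unit would stop the next unit from matching on directly consecutive lines)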
return
<list>
<episode season=$1 episode=$2>
<episode season=$1 episode=$3>
<episode season=$1 episode=$4>
<episode season=$1 episode=$5>
just not sure about [^\n]; use [^E] if the input is really that clean
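As a rough illustration of the fixed-count approach, here is a minimal Python sketch. It assumes a scripting language is available for testing the pattern, a sample input with exactly three episode lines, and the unquoted-close output format from the question; the names EPISODE, pattern and render are made up for this example. Each Episode unit is written without a trailing \n so that the \s+ of the next unit can consume the line break.

```python
import re

# Sample input assumed for illustration: one season header, three episode lines.
text = "Season 1\nEpisode 1\nEpisode 2\nEpisode 3\n"

# One repeated unit per expected episode (three here).
EPISODE = r"\s+Episode[^0-9]*([0-9]+)"
pattern = re.compile(r"Season[^0-9]*([0-9]+)[^\n]*" + EPISODE * 3)


def render(match):
    """Turn the captured groups into the tag list shown in the question."""
    season, *episodes = match.groups()
    lines = ["<episodeslist>"]
    for ep in episodes:
        lines.append(f'<episode season="{season}" episode="{ep}">')
    return "\n".join(lines)


m = pattern.search(text)
output = render(m) if m else ""
print(output)
```

The obvious limitation is the one already mentioned: the number of capture groups is fixed, so a season with a different episode count needs its own pattern.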
If the number of episodes varies between 24 and 26, just run 3 regexes, one per episode count.
If you want something more flexible, you'll need a more powerful tool, like grep on Linux or one of its GUI clones for other operating systems, that can run a "regex inside a regex".
If it's a scripting language with regex functions, you could easily wrap the following in a loop that runs until the input no longer matches anything:
{
1 - Match only `Season[^0-9]*([0-9]+)`, strip it off the input, store the season # in a variable,
2 - Match a block of episodes `([\s]+Episode[^0-9]*[0-9]+)+` (no trailing `\n`, so the group can repeat on directly consecutive lines)
3 - Then inside that block match single lines `[\s]+Episode[^0-9]*[0-9]+`
4 - Using the season variable, output the appropriate XML
}
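The loop above could be sketched roughly as follows in Python. This is only one possible reading of the four steps, under the assumption that each Season header is followed directly by its Episode lines; the sample input and the names SEASON, BLOCK, EPISODE and episodes_to_xml are invented for illustration.

```python
import re

# Sample input assumed for illustration.
text = """Season 1
Episode 1
Episode 2
Episode 3
Season 2
Episode 1
Episode 2
"""

SEASON = re.compile(r"Season[^0-9]*([0-9]+)")
# A run of episode lines; no trailing \n so the group repeats on consecutive lines.
BLOCK = re.compile(r"(?:\s+Episode[^0-9]*[0-9]+)+")
EPISODE = re.compile(r"\s+Episode[^0-9]*([0-9]+)")


def episodes_to_xml(source):
    lines = ["<episodeslist>"]
    rest = source
    while True:
        season_match = SEASON.search(rest)
        if season_match is None:
            break  # input no longer matches anything
        season = season_match.group(1)        # step 1: remember the season number
        rest = rest[season_match.end():]      # ...and strip the header off the input
        block_match = BLOCK.match(rest)       # step 2: episode block right after it
        if block_match is None:
            continue  # season header without episodes
        for ep in EPISODE.finditer(block_match.group(0)):  # step 3: single lines
            # step 4: emit one tag per episode using the stored season
            lines.append(f'<episode season="{season}" episode="{ep.group(1)}">')
        rest = rest[block_match.end():]
    return "\n".join(lines)


result = episodes_to_xml(text)
print(result)
```

Because the season number lives in a variable rather than a capture group, the number of episodes per season no longer has to be known in advance.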
A very primitive regex will be:
echo "test start first end start second end test third end" |
perl -ne 'print "$1 -> $2\n" while (/(\w+).*?(\w+) end/g);'
test -> first
start -> second
test -> third
but I agree with Alan Moore that your sample output is a bit weird.
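For the edited input from the question (test start first end start second end start third end), one way to get the requested "test first / test second / test third" lines is to capture the leading word once and then iterate over the start ... end pairs. The Python sketch below assumes that a scripting language is available after all, and that the prefix is simply the first word of the input; both are assumptions, not something the question confirms.

```python
import re

text = "test start first end start second end start third end"

# Assumption: the prefix to repeat is the first word of the input.
prefix = re.match(r"\w+", text).group(0)

# The asker's own non-greedy pattern, with the surrounding whitespace kept
# outside the capture group so the captured words come out trimmed.
inner_words = re.findall(r"start\s+(.*?)\s+end", text)

results = [f"{prefix} {word}" for word in inner_words]
print("\n".join(results))
```

A single global regex cannot repeat a group that was captured only once at the beginning of the text, which is why the prefix is pulled out in a separate step here.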