RegEx: Matching Pattern within Pattern - I think I need to use Positive Lookbehinds?
I'm trying to use RegEx to find a pattern within a pattern. Specifically, I want to capture a URL into a group, then search within that group for everything that comes after the last = sign and capture that as well.
So given this string
<a href="http://my.domain.com/?s_cid=EM&s_ev9=CMC21892&s_ev10=EM_CMC21892_LC_stuff" style="color: #365EBF:">stuff</a>
I would initially find
href="http://my.domain.com/?s_cid=EM&s_ev9=CMC21892&s_ev10=EM_CMC21892_LC_stuff"
Using this RegEx: href="(https?[^"]*)"
From there, looking only at the captured group, I could pull out the part I'm actually after, EM_CMC21892_LC_stuff,
with this: =[^"=]*$
I am having no success though when I try to combine the two to accomplish it in one RegEx.
Any thoughts?
He's right, using regexes to parse HTML is just asking for trouble.
That said, try: href="http[^"]+=([^"]+?)"
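Since the question doesn't name a language, here is a minimal sketch of that pattern in Python (the language choice and the helper name last_param_value are assumptions, not part of the answer). The greedy [^"]+ first runs to the closing quote and then backtracks to the last = sign, so the group holds only what follows it.

```python
import re
from typing import Optional

# Pattern taken verbatim from the answer above.
# Greedy [^"]+ backtracks to the LAST "=" before the closing quote.
PATTERN = re.compile(r'href="http[^"]+=([^"]+?)"')

SAMPLE = (
    '<a href="http://my.domain.com/?s_cid=EM&s_ev9=CMC21892'
    '&s_ev10=EM_CMC21892_LC_stuff" style="color: #365EBF:">stuff</a>'
)


def last_param_value(html: str) -> Optional[str]:
    """Return the text after the last '=' in the first matching href, or None."""
    match = PATTERN.search(html)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(last_param_value(SAMPLE))
```

Note that an href ending in = (an empty last value) would not match, because the group requires at least one character.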
I agree with Mark Byers' comment about using existing HTML/URL parsing functions instead of regex (though you didn't specify which language you are using, so we can't really help with that).
However, if you insist on doing it the regex way, here is a pattern:
/href="([^"]*=([^"]*))"/
Edit to add: here is what the result would look like. I wasn't sure whether you still wanted to capture the full URL or just that last parameter value, but this pattern captures both:
Array
(
    [0] => Array
        (
            [0] => href="http://my.domain.com/?s_cid=EM&s_ev9=CMC21892&s_ev10=EM_CMC21892_LC_stuff"
        )

    [1] => Array
        (
            [0] => http://my.domain.com/?s_cid=EM&s_ev9=CMC21892&s_ev10=EM_CMC21892_LC_stuff
        )

    [2] => Array
        (
            [0] => EM_CMC21892_LC_stuff
        )

)
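For the parser-based route recommended in both answers, a rough sketch using only Python's standard library might look like the following. Python itself is an assumption (the question names no language), and HrefCollector and last_query_values are hypothetical names made up for this illustration. It reads the href from each <a> tag with html.parser and takes the value of the last query parameter with urllib.parse.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlsplit

SAMPLE = (
    '<a href="http://my.domain.com/?s_cid=EM&s_ev9=CMC21892'
    '&s_ev10=EM_CMC21892_LC_stuff" style="color: #365EBF:">stuff</a>'
)


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def last_query_values(html):
    """Return (url, value of the last query parameter) for each link."""
    collector = HrefCollector()
    collector.feed(html)
    results = []
    for href in collector.hrefs:
        # parse_qsl keeps parameters in their original order.
        pairs = parse_qsl(urlsplit(href).query, keep_blank_values=True)
        if pairs:
            results.append((href, pairs[-1][1]))
    return results


if __name__ == "__main__":
    for url, value in last_query_values(SAMPLE):
        print(url)
        print(value)
```

This is not strictly the same as "everything after the last = sign": parse_qsl percent-decodes values, and a value that itself contains = is returned whole. For the URL in the question the two approaches give the same result.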