C# regex - matching optionals after a named group
I'm sure this has been asked numerous times, but although I've checked all the similar questions, I couldn't come up with a solution.
The problem is that I have input URLs similar to:
- http://www.justin.tv/peacefuljay
- http://www.justin.tv/peacefuljay#/w/778713616/3
- http://de.justin.tv/peacefuljay#/w/778713616/3
I want to match the slug part of each URL (in the above examples, it's peacefuljay).
The regexes I've tried so far are:
http://.*\.justin\.tv/(?<Slug>.*)(?:#.)?
http://.*\.justin\.tv/(?<Slug>.*)(?:#.)
But I can't come up with a solution. Each one fails either on the first URL or on the others.
Any help is appreciated.
The easiest way to parse a URI is to use the Uri class:
string justin = "http://www.justin.tv/peacefuljay#/w/778713616/3";
Uri uri = new Uri(justin);
string s1 = uri.LocalPath; // "/peacefuljay"
string s2 = uri.Segments[1]; // "peacefuljay"
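As a minimal sketch, the Uri approach above could be wrapped in a small helper that also copes with input that is not a valid absolute URL or has no path. The names UriSlug and GetSlug are hypothetical, chosen only for this illustration.

```csharp
using System;

public static class UriSlug
{
    // Hypothetical helper: returns the first path segment of an absolute URL,
    // or null when the input is not an absolute URL or has no path segment.
    public static string GetSlug(string url)
    {
        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
            return null;

        // For "http://www.justin.tv/peacefuljay#/w/778713616/3" the segments are
        // { "/", "peacefuljay" }; the fragment after '#' is not part of the path.
        if (uri.Segments.Length < 2)
            return null;

        // With deeper paths the segment keeps its trailing '/', e.g. "peacefuljay/".
        return uri.Segments[1].TrimEnd('/');
    }
}
```

Calling UriSlug.GetSlug on any of the three URLs from the question should give "peacefuljay", since the fragment never reaches the path.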
If you insist on a regex, you can try something a bit more specific:
Match mate = Regex.Match(str, @"http://(\w+\.)*justin\.tv(?:/(?<Slug>[^#]*))?");
- (\w+\.)* : Ensures you match the domain, not anywhere else in the string (e.g., hash or query string).
- (?:/(?<Slug>[^#]*))? : Optional group with the string you need. [^#] limits the characters you expect to see in your slug, so it should eliminate the need for the extra group after it.
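To make the pattern above easier to try out, here is a small sketch that applies it and reads the named group. The class name RegexSlug and its GetSlug method are assumptions made for this example; the pattern itself is the one from the answer.

```csharp
using System.Text.RegularExpressions;

public static class RegexSlug
{
    // Same pattern as in the answer above.
    private static readonly Regex SlugPattern =
        new Regex(@"http://(\w+\.)*justin\.tv(?:/(?<Slug>[^#]*))?");

    // Hypothetical helper: returns the captured slug, or null when the URL
    // does not match or the optional group did not take part in the match.
    public static string GetSlug(string url)
    {
        Match match = SlugPattern.Match(url);
        Group slug = match.Groups["Slug"];
        return slug.Success ? slug.Value : null;
    }
}
```

Because the optional group may not participate in the match, checking Group.Success distinguishes "no slug at all" (as in http://www.justin.tv) from an empty slug. Note that [^#]* stops only at '#', so a deeper path such as /peacefuljay/videos would be captured whole.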
As I see it, there's no reason to deal with the parts after the "slug" at all.
Therefore you only need to match all characters after the host that aren't "/" or "#".
http://.*\.justin\.tv/(?<Slug>[^/#]+)
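A short sketch of how this pattern behaves, assuming a hypothetical HostSlug.GetSlug wrapper around Regex.Match:

```csharp
using System.Text.RegularExpressions;

public static class HostSlug
{
    // Hypothetical wrapper around the pattern above: the slug is every
    // character after the host up to the first '/' or '#'.
    public static string GetSlug(string url)
    {
        Match match = Regex.Match(url, @"http://.*\.justin\.tv/(?<Slug>[^/#]+)");
        return match.Success ? match.Groups["Slug"].Value : null;
    }
}
```

Unlike [^#]*, the [^/#]+ class also stops at the next '/', so /peacefuljay/videos yields just peacefuljay. One thing to be aware of: the leading .*\. requires a dot before justin, so a URL without a subdomain (http://justin.tv/peacefuljay) does not match this pattern.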
http://.*\.justin\.tv/(?<Slug>[^#]*)#?
or
http://.*\.justin\.tv/(?<Slug>.*?)(#|$)