How to write a conditional regex
I want a regex that does one thing if the string has 3 instances of . in it, and something else if it has more than 3 instances.
For example:
aaa.bbb.ccc.ddd // one part of the regex
aaa.bbb.ccc.ddd.eee // the second part of the regex
How do I achieve this in either JS or C#?
Something like
?(\.){4} then THIS else THAT
within the regex...
Update
OK, basically what I'm doing is this:
For any given System.Uri, I want to switch to another subdomain in an extension method.
The problem I came across is that my domains are usually of the form http://subdomain.domain.TLD.TLD/more/url, but sometimes it can be just http://domain.TLD.TLD/more/url (which just points to www).
So this is what I came up with:
public static class UriExtensions
{
    private const string TopLevelDomainRegex = @"(\.[^\.]{2,3}|\.[^\.]{2,3}\.[^\.]{2,3})$";
    private const string UnspecifiedSubdomainRegex = @"^((http[s]?|ftp):\/\/)(()([^:\/\s]+))(:([^\/]*))?((?:\/)?|(?:\/)(((\w+)*\/)([\w\-\.]+[^#?\s]+)(\?([^#]*))?(#(.*))?))?$";
    private const string SpecifiedSubdomainRegex = @"^((http[s]?|ftp):\/\/)(([^.:\/\s]*)[\.]([^:\/\s]+))(:([^\/]*))?((?:\/)?|(?:\/)(((\w+)*\/)([\w\-\.]+[^#?\s]+)(\?([^#]*))?(#(.*))?))?$";

    public static string AbsolutePathToSubdomain(this Uri uri, string subdomain)
    {
        subdomain = subdomain == "www" ? string.Empty : string.Concat(subdomain, ".");
        var replacement = "$1{0}$5$6".FormatWith(subdomain);
        var spec = Regex.Replace(uri.Authority, TopLevelDomainRegex, string.Empty).Distinct().Count(c => c == '.') != 0;
        return Regex.Replace(uri.AbsoluteUri, spec ? SpecifiedSubdomainRegex : UnspecifiedSubdomainRegex, replacement);
    }
}
Basically, with this code I take the System.Uri and:
- Take just the subdomain.domain.TLD.TLD part using the Authority property.
- Match it against "pseudo TLDs" (I'm never going to have a registered domain with 2-3 letters that would break the regex, which basically checks for anything ending in .XX[X] or .XX[X].XX[X]).
- Strip the TLDs, ending up with either domain or subdomain.domain.
- If the resulting data has zero dots, use the UnspecifiedSubdomainRegex, because I couldn't figure out how to use SpecifiedSubdomainRegex to tell it that if it has no dots in that part, it should return string.Empty.
My question, then, is whether there is a way to merge these three regexes into something simpler.
PS: Forget about JavaScript; I was just using it to test the regex on the fly.
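For illustration, the TLD-stripping step from the list above could be sketched in Python roughly like this. The name has_subdomain is hypothetical, and the pattern is the question's TopLevelDomainRegex carried over as-is, so it inherits the same assumption that the registered domain itself is never just 2-3 characters long.

```python
import re

# Same pattern as TopLevelDomainRegex in the question:
# anything ending in .XX[X] or .XX[X].XX[X]
PSEUDO_TLD = re.compile(r"(\.[^.]{2,3}|\.[^.]{2,3}\.[^.]{2,3})$")


def has_subdomain(authority):
    """Hypothetical helper: strip the pseudo TLD, then look for a remaining dot."""
    stripped = PSEUDO_TLD.sub("", authority)
    # "domain" has no dot left, "subdomain.domain" has one.
    return "." in stripped
```

A call such as has_subdomain("subdomain.domain.co.uk") would then pick the "specified" regex, while has_subdomain("domain.co.uk") would pick the "unspecified" one.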
You can do this using the (?(?=condition)then|else) construct. However, this is not available in JavaScript (but it is available in .NET, Perl and PCRE):
^(?(?=(?:[^.]*\.){3}[^.]*$)aaa|eee)
for example, will check if a string contains exactly three dots, and if it does, it tries to match aaa at the start of the string; otherwise it tries to match eee. So it will match the first three letters of
aaa.bbb.ccc.ddd
eee.ddd.ccc.bbb.aaa
eee
but fail on
aaa.bbb.ccc
eee.ddd.ccc.bbb
aaa.bbb.ccc.ddd.eee
Explanation:
^               # Start of string
(?              # Conditional: If the following lookahead succeeds:
 (?=            #   Positive lookahead - can we match...
  (?:           #     the following group, consisting of
   [^.]*\.      #     0+ non-dots and 1 dot
  ){3}          #     3 times
  [^.]*         #     followed only by non-dots...
  $             #     until end-of-string?
 )              #   End of lookahead
 aaa            #   Then try to match aaa
|               # else...
 eee            #   try to match eee
)               # End of conditional
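In engines without lookahead conditionals (JavaScript, and also Python's re module, which only supports conditionals on group references), the same effect can usually be approximated by pairing a positive lookahead with the "then" branch and a negative lookahead with the "else" branch. The following is a minimal Python sketch of that workaround, not the conditional construct itself; the names EXACTLY_THREE_DOTS and first_match are made up for the example.

```python
import re

# The condition from the regex above: exactly three dots in the string.
EXACTLY_THREE_DOTS = r"(?:[^.]*\.){3}[^.]*$"

# Emulates ^(?(?=cond)aaa|eee) as ^(?:(?=cond)aaa|(?!cond)eee)
pattern = re.compile(
    rf"^(?:(?={EXACTLY_THREE_DOTS})aaa|(?!{EXACTLY_THREE_DOTS})eee)"
)


def first_match(text):
    """Return the matched prefix, or None if the pattern does not match."""
    m = pattern.match(text)
    return m.group(0) if m else None


for s in ("aaa.bbb.ccc.ddd", "eee.ddd.ccc.bbb.aaa", "eee",
          "aaa.bbb.ccc", "eee.ddd.ccc.bbb", "aaa.bbb.ccc.ddd.eee"):
    print(s, "->", first_match(s))
```

The first three strings should match and the last three should not, mirroring the lists above. The cost of this form is that the condition is written (and evaluated) twice.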
^(?:[^.]*\.[^.]*){3}$
The regex above matches a string that has exactly 3 dots: http://rubular.com/r/Tsaemvz1Yi
^(?:[^.]*\.[^.]*){4,}$
and this one matches a string that has 4 dots or more: http://rubular.com/r/IJDeQWVhEB
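As a quick check, the two patterns can be combined into a small dispatcher. This is only a sketch in Python; the classify function and its return labels are invented for the example.

```python
import re

exactly_three = re.compile(r"^(?:[^.]*\.[^.]*){3}$")
four_or_more = re.compile(r"^(?:[^.]*\.[^.]*){4,}$")


def classify(text):
    """Hypothetical helper: report which of the two patterns matches."""
    if exactly_three.match(text):
        return "exactly 3 dots"
    if four_or_more.match(text):
        return "4 or more dots"
    return "fewer than 3 dots"


for s in ("aaa.bbb.ccc", "aaa.bbb.ccc.ddd", "aaa.bbb.ccc.ddd.eee"):
    print(s, "->", classify(s))
```

Each branch of the dispatcher is where the "one thing" or the "something else" from the question would go.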
In Python (forgive the language switch, but regexes know no language borders):
import re

regx = re.compile(r'^([^.]*?\.){3}[^.]*?\.')

for ss in ("aaa.bbb.ccc",
           "aaa.bbb.ccc.ddd",
           'aaa.bbb.ccc.ddd.eee',
           'a.b.c.d.e.f.g.h.i...'):
    if regx.search(ss):
        print(ss + ' has at least 4 dots in it')
    else:
        print(ss + ' has a maximum of 3 dots in it')

Result:
aaa.bbb.ccc has a maximum of 3 dots in it
aaa.bbb.ccc.ddd has a maximum of 3 dots in it
aaa.bbb.ccc.ddd.eee has at least 4 dots in it
a.b.c.d.e.f.g.h.i... has at least 4 dots in it
This pattern has no $ anchor, so it doesn't need to analyse the entire string: it can stop as soon as it finds a fourth dot, which is better on long strings.
You don't need Regex for this (as for many other common tasks).
public static string AbsolutePathToSubdomain(this Uri uri, string subdomain)
{
    // Pre-process the new subdomain
    if (subdomain == null || subdomain.Equals("www", StringComparison.CurrentCultureIgnoreCase))
        subdomain = string.Empty;

    // Count number of TLDs (assume at least one)
    List<string> parts = uri.Host.Split('.').ToList();
    int tldCount = 1;
    if (parts.Count >= 2 && parts[parts.Count - 2].Length <= 3)
    {
        tldCount++;
    }

    // Drop all subdomains
    if (parts.Count - tldCount > 1)
        parts.RemoveRange(0, parts.Count - tldCount - 1);

    // Add new subdomain, if applicable
    if (subdomain != string.Empty)
        parts.Insert(0, subdomain);

    // Construct the new URI
    UriBuilder builder = new UriBuilder(uri);
    builder.Host = string.Join(".", parts.ToArray());
    builder.Path = "/";
    builder.Query = "";
    builder.Fragment = "";
    return builder.Uri.ToString();
}
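For readers who want to try the same split-based idea outside .NET, here is a rough Python sketch of the method above. It is an approximation rather than a faithful port: the function name is just a snake_case rendering of the C# one, it relies on urllib.parse instead of UriBuilder, it drops any user-info part of the URL, and it keeps the same "a second-to-last label of 3 characters or fewer is part of the TLD" heuristic.

```python
from urllib.parse import urlsplit, urlunsplit


def absolute_path_to_subdomain(url, subdomain):
    """Swap the subdomain of url, keeping only scheme, host and port."""
    # Treat None and "www" as "no subdomain", as the C# version does.
    if subdomain is None or subdomain.lower() == "www":
        subdomain = ""

    split = urlsplit(url)
    labels = split.hostname.split(".")

    # Count TLD labels (assume at least one); same heuristic as above.
    tld_count = 1
    if len(labels) >= 2 and len(labels[-2]) <= 3:
        tld_count += 1

    # Drop all existing subdomains: keep the domain label plus the TLD labels.
    keep = tld_count + 1
    if len(labels) > keep:
        labels = labels[-keep:]

    # Add the new subdomain, if applicable.
    if subdomain:
        labels.insert(0, subdomain)

    host = ".".join(labels)
    if split.port is not None:
        host = f"{host}:{split.port}"

    # Like the C# code, reset path, query and fragment.
    return urlunsplit((split.scheme, host, "/", "", ""))
```

For example, absolute_path_to_subdomain("http://sub.domain.co.uk/more/url", "blog") replaces sub with blog, and passing "www" removes the subdomain altogether.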