Regular Expression to match subdomains of a particular domain, with no path
I want a regex to find the following types of strings:
- http://anything.abc.tld
- http://anything.abc.tld/
where:
abc -> always remains abc
anything -> could be any string
tld -> could be any TLD (top-level domain), such as .com, .net, .co.in, .co.uk, etc.
Note: The URL must not contain anything after the domain other than an optional trailing slash, so http://anything.abc.tld/xyz is not acceptable.
Note: Because the list of TLDs is long and I might still forget to include some, I don't want to enumerate each TLD in the regex. Instead, I would like the regex to check only the following for the TLD:
- After abc, there is a period (.)
- After the period, there is at least one character
There are quite a lot of TLDs, and their number is growing. You could use
^http://[\w.-]+\.abc\.(com|net|co\.in|.... )/?$
But this list would have to be maintained regularly.
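To make the maintenance cost concrete, here is a minimal Python sketch that builds the alternation from a list instead of typing it by hand. `KNOWN_TLDS` and `build_pattern` are hypothetical names, and the list is deliberately incomplete; any suffix missing from it is rejected.

```python
import re

# Hypothetical, deliberately incomplete list of suffixes: this is exactly
# the list that would need regular maintenance.
KNOWN_TLDS = ["com", "net", "co.in", "co.uk"]


def build_pattern(tlds):
    # re.escape turns the dot in "co.uk" into a literal "\." so it cannot
    # match an arbitrary character; longest entries are listed first.
    alternation = "|".join(
        re.escape(t) for t in sorted(tlds, key=len, reverse=True)
    )
    return re.compile(r"^http://[\w.-]+\.abc\.(?:" + alternation + r")/?$")


pattern = build_pattern(KNOWN_TLDS)

for url in [
    "http://anything.abc.com",      # listed suffix
    "http://anything.abc.co.uk/",   # listed suffix plus optional slash
    "http://anything.abc.org",      # not in the list, so rejected
    "http://anything.abc.com/xyz",  # path after the slash, so rejected
]:
    print(url, bool(pattern.match(url)))
```

The `.org` line shows the drawback the answer describes: every new suffix means editing the list.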
Just using [^/]+ for the TLD might be easier. This would look like
^http://[\w.-]+\.abc\.[^/]+/?$
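A quick way to check the [^/]+ version is to run it against the cases from the question in Python's `re` module. The sample URLs below are illustrative, and `SUBDOMAIN_RE` is just a local name.

```python
import re

# The TLD part is "one or more characters that are not a slash",
# so no list of TLDs is needed.
SUBDOMAIN_RE = re.compile(r"^http://[\w.-]+\.abc\.[^/]+/?$")

samples = [
    "http://anything.abc.com",      # matches
    "http://anything.abc.co.in/",   # matches: trailing slash is optional
    "http://anything.abc.tld/xyz",  # rejected: [^/]+ cannot cross the slash
    "http://anything.abc./",        # rejected: nothing after the period
]
for s in samples:
    print(s, bool(SUBDOMAIN_RE.match(s)))
```

One design consequence worth knowing: because [^/]+ accepts any non-slash character, a URL with a query string such as http://a.abc.com?x=1 also matches.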
^http://[a-zA-Z0-9.-]+\.abc\.[a-zA-Z.]+/?$
It might differ a little depending on which regex dialect you are using.
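As one example of the dialect point: + is not a quantifier in POSIX basic regular expressions (plain grep, sed), so the pattern would need adjusting there, while Python's `re` accepts it unchanged. This sketch (with `STRICT_RE` as a local name) also shows how the narrower [a-zA-Z.]+ class differs from [^/]+.

```python
import re

# Same pattern as above; Python's re needs no changes to it.
STRICT_RE = re.compile(r"^http://[a-zA-Z0-9.-]+\.abc\.[a-zA-Z.]+/?$")

print(bool(STRICT_RE.match("http://sub.abc.co.uk/")))      # True
print(bool(STRICT_RE.match("http://sub-1.abc.net")))       # True
print(bool(STRICT_RE.match("http://sub.abc.com?page=2")))  # False: "?" not in [a-zA-Z.]
print(bool(STRICT_RE.match("http://sub.abc.com/xyz")))     # False: path after the slash
```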
^(http://)([^/]+)\.(abc)\.([^/]+)/?$
All grouped for you too :)
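A sketch of reading the groups back out in Python, using a grouped pattern in which the dots around abc are escaped, abc is matched exactly once, the subdomain group cannot contain a slash, and an optional trailing slash sits outside the last group. `GROUPED_RE` is a local name chosen for illustration.

```python
import re

# Four capture groups: scheme, subdomain, fixed domain "abc", and TLD.
GROUPED_RE = re.compile(r"^(http://)([^/]+)\.(abc)\.([^/]+)/?$")

m = GROUPED_RE.match("http://anything.abc.co.uk/")
if m:
    scheme, subdomain, domain, tld = m.groups()
    print(scheme, subdomain, domain, tld)  # http:// anything abc co.uk

# A path after the domain makes the whole match fail.
print(GROUPED_RE.match("http://anything.abc.com/xyz"))  # None
```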
I highly suggest using the RegEx tool by gskinner.com
Screenshot of the match: http://img683.imageshack.us/img683/3760/regexmatch.jpg
First, identify what kind of data you will be dealing with: are these line-based records, XML, or something else? That tells you how you need to anchor the matches; if you can anchor them with ^, that makes it easier. Do you need to allow a variable number of strings between "http://" and the top-level domain? If you don't want to write out the top-level domain, then use
\.[a-z]\{2,3\}
The exact form will depend on whether you are using Basic Regular Expressions (sed, grep), Extended Regular Expressions (awk), or Perl Compatible Regular Expressions.
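For comparison, here is the same fragment in Python's syntax, where the interval braces are not backslash-escaped, applied to line-based records. The one-URL-per-line input, the repetition (?:...)+ used to allow two-part suffixes such as .co.uk, and the names `TLD_FRAGMENT` and `LINE_RE` are assumptions made for this sketch.

```python
import re

# BRE writes the interval as \{2,3\}; Python (like ERE and PCRE) uses {2,3}.
TLD_FRAGMENT = r"\.[a-z]{2,3}"

# Assumption: one URL per line, so re.MULTILINE lets ^ and $ anchor each line.
# (?:...)+ repeats the fragment so that .co.uk is accepted as well as .com.
LINE_RE = re.compile(
    r"^http://[a-z0-9.-]+\.abc(?:" + TLD_FRAGMENT + r")+/?$", re.MULTILINE
)

records = """http://anything.abc.com
http://anything.abc.co.uk/
http://anything.abc.com/xyz
http://anything.abc.museum"""

# Only the first two lines match: the third has a path, and "museum" is
# longer than the 2-3 letters the fragment allows.
print(LINE_RE.findall(records))
```

The .museum line illustrates a trade-off of the {2,3} bound: some TLDs are longer than three letters, so the bound may need widening.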
What have you tried already? How have you tested it?