
Regular Expression to match subdomains of particular domain, with no path

I want a regex to find strings of the following form:

http://anything.abc.tld

where

abc -> always the literal string abc

anything -> any string

tld -> any TLD (top-level domain), such as .com, .net, .co.in, .co.uk, etc.

Note: The URL must not contain anything else at the end; for example, http://anything.abc.tld/xyz is not acceptable.

Note: The list of TLDs is long, and there is always a chance of forgetting some, so I don't want to list each TLD in the regex. Instead, I would like a regex that checks the following for the TLD part:

  • After abc, there is a period (.)

  • After the period (.) there is at least one character


There are quite a lot of TLDs, and their number is growing. You could use

^http://[\w.-]+\.abc\.(com|net|co\.in|....  )/?$

But this would have to be maintained on a regular basis. Just using [^/]+ for the TLD might be easier. This would look like

^http://[\w.-]+\.abc\.[^/]+/?$
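
As a quick check, the pattern above can be tried in Python. This is only a sketch: the function name `is_abc_subdomain` and the sample URLs are made up for illustration, and Python's `re` dialect is assumed.

```python
import re

# Pattern from the answer above: any subdomain of abc, any TLD,
# optional trailing slash, nothing after it.
SUBDOMAIN_RE = re.compile(r"^http://[\w.-]+\.abc\.[^/]+/?$")


def is_abc_subdomain(url: str) -> bool:
    """Return True if url looks like http://<anything>.abc.<tld> with no path."""
    return SUBDOMAIN_RE.match(url) is not None


# Hypothetical sample inputs, chosen to cover the cases in the question.
samples = [
    "http://anything.abc.com",      # plain subdomain
    "http://a.b.abc.co.uk/",        # nested subdomain, two-part TLD, trailing slash
    "http://anything.abc.com/xyz",  # has a path, should be rejected
    "http://abc.com",               # no subdomain, should be rejected
]

for s in samples:
    print(s, is_abc_subdomain(s))
```

Note that [^/]+ accepts anything that is not a slash, so it is deliberately permissive about what counts as a TLD.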


^http://[a-zA-Z0-9.-]+\.abc\.[a-zA-Z.]+/?$

This might differ a little depending on which regex dialect you are using.
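
One example of such a difference, sketched in Python: `\w` also matches the underscore (and, in Python 3, non-ASCII letters), while the explicit class [a-zA-Z0-9.-] does not. The host name `my_host` is an invented test value.

```python
import re

# Same pattern, once with \w and once with an explicit ASCII class.
WORD_RE = re.compile(r"^http://[\w.-]+\.abc\.[a-zA-Z.]+/?$")
ASCII_RE = re.compile(r"^http://[a-zA-Z0-9.-]+\.abc\.[a-zA-Z.]+/?$")

url = "http://my_host.abc.com"  # hypothetical URL containing an underscore

print(bool(WORD_RE.match(url)))   # underscore is part of \w
print(bool(ASCII_RE.match(url)))  # underscore is not in [a-zA-Z0-9.-]
```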


^(http://)([^/]+)\.(abc)\.([^/]+)$

All grouped for you too :)
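
A sketch of how the grouped approach could be used in Python to pull the parts out. It uses a form of the pattern with the literal dots escaped and slashes excluded from the subdomain group; the variable names are illustrative only.

```python
import re

# Four groups: scheme, subdomain part, the fixed "abc", and the TLD part.
GROUPED_RE = re.compile(r"^(http://)([^/]+)\.(abc)\.([^/]+)$")

m = GROUPED_RE.match("http://www.abc.co.uk")
if m:
    scheme, subdomain, domain, tld = m.groups()
    print(scheme, subdomain, domain, tld)
```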

I highly suggest using the RegEx tool by gskinner.com



First identify what kind of data you will be dealing with: are these line-based records, XML, or something else? That will tell you how you need to anchor the matches. If you can anchor them with ^, that makes it easier. Do you need a variable number of strings between "http://" and the top-level domain? If you don't want to write out the top-level domain, then use

\.[a-z]\{2,3\}

The exact form will depend on whether you are using Basic Regular Expressions (sed, grep) or Extended Regular Expressions (awk), or Perl Compatible Regular Expressions.
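
For comparison, here is a sketch of the same idea in Python's syntax, where the braces are not backslash-escaped as they are in Basic Regular Expressions. The trailing $ anchor and the sample host names are assumptions added for the demonstration.

```python
import re

# BRE (sed, grep):          \.[a-z]\{2,3\}
# ERE / PCRE / Python re:   \.[a-z]{2,3}
TLD_RE = re.compile(r"\.[a-z]{2,3}$")

# Hypothetical host names to show what a 2-3 letter limit accepts.
for host in ["www.abc.com", "www.abc.co.uk", "www.abc.museum"]:
    print(host, bool(TLD_RE.search(host)))
```

With a limit of two to three letters, only the last label is checked, and longer TLDs such as .museum are not matched.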

What have you tried already? How have you tested it?
