RegEx - character not before match

I understand the concepts of RegEx, but this is more or less the first time I've actually been trying to write some myself.

As part of a project, I'm attempting to pick out strings that match a certain domain (actually an array of domains, but let's keep it simple).

At first I started out with this:

url.match('www.example.com')

But I noticed I was also getting input like this:

http://www.someothersite.com/page?ref=http://www.example.com

These rows will of course match for www.example.com but I wish to exclude them. So I was thinking along these lines: Only match rows that contain www.example.com, but not after a ? character. This is what I came up with:

var reg = new RegExp("[^\\?]*" + url + "(\\.*)", "gi"); 

However, this does not seem to work. Any suggestions would be greatly appreciated, as I fear I've used up what little knowledge I possess on the matter.
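
For reference, here is a minimal reproduction of the problem, assuming `url` holds the plain string www.example.com (the variable value and the sample row are taken from the examples above; the `g` flag is left out because it does not affect a single `test()` call):

```javascript
// Assumption: url holds the needle as a plain string.
const url = "www.example.com";
const reg = new RegExp("[^\\?]*" + url + "(\\.*)", "i");

// [^\?]* may match zero or more characters and the pattern is not anchored,
// so the search can simply begin after the "?" and still find the needle.
const unwanted = "http://www.someothersite.com/page?ref=http://www.example.com";
const matchesUnwanted = reg.test(unwanted);

console.log(matchesUnwanted); // true, although this row should be excluded
```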

Edit: Some clarifications.

  • The input is a log of GET requests. From these I wish to keep only the requests to a few domains, which will be stored in a variable. The match should allow zero or one arbitrary subdomain (example.com, www.example.org, www.somethirdsite.com and web.example.net should all be valid).
  • I specifically found a request like the one mentioned above, but I would also like to handle http://www.someothersite.com/page?ref=https://www.example.com and http://www.someothersite.com/page?ref=www.example.com, i.e., if my needle is not part of the request domain but part of the request data, I do not want the match.


Edit: here is the modified regex for an arbitrary domain:

RegExp("(^|\\s)(https?://)?(\\w+\\.)?" + url, "gi");

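The idea here is that url only matches when it is preceded by the start of the string or a whitespace character, optionally followed by a scheme and a single subdomain. That makes it impossible for the match to be inside the query string.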
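
A minimal sketch of how this could be used, under a few assumptions: `escapeRegExp` and `buildDomainRegex` are hypothetical helper names, the needle is example.com, and the sample rows are the ones from the question. Escaping the needle keeps the dots in the domain literal. The sketch also drops the `g` flag, because `test()` on a global regex keeps its position in `lastIndex` between calls, which can give surprising results when the same object is reused for several rows.

```javascript
// Hypothetical helper: escape regex metacharacters so "." in a domain is literal.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build the pattern from the answer for one domain (the needle).
// Only the "i" flag is used here; see the note about "g" and lastIndex above.
function buildDomainRegex(domain) {
  return new RegExp("(^|\\s)(https?://)?(\\w+\\.)?" + escapeRegExp(domain), "i");
}

const reg = buildDomainRegex("example.com");

const samples = [
  "http://www.example.com/page",                                  // wanted
  "example.com/index.html",                                       // wanted
  "http://www.someothersite.com/page?ref=http://www.example.com", // unwanted
  "http://www.someothersite.com/page?ref=www.example.com",        // unwanted
];

const results = samples.map((line) => reg.test(line));
console.log(results); // [ true, true, false, false ]
```

Note that the pattern has no boundary after the needle, so a host such as example.com.evil.org would still match; whether that matters depends on the log data.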
