RegEx - character not before match
I understand the concepts of RegEx, but this is more or less the first time I've actually been trying to write some myself.
As part of a project, I'm attempting to pick out strings which match a certain domain (actually an array of domains, but let's keep it simple).
At first I started out with this:
url.match('www.example.com')
But I noticed I was also getting input like this:
http://www.someothersite.com/page?ref=http://www.example.com
These rows will of course match for www.example.com, but I wish to exclude them. So I was thinking along these lines: only match rows that contain www.example.com, but not after a ? character. This is what I came up with:
var reg = new RegExp("[^\\?]*" + url + "(\\.*)", "gi");
This does not seem to work, however. Any suggestions would be greatly appreciated, as I fear I've used up what little knowledge I possess on the matter.
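A short sketch of why the attempt above still matches, using the sample line from the question. The prefix [^\\?]* is allowed to match zero characters and the pattern is not anchored, so the engine is free to begin the match after the ? character. The anchored variant at the end is only one possible way to express the "not after a ?" idea, not necessarily the best one; it also hard-codes the needle with escaped dots for brevity.

```javascript
// Sample needle and log line taken from the question.
const url = "www.example.com";
const line = "http://www.someothersite.com/page?ref=http://www.example.com";

// The original attempt, minus the "g" flag (which only adds lastIndex state).
const attempt = new RegExp("[^\\?]*" + url + "(\\.*)", "i");
const found = attempt.exec(line);
console.log(found[0]); // "ref=http://www.example.com": the match starts after the "?"

// Anchored variant: everything between the start of the line and the
// needle must be free of "?". The dots are escaped so they match only ".".
const anchored = /^[^?]*www\.example\.com/i;
console.log(anchored.test(line)); // false
console.log(anchored.test("http://www.example.com/page?x=1")); // true
```

Note that the trailing (\\.*) in the original only matches a run of literal dots, so it does not contribute anything to the filtering.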
Edit: Some clarifications.
- The input is logged GET requests. From these I wish to filter out only a few domains. These will have/should handle 0-1 arbitrary subdomains (example.com, www.example.org, www.somethirdsite.com and web.example.net should all be valid), these will be stored in a variable.
- I specifically found a request as mentioned above, but I would like to also be able to handle http://www.someothersite.com/page?ref=https://www.example.com and http://www.someothersite.com/page?ref=www.example.com, i.e., if my needle is not part of the request domain, but part of the request data, I do not want the match.
Edit: here is the modified regex for an arbitrary domain:
RegExp("(^|\\s)(https?://)?(\\w+\\.)?" + url, "gi");
The idea here is that you only match url when it is preceded by the start of the string or a whitespace character (optionally followed by a scheme and a single subdomain), which makes it impossible for the match to be inside the query.
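As a rough illustration, the pattern above can be wrapped in a small filter. This is a sketch under a few assumptions: escapeRegExp and domainPattern are hypothetical helper names, the sample lines are the ones from the question, and the "g" flag is dropped because test() on a global regex carries lastIndex over between calls.

```javascript
// Hypothetical helper: escape regex metacharacters so "." in a domain is literal.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build the pattern from the answer for a single domain.
function domainPattern(domain) {
  return new RegExp("(^|\\s)(https?://)?(\\w+\\.)?" + escapeRegExp(domain), "i");
}

const pattern = domainPattern("example.com");

const samples = [
  "http://www.example.com/index.html",
  "example.com/about",
  "http://www.someothersite.com/page?ref=http://www.example.com",
  "http://www.someothersite.com/page?ref=www.example.com",
];

// Keep only the lines whose request domain is the needle.
const kept = samples.filter((line) => pattern.test(line));
console.log(kept); // the first two samples only
```

One caveat: nothing constrains what follows the domain, so a host such as example.com.other.org would also pass. Appending something like (?=[/:?#\\s]|$) to the pattern is one possible way to tighten that, if it matters for the logs at hand.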