
Regex Match Constant

I am having some trouble with this regular expression. Can somebody help me with it?

I want to match the following in the source of websites that have this line installed on their pages:

The code should always match this exact line (it is a constant):

<img src="http://www.domain.com/test.asp" width="1" height="1" />



htstring.match(/\<img src\=""http:\/\/www.domain.com\/test.asp"" width=""1"" height=""1"" \/>/ig);

My problem seems to be escaping the " characters in the regex.

Any help would be appreciated!

Thanks


You don't need to escape them.

But you do need to escape each period (.) with a backslash, because an unescaped period matches any single character.
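To see why the periods matter, here is a small sketch in JavaScript (assumed from the /ig flags in the question). The look-alike string is invented for illustration. The pattern with unescaped periods accepts it and the escaped one does not. Neither pattern needs the quotes escaped.

```javascript
// Sketch (JavaScript assumed): quotes need no escaping inside a regex
// literal, but an unescaped "." matches any single character.
const unescapedDots = /<img src="http:\/\/www.domain.com\/test.asp" width="1" height="1" \/>/;
const escapedDots = /<img src="http:\/\/www\.domain\.com\/test\.asp" width="1" height="1" \/>/;

const real = '<img src="http://www.domain.com/test.asp" width="1" height="1" />';
// Hypothetical look-alike: every period replaced by some other character.
const fake = '<img src="http://wwwXdomainYcom/testZasp" width="1" height="1" />';

console.log(unescapedDots.test(real), escapedDots.test(real)); // true true
console.log(unescapedDots.test(fake), escapedDots.test(fake)); // true false
```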


Regexes are useful when you're trying to match variations: for example, if your tag were constant except for the domain in the "src" attribute, or the whitespace. Stefan and Andy are exactly correct, but the (working) regex you now have is still no different from the string literal in my earlier answer.
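Here is a sketch of the domain case, again assuming JavaScript. `[^"\/]+` stands in for "any host name." That is an assumption: it means one or more characters that are neither a quote nor a slash. The sample URLs are invented.

```javascript
// Sketch: allow any host name in the src URL while keeping the rest fixed.
// [^"\/]+ is an assumed definition of "domain": one or more characters
// that are neither a double quote nor a slash.
const anyDomain = /<img src="http:\/\/[^"\/]+\/test\.asp" width="1" height="1" \/>/;

// Hypothetical page snippets.
const pages = [
  '<img src="http://www.domain.com/test.asp" width="1" height="1" />',
  '<img src="http://example.org/test.asp" width="1" height="1" />',
  '<img src="http://example.org/other.asp" width="1" height="1" />',
];

for (const html of pages) {
  console.log(anyDomain.test(html)); // prints true, true, false
}
```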

So both the regex and the string are equivalent, and both match:

'<img src="http://www.domain.com/test.asp" width="1" height="1" />'.match(/<img src="http:\/\/www\.domain\.com\/test\.asp" width="1" height="1" \/>/)
=> #<MatchData:0x5ebbf90>

vs

'<img src="http://www.domain.com/test.asp" width="1" height="1" />'.match('<img src="http://www.domain.com/test\.asp" width="1" height="1" />')
=> #<MatchData:0x5eb6cac>
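Here is the same comparison sketched in JavaScript, assuming that is the question's language. For a constant tag, a regex test and a plain substring search with `includes` give the same answer. The surrounding page markup is made up.

```javascript
// Sketch: for a constant tag, a regex and a plain substring search agree.
const tag = '<img src="http://www.domain.com/test.asp" width="1" height="1" />';
const html = '<p>Hello</p>' + tag + '<p>Bye</p>'; // hypothetical page source

const viaRegex = /<img src="http:\/\/www\.domain\.com\/test\.asp" width="1" height="1" \/>/.test(html);
const viaString = html.includes(tag); // plain substring search, no regex involved

console.log(viaRegex, viaString); // true true
```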

If you want to match subtle variations (for example, the whitespace isn't always exactly one space: sometimes it's 1 space, sometimes 2, sometimes 3), then you need a regex, not a string. But the current regex won't match those variations either, because it only does an exact match. It doesn't use any regex features at all, so it might as well be a string. For example, with 2 spaces after "img":

'<img  src="http://www.domain.com/test.asp" width="1" height="1" />'.match(/<img src="http:\/\/www\.domain\.com\/test\.asp" width="1" height="1" \/>/)
=> nil

But a regex that actually uses special regex characters will match. Note the \s+ after "img", which matches one or more whitespace characters:

'<img  src="http://www.domain.com/test.asp" width="1" height="1" />'.match(/<img\s+src="http:\/\/www\.domain\.com\/test\.asp" width="1" height="1" \/>/)
=> #<MatchData:0x5e94fbc>
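Extending the same idea to every gap, here is a JavaScript sketch. It accepts one or more whitespace characters between attributes and optional whitespace before />. The sample strings are hypothetical.

```javascript
// Sketch: tolerate one or more whitespace characters (\s+) between
// attributes, and optional whitespace (\s*) before the closing "/>".
const flexible = /<img\s+src="http:\/\/www\.domain\.com\/test\.asp"\s+width="1"\s+height="1"\s*\/>/;

// Hypothetical variants of the same tag.
const samples = [
  '<img src="http://www.domain.com/test.asp" width="1" height="1" />',
  '<img  src="http://www.domain.com/test.asp" width="1" height="1" />',
  '<img\n  src="http://www.domain.com/test.asp"\n  width="1" height="1"/>',
];

console.log(samples.map((s) => flexible.test(s))); // [ true, true, true ]
```

The pattern still expects the attributes in exactly this order and with these exact values. For arbitrary markup, an HTML parser is usually a safer tool than a regex.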

Also, I might not have been explicit enough last time, but it's important that you specify what language you're working in. As Tim pointed out, regex syntax can vary between languages, so an answer could be correct and still not work for you, depending on whether you're both using Ruby, C#, Java, or something else.


Exactly how a regexp behaves depends on which engine your language is using. Not all regexp engines are the same.

That said, it appears that you are escaping what should be the end of the matching regexp:

/>/ig

should probably be

/>/ig
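For reference, in a JavaScript regex literal the letters after the closing / are flags. The i flag makes the match case-insensitive, and the g flag makes String.prototype.match return every match instead of only the first. Here is a small sketch; the two-line page source is hypothetical.

```javascript
// Sketch (JavaScript assumed): how the i and g flags change what match() returns.
const TAG = /<img src="http:\/\/www\.domain\.com\/test\.asp" width="1" height="1" \/>/;
// Rebuild the same pattern with different flags.
const withFlags = (flags) => new RegExp(TAG.source, flags);

// Hypothetical page source: one upper-case copy of the tag, one lower-case copy.
const pageSource =
  '<IMG SRC="http://www.domain.com/test.asp" WIDTH="1" HEIGHT="1" />\n' +
  '<img src="http://www.domain.com/test.asp" width="1" height="1" />';

const all = pageSource.match(withFlags("ig"));      // every match, any case
const first = pageSource.match(withFlags("i"));     // first match plus details
const lowerOnly = pageSource.match(withFlags("g")); // every match, exact case

console.log(all.length);       // 2
console.log(first.index);      // 0
console.log(lowerOnly.length); // 1
```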

Also, you probably don't want doubled double quotes: for example, =""htt should be ="htt
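Here is a JavaScript sketch of why the doubled quotes break the match: inside a regex literal, "" simply means two quote characters in a row. Doubling a quote is an escape convention in some string syntaxes, such as C# verbatim strings, but not in regex literals. The first pattern below is the one from the question.

```javascript
// Sketch: "" in a regex literal means two quote characters in a row,
// so the question's pattern looks for src=""http:... and never matches.
const tag = '<img src="http://www.domain.com/test.asp" width="1" height="1" />';

const doubled = /\<img src\=""http:\/\/www.domain.com\/test.asp"" width=""1"" height=""1"" \/>/ig;
const fixed = /<img src="http:\/\/www\.domain\.com\/test\.asp" width="1" height="1" \/>/ig;

console.log(tag.match(doubled));      // null
console.log(tag.match(fixed).length); // 1
```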

There are regular expression testers available online; one is at http://www.regular-expressions.info/javascriptexample.html


If the string is a constant, you don't need a regex. I don't see anything "regexy" in your regex: it contains nothing but the constant string, so just using a string would be easiest.

Also, what programming language are you using? From the syntax, I guessed it was Ruby - but that's only a guess, so the syntax below may not work for you.

htstring.match('<img src="http://www.domain.com/test.asp" width="1" height="1" />')
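If the language turns out to be JavaScript, here is a sketch of the plain-string approach. One caveat: String.prototype.match converts a string argument into a RegExp, so the periods in it still act as wildcards. `includes` is the purely literal test. Ruby's String#match likewise converts a string argument to a Regexp. The sample page source and the look-alike string are made up.

```javascript
// Sketch of the "just use a string" idea in JavaScript. htstring is the
// variable name from the question; its contents here are hypothetical.
const htstring = '<html><img src="http://www.domain.com/test.asp" width="1" height="1" /></html>';
const tag = '<img src="http://www.domain.com/test.asp" width="1" height="1" />';

// includes() is a plain substring test: no escaping needed at all.
console.log(htstring.includes(tag)); // true

// Caveat: match() with a string argument converts it to a RegExp first,
// so each "." in the string acts as a wildcard rather than a literal period.
const lookAlike = '<img src="http://wwwXdomain.com/test.asp" width="1" height="1" />';
console.log(lookAlike.includes(tag));       // false
console.log(lookAlike.match(tag) !== null); // true
```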


You should only need to escape forward slashes and periods.

myRegex = /<img src="http:\/\/www\.domain\.com\/test\.asp" width="1" height="1" \/>/
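In JavaScript, the forward slashes only need escaping because / delimits a regex literal. Here is a sketch of the same pattern built with the RegExp constructor instead. There the slashes can stay as they are, but each regex backslash has to be doubled inside the string literal.

```javascript
// Sketch: the literal form needs \/ for slashes; the constructor form does
// not, but its string must spell each regex backslash as "\\".
const myRegex = /<img src="http:\/\/www\.domain\.com\/test\.asp" width="1" height="1" \/>/;
const fromString = new RegExp('<img src="http://www\\.domain\\.com/test\\.asp" width="1" height="1" />');

const tag = '<img src="http://www.domain.com/test.asp" width="1" height="1" />';
console.log(myRegex.test(tag), fromString.test(tag)); // true true
```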


If you're using .NET, you can escape the string:

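// needs: using System.Text.RegularExpressions;
var matchMe = "<img src=\"http://www.domain.com/test.asp\" width=\"1\" height=\"1\" />";
// Regex.Escape backslash-escapes metacharacters such as . and spaces, so the pattern matches matchMe literally
var pattern = Regex.Escape(matchMe);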

It doesn't look like you're using .NET, though. I don't think you need to escape quotes like that. In fact, the only characters in your pattern that I know you should escape are the period (.) and the forward slash (/).
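JavaScript has no equivalent of Regex.Escape that you can count on in every engine, so a common workaround is a small helper. `escapeRegExp` below is a hypothetical name, not a built-in. The character class is the one shown in MDN's JavaScript regular expressions guide. Treat this as a sketch.

```javascript
// Hypothetical helper (not a built-in): backslash-escape every regex
// metacharacter so an arbitrary string can be embedded in a RegExp.
function escapeRegExp(text) {
  // $& in the replacement inserts the matched character after the backslash.
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const matchMe = '<img src="http://www.domain.com/test.asp" width="1" height="1" />';
const pattern = new RegExp(escapeRegExp(matchMe), "i"); // case-insensitive

console.log(escapeRegExp("www.domain.com")); // www\.domain\.com
console.log(pattern.test('<IMG SRC="http://www.domain.com/test.asp" WIDTH="1" HEIGHT="1" />')); // true
console.log(pattern.test('<img src="http://wwwXdomain.com/test.asp" width="1" height="1" />')); // false
```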

