regex: how to eliminate URLs ending with .dtd

2022-12-25 10:48 Q&A author:

This is JavaScript regex.

regex = /(http:\/\/[^\s]*)/g;

text = "I have http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd and I like http://google.com a lot";

matches = text.match(regex);

console.log(matches);

I get both URLs in the result. However, I want to eliminate all URLs ending with .dtd. How do I do that?

Note that only URLs ending with .dtd should be removed, so a URL like http://a.dtd.google.com should pass.

The nicest way to do it is to use a negative lookbehind (in languages that support them):

/(?>http:\/\/[^\s]*)(?<!\.dtd)/g

The ?> in the first bracket makes it an atomic grouping which stops the regex engine backtracking - so it'll match the full URL as it does now, and if/when the next part fails it won't try going back and matching less.

The (?<!\.dtd) is a negative lookbehind, which only matches if \.dtd doesn't match ending at that position (i.e., the URL doesn't end in .dtd).
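To see why the atomic group matters, here is a minimal sketch in JavaScript. It assumes an engine with ES2018 lookbehind support (current Node.js and browsers); JavaScript has no atomic groups, so the second pattern substitutes a trailing (?![^\s]) to pin the match to the end of the URL. The variable names are illustrative only.

```javascript
// Sample text from the question.
const text = "I have http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd and I like http://google.com a lot";

// Lookbehind without an atomic group: [^\s]* gives back one character,
// so the .dtd URL is still matched, just without its final "d".
const naive = /http:\/\/[^\s]*(?<!\.dtd)/g;
const naiveMatches = text.match(naive);
console.log(naiveMatches);
// [ 'http://hibernate.sourceforge.net/hibernate-mapping-3.0.dt', 'http://google.com' ]

// Assumed stand-in for the atomic group: (?![^\s]) only lets the match
// end where the URL itself ends, so backtracking cannot shorten it.
const anchored = /http:\/\/[^\s]*(?<!\.dtd)(?![^\s])/g;
const anchoredMatches = text.match(anchored);
console.log(anchoredMatches);
// [ 'http://google.com' ]
```

The first result shows the failure mode the atomic group prevents: the engine backs off until the lookbehind passes and returns a truncated URL.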

For languages that don't (such as JavaScript before ES2018 added lookbehind), you can use a negative lookahead instead, which is a bit uglier and generally less efficient:

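/(http:\/\/(?![^\s]*\.dtd(?![^\s]))[^\s]*)/g

This matches http://, then scans ahead to make sure the URL doesn't end in .dtd, then backtracks and scans forward again to get the actual match. The inner (?![^\s]) requires the .dtd to sit at the very end of the URL, so http://a.dtd.google.com still passes.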
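As a quick check of the lookahead approach, the sketch below runs two variants against a sample string that includes the http://a.dtd.google.com case from the question. The sample string and variable names are made up for illustration. One variant ends the lookahead with \b, the other with (?![^\s]); the difference only shows up when .dtd appears in the middle of a URL.

```javascript
// Hypothetical sample containing all three cases discussed in the question.
const sample = "I have http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd and http://a.dtd.google.com and http://google.com";

// Variant ending in \b: there is a word boundary between "dtd" and ".",
// so a URL that merely contains ".dtd." is rejected as well.
const withWordBoundary = /http:\/\/(?![^\s]*\.dtd\b)[^\s]*/g;
const boundaryMatches = sample.match(withWordBoundary);
console.log(boundaryMatches);
// [ 'http://google.com' ]

// Variant ending in (?![^\s]): .dtd must be followed by whitespace or the
// end of the string, so only URLs that truly end in .dtd are rejected.
const withEndCheck = /http:\/\/(?![^\s]*\.dtd(?![^\s]))[^\s]*/g;
const endCheckMatches = sample.match(withEndCheck);
console.log(endCheckMatches);
// [ 'http://a.dtd.google.com', 'http://google.com' ]
```

Neither variant handles trailing punctuation such as a comma directly after the URL, since [^\s]* treats it as part of the URL.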

As always, http://www.regular-expressions.info/ is a good reference for more information.
