Regular Expression to Remove Subdomain from Root Domain in List - Notepad++ or Gvim

2023-03-14 18:26

I have a list of URLs stored in a .txt file (I'm using Windows 7).

The format of the URLs is this:

somesite1.com
somesite2.com
somesite3.com
sub1.somesite3.com
sub2.somesite3.com
sub3.somesite3.com
sub1.somesite3.net
sub1.somesite1.org

In Notepad++, there is an option to use "find-replace with regular expressions", and I'm fairly sure that gVim allows the use of regular expressions (although I'm not entirely sure how to use them in gVim).

Anyway, I don't know what to put in the find & replace boxes so it can go through the contents of the file and leave me with only the root domains. If done properly, it would turn the above example list into this:

somesite1.com
somesite2.com
somesite3.com
somesite3.com
somesite3.com
somesite3.com
somesite3.net
somesite1.org

Can somebody help me out?

A couple of ways of doing it for Vim (the trailing slashes are optional, too):

:%s/^.\+\.\ze[^.]\+\.[^.]\+$//
:%s/^.\+\.\([^.]\+\.[^.]\+\)$/\1/

See also :help /\ze etc. \ze and \zs are Vim-specific and very useful. There are also look-ahead and look-behind assertions which can be useful, in Vim and PCRE.

I believe Notepad++ uses PCRE; finding ^.+\.([^.]+\.[^.]+)$ and replacing it with \1 should work (but I don't use Notepad++).
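For anyone without either editor to hand, the same capture-group substitution can be sketched in Python. This is only an illustration: the function name strip_subdomains is made up here, and it assumes one hostname per line, as in the question.

```python
import re

# Same pattern as the Notepad++ suggestion: keep only the last two
# dot-separated labels of a line that has at least three of them.
PATTERN = re.compile(r"^.+\.([^.]+\.[^.]+)$")

def strip_subdomains(lines):
    """Return each hostname reduced to its last two labels (hypothetical helper)."""
    return [PATTERN.sub(r"\1", line.strip()) for line in lines]

hosts = [
    "somesite1.com",
    "sub1.somesite3.com",
    "sub1.somesite3.net",
    "sub1.somesite1.org",
]
print(strip_subdomains(hosts))
```

Lines that already consist of just two labels do not match the pattern and pass through unchanged.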

Be aware this won't work well with country code top level domains which use third-level registration - example.com.au would be turned into com.au. And then there are some countries which use second- or third-level registration under certain rules... if you care about those cases, you'll need more rules and a full parser would be neater than a regular expression (though as always it would be possible with regular expressions).
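A rough sketch of what "more rules" could look like, in Python. The suffix set below is a tiny hand-written sample and is purely an assumption for the demo (a real list, such as the Public Suffix List, is far longer), and root_domain is a made-up helper name.

```python
# Hypothetical, deliberately tiny sample of two-label registration suffixes.
# This set is an assumption for the demo, not a complete list.
MULTI_LABEL_SUFFIXES = {"com.au", "co.uk", "org.uk"}

def root_domain(host):
    """Keep three labels when the last two form a known suffix, else two."""
    labels = host.strip().lower().split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])

for host in ["sub1.somesite3.com", "www.example.com.au", "example.com.au"]:
    print(host, "->", root_domain(host))
```

Splitting on dots and checking a lookup set keeps the special cases out of the regular expression itself, which is the "parser rather than regex" trade-off mentioned above.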

Replace ^[^.]*\.(?=\w+\.\w+$) with <blank>

Deciphered, this means:

^ = start of line
[^.]* = any number of chars that are not a dot
\. = a dot
(?=\w+\.\w+$) = look-ahead: there must be exactly one word, one dot, then one word from here to the end of the line

EDITED - Added look ahead for another dot

EDITED AGAIN - Changed look ahead for exactly one dot between words
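To see how the look-ahead version behaves, here is a small Python sketch of it. The helper name strip_one_label is invented for the example, and the sample hostnames are assumptions in the style of the question.

```python
import re

# Look-ahead version: delete the first label and its dot, but only when
# exactly two \w+ words separated by a single dot remain after it.
LOOKAHEAD = re.compile(r"^[^.]*\.(?=\w+\.\w+$)")

def strip_one_label(host):
    """Remove a single leading subdomain label if the pattern allows it."""
    return LOOKAHEAD.sub("", host)

for host in ["somesite1.com", "sub1.somesite3.com", "a.b.somesite3.com"]:
    print(host, "->", strip_one_label(host))
```

Because [^.]* cannot cross a dot and \w does not match a hyphen, lines with more than one subdomain level, or with a hyphen in the last two labels, are left untouched by this pattern.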

Replace the whole line with its last two dot-separated words.

%s/^.*\.\(\w\+\.\w\+\)$/\1/g

Note that Vim requires a backslash before (, ) and +, as in \( \) and \+

UPDATE:

%s/^.*\.\([0-9a-z\-]\+\.[0-9a-z\-]\+\)$/\1/g

is maybe better, since \w does not match the hyphens that domain names can contain.
