Regex to return all characters until "/" searching backwards

2023-03-14 22:15 Q&A author:

I'm having trouble with this regex and I think I'm almost there.

m = re.findall(r'[a-z]{6}\.[a-z]{3}\.[a-z]{2} (?=\" target)', 'http://domain.com.uy " target')

This gives me the "exact" output that I want, that is, domain.com.uy, but obviously this is just an example, since [a-z]{6} only matches exactly six letters before the first dot, and that is not what I want.

I want it to return domain.com.uy, so basically the instruction would be: match any character until "/" is encountered (searching backwards).

Edit:

m = re.findall(r'\w+\.[a-z]{3}\.[a-z]{2} (?=\" target)', 'http://domain.com.uy " target')

This is very close to what I want, but it won't match "-" (\w already covers "_").

For the sake of completeness, I do not need the http:// part.

I hope the question is clear enough. If I left anything open to interpretation, please ask for clarification!

Thanks in advance!

Another option is to use a positive lookbehind such as (?<=//):

>>> re.search(r'(?<=//).+(?= \" target)', 
...           'http://domain.com.uy " target').group(0)
'domain.com.uy'

Note that this will also match any path after the host, slashes included, if that's desired:

>>> re.search(r'(?<=//).+(?= \" target)',
...           'http://example.com/path/to/whatever " target').group(0)
'example.com/path/to/whatever'

If you just wanted the bare domain, without any path or query parameters, you could use r'(?<=//)([^/]+)(/.*)?(?= \" target)' and capture group 1:

>>> re.search(r'(?<=//)([^/]+)(/.*)?(?= \" target)',
...           'http://example.com/path/to/whatever " target').groups()
('example.com', '/path/to/whatever')
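The lookbehind pattern above can be wrapped in a small function. This is only a sketch: the names HOST_BEFORE_TARGET and extract_host are made up for illustration, and it assumes the text always ends the URL with the literal ` " target` from the question. Because [^/]+ accepts anything except a slash, hosts containing "-" or "_" are matched as well.

```python
import re

# Hypothetical names; the pattern itself is the one shown in the answer above.
HOST_BEFORE_TARGET = re.compile(r'(?<=//)([^/]+)(/.*)?(?= " target)')

def extract_host(text):
    """Return the host that follows '//', or None when the pattern does not match."""
    match = HOST_BEFORE_TARGET.search(text)
    return match.group(1) if match else None

print(extract_host('http://domain.com.uy " target'))
print(extract_host('http://example.com/path/to/whatever " target'))
print(extract_host('http://my-site_1.com.uy " target'))
print(extract_host('no url here'))
```

Returning None for a non-match avoids the AttributeError that .group() raises when re.search finds nothing.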

Try this (in Python's re module, / is not special and needs no escaping):

/([^/]*)$
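A quick check of this pattern in Python, as a sketch. Since $ anchors at the end of the string, the capture runs from the last slash to the end, so on the question's input it also picks up the trailing ` " target`. The step of trimming the input to its first whitespace-separated token is an assumption added here, not part of the original suggestion.

```python
import re

text = 'http://domain.com.uy " target'

# On the raw input the capture includes everything after the last slash.
last_segment = re.search(r'/([^/]*)$', text).group(1)
print(last_segment)

# Assumption: the URL is the first whitespace-separated token of the text.
url_only = text.split()[0]
host = re.search(r'/([^/]*)$', url_only).group(1)
print(host)
```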

If regular expressions are not a requirement and you simply wish to extract the FQDN from the URL in Python, use urlparse and str.split():

>>> from urllib.parse import urlparse  # Python 3; on Python 2 use: from urlparse import urlparse
>>> url = 'http://domain.com.uy " target'
>>> urlparse(url)
ParseResult(scheme='http', netloc='domain.com.uy " target', path='', params='', query='', fragment='')

This has broken up the URL into its component parts. We want netloc:

>>> urlparse(url).netloc
'domain.com.uy " target'

Split on whitespace:

>>> urlparse(url).netloc.split()
['domain.com.uy', '"', 'target']

Just the first part:

>>> urlparse(url).netloc.split()[0]
'domain.com.uy'
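The steps above can be folded into one function. This is a minimal sketch for Python 3, where urlparse lives in urllib.parse; the name domain_from_text is hypothetical, and it assumes the host is followed by whitespace or a path, as in the examples here.

```python
from urllib.parse import urlparse

def domain_from_text(text):
    """Hypothetical helper: parse the text as a URL, then keep the first
    whitespace-separated token of netloc. Returns None if netloc is empty."""
    parts = urlparse(text).netloc.split()
    return parts[0] if parts else None

print(domain_from_text('http://domain.com.uy " target'))
print(domain_from_text('http://example.com/path/to/whatever " target'))
print(domain_from_text('no url here'))
```

When the URL has a path, the trailing text ends up in the path component rather than netloc, so the split is then a no-op.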

It's as simple as this:

[^/]+(?= " target)

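But be aware that for a URL with a path, such as http://domain.com/folder/site.php, this returns the last path segment (site.php), not the domain. Also remember to escape the regex properly when writing it as a string (a raw string helps).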
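A short sketch of both cases, written with a raw string so no extra escaping is needed. The second input is an assumed variant of the question's example with a path added.

```python
import re

pattern = re.compile(r'[^/]+(?= " target)')

# Bare domain: the run of non-slash characters before ' " target' is the host.
bare = pattern.search('http://domain.com.uy " target').group(0)
print(bare)

# URL with a path: the same run is now the last path segment.
with_path = pattern.search('http://domain.com/folder/site.php " target').group(0)
print(with_path)
```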
