开发者

Python 2.6+ str.format() and regular expressions

Using str.format() is the new standard for formatting strings in Python 2.6, and Python 3. I've run into an issue when using str.format() with regular ex开发者_开发技巧pressions.

I've written a regular expression to return all domains that are a single level below a specified domain or any domains that are 2 levels below the domain specified, if the 2nd level below is www...

Assuming the specified domain is delivery.com, my regex should return a.delivery.com, b.delivery.com, www.c.delivery.com ... but it should not return x.a.delivery.com.

import re

str1 = "www.pizza.delivery.com"
str2 = "w.pizza.delivery.com"
str3 = "pizza.delivery.com"

if (re.match('^(w{3}\.)?([0-9A-Za-z-]+\.){1}delivery.com$', str1): print 'String 1 matches!'
if (re.match('^(w{3}\.)?([0-9A-Za-z-]+\.){1}delivery.com$', str2): print 'String 2 matches!'
if (re.match('^(w{3}\.)?([0-9A-Za-z-]+\.){1}delivery.com$', str3): print 'String 3 matches!'

Running this should give the result:

String 1 matches!
String 3 matches!

Now, the problem is when I try to replace delivery.com dynamically using str.format...

if (re.match('^(w{3}\.)?([0-9A-Za-z-]+\.){1}{domainName}$'.format(domainName = 'delivery.com'), str1): print 'String 1 matches!'

This seems to fail, because the str.format() expects the {3} and {1} to be parameters to the function. (I'm assuming)

I could concatenate the string using + operator

'^(w{3}\.)?([0-9A-Za-z-]+\.){1}' + domainName + '$'

The question comes down to, is it possible to use str.format() when the string (usually regex) has "{n}" within it?


you first would need to format string and then use regex. It really doesn't worth it to put everything into a single line. Escaping is done by doubling the curly braces:

>>> pat= '^(w{{3}}\.)?([0-9A-Za-z-]+\.){{1}}{domainName}$'.format(domainName = 'delivery.com')
>>> pat
'^(w{3}\\.)?([0-9A-Za-z-]+\\.){1}delivery.com$'
>>> re.match(pat, str1)

Also, re.match is matching at the beginning of the string, you don't have to put ^ if you use re.match, you need ^ if you're using re.search, however.

Please note, that {1} in regex is rather redundant.


Per the documentation, if you need a literal { or } to survive the formatting opertation, use {{ and }} in the original string.

'^(w{{3}}\.)?([0-9A-Za-z-]+\.){{1}}{domainName}$'.format(domainName = 'delivery.com')
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜