Python regular expression for a string

Consider this string:

prison break: proof of innocence (2006) {abduction (#1.10)}

I just want to know whether there is a (# floating point value )} in the string or not.

I tried a few regular expressions like

re.search('\(\#+\f+\)\}',xyz) 

and

re.search('\(\#+(\d\.\d)+\)\}',xyz)

Nothing worked, though. Can someone suggest something here?


Try r'\(#\d+\.\d+\)\}'

The (, ), and . are special metacharacters, which is why they're preceded by \ so that they're matched literally instead. The } is escaped as well; in Python that is harmless, although a lone } would also match literally.

You also need to apply the + repetition to the right element. Here it's attached to \d, the shorthand for the digit character class, so that only the digits can repeat one or more times.

The use of r'raw string literals' makes it easier to work with regex patterns because you don't have to escape backslashes excessively.
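As a minimal sketch, the suggested pattern can be checked against the string from the question like this. The variable names (title, pattern, has_episode) are made up for illustration; only the pattern and the sample string come from the thread.

```python
import re

# The sample string from the question.
title = 'prison break: proof of innocence (2006) {abduction (#1.10)}'

# Raw string literal: each backslash reaches the regex engine unchanged.
pattern = re.compile(r'\(#\d+\.\d+\)\}')

# Without the r prefix, every backslash has to be doubled to get the same pattern.
same_pattern = '\\(#\\d+\\.\\d+\\)\\}' == r'\(#\d+\.\d+\)\}'

match = pattern.search(title)
has_episode = match is not None  # True when "(#<digits>.<digits>)}" is present

if match:
    print(match.group())  # prints (#1.10)}
```

re.search returns None when nothing matches, so testing the result against None is enough to answer the "is it in the string or not" question.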

See also

  • What exactly do u and r string flags in Python do, and what are raw string literals?

Variations

For instructional purposes, let's consider a few variations. This will show a few basic features of regex. Let's first consider one of the attempted patterns:

\(\#+(\d\.\d)+\)\}

Let's space out the parts for readability:

\( \#+ ( \d \. \d )+ \) \}
       \__________/
         this is one group, repeated with +

So this pattern matches:

  • A literal (, followed by one-or-more #
  • Followed by one-or-more of:
    • A digit, a literal dot, and a digit
  • Followed by a literal )}

Thus, the pattern will match e.g. (###1.23.45.6)} (as seen on rubular.com). Obviously this is not the pattern we want.
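To see this behaviour locally rather than on rubular.com, here is a small sketch in Python. The names attempt, matches_odd and matches_target are illustrative only.

```python
import re

# The attempted pattern from the question, written as a raw string.
attempt = re.compile(r'\(\#+(\d\.\d)+\)\}')

# The group "digit, dot, digit" repeats, so 1.2, 3.4 and 5.6 are consumed in turn.
matches_odd = attempt.search('(###1.23.45.6)}') is not None

# In "(#1.10)}" the group matches "1.1", but the trailing "0" cannot start
# another "digit, dot, digit" group and is not ")", so the match fails.
matches_target = attempt.search('{abduction (#1.10)}') is not None

print(matches_odd, matches_target)  # prints True False
```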

Now let's try to modify the solution pattern and say that perhaps we also want to allow just a sequence of digits, without the subsequent period and following digits. We can do this by grouping that part (…), and making it optional with ?.

BEFORE
\(#\d+\.\d+\)\}
      \___/
      let's make this optional! (…)?

AFTER
\(#\d+(\.\d+)?\)\}

Now the pattern matches e.g. (#1.23)} as well as e.g. (#666)} (as seen on rubular.com).
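The same check can be sketched in Python. The name optional_fraction and the third test input are assumptions added for illustration.

```python
import re

# The fractional part (dot plus digits) is grouped and made optional with ?.
optional_fraction = re.compile(r'\(#\d+(\.\d+)?\)\}')

results = {}
for text in ('(#1.23)}', '(#666)}', '(#1.)}'):
    m = optional_fraction.search(text)
    # group(1) holds the fractional part, or None when the optional group was skipped.
    results[text] = (m is not None, m.group(1) if m else None)

print(results)
```

Note that a dot with no digits after it, as in (#1.)}, still fails to match, because the optional group requires at least one digit after the dot.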

References

  • regular-expressions.info - Optional, Brackets for Grouping


"Escape everything" and use raw-literal syntax for safety:

>>> import re
>>> s='prison break: proof of innocence (2006) {abduction (#1.10)}'
>>> re.search(r'\(\#\d+\.\d+\)\}', s)
<_sre.SRE_Match object at 0xec950>
>>> _.group()
'(#1.10)}'

This assumes that by "floating point value" you mean "one or more digits, a dot, one or more digits". It is not tolerant of other floating point syntax variations, of multiple hashes (which your RE patterns suggest you want to support, although your question's text doesn't mention them), or of arbitrary whitespace among the relevant parts (again, it's unclear from your question whether you need that). Some of these issues can be adjusted pretty easily, others "not so much": it's particularly hard to guess what gamut of FP syntax variations you want to support, for example.
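As one possible adjustment, here is a sketch that tolerates multiple hashes and optional whitespace between the parts. Whether the question actually needs either is a guess, and the name tolerant is made up here. The number syntax is deliberately left as "digits, dot, digits".

```python
import re

# Assumption: one or more hashes, and optional whitespace between the parts.
# \s* matches zero or more whitespace characters; #+ matches one or more hashes.
tolerant = re.compile(r'\(\s*#+\s*\d+\.\d+\s*\)\s*\}')

samples = ['(#1.10)}', '(## 1.10 ) }', '(#110)}']
found = [tolerant.search(text) is not None for text in samples]

print(found)  # prints [True, True, False]
```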
