Python regular expression for a string
Consider this string:
prison break: proof of innocence (2006) {abduction (#1.10)}
I just want to know whether (# floating point value )} occurs
in the string or not.
I tried a few regular expressions like
re.search('\(\#+\f+\)\}',xyz)
and
re.search('\(\#+(\d\.\d)+\)\}',xyz)
Nothing worked, though. Can someone suggest something here?
Try r'\(#\d+\.\d+\)\}'
The `(`, `)`, `.`, and `}` are all special metacharacters; that's why they're preceded by `\`, so they're matched literally instead.
You also need to apply the `+` repetition to the right element. Here it's attached to `\d`, the shorthand for the digit character class, to mean that only the digits can appear one-or-more times.
The use of r'raw string literals' makes it easier to work with regex patterns because you don't have to escape backslashes excessively.
See also
- What exactly do "u" and "r" string flags in Python do, and what are raw string literals?
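As a minimal sketch of how the suggested pattern could be used for the yes/no check in the question (the names `EPISODE` and `has_episode_number` are made up for illustration, not part of any library):

```python
import re

# Sample string from the question.
title = "prison break: proof of innocence (2006) {abduction (#1.10)}"

# Raw string literal: the backslashes reach the regex engine unchanged.
# EPISODE and has_episode_number are hypothetical names for this sketch.
EPISODE = re.compile(r'\(#\d+\.\d+\)\}')

def has_episode_number(text):
    """Return True if text contains something shaped like (#1.10)}."""
    return EPISODE.search(text) is not None

print(has_episode_number(title))                  # True
print(has_episode_number("(2006) {abduction}"))   # False
```

`re.search` returns a match object or `None`, so comparing against `None` is enough when you only need to know whether the pattern occurs.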
Variations
For instructional purposes, let's consider a few variations. This will show a few basic features of regex. Let's first consider one of the attempted patterns:
\(\#+(\d\.\d)+\)\}
Let's space out the parts for readability:
\( \#+ ( \d \. \d )+ \) \}
       \__________/
  this is one group, repeated with +
So this pattern matches:
- A literal `(`, followed by one-or-more `#`
- Followed by one-or-more of:
  - A digit, a literal dot, and a digit
- Followed by a literal `)}`
Thus, the pattern will match e.g. `(###1.23.45.6)}` (as seen on rubular.com). Obviously this is not the pattern we want.
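If you'd rather check this in Python than on rubular.com, a quick sketch like the following should show the same behaviour (the variable name `attempt` is just for illustration):

```python
import re

# The attempted pattern from the question, written as a raw string.
attempt = re.compile(r'\(\#+(\d\.\d)+\)\}')

# Matches: the group "digit, dot, digit" repeats as 1.2, 3.4, 5.6.
print(bool(attempt.search("(###1.23.45.6)}")))   # True

# Does not match the sample: after "1.1" comes "0", which is neither
# the start of another complete "digit, dot, digit" group nor a ")".
print(bool(attempt.search("(#1.10)}")))          # False

# Only single digits on each side of the dot get through.
print(bool(attempt.search("(#1.1)}")))           # True
```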
Now let's modify the solution pattern and say that perhaps we also want to allow just a sequence of digits, without the period and the digits after it. We can do this by grouping that part with `(…)` and making it optional with `?`.
BEFORE
\(#\d+\.\d+\)\}
      \___/
  let's make this optional! (…)?
AFTER
\(#\d+(\.\d+)?\)\}
Now the pattern matches e.g. `(#1.23)}` as well as e.g. `(#666)}` (as seen on rubular.com).
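The same check can be sketched in Python; the variable name `optional_fraction` is made up for this example. Note that when the optional group doesn't take part in the match, `group(1)` is `None`:

```python
import re

# The AFTER pattern: the fractional part is grouped and made optional.
optional_fraction = re.compile(r'\(#\d+(\.\d+)?\)\}')

m = optional_fraction.search("(#1.23)}")
print(m.group(), m.group(1))    # (#1.23)} .23

m = optional_fraction.search("(#666)}")
print(m.group(), m.group(1))    # (#666)} None

# A dangling dot is still rejected: the group needs digits after the dot,
# and without the group the next character would have to be ")".
print(optional_fraction.search("(#1.)}"))   # None
```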
References
- regular-expressions.info - Optional, Brackets for Grouping
"Escape everything" and use raw-literal syntax for safety:
>>> s='prison break: proof of innocence (2006) {abduction (#1.10)}'
>>> re.search(r'\(\#\d+\.\d+\)\}', s)
<_sre.SRE_Match object at 0xec950>
>>> _.group()
'(#1.10)}'
>>>
This assumes that by "floating point value" you mean "one or more digits, a dot, one or more digits". It is not tolerant of other floating point syntax variations, of multiple hashes (which your RE patterns suggest you want to support, though your question's text doesn't mention them), or of arbitrary whitespace among the relevant parts (again, it's unclear from your question whether you need it). Some of these issues can be adjusted pretty easily, others "not so much"; it's particularly hard to guess what gamut of FP syntax variations you want to support, for example.
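For what it's worth, here is a rough sketch of the "easy" adjustments only, assuming (and this is purely a guess at your requirements) that you want one or more hashes and optional whitespace between the parts. The name `LENIENT` is hypothetical, and the float syntax is still just digits, dot, digits:

```python
import re

# Assumption: allow one or more '#' and optional whitespace between parts.
# The number itself is still "digits, dot, digits"; nothing fancier.
LENIENT = re.compile(r'\(\s*#+\s*\d+\.\d+\s*\)\s*\}')

print(bool(LENIENT.search("{abduction (#1.10)}")))   # True
print(bool(LENIENT.search("( ## 1.10 ) }")))         # True
print(bool(LENIENT.search("(#110)}")))               # False, no dot
```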