Python Regex returns me the value with parentheses
I'm trying to run this code:
picture = re.search("#4F9EFF;\"><img src=\"(.+?)\" width=\"120\" height=\"90\"", data)
and when I do print picture.groups(1)
it returns the value, but with parentheses. Why?
Output:
('http://sample.com/img/file.jpg',)
The group is a tuple containing one element. You can access the string (which is the first match) as output[0]. The important part is the comma after the string.
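To make the tuple point concrete, here is a minimal sketch. It assumes Python 3 (so print is a function), and the data string is a made-up sample shaped like the output in the question, not the asker's real page.

```python
import re

# Hypothetical sample input, shaped like the output shown in the question.
data = '<img src="http://sample.com/img/file.jpg" width="120" height="90"'

match = re.search(r'<img src="(.+?)" width="120"', data)
groups = match.groups()

print(groups)     # ('http://sample.com/img/file.jpg',)
print(groups[0])  # http://sample.com/img/file.jpg

# The comma, not the parentheses, is what makes a tuple.
print(type(('a',)))  # <class 'tuple'>
print(type(('a')))   # <class 'str'>
```

Indexing with [0] pulls the string out of the tuple, which is why the parentheses disappear when it is printed.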
BUT
DON'T PARSE HTML WITH REGEX
You should use a proper HTML parser. This will save you innumerable headaches in the future, when your regex fails to match or matches too much. Look into BeautifulSoup or lxml.
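As a rough illustration of the parser approach, the sketch below uses the standard library's html.parser instead of BeautifulSoup or lxml, only so that it runs without installing anything. The ImgSrcCollector class and the sample markup are hypothetical; the real page from the question is not shown.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "img":
            src = dict(attrs).get("src")
            if src is not None:
                self.sources.append(src)


# Hypothetical sample markup, loosely modelled on the pattern in the question.
data = ('<td style="color:#4F9EFF;">'
        '<img src="http://sample.com/img/file.jpg" width="120" height="90">'
        '</td>')

parser = ImgSrcCollector()
parser.feed(data)
print(parser.sources)  # ['http://sample.com/img/file.jpg']
```

Unlike the regex, this does not depend on attribute order or on the exact width and height values, though BeautifulSoup or lxml will cope better with badly broken markup.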
Notice the comma before the closing parenthesis? This is a tuple (albeit one with just one element in it).
As the documentation for MatchObject.groups() says:
groups([default])
Return a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern. The default argument is used for groups that did not participate in the match; it defaults to None.
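A small sketch of what the default argument does, assuming Python 3; the pattern and input string are made up for illustration. It also shows that the 1 in groups(1) is taken as the default value, not as a group number.

```python
import re

# The second group is optional, so it may not participate in the match.
match = re.search(r"(\d+)(px)?", "width=120")

print(match.groups())       # ('120', None)
print(match.groups("n/a"))  # ('120', 'n/a')
print(match.groups(1))      # ('120', 1)
```

In the question's pattern the only group always participates, so picture.groups(1) and picture.groups() give the same one-element tuple.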
As noted by other posters, you want to use MatchObject.group() instead.
You should be using picture.group(1), not groups() in the plural, if you're only looking for one specific group. groups() always returns a tuple; group() is the one you're looking for.
groups() returns a tuple of all the groups. You want picture.group(1), which returns the string that matched group 1.
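Here is one way this might look in practice, assuming Python 3. The first_image_url function and the sample strings are hypothetical, and the pattern is a shortened form of the one in the question. The None check is there because re.search returns None when nothing matches, and calling group(1) on None raises AttributeError.

```python
import re

# Shortened form of the pattern from the question (assumption: the
# leading colour fragment is not needed to show the idea).
PATTERN = r'<img src="(.+?)" width="120" height="90"'


def first_image_url(data):
    """Return the text captured by group 1, or None if nothing matches."""
    match = re.search(PATTERN, data)
    if match is None:
        return None
    return match.group(1)  # a plain string, not a tuple


sample = '<img src="http://sample.com/img/file.jpg" width="120" height="90">'
print(first_image_url(sample))           # http://sample.com/img/file.jpg
print(first_image_url("no image here"))  # None
```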
As the help for groups says, it returns "a tuple containing all the subgroups of the match".
If you want a single group, use the group method.