
Using regex to find data in Python

I am new to Python, and to development in general. Let me give an example of what I am trying to do.

I want to find the text name="username" type="hidden" value="blah" and pull out only the "blah".

How would I begin to go about that?


You can use regex groups to pick out relevant parts of a match.

#!/usr/bin/env python

s = """ Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim
veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
commodo consequat. 
name="username" type="hidden" value="blah" 
Duis aute irure dolor in reprehenderit in voluptate velit
esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat
non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
"""

import re

pattern = re.compile(r'name="username"\stype="hidden"\svalue="([^"]*)"')
for match in pattern.finditer(s):
    print(match.group(1))
    # => blah
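
If only one value is expected, a named group with re.search may read a little more clearly than looping over finditer. This is only a sketch and not part of the answer above: the helper name find_hidden_value and the group name "value" are made up for illustration, and \s+ is used on the assumption that more than one whitespace character may separate the attributes.

```python
import re

# Hypothetical helper; the names below are illustrative assumptions.
HIDDEN_VALUE_RE = re.compile(
    r'name="username"\s+type="hidden"\s+value="(?P<value>[^"]*)"'
)

def find_hidden_value(text):
    """Return the captured value, or None if the pattern does not occur."""
    match = HIDDEN_VALUE_RE.search(text)
    return match.group("value") if match else None

print(find_hidden_value('name="username" type="hidden" value="blah"'))
```

Returning None when nothing matches lets the caller decide how to handle missing input instead of hitting an AttributeError on a failed match.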


Something like this maybe:

import re

string = 'name="username" type="hidden" value="blah"'
# Capture the text between double quotes that follow an equals sign preceded by a non-whitespace character.
regex = re.compile(r'\S="([^"]+)"')
print(regex.findall(string))
# => ['username', 'hidden', 'blah']

These are great resources for regex in Python:

  • http://www.pythonregex.com/
  • http://docs.python.org/library/re.html


If you want to get all of the values into a dictionary, you can use this function:

import re

def get_pair_map(s):
    # Collect every name="value" pair into a dict; avoids shadowing the built-in map().
    pair_re = re.compile(r'(\w+)="(\w+)"')
    return dict(pair_re.findall(s))
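
One thing to watch with the function above: \w+ only matches letters, digits and underscores, so a value such as "blah blah" or "a-b" would be skipped. The variant below is a sketch under that observation, not the original answer's code; the name parse_attributes is hypothetical, and it accepts any value that does not itself contain a double quote.

```python
import re

# Hypothetical variant: attribute names may contain hyphens,
# and values may contain anything except a double quote.
ATTR_RE = re.compile(r'([\w-]+)="([^"]*)"')

def parse_attributes(s):
    """Return a dict mapping attribute names to their quoted values."""
    return dict(ATTR_RE.findall(s))

attrs = parse_attributes('name="username" type="hidden" value="blah blah"')
print(attrs["value"])
```

If the same attribute name appears more than once, the last occurrence wins, because dict() keeps the final value for a repeated key.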


The other answers give excellent examples of using the re module from Python's standard library, but you may also consider Python's plain string methods. They need no imports and are often considered more "Pythonic" for simple, fixed-format input like this.

Example line:

name="username" type="hidden" value="blah"

# given a file containing the example line
with open('my_file.txt') as f:
    for line in f:
        # split the line on whitespace
        for item in line.split():
            # check whether this is the 'value' attribute you need
            if item.startswith('value='):
                print(item.split('"')[1])
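
The same idea works on a string that is already in memory, without reading a file. This is a minimal sketch, assuming attributes are separated by whitespace and values contain no spaces (split() would break such values apart); the function name value_of is hypothetical.

```python
def value_of(line, attribute="value"):
    """Return the quoted value of `attribute` in `line`, or None if absent."""
    prefix = attribute + '="'
    for item in line.split():
        # Require the exact form attribute="..." with a closing quote.
        if (item.startswith(prefix) and item.endswith('"')
                and len(item) > len(prefix)):
            return item[len(prefix):-1]
    return None

print(value_of('name="username" type="hidden" value="blah"'))
```

Matching on the full prefix rather than a substring test keeps an item such as name="value" from being mistaken for the value attribute.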