Extracting data from a text file to use in a Python script?
Basically, I have a file like this:
Url/Host: www.example.com
Login: user
Password: password
Data_I_Dont_Need: something_else
How can I use RegEx to separate the details to place them into variables?
Sorry if this is a terrible question; I can just never grasp RegEx. So another question: can you provide the RegEx, but also explain what each part of it is for?
You should put the entries in a dictionary, not in so many separate variables. Clearly, the keys you're using need NOT be valid variable names (that slash in 'Url/Host' would be a killer!-), but they work just fine as string keys in a dictionary.
import re
there = re.compile(r'''(?x)  # verbose flag: allows comments & whitespace
    ^        # anchor to the start
    ([^:]+)  # group with 1+ non-colons, the key
    :\s*     # colon, then arbitrary whitespace
    (.*)     # group everything that follows
    $        # anchor to the end
''')
and then
configdict = {}
with open('thefile.txt') as f:  # 'with' closes the file when the loop ends
    for aline in f:
        mo = there.match(aline)
        if not mo:
            print("Skipping invalid line %r" % aline)
            continue
        k, v = mo.groups()
        configdict[k] = v
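To try the pattern and loop end to end without a file on disk, here is a minimal sketch that feeds the question's sample lines through io.StringIO. The StringIO stand-in for thefile.txt and the `wanted` filtering step are illustrative assumptions, not part of the answer above; the last step shows how a dictionary makes it easy to keep only the fields you care about and drop Data_I_Dont_Need.

```python
import io
import re

# Same verbose pattern as above: key = everything before the first colon,
# value = everything after the colon and any whitespace following it.
there = re.compile(r'''(?x)
    ^ ([^:]+)   # the key: one or more non-colons
    :\s*        # colon, then optional whitespace
    (.*) $      # the value: rest of the line (no newline)
''')

# Stand-in for open('thefile.txt') so the sketch runs on its own.
sample = io.StringIO(
    "Url/Host: www.example.com\n"
    "Login: user\n"
    "Password: password\n"
    "Data_I_Dont_Need: something_else\n"
)

configdict = {}
for aline in sample:
    mo = there.match(aline)
    if not mo:
        print("Skipping invalid line %r" % aline)
        continue
    k, v = mo.groups()
    configdict[k] = v

# Keep only the entries you actually need.
wanted = {k: configdict[k] for k in ('Url/Host', 'Login', 'Password')}
print(wanted)
```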
The possibility of making RE patterns "verbose" (by starting them with (?x) or passing re.VERBOSE as the second argument to re.compile) is very useful: it lets you clarify your REs with comments and nicely aligned whitespace. I think it's sadly underused;-).
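Here is a small sketch of the two equivalent ways to turn on verbose mode: the inline (?x) flag at the very start of the pattern, or re.VERBOSE passed as the flags argument of re.compile. Both compile the same key/value pattern and should match a sample line identically.

```python
import re

# Verbose mode requested inline with (?x) at the start of the pattern...
inline = re.compile(r'''(?x)
    ^ ([^:]+)   # key: one or more non-colons
    :\s*        # colon, then optional whitespace
    (.*) $      # value: the rest of the line
''')

# ...or requested by passing re.VERBOSE as the second (flags) argument.
flagged = re.compile(r'''
    ^ ([^:]+)   # key: one or more non-colons
    :\s*        # colon, then optional whitespace
    (.*) $      # value: the rest of the line
''', re.VERBOSE)

line = "Login: user"
print(inline.match(line).groups())   # ('Login', 'user')
print(flagged.match(line).groups())  # ('Login', 'user')
```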
For a file as simple as this you don't really need regular expressions. String functions are probably easier to understand. This code:
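def parse(data):
    parsed = {}
    for line in data.split('\n'):
        if not line.strip():
            continue  # skip blank lines
        # Split only at the first colon so values containing ':' stay intact.
        key, sep, value = line.partition(':')
        if not sep:
            continue  # skip lines without a colon
        parsed[key.strip()] = value.strip()
    return parsed

if __name__ == '__main__':
    test = """Url/Host: www.example.com
Login: user
Password: password
"""
    print(parse(test))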
Will do the job, and results in:
{'Login': 'user', 'Password': 'password', 'Url/Host': 'www.example.com'}
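One caveat with string splitting: a plain line.split(':') cuts at every colon, so a value that itself contains a colon gets truncated. The sketch below uses a hypothetical value with a URL scheme (http://www.example.com, which is not in the question's file) to show why splitting only once, with maxsplit=1, is safer.

```python
line = "Url/Host: http://www.example.com"  # hypothetical value containing ':'

# Splitting on every colon breaks the URL into three pieces.
print(line.split(':'))  # ['Url/Host', ' http', '//www.example.com']

# maxsplit=1 splits only at the first colon, keeping the value intact.
key, value = line.split(':', 1)
print(key.strip(), value.strip())  # Url/Host http://www.example.com
```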
Well, if you don't know about regex, simply change your file like this:
[site]
Host = www.example.com
Login = user
Password = password
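And use the ConfigParser Python module (configparser in Python 3): http://docs.python.org/library/configparser.html
Note that it requires a section header, such as [site], before the first key.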
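A minimal Python 3 sketch of the ConfigParser route, assuming the file has been rewritten in key = value form under a section named [site] (the section name is arbitrary). The text is read from a string here so the example runs on its own; for a real file you would call parser.read() with its path instead. interpolation=None is a design choice so that a password containing '%' is not treated as an interpolation marker.

```python
import configparser

# Hypothetical file contents in the "key = value" format suggested above.
# configparser requires a [section] header; "[site]" is just a chosen name.
text = """\
[site]
Host = www.example.com
Login = user
Password = password
"""

# interpolation=None: treat '%' in values (e.g. passwords) literally.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(text)  # for a real file: parser.read("yourfile.ini")

site = parser["site"]
print(site["Host"], site["Login"], site["Password"])
```

Lookups through the section are case-insensitive by default, so site["login"] and site["Login"] return the same value.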
EDIT: Better Solution
import re
for line in input:  # 'input' here stands for your open file object
    key, val = re.search(r'(.*?):\s*(.*)', line).groups()
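The ? in (.*?) is what makes this one-liner work: it makes the first group lazy, so it stops at the first colon instead of the last. The sketch below compares the lazy and greedy versions on a hypothetical line whose value contains a colon, and also shows that re.search returns None for a line with no colon, so a real loop should check the match before calling .groups().

```python
import re

line = "Url/Host: http://www.example.com"  # hypothetical value containing ':'

# Lazy (.*?) stops at the FIRST colon, so the key is just "Url/Host".
lazy = re.search(r'(.*?):\s*(.*)', line).groups()
print(lazy)    # ('Url/Host', 'http://www.example.com')

# Greedy (.*) backtracks only as far as the LAST colon,
# so the key swallows the "http" scheme of the URL.
greedy = re.search(r'(.*):\s*(.*)', line).groups()
print(greedy)  # ('Url/Host: http', '//www.example.com')

# A line with no colon makes re.search return None; calling .groups()
# on None would raise AttributeError, so check the match first.
missing = re.search(r'(.*?):\s*(.*)', "no colon here")
print(missing)  # None
```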
The ConfigParser module (configparser in Python 3) supports ':' as a delimiter.
import configparser

# ConfigParser insists on a [section] header, so prepend a dummy one.
parser = configparser.RawConfigParser()
with open("file.txt") as f:
    parser.read_string("[data]\n" + f.read())
print(dict(parser.items("data")))
Output:
{'login': 'user', 'password': 'password', 'url/host': 'www.example.com'}
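The keys in that output are lowercased because ConfigParser passes option names through its optionxform hook, which lowercases by default. A small sketch, assuming you want the original spelling ('Url/Host' rather than 'url/host'): setting optionxform to str keeps keys as written. The sample text is the question's data, read from a string so the example is self-contained.

```python
import configparser

# Same trick as above: prepend a section header to header-less "key: value" text.
text = "Url/Host: www.example.com\nLogin: user\nPassword: password\n"

parser = configparser.RawConfigParser()
parser.optionxform = str  # keep keys as written instead of lowercasing them
parser.read_string("[data]\n" + text)

print(dict(parser.items("data")))
# {'Url/Host': 'www.example.com', 'Login': 'user', 'Password': 'password'}
```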
That said, a regex or manual parsing might be more appropriate in your case.