
Extracting data from a text file to use in a python script?

Basically, I have a file like this:

Url/Host:   www.example.com
Login:     user
Password:   password
Data_I_Dont_Need:    something_else

How can I use RegEx to separate the details to place them into variables?

Sorry if this is a terrible question; I can just never grasp RegEx. So another question would be: can you provide the RegEx, but also explain what each part of it is for?


You should put the entries in a dictionary, not in so many separate variables. Clearly, the keys you're using are not necessarily valid variable names (that slash in 'Url/Host' would be a killer!-), but they'll be just fine as string keys into a dictionary.

import re

there = re.compile(r'''(?x)      # verbose flag: allows comments & whitespace
                       ^         # anchor to the start
                       ([^:]+)   # group with 1+ non-colons, the key
                       :\s*      # colon, then arbitrary whitespace
                       (.*)      # group everything that follows
                       $         # anchor to the end
                    ''')

and then

configdict = {}
with open('thefile.txt') as f:  # closes the file when the loop is done
    for aline in f:
        mo = there.match(aline)
        if not mo:
            print("Skipping invalid line %r" % aline)
            continue
        k, v = mo.groups()
        configdict[k] = v

The possibility of making RE patterns "verbose" (by starting them with (?x) or by passing re.VERBOSE as the second argument to re.compile) is very useful: it lets you clarify your REs with comments and nicely aligned whitespace. I think it's sadly underused;-).
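For comparison, here is a minimal sketch (not part of the original answer) that passes re.VERBOSE as the second argument instead of the inline (?x), and runs the same loop over the question's sample text. An io.StringIO object stands in for the real file so the example is self-contained:

```python
import io
import re

# Same pattern as above, but with the verbose flag passed as an argument
there = re.compile(r'''
    ^         # anchor to the start
    ([^:]+)   # group with 1+ non-colons, the key
    :\s*      # colon, then arbitrary whitespace
    (.*)      # group everything that follows
    $         # anchor to the end
''', re.VERBOSE)

# io.StringIO stands in for open('thefile.txt')
sample = io.StringIO(
    "Url/Host:   www.example.com\n"
    "Login:     user\n"
    "Password:   password\n"
    "Data_I_Dont_Need:    something_else\n"
)

configdict = {}
for aline in sample:
    mo = there.match(aline)
    if not mo:
        continue  # skip lines that have no colon
    k, v = mo.groups()
    configdict[k] = v

print(configdict['Url/Host'])  # www.example.com
```

Because `.` does not match a newline and `$` matches just before a trailing newline, the captured values come out without the line ending.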


For a file as simple as this you don't really need regular expressions. String functions are probably easier to understand. This code:

def parse(data):
    parsed = {}
    for line in data.splitlines():
        if not line.strip():
            continue  # blank line
        # split on the first colon only, so values such as URLs may contain ':'
        key, _, value = line.partition(':')
        parsed[key.strip()] = value.strip()
    return parsed

if __name__ == '__main__':
    test = """Url/Host:   www.example.com
    Login:     user
    Password:   password
"""
    print(parse(test))

Will do the job, and results in (Python 3.7+, where dicts keep insertion order):

{'Url/Host': 'www.example.com', 'Login': 'user', 'Password': 'password'}
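Since the question asks for separate variables, one way to get them is to parse into a dictionary first and then pull out only the keys you want, which also leaves Data_I_Dont_Need unused. This is a sketch rather than part of the original answer; the helper name `parse_config` and the variable names are hypothetical:

```python
def parse_config(text):
    """Parse 'Key: value' lines into a dict, splitting on the first colon only."""
    parsed = {}
    for line in text.splitlines():
        key, sep, value = line.partition(':')
        if sep:  # ignore lines without a colon (e.g. blank lines)
            parsed[key.strip()] = value.strip()
    return parsed

text = """Url/Host:   www.example.com
Login:     user
Password:   password
Data_I_Dont_Need:    something_else
"""

config = parse_config(text)
# Pick out only the entries you need; unwanted keys are simply never read
host, login, password = (config[k] for k in ('Url/Host', 'Login', 'Password'))
print(host, login, password)  # www.example.com user password
```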


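Well, if you don't know about regex, simply change your file like this (ConfigParser requires at least one section header such as [server]; the section name is up to you):

[server]
Host = www.example.com
Login = user
Password = password

And use the ConfigParser Python module (named configparser in Python 3): http://docs.python.org/library/configparser.html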
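A minimal Python 3 sketch of that approach, assuming the file starts with a `[server]` header (a hypothetical section name). read_string is used here only to keep the example self-contained; with a real file you would call `config.read('file.ini')` instead:

```python
import configparser

# The file contents, including the [server] header configparser requires
ini_text = """
[server]
Host = www.example.com
Login = user
Password = password
"""

config = configparser.ConfigParser()
config.read_string(ini_text)

host = config['server']['Host']  # option lookups are case-insensitive
login = config['server']['Login']
password = config['server']['Password']
print(host, login, password)  # www.example.com user password
```

Without the section header, configparser raises MissingSectionHeaderError instead of reading the file.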


EDIT: Better Solution

import re
for line in open('file.txt'):
    if (match := re.search(r'(.*?):\s*(.*)', line)):  # None for lines without ':'
        key, val = match.groups()
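The `?` in `(.*?)` makes the first group non-greedy, so it stops at the first colon. That matters when a value itself contains a colon, such as a URL with a scheme. A quick comparison (the `http://` sample line is made up for illustration):

```python
import re

line = "Url/Host:   http://www.example.com"

# Non-greedy key: stops at the FIRST colon
lazy = re.search(r'(.*?):\s*(.*)', line).groups()
# Greedy key: runs to the LAST colon that still allows a match
greedy = re.search(r'(.*):\s*(.*)', line).groups()

print(lazy)    # ('Url/Host', 'http://www.example.com')
print(greedy)  # ('Url/Host:   http', '//www.example.com')
```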


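The ConfigParser module supports ':' as a delimiter; it just needs a section header, so this prepends a dummy [data] one (Python 3 version):

import configparser
import io

class Parser(configparser.RawConfigParser):
    def _read(self, fp, fpname):
        # prepend a dummy section header before normal parsing
        data = io.StringIO("[data]\n" + fp.read())
        return super()._read(data, fpname)

p = Parser()
p.read("file.txt")
print(dict(p.items("data")))

Output (for a file.txt holding the Url/Host, Login and Password lines; ConfigParser lowercases the keys):

{'url/host': 'www.example.com', 'login': 'user', 'password': 'password'}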

Though a regex or manual parsing might be more appropriate in your case.
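ConfigParser lowercases option names by default, which is why the output above shows 'url/host' rather than 'Url/Host'. If you need the original spelling, assigning `str` to the parser's `optionxform` attribute keeps keys as written. A Python 3 sketch using the same dummy-section trick, with read_string standing in for reading the file:

```python
import configparser

raw = """Url/Host:   www.example.com
Login:     user
Password:   password
Data_I_Dont_Need:    something_else
"""

parser = configparser.RawConfigParser()
parser.optionxform = str  # keep option names exactly as written
# Prepend a dummy section header so the headerless text parses
parser.read_string("[data]\n" + raw)

settings = dict(parser.items("data"))
print(settings['Url/Host'])  # www.example.com
```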
