
How to retrieve field values (some values are themselves strings) from a given line using a regular expression in Python

I am new to Python and am facing the issue below. Please help me:

I read a file line by line. Each line contains field names and their values, and I have to find each field name and field value in the line. An example line is:

line=" A= 4 | B='567' |c=4|D='aaa' "

Since some field values are themselves strings, I am unable to create a regex that retrieves the field name and field value.

Please let me know a regex for the above example. The output should be:

A=4 

B='567'

c=4

D='aaa'


The simplest solution I can think of is converting each line into a dictionary. I assume that you don't have any quote marks or | marks in your strings (see my comments on the question).

result = {}                           # Initialize a dictionary
with open('input.txt') as infile:     # The file is closed automatically
    for line in infile:               # Read file line by line in a memory-efficient way
        if not line.strip():          # Skip blank lines
            continue
        # Split line to pairs using '|', split each pair at its first '='
        pairs = [pair.split('=', 1) for pair in line.split('|')]
        for pair in pairs:
            key, value = pair[0].strip(), pair[1].strip()
            try:                      # Try an int conversion
                value = int(value)
            except ValueError:        # If it fails, strip quotes
                value = value.strip("'").strip('"')
            result[key] = value       # Add current item to the results dictionary

which, for the following input:

A= 4 | B='567' |c=4|D='aaa' 
E= 4 | F='567' |G=4|D='aaa' 

would give (the key order may vary with the Python version):

{'A': 4, 'c': 4, 'B': '567', 'E': 4, 'D': 'aaa', 'G': 4, 'F': '567'}

Notes:

  • If you consider '567' to be a number, you can strip the " and ' before trying to convert it to an integer.
  • If you need to take floats into account, you can try value=float(value). Remember to do it after the int conversion attempt, because every string that parses as an int also parses as a float.
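The conversion order from the notes above could be sketched as follows. This is only an illustration: `coerce` is a hypothetical helper name, not part of the answer's code, and it assumes values are plain integers, floats, or optionally quoted strings. Note that float() also accepts text such as 'nan' and 'inf'.

```python
def coerce(raw):
    """Convert a raw field value to int, then float, else an unquoted string."""
    text = raw.strip()
    for convert in (int, float):   # int first, so "4" stays an int rather than 4.0
        try:
            return convert(text)
        except ValueError:
            pass                   # not this numeric type, try the next one
    return text.strip("'\"")       # not numeric: drop surrounding quote marks

# Hypothetical sample values, in the style of the question's input
print([coerce(v) for v in ["4", " 2.5 ", "'567'", "'aaa'"]])
```

Because the quotes are stripped only after both conversions fail, '567' stays a string here; strip the quotes first if it should become a number.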


Try this one:

import re

line = " A= 4 | B='567' |c=4|D='aaa' "
re.search( r'(?P<field1>.*)=(?P<value1>.*)\|(?P<field2>.*)=(?P<value2>.*)\|(?P<field3>.*)=(?P<value3>.*)\|(?P<field4>.*)=(?P<value4>.*)', line ).groups()

Output:

(' A', ' 4 ', ' B', "'567' ", 'c', '4', 'D', "'aaa' ")

You can also try using \S* instead of .* if your fields and values do not contain whitespace. This removes the surrounding whitespace from the output:

re.search( r'(?P<field1>\S*)\s*=\s*(?P<value1>\S*)\s*\|\s*(?P<field2>\S*)\s*=\s*(?P<value2>\S*)\s*\|\s*(?P<field3>\S*)\s*=\s*(?P<value3>\S*)\s*\|\s*(?P<field4>\S*)\s*=\s*(?P<value4>\S*)', line ).groupdict()

Output:

{'field1': 'A',
 'field2': 'B',
 'field3': 'c',
 'field4': 'D',
 'value1': '4',
 'value2': "'567'",
 'value3': '4',
 'value4': "'aaa'"
}

This will group each field name with its value:

[ re.search( r'\s*([^=]+?)\s*=\s*(\S+)', group ).groups( ) for group in re.findall( r'([^=|]*\s*=\s*[^|]*)', line ) ]

Output:

[('A', '4'), ('B', "'567'"), ('c', '4'), ('D', "'aaa'")]
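When the number of fields per line is not fixed, a single pattern applied repeatedly may be simpler. The following is only a sketch: `FIELD` and `parse_fields` are hypothetical names, and it assumes field names consist of word characters and that a quoted value never contains its own quote character.

```python
import re

# Hypothetical pattern: a word-like name, '=', then either a quoted string
# (which may contain '|' or '=') or a bare token without '|' or whitespace.
FIELD = re.compile(r"""(\w+)\s*=\s*('[^']*'|"[^"]*"|[^|\s]+)""")

def parse_fields(line):
    """Return (name, value) pairs in line order; quotes stay on the values."""
    return FIELD.findall(line)

line = " A= 4 | B='567' |c=4|D='aaa' "
for name, value in parse_fields(line):
    print(f"{name}={value}")   # one field per line, as requested in the question
```

Trying the quoted alternatives before the bare token is what lets a value such as 'x|y=z' come through in one piece.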

Does it help?


Assuming you don't have nasty things like nested quotes or unmatched quotes, you can do it all with split and strip:

>>> line = " A= 4 | B='567' |c=4|D='aaa' "
>>> values = dict((x.strip(" '"), y.strip(" '")) for x,y in (entry.split('=') for entry in line.split('|')))
>>> values
{'A': '4', 'c': '4', 'B': '567', 'D': 'aaa'}
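If a value might itself contain '=' or a line might contain an empty entry, the same split-and-strip idea can be made a little more tolerant. This is a sketch under those assumptions only; `split_fields` is a hypothetical name, and a '|' inside a quoted value would still break it.

```python
def split_fields(line):
    """Parse 'name=value' entries separated by '|' without regular expressions."""
    fields = {}
    for entry in line.split('|'):
        name, sep, value = entry.partition('=')   # split at the first '=' only
        if not sep:                               # entry has no '=': skip it
            continue
        fields[name.strip()] = value.strip(" '")  # drop spaces and single quotes
    return fields

# Hypothetical input with an '=' inside a value and an empty entry
print(split_fields(" A= 4 | B='a=b' || c=4 "))
```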