Python: How to extract required information from a string?
I am new to Python. Is there a StringTokenizer in Python? Can I do character-by-character scanning and copying?
I have the following input string
data = '123:Palo Alto, CA -> 456:Seattle, WA 789'
I need to extract the two (city, state) fields from this string. Here is the code I wrote
name_list = []
i = 0
while i < len(data):
    if data[i] == ':':
        name = ''
        j = 0
        i = i + 1
        while data[i] != '-' and data[i].isnumeric() == False:
            name[j] = data[i] # This line gives error
            i = i + 1
            j = j + 1
        name_list.append(name)
    i = i + 1
What should I do?
import re
data = '123:Palo Alto, CA -> 456:Seattle, WA 789'
citys = []
for record in data.split("->"):
    # Capture the text after ':' as the city and the word after ',' as the state.
    citys.append(
        re.search(r":(?P<city>[\w\s]+),\s*(?P<state>[\w]+)", record)
        .groupdict()
    )
print(citys)
Gives:
[{'city': 'Palo Alto', 'state': 'CA'}, {'city': 'Seattle', 'state': 'WA'}]
My take, assuming the string is always formatted as per your example:
import re
data = '123:Palo Alto, CA -> 456:Seattle, WA 789'
name_list = []
r = re.compile(r"(\s?\d)|:")  # a digit (with an optional leading space) or a colon
name_list += r.sub("", data).split(" ->")
print(name_list) # Prints ['Palo Alto, CA', 'Seattle, WA']
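As a note on your error: Python strings are immutable, so an assignment like name[j] = ... raises a TypeError whatever the index is. On top of that, the empty string has a length of 0, so the index 0 doesn't exist: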
>>> s = ""
>>> len(s)
0
You can, however, concatenate strings in Python with the + operator, like so:
>>> s += "Some"
>>> s += " Text"
>>> print(s)
Some Text
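To tie this back to your loop, here is a minimal sketch of the same character-by-character scan that builds each name with += instead of assigning to name[j]. The function name extract_names, the extra bounds check, and the strip() call are my own additions for illustration, and it assumes the input always looks like your example.

```python
def extract_names(data):
    """Scan data one character at a time and collect the text after each ':'."""
    name_list = []
    i = 0
    while i < len(data):
        if data[i] == ':':
            name = ''
            i += 1
            # Stop at '-', at a digit, or at the end of the string.
            while i < len(data) and data[i] != '-' and not data[i].isdigit():
                name += data[i]  # builds a new string; strings cannot be modified in place
                i += 1
            name_list.append(name.strip())  # drop the trailing space before '->' or the digits
        else:
            i += 1
    return name_list


data = '123:Palo Alto, CA -> 456:Seattle, WA 789'
names = extract_names(data)
print(names)
```

For long inputs, collecting the characters in a list and calling ''.join() at the end is the more usual idiom, but += keeps the sketch closest to your original code.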
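You could always use a regular expression, if you wanted: /\d+:([\w\s]+),\s*(\w+)/. It's not pretty, but it should get the job done, assuming the string to match is the test string you had. The city group allows whitespace because "Palo Alto" contains a space, and re.search is used instead of re.match because the second record starts with a space.
import re
string_to_match = '123:Palo Alto, CA -> 456:Seattle, WA 789'
for s in string_to_match.split("->"):
    m = re.search(r"\d+:([\w\s]+),\s*(\w+)", s)
    city = m.group(1)
    state = m.group(2)
    print(city + ", " + state)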
Syntax may be a little off, but the general idea is there.
Assuming that you always have the string formatted as shown, you could do:
cityState = []
for line in data.split('->'):
    # "123:Palo Alto, CA" splits into "123:Palo Alto" and " CA"
    left, right = line.strip().split(',')
    cityState.append({'city': left.split(':')[1],
                      'state': right.split()[0]})
You can use regex. Here is my ugly regex; you can do better:
import re
inputStr = '123:Palo Alto, CA -> 456:Seattle, WA 789'
m = re.search(r'.*:(.*),(.*)->.*:(.*),\s*(\S{2})', inputStr)
print("City1=" + m.group(1))
print("State1=" + m.group(2))
print("City2=" + m.group(3))
print("State2=" + m.group(4))
Produces
City1=Palo Alto
State1= CA
City2=Seattle
State2=WA
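As one possible tidier variant, here is a sketch using re.findall, which returns every (city, state) pair in one pass and so is not tied to exactly two records. The pattern and the variable name pairs are my own choices, and it assumes each record looks like "number:City, ST" with a state made of letters only.

```python
import re

data = '123:Palo Alto, CA -> 456:Seattle, WA 789'

# After each ':' capture the city (anything up to the comma, without
# surrounding whitespace), then the state (letters only).
pairs = re.findall(r":\s*([^,:]+?)\s*,\s*([A-Za-z]+)", data)
print(pairs)
```

Each element of pairs is a (city, state) tuple, so the whitespace around the state no longer ends up in the captured group.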