Python name grabber
If I have a string in the format of
(static string) name (different static string ) message (last static string)
(static string) name (different static string ) message (last static string)
(static string) name (different static string ) message (last static string)
(static string) name (different static string ) message (last static string)
what would be the best way of searching through the messages for a word and generating an array of all of the names that had that word in their message?
>>> s="(static string) name (different static string ) message (last static string)"
>>> _,_,s=s.partition("(static string)")
>>> name,_,s=s.partition("(different static string )")
>>> message,_,s=s.partition("(last static string)")
>>> name
' name '
>>> message
' message '
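The partition() steps above can be wrapped in a helper and applied to every line to collect the matching names. This is only a sketch: parse_line and names_mentioning are hypothetical names, the sample lines are made up, and it assumes a "word" is a whitespace-separated token of the message.

```python
def parse_line(line):
    """Split one line into (name, message) using the three static markers."""
    _, _, rest = line.partition("(static string)")
    name, _, rest = rest.partition("(different static string )")
    message, _, _ = rest.partition("(last static string)")
    return name.strip(), message.strip()


def names_mentioning(lines, word):
    """Return the names whose message contains `word` as a whole token."""
    names = []
    for line in lines:
        name, message = parse_line(line)
        # Assumption: words are separated by whitespace only.
        if word in message.split():
            names.append(name)
    return names


# Made-up sample data in the format from the question.
lines = [
    "(static string) alice (different static string ) hello world (last static string)",
    "(static string) bob (different static string ) goodbye (last static string)",
    "(static string) carol (different static string ) hello again (last static string)",
]
print(names_mentioning(lines, "hello"))  # ['alice', 'carol']
```

Stripping the name and message removes the surrounding spaces that the REPL session above shows.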
Expecting this string:
Foo NameA Bar MessageA Baz
this regex will match:
Foo\s+(\w+)\s+Bar\s+(\w+)\s+Baz
Group 1 will be the name, group 2 will be the message. Foo, Bar, and Baz are the static parts.
Here it is in the Python REPL:
Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> s = "Foo NameA Bar MessageA Baz"
>>> m = re.match(r"Foo\s+(\w+)\s+Bar\s+(\w+)\s+Baz", s)
>>> m.group(0)
'Foo NameA Bar MessageA Baz'
>>> m.group(1)
'NameA'
>>> m.group(2)
'MessageA'
>>>
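To go from a single match to a list of names, the same pattern can be run over all the lines with findall(). The sketch below makes two assumptions that are not in the answer above: the message may contain several words, so group 2 is widened from \w+ to a lazy .*?, and a "word" is a whitespace-separated token. LINE_RE and names_with_word are hypothetical names, and the sample text is made up.

```python
import re

# "Foo", "Bar", and "Baz" stand in for the real static strings.
# Group 1 is the name; group 2 is the (possibly multi-word) message.
LINE_RE = re.compile(r"Foo\s+(\w+)\s+Bar\s+(.*?)\s+Baz")

text = """Foo NameA Bar hello there Baz
Foo NameB Bar nothing here Baz
Foo NameC Bar well hello Baz"""


def names_with_word(text, word):
    """Collect the names whose message contains `word` as a whole token."""
    names = []
    for name, message in LINE_RE.findall(text):
        if word in message.split():
            names.append(name)
    return names


print(names_with_word(text, "hello"))  # ['NameA', 'NameC']
```

If the real static strings contain regex metacharacters such as parentheses, they would need to be passed through re.escape() before being placed in the pattern.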
Here's a full answer showing how to do it using replace().
strings = ['(static string) name (different static string ) message (last static string)',
           '(static string) name (different static string ) message (last static string)',
           '(static string) name (different static string ) message (last static string)',
           '(static string) name (different static string ) message (last static string)',
           '(static string) name (different static string ) message (last static string)',
           '(static string) name (different static string ) message (last static string)']
results = []
target_word = 'message'
separators = ['(static string)', '(different static string )', '(last static string)']
for s in strings:
    for sep in separators:
        s = s.replace(sep, '')
    name, message = s.split()
    if target_word in message:
        results.append((name, message))
>>> results
[('name', 'message'), ('name', 'message'), ('name', 'message'), ('name', 'message'), ('name', 'message'), ('name', 'message')]
Note that this will match any message that contains the substring target_word. It will not look for word boundaries: for example, running this with target_word = 'message' and with target_word = 'sag' will produce the same results. You may need regular expressions if your word matching is more complicated.
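As a rough illustration of the word-boundary point, a regular expression with \b anchors accepts 'message' but rejects 'sag', while the plain substring test accepts both. contains_word is a hypothetical helper name, and re.escape() is used on the assumption that the target word should be matched literally.

```python
import re


def contains_word(message, word):
    """True if `word` occurs in `message` as a whole word, not as a substring."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, message) is not None


print(contains_word("a short message", "message"))  # True
print(contains_word("a short message", "sag"))      # False
print("sag" in "a short message")                   # True (substring test)
```

In the loop above, the substring condition could be swapped for a call like contains_word(message, target_word).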
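for line in open("file"):
    line = line.split(")")
    for item in line:
        try:
            # Print the text that precedes the next "(" in this piece.
            print(item[:item.index("(")])
        except ValueError:
            # No "(" in this piece (e.g. the trailing newline); skip it.
            pass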
output
$ more file
(static string) name (different static string ) message (last static string)
(static string) name (different static string ) message (last static string)
(static string) name (different static string ) message (last static string)
(static string) name (different static string ) message (last static string)
$ python python.py
name
message
name
message
name
message
name
message