
Python - regex - Splitting string before word

I am trying to split a string in Python before a specific word. For example, I would like to split the following string before "path:".

  • split string before "path:"
  • input: "path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism"
  • output: ['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']

I have tried:

import re

rx = re.compile("(:?[^:]+)")
rx.findall(line)

This does not split the string anywhere. The trouble is that the text following "path:" is never known in advance, so I cannot match on a fixed whole word. Does anyone know how to do this?
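For reference, here is what the attempted pattern returns on the sample input (a quick check, assuming the input is the single-line string above). Note that "(:?" opens a capturing group whose first item is an optional colon; the non-capturing syntax would be "(?:". Either way, the pattern cuts at colons rather than before each "path:".

```python
import re

line = ("path:bte00250 Alanine, aspartate and glutamate metabolism "
        "path:bte00330 Arginine and proline metabolism")

# "(:?" is a capturing group starting with an optional ':' (not "(?:").
rx = re.compile("(:?[^:]+)")

# Each match is an optional ':' followed by a run of non-colon characters,
# so the text is cut at every ':' instead of before each "path:".
pieces = rx.findall(line)
print(pieces)
```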


Using a regular expression to split your string seems a bit overkill: the string split() method may be just what you need.

Anyway, if you really need to match a regular expression in order to split your string, you should use the re.split() function, which splits a string wherever the regular expression matches.
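A minimal illustration of why the choice of pattern matters: re.split() discards whatever the pattern matched, so splitting directly on the literal "path:" throws away the very prefix you want to keep and leaves an empty first item.

```python
import re

line = ("path:bte00250 Alanine, aspartate and glutamate metabolism "
        "path:bte00330 Arginine and proline metabolism")

# The matched text ("path:") is removed from the result, and the empty
# string before the first "path:" becomes the first element.
parts = re.split(r"path:", line)
print(parts)
```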

Also, use a correct regular expression for splitting:

>>> import re
>>> line = 'path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism'
>>> re.split(' (?=path:)', line)
['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']

The (?=...) group is a lookahead assertion: the expression matches a space (note the space at the start of the expression) that is followed by the string 'path:', without consuming the 'path:' itself, so it stays at the start of the next piece.
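Two lookahead-based variants, sketched under the assumption of Python 3.7 or later (earlier versions either ignore empty matches in re.split() or reject such a pattern). A bare lookahead splits on a zero-width match, which leaves an empty first piece and trailing spaces to clean up; alternatively, re.findall() can collect each "path:" entry directly.

```python
import re

line = ("path:bte00250 Alanine, aspartate and glutamate metabolism "
        "path:bte00330 Arginine and proline metabolism")

# Zero-width split (Python 3.7+): cut *before* each "path:" without
# consuming anything. The first piece is empty and every piece but the
# last keeps its trailing space, so strip and drop empties.
raw = re.split(r"(?=path:)", line)
parts = [p.strip() for p in raw if p.strip()]

# findall alternative: the lazy ".*?" stops just before optional
# whitespace followed by the next "path:", or at the end of the string.
found = re.findall(r"path:.*?(?=\s*path:|$)", line)

print(parts)
print(found)
```

The findall() version avoids the cleanup step, at the cost of a slightly harder-to-read pattern.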


You could do ["path:" + s for s in line.split("path:")[1:]] instead of using a regex. (Note that we skip the first element of the split, which is the text before the first "path:" and so has no "path:" prefix.)
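The same one-liner wrapped as a small function; split_before is a hypothetical name used only for illustration, not a library function. A .strip() is added because every piece except the last would otherwise keep the space that preceded the next "path:".

```python
def split_before(line, word):
    """Split `line` before each occurrence of `word`, keeping `word`.

    Hypothetical helper around the list comprehension in this answer.
    Any text before the first `word` is discarded (it is empty in the
    sample input), and surrounding whitespace is stripped from each piece.
    """
    return [(word + s).strip() for s in line.split(word)[1:]]


line = ("path:bte00250 Alanine, aspartate and glutamate metabolism "
        "path:bte00330 Arginine and proline metabolism")
print(split_before(line, "path:"))
```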


This can be done without regular expressions. Given a string:

s = "path:bte00250 Alanine, aspartate ... path:bte00330 Arginine and ..."

We can temporarily replace the desired word with a placeholder. The placeholder is a single character, which we use to split by:

word, placeholder = "path:", "|"
s = s.replace(word, placeholder).split(placeholder)
s
# ['', 'bte00250 Alanine, aspartate ... ', 'bte00330 Arginine and ...']

Now that the string is split, we can rejoin the original word to each sub-string using a list comprehension:

["".join([word, i]) for i in s if i]
# ['path:bte00250 Alanine, aspartate ... ', 'path:bte00330 Arginine and ...']
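A quick check on the same (elided) sample string: str.split() already accepts a multi-character separator, so splitting on the word itself yields the same pieces as the placeholder route. If the placeholder step is kept, the placeholder must not occur anywhere in the original text.

```python
s = "path:bte00250 Alanine, aspartate ... path:bte00330 Arginine and ..."
word, placeholder = "path:", "|"

# The placeholder route is only safe if "|" never appears in the input.
if placeholder in s:
    raise ValueError("placeholder occurs in the input string")
via_placeholder = s.replace(word, placeholder).split(placeholder)

# Splitting on the multi-character word directly gives the same pieces.
direct = s.split(word)

print(via_placeholder == direct)  # True
print(["".join([word, i]) for i in direct if i])
```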


in_str = "path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism"
in_list = in_str.split('path:')
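# Re-join the pieces with ",path:" and drop the leading comma. Note that the
# result is a single comma-separated string, not a list.
print(",path:".join(in_list)[1:])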
