Python compile all non-words except dot[.]

I am trying to split a line on all non-word characters except . (dot).

I guess it would usually be done as [\W ^[.]] in Java, but how do I do it in Python?


>>> import re
>>> the_string="http://hello-world.com"
>>> re.findall(r'[\w.]+',the_string)
['http', 'hello', 'world.com']


A very good reference for Python's regular expression module is available here. The following should do the trick for you.

import re
re.split(r'[^\w.]+', text_string)

Or,

import re
re.findall(r'[\w.]+', text_string)


Your Java syntax is off, to begin with. This is what you were trying for:

[\W&&[^.]]

That matches a character from the intersection of the sets described by "any non-word character" and "any character except ." But that's overkill when you can just use:

[^\w.]

...or, "any character that's not a word character or .". It's the same in Python (and in most other flavors, too), though you probably want to match one or more of the characters:

re.split(r'[^\w.]+', the_string)

But it's probably simpler to use @gnibbler's approach of matching the parts that you want to keep, not the ones you want to throw away:

re.findall(r'[\w.]+', the_string)
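
To make the difference between the two calls concrete, here is a minimal sketch. The sample string, with separators added at both ends, is an assumption made up for illustration and is not from the question. When the input starts or ends with a separator, re.split returns empty strings at the edges, while re.findall returns only the non-empty runs.

```python
import re

# Hypothetical sample input: the leading "--" and trailing "/" are
# added only to show the edge case at the ends of the string.
sample = "--http://hello-world.com/"

# Split on runs of characters that are neither word characters nor dots.
split_parts = re.split(r'[^\w.]+', sample)

# Collect runs of word characters and dots directly.
found_parts = re.findall(r'[\w.]+', sample)

print(split_parts)  # ['', 'http', 'hello', 'world.com', '']
print(found_parts)  # ['http', 'hello', 'world.com']
```

If the empty strings are unwanted, the findall form avoids having to filter them out afterwards.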


I'm assuming that you want to split a string on all non-word patterns except a dot.

Edit: Python doesn't support the Java-style regex syntax that you are using. I'd suggest first replacing all dots with a long string, then splitting the string, then putting the dots back in.

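import re
long_str = "ABCDEFGH"  # placeholder; must not already occur in the input
# Don't name the variable str: that shadows the built-in type.
text = the_string.replace('.', long_str)
result = re.split(r'\W', text)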

Then as you are using result, replace all the long_str sequences with a dot again.

This is a clumsy workaround, and it breaks if the placeholder already occurs in the input, but otherwise it works.
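
For completeness, the whole round trip (mask the dots, split, restore the dots) could be sketched roughly as below. The helper name split_keep_dots and the collision check are assumptions added for illustration and are not part of the original answer. The sketch also splits on runs of non-word characters and drops empty pieces, which is one possible reading of what the asker wants.

```python
import re

def split_keep_dots(text, placeholder="ABCDEFGH"):
    """Hypothetical helper: split on non-word characters but keep dots."""
    # The placeholder must be made of word characters and must not
    # already appear in the input, or the restore step would corrupt it.
    if placeholder in text:
        raise ValueError("placeholder already occurs in the input")
    masked = text.replace('.', placeholder)
    # Split on runs of non-word characters; the masked dots survive
    # because the placeholder consists only of word characters.
    parts = re.split(r'\W+', masked)
    # Put the dots back and drop empty pieces from the string edges.
    return [p.replace(placeholder, '.') for p in parts if p]

print(split_keep_dots("http://hello-world.com"))  # ['http', 'hello', 'world.com']
```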


Python has a convenience function for that:

>>> s = "ab.cd.ef.gh"
>>> s.split(".")
['ab', 'cd', 'ef', 'gh']