Python compile all non-words except dot[.]

2023-01-11 09:31 Q&A author:

I am trying to split a line on all non-word characters except . (dot).

I guess this can usually be done as [\W^[.]] in Java, but how do I do it in Python?

>>> import re
>>> the_string="http://hello-world.com"
>>> re.findall(r'[\w.]+',the_string)
['http', 'hello', 'world.com']

A very good reference for Python's regular expression module is available here. The following should do the trick for you:

import re
text_string = "http://hello-world.com"
re.split(r'[^\w.]+', text_string)  # split on runs of anything except word characters and '.'

Or,

import re
text_string = "http://hello-world.com"
re.findall(r'[\w.]+', text_string)  # keep runs of word characters and '.'

Your Java syntax is off, to begin with. This is what you were trying for:

[\W&&[^.]]

That matches a character from the intersection of the sets described by "any non-word character" and "any character except ." But that's overkill when you can just use:

[^\w.]

...or, "any character that's not a word character or .". It's the same in Python (and in most other flavors, too), though you probably want to match one or more of the characters:

re.split(r'[^\w.]+', the_string)
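To check the claim that [^\w.] describes the same set as Java's [\W&&[^.]], here is a minimal Python sketch. Python's re has no && intersection, so the hypothetical helper in_intersection emulates it by hand (\W plus an explicit test that the character is not '.'), and the two definitions are compared on a sample of characters.

```python
import re
import string

# [^\w.]: any character that is neither a word character nor '.'
negated_class = re.compile(r'[^\w.]')
non_word = re.compile(r'\W')

def in_intersection(ch: str) -> bool:
    # Hand-rolled stand-in for Java's [\W&&[^.]]:
    # a non-word character that is also not '.'
    return non_word.fullmatch(ch) is not None and ch != '.'

# ASCII printable characters plus a couple of non-ASCII samples
sample = string.printable + "é§"
for ch in sample:
    assert (negated_class.fullmatch(ch) is not None) == in_intersection(ch), repr(ch)
print("both definitions agree on", len(sample), "characters")
```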

But it's probably simpler to use @gnibbler's approach of matching the parts that you want to keep, not the ones you want to throw away:

re.findall(r'[\w.]+', the_string)
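A minimal sketch comparing the two approaches, using hypothetical helper names. On the question's string they agree, but re.split returns empty strings when the text starts or ends with a separator, which re.findall avoids because it only returns the runs that matched.

```python
import re

def split_on_separators(text: str) -> list[str]:
    # Split on runs of characters that are neither word characters nor '.'
    return re.split(r'[^\w.]+', text)

def find_kept_parts(text: str) -> list[str]:
    # Match the runs we want to keep: word characters and '.'
    return re.findall(r'[\w.]+', text)

print(split_on_separators("http://hello-world.com"))  # ['http', 'hello', 'world.com']
print(find_kept_parts("http://hello-world.com"))      # ['http', 'hello', 'world.com']

# With leading/trailing separators, split() yields empty strings at the edges
print(split_on_separators(" see hello-world.com!"))   # ['', 'see', 'hello', 'world.com', '']
print(find_kept_parts(" see hello-world.com!"))       # ['see', 'hello', 'world.com']
```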

I'm assuming that you want to split a string on all non-word characters except the dot.

Edit: Python doesn't support the Java-style regex syntax that you are using. I'd suggest first replacing all dots with a long string, then splitting the string, then putting the dots back in.

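import re
the_string = "http://hello-world.com"
long_str = "ABCDEFGH"  # placeholder; must never occur in the input
text = the_string.replace('.', long_str)  # hide the dots from \W (avoid shadowing the built-in str)
result = re.split(r'\W+', text)  # split on runs of non-word characters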

Then, when you use result, replace every occurrence of long_str with a dot again.

This is a very bad solution, but it works.
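For completeness, here is a minimal sketch of the whole placeholder round trip, including the restore step, wrapped in a hypothetical helper split_via_placeholder. It also shows why the approach is fragile: if the placeholder text already occurs in the input, the restore step turns it into dots.

```python
import re

# Hypothetical placeholder; the approach breaks if it ever appears in the input.
PLACEHOLDER = "ABCDEFGH"

def split_via_placeholder(text: str) -> list[str]:
    # 1. Hide the dots so \W no longer matches them
    hidden = text.replace('.', PLACEHOLDER)
    # 2. Split on runs of non-word characters
    parts = re.split(r'\W+', hidden)
    # 3. Put the dots back into each piece
    return [part.replace(PLACEHOLDER, '.') for part in parts]

s = "http://hello-world.com"
print(split_via_placeholder(s))                             # ['http', 'hello', 'world.com']
print(split_via_placeholder(s) == re.split(r'[^\w.]+', s))  # True

# Failure case: the placeholder already occurs in the input
print(split_via_placeholder("ABCDEFGH.z"))                  # ['..z'] instead of ['ABCDEFGH.z']
```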

Python has a convenience method for that, str.split:

>>> s = "ab.cd.ef.gh"
>>> s.split(".")
['ab', 'cd', 'ef', 'gh']
