
Reconstituting Strings in Python

I would like to do something like:

temp = a.split()
# do some stuff with this new list
b = " ".join(temp)

where a is the original string, and b is the string after it has been modified. The problem is that when I do this, the newlines are removed from the new string. So how can I do this without removing newlines?


I assume in your third line you mean join(temp), not join(a).

To split and yet keep the exact "splitters", you need the re.split function (or split method of RE objects) with a capturing group:

>>> import re
>>> f='tanto va\nla gatta al lardo'
>>> re.split(r'(\s+)', f)
['tanto', ' ', 'va', '\n', 'la', ' ', 'gatta', ' ', 'al', ' ', 'lardo']

The pieces you'd get from a plain split are at the even indices 0, 2, 4, ..., while the odd indices hold the "separators": the exact runs of whitespace that you'll use to re-join the list at the end (with ''.join) to get the same whitespace the original string had.
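Two properties of this split are worth checking directly (the second sample string is made up for illustration): joining the pieces with '' gives back the original string exactly, and when the string starts or ends with whitespace, re.split puts an empty string at that end, so words stay at even indices and separators at odd ones.

```python
import re

f = 'tanto va\nla gatta al lardo'
parts = re.split(r'(\s+)', f)

# Every whitespace run was captured as its own element,
# so ''.join reproduces the original string exactly.
assert ''.join(parts) == f

# Leading/trailing whitespace yields '' at the ends of the list,
# keeping words at even indices and separators at odd indices.
g = '  tanto va\n'
print(re.split(r'(\s+)', g))  # ['', '  ', 'tanto', ' ', 'va', '\n', '']
```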

You can either work directly on the even-indexed items, or you can first extract them:

>>> x = re.split(r'(\s+)', f)
>>> y = x[::2]
>>> y
['tanto', 'va', 'la', 'gatta', 'al', 'lardo']

then alter y as you will, e.g.:

>>> y[:] = [z+z for z in y]
>>> y
['tantotanto', 'vava', 'lala', 'gattagatta', 'alal', 'lardolardo']

then reinsert and join up:

>>> x[::2] = y
>>> ''.join(x)
'tantotanto vava\nlala gattagatta alal lardolardo'

Note that the \n is exactly in the position equivalent to where it was in the original, as desired.
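The interactive steps above can be folded into one small function. This is a sketch; the name transform_words is made up here, and fn stands for whatever per-word change you want to make.

```python
import re

def transform_words(s, fn):
    """Apply fn to every non-whitespace piece of s, keeping all whitespace.

    Hypothetical helper that packages the re.split / x[::2] steps.
    """
    pieces = re.split(r'(\s+)', s)
    # Even indices hold the words (possibly '' at the very start or end);
    # odd indices hold the exact whitespace runs, which are left alone.
    pieces[::2] = [fn(p) for p in pieces[::2]]
    return ''.join(pieces)

print(repr(transform_words('tanto va\nla gatta al lardo', lambda w: w + w)))
# 'tantotanto vava\nlala gattagatta alal lardolardo'
```

Because of the empty strings at the ends, fn may be called with '' when the input begins or ends with whitespace, so it should handle that case (str.upper and w + w both do).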


You need to use regular expressions to rip your string apart. The resulting match object can give you the character ranges of the parts that match various sub-expressions.

Since you might have an arbitrarily large number of sections separated by whitespace, you're going to have to match the string multiple times at different starting points within the string.

If this answer is confusing to you, I can look up the appropriate references and put in some sample code. I don't really have all the libraries memorized, just what they do. :-)
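A minimal sketch of this match-object approach (not the answerer's own code; the helper name rebuild is hypothetical): re.finditer does the repeated matching from successive starting points, and each match's span() gives the character range of one word, so the whitespace between matches can be copied across untouched.

```python
import re

def rebuild(s, fn):
    """Apply fn to each run of non-whitespace in s, copying the gaps verbatim."""
    out = []
    pos = 0  # end of the previous match
    for m in re.finditer(r'\S+', s):
        start, end = m.span()       # character range of this word
        out.append(s[pos:start])    # whitespace before it, unchanged
        out.append(fn(m.group()))   # the (possibly modified) word
        pos = end
    out.append(s[pos:])             # any trailing whitespace
    return ''.join(out)

print(repr(rebuild('tanto va\nla gatta', str.upper)))
# 'TANTO VA\nLA GATTA'
```

Unlike the re.split version, fn is only ever called on actual words, never on an empty string.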


It depends on what you want to split on.

By default, split() treats any whitespace, including '\n' and ' ', as a delimiter. If you only want spaces as the delimiter, you can use

a.split(" ")

instead.

http://docs.python.org/library/stdtypes.html#str.split
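A small comparison makes the difference concrete (the sample string is made up for illustration). With split(" ") the newline survives the round trip, but it stays inside one list element together with its neighbouring words, so any per-word change sees 'va\nla' as a single word.

```python
a = 'tanto va\nla gatta'

print(a.split())     # ['tanto', 'va', 'la', 'gatta']  (newline lost)
print(a.split(" "))  # ['tanto', 'va\nla', 'gatta']    (newline kept inside one element)

# Joining with the same separator restores the original string.
b = " ".join(a.split(" "))
assert b == a
```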


I don't really understand your question. Can you give an example of what you want to do?

Anyway, maybe this can help:

b = '\n'.join(a)
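Taken literally, '\n'.join(a) on the original string a would put a newline between every character. One way to make '\n'.join useful here (our assumption about the intent) is to split the text into lines first, split each line on single spaces, and rejoin at both levels. The helper name per_line is hypothetical.

```python
def per_line(a, fn):
    """Apply fn to each space-separated word, one line at a time (hypothetical helper)."""
    new_lines = []
    for line in a.split('\n'):
        # split(' ') keeps repeated spaces as '' entries, so ' '.join restores them
        new_lines.append(' '.join(fn(w) for w in line.split(' ')))
    # Rejoin the processed lines with the newlines that split('\n') removed
    return '\n'.join(new_lines)

print(repr(per_line('tanto va\nla gatta', str.upper)))
# 'TANTO VA\nLA GATTA'
```

This keeps plain spaces and '\n' exactly; tabs, or the '\r' of '\r\n' line endings, would stay attached to the neighbouring word.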


First of all, I assume that when you say

b = " ".join(a)

You actually mean

b = " ".join(temp)

When you call split() without specifying a separator, it treats any run of whitespace as a single separator. Whitespace includes newlines, so they disappear when you split the string. Try explicitly passing a separator (such as a single " " space character) to split(). If you have several spaces in a row, splitting this way does not lose them: each extra space shows up as an empty string "" in the returned list.
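A quick check of how split(" ") handles repeated spaces (sample strings made up for illustration): consecutive spaces produce "" entries, and joining with " " puts the spaces back.

```python
a = 'tanto  va\nla'         # two spaces after 'tanto'
parts = a.split(" ")
print(parts)                # ['tanto', '', 'va\nla']
assert " ".join(parts) == a

# n consecutive spaces give n - 1 empty strings between the words
print("a    b".split(" "))  # ['a', '', '', '', 'b']
```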

To restore the original spacing, make sure that you call join() on the same string you passed as the separator to split(), and that you don't remove any elements (including the empty strings) from your intermediate list of strings.
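A sketch of that advice, where capitalizing each word is just a stand-in for "do some stuff": modify the elements, skip the empty placeholders instead of deleting them, and join with the same " " separator.

```python
a = 'tanto  va\nla gatta'
temp = a.split(" ")

# Change the words, but keep every element (including the '' placeholders
# from repeated spaces) so that " ".join can restore the spacing.
temp = [w.capitalize() if w else w for w in temp]
b = " ".join(temp)
print(repr(b))  # 'Tanto  Va\nla Gatta'

# Dropping the '' entries loses the double space:
print(repr(" ".join(w for w in temp if w)))  # 'Tanto Va\nla Gatta'
```

Note that 'la' stays lowercase: after split(" "), 'va\nla' is a single element, so the newline still hides a word boundary from per-word processing.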
