Bug in Python's str.rstrip() function, or my own stupidity?

Either this is a bug, or I'm about to learn something new about how Python behaves. :)

I have a dictionary filled with key/value pairs. Each key has a unique prefix, ias_XX_XX_. I'm attempting to get a list of every unique prefix in the dictionary.

  1. First I get a list of all keys which end in '_x1'.
  2. Next, I strip '_x1' from all of them using rstrip('_x1').

This works fine for all of them except the last one, ias_1_1_x1. Instead of being stripped to ias_1_1, it becomes ias. Run the code to see for yourself:

d = {
'ias_16_10_x2':     575, 
'ias_16_10_x1':     0, 
'ias_16_10_y1':     0, 
'ias_16_10_y2':     359,
'ias_16_9_x2':      575, 
'ias_16_9_x1':      0, 
'ias_16_9_y1':      18, 
'ias_16_9_y2':      341, 
'ias_1_1_y1':       0, 
'ias_1_1_y2':       359,  
'ias_1_1_x2':       467, 
'ias_1_1_x1':       108,
}

x1_key_matches = [key for key in d if '_x1' in key]
print x1_key_matches

unique_ids = []
for x1_field in x1_key_matches:
    unique_ids.append(x1_field.rstrip('_x1'))

print unique_ids

Actual Output (Python 2.6, 2.7, and 3.2; for 3.x, change print to print()):

['ias_16_10_x1', 'ias_16_9_x1', 'ias_1_1_x1']
['ias_16_10', 'ias_16_9', 'ias']   # <<<--- Why isn't this last one ias_1_1???

Expected Output:

['ias_16_10_x1', 'ias_16_9_x1', 'ias_1_1_x1']
['ias_16_10', 'ias_16_9', 'ias_1_1']

If I change the key's name from ias_1_1 to something like ias_1_2, or ias_1_3, the glitch doesn't occur. Why is this happening?


The parameter to rstrip() is a set of characters to be stripped, not an exact string:

>>> "abcbcbaba".rstrip("ab")
"abcbc"

General hint: If you suspect a bug in some function, read its documentation.


From the docs, emphasis added:

The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a suffix; rather, all combinations of its values are stripped.
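
One way to picture "all combinations of its values are stripped" is the loop below. This is only an illustrative sketch: rstrip_sketch is a hypothetical helper written for this explanation, not how CPython actually implements str.rstrip(), and it assumes chars is a non-empty string.

```python
def rstrip_sketch(s, chars):
    """Illustrative stand-in for s.rstrip(chars); chars is treated as a set."""
    end = len(s)
    # Walk backwards while the last remaining character is one of chars.
    while end > 0 and s[end - 1] in chars:
        end -= 1
    return s[:end]

# The '0' is not in '_x1', so the scan stops right after the suffix.
print(rstrip_sketch('ias_16_10_x1', '_x1'))  # ias_16_10
# Every trailing character here is '_', 'x' or '1', so the scan runs back to 's'.
print(rstrip_sketch('ias_1_1_x1', '_x1'))    # ias
```

This also explains why ias_1_2 or ias_1_3 appear to work: the 2 or 3 is not in the character set, so the stripping stops there.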


.rstrip's parameter isn't the string we want to strip, it's the set of characters we want to strip. Check these examples:

>>> "12345678".rstrip("158")
'1234567'
>>> "12345678".rstrip("asd8qwe")
'1234567'
>>> "12345678".rstrip("78")
'123456'
>>> "1234568788".rstrip("78")
'123456'


.rstrip() removes all combinations of matching characters, not the actual string you provide. See http://docs.python.org/library/stdtypes.html.


Try this out instead:

import re
unique_ids.append(re.sub('_x1$', '', x1_field))
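
If a regular expression feels heavier than needed, a plain suffix check should also do the job. A minimal sketch, assuming keys shaped like the ones in the question; strip_suffix is a hypothetical helper name, not a built-in:

```python
def strip_suffix(s, suffix):
    """Remove suffix from the end of s only if s really ends with it."""
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s

# Sample keys taken from the question's dictionary.
keys = ['ias_16_10_x1', 'ias_16_9_x1', 'ias_1_1_x1', 'ias_1_1_y2']
unique_ids = [strip_suffix(k, '_x1') for k in keys if k.endswith('_x1')]
print(unique_ids)  # ['ias_16_10', 'ias_16_9', 'ias_1_1']
```

On Python 3.9 and later, the built-in str.removesuffix() does the same thing as this helper.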


rstrip returns a copy of the string with trailing characters removed.

For example:

>>> '   spacious   '.rstrip()
'   spacious'
>>> "AABAA".rstrip("A")
'AAB'
>>> "ABBA".rstrip("AB") # both AB and BA are stripped
''
>>> "ABCABBA".rstrip("AB")
'ABC'

########

>>> 'mississippi'.rstrip('ipz')
'mississ'

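If you are dealing with file names, be extra careful. rstrip(".csv") also eats the trailing "c" of the name itself, whereas replace() removes only the exact substring:

>>> "cosmac.csv".rstrip(".csv")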
'cosma'
>>> "cosmac.csv".replace(".csv", "")
'cosmac'
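
For file names specifically, the standard library's os.path.splitext() may be a safer fit, since it splits off only the last extension instead of treating it as a set of characters. A short sketch; the file names are made-up examples:

```python
import os.path

# Hypothetical file names, chosen to show the edge cases.
names = ['cosmac.csv', 'archive.tar.gz', 'README']

# splitext returns a (root, extension) pair; extension is '' if there is none.
results = [os.path.splitext(name) for name in names]

for name, (root, ext) in zip(names, results):
    print(name, '->', repr(root), repr(ext))
# cosmac.csv -> 'cosmac' '.csv'
# archive.tar.gz -> 'archive.tar' '.gz'
# README -> 'README' ''
```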

Hope this helps!
