Bug in Python's str.rstrip() function, or my own stupidity?
Either this is a bug, or I'm about to learn something new about how Python behaves. :)
I have a dictionary filled with key/value pairs. Each key has a unique prefix, ias_XX_XX_. I'm attempting to get a list of every unique prefix in the dictionary.
- First I get a list of all keys which end in '_x1'.
- Next, I strip '_x1' from all of them using rstrip('_x1').
This works fine for all of them, except for the last one, ias_1_1_x1. Instead of being stripped to ias_1_1, it becomes ias. Run the code to see for yourself:
d = {
    'ias_16_10_x2': 575,
    'ias_16_10_x1': 0,
    'ias_16_10_y1': 0,
    'ias_16_10_y2': 359,
    'ias_16_9_x2': 575,
    'ias_16_9_x1': 0,
    'ias_16_9_y1': 18,
    'ias_16_9_y2': 341,
    'ias_1_1_y1': 0,
    'ias_1_1_y2': 359,
    'ias_1_1_x2': 467,
    'ias_1_1_x1': 108,
}
x1_key_matches = [key for key in d if '_x1' in key]
print x1_key_matches
unique_ids = []
for x1_field in x1_key_matches:
    unique_ids.append(x1_field.rstrip('_x1'))
print unique_ids
Actual Output (Python 2.6, 2.7, and 3.2; for 3.x, change print to print()):
['ias_16_10_x1', 'ias_16_9_x1', 'ias_1_1_x1']
['ias_16_10', 'ias_16_9', 'ias'] # <<<--- Why isn't this last one ias_1_1???
Expected Output:
['ias_16_10_x1', 'ias_16_9_x1', 'ias_1_1_x1']
['ias_16_10', 'ias_16_9', 'ias_1_1']
If I change the key's name from ias_1_1 to something like ias_1_2 or ias_1_3, the glitch doesn't occur. Why is this happening?
The parameter to rstrip() is a set of characters to be stripped, not an exact string:
>>> "abcbcbaba".rstrip("ab")
'abcbc'
General hint: If you suspect a bug in some function, read its documentation.
From the docs, emphasis added:
The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a suffix; rather, all combinations of its values are stripped.
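To make the quoted behaviour concrete, here is a rough sketch of what rstrip(chars) does. The helper name rstrip_chars is made up for illustration and is not part of Python; the real method is implemented in C, but it should behave the same way for these inputs.

```python
def rstrip_chars(text, chars):
    """Hypothetical re-implementation of str.rstrip(chars), for illustration only."""
    end = len(text)
    # Keep dropping the last character while it is one of the characters in `chars`.
    while end > 0 and text[end - 1] in chars:
        end -= 1
    return text[:end]

key = 'ias_1_1_x1'
print(rstrip_chars(key, '_x1'))  # ias
print(key.rstrip('_x1'))         # ias
```

For ias_1_1_x1, every character after the 's' is either '_', 'x', or '1', so the whole tail is removed. With ias_16_10_x1 the loop stops at the '0', which is why that key looked correct.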
.rstrip's parameter isn't the string we want to strip; it's the set of characters we want to strip. Check these examples:
>>> "12345678".rstrip("158")
'1234567'
>>> "12345678".rstrip("asd8qwe")
'1234567'
>>> "12345678".rstrip("78")
'123456'
>>> "1234568788".rstrip("78")
'123456'
.rstrip() removes all combinations of matching characters, not the actual string you provide. See http://docs.python.org/library/stdtypes.html.
Try this out instead:
unique_ids.append(re.sub('_x1$', '', x1_field))  # needs `import re` at the top of the script
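If you would rather avoid a regex, a plain suffix check with slicing should also work. This is only a sketch; strip_suffix is a hypothetical helper name, and the key list is copied from the question.

```python
def strip_suffix(text, suffix):
    """Remove `suffix` from the end of `text` only if it is really there."""
    # The `suffix and` guard avoids text[:-0], which would return an empty string.
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text

x1_key_matches = ['ias_16_10_x1', 'ias_16_9_x1', 'ias_1_1_x1']
unique_ids = [strip_suffix(key, '_x1') for key in x1_key_matches]
print(unique_ids)  # ['ias_16_10', 'ias_16_9', 'ias_1_1']
```

On Python 3.9 and later, the built-in str.removesuffix('_x1') does the same job, so the helper is only needed on older versions like the ones mentioned in the question.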
rstrip returns a copy of the string with trailing characters removed.
For example:
>>> ' spacious '.rstrip()
' spacious'
>>> "AABAA".rstrip("A")
'AAB'
>>> "ABBA".rstrip("AB") # both AB and BA are stripped
''
>>> "ABCABBA".rstrip("AB")
'ABC'
>>> 'mississippi'.rstrip('ipz')
'mississ'
If you are dealing with file names, be extra careful:
>>> "cosmac.csv".rstrip(".csv")
'cosma'
>>> "cosmac.csv".replace(".csv", "")
'cosmac'
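For file names specifically, the standard library's os.path.splitext is probably a safer choice, since it only splits off the final extension and leaves any other ".csv" in the name alone. A minimal sketch, reusing the file name from the example above:

```python
import os.path

name = "cosmac.csv"
# splitext returns a (root, extension) pair; the extension keeps its leading dot.
stem, ext = os.path.splitext(name)
print(stem)  # cosmac
print(ext)   # .csv
```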
Hope this helps!