Generate a string representation of a one-hot encoding
In Python, I need to generate a dict that maps each letter to a predefined "one-hot" representation of that letter. By way of illustration, the dict should look like this:
{ 'A': '1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0',
'B': '0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0', # ...
}
There is one bit (represented as a character) per letter of the alphabet. Hence each string will contain 25 zeros and one 1. The position of the 1 is determined by the position of the corresponding letter in the alphabet.
I came up with some code that generates this:
# Character set is explicitly specified for fine grained control
_letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
n = len(_letters)
one_hot = [' '.join(['0']*a + ['1'] + ['0']*b)
           for a, b in zip(range(n), range(n-1, -1, -1))]
outputs = dict(zip(_letters, one_hot))
Is there a more efficient/cleaner/more pythonic way to do the same thing?
I find this to be more readable:
from string import ascii_uppercase
one_hot = {}
for i, l in enumerate(ascii_uppercase):
    bits = ['0']*26
    bits[i] = '1'
    one_hot[l] = ' '.join(bits)
If you need a more general alphabet, just enumerate over a string of the characters, and replace ['0']*26 with ['0']*len(alphabet).
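As a rough sketch of that generalization (the function name `make_one_hot` and the four-letter example alphabet are my own, not from the question, and this assumes the alphabet contains no repeated characters):

```python
def make_one_hot(alphabet):
    """Map each character of `alphabet` to a space-separated one-hot string."""
    n = len(alphabet)
    table = {}
    for i, ch in enumerate(alphabet):
        bits = ['0'] * n   # start with all zeros
        bits[i] = '1'      # set the bit at this character's position
        table[ch] = ' '.join(bits)
    return table

# Hypothetical example alphabet, just to show the shape of the result
one_hot_acgt = make_one_hot("ACGT")
print(one_hot_acgt['C'])  # 0 1 0 0
```

Passing `string.ascii_uppercase` should give the same 26-entry dict as the loop above.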
In Python 2.5 and up you can use the conditional operator:
from string import ascii_uppercase
one_hot = {}
for i, c in enumerate(ascii_uppercase):
    one_hot[c] = ' '.join('1' if j == i else '0' for j in range(26))
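If you are on Python 2.7 or 3.x, the same idea can probably be written as a dict comprehension. This is only a sketch of one way to do it; using `n` for the alphabet length is my own choice, to avoid hard-coding 26:

```python
from string import ascii_uppercase

n = len(ascii_uppercase)

# For each letter at index i, emit '1' at position i and '0' everywhere else
one_hot = {
    c: ' '.join('1' if j == i else '0' for j in range(n))
    for i, c in enumerate(ascii_uppercase)
}

print(one_hot['C'][:9])  # 0 0 1 0 0
```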
one_hot = [' '.join(['0']*a + ['1'] + ['0']*b)
           for a, b in zip(range(n), range(n-1, -1, -1))]
outputs = dict(zip(_letters, one_hot))
In particular, there's a lot of code packed into these two lines. You might try the Introduce Explaining Variable refactoring, or maybe Extract Method.
Here's one example:
def single_onehot(a, b):
    return ' '.join(['0']*a + ['1'] + ['0']*b)
range_zip = zip(range(n), range(n-1, -1, -1))
one_hot = [single_onehot(a, b) for a, b in range_zip]
outputs = dict(zip(_letters, one_hot))
Although you might disagree with my naming.
That seems pretty clear, concise, and Pythonic to me.