开发者

Easiest way to map data from double to integer without losing consistency

I'm trying to find the best way to normalize consistently.

Basically I have a certain number of instances, each one with a certain number of attributes with floating values:

For example:

At1 At2 At3

0.1 0.3 3.0

0.1 4.5 2.1

...

And I want to map each attribute to integer values, trying to be consistent with the data.

I tried for example to simply , for each attribute, divide the difference between the max value and the min value for that attribute, dividing it into an arbitrary value like 10, and then map all the double values of each attributes to the index of it's corresponding interval, and by doing so, normalizing my attributes to integer values between 1 and ten...

But I would like an app开发者_如何学Pythonroach that would use the shortest number possible of intervals for each attributes without losing consistency, for example, If I have one attribute with three possible values: 1.2, 3.5 and 223.3 by my approach using for example intervals of 10 possible values I would have a ton of unnecessary intervals for that attribute, and a LOT of wasted space...

Any suggestions?


I think you're asking about encoding for compression, or more specifically, how to find a 1-1 map of reals to integers.

Huffman encoding is probably the most famous, and can be proven to be the smallest (have the least number of wasted intervals, in your terminology). Range encoding is also popular.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜