
Hard to explain question. Downconvert/limit string to a certain charset without stripping

I've encountered this problem a few times, and now I've finally decided to ask, hoping someone knows what I'm talking about.

What I wish to do is this form of character conversion:

ÆØÅ => AOA
ÉÈÊ => EEE
üÿï => uyi

So far, the closest I've come to search criteria I can type into Google is this:

  • Something similar to base64/URLEncode
  • A sound algorithm such as Metaphone or Soundex

This did not work as expected. The algorithms saw no stronger correlation between ÉÈÊ and EEE than between ÆØÅ and EEE. So, held up against E, all six characters would have been converted to E, which wasn't the accuracy I was looking for.

  • Conversion from the original encoding (e.g. ASCII) to a charset/encoding consisting of only alphanumerics

I'm not very confident about this approach as the encoding would have to be able to recognize, say E, as an ancestor/nearest (alphanumeric) neighbour of È.

I feel like I'm using a lot of words that are only in the right ballpark.

Does anyone understand what I'm trying to achieve, or know what this "method" I'm looking for is called?

Any ideas/thoughts are very much appreciated (and I do mean any),

  • Mik


I suspect you'd have to consider a database of Unicode codepoints, mapping them to their nearest US-ASCII equivalent (where possible). I imagine it would be a relatively sparse map, since most Unicode codepoints don't have a US-ASCII equivalent.
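As a rough illustration of that idea, here is a minimal Python sketch. It assumes that Unicode's own decomposition data (exposed by the standard `unicodedata` module) can serve as most of the "database": NFKD normalization splits a letter such as É into a plain E plus a combining accent, and the accent can then be dropped. Letters like Æ and Ø have no such decomposition, so the small `FALLBACK` table, the `to_ascii` function name, and the `missing` parameter are all hypothetical additions chosen to match the examples in the question, not part of any library.

```python
import unicodedata

# Hypothetical hand-written table for letters that Unicode does not decompose.
# The targets follow the question's examples (Æ => A); "AE" is another common choice.
FALLBACK = {"Æ": "A", "æ": "a", "Ø": "O", "ø": "o"}


def to_ascii(text: str, missing: str = "?") -> str:
    """Map each character to a nearby US-ASCII character where one can be found."""
    out = []
    for ch in text:
        if ch in FALLBACK:
            out.append(FALLBACK[ch])
            continue
        # NFKD splits e.g. "É" into "E" followed by U+0301 (combining acute accent).
        parts = unicodedata.normalize("NFKD", ch)
        # Keep only the ASCII base characters and drop the combining marks.
        kept = [p for p in parts if not unicodedata.combining(p) and ord(p) < 128]
        # Nothing usable left: emit a placeholder rather than silently stripping.
        out.append("".join(kept) if kept else missing)
    return "".join(out)


print(to_ascii("ÆØÅ"))
print(to_ascii("ÉÈÊ"))
print(to_ascii("üÿï"))
```

Characters with no decomposition and no fallback entry come out as the `missing` placeholder, so the fallback table would need to grow for any script beyond accented Latin letters. Useful search terms for this technique are "Unicode normalization", "strip diacritics", and "transliteration".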

Hopefully this answer contains some keywords that help you search for what you want.
