base64 encoding alternative to underscore
We're using a file system/url safe variation of base64 encoding such that:
"=" replaced with ""
"+" replaced with "-"
"/" replaced with "_"
We are now using Azure blob storage that does not allow use of "_" within container names.
We are base64-encoding a Guid. If I were to replace the underscore with, say, a "0", am I at risk of collisions?
Update
Not sure why the downvote. But to clarify.
Why not just use a Guid?
- The Guid is the id of an entity within my application. Since the paths are public, I don't really like exposing the Id, hence why I'm encoding it.
I want shorter and more friendly looking paths. Contrary to one of the comments below, the base 64 encoding is NOT longer:
Guid: 5b263cdd-2bc2-485d-83d4-81b96930dc5a
Base64 Encoded: 3TwmW8IrXUiD1IG5aTDcWg== (even shorter after removing ==)
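For readers who want to reproduce the comparison, here is a minimal sketch of the encoding described above. It is written in Python rather than .NET, and it assumes the bytes come from `Guid.ToByteArray()`, whose mixed-endian layout Python exposes as `bytes_le`. The helper name `short_id` is made up for illustration.

```python
import base64
import uuid


def short_id(guid: uuid.UUID) -> str:
    """URL-safe base64 of the GUID bytes with the "=" padding removed."""
    # bytes_le mirrors the byte order of .NET's Guid.ToByteArray() (assumption
    # about how the original value was produced).
    raw = guid.bytes_le
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


guid = uuid.UUID("5b263cdd-2bc2-485d-83d4-81b96930dc5a")
print(str(guid), len(str(guid)))            # 36 characters with hyphens
print(short_id(guid), len(short_id(guid)))  # 22 characters
```

Under that byte-order assumption, the result matches the encoded value quoted above with the trailing "==" removed.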
(Another) Update
Seems there is some confusion about what it is I'm trying to achieve (so sorry about that). Here's the short version.
- I have a Guid that represents an entity in my application.
- I need to create a publicly accessible directory for the entity (via a Url).
- I don't want to use the Guid as the directory name, for the reasons above.
- I asked previously on SO about how I could generate a friendlier looking Url that guaranteed uniqueness and did not expose the original Guid. The suggestion was Base64 encoding.
- This has worked fine until recently when we needed to use Azure blob storage, which does not allow underscores "_" in its directory (Container) names.
This is where I'm at.
Just "encode" the GUID in base16. The only characters it uses are the hex digits 0123456789abcdef (the "N" format emits lowercase digits and omits the hyphens), which should be safe for most purposes.
var encoded = guid.ToString("N");
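As a rough illustration in Python (not the .NET call above), `uuid.UUID.hex` gives the same 32 lowercase hex digits. The check function below is a hypothetical helper that paraphrases the container naming rules as I understand them (3 to 63 characters, lowercase letters, digits and single hyphens, starting and ending with a letter or digit); treat it as an assumption and confirm against the Azure documentation.

```python
import re
import uuid

# Assumed paraphrase of the container naming rules; not an official validator.
_CONTAINER_NAME = re.compile(r"(?!.*--)[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def looks_like_container_name(name: str) -> bool:
    """Return True if the name fits the assumed naming rules."""
    return _CONTAINER_NAME.fullmatch(name) is not None


guid = uuid.UUID("5b263cdd-2bc2-485d-83d4-81b96930dc5a")
name = guid.hex  # 32 lowercase hex digits, no hyphens

print(name, looks_like_container_name(name))
# A base64 identifier contains uppercase letters, so it fails the same check.
print(looks_like_container_name("3TwmW8IrXUiD1IG5aTDcWg"))
```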
The base 64 character set is
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=
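So you can't use "0": it is already part of the alphabet, so an encoding that contains "_" and another that contains "0" in the same position would become identical after the replacement.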
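A small sketch of such a collision, in Python for brevity. The two 16-byte inputs are hand-picked so that their URL-safe encodings differ only in the first character ("_" versus "0"); the function names are made up for illustration.

```python
import base64


def encode(raw: bytes) -> str:
    """URL-safe base64 without padding, as described in the question."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def with_underscore_replaced(raw: bytes) -> str:
    """The proposed tweak: additionally replace "_" with "0"."""
    return encode(raw).replace("_", "0")


a = b"\xff" + bytes(15)  # first six bits are 111111, which encodes as "_"
b = b"\xd3" + bytes(15)  # first six bits are 110100, which encodes as "0"

collides = a != b and with_underscore_replaced(a) == with_underscore_replaced(b)
print(encode(a))
print(encode(b))
print(collides)
```

Two distinct inputs end up with the same name, so the replacement is not safe in general, even if a collision between two random GUIDs is unlikely in practice.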
Instead of taking base64 and changing 4 characters, you could encode your data in base60.
A base60 character list simply leaves out the 4 characters you don't like, so there's no need to replace anything.
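A minimal sketch of encoding with a custom alphabet, in Python. It treats the GUID as one 128-bit integer and writes it in whatever base the alphabet has. The alphabet here is an assumption on my part: 36 lowercase characters rather than 60, since as far as I know Azure container names must be lowercase. A 60-character alphabet would be passed in the same way.

```python
import string
import uuid

BASE36 = string.digits + string.ascii_lowercase  # assumed alphabet, 36 symbols


def encode_int(value: int, alphabet: str = BASE36) -> str:
    """Write a non-negative integer using the given alphabet as digits."""
    if value < 0:
        raise ValueError("value must be non-negative")
    base = len(alphabet)
    digits = []
    while True:
        value, remainder = divmod(value, base)
        digits.append(alphabet[remainder])
        if value == 0:
            break
    return "".join(reversed(digits))


def decode_int(text: str, alphabet: str = BASE36) -> int:
    """Inverse of encode_int."""
    base = len(alphabet)
    value = 0
    for char in text:
        value = value * base + alphabet.index(char)
    return value


guid = uuid.UUID("5b263cdd-2bc2-485d-83d4-81b96930dc5a")
name = encode_int(guid.int)
print(name, len(name))
```

Because every integer has exactly one representation in a given base, distinct GUIDs always produce distinct names. With 36 symbols a 128-bit value needs at most 25 characters, which is longer than base64 but shorter than hex.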
Encoding your identifiers does not encrypt them. Any technically savvy observer can base64-decode an identifier. If you want to make your paths opaque, then either encrypt them or hash them with a salt. If you do want to keep your paths transparent, just use hex without any hyphens or braces. That way, your UUID is serialized to 32 code points, whereas Azure container names can be up to 63 characters long.
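One way to sketch the "hash with a salt" option is a keyed hash (HMAC), shown here in Python. This is my choice of construction, not something the question prescribes. The key below is a placeholder, and truncating the digest to 32 hex digits is an arbitrary choice that keeps the name as long as a plain hex GUID.

```python
import hashlib
import hmac
import uuid

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical placeholder key


def opaque_name(guid: uuid.UUID, key: bytes = SECRET_KEY) -> str:
    """Derive a lowercase hex name that cannot be reversed without the key."""
    digest = hmac.new(key, guid.bytes, hashlib.sha256).hexdigest()
    return digest[:32]  # keep 128 bits of the 256-bit digest


guid = uuid.UUID("5b263cdd-2bc2-485d-83d4-81b96930dc5a")
print(opaque_name(guid))
```

Unlike an encoding, this is one-way: the application can recompute the name from the GUID, but it has to store the mapping if it ever needs to go from a name back to the entity.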
If you really want shorter and funnier container names, and if Azure supports internationalized domain names, Braille encoding fits the bill as the least typable option. Here's a Haskell one-liner for generating a UUIDv4, mapping each octet of the UUID to a braille letter and encoding the resulting string in UTF-16BE (for a total of 32 octets).
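import Data.Binary (encode)
import Data.ByteString.Lazy (ByteString, cons, intersperse)
import Data.Functor ((<&>))
import Data.UUID.V4 (nextRandom)
-- 40 is 0x28, the high byte of every code point in the braille block (U+2800 to U+28FF).
-- Putting it in front of each octet of the UUID yields 16 braille letters in UTF-16BE.
braille :: IO ByteString
braille = nextRandom <&> encode <&> intersperse 40 <&> cons 40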
(In F#, |> would be used instead of <&>.)
For your amusement, see the following gist for how to convert an octet-stream into UTF-16LE or UTF-8 encoded braille strings which makes each bit literally stand out.
https://gist.github.com/bjartur/ea5db281f0b88128455ed79621abbd1d