Sentence that uses every base64 character
I am trying to construct a sentence or letter combination that will produce every base64 character, for purposes of unit testing, but I am failing to find one.
The unit tests I have so far are failing to hit the lines that handle the + and / characters. While I can sling them at the encoder/decoder directly, it would be nice to have a human-readable source (the base64 equivalent of 'the quick brown fox').
Here is a Base64 encoded test string that includes all 64 possible Base64 symbols:
char base64_encoded_test[] =
"U28/PHA+VGhpcyA0LCA1LCA2LCA3LCA4LCA5LCB6LCB7LCB8LCB9IHRlc3RzIEJhc2U2NCBlbmNv"
"ZGVyLiBTaG93IG1lOiBALCBBLCBCLCBDLCBELCBFLCBGLCBHLCBILCBJLCBKLCBLLCBMLCBNLCBO"
"LCBPLCBQLCBRLCBSLCBTLCBULCBVLCBWLCBXLCBYLCBZLCBaLCBbLCBcLCBdLCBeLCBfLCBgLCBh"
"LCBiLCBjLCBkLCBlLCBmLCBnLCBoLCBpLCBqLCBrLCBsLCBtLCBuLCBvLCBwLCBxLCByLCBzLg==";
char base64url_encoded_test[] =
"U28_PHA-VGhpcyA0LCA1LCA2LCA3LCA4LCA5LCB6LCB7LCB8LCB9IHRlc3RzIEJhc2U2NCBlbmNv"
"ZGVyLiBTaG93IG1lOiBALCBBLCBCLCBDLCBELCBFLCBGLCBHLCBILCBJLCBKLCBLLCBMLCBNLCBO"
"LCBPLCBQLCBRLCBSLCBTLCBULCBVLCBWLCBXLCBYLCBZLCBaLCBbLCBcLCBdLCBeLCBfLCBgLCBh"
"LCBiLCBjLCBkLCBlLCBmLCBnLCBoLCBpLCBqLCBrLCBsLCBtLCBuLCBvLCBwLCBxLCByLCBzLg==";
It decodes to a string composed entirely of relatively human-readable text:
char test_string[] = "So?<p>"
"This 4, 5, 6, 7, 8, 9, z, {, |, } tests Base64 encoder. "
"Show me: @, A, B, C, D, E, F, G, H, I, J, K, L, M, "
"N, O, P, Q, R, S, T, U, V, W, X, Y, Z, [, \\, ], ^, _, `, "
"a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s.";
This decoded string contains only characters in the isprint()-able range of 7-bit ASCII (space through '~').
Since I did it, I would argue that it is possible :-).
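One way to check the claim is to re-encode the plain text and confirm that every symbol of the alphabet shows up. The sketch below uses Python's standard base64 module as a stand-in encoder (Python is my choice here, not a language used in this thread), and missing_symbols is a hypothetical helper name. Both literals are copied from the answer above.

```python
import base64
import string

# The 64 symbols of the standard Base64 alphabet (RFC 4648), padding excluded.
ALPHABET = set(string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/")

# Encoded test string, copied from the answer above.
ENCODED = (
    "U28/PHA+VGhpcyA0LCA1LCA2LCA3LCA4LCA5LCB6LCB7LCB8LCB9IHRlc3RzIEJhc2U2NCBlbmNv"
    "ZGVyLiBTaG93IG1lOiBALCBBLCBCLCBDLCBELCBFLCBGLCBHLCBILCBJLCBKLCBLLCBMLCBNLCBO"
    "LCBPLCBQLCBRLCBSLCBTLCBULCBVLCBWLCBXLCBYLCBZLCBaLCBbLCBcLCBdLCBeLCBfLCBgLCBh"
    "LCBiLCBjLCBkLCBlLCBmLCBnLCBoLCBpLCBqLCBrLCBsLCBtLCBuLCBvLCBwLCBxLCByLCBzLg=="
)

# Plain text, copied from the answer above.
TEST_STRING = (
    "So?<p>"
    "This 4, 5, 6, 7, 8, 9, z, {, |, } tests Base64 encoder. "
    "Show me: @, A, B, C, D, E, F, G, H, I, J, K, L, M, "
    "N, O, P, Q, R, S, T, U, V, W, X, Y, Z, [, \\, ], ^, _, `, "
    "a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s."
)


def missing_symbols(encoded: str) -> set:
    """Return the Base64 alphabet symbols that never occur in `encoded`."""
    return ALPHABET - set(encoded)


# Re-encode the plain text with the stand-in encoder and compare.
actual = base64.b64encode(TEST_STRING.encode("ascii")).decode("ascii")
print("matches the literal:", actual == ENCODED)
print("symbols not covered:", sorted(missing_symbols(ENCODED)))
```

The base64url variant can be checked the same way with base64.urlsafe_b64encode, which swaps + and / for - and _ and keeps the padding.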
You probably can't do that.
/ in base64 encodes 111111 (six '1' bits). As all ASCII characters (which are the typeable and printable characters) are in the range of 0-127 (i.e. 00000000 to 01111111), the only ASCII character that could be encoded using '/' is the ASCII character with the code 127, which is the non-printable DEL character.
If you allow values higher than 127, you could have a printable but non-typeable string.
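The bit-level argument above can be checked by splitting three input bytes into the four 6-bit groups that Base64 actually encodes. The sketch below is in Python (my choice, not a language used in this thread), and sextets is a hypothetical helper. It suggests that a 6-bit group does not have to line up with a whole input byte: the last group of a triple is just the low six bits of the third byte, so a printable '?' (0x3F) in that position gives index 63, and '>' (0x3E) gives index 62.

```python
import base64


def sextets(triple: bytes) -> list:
    """Split exactly three bytes into the four 6-bit values Base64 encodes."""
    if len(triple) != 3:
        raise ValueError("expected exactly 3 bytes")
    n = int.from_bytes(triple, "big")  # 24 bits, first byte most significant
    return [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]


for chunk in (b"So?", b"<p>"):
    # Index 63 maps to '/', index 62 maps to '+'.
    print(chunk, sextets(chunk), base64.b64encode(chunk))
```

Both triples are plain 7-bit ASCII, which matches the "So?<p>" prefix used in the other answer.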
When attempting to encode/decode, this is the one place where I break the rule of unit testing a single method at a time. You can have separate methods for encoding and decoding, but the only way to tell if you're doing it correctly is to use both encoding and decoding in a single assert. I would use the following pseudocode.
Generate a random string using Path.GetRandomFileName(); this string is cryptographically strong
Pass the string to the encode method
Pass the output of the encode method to the decode method
Assert.AreEqual(input from GetRandomFileName, output from Decode)
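The steps above can be sketched as a small round-trip check. This is a Python illustration rather than the .NET code the pseudocode implies: the secrets module stands in for Path.GetRandomFileName(), random bytes stand in for the random string, and encode/decode are hypothetical wrappers around the standard library that you would replace with the methods under test.

```python
import base64
import secrets


def encode(data: bytes) -> bytes:
    # Stand-in for the encoder under test (assumption: swap in your own).
    return base64.b64encode(data)


def decode(text: bytes) -> bytes:
    # Stand-in for the decoder under test (assumption: swap in your own).
    return base64.b64decode(text)


def failing_round_trips(runs: int = 50, max_len: int = 48) -> list:
    """Return every random payload that does not survive encode -> decode."""
    failures = []
    for _ in range(runs):
        # Random length from 0 to max_len, so all three padding cases occur.
        payload = secrets.token_bytes(secrets.randbelow(max_len + 1))
        if decode(encode(payload)) != payload:
            failures.append(payload)
    return failures


print(failing_round_trips())
```

Returning the failing payloads, rather than only a pass/fail flag, makes it easier to turn any failure into a fixed regression test afterwards.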
You can loop over this as many times as you want in order to say it's tested. You can also cover some specific cases; however, since encoding (sometimes) differs based on the positioning of the letters, you're better off going with a random string and just calling encode/decode about 50 or so times.
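For the specific cases, the position dependence mostly comes down to input length modulo 3, which decides the padding. A minimal sketch in Python, using the test vectors from RFC 4648 section 10; check_vectors is a hypothetical helper, and the standard library functions are only stand-ins for the encoder and decoder under test.

```python
import base64

# Test vectors from RFC 4648, section 10: (plain bytes, Base64 text).
RFC4648_VECTORS = [
    (b"", b""),
    (b"f", b"Zg=="),
    (b"fo", b"Zm8="),
    (b"foo", b"Zm9v"),
    (b"foob", b"Zm9vYg=="),
    (b"fooba", b"Zm9vYmE="),
    (b"foobar", b"Zm9vYmFy"),
]


def check_vectors(encode, decode) -> list:
    """Return the vectors for which either direction gives a wrong result."""
    return [
        (plain, coded)
        for plain, coded in RFC4648_VECTORS
        if encode(plain) != coded or decode(coded) != plain
    ]


# Stand-ins for the methods under test (assumption: pass your own instead).
print(check_vectors(base64.b64encode, base64.b64decode))
```

Unlike a pure round trip, fixed vectors also catch an encoder and decoder that are wrong in the same way.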
If you find that encoding/decoding fails in accepted scenarios, create unit tests for those and filter out the strings that contain those characters/character combinations. Also, document those failures in XMLDocs comments, code comments, and any documentation your app has.
What I came up with may prove useful. It needs to be entered exactly as is. I include a link to a screenshot showing all the usually invisible characters below, as well as the Base64 data string to which it converts, and a table of statistics for each of the 64 characters therein.
<HTML><HEAD></HEAD><BODY><PRE>
Did
THE
THE QUICK BROWN FOX
jump
over
the
lazy
dogs
or
was
he
pushed
?
</PRE><B>hmm.</B></BODY><HTML>
ÿß®Þ~c*¯/
This encodes to the Base64 string:
PEhUTUw+PEhFQUQ+PC9IRUFEPjxCT0RZPjxQUkU+DQpEaWQJDQoNCiBUSEUJDQoNCiAgVEhFIFFVSUNLIEJST1dOIEZPWAkNCg0KICAganVtcAkNCg0KICAgIG92ZXIJDQoNCiAgICAgdGhlCQ0KDQogICAgICBsYXp5CQ0KDQogICAgICAgZG9ncwkNCg0KICAgICAgICBvcgkNCg0KICAgICAgICAgd2FzCQ0KDQogICAgICAgICAgaGUJDQoNCiAgICAgICAgICAgcHVzaGVkCQ0KDQogICAgICAgICAgICA/CQ0KDQo8L1BSRT48Qj5obW0uPC9CPjwvQk9EWT48SFRNTD4NCg0KDQoNCg0KDQoNCg//367efmMqry/==
which contains
5--/'s
4--+'s
3--='s
14--0's
3--1's
3--2's
2--3's
4--4's
3--5's
2--6's
2--7's
4--8's
6--9's
5--a's
27--A's
2--b's
5--B's
5--c's
4--C's
4--d's
14--D's
2--e's
10--E's
2--f's
8--F's
36--g's
6--G's
5--h's
2--H's
5--i's
30--I's
5--j's
6--J's
8--k's
12--K's
2--l's
3--L's
2--m's
4--M's
3--n's
14--N's
13--o's
2--O's
3--p's
9--P's
2--q's
24--Q's
2--r's
5--R's
2--s's
6--S's
2--t's
7--T's
2--u's
1--U's
3--v's
6--V's
4--w's
5--W's
3--x's
6--X's
2--y's
4--Y's
3--z's
5--Z's
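A tally like the one above can be produced mechanically instead of by hand. The sketch below is in Python (my choice, not a language used in this thread); symbol_counts and uncovered are hypothetical helper names, and the sample is only the first eight symbols of the test string from the other answer, not the long string above.

```python
from collections import Counter
import string

# Standard Base64 alphabet (RFC 4648), padding character excluded.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def symbol_counts(encoded: str) -> Counter:
    """Count every character in `encoded`, ignoring whitespace and line breaks."""
    return Counter(ch for ch in encoded if not ch.isspace())


def uncovered(encoded: str) -> list:
    """List the alphabet symbols that do not occur in `encoded`."""
    counts = symbol_counts(encoded)
    return [ch for ch in ALPHABET if counts[ch] == 0]


sample = "U28/PHA+"
for symbol, count in sorted(symbol_counts(sample).items()):
    print(f"{count}--{symbol}'s")
print("uncovered symbols:", len(uncovered(sample)))
```

Running uncovered on a candidate string is a quick way to confirm it really exercises all 64 symbols before using it in a unit test.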