Consistent implementation of tr?
I have a ksh script that generates a long, random string using /dev/urandom and tr:
STRING="$(cat /dev/urandom|tr -dc 'a-zA-Z0-9-_'|fold -w 64 |head -1)"
On the Linux and AIX servers where I used this it resulted in 64 characters of upper and lower case alpha chars, digits, dash and underscore characters. Example:
W-uch3_4fbnk34u2nc08w_nj23n089023ncNjxz979823n23-n88h30pmLCxkMKj
When I used the script on Solaris, the ranges were interpreted as literals and it produced strings drawn from the set aAzZ09-_. Example:
AA0z9_aZ-a-z00aZ9_azAZa0zZza9-Az0-_za-9aa0az_a0z-0a0z000-A9Z_0a
Oddly, on this Solaris server the man page for tr indicates that the syntax used should have produced the desired result.
The idea is to use /dev/urandom to produce a pseudo-random string from which we extract characters so that the result a) does not contain spaces and b) does not contain shell special characters. The string will be used on the command line as an argument later on in the script. We don't want to use classes like :alnum: because the locale can convert these into multi-byte values that don't work on the command line. This ksh one-liner did the trick perfectly on a great many installations until we got to Solaris.
We have temporarily converted this to a somewhat nasty Perl regex. Is there a syntax for tr, or some other utility or ksh built-in, that will perform this task consistently across UNIX variants and is universally installed? It doesn't have to be a one-liner, but simplicity is appreciated.
Update: We tried changing the locale settings with no luck. Waiting on the results of using the xpg6 version.
$ uname -a
SunOS hostname 5.10 Generic_142900-04 sun4u sparc SUNW,SPARC-Enterprise
$ cat /dev/urandom | tr -dc "a-zA-Z0-9-_" | fold -w 64 | head -1 | sed 's/^-/_/'
0-a9-z9a_zzZAa_a_0az-9_z0a_90Z_9az09aZzZAa-9aa_-__za0ZA9_ZzzZazA
$ set | grep '^L[AC]'
LANG=C
LC_ALL=C
LC_COLLATE=en_US
LC_CTYPE=en_US
LC_MESSAGES=en_US
LC_MONETARY=en_US
LC_NUMERIC=en_US
LC_TIME=en_US
$ export LC_CTYPE="$LC_ALL" LC_MESSAGES="$LC_ALL"
$ set | grep '^L[AC]'
LANG=C
LC_ALL=C
LC_COLLATE=en_US
LC_CTYPE=C
LC_MESSAGES=C
LC_MONETARY=en_US
LC_NUMERIC=en_US
LC_TIME=en_US
$ cat /dev/urandom | tr -dc "a-zA-Z0-9-_" | fold -w 64 | head -1 | sed 's/^-/_/'
0900z9az99_a0za09__0zA0_Z--Z_-Aa-AaA9zAZz-Aa90A00z__ZzA9A-Z0aA_-
$ unset LC_ALL; export LC_COLLATE=C LC_NUMERIC=C LC_TIME=C
$ set | grep '^L[AC]'
LANG=C
LC_COLLATE=C
LC_CTYPE=C
LC_MESSAGES=C
LC_MONETARY=en_US
LC_NUMERIC=C
LC_TIME=C
$ cat /dev/urandom | tr -dc "a-zA-Z0-9-_" | fold -w 64 | head -1 | sed 's/^-/_/'
_AA9aA_Za-A0-AZa_A-0ZA--a_za-a9zZZz__a0az_-0A-9-0aA-0za00A-__9-0
$ unset LANG LC_COLLATE LC_NUMERIC LC_TIME
$ set | grep '^L[AC]'
LC_CTYPE=C
LC_MESSAGES=C
LC_MONETARY=en_US
$ cat /dev/urandom | tr -dc "a-zA-Z0-9-_" | fold -w 64 | head -1 | sed 's/^-/_/'
_-_9zz9Z-Z-Z-Z_0_a9zzzZZaAa--9_zAZaaAZz-ZaAZ09Z-_z-zz09ZZAzAz0Z0
$ unset LC_CTYPE LC_MESSAGES LC_MONETARY
$ set | grep '^L[AC]'
$ cat /dev/urandom | tr -dc "a-zA-Z0-9-_" | fold -w 64 | head -1 | sed 's/^-/_/'
_0aAa9_Z_a_Z--_Az-aa0ZA0ZzZ-9Aa9-Z0--0A_Z0Zaz-AA_Zz0z---Z_99z_a9
$ export LANG=C LC_ALL=C LC_COLLATE=C LC_CTYPE=C LC_MESSAGES=C LC_MONETARY=C LC_NUMERIC=C LC_TIME=C
$ set | grep '^L[AC]'
LANG=C
LC_ALL=C
LC_COLLATE=C
LC_CTYPE=C
LC_MESSAGES=C
LC_MONETARY=C
LC_NUMERIC=C
LC_TIME=C
$ cat /dev/urandom | tr -dc "a-zA-Z0-9-_" | fold -w 64 | head -1 | sed 's/^-/_/'
Za_000z9aa--aA00zAAZza0AA90090--z0a00_zZ9ZA0_---aZZ09a0ZA0_0zZaa
$ cat /dev/urandom | tr -dc "[a-z][A-Z][0-9]-_" | fold -w 64 | head -1 | sed 's/^-/_/'
x7dni9gIXVF6AHQc3B-H6hjnBVHChJ9zM-z5EQ5UEruATI_NNFaCoVLOqM6gVaT5
$
Of course, on Linux that last version spits out square brackets.
If you set your path to /usr/xpg6/bin/ then it'll work as expected. The locale seems to have no effect here. A cross-platform hack is:
tr -dc '[a-z][A-Z][0-9]_-' < /dev/urandom | tr -d '][' | fold -w64 | head -n1
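Another way to sidestep the range syntax entirely is to spell out every allowed character, so it no longer matters whether a given tr reads a-z as a range or as three literals. This is only a sketch: the variable name ALLOWED is made up for illustration, and the LC_ALL=C prefix is a precaution rather than something the answer above requires.

```shell
# Spell out every allowed character so no range syntax is involved.
# ALLOWED is a hypothetical name; the dash is kept last so it is read literally.
ALLOWED='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-'

# Delete everything that is not in ALLOWED, then cut the stream into 64-character lines
# and keep the first one.
STRING="$(LC_ALL=C tr -dc "$ALLOWED" < /dev/urandom | fold -w 64 | head -n 1)"
printf '%s\n' "$STRING"
```

The list is longer to type, but it contains no brackets, so there is nothing to strip afterwards.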
What you've observed is not a difference between operating systems, but different machines having different locale settings. Your Solaris machine has LC_COLLATE set to a non-default value, which is a sure recipe for the kind of problems you have.
Locale settings are set from the environment as follows:
If the environment variable LC_ALL is set, its value is used for all categories.
Otherwise, if LC_FOO is set, its value is used for category LC_FOO.
Otherwise, if LANG is set, its value is used for categories that weren't explicitly set.

The default locale is called C. On Unix systems, POSIX is a synonym for C.
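The lookup order above can be sketched as a small shell function. The name effective_locale is hypothetical and the function only mirrors the rules as stated here; it does not ask the C library what it actually resolved.

```shell
# Print the locale that applies to one category, e.g. LC_COLLATE.
# effective_locale is a hypothetical helper name, not a standard utility.
effective_locale() {
    # Fetch the value of the variable whose name was passed in $1.
    eval "cat_value=\${$1:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"        # LC_ALL overrides every category
    elif [ -n "$cat_value" ]; then
        printf '%s\n' "$cat_value"     # then the category's own variable
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"          # then LANG as the fallback
    else
        printf 'C\n'                   # nothing set: the default locale
    fi
}

effective_locale LC_COLLATE
```

With the first environment shown in the question (LC_ALL=C alongside LC_COLLATE=en_US), this prints C, because LC_ALL takes precedence over the per-category variable.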
The main locale categories are:
LC_CTYPE indicates the character set and encoding used for file names, file contents and terminal I/O. You should carefully preserve this setting unless you know it's inaccurate (e.g. because a particular file format specifies a particular encoding).
LC_MESSAGES is the language of the messages that the user sees. You should preserve this setting. If you really need to parse an error message, set LC_MESSAGES=C.
LC_COLLATE indicates the sorting order of characters. It's nearly always undesirable in scripts. Most values other than C cause trouble, such as A-Z matching lowercase letters.

Occasionally LC_NUMERIC may cause trouble because numbers may be printed with different punctuation, and LC_TIME influences the way some commands show a date and time. The other categories hardly ever matter in scripts.
Here's a reasonable strategy for scripts (warning, typed directly into the browser):
unset LANGUAGE     # a GNU-specific setting
if [ -n "$LC_ALL" ]; then
    export LC_CTYPE="$LC_ALL" LC_MESSAGES="$LC_ALL"
    unset LC_ALL
elif [ -n "$LANG" ]; then
    export LC_COLLATE=C LC_NUMERIC=C LC_TIME=C
else
    unset LC_COLLATE LC_NUMERIC LC_TIME
fi
Standard shell utilities obey the locale settings. Perl doesn't unless you tell it to.
Try:
LANG=C tr -dc 'a-zA-Z0-9-_'
Also try specifying the full path to tr (and compare the results from /usr/bin/tr to the xpg version).
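One way to run that comparison is a quick probe of each tr binary. This is a rough sketch: the function name tr_supports_ranges is invented here, /usr/xpg6/bin comes from the answer above, and /usr/xpg4/bin/tr is an assumed location that may not exist on a given system (missing binaries are skipped).

```shell
# Succeed if the given tr treats a-z as a range rather than the literals "a", "-" and "z".
# tr_supports_ranges is a hypothetical helper name.
tr_supports_ranges() {
    [ "$(printf 'abcxyz-AZ09' | LC_ALL=C "$1" -dc 'a-z')" = "abcxyz" ]
}

# Candidate paths; /usr/xpg4/bin/tr is an assumption.
for tr_bin in /usr/bin/tr /usr/xpg4/bin/tr /usr/xpg6/bin/tr; do
    [ -x "$tr_bin" ] || continue
    if tr_supports_ranges "$tr_bin"; then
        printf '%s: a-z is a range\n' "$tr_bin"
    else
        printf '%s: a-z is taken literally\n' "$tr_bin"
    fi
done
```

A tr that reads the set literally keeps only "a", "z" and "-" from the probe string, which matches the kind of output shown in the question.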
What is the difference between -c ("values") and -C ("characters") on Solaris? On Linux they're the same.
An aside: Are you able to use head -c 64 to replace fold -w 64 | head -1? Also, you can eliminate cat: tr ... < /dev/urandom | ...
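Put together, the two suggestions might look like the sketch below. It assumes a head that accepts -c, which POSIX does not specify, so it may not be available on every system. The bracketed ranges and the second tr are borrowed from the cross-platform hack in the other answer.

```shell
# Read /dev/urandom by redirection instead of cat, and let head -c stop after 64 bytes.
# Assumption: this system's head supports -c (it is not required by POSIX).
STRING="$(tr -dc '[a-z][A-Z][0-9]_-' < /dev/urandom | tr -d '][' | head -c 64)"
printf '%s\n' "$STRING"
```

Because tr -dc has already removed every newline, the first 64 bytes are exactly the 64 characters wanted, so fold is not needed.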
Ultimately, depending on availability one of these may work for you (but the character set may be a little different than what you want):
base64 /dev/urandom | head -c 64
or
uuencode /dev/urandom | head -c 64
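If the base64 route is available, the two characters it adds to the alphabet ("+" and "/") can be mapped onto the underscore and dash used in the question. This is a sketch under the assumption that both base64 and head -c exist on the target system; neither is specified by POSIX.

```shell
# base64 emits A-Z, a-z, 0-9, "+" and "/" (plus newlines and, at end of input, "=" padding).
# Map "+" to "_" and "/" to "-" so the result stays within the set used in the question.
# Assumptions: base64 and head -c are installed.
STRING="$(base64 < /dev/urandom | tr '+/' '_-' | head -c 64)"
printf '%s\n' "$STRING"
```

The second tr set is written as '_-' rather than '-_' so that it cannot be mistaken for a command-line option.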