Unix sort utility: use hexadecimal byte value as delimiter
I'm wondering if I can use a hexadecimal byte value as the field delimiter for the Unix sort utility.
Basically I want to do something like:
sort -t '\x00' <input
But it doesn't work if I do it in the way above.
If you read the GNU sort manual, you will find:
-t separator, --field-separator=separator
Use character separator as the field separator when finding the sort keys in each line. By default, fields are separated by the empty string between a non-blank character and a blank character. By default a blank is a space or a tab, but the LC_CTYPE locale can change this. That is, given the input line ‘ foo bar’, sort breaks it into fields ‘ foo’ and ‘ bar’. The field separator is not considered to be part of either the field preceding or the field following, so with ‘sort -t " "’ the same input line has three fields: an empty field, ‘foo’, and ‘bar’. However, fields that extend to the end of the line, as ‘-k 2’, or fields consisting of a range, as ‘-k 2,3’, retain the field separators present between the endpoints of the range. To specify ASCII NUL as the field separator, use the two-character string ‘\0’, e.g., ‘sort -t '\0'’.
This worked with old (GNU CoreUtils 5.97) sort.
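As a rough illustration of the manual's ‘\0’ form, the sketch below sorts two made-up records on their second field. It assumes a GNU sort that accepts the two-character string \0 for -t; the sample data and the variable name sorted are invented for the example, and tr is used only to make the NUL bytes visible.

```shell
# Two records whose fields are separated by a NUL byte (\0 in printf).
# Sort on the second field, then show each NUL as ':' for display.
# Assumption: GNU sort, which treats the two-character string \0 as NUL.
sorted="$(printf 'x\0b\ny\0a\n' | sort -t '\0' -k2,2 | tr '\0' ':')"
printf '%s\n' "$sorted"
```

With GNU sort this should list the y record first, since its second field (a) sorts before b.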
There does not seem to be a way to pass a literal NUL byte as the delimiter on Linux. I've tried a number of tricks to get a NUL (0x00) byte into the delimiter, and the sort command complains:
sort: empty tab
You can't do it with Control-V @ as you are typing the command line; the shell (bash) does not like that.
I have a program genchar that writes bytes to output, so I tried:
sort -t "$(genchar 0)" ...
And that did not work either; I got the same error from sort, even though genchar does write the NUL byte:
$ genchar 0 | od -c
0000000 \0 \n
0000002
$
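A likely explanation for the "empty tab" error is that the NUL byte never reaches sort: command-line arguments are NUL-terminated strings, and the shell drops NUL bytes from command substitution output. The sketch below uses printf as a stand-in for genchar (which is not available here), and the variable name delim is made up for the example.

```shell
# printf '\0' stands in for the genchar program from the text above.
# The shell cannot keep a NUL byte in a variable, so delim ends up empty.
# (Recent versions of bash also print a warning about the ignored null byte.)
delim="$(printf '\0')"
printf 'length of delim: %s\n' "${#delim}"

# sort therefore receives an empty -t argument rather than a NUL byte.
if sort -t "$delim" </dev/null 2>/dev/null; then
    echo "sort accepted the empty delimiter"
else
    echo "sort rejected the empty delimiter"
fi
```

The reported length should be 0, which matches sort's complaint about an empty delimiter rather than a NUL one.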
If you were able to use control-A instead, then there'd be no problem.
Note that sort does not expand hex escape sequences in the '-t' option argument; you have to supply the actual byte you want to use. You probably can't use newline as a field delimiter, either; if you did, what would the record delimiter be?
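If a non-NUL control character such as Control-A is acceptable, one way to supply the actual byte is to let printf generate it from an octal escape. This is only a sketch: the sample records and the variable names soh and sorted are invented for the example.

```shell
# Control-A is byte 0x01, written as the octal escape \001 for printf.
soh="$(printf '\001')"

# Sort two Control-A-delimited records on their second field,
# then show each Control-A as ':' for display.
sorted="$(printf 'x\001b\ny\001a\n' | sort -t "$soh" -k2,2 | tr '\001' ':')"
printf '%s\n' "$sorted"
```

Unlike NUL, the 0x01 byte survives command substitution, so sort receives a real one-byte delimiter.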
GNU 'sort' (from CoreUtils 5.97, at any rate; the current version is 8.12 as of 2011-04-26) does support a -z option:
-z, --zero-terminated
end lines with 0 byte, not newline
This is not, sadly, what you are looking for.
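To make the difference concrete: -z changes the record terminator, not the field separator. The sketch below assumes a sort that supports -z (GNU sort does); the sample data and the variable name records are made up, and tr converts the NUL terminators to newlines only for display.

```shell
# Two NUL-terminated records; with -z, sort treats each as one "line".
records="$(printf 'pear\0apple\0' | sort -z | tr '\0' '\n')"
printf '%s\n' "$records"
```

Here the NUL bytes separate whole records, so apple is listed before pear; fields within a record are still split by whatever -t specifies.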