Tailing a binary file in Erlang adds mysterious bit-string
I want to run tail on a named pipe to facilitate some binary logfile processing. The problem is that mysterious data is being added to the beginning of the stream. I run my tests by starting the erlang process with the opened port (open_port) and then I use another shell to cat the bin into the named pipe.
Here is a simple function for getting data from the port:
bin_from_tail() ->
open_port({spawn,"/usr/bin/tail -F named_pipe"},
[binary,in,eof]),
receive
{_,{data,<<Data/binary>>}} -> Data
end.
So here are two ways for me to grab the same data...
Create the named pipe
mkfifo named_pipe
开发者_如何学PythonThis command blocks until you run "cat log.bin > named_pipe" from another shell
{ok,TailBin} = file:read_file(log.bin).
Read the entire file into memory using the erlang file library FileBin = file:read_file(log.in).
But TailBin and FileBin are not the same! TailBin has a mysterious 120-byte string at the beginning:
<<40,6,161,69,172,216,56,14,100,0,80,6,0,0,0>>
Thanks for the idea about the endlessly looping cat/restarting a dead port. It appears that named pipes buffer just a little bit, so if the port opens up fast enough the writer process (another program) won't crash! Definitely risky stuff, but as far as hacks go... it works.
Because all the mailing list posts just said do this, do that without examples, I'm going to post how mine works! If anyone wants to offer up improvements, please feel free to do so. My solution:
read() ->
Port = open_port({spawn,"/bin/cat /path/to/pipe"},
[binary,in,eof]),
do_read(Port).
do_read(Port) ->
receive
{Port,{data,<<Data/binary>>}} ->
case do_something:with(Data) of
ok ->
io:format("G") % Good
Any ->
io:format("B") % Bad
end;
{Port,eof} ->
read();
Any ->
io:format("No match fifo_client:do_read/1, ~p~n",[Any])
end,
do_read(Port).
I found the same thing happened outside erlang. The problem is that tail is trying to show you the end of the file, not the whole file. If you use it on a normal file, anything written would be new, and picked up by -f, but in this case it looks like tail is waiting until the end of the file (the eof that comes through the pipe) and then showing the last 10 lines (treating the binary as text).
tail -F -c 9999999
(assuming your log is 9999999 bytes or less) would probably work.
Maybe try using cat instead of tail -F, that seemed to work for me. Then you just need to avoid the fact that cat exits upon eof, which I assume you were trying to avoid by using tail.
So a shell script which loops cat endlessly, maybe?
Or get erlang to restart close and recreate the port when it dies, since you're getting the eof signal anyway. Or use the exit_status flag to open_port to be signalled when the process exits, incase you need to distinguish eof and process exit. (If you use both exit_status and eof, the eof never comes, a brief test with cat < /dev/null indicates)
精彩评论