开发者

Tailing a binary file in Erlang adds mysterious bit-string

I want to run tail on a named pipe to facilitate some binary logfile processing. The problem is that mysterious data is being added to the beginning of the stream. I run my tests by starting the erlang process with the opened port (open_port) and then I use another shell to cat the bin into the named pipe.

Here is a simple function for getting data from the port:

bin_from_tail() ->
  open_port({spawn,"/usr/bin/tail -F named_pipe"},
                             [binary,in,eof]),
  receive
  {_,{data,<<Data/binary>>}} -> Data
  end.

So here are two ways for me to grab the same data...

  1. Create the named pipe

    mkfifo named_pipe

    开发者_如何学Python
  2. This command blocks until you run "cat log.bin > named_pipe" from another shell

    {ok,TailBin} = file:read_file(log.bin).

  3. Read the entire file into memory using the erlang file library FileBin = file:read_file(log.in).

But TailBin and FileBin are not the same! TailBin has a mysterious 120-byte string at the beginning:

<<40,6,161,69,172,216,56,14,100,0,80,6,0,0,0>>


Thanks for the idea about the endlessly looping cat/restarting a dead port. It appears that named pipes buffer just a little bit, so if the port opens up fast enough the writer process (another program) won't crash! Definitely risky stuff, but as far as hacks go... it works.

Because all the mailing list posts just said do this, do that without examples, I'm going to post how mine works! If anyone wants to offer up improvements, please feel free to do so. My solution:

read() ->
  Port = open_port({spawn,"/bin/cat /path/to/pipe"},
                   [binary,in,eof]),
  do_read(Port).

do_read(Port) ->
  receive
    {Port,{data,<<Data/binary>>}} ->
      case do_something:with(Data) of
        ok ->
          io:format("G") % Good
        Any ->
          io:format("B") % Bad
      end;
    {Port,eof} ->
      read();
    Any ->
      io:format("No match fifo_client:do_read/1, ~p~n",[Any])
  end,
  do_read(Port).


I found the same thing happened outside erlang. The problem is that tail is trying to show you the end of the file, not the whole file. If you use it on a normal file, anything written would be new, and picked up by -f, but in this case it looks like tail is waiting until the end of the file (the eof that comes through the pipe) and then showing the last 10 lines (treating the binary as text).

tail -F -c 9999999

(assuming your log is 9999999 bytes or less) would probably work.

Maybe try using cat instead of tail -F, that seemed to work for me. Then you just need to avoid the fact that cat exits upon eof, which I assume you were trying to avoid by using tail.

So a shell script which loops cat endlessly, maybe?

Or get erlang to restart close and recreate the port when it dies, since you're getting the eof signal anyway. Or use the exit_status flag to open_port to be signalled when the process exits, incase you need to distinguish eof and process exit. (If you use both exit_status and eof, the eof never comes, a brief test with cat < /dev/null indicates)

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜