Identifying whether the programs "before" and "after" a program in a pipeline are from the same "toolset"
Say I am writing a toolset where every single tool operates on the same textual data stream: it parses the stream, performs some operation on it, and writes a textual stream back using the same syntax as the original input. The tools can be combined (together with other unix tools/scripts/whatever) in a pipeline. Because parsing the textual input is quite expensive, I would like to avoid it when two or more tools from the toolset sit right after one another in the pipeline, and use binary streams instead (to load directly into a memory struct, without the useless "extra" parsing). Is it possible to know (using some trick, inter-process communication, or whatever else) whether the tool "before" or "after" any tool in a pipeline is part of the toolset? I guess the unix environment is not prepared for this sort of "signalling" (AFAIK). Thanks for your ideas...
No, a pipe is one-way, so processes that are piped together have no built-in means of two-way communication. If the parsing is really so expensive that this is necessary (I'd guess it isn't, but profile it), then you have two options that I can think of:
- Have a master program that takes options to tell it which tools to run, in which order, and then have it run a "parse" tool, followed by the requested tools (all using binary I/O), followed by an "output" tool. It wouldn't be terribly difficult to also expose the individual tools, wrapped with the parse/output tools.
- If users are expected to be knowledgeable enough, have each tool accept flags telling it to expect binary input and give binary output, so that users can chain them like:
tool1 -o | tool2 -i -o | tool3 -i -o | tool4 -i
where -o means give binary output and -i means accept binary input.
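The flag-based option could look roughly like the sketch below. It is only an illustration: the `key=value` text syntax, the use of `pickle` as the binary form, and the names `parse_text`, `format_text` and `run_tool` are all assumptions, not part of any real toolset.

```python
import argparse
import pickle


def parse_text(data: bytes) -> dict:
    """Hypothetical 'expensive' text parser: one key=value pair per line."""
    pairs = (line.split("=", 1) for line in data.decode().splitlines() if "=" in line)
    return {key: value for key, value in pairs}


def format_text(record: dict) -> bytes:
    """Write the record back using the same hypothetical text syntax."""
    return "".join(f"{key}={value}\n" for key, value in record.items()).encode()


def run_tool(argv, data: bytes, operation) -> bytes:
    """Run one tool: decode input, apply the operation, encode output."""
    ap = argparse.ArgumentParser()
    ap.add_argument("-i", action="store_true", help="accept binary input")
    ap.add_argument("-o", action="store_true", help="give binary output")
    args = ap.parse_args(argv)

    # pickle stands in for the binary format; it is unsafe on untrusted input.
    record = pickle.loads(data) if args.i else parse_text(data)
    record = operation(record)
    return pickle.dumps(record) if args.o else format_text(record)
```

A real tool would wrap this as something like `sys.stdout.buffer.write(run_tool(sys.argv[1:], sys.stdin.buffer.read(), my_operation))`, so that only the first tool in the chain parses text and only the last one formats it.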
You can certainly have the processes in the tool chain talk, but it requires a bit of work. One idea is to have each process in the toolset use the pgid (the pgid for each process in the pipeline is the same) to determine a shared memory name, and then write its pid and the inodes of its input streams into the shared memory. Then each process in the tool set will know which other processes in the pipeline are also in the tool set. If a recorded inode matches the inode of one of its own streams, a process knows that the neighbor on that side is in the tool set.
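The inode-matching step might be sketched as follows. This is a sketch under assumptions: the shared-memory registry is replaced by a plain dictionary, the registry records both the input and the output stream of each tool (needed to detect the upstream neighbor), and it relies on both ends of a pipe reporting the same device/inode pair from `fstat`, which holds on Linux but should be verified elsewhere. The names `pipe_identity` and `find_neighbors` are hypothetical.

```python
import os
import stat


def pipe_identity(fd):
    """Return (st_dev, st_ino) if fd is a pipe, otherwise None."""
    st = os.fstat(fd)
    if not stat.S_ISFIFO(st.st_mode):
        return None
    return (st.st_dev, st.st_ino)


def find_neighbors(me, registry):
    """Return (upstream_pid, downstream_pid) among registered toolset processes.

    registry maps pid -> {"stdin": identity or None, "stdout": identity or None};
    in the scheme above it would live in shared memory named after the pgid.
    """
    mine = registry[me]
    upstream = downstream = None
    for pid, ends in registry.items():
        if pid == me:
            continue
        # The pipe I read from is the pipe this process writes to.
        if mine["stdin"] is not None and ends["stdout"] == mine["stdin"]:
            upstream = pid
        # The pipe I write to is the pipe this process reads from.
        if mine["stdout"] is not None and ends["stdin"] == mine["stdout"]:
            downstream = pid
    return upstream, downstream
```

Each tool would register `{"stdin": pipe_identity(0), "stdout": pipe_identity(1)}` under `os.getpid()` at startup. The tools also need some way to wait until their neighbors have registered before choosing a format, which this sketch leaves out.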
Another way would be to have all the tools read either textual or binary representations, perhaps indicated by a magic number at the beginning of the file. And a command-line option could select the output format. Depending on the usage, it may be preferable to make binary the "default", and select text-output with an option.
prog0 -binout <input.file | prog1 -binout | prog2 >output.file
vs.
prog0 <input.file | prog1 | prog2 -txtout >output.file
You don't need a magic number for the text format if the binary magic number consists of non-ASCII bytes.
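A minimal sketch of the magic-number check, assuming a made-up four-byte magic whose first byte is non-ASCII so that it can never begin a text stream. `MAGIC` and `detect_format` are hypothetical names.

```python
import io

# Hypothetical magic number; the leading 0x89 byte is not ASCII.
MAGIC = b"\x89TSB"


def detect_format(stream):
    """Return ("binary", b"") or ("text", consumed_bytes).

    stream must be a binary file object such as sys.stdin.buffer.
    For text input, the bytes already read are returned so the caller
    can prepend them before handing the data to the text parser.
    """
    head = stream.read(len(MAGIC))
    if head == MAGIC:
        return "binary", b""
    return "text", head


def write_binary_header(stream):
    """Emit the magic number before any binary payload."""
    stream.write(MAGIC)
```

With this in place every tool accepts either format on input automatically, and only the output format needs a command-line option.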