Identifying whether the programs "before" and "after" a program in a pipeline are from the same "toolset"

2023-03-16 09:10 Q&A author:

Say I am writing a toolset in which every tool operates on the same textual data stream: it parses the stream, performs some operation on it, and writes a textual stream back using the same syntax as the original input. The tools can be combined (together with other Unix tools, scripts, or whatever) in a pipeline. Because the textual input processing (parsing) is quite expensive, I would like to avoid it when two or more tools from the toolset come one right after another in the pipeline, and use binary streams instead (to load directly into an in-memory struct, without useless "extra" parsing). Is it possible to know (using some trick, inter-process communication, or whatever else) whether the tool "before" or "after" any tool in a pipeline is part of the toolset? I guess the Unix environment is not prepared for this sort of "signalling" (AFAIK). Thanks for your ideas...

No, processes that are piped together have no built-in method of two-way communication. If the parsing is really so expensive that this is necessary (I'd guess it isn't, but profile it), then you have two options that I can think of:

1. Have a master program that takes options to tell it which tools to run, in which order, and then have it run a "parse" tool, followed by the requested tools (all using binary I/O), followed by an "output" tool. It wouldn't be terribly difficult to also expose the individual tools, wrapped with the parse/output tools.
2. If users are expected to be knowledgeable enough, have each tool accept flags telling it to expect binary input and to give binary output, so that users can chain them like:
```
tool1 -o | tool2 -i -o | tool3 -i -o | tool4 -i
```
where -o means give binary output and -i means accept binary input.
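
The first option (a master program that parses once, runs the requested tools on the in-memory data, and serializes once) could be sketched roughly as follows. Everything here is an assumption made for illustration: the "syntax" is just whitespace-separated integers, and the tool names `double`, `sort`, and `dedup` are hypothetical stand-ins for the real toolset.

```python
def parse(text):
    """The expensive step: turn the textual stream into an in-memory structure."""
    return [int(token) for token in text.split()]


def render(values):
    """Turn the in-memory structure back into the textual syntax."""
    return "".join(f"{v}\n" for v in values)


# Hypothetical tools; each one takes and returns the parsed structure.
TOOLS = {
    "double": lambda values: [2 * v for v in values],
    "sort": sorted,
    "dedup": lambda values: list(dict.fromkeys(values)),  # keeps first occurrences
}


def run_chain(tool_names, text):
    """Parse once, apply every requested tool in order, serialize once."""
    data = parse(text)
    for name in tool_names:
        data = TOOLS[name](data)
    return render(data)
```

A command-line wrapper would then pass its arguments and standard input to `run_chain`, so that something like `master double sort <input.file` replaces a three-stage pipeline; a single-tool wrapper is the same call with a one-element list.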

You can certainly have the processes in the toolchain talk, but it requires a bit of work. One idea is to have each process in the toolset use its pgid (the pgid for each process in the pipeline is the same) to derive a shared memory name, and then write its pid and the inodes of its input and output streams into the shared memory. Each process in the toolset will then know which other processes in the pipeline are also in the toolset. If the inode of its input pipe matches the inode of another process's output pipe (or vice versa), it knows that its neighbor is in the toolset.
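
The inode-matching part of this idea can be sketched as below. This is only a sketch under assumptions: it relies on Linux behavior, where `fstat` reports the same inode for both ends of a pipe, and it leaves out the shared memory registry itself. The names `registry_name`, `make_record`, and `is_directly_after` are hypothetical.

```python
import os
import stat


def pipe_identity(fd):
    """Return (device, inode) if fd refers to a pipe, otherwise None."""
    st = os.fstat(fd)
    if not stat.S_ISFIFO(st.st_mode):
        return None
    return (st.st_dev, st.st_ino)


def registry_name():
    """Hypothetical naming scheme: one shared registry per process group."""
    return f"toolset-{os.getpgrp()}"


def make_record(stdin_fd=0, stdout_fd=1):
    """The entry a tool would publish in the shared registry."""
    return {
        "pid": os.getpid(),
        "stdin": pipe_identity(stdin_fd),
        "stdout": pipe_identity(stdout_fd),
    }


def is_directly_after(me, other):
    """True if `me` reads from the very pipe that `other` writes to."""
    return me["stdin"] is not None and me["stdin"] == other["stdout"]
```

Each tool would publish `make_record()` under `registry_name()` at startup and then compare its own record against the others. Both neighbors also have to agree on the format before the first byte is written, which is where most of the "bit of work" goes.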

Another way would be to have all the tools read either the textual or the binary representation, indicated perhaps by a magic number at the beginning of the stream. A command-line option could then select the output format. Depending on the usage, it may be preferable to make binary the "default", and select text output with an option.

prog0 -binout <input.file | prog1 -binout | prog2 >output.file

vs.

prog0 <input.file | prog1 | prog2 -txtout >output.file

You don't need a magic number for the text format if the binary magic number consists of non-ASCII bytes.
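
A reader that auto-detects the format from the first bytes might look like this. It is a minimal sketch, assuming a toy data model (a list of integers) and a made-up magic number `MAGIC`; the real binary layout would be whatever the toolset's in-memory struct needs.

```python
import io
import struct

# Hypothetical magic number. Its first byte is non-ASCII, so ASCII text
# input can never start with it.
MAGIC = b"\x89TS\xff"


def write_binary(values, out):
    """Binary form: magic, 32-bit count, then little-endian 64-bit integers."""
    out.write(MAGIC)
    out.write(struct.pack("<I", len(values)))
    out.write(struct.pack(f"<{len(values)}q", *values))


def write_text(values, out):
    """Text form: one ASCII integer per line."""
    out.write("".join(f"{v}\n" for v in values).encode("ascii"))


def read_any(inp):
    """Read from a buffered binary stream, detecting the format by its prefix."""
    head = inp.read(len(MAGIC))
    if head == MAGIC:
        (count,) = struct.unpack("<I", inp.read(4))
        return list(struct.unpack(f"<{count}q", inp.read(8 * count)))
    # Not binary: the bytes already consumed are the start of the text.
    data = head + inp.read()
    return [int(token) for token in data.decode("ascii").split()]
```

In a real tool, `inp` would be `sys.stdin.buffer` and `out` would be `sys.stdout.buffer`, with the output function chosen by an option such as `-binout` or `-txtout` from the examples above.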
