
Using parallel AWK - Has anyone heard of this?

Is there such a thing as parallel AWK? Can anyone explain it? I have been using AWK for simple tasks such as printing columns and merging large data files, but not for calculations. I was wondering whether AWK can be run in parallel across all the CPUs in my computer, or across nodes on the network. If so, how? And what is the primary aim of parallel AWK?

Thank you for your input.

After posting the question, I found out that Parallel AWK does exist. You can find out more about it here: http://www.parallel-awk.org/


The problem with a parallel awk implementation is that the semantics explicitly assume that operations are processed in order. For example:

awk '{print NR, $0}'

gives you output akin to cat -n. The difficulty with processing this in parallel is that NR is the total number of lines processed so far across all input, not just the number of lines read from the current file (FNR).
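To make the NR point concrete, here is a minimal sketch that compares one sequential awk pass with a simulated "parallel" run in which each file gets its own awk process. The file names part1.txt and part2.txt and their contents are made up for illustration; a real parallel implementation may split work differently.

```shell
# Hypothetical sample files, created only for this illustration.
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/part1.txt"
printf 'c\nd\n' > "$tmpdir/part2.txt"

# One sequential pass: NR keeps counting across files, FNR restarts per file.
sequential=$(awk '{print NR, FNR, $0}' "$tmpdir/part1.txt" "$tmpdir/part2.txt")

# Simulated parallel run: one independent awk process per file,
# so each process starts its own NR at 1.
split_run=$(for f in "$tmpdir/part1.txt" "$tmpdir/part2.txt"; do
  awk '{print NR, FNR, $0}' "$f"
done)

printf 'sequential:\n%s\n' "$sequential"
printf 'split:\n%s\n' "$split_run"

rm -rf "$tmpdir"
```

In the sequential run the first column runs from 1 to 4, while in the split run it goes 1, 2, 1, 2. Any script that relies on NR would need the workers to coordinate line counts to reproduce the sequential result.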

Also, there are more complicated tricks involving commands like getline, which cannot be parallelized (for example, a script can be short-circuited to emulate the GNU nextfile extension).
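As a rough illustration of why getline is order-dependent, the sketch below prints the record that follows each "name" line. The input data and the chunk boundary (3 lines, then 1 line) are assumptions chosen to show the failure; this is not the nextfile emulation mentioned above, just a simpler case of the same problem.

```shell
# Hypothetical input: each "name" line is followed by a value line.
tmpdir=$(mktemp -d)
printf 'name\nalice\nname\nbob\n' > "$tmpdir/all.txt"

# Whole file in one process: getline pulls in the record after each match.
whole=$(awk '/^name$/ { if ((getline) > 0) print }' "$tmpdir/all.txt")

# Same data cut into two chunks, each handled by its own awk process.
# The cut falls between the second "name" and its value.
head -n 3 "$tmpdir/all.txt" > "$tmpdir/chunk1.txt"
tail -n 1 "$tmpdir/all.txt" > "$tmpdir/chunk2.txt"
chunked=$(for f in "$tmpdir/chunk1.txt" "$tmpdir/chunk2.txt"; do
  awk '/^name$/ { if ((getline) > 0) print }' "$f"
done)

printf 'whole:\n%s\n' "$whole"
printf 'chunked:\n%s\n' "$chunked"

rm -rf "$tmpdir"
```

The single-process run prints both alice and bob. In the chunked run the second getline hits end of input in the first chunk, and the second chunk never sees a "name" line, so bob is lost.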
