Using cat with a Perl expression replacement string in GNU Parallel from R

2022-12-07 18:00

I'm trying to use GNU Parallel to parallelize a command.

The software itself is a Python package, which I've successfully tested on the command line (I'm using a Mac). I've been testing running the command from R via a system() call. Here is what I have so far:

system(paste("parallel --jobs 2 --dry-run eval 'mhcflurry-predict --alleles {=1 s/[,]/ /g; =} --peptides cat {2} --out {1/.}_{2/.}_pred.csv", "' ::: cat ", ciwdfiles, " ::: ", pepfiles, sep =""))

Let's say ciwdfiles is a vector like (C1.txt C2.txt), and pepfiles is a vector like (pep1.txt pep2.txt), where the files are delimited by a space. C1.txt and C2.txt look something like "A01:01,A01:02" and "A01:03, A02:01". I want to run mhcflurry-predict on these inputs with parallel jobs. In the example above, I would have a total of four jobs (C1.txt with pep1.txt, C1.txt with pep2.txt, C2.txt with pep1.txt, and C2.txt with pep2.txt).
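
The four combinations described above can be sketched as a plain nested loop, which is the same pairing that two ::: input sources produce in GNU Parallel. The file names are the ones from the example, and list_jobs is a made-up helper; the loop only prints each pair and runs nothing.

```shell
#!/bin/sh
# File names taken from the example above.
ciwdfiles="C1.txt C2.txt"
pepfiles="pep1.txt pep2.txt"

# list_jobs is a hypothetical helper: it pairs every allele file
# with every peptide file, giving 2 x 2 = 4 jobs.
list_jobs() {
  for c in $ciwdfiles; do
    for p in $pepfiles; do
      printf '%s with %s\n' "$c" "$p"
    done
  done
}

list_jobs
```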

However, I have to modify the contents of C1.txt and C2.txt on the fly by replacing each comma with a space. I am able to accomplish this with parallel's built-in Perl expression replacement string feature {=1 s/[,]/ /g; =}. For this to work, I have to cat the contents of ciwdfiles as the input. This affects the parallelization, because the ciwdfiles are catted into one input stream instead of remaining two separate files.

So, how can I feed the contents of C1.txt and C2.txt to the perl replacement string without using cat in my input specification? Alternatively, how can I manipulate C1.txt and C2.txt on the fly, and pass that to --alleles?
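
One way to picture the second alternative is to pass the file names, not their contents, and let each job read its own files. The sketch below does this in plain shell and only prints the resulting command, much like --dry-run. The helper name build_cmd, the sample file contents, and the peptide sequences are made up for illustration; nothing here is confirmed to be how mhcflurry-predict should be driven.

```shell
#!/bin/sh
# Sample inputs in a scratch directory (contents are illustrative only).
workdir=$(mktemp -d)
printf 'A01:01,A01:02\n' > "$workdir/C1.txt"
printf 'SIINFEKL\nSIINFEKD\n' > "$workdir/pep1.txt"

# build_cmd is a hypothetical helper: it takes FILE NAMES, reads the
# contents itself, and prints the command instead of running it.
build_cmd() {
  allele_file=$1
  pep_file=$2
  # Turn "A01:01,A01:02" (or "A01:03, A02:01") into space-separated alleles.
  alleles=$(sed 's/, */ /g' "$allele_file")
  # Join the peptide lines with single spaces.
  peptides=$(tr '\n' ' ' < "$pep_file" | sed 's/ *$//')
  # Output name from the two base names without ".txt", like {1/.}_{2/.}.
  out="$(basename "$allele_file" .txt)_$(basename "$pep_file" .txt)_pred.csv"
  printf 'mhcflurry-predict --alleles %s --peptides %s --out %s\n' \
    "$alleles" "$peptides" "$out"
}

build_cmd "$workdir/C1.txt" "$workdir/pep1.txt"
```

With GNU Parallel, the same idea would be to list the file names after ::: and do the reading inside the job's command, so that each allele file stays a separate input. That variant is not tested here.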

I've also tried stepping away from the Perl replacement string and using sed with --pipepart instead, to no avail:

parallel eval 'mhcflurry-predict --alleles -a {1} --pipepart 'sed -r "s/[,]+/\ /g"' --peptides cat {2}--out /Users/tran/predictions.csv' ::: ciwdfiles ::: pepfiles

I also tried this using sed instead of catting:

system(paste("parallel --jobs 2 --dry-run eval 'mhcflurry-predict --alleles {1} --peptides cat {2}--out {1/.}_{2/.}_pred.csv", "' :::sed -r 's/[,]+/ /g' ", ciwdfiles, "::: ", pepfiles, sep =""))

This sort of works. With a space as the replacement, the contents of each file get broken up. Here are the results of the dry-run:

eval mhcflurry-predict --alleles 'HLA-A01:01' --peptides cat pep.txt --out 'HLA-A01:01'_pep_pred.csv
eval mhcflurry-predict --alleles 'HLA-A01:01' --peptides cat pep2.txt --out 'HLA-A01:01'_pep2_pred.csv
eval mhcflurry-predict --alleles 'HLA-A01:02' --peptides cat pep.txt --out 'HLA-A01:02'_pep_pred.csv
eval mhcflurry-predict --alleles 'HLA-A01:02' --peptides cat pep2.txt --out 'HLA-A01:02'_pep2_pred.csv
eval mhcflurry-predict --alleles 'HLA-A01:03' --peptides cat pep.txt --out 'HLA-A01:03'_pep_pred.csv
eval mhcflurry-predict --alleles 'HLA-A01:03' --peptides cat pep2.txt --out 'HLA-A01:03'_pep2_pred.csv
eval mhcflurry-predict --alleles 'HLA-A02:01' --peptides cat pep.txt --out 'HLA-A02:01'_pep_pred.csv
eval mhcflurry-predict --alleles 'HLA-A02:01' --peptides cat pep2.txt --out 'HLA-A02:01'_pep2_pred.csv

If I use an underscore as the replacement instead (sed -r 's/[,]+/_/g'), it works fine:

eval mhcflurry-predict --alleles 'HLA-A01:01_HLA-A01:02' --peptides cat pep.txt --out 'HLA-A01:01_HLA-A01:02'_pep_pred.csv
eval mhcflurry-predict --alleles 'HLA-A01:01_HLA-A01:02' --peptides cat pep2.txt --out 'HLA-A01:01_HLA-A01:02'_pep2_pred.csv
eval mhcflurry-predict --alleles 'HLA-A01:03_HLA-A02:01' --peptides cat pep.txt --out 'HLA-A01:03_HLA-A02:01'_pep_pred.csv
eval mhcflurry-predict --alleles 'HLA-A01:03_HLA-A02:01' --peptides cat pep2.txt --out 'HLA-A01:03_HLA-A02:01'_pep2_pred.csv

However, I need the delimiter to be a space, as that's the only structure that will be accepted.
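
The difference between the two dry runs is consistent with ordinary shell word splitting, assuming the sed output reached parallel through an unquoted command substitution (the post does not show this, so it is a guess). The sketch below counts how many separate arguments a substitution produces for each delimiter; count_args is a made-up helper.

```shell
#!/bin/sh
# count_args is a hypothetical helper that reports how many arguments it received.
count_args() {
  echo "$#"
}

# Unquoted substitution: the shell splits the text on whitespace,
# so a space-delimited allele list becomes two arguments.
with_space=$(count_args $(printf 'HLA-A01:01 HLA-A01:02\n'))

# With an underscore there is no whitespace to split on: one argument.
with_underscore=$(count_args $(printf 'HLA-A01:01_HLA-A01:02\n'))

# Quoting the substitution keeps the space-delimited list as one argument.
quoted=$(count_args "$(printf 'HLA-A01:01 HLA-A01:02\n')")

echo "space: $with_space, underscore: $with_underscore, quoted: $quoted"
```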
