About python loop

2023-03-15 18:50 问答作者：

I have two path

path1 = "/home/x/nearline"
path2 = "/home/x/sge_jobs_output"

In path1, I have a bunch of fastq files:

ERR001268_1.recal.fastq.gz
ERR001268_2.recal.fastq.gz
ERR001269_1.recal.fastq.gz
ERR001269_2.recal.fastq.gz
.............

In path2, I have many .txt corresponding to the fastq files in path1:

ERR001268_1.txt
ERR001268_2.txt
ERR001269_1.txt
ERR001269_2.txt
.............

NOw I've made script to calculate fastq_seq_num from fastq files in path1, see below:

for file in os.listdir(path1):
  if re.match('.*\.recal.fastq.gz', file):
    fullpath1 = os.path.join(path1, file)
#To calculate the sequence number in fastq.gz files  
    result = commands.getoutput('zcat ' + fullpath1 + ' |wc -l')
    fastq_seq_num = int(result)/4.0
  print file,fastq_seq_num

And also calculate num_seq_processed_sai from .txt files in path2, see below:

for file in os.listdir(path2):
  if re.match('.*\.txt', file):
      fullpath2 = os.path.join(path2, file)
#To calculate how many sequences have been processed in .sai file
      linelist = open (fullpath2,'r').readlines
      lastline = linelist[len(linelist)-1]
      num_seq_processed_sai = lastline.split(']')[1].split()[0]
  print file,num_seq_processed_sai

OK, now my problem is : I want to create a loop, in which I calculate fastq_seq_num for the FIRST fastq file in path1; then calcula开发者_StackOverflow社区te num_seq_processed for the FIRST txt file in path2; then compare this two numbers; then end the loop. Then the second loop starts...How can I desgin some loop to achieve this? thanks!!!

Iterate the fastq files, check for the corresponding .txt file and if the pair exists, run your processing and compare the outputs

import commands
import glob
from os import path

dir1 = '/home/x/nearline'
dir2 = '/home/x/sge_jobs_output'

for filepath in glob.glob(path.join(dir1, '*.recal.fastq.gz')):
    filename = path.basename(filepath)
    job_id = filename.split('.', 1)[0]

    ## Look for corresponding .txt file
    txt_filepath = path.join(dir2, '%s.txt' % job_id)
    ## Fail early if corresponding .txt file is missing
    if not path.isfile(txt_filepath):
        print('Missing %s for %s' % (txt_filepath, filepath))
        continue

    ## Both exist, process each
    ## This is from your code snippet
    result = commands.getoutput('zcat ' + fullpath1 + ' |wc -l')
    fastq_seq_num = int(result)/4.0

    linelist = open(txt_filepath).readlines()
    lastline = linelist[len(linelist)-1]
    num_seq_processed_sai = lastline.split(']')[1].split()[0]

    if fastq_seq_num == num_seq_processed_sai:
        print "Sequence numbers match (%d : %d)" % (fastq_seq_num, num_seq_processed_sai)
    else:
        print "Sequence numbers do not match (%d : %d)" % (fastq_seq_num, num_seq_processed_sai)

I suggest using glob.glob() to list your files (as I've done in this snippet).

I also suggest replacing commands with subprocess, it's newer and I think commands is deprecated.

Also, reading all lines of a .txt file to get at the last line is inefficient and may cause memory issues if the files are large. Consider calling tail to get the last line (*nix only I think).

Are the number of files in both directories guaranteed to be the same? If so, you can use the zip function to accomplish this:

for fastqFile, txtFile in zip(glob.glob(path1+'/*.recal.fastq.gz'), glob.glob(path2+'/*.txt')):
    result = commands.getoutput('zcat ' + fastqFile + ' |wc -l')
    fastq_seq_num = int(result)/4.0

    lastline = linelist[-1]
    num_seq_processed_sai = lastline.split(']')[1].split()[0]


    print fastqFile, fastq_seq_num 
    print txtFile, num_seq_processed_sai

EDIT: As side notes, using glob.glob() is almost always preferable to manually filtering the output of os.listdir() and the word file is a built-in type in Python, which you should never use as a variable name. In addition, to get to the last item of a list, you only need to access it with listName[-1]. Accessing it with listName[len(listName)-1] is unpythonic.

继续阅读：fastq loops python

About python loop

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？