Python - making counters, making loops?
I am having some trouble with a piece of code below:
Input: li is a nested list as below:
li = [['>0123456789 mouse gene 1\n', 'ATGTTGGGTT/CTTAGTTG\n', 'ATGGGGTTCCT/A\n'], ['>9876543210 mouse gene 2\n', 'ATTTGGTTTCCT\n', 'ATTCAATTTTAAGGGGGGGG\n']]
Using the function below, my desired output is simply the 2nd to the 9th digits following '>' under the condition that the number of '/' present in the entire sublist is > 1.
Instead, my code gives the digits for all entries. Also, it gives them multiple times. I therefore assume something is wrong with my counter and my for loop. I can't quite figure this out.
Any help, greatly appreciated.
import os
cwd = os.getcwd()

def func_one():
    outp = open('something.txt', 'w') #output file
    li = []
    for i in os.listdir(cwd):
        if i.endswith('.ext'):
            inp = open(i, 'r').readlines()
            li.append(inp)
    count = 0
    lis = []
    for i in li:
        for j in i:
            for k in j[1:]: #ignore first entry in sublist
                if k == '/':
                    count += 1
                if count > 1:
                    lis.append(i[0][1:10])
                    next_func(lis, outp)
Thanks, S :-)
Your indentation is possibly wrong: you should check count > 1 within the for j in i loop, not within the one that checks every single character in j[1:].
Also, here's a much easier way to do the same thing:
def count_slashes(items):
    return sum(item.count('/') for item in items)

for item in li:
    if count_slashes(item[1:]) > 1:
        print(item[0][1:10])
Or, if you need the IDs in a list:
result = [item[0][1:10] for item in li if count_slashes(item[1:]) > 1]
Python list comprehensions and generator expressions are really powerful tools. Try to learn how to use them, as they make your life much simpler. The count_slashes function above uses a generator expression, and my last code snippet uses a list comprehension to construct the result list in a nice and concise way.
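To make the difference between the two concrete, here is a minimal sketch using the sample list from the question. The names first_total and slash_counts are introduced only for illustration.

```python
li = [['>0123456789 mouse gene 1\n', 'ATGTTGGGTT/CTTAGTTG\n', 'ATGGGGTTCCT/A\n'],
      ['>9876543210 mouse gene 2\n', 'ATTTGGTTTCCT\n', 'ATTCAATTTTAAGGGGGGGG\n']]

# Generator expression: the per-line counts are produced one at a time
# and consumed directly by sum(), so no intermediate list is built.
first_total = sum(line.count('/') for line in li[0][1:])

# List comprehension: builds an actual list, here one total per sublist.
slash_counts = [sum(line.count('/') for line in item[1:]) for item in li]

print(first_total)   # 2
print(slash_counts)  # [2, 0]
```

Only the first sublist has more than one '/', which is why only its ID should end up in the result.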
Tamás has suggested a good solution, although it uses a very different style of coding than you do. Still, since your question was "I am having some trouble with a piece of code below", I think something more is called for.
How to avoid these problems in the future
You've made several mistakes in your approach to getting from "I think I know how to write this code" to having actual working code.
You are using meaningless names for your variables which makes it nearly impossible to understand your code, including for yourself. The thought "but I know what each variable means" is obviously wrong, otherwise you would have managed to solve this yourself. Notice below, where I fix your code, how difficult it is to describe and discuss your code.
You are trying to solve the whole problem at once instead of breaking it down into pieces. Write small functions or pieces of code that do just one thing, one piece at a time. For each piece you work on, get it right and test it to make sure it is right. Then go on writing other pieces which perhaps use pieces you've already got. I'm saying "pieces" but usually this means functions, methods or classes.
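As a rough sketch of what "get one piece right and test it" can look like, here is a single small function checked on tiny hand-made inputs before anything else is built on top of it. The function name and the test inputs are made up for illustration.

```python
def count_slashes_in_lines(lines):
    """Return the total number of '/' characters across all the given lines."""
    return sum(line.count('/') for line in lines)

# Check the piece on small inputs where the right answer is obvious.
assert count_slashes_in_lines([]) == 0
assert count_slashes_in_lines(['ATG\n']) == 0
assert count_slashes_in_lines(['A/T\n', 'G/C/A\n']) == 3
```

Once checks like these pass, the function can be used inside the loop over the sub-lists without having to wonder whether the counting itself is the source of a bug.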
Fixing your code
That is what you asked for and nobody else has done so.
You need to move the count = 0 line to after the for i in li: line (indented appropriately). This will reset the counter for every sub-list. Second, once you have appended to lis and run your next_func, you need to break out of the for k in j[1:] loop and the encompassing for j in i: loop.
Here's a working code example (without the next_func but you can add that next to the append):
>>> li = [['>0123456789 mouse gene 1\n', 'ATGTTGGGTT/CTTAGTTG\n', 'ATGGGGTTCCT/A\n'], ['>9876543210 mouse gene 2\n', 'ATTTGGTTTCCT\n', 'ATTCAATTTTAAGGGGGGGG\n']]
>>> lis = []
>>> for i in li:
...     count = 0
...     for j in i:
...         break_out = False
...         for k in j[1:]:
...             if k == '/':
...                 count += 1
...             if count > 1:
...                 lis.append(i[0][1:10])
...                 break_out = True
...                 break
...         if break_out:
...             break
...
>>> lis
['012345678']
Re-writing your code to make it readable
This is so you see what I meant in the beginning of my answer.
>>> def count_slashes(gene):
...     "count the number of '/' characters in the DNA sequences of the gene."
...     count = 0
...     dna_sequences = gene[1:]
...     for sequence in dna_sequences:
...         count += sequence.count('/')
...     return count
...
>>> def get_gene_name(gene):
...     "get the name of the gene"
...     gene_title_line = gene[0]
...     gene_name = gene_title_line[1:10]
...     return gene_name
...
>>> genes = [['>0123456789 mouse gene 1\n', 'ATGTTGGGTT/CTTAGTTG\n', 'ATGGGGTTCCT/A\n'], ['>9876543210 mouse gene 2\n', 'ATTTGGTTTCCT\n', 'ATTCAATTTTAAGGGGGGGG\n']]
>>> results = []
>>> for gene in genes:
...     if count_slashes(gene) > 1:
...         results.append(get_gene_name(gene))
...
>>> results
['012345678']
>>>
import glob

lis = []
with open('output.txt', 'w') as outfile:
    for file in glob.iglob('*.ext'):
        content = open(file).read()
        # everything after the first line is sequence data
        if content.partition('\n')[2].count('/') > 1:
            lis.append(content[1:10])
    next_func(lis, outfile)
The reason you get the digits for all entries is that you're not resetting the counter.
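A small sketch of the difference, using the sample list from the question. Both function names are hypothetical and exist only to put the two variants side by side.

```python
li = [['>0123456789 mouse gene 1\n', 'ATGTTGGGTT/CTTAGTTG\n', 'ATGGGGTTCCT/A\n'],
      ['>9876543210 mouse gene 2\n', 'ATTTGGTTTCCT\n', 'ATTCAATTTTAAGGGGGGGG\n']]

def ids_with_shared_counter(entries):
    count = 0  # set once, so it keeps its value from earlier entries
    ids = []
    for entry in entries:
        for line in entry[1:]:
            count += line.count('/')
        if count > 1:
            ids.append(entry[0][1:10])
    return ids

def ids_with_reset_counter(entries):
    ids = []
    for entry in entries:
        count = 0  # reset for every entry
        for line in entry[1:]:
            count += line.count('/')
        if count > 1:
            ids.append(entry[0][1:10])
    return ids

print(ids_with_shared_counter(li))  # ['012345678', '987654321']
print(ids_with_reset_counter(li))   # ['012345678']
```

With the shared counter, the two slashes found in the first entry are still counted when the second entry is checked, so the second ID is appended even though that entry contains no '/' at all.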