Python to extract data from a file
I am trying to extract the blocks of text that contain a specific string from a text file like this:
----
data1
data1
data1
extractme
----
data2
data2
data2
----
data3
data3
extractme
----
and then dump them to a text file so that it looks like this:
----
data1
data1
data1
extractme
---
data3
data3
extractme
---
Thanks for the help.
This works well enough for me. Your sample data is in a file called "data.txt" and the output goes to "result.txt":
inFile = open("data.txt")
outFile = open("result.txt", "w")
buffer = []
keepCurrentSet = True
for line in inFile:
    buffer.append(line)
    if line.startswith("----"):
        #---- starts a new data set
        if keepCurrentSet:
            outFile.write("".join(buffer))
        #now reset our state
        keepCurrentSet = False
        buffer = []
    elif line.startswith("extractme"):
        keepCurrentSet = True
inFile.close()
outFile.close()
I imagine the change in number of dashes (4 in the input, sometimes 4 and sometimes 3 in the output) is an error and not actually desired (since no algorithm is even hinted at, to explain how many dashes are to be output on different occasions).
I would structure the task in terms of reading and yielding one block of lines at a time:
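def readbyblock(f):
    block = []
    for line in f:
        if line == '----\n':
            # a separator ends the current block; skip empty blocks
            # (e.g. when the file starts with a separator line)
            if block: yield block
            block = []
        else:
            block.append(line)
    # last block, if the file does not end with a separator
    if block: yield block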
so that the (selective) output can be neatly separated from the input:
with open('infile.txt') as fin:
    with open('oufile.txt', 'w') as fou:
        for block in readbyblock(fin):
            if 'extractme\n' in block:
                fou.writelines(block)
                fou.write('----\n')
This is not optimal, performance-wise, if the blocks are large, since the membership test in the if clause implies a separate loop over all lines in the block. So, a good refactoring might be:
def selectivereadbyblock(f, marker='extractme\n'):
    block = []
    extract = False
    for line in f:
        if line == '----\n':
            # a separator ends the current block; yield it only if marked
            if extract: yield block
            block = []
            extract = False
        else:
            block.append(line)
            if line == marker: extract = True
    # last block, if the file does not end with a separator
    if extract: yield block

with open('infile.txt') as fin:
    with open('oufile.txt', 'w') as fou:
        for block in selectivereadbyblock(fin):
            fou.writelines(block)
            fou.write('----\n')
Parameterizing the separators (now hard-coded as '----\n' for both input and output) is another reasonable coding tweak.
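As a rough sketch of that tweak (not a definitive version), the separator and marker could be passed in as arguments. The names select_blocks and write_blocks, the default values, and the "====" output separator are all assumptions made for illustration; io.StringIO stands in for real files so the example runs on its own.

```python
import io

def select_blocks(lines, marker="extractme", separator="----"):
    """Yield each separator-delimited block of lines that contains the marker line."""
    block, keep = [], False
    for line in lines:
        text = line.rstrip("\n")
        if text == separator:
            # A separator closes the current block; emit it only if marked.
            if keep:
                yield block
            block, keep = [], False
        else:
            block.append(line)
            if text == marker:
                keep = True
    if keep:  # last block when the input does not end with a separator
        yield block

def write_blocks(blocks, out, separator="----"):
    """Write each block followed by the (possibly different) output separator."""
    for block in blocks:
        out.writelines(block)
        out.write(separator + "\n")

# Hypothetical sample data; in-memory streams stand in for the real files.
sample = io.StringIO("----\na\nextractme\n----\nb\n----\nc\nextractme\n----\n")
result = io.StringIO()
write_blocks(select_blocks(sample), result, separator="====")
print(result.getvalue())
```

Comparing lines after stripping the trailing newline also means a final line without a newline is still recognized as a separator or marker.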
For Python 2:
#!/usr/bin/env python
with open("infile.txt") as infile:
    with open("outfile.txt","w") as outfile:
        collector = []
        for line in infile:
            if line.startswith("----"):
                collector = []
            collector.append(line)
            if line.startswith("extractme"):
                for outline in collector:
                    outfile.write(outline)
For Python 3:
#!/usr/bin/env python3
with open("infile.txt") as infile, open("outfile.txt","w") as outfile:
    collector = []
    for line in infile:
        if line.startswith("----"):
            collector = []
        collector.append(line)
        if line.startswith("extractme"):
            for outline in collector:
                outfile.write(outline)
data = open("file").read().split("----")
print('----'.join([i for i in data if "extractme" in i]))
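The split-and-filter idea above can be wrapped in a small function to see exactly what it returns. This is only a sketch: the name filter_blocks and the sample string are hypothetical. Two things to keep in mind are that the marker is matched as a substring anywhere in a chunk, and that the leading and trailing separators of the input are not reproduced in the result.

```python
def filter_blocks(text, marker="extractme", separator="----"):
    """Return only the separator-delimited chunks of text that contain marker."""
    chunks = text.split(separator)
    kept = [chunk for chunk in chunks if marker in chunk]
    return separator.join(kept)

# Hypothetical sample data shaped like the input in the question.
sample = "----\na\nextractme\n----\nb\n----\nc\nextractme\n----\n"
filtered = filter_blocks(sample)
print(filtered)
```

Because the whole text is held in memory at once, this approach suits small files better than very large ones.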