Importing data from a text file using python

2023-01-03 11:58 问答作者：

I have a text file containing data in rows and columns (~17000 rows in total). Each column is a uniform number of characters long, with the 'unused' characters filled in by spaces. For example, the first column is 11 characters long, but the last four characters in that column are always spaces (so that it appears to be a nice column when viewed with a text editor). Sometimes it's more than four if the entry is less than 7 characters.

The columns are not otherwise separat开发者_C百科ed by commas, tabs, or spaces. They are also not all the same number of characters (the first two are 11, the next two are 8 and the last one is 5 - but again, some are spaces).

What I want to do is import the entires (which are numbers) in the last two columns if the second column contains the string 'OW' somewhere in it. Any help would be greatly appreciated.

Python's struct.unpack is probably the quickest way to split fixed-length fields. Here's a function that will lazily read your file and return tuples of numbers that match your criteria:

import struct

def parsefile(filename):
    with open(filename) as myfile:
        for line in myfile:
            line = line.rstrip('\n')
            fields = struct.unpack('11s11s8s8s5s', line)
            if 'OW' in fields[1]:
                yield (int(fields[3]), int(fields[4]))

Usage:

if __name__ == '__main__':
    for field in parsefile('file.txt'):
        print field

Test data:

1234567890a1234567890a123456781234567812345
something  maybe OW d 111111118888888855555
aaaaa      bbbbb      1234    1212121233333
other thinganother OW 121212  6666666644444

Output:

(88888888, 55555)
(66666666, 44444)

In Python you can extract a substring at known positions using a slice - this is normally done with the list[start:end] syntax. However you can also create slice objects that you can use later to do the indexing.

So you can do something like this:

columns = [slice(11,22), slice(30,38), slice(38,44)]

myfile = open('some/file/path')
for line in myfile:
    fields = [line[column].strip() for column in columns]
    if "OW" in fields[0]:
        value1 = int(fields[1])
        value12 = int(fields[2]) 
        ....

Separating out the slices into a list makes it easy to change the code if the data format changes, or you need to do stuff with the other fields.

Here's a function which might help you:

def rows(f, columnSizes):
    while True:
        row = {}
        for (key, size) in columnSizes:
            value = f.read(size)
            if len(value) < size: # EOF
                return
            row[key] = value
        yield row

for an example of how it's used:

from StringIO import StringIO

sample = StringIO("""aaabbbccc
d  e  f  
g  h  i  
""")

for row in rows(sample, [('first', 3),
                         ('second', 3),
                         ('third', 4)]):
    print repr(row)

Note that unlike the other answers, this example is not line-delimited (it uses the file purely as a provider of bytes, not an iterator of lines), since you specifically mentioned that the fields were not separated, I assumed that the rows might not be either; the newline is taken into account specifically.

You can test if one string is a substring of another with the 'in' operator. For example,

>>> 'OW' in 'hello'
False
>>> 'OW' in 'helOWlo'
True

So in this case, you might do

if 'OW' in row['third']:
    stuff()

but you can obviously test any field for any value as you see fit.

entries = ((float(line[30:38]), float(line[38:43])) for line in myfile if "OW" in line[11:22])

for num1, num2 in entries:
  # whatever

entries = []
with open('my_file.txt', 'r') as f:
  for line in f.read().splitlines()
    line = line.split()
    if line[1].find('OW') >= 0
      entries.append( ( int(line[-2]) , int(line[-1]) ) )

entries is an array containing tuples of the last two entries

edit: oops

继续阅读：python

Importing data from a text file using python

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？