Read numeric arrays from text file without delimiters

2023-03-08 22:40 Q&A author:

I am trying to read some numeric data from a text file but am struggling to read numbers stored without any delimiters. The file format itself is a fairly standard one used in numerous codes around the world, so it cannot be changed. The following is a snippet of the head of an example file:

SOME TEXT OF A FIXED LENGTH      33
 3.192839854E+00 3.189751983E+00 3.186795271E+00 3.183874776E+00 3.180986976E+00
 3.178133610E+00 3.175318116E+00 3.172544681E+00 3.169818171E+00 3.167143271E+00
 3.164524875E+00 3.161968464E+00 3.159479193E+00 3.157062171E+00 3.154723040E+00
 3.152466964E+00 3.150299067E+00 3.148224863E+00 3.146249721E+00 3.144379226E+00
 3.142619004E+00 3.140974218E+00 3.139450283E+00 3.138052814E+00 3.136786929E+00
 3.135657986E+00 3.134671499E+00 3.133833067E+00 3.133149899E+00 3.132631559E+00
 3.132282773E+00 3.132080343E+00 3.131954939E+00
-5.487648393E-01-5.476736110E-01-5.447693831E-01-5.405765060E-01-5.353610408E-01
-5.291415409E-01-5.219573970E-01-5.137449740E-01-5.045337620E-01-4.943949468E-01
-4.832213992E-01-4.710109577E-01-4.578747780E-01-4.436967869E-01-4.285062978E-01
-4.123986122E-01-3.952894227E-01-3.771859951E-01-3.580934057E-01-3.379503384E-01
-3.168282028E-01-2.947799605E-01-2.716835737E-01-2.476267515E-01-2.226373818E-01
-1.966313850E-01-1.696421504E-01-1.415353640E-01-1.118510940E-01-8.041086734E-02
-4.968321601E-02-2.772555484E-02-2.631111359E-02
....

The first line contains some comments (of a fixed length) followed by an integer which gives the length of arrays which follow. The arrays themselves are stored as a list of numbers of fixed width. In this case the first array shouldn't cause me any problems. However, as you can see from the second array, all the numbers are negative and thus there are no spaces between the numbers. Therefore, methods such as str.split() will not return a list of numbers. I would be grateful for any suggestions about how best to process this file.
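To make the problem concrete, here is a minimal illustration using shortened rows from the sample above (the variable names are only for this example): splitting on whitespace works for the first array but collapses a row of the second array into a single token.

```python
# Two shortened rows copied from the sample: one positive, one negative.
positive_row = ' 3.192839854E+00 3.189751983E+00 3.186795271E+00'
negative_row = '-5.487648393E-01-5.476736110E-01-5.447693831E-01'

positive_tokens = positive_row.split()
negative_tokens = negative_row.split()

print(len(positive_tokens))  # three separate number strings
print(len(negative_tokens))  # a single run-together token

try:
    float(negative_tokens[0])
except ValueError as exc:
    # float() cannot interpret several numbers glued together
    print('cannot convert:', exc)
```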

One final bit of information which may be important: the arrays themselves contain newline characters, i.e. the following code

with open('some_file') as fh:
    data = [line for line in fh]

npts = int(data.pop(0).split()[-1])
print(data)

returns:

[' 3.192839854E+00 3.189751983E+00 3.186795271E+00 3.183874776E+00 3.180986976E+00\n',
 ' 3.178133610E+00 3.175318116E+00 3.172544681E+00 3.169818171E+00 3.167143271E+00\n',
 ' 3.164524875E+00 3.161968464E+00 3.159479193E+00 3.157062171E+00 3.154723040E+00\n',
 ' 3.152466964E+00 3.150299067E+00 3.148224863E+00 3.146249721E+00 3.144379226E+00\n',
 ' 3.142619004E+00 3.140974218E+00 3.139450283E+00 3.138052814E+00 3.136786929E+00\n',
 ' 3.135657986E+00 3.134671499E+00 3.133833067E+00 3.133149899E+00 3.132631559E+00\n',
 ' 3.132282773E+00 3.132080343E+00 3.131954939E+00\n', 
 '-5.487648393E-01-5.476736110E-01-5.447693831E-01-5.405765060E-01-5.353610408E-01\n',
 '-5.291415409E-01-5.219573970E-01-5.137449740E-01-5.045337620E-01-4.943949468E-01\n',
 '-4.832213992E-01-4.710109577E-01-4.578747780E-01-4.436967869E-01-4.285062978E-01\n',
 '-4.123986122E-01-3.952894227E-01-3.771859951E-01-3.580934057E-01-3.379503384E-01\n',
 '-3.168282028E-01-2.947799605E-01-2.716835737E-01-2.476267515E-01-2.226373818E-01\n',
 '-1.966313850E-01-1.696421504E-01-1.415353640E-01-1.118510940E-01-8.041086734E-02\n',
 '-4.968321601E-02-2.772555484E-02-2.631111359E-02\n', ... ]

Hopefully this is relatively clear - let me know if you require more information about the file format.

The arrays themselves are stored as a list of numbers of fixed width.

Since each entry is exactly sixteen characters wide, the following will convert one line of your input file into a list of floats:

In [1]: line = '-5.487648393E-01-5.476736110E-01-5.447693831E-01-5.405765060E-01-5.353610408E-01'

In [2]: [float(line[i:i+16]) for i in range(0, len(line), 16)]
Out[2]: 
[-0.5487648393,
 -0.547673611,
 -0.5447693831,
 -0.540576506,
 -0.5353610408]

Here, I assume that the line does not contain a trailing newline; if it might, str.rstrip can be used to remove it first. The following code snippet also demonstrates how to split the sequence of numbers into chunks of n (note that it doesn't attempt to parse the header line):

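n = 33  # array length, as given in the header line
arr = []
with open('data.txt') as fh:
    next(fh)  # skip the header line; it is not parsed here
    for line in fh:
        line = line.rstrip('\n')
        # slice the line into 16-character fields and convert each to float
        arr.extend(float(line[i:i+16]) for i in range(0, len(line), 16))
        # emit every complete array of n numbers collected so far
        while len(arr) >= n:
            print(arr[:n])
            arr = arr[n:]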
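Building on the fixed-width slicing above, here is a minimal sketch that also reads the array length from the header. The function names `parse_fixed_width` and `read_arrays` are made up for this example, and it assumes that the length is the last whitespace-separated token of the header, that every field is 16 characters wide, and that each array starts on a new line, as in the sample.

```python
import io

FIELD_WIDTH = 16  # assumed from the sample: every number occupies 16 characters


def parse_fixed_width(line, width=FIELD_WIDTH):
    """Slice one data line into floats, one per fixed-width field."""
    line = line.rstrip('\r\n')
    return [float(line[i:i + width]) for i in range(0, len(line), width)]


def read_arrays(fh, width=FIELD_WIDTH):
    """Read the header line, then return a list of arrays of the declared length."""
    # Assumption: the array length is the last token on the header line.
    npts = int(next(fh).split()[-1])
    arrays, current = [], []
    for line in fh:
        if not line.strip():
            continue  # tolerate blank lines
        current.extend(parse_fixed_width(line, width))
        if len(current) >= npts:
            arrays.append(current[:npts])
            current = current[npts:]
    return arrays


# A tiny in-memory stand-in for the real file, with arrays of length 3.
sample = io.StringIO(
    'SOME TEXT OF A FIXED LENGTH       3\n'
    ' 3.192839854E+00 3.189751983E+00 3.186795271E+00\n'
    '-5.487648393E-01-5.476736110E-01-5.447693831E-01\n'
)
arrays = read_arrays(sample)
print(arrays)
```

For a real file, pass an open file handle instead of the `io.StringIO` object, for example `with open('data.txt') as fh: arrays = read_arrays(fh)`.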

Chris, in this case you should use f.read(size) in order to read number after number.

This should give you an idea. Also, please post the original sample file somewhere online so we can test against it; copying and pasting into the wiki will probably break its format.

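def split_len(seq, length):
    # split seq into consecutive chunks of the given length
    return [seq[i:i+length] for i in range(0, len(seq), length)]

with open("sample.txt") as f:
    # the array length is the last space-separated token of the header
    header = f.readline()
    (a, b, size) = header.rpartition(' ')
    size = int(size)
    found = 0
    for line in f:
        for number in split_len(line.rstrip(), 16):
            found = found + 1
            print(number)
            if found == size:
                break
        if found == size:
            break  # the inner break alone would not stop the outer loop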
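The f.read(size) suggestion can also be taken literally. The sketch below reads one 16-character field at a time; the helper name `read_numbers` is hypothetical, and the field width is assumed from the sample. The one complication is that line breaks sit between fields, so a chunk that contains one has to be cleaned and topped back up to the full width.

```python
import io

FIELD_WIDTH = 16  # assumed field width, taken from the sample file


def read_numbers(fh, count, width=FIELD_WIDTH):
    """Read `count` fixed-width numbers from fh, one fh.read() per field."""
    numbers = []
    while len(numbers) < count:
        field = fh.read(width)
        # Line breaks fall between fields; drop them and top the field up.
        while field and ('\n' in field or '\r' in field):
            field = field.replace('\r', '').replace('\n', '')
            more = fh.read(width - len(field))
            if not more:
                break
            field += more
        if not field.strip():
            raise EOFError('file ended before %d numbers were read' % count)
        numbers.append(float(field))
    return numbers


# In-memory stand-in for the real file, with arrays of length 3.
sample = io.StringIO(
    'SOME TEXT OF A FIXED LENGTH       3\n'
    ' 3.192839854E+00 3.189751983E+00 3.186795271E+00\n'
    '-5.487648393E-01-5.476736110E-01-5.447693831E-01\n'
)
npts = int(sample.readline().split()[-1])
first = read_numbers(sample, npts)
second = read_numbers(sample, npts)
print(first)
print(second)
```

Slicing whole lines, as in the other answers, is usually simpler; reading field by field mainly helps when the file is too large to hold in memory at once.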

Some pseudocode:

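Loop through the line-as-string one character at a time.
  |-> A. If the character is a space, or a hyphen that does not directly follow an 'E' (a hyphen after 'E' is the exponent sign), treat it as the start of a new number.
  |---> Add the buffered string, if it is not empty, to an array of numbers.
  |---> Reset the buffer.
  |-> B. Add the character to the buffer (a leading hyphen is the number's sign, so it must be kept).
  |-> C. Repeat A. and B. until you hit a newline character, then add the final buffer to the array.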
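The character-by-character idea can be sketched in Python as follows. The function name `split_numbers` is made up for this example. One detail needs care: the exponent in a value such as 5.48E-01 also contains a hyphen, so a hyphen that directly follows 'E' must not be treated as the start of a new number.

```python
def split_numbers(line):
    """Split a run-together line into number strings, one character at a time."""
    numbers = []
    buffer = ''
    previous = ''
    for char in line.rstrip('\r\n'):
        # A space or minus sign starts a new number, unless the minus sign
        # belongs to an exponent such as 'E-01'.
        starts_new = char == ' ' or (char == '-' and previous not in ('E', 'e'))
        if starts_new and buffer.strip():
            numbers.append(buffer.strip())
            buffer = ''
        buffer += char  # keep the sign with the number it belongs to
        previous = char
    if buffer.strip():
        numbers.append(buffer.strip())  # flush the last number on the line
    return numbers


negative = split_numbers('-5.487648393E-01-5.476736110E-01-5.447693831E-01\n')
positive = split_numbers(' 3.192839854E+00 3.189751983E+00\n')
print([float(x) for x in negative])
print([float(x) for x in positive])
```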

How about using a regular expression? The following should work for the format shown in the question:

>>> import re
>>> ...
>>> data = ' '.join([e[:-1] for e in data])
>>> numbers = re.findall(r'[ \-]\d+\.\d+E[+\-]\d+',data)
>>> numbers
[' 3.192839854E+00', ' 3.189751983E+00', ' 3.186795271E+00', ' 3.183874776E+00', ' ...  
>>> [float(x) for x in numbers]
[3.192839854, 3.189751983, 3.186795271, 3.183874776, ...
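A variant of the same idea that also groups the matches into arrays is sketched below. It assumes every value is written in the exponent notation shown in the sample and that the array length is the last token of the header line; `NUMBER` and `parse_arrays` are names chosen for this example.

```python
import re

# One optionally signed number in the E-notation used by the sample file.
NUMBER = re.compile(r'[+-]?\d+\.\d+[Ee][+-]\d+')


def parse_arrays(text):
    """Return the arrays found in text, each npts long (the last may be shorter)."""
    # Separate the header so its text is never scanned for numbers.
    header, _, body = text.partition('\n')
    npts = int(header.split()[-1])
    values = [float(token) for token in NUMBER.findall(body)]
    return [values[i:i + npts] for i in range(0, len(values), npts)]


text = (
    'SOME TEXT OF A FIXED LENGTH       3\n'
    ' 3.192839854E+00 3.189751983E+00 3.186795271E+00\n'
    '-5.487648393E-01-5.476736110E-01-5.447693831E-01\n'
)
arrays = parse_arrays(text)
print(arrays)
```

Unlike fixed-width slicing, this does not depend on the field width, but it will silently skip any value that is not written in exponent notation.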
