
Read numeric arrays from text file without delimiters

I am trying to read some numeric data from a text file but am struggling to read numbers stored without any delimiters. The file format itself is a fairly standard format used in numerous codes around the world and so cannot be changed. The following is a snippet of the head of an example file:

SOME TEXT OF A FIXED LENGTH      33
 3.192839854E+00 3.189751983E+00 3.186795271E+00 3.183874776E+00 3.180986976E+00
 3.178133610E+00 3.175318116E+00 3.172544681E+00 3.169818171E+00 3.167143271E+00
 3.164524875E+00 3.161968464E+00 3.159479193E+00 3.157062171E+00 3.154723040E+00
 3.152466964E+00 3.150299067E+00 3.148224863E+00 3.146249721E+00 3.144379226E+00
 3.142619004E+00 3.140974218E+00 3.139450283E+00 3.138052814E+00 3.136786929E+00
 3.135657986E+00 3.134671499E+00 3.133833067E+00 3.133149899E+00 3.132631559E+00
 3.132282773E+00 3.132080343E+00 3.131954939E+00
-5.487648393E-01-5.476736110E-01-5.447693831E-01-5.405765060E-01-5.353610408E-01
-5.291415409E-01-5.219573970E-01-5.137449740E-01-5.045337620E-01-4.943949468E-01
-4.832213992E-01-4.710109577E-01-4.578747780E-01-4.436967869E-01-4.285062978E-01
-4.123986122E-01-3.952894227E-01-3.771859951E-01-3.580934057E-01-3.379503384E-01
-3.168282028E-01-2.947799605E-01-2.716835737E-01-2.476267515E-01-2.226373818E-01
-1.966313850E-01-1.696421504E-01-1.415353640E-01-1.118510940E-01-8.041086734E-02
-4.968321601E-02-2.772555484E-02-2.631111359E-02
....

The first line contains some comments (of a fixed length) followed by an integer which gives the length of arrays which follow. The arrays themselves are stored as a list of numbers of fixed width. In this case the first array shouldn't cause me any problems. However, as you can see from the second array, all the numbers are negative and thus there are no spaces between the numbers. Therefore, methods such as str.split() will not return a list of numbers. I would be grateful for any suggestions about how best to process this file.
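To make the failure concrete, here is a small sketch using two shortened lines taken from the sample above (the variable names are only for illustration). The space-separated line splits into three fields, while the line of negative numbers comes back as a single field:

```python
# Shortened copies of one line from each array in the sample file.
spaced = " 3.192839854E+00 3.189751983E+00 3.186795271E+00"
packed = "-5.487648393E-01-5.476736110E-01-5.447693831E-01"

spaced_fields = spaced.split()
packed_fields = packed.split()

print(len(spaced_fields))  # 3: whitespace separates the positive numbers
print(len(packed_fields))  # 1: the minus signs fill the gaps, so nothing to split on
```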

One final bit of information which may be important: the arrays themselves contain newline characters, i.e. the following code

with open('some_file') as fh:
    data = [line for line in fh]

npts = int(data.pop(0).split()[-1])
print data

returns:

[' 3.192839854E+00 3.189751983E+00 3.186795271E+00 3.183874776E+00 3.180986976E+00\n',
 ' 3.178133610E+00 3.175318116E+00 3.172544681E+00 3.169818171E+00 3.167143271E+00\n',
 ' 3.164524875E+00 3.161968464E+00 3.159479193E+00 3.157062171E+00 3.154723040E+00\n',
 ' 3.152466964E+00 3.150299067E+00 3.148224863E+00 3.146249721E+00 3.144379226E+00\n',
 ' 3.142619004E+00 3.140974218E+00 3.139450283E+00 3.138052814E+00 3.136786929E+00\n',
 ' 3.135657986E+00 3.134671499E+00 3.133833067E+00 3.133149899E+00 3.132631559E+00\n',
 ' 3.132282773E+00 3.132080343E+00 3.131954939E+00\n', 
 '-5.487648393E-01-5.476736110E-01-5.447693831E-01-5.405765060E-01-5.353610408E-01\n',
 '-5.291415409E-01-5.219573970E-01-5.137449740E-01-5.045337620E-01-4.943949468E-01\n',
 '-4.832213992E-01-4.710109577E-01-4.578747780E-01-4.436967869E-01-4.285062978E-01\n',
 '-4.123986122E-01-3.952894227E-01-3.771859951E-01-3.580934057E-01-3.379503384E-01\n',
 '-3.168282028E-01-2.947799605E-01-2.716835737E-01-2.476267515E-01-2.226373818E-01\n',
 '-1.966313850E-01-1.696421504E-01-1.415353640E-01-1.118510940E-01-8.041086734E-02\n',
 '-4.968321601E-02-2.772555484E-02-2.631111359E-02\n', ... ]

Hopefully this is relatively clear - let me know if you require more information about the file format.


The arrays themselves are stored as a list of numbers of fixed width.

Since each entry is exactly sixteen characters wide, the following will convert one line of your input file into a list of floats:

In [1]: line = '-5.487648393E-01-5.476736110E-01-5.447693831E-01-5.405765060E-01-5.353610408E-01'

In [2]: [float(line[i:i+16]) for i in xrange(0, len(line), 16)]
Out[2]: 
[-0.54876483929999997,
 -0.547673611,
 -0.5447693831,
 -0.54057650599999996,
 -0.53536104080000002]

Here, I assume that the line does not contain a trailing newline; if it might, str.rstrip can be used to remove it first. The following code snippet also demonstrates how to split the sequence of numbers into chunks of n (note that it doesn't attempt to parse the header line):

n = 33
arr = []
for line in open('data.txt'):
  line = line.rstrip('\n')
  arr.extend(float(line[i:i+16]) for i in xrange(0, len(line), 16))
  if len(arr) >= n:
    print arr[:n]
    arr = arr[n:]
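For anyone on Python 3, the same idea can be sketched with the header parsing from the question folded in. This is only a sketch under a few assumptions: every field is exactly 16 characters wide, each array starts on a new line, and the file contains nothing but the header and the arrays. The function name read_arrays and the three-point sample are made up for illustration.

```python
import io

FIELD_WIDTH = 16  # assumption: every number occupies exactly 16 characters


def read_arrays(fh, width=FIELD_WIDTH):
    """Return a list of arrays (lists of floats) read from an open text file."""
    # The array length is the last whitespace-separated token of the header.
    npts = int(fh.readline().split()[-1])
    arrays, current = [], []
    for line in fh:
        line = line.rstrip("\r\n")
        # Slice the line into fixed-width fields and convert each one.
        current.extend(float(line[i:i + width]) for i in range(0, len(line), width))
        if len(current) >= npts:
            arrays.append(current[:npts])
            current = current[npts:]
    return arrays


# Hypothetical three-point file laid out like the sample in the question.
sample = io.StringIO(
    "SOME TEXT OF A FIXED LENGTH       3\n"
    " 3.192839854E+00 3.189751983E+00 3.186795271E+00\n"
    "-5.487648393E-01-5.476736110E-01-5.447693831E-01\n"
)
arrays = read_arrays(sample)
print(arrays)
```

With a real file you would pass the object returned by open() instead of the io.StringIO stand-in.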


Chris, in this case you should use f.read(size) in order to read number after number.

This should give you an idea. Also, please post the original sample file somewhere on the net so we can test with it; copying and pasting into a wiki will probably break its format.

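def split_len(seq, length):
    # Cut seq into consecutive pieces of the given length.
    return [seq[i:i+length] for i in range(0, len(seq), length)]

with open("sample.txt") as f:
    header = f.readline()
    # The array length is the last space-separated token of the header.
    (a,b,size) = header.rpartition(' ')
    size = int(size)
    found = 0
    for line in f:
        for number in split_len(line.rstrip(), 16):
            found = found + 1
            print(number)
            if found == size:
                break
        if found == size:
            # The inner break only leaves the inner loop, so stop the outer one too.
            break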
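The code above slices whole lines rather than calling f.read(size). If you do want to read number after number with read(), something along these lines might work. It is a rough sketch: read_values is a made-up helper, the width of 16 is taken from the sample, and it assumes line breaks only ever fall between fields.

```python
import io


def read_values(fh, count, width=16):
    """Read `count` fixed-width numbers from `fh`, one field per pass."""
    values = []
    while len(values) < count:
        first = fh.read(1)
        if first == "":
            raise ValueError("file ended after %d of %d values" % (len(values), count))
        if first in "\r\n":
            continue  # line breaks sit between fields, never inside one
        # Read the rest of the field and convert it; float() ignores the
        # leading space used for positive numbers.
        values.append(float(first + fh.read(width - 1)))
    return values


# Hypothetical three-point file laid out like the sample in the question.
sample = io.StringIO(
    "SOME TEXT OF A FIXED LENGTH       3\n"
    "-5.487648393E-01-5.476736110E-01\n"
    "-5.447693831E-01\n"
)
size = int(sample.readline().split()[-1])
values = read_values(sample, size)
print(values)
```

Calling read_values again on the same file object would pick up the next array, since the file position is left just after the last field read.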


Some pseudocode:

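Loop through the line-as-string one character at a time.
  |-> A. If you hit a space, or a hyphen that does not directly follow an 'E', treat it as a delimiter.
  |---> Add your buffered string (if any) to an array of numbers and reset the buffer.
  |---> A hyphen is also the sign of the next number, so start the new buffer with it.
  |-> B. Otherwise, add the character to the buffer.
  |-> C. Repeat A. and B. until you hit a newline character, then add the last buffered string.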
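The loop above can be sketched in Python roughly as follows. One detail needs care: a hyphen also appears as the exponent sign (as in E-01), so this sketch only treats a hyphen as a boundary when it does not directly follow an E, and it keeps the hyphen as the sign of the next number. The name scan_numbers is made up for illustration.

```python
def scan_numbers(line):
    """Split one line into floats by scanning it a character at a time."""
    numbers, buf = [], ""
    for ch in line:
        if ch.isspace():
            # Whitespace (including the newline) always ends the current number.
            if buf:
                numbers.append(float(buf))
            buf = ""
        elif ch == "-" and buf and buf[-1] not in "Ee":
            # A minus sign that is not the exponent sign starts a new number.
            numbers.append(float(buf))
            buf = ch
        else:
            buf += ch
    if buf:
        # Flush the last number if the line had no trailing newline.
        numbers.append(float(buf))
    return numbers


print(scan_numbers(" 3.192839854E+00 3.189751983E+00\n"))
print(scan_numbers("-5.487648393E-01-5.476736110E-01\n"))
```

Unlike the fixed-width slicing, this does not rely on the field width, but it does rely on every number being written in exponent notation as in the sample.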


How about using a regular expression? The following should definitely work:

>>> import re
>>> ...
>>> data = ' '.join([e[:-1] for e in data])
>>> numbers = re.findall(r'[ \-]\d+\.\d+E[+\-]\d+',data)
>>> numbers
[' 3.192839854E+00', ' 3.189751983E+00', ' 3.186795271E+00', ' 3.183874776E+00', ' ...  
>>> map(float,numbers)
[3.1928398539999998, 3.1897519829999998, 3.1867952709999998, 3.1838747760000001, ...
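A self-contained Python 3 variant of the same idea, as a sketch: the pattern makes the leading sign optional instead of matching the leading space, so the matched tokens go straight to float(). The two input lines are shortened copies of lines from the question, and it assumes every number is written in exponent notation as in the sample.

```python
import re

# Optional sign, digits, a decimal point, digits, then a signed exponent.
NUMBER = re.compile(r"[+-]?\d+\.\d+[Ee][+-]?\d+")

lines = [
    " 3.192839854E+00 3.189751983E+00\n",
    "-5.487648393E-01-5.476736110E-01\n",
]

# findall returns the matches in order; adjacent negative numbers are
# separated because the exponent digits stop at the next minus sign.
numbers = [float(tok) for line in lines for tok in NUMBER.findall(line)]
print(numbers)
```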