Creating a function that handles a header row and a row-names column
I'm defining a function that will return a list of lists where element zero is the 2D array, element one is the header information, and element two is the row names. How can I read this in from a file that looks like this:
genes S1 S2 S3 S4 S5
100 -0.243 -0.021 -0.205 -1.283 0.411
10000 -1.178 -0.79 0.063 -0.878 0.011
def input2DarrayData(fn):
    # define twoDarray, headerLine and rowLabels
    twoDarray = []
    # open filehandle
    fh = open(fn)
    # collect header information
    # read in the rest of the data and organize it into a list of lists
    for line in fh:
        # split line into columns and append to array
        arrayCols = line.strip().split('\t')
        # collect rowname information
        # **what goes here?**
        # convenient float conversion for each element in the list using the
        # map function. note that this assumes each element is a number and can
        # be cast as a float. see floatizeData(), which gives the explicit
        # example of how the map function works conceptually.
        twoDarray.append(map(float, arrayCols))
    # return data
    return twoDarray
I keep getting an error saying that it can't convert the first word in the file (genes) to a float because it is a string. So my problem is figuring out how to read in just that first line separately.
def input2DarrayData(fn):
    # define twoDarray, headerLine and rowLabels
    twoDarray = []
    headerLine = None
    rowLabels = []
    # open filehandle
    fh = open(fn)
    # read the header line first so it is never passed to float()
    headerLine = fh.readline()
    headerLine = headerLine.strip().split('\t')
    for line in fh:
        arrayCols = line.strip().split('\t')
        # the first column is the row name; the rest are numbers
        rowLabels.append(arrayCols[0])
        # list() keeps this a real list on Python 3, where map is lazy
        twoDarray.append(list(map(float, arrayCols[1:])))
    # return data
    return [twoDarray, headerLine, rowLabels]
If this works for you, please read PEP 8 and refactor the variable and function names. Also, do not forget to close the file. It is best to use a with statement, which closes it for you:
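def input2DarrayData(fn):
    """Return [twoDarray, headerLine, rowLabels] read from a tab-separated file."""
    twoDarray = []
    rowLabels = []

    # the with block closes the file even if float() raises
    with open(fn) as fh:
        headerLine = fh.readline()
        headerLine = headerLine.strip().split('\t')
        for line in fh:
            arrayCols = line.strip().split('\t')
            rowLabels.append(arrayCols[0])
            # list() keeps this a real list on Python 3, where map is lazy
            twoDarray.append(list(map(float, arrayCols[1:])))

    return [twoDarray, headerLine, rowLabels]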
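For what it's worth, here is a rough sketch of what the PEP 8 refactor suggested above might look like. The names read_labeled_matrix, matrix, header and row_labels are my own choices, not anything from the question, and skipping blank lines is an assumption about the input:

```python
def read_labeled_matrix(path):
    """Read a tab-separated file with a header row and a row-name column.

    Returns [matrix, header, row_labels], the list layout asked for in the
    question.
    """
    matrix = []
    row_labels = []
    with open(path) as handle:
        # The first line holds the column names, so consume it before the loop.
        header = handle.readline().rstrip('\n').split('\t')
        for line in handle:
            if not line.strip():
                continue  # assumption: blank lines carry no data
            columns = line.rstrip('\n').split('\t')
            row_labels.append(columns[0])
            # A list comprehension gives a real list on Python 2 and 3 alike.
            matrix.append([float(value) for value in columns[1:]])
    return [matrix, header, row_labels]
```

The caller can unpack the result directly, e.g. matrix, header, row_labels = read_labeled_matrix('path/to/myfile').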
To handle the header line (the first line in the file), consume it explicitly with .readline()
before iterating over the remaining lines:
fh = open(fileName)
headers = fh.readline().strip().split('\t')
for line in fh:
    arrayCols = line.strip().split('\t')
    ## etc...
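As a quick check of this pattern, here is a small sketch that uses an in-memory io.StringIO object in place of a real file. The sample text is cut down from the question's data, and the variable names are only illustrative. It shows that iteration resumes on the line after the one readline() consumed:

```python
import io

# Stand-in for an open file; a real file handle behaves the same way here.
sample = io.StringIO(u"genes\tS1\tS2\n100\t-0.243\t-0.021\n")

# readline() consumes only the header line...
headers = sample.readline().strip().split('\t')

# ...so the loop starts at the first data line.
remaining = [line.strip().split('\t') for line in sample]

print(headers)
print(remaining)
```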
I'm unsure about what data structure you want to get from the file; you seem to imply you want a list per line that includes the headers. Duplicating the headers like that doesn't make much sense.
Assuming a fairly trivial file structure with a header row and a fixed number of columns per line, the following is a generator that yields a dictionary per line, using the headers as keys and the column values as values:
def process_file(filepath):
    ## open the file
    with open(filepath) as src:
        ## read the first line as headers
        headers = src.readline().strip().split('\t')
        for line in src:
            ## Split the line
            line = line.strip().split('\t')
            ## Coerce each value to a float
            line = [float(col) for col in line]
            ## Create a dictionary using headers and cols
            line_dict = dict(zip(headers, line))
            ## Yield it
            yield line_dict
>>> for row in process_file('path/to/myfile'):
...     print(row)
...
{'genes': 100.0, 'S1': -0.243, 'S2': -0.021, 'S3': -0.205, 'S4': -1.283, 'S5': 0.411}
{'genes': 10000.0, 'S1': -1.178, 'S2': -0.79, 'S3': 0.063, 'S4': -0.878, 'S5': 0.011}
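Note that the generator above also converts the gene identifier in the first column to a float. If you would rather keep it as a row label, one possible variation (a sketch only, assuming Python 3 and a well-formed tab-separated file; the name iter_labeled_rows is made up for this example) is to let the standard library's csv.DictReader handle the header row:

```python
import csv


def iter_labeled_rows(path):
    """Yield (row_label, values) pairs from a tab-separated file.

    values maps each remaining header name to a float.
    """
    with open(path, newline='') as src:
        reader = csv.DictReader(src, delimiter='\t')
        # fieldnames is filled in from the first line of the file.
        label_key = reader.fieldnames[0]
        for record in reader:
            # Pull the row name out before converting the rest to floats.
            label = record.pop(label_key)
            yield label, {name: float(value) for name, value in record.items()}
```

Each item is then a pair such as the string '100' together with a dictionary of the S1 to S5 values, so the row name stays a string.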