Creating a function to read a header row and a row-names column

I'm defining a function that returns a list of lists, where element 0 is the 2D array, element 1 is the header information, and element 2 is the row names. How can I read this in from a file that looks like this:

genes S1 S2 S3 S4 S5
100 -0.243 -0.021 -0.205 -1.283 0.411
10000 -1.178 -0.79 0.063 -0.878 0.011

def input2DarrayData(fn):
    # define twoDarray, headerLine and rowLabels
    twoDarray = []
    # open filehandle
    fh = open(fn)
    # collect header information


    # read in the rest of the data and organize it into a list of lists
    for line in fh:
        # split line into columns and append to array
        arrayCols = line.strip().split('\t')
        # collect rowname information

        **what goes here?**


        # convenient float conversion for each element in the list using the
        # map function. note that this assumes each element is a number and can
        # be cast as a float. see floatizeData(), which gives the explicit
        # example of how the map function works conceptually.
        twoDarray.append(map(float, arrayCols))
    # return data
    return twoDarray

I keep getting an error saying that the first word in the file (genes) can't be converted to a float because it is a string. So my problem is figuring out how to read in just that first line.


def input2DarrayData(fn):
    # define twoDarray, headerLine and rowLabels
    twoDarray = []
    headerLine = None
    rowLabels = []
    # open filehandle
    fh = open(fn)

    headerLine = fh.readline()
    headerLine = headerLine.strip().split('\t')

    for line in fh:
        arrayCols = line.strip().split('\t')
        rowLabels.append(arrayCols[0])

        # list() keeps each row a real list on Python 3, where map() is lazy
        twoDarray.append(list(map(float, arrayCols[1:])))
    # return data
    return [twoDarray, headerLine, rowLabels]

If this works for you, please read PEP 8 and refactor the variable and function names. Also, do not forget to close the file. It is best to use a with statement, which closes it for you:

def input2DarrayData(fn):
    """Return [twoDarray, headerLine, rowLabels] read from the tab-separated file fn."""
    twoDarray = []
    rowLabels = []
    with open(fn) as fh:
        # the first line is the header
        headerLine = fh.readline().strip().split('\t')
        for line in fh:
            arrayCols = line.strip().split('\t')
            # the first column is the row name, the rest are numbers
            rowLabels.append(arrayCols[0])
            twoDarray.append(list(map(float, arrayCols[1:])))
    return [twoDarray, headerLine, rowLabels]
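
As a rough illustration of the PEP 8 refactor suggested above, the same function might look like the sketch below. The names read_labeled_matrix, matrix, header, and row_labels are hypothetical choices, and skipping blank lines is an added assumption that the original code does not make:

```python
def read_labeled_matrix(path):
    """Return [matrix, header, row_labels] parsed from a tab-separated file."""
    matrix = []
    row_labels = []
    with open(path) as handle:
        # The first line holds the column names.
        header = handle.readline().strip().split('\t')
        for line in handle:
            columns = line.strip().split('\t')
            if columns == ['']:
                # Assumption: blank lines carry no data and can be skipped.
                continue
            # First column is the row name; the rest are numeric values.
            row_labels.append(columns[0])
            matrix.append([float(value) for value in columns[1:]])
    return [matrix, header, row_labels]
```

The result can be unpacked directly, for example matrix, header, row_labels = read_labeled_matrix('path/to/myfile').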


To handle the header line (the first line in the file), consume it explicitly with .readline() before iterating over the remaining lines:

fh = open(fileName)
headers = fh.readline().strip().split('\t')
for line in fh:
    arrayCols = line.strip().split('\t')
    ## etc...
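
This works because a file object is its own iterator: .readline() consumes one line, and the for loop then resumes from the next one. A small self-contained sketch of that behaviour, using io.StringIO as a stand-in for the open file (the sample contents are shortened from the question and assumed to be tab-separated):

```python
import io

# Stand-in for an open file; real code would use open(fileName) instead.
fh = io.StringIO("genes\tS1\tS2\n100\t-0.243\t-0.021\n10000\t-1.178\t-0.79\n")

# readline() consumes only the first line (the header).
headers = fh.readline().strip().split('\t')

# Iteration picks up at the second line, so the header is not seen again.
rows = [line.strip().split('\t') for line in fh]

print(headers)  # ['genes', 'S1', 'S2']
print(rows)     # [['100', '-0.243', '-0.021'], ['10000', '-1.178', '-0.79']]
```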

I'm unsure what data structure you want to get from the file. You seem to imply you want a list per line that includes the headers; duplicating the headers like that doesn't make much sense.

Assuming a fairly trivial file structure with a header row and a fixed number of columns per line, the following generator yields one dictionary per line, using the headers as keys and the column values as values:

def process_file(filepath):
    ## open the file
    with open(filepath) as src:
        ## read the first line as headers
        headers = src.readline().strip().split('\t')
        for line in src:
            ## Split the line
            line = line.strip().split('\t')
            ## Coerce each value to a float
            line = [float(col) for col in line]
            ## Create a dictionary using headers and cols
            line_dict = dict(zip(headers, line))
            ## Yield it
            yield line_dict

>>> for row in process_file('path/to/myfile'):
...     print(row)
...
{'genes': 100.0, 'S1': -0.243, 'S2': -0.021, 'S3': -0.205, 'S4': -1.283, 'S5': 0.411}
{'genes': 10000.0, 'S1': -1.178, 'S2': -0.79, 'S3': 0.063, 'S4': -0.878, 'S5': 0.011}
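
Since the question treats the first column as row names, one possible variation keeps that column as a string label instead of coercing it to a float. This is only a sketch: the name iter_labeled_rows and the blank-line check are assumptions, not part of the answer above.

```python
def iter_labeled_rows(filepath):
    """Yield (row_label, values) pairs; values maps column name to float."""
    with open(filepath) as src:
        # Read the first line as headers.
        headers = src.readline().strip().split('\t')
        for line in src:
            cols = line.strip().split('\t')
            if cols == ['']:
                # Assumption: blank lines carry no data and can be skipped.
                continue
            label, values = cols[0], cols[1:]
            # headers[0] names the label column, so pair only headers[1:].
            yield label, dict(zip(headers[1:], (float(v) for v in values)))
```

Each yielded pair can be unpacked in the loop, for example for label, values in iter_labeled_rows('path/to/myfile').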