A better way to filter fields out of rows with Python

import pprint

full_key_list = set(["F1", "F2", "F3", "F4", "F5"]) # all expected fields
filt_key_list = set(["F2", "F5"])                   # fields that should not be included

cont_list = []                                 # stores all filtered documents

read_in_cont1 = { "F1" : 1, "F2" : True,  "F3" : 'abc', "F4" : 130, "F5" : 'X1Z'} # document1
read_in_cont2 = { "F1" : 2, "F2" : False, "F3" : 'efg', "F4" : 100, "F5" : 'X4Z'} # document2
read_in_cont3 = { "F1" : 3, "F2" : True,  "F3" : 'acd', "F4" : 400, "F5" : 'X2Z'} # document3

# assume that read_in_conts contains a list of documents
read_in_conts = [read_in_cont1, read_in_cont2, read_in_cont3]

for one_item in read_in_conts: # for each document in the list
    cont_dict = {}
    for key, value in one_item.iteritems():
        if key not in filt_key_list: # if the field should be included
            cont_dict[key] = value   # add this field to the temporary document
    cont_list.append(cont_dict)

pprint.pprint(cont_list)

Output:

[{'F1': 1, 'F3': 'abc', 'F4': 130},
 {'F1': 2, 'F3': 'efg', 'F4': 100},
 {'F1': 3, 'F3': 'acd', 'F4': 400}]

Here is what I want to achieve:

Given a raw collection of documents (simulated here by read_in_conts), I need to remove the fields listed in filt_key_list from every document so that they are excluded from further processing. Above is my implementation in Python. However, I think it is too heavy, and I would like to see a cleaner solution for this task.

Thank you


cont_list = [dict((k,v) for k,v in d.iteritems() if k not in filt_key_list)
             for d in read_in_conts]

or if you want a slightly more factored version:

filter_out_keys = lambda d, x: dict((k,v) for k,v in d.iteritems() if k not in x)
cont_list = [filter_out_keys(d, filt_key_list) for d in read_in_conts]

P.S. I'd suggest making filt_key_list a set() instead; it will make the `in` membership checks faster.
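
As a rough sketch of the set-based idea, assuming Python 3 (where dictionary key views support set operations), the keys to keep can be computed with a set difference. The sample row is taken from the question; `kept_keys` and `filtered_row` are names made up for illustration.

```python
# Assumes Python 3: dict.keys() returns a view that supports set operations.
filt_key_list = {"F2", "F5"}  # fields to drop, as in the question
row = {"F1": 1, "F2": True, "F3": "abc", "F4": 130, "F5": "X1Z"}

# Set difference between the row's keys and the excluded keys.
kept_keys = row.keys() - filt_key_list

# Rebuild the row from the surviving keys only.
filtered_row = {k: row[k] for k in kept_keys}
print(filtered_row)
```

Because `kept_keys` is a set, the field order of the rebuilt row is not guaranteed; if order matters, the comprehension over the row's items shown above is probably the safer choice.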


def filter_dict(d, keys):
    # keep only the items whose key is not in `keys`
    return dict((key, value) for key, value in d.iteritems() if key not in keys)

cont_list = [filter_dict(d, filt_key_list) for d in read_in_conts]


Your code is fine. You can make it slightly shorter:

# sets can be faster if `ignored_keys` is actually much longer
ignored_keys = set(["F2", "F5"]) 

# the inline version of your loop
# a dict comprehension inside a list comprehension 
filtered = [{k : v for k,v in row.iteritems() if k not in ignored_keys}
            for row in read_in_conts]
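
For readers on Python 3, where `iteritems()` no longer exists, the same comprehension can be written with `items()`. A minimal sketch, reusing the sample documents from the question:

```python
# Python 3 variant: dict.items() replaces the Python 2 iteritems().
ignored_keys = {"F2", "F5"}

# Sample documents copied from the question.
read_in_conts = [
    {"F1": 1, "F2": True,  "F3": "abc", "F4": 130, "F5": "X1Z"},
    {"F1": 2, "F2": False, "F3": "efg", "F4": 100, "F5": "X4Z"},
    {"F1": 3, "F2": True,  "F3": "acd", "F4": 400, "F5": "X2Z"},
]

# A dict comprehension inside a list comprehension, as above.
filtered = [{k: v for k, v in row.items() if k not in ignored_keys}
            for row in read_in_conts]
print(filtered)
```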