Better way to filter the rows with python
import pprint
full_key_list = set(["F1", "F2", "F3", "F4", "F5"]) # all expected field
filt_key_list = set(["F2", "F5"]) # fields should not be included
cont_list = [] # stores all filtered documents
read_in_cont1 = { "F1" : 1, "F2" : True, "F3" : 'abc', "F4" : 130, "F5" : 'X1Z'} # document1
read_in_cont2 = { "F1" : 2, "F2" : False, "F3" : 'efg', "F4" : 100, "F5" : 'X4Z'} # document1
read_in_cont3 = { "F1" : 3, "F2" : True, "F3" : 'acd', "F4" : 400, "F5" : 'X2Z'} # document1
# assume that read_in_conts contains list of documents
read_in_conts = [read_in_cont1, read_in_cont2, read_in_cont3]
for one_item in read_in_conts: # for each document in the list
cont_dict = {}
for key, value in one_item.iteritems():
if key not in fi开发者_运维百科lt_key_list: # if the field should be included
cont_dict[key] = value # add this field to the temporary document
cont_list.append(cont_dict)
pprint.pprint(cont_list)
Output:
[{'F1': 1, 'F3': 'abc', 'F4': 130},
{'F1': 2, 'F3': 'efg', 'F4': 100},
{'F1': 3, 'F3': 'acd', 'F4': 400}]
Here is what I want to achieve:
Given an original raw collection of documents (i.e. read_in_conts for simulation), I need to filter the fields so that they are not included in further process. Above is my implementation in Python. However, I think it is too heavy and expect to see a clean solution for this task.
Thank you
cont_list = [dict((k,v) for k,v in d.iteritems() if k not in filt_key_list)
for d in read_in_conts]
or if you want a slightly more factored version:
filter_out_keys = lambda d, x: dict((k,v) for k,v in d.iteritems() if k not in x)
cont_list = [filter_out_keys(d, filt_key_list) for d in read_in_conts]
P.S. I'd suggest making filt_key_list
a set()
instead - it will make in
checks faster.
def filter_dict(d, keys):
return dict((key, value) for key, value in d.iteritems() if key not in filt_key_list))
cont_list = [filter_dict(d, filt_key_list) for d in read_in_conts]
You code is fine. You can make it slightly shorter:
# sets can be faster if `ignored_keys` is actually much longer
ignored_keys = set(["F2", "F5"])
# the inline version of your loop
# a dict comprehension inside a list comprehension
filtered = [{k : v for k,v in row.iteritems() if k not in ignored_keys}
for row in read_in_conts]
精彩评论