How to use Python's re module to filter integers by their digits
I want to use Python's re module to filter integers based on the digits at particular positions.
    1
  700
76093
71365
35837
75671
 ^^
 ||--------------------- this position should not be 6,7,8,9,0
 |---------------------- this position should not be 5,6,7
Code:
import re

int_list = [1, 700, 76093, 71365, 35837, 75671]
# Zero-pad to five characters so every digit sits at a fixed index.
str_list = [str(x).zfill(5) for x in int_list]
# Second digit: not 5, 6 or 7; third digit: 1-5.
reexp = r"\d[0-489][1-5]\d\d"
p = re.compile(reexp)
result = [int(x) for x in str_list if p.match(x)]
I have two questions:
1. Is it possible to generate the reexp string from the sets below?
thousand_position = set([1,2,3,4,5,1,1,1,1,1,1,1,1,1,1])
hundred_position = set([1,2,3,4,8,9,0,1,2,3,2,3,1,2])
2. How can I simplify reexp and avoid the zero-prefix bug shown below?
00700
00500 <--- this also matches reexp, which is a bug
           because it has no thousands digit
10700
reexp = r"\d[0-489][1-5]\d\d"
Thanks for your time
B.Rgs
PS: Thanks for the suggestion of the math solution below. I know it may be easier and faster, but I want the re-based version to balance other considerations.
Are you sure you want to be using the re module? You can get at what you're trying to do with some simple math operations.
def valid_number(n):
    # Hundreds digit must be 1-5; thousands digit must not be 5, 6 or 7.
    return 0 < n % 1000 // 100 < 6 and not 5 <= n % 10000 // 1000 <= 7
int_list = [1,700,76093,71365,35837,75671,]
result = [x for x in int_list if valid_number(x)]
or alternatively:
result = list(filter(valid_number, int_list))
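If the "no thousands digit" case from the question matters, the arithmetic check can reject small numbers explicitly. This is only a sketch: digit_at and the require_thousands flag are names made up for illustration, not part of the answer above.

```python
def digit_at(n, place):
    """Return the decimal digit of n at the given place (1 = ones, 10 = tens, ...)."""
    return n // place % 10


def valid_number(n, require_thousands=True):
    """Hundreds digit must be 1-5 and thousands digit must not be 5, 6 or 7.

    require_thousands is a hypothetical switch: when True, numbers below
    1000 are rejected, so 500 is not accepted via an implied thousands digit of 0.
    """
    if require_thousands and n < 1000:
        return False
    hundreds = digit_at(n, 100)
    thousands = digit_at(n, 1000)
    return 1 <= hundreds <= 5 and thousands not in (5, 6, 7)


int_list = [1, 700, 76093, 71365, 35837, 75671, 500]
result = [n for n in int_list if valid_number(n)]
print(result)  # [71365]
```

With require_thousands=False the function accepts 500 again, which is the behaviour the question calls a bug.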
Ok, first, I'm going to post some code that actually does what you describe initially:
>>> import re
>>> int_list = [1, 700, 76093, 71365, 35837, 75671]
>>> str_list = [str(i).zfill(5) for i in int_list]
>>> filtered = [s for s in str_list if re.match(r'\d[0-489][1-5]\d\d', s)]
>>> filtered
['71365']
Edit: Ok, I think I understand your question now. Instead of using zfill, you could use rjust, which will insert spaces instead of zeros.
>>> int_list = [1, 700, 76093, 71365, 35837, 75671, 500]
>>> str_list = [str(i).rjust(5) for i in int_list]
>>> allowed_2nd = ''.join(str(d) for d in sorted(set([0, 1, 2, 3, 4, 8, 9])))
>>> allowed_3rd = ''.join(str(d) for d in sorted(set([1, 2, 3, 4, 5])))
>>> re_str = r'\d[' + allowed_2nd + '][' + allowed_3rd + r']\d\d'
>>> filtered = [s for s in str_list if re.match(re_str, s)]
>>> filtered
['71365']
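To generate the pattern from the sets in the question, the set-to-character-class step can be wrapped in a small helper. The sketch below assumes the sets only ever contain the digits 0-9; digit_class is a hypothetical name. Note that in the question the set named hundred_position holds the digits the original pattern allows in its second character, so the order here follows the pattern rather than the names.

```python
import re


def digit_class(allowed):
    """Turn a collection of digits (0-9) into a regex character class such as [12345]."""
    digits = sorted(set(allowed))
    if not all(d in range(10) for d in digits):
        raise ValueError("only the digits 0-9 are supported")
    return "[" + "".join(str(d) for d in digits) + "]"


# The two sets exactly as posted in the question.
thousand_position = set([1, 2, 3, 4, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
hundred_position = set([1, 2, 3, 4, 8, 9, 0, 1, 2, 3, 2, 3, 1, 2])

# Same layout as the original pattern: any digit, the first class,
# the second class, then two more digits.
pattern = r"\d" + digit_class(hundred_position) + digit_class(thousand_position) + r"\d\d"
print(pattern)  # \d[0123489][12345]\d\d

int_list = [1, 700, 76093, 71365, 35837, 75671, 500]
compiled = re.compile(pattern)
# Space padding (rjust) keeps short numbers such as 500 from matching.
filtered = [n for n in int_list if compiled.match(str(n).rjust(5))]
print(filtered)  # [71365]
```

Joining the digits by hand avoids relying on str(list(...)), whose output also puts commas and spaces inside the character class.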
I think doing this mathematically as yan suggests will be faster in the end, but perhaps you have your reasons for using regular expressions.
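One thing to watch with space padding: a four-digit number such as 1300 becomes " 1300", and the leading \d no longer matches it. A possible alternative, sketched here under the assumption that only four- and five-digit numbers should pass, is to skip padding entirely and make the first digit optional. valid_number_re is a hypothetical name, and re.fullmatch needs Python 3.4 or later.

```python
import re

# \d? makes the ten-thousands digit optional, so both 4- and 5-digit numbers
# can match, while anything shorter than 4 digits cannot match at all.
pattern = re.compile(r"\d?[0-489][1-5]\d\d")


def valid_number_re(n):
    """Return True if str(n) has 4 or 5 digits and passes the two digit checks."""
    return pattern.fullmatch(str(n)) is not None


int_list = [1, 700, 76093, 71365, 35837, 75671, 500, 1300, 10700]
print([n for n in int_list if valid_number_re(n)])  # [71365, 1300]
```

Because fullmatch requires the whole string to match, 500 is rejected without any padding, and 10700 is rejected only because its hundreds digit is 7.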