Getting a substring out of an item in Yahoo Pipes
I have the following situation:
item.
content => "This is a 48593 test"
title => "the title"
item.
content => "This is a 48593 test 3255252"
title => "the title"
item.
content => "This 35542 is a 48593 test"
title => "the title"
item.
content => "i havent exactly 5 digits 34567654"
title => "the title"
This is what my items currently look like in the Pipes console.
Now I want to replace "content" with the last match of a number that has exactly 5 digits. Wanted result:
item.
content => "48593"
title => "the title"
item.
content => "48593"
title => "the title"
item.
content => "48593"
title => "the title"
item.
content => ""
title => "the title"
Is there a way to do this in Pipes 2?
Please comment if something is unclear.
Use the regex module like this:
1. In item.content replace (.*)
   with X $1
2. In item.content replace .*\b(\d{5})\b.*
   with $1
3. In item.content replace X .*
   with nothing (leave the field empty)
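To see why the three replacements work together, they can be emulated outside Pipes. This is a minimal sketch in Python, assuming Python's re.sub behaves like the Pipes regex module for these particular patterns (Python writes the backreference as \1 instead of $1, and count=1 applies each rule once to the whole string). The function name pipes_steps is hypothetical.

```python
import re

def pipes_steps(content):
    """Hypothetical emulation of the three Pipes regex rules on one string."""
    # Step 1: prefix the whole string with the marker "X ".
    s = re.sub(r"(.*)", r"X \1", content, count=1)
    # Step 2: if a standalone 5-digit number exists, keep only the last one;
    # the greedy leading .* also swallows the "X " marker.
    s = re.sub(r".*\b(\d{5})\b.*", r"\1", s, count=1)
    # Step 3: strings that still carry the marker had no match, so empty them.
    s = re.sub(r"X .*", "", s, count=1)
    return s

contents = [
    "This is a 48593 test",
    "This is a 48593 test 3255252",
    "This 35542 is a 48593 test",
    "i havent exactly 5 digits 34567654",
]
for c in contents:
    print(repr(pipes_steps(c)))
```

In this sketch, multi-line content would behave differently, because . does not match a newline by default.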
Here's an example pipe
Some explanations:
- \d{5} finds exactly five digits.
- \b marks word boundaries, so that numbers with more digits are not found.
- The X at the beginning marks strings where the regular expression doesn't match, so they can be deleted afterwards.
- Finding the last number and not the first is the default behavior, because * is a greedy operator.
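The points about \b and greediness can be checked directly. This is a small sketch using Python's re module rather than the Pipes engine, so treat it as an approximation of how Pipes behaves; the sample line is made up from the question's data.

```python
import re

line = "This 35542 is a 48593 test 3255252"

# Without word boundaries, \d{5} also matches inside the 7-digit number.
print(re.findall(r"\d{5}", line))        # ['35542', '48593', '32552']
# With \b on both sides, only standalone 5-digit numbers are found.
print(re.findall(r"\b\d{5}\b", line))    # ['35542', '48593']

# Greedy .* backtracks from the end, so the group captures the LAST match.
print(re.match(r".*\b(\d{5})\b", line).group(1))   # 48593
# Lazy .*? stops as early as possible, so the group captures the FIRST match.
print(re.match(r".*?\b(\d{5})\b", line).group(1))  # 35542
```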
Sorry, I don't know anything other than Python,
but since your problem interested me and regexes are more or less the same in all languages, I propose my solution in Python:
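import re
# Either the last standalone 5-digit number (the greedy .* backtracks to it),
# or \Z, which lets search() still succeed at the end of the string with group 1 unset.
pat = re.compile(r"(?:.*((?<!\d)(?:\d{5})(?!\d))|\Z).*")
gh = ("This is a 48593 test",
      "This is a 48593 test 3255252",
      "This 35542 is a 48593 test",
      "i havent exactly 5 digits 34567654")
for x in gh:
    print(x)
    # groups("") substitutes "" for a group that did not participate in the match
    print('AAA' + pat.search(x).groups("")[0] + 'ZZZ')
    print()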
Results:
This is a 48593 test
AAA48593ZZZ
This is a 48593 test 3255252
AAA48593ZZZ
This 35542 is a 48593 test
AAA48593ZZZ
i havent exactly 5 digits 34567654
AAAZZZ
The 'AAA' and 'ZZZ' are there only to show that the 4th result is "".
The "" argument in groups("") is the default value returned for a group that did not participate in the match.
Otherwise the 4th result would be None:
import re
pat = re.compile(r"(?:.*((?<!\d)(?:\d{5})(?!\d))|\Z).*")
gh = ("This is a 48593 test",
      "This is a 48593 test 3255252",
      "This 35542 is a 48593 test",
      "i havent exactly 5 digits 34567654")
for x in gh:
    print(x)
    # without a default, groups() reports an unmatched group as None
    print(pat.search(x).groups()[0])
    print()
results in
This is a 48593 test
48593
This is a 48593 test 3255252
48593
This 35542 is a 48593 test
48593
i havent exactly 5 digits 34567654
None
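The two answers define "exactly 5 digits" slightly differently: the Pipes rule uses \b, which requires a non-word character (or the string edge) next to the number, whereas the Python pattern uses the lookarounds (?<!\d) and (?!\d), which only forbid neighboring digits. A quick comparison in Python; the sample strings are made up for illustration:

```python
import re

boundary = re.compile(r"\b\d{5}\b")              # rule from the Pipes answer
lookaround = re.compile(r"(?<!\d)\d{5}(?!\d)")   # core of the Python answer

for s in ("id48593x", "a 48593 b", "1234567"):
    # Letters next to the digits break \b but not the digit-only lookarounds.
    print(s, boundary.findall(s), lookaround.findall(s))
```

Which behavior is preferable depends on whether a number glued to letters, such as id48593x, should count.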