
Getting a substring out of an item in Yahoo Pipes

I have the following situation:

item.
   content => "This is a 48593 test"
   title => "the title"

item.
   content => "This is a 48593 test 3255252"
   title => "the title"

item.
   content => "This 35542 is a 48593 test"
   title => "the title"

item.
   content => "i havent exactly 5 digits 34567654"
   title => "the title"

This is what my items currently look like in the Pipes console.

Now I want to replace "content" with the last match of a number that has exactly 5 digits. Wanted result:

item.
   content => "48593"
   title => "the title"

item.
   content => "48593"
   title => "the title"

item.
   content => "48593"
   title => "the title"

item.
   content => ""
   title => "the title"

Is there a way to do this in Pipes 2?

Please comment if something is unclear.


Use the regex module like this:

In item.content replace (.*) with X $1

In item.content replace .*\b(\d{5})\b.* with $1

In item.content replace X .* with nothing (leave the replacement field empty)
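If you want to try the three replacements outside Pipes, here is a minimal Python sketch that applies them in order with re.sub. It assumes that Pipes' $1 corresponds to Python's \1 and that Pipes' regex engine behaves like Python's re for these simple patterns; the helper name pipes_last_five is hypothetical. count=1 is passed so that each step performs a single replacement, which keeps the catch-all (.*) of step 1 from also matching the empty string at the end.

```python
import re

def pipes_last_five(content):
    """Hypothetical helper emulating the three Pipes regex steps in order."""
    # Step 1: prefix every string with "X " to mark it.
    s = re.sub(r"(.*)", r"X \1", content, count=1)
    # Step 2: if a standalone 5-digit number exists, the whole string
    # (including the "X " marker) is replaced by the last such number.
    s = re.sub(r".*\b(\d{5})\b.*", r"\1", s, count=1)
    # Step 3: strings that still carry the marker had no match; clear them.
    s = re.sub(r"X .*", "", s, count=1)
    return s

items = ["This is a 48593 test",
         "This is a 48593 test 3255252",
         "This 35542 is a 48593 test",
         "i havent exactly 5 digits 34567654"]

for content in items:
    print(repr(pipes_last_five(content)))
```

Each of the first three strings ends up as '48593', and the fourth ends up as ''.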

Here's an example pipe

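Some explanations:

  • \d{5} finds exactly five digits
  • \b matches a word boundary, so digits that are part of a longer number are not matched
  • the X prepended in step 1 marks strings where the regular expression doesn't match, so they can be deleted afterwards (strings that do match lose the X in step 2, because the leading .* consumes it)
  • the last number, not the first, is found because * is a greedy operator: the leading .* consumes as much as possible and backtracks only until the final five-digit number can still match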
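Both points can be checked in Python with a short sketch. It assumes Python's re treats \b and greedy .* the same way the Pipes regex module does, and the lazy .*? variant is shown only for comparison; it is not part of the answer above.

```python
import re

# Word boundaries: without \b, the 7-digit 3255252 yields a 5-digit slice.
s1 = "This is a 48593 test 3255252"
print(re.findall(r"\d{5}", s1))      # ['48593', '32552']
print(re.findall(r"\b\d{5}\b", s1))  # ['48593']

# Greedy .* backtracks just far enough for \b(\d{5})\b to match,
# so the LAST qualifying number is captured.
s2 = "This 35542 is a 48593 test"
greedy = re.sub(r".*\b(\d{5})\b.*", r"\1", s2)

# Lazy .*? consumes as little as possible, so the FIRST qualifying number wins.
lazy = re.sub(r".*?\b(\d{5})\b.*", r"\1", s2)

print(greedy)  # 48593
print(lazy)    # 35542
```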


Sorry, I don't know anything other than Python.

But your problem interested me, and regexes are more or less the same in all languages, so here is my solution in Python:

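import re

# Alternative 1: the greedy .* lets group 1 capture the LAST run of exactly
# five digits (no digit directly before or after it).
# Alternative 2: \Z matches at the end of the string when there is no such
# run, leaving group 1 unset.
pat = re.compile(r"(?:.*((?<!\d)(?:\d{5})(?!\d))|\Z).*")

gh = ("This is a 48593 test",
      "This is a 48593 test 3255252",
      "This 35542 is a 48593 test",
      "i havent exactly 5 digits 34567654")

for x in gh:
    print(x)
    print('AAA' + pat.search(x).groups("")[0] + 'ZZZ')
    print()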

Results:

This is a 48593 test
AAA48593ZZZ

This is a 48593 test 3255252
AAA48593ZZZ

This 35542 is a 48593 test
AAA48593ZZZ

i havent exactly 5 digits 34567654
AAAZZZ

The 'AAA' and 'ZZZ' markers serve only to make visible that the 4th result is an empty string "".

The "" argument in groups("") supplies "" as the default value for a group that did not participate in the match.

Otherwise the 4th result would be None:

import re

pat = re.compile(r"(?:.*((?<!\d)(?:\d{5})(?!\d))|\Z).*")

gh = ("This is a 48593 test",
      "This is a 48593 test 3255252",
      "This 35542 is a 48593 test",
      "i havent exactly 5 digits 34567654")

for x in gh:
    print(x)
    # Without a default, groups() reports an unmatched group as None.
    print(pat.search(x).groups()[0])
    print()

This prints:

This is a 48593 test
48593

This is a 48593 test 3255252
48593

This 35542 is a 48593 test
48593

i havent exactly 5 digits 34567654
None
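A minimal check of the groups() default on the fourth string, using the same pattern in Python 3: the match comes from the \Z alternative, so the capturing group does not participate, and groups() reports None unless a default is passed.

```python
import re

pat = re.compile(r"(?:.*((?<!\d)(?:\d{5})(?!\d))|\Z).*")

m = pat.search("i havent exactly 5 digits 34567654")

# Group 1 did not participate in the match (the \Z branch matched instead).
print(m.groups())    # (None,)
print(m.groups(""))  # ('',)
print(m.group(1))    # None
```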