Using Regex in Yahoo pipes to "clean" RSS feeds
I need some help creating a Yahoo Pipe that strips certain elements from an RSS feed. To clarify: I would use the regex code in Yahoo Pipes. I presume the regex syntax is universal?
I've broken the question up into some sub-questions:
What would be the regex for removing/stripping a specific HTML tag (one that has its own class)?
How can I strip links from linked images but keep the image markup?
How can I add sequential classes to all links found in a feed item? If there are 5 links in a single feed item, they would be given classes: link001, link002, link003, link004, link005...
Due to new-account limitations, code examples can be found here: Using Regex in Yahoo pipes
Regex is not exactly my forte... so any help would be greatly appreciated! Thanks a lot!
Regular expression syntax certainly isn't universal. See my regex flavor comparison. Unfortunately the Yahoo Pipes docs don't say what regex flavor they use. The examples look like Perl-style regexes, so that's what I'll use.
To remove a specific HTML tag (e.g. span) with a specific class attribute (e.g. someclass), search for:
(?si)<span[^<>]*class=["']?someclass["']?[^<>]*>(.*?)</span>
and replace with:
$1
The above regex will fail if the <span> tag you're trying to remove contains a nested <span> tag.
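As a rough way to try the search-and-replace outside of Yahoo Pipes, here is a minimal sketch in Python, whose re module accepts this Perl-style pattern. The function name unwrap_span and the sample strings are made up for illustration, and Python writes the replacement as \1 where Perl-style tools use $1. The second sample shows the nested-span failure described above.

```python
import re

# Same pattern as in the answer: (?s) lets "." match newlines, (?i) ignores case.
SPAN_RE = re.compile(
    r"""(?si)<span[^<>]*class=["']?someclass["']?[^<>]*>(.*?)</span>"""
)

def unwrap_span(html: str) -> str:
    """Drop the matching <span ...> wrapper but keep its content (hypothetical helper)."""
    # \1 is Python's spelling of the first capture group ($1 in Perl-style tools).
    return SPAN_RE.sub(r"\1", html)

# Simple case: the wrapper is removed and the text is kept.
simple = '<p>Hello <span class="someclass">world</span>!</p>'
print(unwrap_span(simple))   # <p>Hello world!</p>

# Nested case: the lazy (.*?) stops at the FIRST </span>, so the inner
# span loses its closing tag and the outer closing tag is left behind.
nested = '<span class="someclass">a <span>b</span> c</span>'
print(unwrap_span(nested))   # a <span>b c</span>
```

If nested spans can occur in the feed, an HTML parser is likely a safer tool than this pattern.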
To delete any <a> tag that has an <img> tag as the first thing in its content, search for:
(?si)<a[^<>]*>(<img.*?)</a>
and replace with:
$1
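The same replacement can be checked in Python; this is only a sketch under the assumption that Yahoo Pipes behaves like a Perl-style engine. The helper name unlink_images and the sample markup are invented for the example.

```python
import re

# Pattern from the answer: an <a ...> opening tag immediately followed by <img.
LINKED_IMG_RE = re.compile(r"(?si)<a[^<>]*>(<img.*?)</a>")

def unlink_images(html: str) -> str:
    """Remove the link around an image, keeping the <img> markup (hypothetical helper)."""
    # \1 is Python's spelling of $1: everything captured between <a ...> and </a>.
    return LINKED_IMG_RE.sub(r"\1", html)

item = (
    '<a href="http://example.com/big.jpg"><img src="thumb.jpg" alt="t"></a> '
    '<a href="http://example.com/">text link</a>'
)
print(unlink_images(item))
# <img src="thumb.jpg" alt="t"> <a href="http://example.com/">text link</a>
```

The text link is left alone because its opening tag is not directly followed by <img. Note that whitespace between the opening <a> tag and the <img> tag would also prevent a match.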
The third item in your question cannot be done with regular expressions alone. You'll need a facility to increment the number in the replacement, and I don't know if Yahoo Pipes supports something like that. The matching itself doesn't really need a regex: simply search for the text <a and replace it with <a class="link001", incrementing the number for each link.
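To illustrate what "a facility to increment the number" could look like, here is a sketch in Python, where re.sub accepts a function as the replacement. This says nothing about what Yahoo Pipes itself supports. The name number_links is hypothetical, and the \b after <a is a small addition of mine so that tags such as <abbr> are not touched.

```python
import re
from itertools import count

# Matches the start of an <a ...> tag; \b keeps <abbr>, <article>, etc. out.
A_OPEN_RE = re.compile(r"(?i)<a\b")

def number_links(item_html: str) -> str:
    """Add class="link001", "link002", ... to each link in one feed item (hypothetical helper)."""
    counter = count(1)  # restarts at 1 for every feed item

    def add_class(match: re.Match) -> str:
        # Keep the matched "<a" and append a zero-padded sequential class.
        return f'{match.group(0)} class="link{next(counter):03d}"'

    return A_OPEN_RE.sub(add_class, item_html)

item = '<a href="/one">one</a> and <a href="/two">two</a>'
print(number_links(item))
# <a class="link001" href="/one">one</a> and <a class="link002" href="/two">two</a>
```

This sketch assumes the links do not already carry a class attribute; if they do, the result would contain two class attributes on the same tag.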
Of course, all the caveats about manipulating HTML/XML with regular expressions apply. The regexes work on the examples you gave, but they may not work as intended on every possible piece of HTML.