Processing RSS feed content in Drupal
I am currently working with a third party, who provide product information to us in an RSS feed.
Our Drupal 6 site imports the RSS stories (products) from each feed (essentially one feed per category) and displays the resulting nodes on our shop pages. We use the Feeds module for this.
So far, so good, and this system has been working for over a year now.
My question is, how would I go about extracting more of the RSS feed content?
At the moment the prices are part of the feed, but our site doesn't store the price as a separate entity in the database. It only exists inside a blob of HTML.
I want to have the price in a custom CCK field so we can be a bit more clever with how we list things and so on.
I've never gone much beyond theming Drupal, but I am comfortable with PHP / XPath / the DOM, so I'm sure this is possible if I can just work out how to hook in and parse the HTML content of the feed myself.
Rather than hack something together that may be sub-optimal in some way, can anyone suggest how best to do this? A custom Feed Import module? Some other hook in Drupal that post-processes nodes?
Edit:
To clarify, we currently use the Feeds module (6.x-1.0-beta), and map the RSS title, description, date etc to CCK fields.
What I would like to do is go one step further, and parse the HTML content of the RSS 'description' field.
Update:
http://drupal.org/project/feedapi_scraper
This looks like it does sort of what I'm after, but doesn't look widely used, which always makes me a bit nervous with Drupal modules. I'll give it a go and report back.
I'm pretty sure the Feed Element Mapper module will do it all for you :-)
From the module page:
Add-on module for FeedAPI that maps elements on a feed item such as tags or the author name to taxonomy or CCK fields. These mappings are configurable by point and click.
This module looks like a good solution:
http://drupal.org/project/feeds_xpathparser
It supports using arbitrary XPaths to extract information from your source feeds.
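To illustrate the general idea of pulling a price out of the HTML held in each item's 'description', here is a minimal, language-neutral sketch written in Python (on the Drupal side the same steps would be done in PHP or through the module's own configuration). It does not use or reproduce any Drupal module's API. The sample feed, the `<span class="price">` markup, and the names `PriceExtractor` and `extract_prices` are all assumptions made up for the example; the real feed's markup will differ.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical feed: the real third-party markup is unknown, so the
# <span class="price"> wrapper below is purely an assumption.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Widgets</title>
    <item>
      <title>Blue widget</title>
      <description>&lt;p&gt;A sturdy widget.&lt;/p&gt;&lt;span class="price"&gt;12.50&lt;/span&gt;</description>
    </item>
    <item>
      <title>Red widget</title>
      <description>&lt;p&gt;Price on request.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>"""


class PriceExtractor(HTMLParser):
    """Collects the text found inside <span class="price"> elements."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("class") == "price":
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.text += data


def extract_prices(feed_xml):
    """Map each item title to its price as a float, or None if not found."""
    root = ET.fromstring(feed_xml)
    prices = {}
    for item in root.findall("./channel/item"):
        title = item.findtext("title")
        # ElementTree unescapes the entity-encoded HTML in <description>.
        description = item.findtext("description") or ""
        parser = PriceExtractor()
        parser.feed(description)
        parser.close()
        match = re.search(r"\d+(?:\.\d+)?", parser.text)
        prices[title] = float(match.group()) if match else None
    return prices


if __name__ == "__main__":
    for title, price in extract_prices(SAMPLE_FEED).items():
        print(title, price)
```

The extracted value is what would then be mapped to the custom CCK price field. Items without a recognisable price yield `None` rather than failing, which seems safer for a feed whose markup is controlled by a third party.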