Reading custom values in eBay RSS feed (XML::RSS module)

I've spent way too long trying to figure this out. I'm using XML::RSS and Perl to read and parse an eBay RSS feed. Within the <item></item> area, I see these entries:

<rx:BuyItNowPrice xmlns:rx="urn:ebay:apis:eBLBaseComponents">1395</rx:BuyItNowPrice>
<rx:CurrentPrice xmlns:rx="urn:ebay:apis:eBLBaseComponents">1255</rx:CurrentPrice>

However, I can't figure out how to grab the details during the loop. I wrote a regex to grab them:

@current_price = $item  =~ m/\<rx\:CurrentPrice.*\>(\d+)\<\/rx\:CurrentPrice\>/g;

This works if you place the above 'CurrentPrice' entry into a standalone string, but not while the script is reading through the RSS feed.

I can grab most of the information I want out of the item->description area (number of bids, auction end time, BIN price, thumbnail image, etc.), but it would be nicer to read these values directly from the feed instead of extracting them all by hand.

How can I grab custom fields from an RSS feed (short of writing regexes to parse the entire feed without a module)?

Here's the code I'm working with:

$my_limit = 0;
use LWP::Simple;
use XML::RSS;

$rss = XML::RSS->new();
$data = get( $mylink );
$rss->parse( $data );

$channel = $rss->{channel};

$NumItems = 0;
foreach $item (@{$rss->{'items'}}) {
    if ($NumItems > $my_limit) {
        last;
    }

    @current_price = $item =~ m/\<rx\:CurrentPrice.*\>(\d+)\<\/rx\:CurrentPrice\>/g;

    print "$current_price[0]";
}


If you have the RSS/XML document and want specific data, you could use XPath:

Perl CPAN XPATH

XPath Introduction
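
As a rough illustration of the XPath route, here is a minimal sketch. It assumes the XML::LibXML module from CPAN, which is not necessarily the module linked above. The sample feed is a hypothetical cut-down document built from the two entries in the question. The one non-obvious step is registering the rx prefix against its namespace URI, so that the XPath expressions can refer to the namespaced elements.

```perl
use strict;
use warnings;
use XML::LibXML;    # assumption: installed from CPAN

# Hypothetical sample feed, reduced to the two eBay fields from the question.
my $xml = <<'XML';
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Sample</title>
    <item>
      <title>Example item</title>
      <rx:BuyItNowPrice xmlns:rx="urn:ebay:apis:eBLBaseComponents">1395</rx:BuyItNowPrice>
      <rx:CurrentPrice xmlns:rx="urn:ebay:apis:eBLBaseComponents">1255</rx:CurrentPrice>
    </item>
  </channel>
</rss>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Bind the "rx" prefix to the namespace URI used in the feed.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(rx => 'urn:ebay:apis:eBLBaseComponents');

my @prices;
for my $item ($xpc->findnodes('//item')) {
    # Paths are evaluated relative to the current <item> node.
    my $current = $xpc->findvalue('rx:CurrentPrice',  $item);
    my $bin     = $xpc->findvalue('rx:BuyItNowPrice', $item);
    push @prices, { current => $current, bin => $bin };
    print "current=$current bin=$bin\n";
}
```

With a real feed, the string passed to load_xml would be the $data fetched with LWP::Simple's get() in the question's code.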


In what way does it "not work" on the RSS feed? Do you mean no matches when there should be matches? Or one match where there should be several matches?

One thing that jumps out at me about your regular expression is that you use .*, which can sometimes be greedier than you want. That is, if $item contained the text

<rx:BuyItNowPrice xmlns:rx="urn:...nts">1395</rx:BuyItNowPrice>
<rx:CurrentPrice xmlns:rx="urn:...nts">1255</rx:CurrentPrice>
<rx:BuyItNowPrice xmlns:rx="urn:...nts">1395</rx:BuyItNowPrice>
<rx:SomeMoreStuff xmlns:rx="urn:...nts">zzz</rx:SomeMoreStuff>
<rx:CurrentPrice xmlns:rx="urn:...nts">1255</rx:CurrentPrice>

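then, if that text arrives as one long line (without the /s modifier, . does not match a newline), the first part of your regular expression (\<rx\:CurrentPrice.*\>) will wind up matching the rest of entry 2, all of entries 3 and 4, and the first part of entry 5 (up to the >). Instead, you might want to use the regular expression (see note 1)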

m/\<rx:CurrentPrice[^>]*>(\d+)\<\/rx:CurrentPrice\>/

which will only match up to the closing </rx:CurrentPrice> tag after a single instance of an opening <rx:CurrentPrice> tag.
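
To make the difference concrete, here is a small self-contained sketch using core Perl only. The sample string is hypothetical: the entries are joined on one line, and the second CurrentPrice value (1300) is made up so the two matches can be told apart. Note that the pattern has to be applied to a string of XML. The items returned by XML::RSS are hash references, so matching against $item itself would not see the feed text.

```perl
use strict;
use warnings;

# Hypothetical single-line sample with two CurrentPrice entries.
my $item_xml = join '',
    '<rx:CurrentPrice xmlns:rx="urn:ebay:apis:eBLBaseComponents">1255</rx:CurrentPrice>',
    '<rx:BuyItNowPrice xmlns:rx="urn:ebay:apis:eBLBaseComponents">1395</rx:BuyItNowPrice>',
    '<rx:CurrentPrice xmlns:rx="urn:ebay:apis:eBLBaseComponents">1300</rx:CurrentPrice>';

# Greedy .* runs past the first entry and only backs off as far as the last one.
my @greedy = $item_xml =~ m/<rx:CurrentPrice.*>(\d+)<\/rx:CurrentPrice>/g;

# [^>]* cannot cross a ">", so each opening tag is matched on its own.
my @bounded = $item_xml =~ m/<rx:CurrentPrice[^>]*>(\d+)<\/rx:CurrentPrice>/g;

print "greedy:  @greedy\n";     # greedy:  1300
print "bounded: @bounded\n";    # bounded: 1255 1300
```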

1 The other obvious answer is that you really don't want to use a regular expression at all, that regular expressions are inferior tools for parsing XML compared to customized parsing modules, and that all the special cases you will have to deal with using regular expressions will eventually render you unconscious from having repeatedly beaten your head against your desk. See Salgar's answer, for example.
