Screen Scraping - still not working

I have browsed through many posts on this and have tried some of the suggestions, but I still don't fully understand it. I would like to scrape HTML pages where a script usually runs to display a link after a click. Some posts mentioned Firebug, and others talked about reverse engineering the code I need. But after trying reverse engineering, I still don't see how to get the data after tracing the script function.

jQuery('.category-selector').toggle(
    function() {
        var categoryList = jQuery('#category-list');
        categoryList.css('top', jQuery(this).offset().top+43);
        jQuery('.category-selector img').attr('src', '/images/up_arrow.png');
        categoryList.removeClass('nodisplay');
    },
    function() {
        var categoryList = jQuery('#category-list');
        jQuery('.category-selector img').attr('src', '/images/down_arrow.png');
        categoryList.addClass('nodisplay');
    }
);

jQuery('.category-item a').click(
    function() {
        idToShow = jQuery(this).attr('id').substr(9);
        hideAllExcept(jQuery('#category_' + idToShow));
        jQuery('.category-item a').removeClass('activeLink');
        jQuery(this).addClass('activeLink');
    }
);

I am using VB.NET, and some sites were easy: by looking at the script in Firebug, I was able to pull the data that I needed. What would I do in this scenario? The link is http://featured.typepad.com/ and the categories are what I am trying to access. Notice that the URL does not change. I appreciate any responses.


My best suggestion would be to use Selenium for screen scraping. It is normally used for automated website testing, but it would fit your case well. I've used it on multiple occasions to screen scrape AJAX pages that were heavily JavaScript dependent.

http://seleniumhq.org/projects/ide/

You can write your screen scraping code in .NET, and it can drive Firefox or IE to load the pages and run their scripts.

With Selenium, you record a screen scraping session with the Selenium IDE in Firefox (look for the Firefox extension at the link above). That session can be exported either as an HTML template or as C# code. It might be able to output VB as well.

You'll copy the C# or VB.NET output from the recorded session into a Selenium .NET project that you create, and then run that project through NUnit.

I'd suggest looking online for some help with getting Selenium set up and working, but this should get you on your way.
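
One more thing worth checking before reaching for a browser: the click handler quoted in the question does not appear to request anything from the server. It takes the clicked link's id, drops the first 9 characters, and passes the element whose id is category_ plus the remainder to hideAllExcept. If that reading is right, the category markup may already be in the page source, just hidden with CSS. Here is a minimal sketch of that id mapping in plain JavaScript. The function name panelIdForLink and the link id format (cat_link_12) are made up for illustration, and I have not verified them against the actual page.

```javascript
// Mirrors the id arithmetic in the click handler quoted in the question:
//   idToShow = jQuery(this).attr('id').substr(9);
//   hideAllExcept(jQuery('#category_' + idToShow));
// ASSUMPTION: the link id format "cat_link_<n>" is hypothetical. The real
// prefix only needs to be 9 characters long for the handler to isolate
// the suffix.
function panelIdForLink(linkId) {
    // Drop the first 9 characters and keep whatever follows
    // (substring(9) behaves like substr(9) for a non-negative start).
    var idToShow = linkId.substring(9);
    // The handler selects the element with this id and shows it.
    return 'category_' + idToShow;
}

// Hypothetical ids as they might appear on the '.category-item a' links.
var linkIds = ['cat_link_12', 'cat_link_305'];
var panelIds = linkIds.map(panelIdForLink);
console.log(panelIds); // ['category_12', 'category_305']
```

If the page does work this way, you could search the downloaded HTML for elements whose id starts with category_ instead of simulating the click. If those elements are not in the source, the content is loaded some other way and a browser-driven tool like Selenium is the safer route.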
