Web scraping: how to get scraper implementation from text link?

2022-12-24 07:52 Q&A author:

I'm building a Java web media-scraping application for extracting content from a variety of popular websites: YouTube, Facebook, RapidShare, and so on.

The application will include a search capability to find content URLs, but should also allow the user to paste a URL into the application if they already know where the media is. Youtube Downloader already does this for a variety of video sites.

When the program is supplied with a URL, it decides which kind of scraper to use to get the content; for example, a YouTube watch link returns a YoutubeScraper, a Facebook fan page link returns a FacebookScraper, and so on.

Should I use the factory pattern to do this?

My idea is that the factory has one public method. It takes a String argument representing a link, and returns a suitable implementation of the Scraper interface. I guess the Factory would hold a list of Scraper implementations, and would match the link against each Scraper until it finds a suitable one. If there is no suitable one, it throws an Exception instead.
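A minimal sketch of that idea, under a few assumptions that are not in the question: each Scraper exposes a hypothetical canHandle(String) method, the matching rules are simple substring checks, and the "no suitable scraper" case is reported with IllegalArgumentException. The class name ScraperFactory and the method name forLink are also made up for illustration.

```java
import java.util.List;

interface Scraper {
    // Hypothetical method: returns true if this scraper recognises the link.
    boolean canHandle(String link);
}

class YoutubeScraper implements Scraper {
    @Override
    public boolean canHandle(String link) {
        // Assumed rule: any YouTube watch link.
        return link.contains("youtube.com/watch");
    }
}

class FacebookScraper implements Scraper {
    @Override
    public boolean canHandle(String link) {
        // Assumed rule: any link on facebook.com.
        return link.contains("facebook.com/");
    }
}

class ScraperFactory {
    // The factory holds the list of known Scraper implementations.
    private final List<Scraper> scrapers =
            List.of(new YoutubeScraper(), new FacebookScraper());

    // The single public method: match the link against each scraper in turn.
    public Scraper forLink(String link) {
        for (Scraper scraper : scrapers) {
            if (scraper.canHandle(link)) {
                return scraper;
            }
        }
        // No scraper matched, so fail loudly instead of returning null.
        throw new IllegalArgumentException("No scraper for link: " + link);
    }
}
```

Letting each scraper decide whether it can handle a link keeps the matching rules next to the code that depends on them, so adding a new site only means adding one class and one list entry.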

Sounds like a good idea. You most likely want a singleton with a create(URL url) method. I would recommend using TDD (test-driven development) for this, to make your requirements clearer in your mind.

A factory that returns the right scraper will be fine. To generalize the approach, I recommend using a map to hold the implementations, for example:

// Maps a site's host name to the Scraper implementation that handles it.
Map<String, Class<? extends Scraper>> scrapers = new HashMap<>();
scrapers.put("facebook.com", FacebookScraper.class);
...

Later you can check the URL against the keys of the map and instantiate the matching class for that content.
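One way the lookup and instantiation could look, as a sketch only. It assumes the map keys are bare host names, that a leading "www." should be ignored, and that every scraper class has a no-argument constructor. The class name ScraperRegistry, the empty Scraper interface, and the "youtube.com" entry are hypothetical.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

interface Scraper {}

class FacebookScraper implements Scraper {}

class YoutubeScraper implements Scraper {}

class ScraperRegistry {
    // Host name -> scraper class, as in the answer's map.
    private final Map<String, Class<? extends Scraper>> scrapers = new HashMap<>();

    ScraperRegistry() {
        scrapers.put("facebook.com", FacebookScraper.class);
        scrapers.put("youtube.com", YoutubeScraper.class); // assumed extra entry
    }

    Scraper create(String link) {
        // URI.create throws IllegalArgumentException for malformed links.
        String host = URI.create(link).getHost();
        if (host == null) {
            throw new IllegalArgumentException("Link has no host: " + link);
        }
        host = host.toLowerCase(Locale.ROOT);
        if (host.startsWith("www.")) {
            host = host.substring(4); // assumption: "www." is not significant
        }
        Class<? extends Scraper> type = scrapers.get(host);
        if (type == null) {
            throw new IllegalArgumentException("No scraper registered for host: " + host);
        }
        try {
            // Requires an accessible no-argument constructor on the scraper class.
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
        }
    }
}
```

An exact host lookup is stricter than a substring match on the whole URL: a link such as https://example.com/?ref=facebook.com would not be routed to FacebookScraper by mistake.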
