
Schedule sending HTTP requests to a particular site

I want some way to be notified whenever a new result appears for a search query on a particular site. The site does not provide any feature for this (RSS, alerts, etc.). One way I think to accomplish this would be to send an HTTP request (for the search) and process the HTTP response, sending a mail for any new result that comes up. The search parameters can be static or, better, taken from a source (like a CSV file). Does anyone know of an existing solution, preferably online, that can accomplish this?

Thanks, Jeet


Try iHook. It allows you to schedule HTTP requests to public web resources (as frequently as every minute) and receive rule-based email notifications. You can create notification rules around the response status code and response body (via JSON expression and CSS selector).


That would depend on the particular site you want to query.


I know of no open-source solution that does this "out of the box", so I believe you'd need to write a custom spider/crawler to accomplish your task; it would need to provide the following services:

  1. Scheduling - when the crawl should occur. Typically the 'cron' system service on Unix-like systems or the Task Scheduler on Windows is used.

  2. Retrieval - retrieving targeted pages. Using either a scripting language like Perl or a dedicated system tool like 'curl' or 'wget'.

  3. Extraction / Normalization - removing everything from the target (retrieved page) except the content of interest. Needed to compensate for changing sections of the target that are not germane to the task, like dates or advertising. Typically accomplished via a scripting language that supports regular expressions (for trivial cases) or an HTML parser library (for more specialized extractions).

  4. Checksumming - converting the target into a unique identifier determined by its content. Used to determine changes to the target since the last crawl. Accomplished by a system tool (such as the Linux 'cksum' command) or a scripting language.

  5. Change detection - comparing the previously saved checksum for the last retrieved target with the newly computed checksum for the current retrieval. Again, typically using a scripting language.

  6. Alerting - informing users of identified changes. Typically via email or text message.

  7. State management - storing target URIs, extraction rules, user preferences and target checksums from the previous run. Either configuration files or databases (like MySQL) are used.

Please note that this list of services attempts to describe the system in the abstract, and so it sounds a lot more complicated than the actual tool you create will be. I've written several systems like this before, so I expect a simple solution written in Perl (utilizing standard Perl modules) and running on Linux would require a hundred lines or so for a couple of target sites, depending on extraction complexity.
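To make that shape concrete, here is a minimal sketch of services 2-7 in Perl, using common CPAN modules (LWP::UserAgent, Digest::MD5, MIME::Lite). The URL, state-file path and mail address are placeholders I've made up for illustration, and the extraction step is deliberately trivial (strip tags, collapse whitespace); a real version would apply per-site extraction rules with an HTML parser as described above.

    #!/usr/bin/perl
    # check_search.pl -- minimal change-detection sketch; the URL, state file
    # and mail address are placeholders, not a real configuration.
    # 1. Scheduling: run from cron, e.g.  */15 * * * * /usr/bin/perl check_search.pl
    use strict;
    use warnings;
    use LWP::UserAgent;           # 2. retrieval
    use Digest::MD5 qw(md5_hex);  # 4. checksumming
    use MIME::Lite;               # 6. alerting
    use Encode qw(encode_utf8);

    my $url        = 'http://example.com/search?q=my+query';
    my $state_file = '/var/tmp/search_checksum.txt';
    my $mail_to    = 'you@example.com';

    # 2. Retrieval
    my $ua  = LWP::UserAgent->new(timeout => 30);
    my $res = $ua->get($url);
    die 'Fetch failed: ' . $res->status_line . "\n" unless $res->is_success;

    # 3. Extraction / normalization (trivial case: strip tags, collapse
    #    whitespace; a real extractor would use an HTML parser such as
    #    HTML::TreeBuilder to keep only the result list)
    my $content = $res->decoded_content;
    $content =~ s/<[^>]*>//g;
    $content =~ s/\s+/ /g;

    # 4. Checksumming
    my $new_sum = md5_hex(encode_utf8($content));

    # 7. State management: read the checksum saved by the previous run
    my $old_sum = '';
    if (open my $in, '<', $state_file) {
        $old_sum = <$in> // '';
        chomp $old_sum;
        close $in;
    }

    # 5. Change detection and 6. Alerting
    if ($new_sum ne $old_sum) {
        MIME::Lite->new(
            To      => $mail_to,
            Subject => "Search results changed: $url",
            Data    => "The page content changed since the last check (checksum $new_sum).\n",
        )->send;    # defaults to the local sendmail
    }

    # Save the new checksum for the next run
    open my $out, '>', $state_file or die "Cannot write $state_file: $!";
    print {$out} "$new_sum\n";
    close $out;

The query URL (or a list of them) could just as easily be read from a CSV file, as the question suggests, and storing a parsed list of result links instead of a whole-page checksum would let the alert name the new results rather than merely reporting that something changed.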
