How to go about reading a web page lazily in Clojure

A friend and I recently implemented link grabbing in my Clojure IRC bot. When it sees a link, it slurps the page and grabs the title from it. The problem is that it has to slurp the ENTIRE page just to grab the title.

How does one go about reading a page lazily until the first </title>?


Use line-seq but don't forget to close the underlying stream when done.
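A minimal sketch of that idea, assuming the page is served as text that `clojure.java.io/reader` can open directly from a URL string and that Clojure 1.5+ is available (for `some->`). The function names `lines-until-title`, `title-from-reader`, and `fetch-title-by-lines` are made up for illustration. The lines are realized inside `with-open`, so the stream is closed as soon as the title has been extracted:

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn lines-until-title
  "Lazily takes lines from rdr (a BufferedReader) up to and including
  the first line that contains </title>."
  [rdr]
  (let [[before after] (split-with #(not (re-find #"(?i)</title>" %))
                                   (line-seq rdr))]
    (concat before (take 1 after))))

(defn title-from-reader
  "Returns the trimmed text of the first <title> element, or nil if none is found."
  [rdr]
  ;; str/join forces the lazy seq, so it must run while rdr is still open.
  (let [head (str/join "\n" (lines-until-title rdr))]
    (some-> (re-find #"(?is)<title[^>]*>(.*?)</title>" head)
            second
            str/trim)))

(defn fetch-title-by-lines
  "Hypothetical entry point: opens url, reads only as many lines as needed,
  and closes the stream when done."
  [url]
  (with-open [rdr (io/reader url)]
    (title-from-reader rdr)))
```

Note that the underlying `BufferedReader` still fetches data in buffer-sized chunks, so a little more than the title line may be pulled over the wire, but the rest of the page is never read.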


I wouldn't count on the HTML necessarily being split into lines in a sensible way; without looking outside of our own backyard, e.g. Compojure (or Hiccup currently, I guess) doesn't bother inserting line breaks, I believe (update: just checked Hiccup -- no line breaks).

What I'd suggest instead is lazy XML parsing (with clojure.contrib.lazy-xml) on top of a java.io.BufferedInputStream.
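If pulling in an XML parser is not an option, or the page is not well-formed XML, the line-break concern can also be sidestepped by reading character by character. To be clear, this is a different technique from lazy-xml, shown only as a rough sketch. It assumes the page is text that `clojure.java.io/reader` can decode (UTF-8 by default); the names `read-through`, `title-via-chars`, and `fetch-title-by-chars`, and the 65536-character cap, are hypothetical choices for illustration:

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn read-through
  "Reads characters from rdr until the text read so far ends with end-marker
  (compared case-insensitively), limit characters have been read, or the
  stream ends. Returns everything read as a string."
  [^java.io.Reader rdr end-marker limit]
  (let [sb     (StringBuilder.)
        marker (str/lower-case end-marker)
        n      (count marker)]
    (loop []
      (let [c (.read rdr)]
        (if (neg? c)
          (str sb)                                   ; end of stream
          (do
            (.append sb (char c))
            (if (or (>= (.length sb) limit)          ; safety cap reached
                    (and (>= (.length sb) n)
                         (= marker
                            (str/lower-case
                              (.substring sb (- (.length sb) n))))))
              (str sb)
              (recur))))))))

(defn title-via-chars
  "Returns the trimmed text of the first <title> element, or nil if none is found
  within the first 65536 characters."
  [rdr]
  (some-> (re-find #"(?is)<title[^>]*>(.*?)</title>"
                   (read-through rdr "</title>" 65536))
          second
          str/trim))

(defn fetch-title-by-chars
  "Hypothetical entry point: opens url, stops reading at the first </title>."
  [url]
  (with-open [rdr (io/reader url)]
    (title-via-chars rdr)))
```

The cap keeps the bot from downloading an arbitrarily large page that never closes its title tag; the reader returned by `io/reader` is buffered, so the per-character `.read` calls do not hit the network individually.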
