Scraping an HTML table in Common Lisp?

Posted 2022-12-21 17:07

I'd like to extract some information from a web page that's contained in an HTML <table>. How can I extract all of the table's contents into a pipe-separated (|) file, like this?

Author|Book|Year|Comments
Bill Bryson|Short History of Nearly Everything|2004
Stephen Hawking|A Brief History of Time|1998|Still haven't read.

Ideally, I'd like to have a function that takes a URL and an output file name as parameters and writes the above output to that file.

(defun extract-table (url filename)
  (extract-from-html-table (fetch-web-page url)))

(extract-table "http://www.mypage.com" "output.txt")

Sample HTML input for the above output:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>Lisp</title>
</head>
<body>
<h1>Welcome to Lisp</h1>
<table class="any" style="font-size: 14px;">
  <TR class="header">
    <td>Author</td>
    <TD>Book</TD>
    <td>Year</td>
    <td>Comments</td>
  </TR>
  <tr class="odd">
    <td>Bill Bryson</td>
    <td>Short History of Nearly Everything</td>
    <td>2004</td>
  </tr>
  <tr>
    <td>Stephen Hawking</td>
    <td>A Brief History of Time</td>
    <td>1998</td>
    <td>Still haven't read.</td>
  </tr>
</table>
</body>
</html>

Start with Drakma to fetch the page. To parse it, you might find cxml helpful, but cxml is an XML parser, so it only works on pages that happen to be well-formed XML. A better option is closure-html, which is designed to parse arbitrary HTML 4. The closure-html page on Common-Lisp.net includes a screen-scraping example.
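As a rough illustration of that approach, here is a minimal sketch that fetches the page with Drakma, parses it into closure-html's LHTML list form, and writes one pipe-separated line per table row. It assumes Quicklisp is available to load both libraries. The helpers find-elements, node-text, table-rows and write-rows are names made up for this example and are not part of either library. The sketch only looks at the first table on the page and does not handle nested tables.

```lisp
;; Assumption: Quicklisp is installed and can load both libraries.
(ql:quickload '(:drakma :closure-html))

(defparameter *whitespace* '(#\Space #\Tab #\Newline #\Return))

;; An LHTML element is a list: (name attributes child...), where each
;; child is either a string or another element.
(defun find-elements (tags node)
  "Return the outermost elements at or below NODE whose name is in TAGS."
  (when (consp node)
    (if (member (first node) tags)
        (list node)
        (loop for child in (cddr node)
              append (find-elements tags child)))))

(defun node-text (node)
  "Concatenate all text found at or below NODE."
  (with-output-to-string (out)
    (labels ((walk (n)
               (cond ((stringp n) (write-string n out))
                     ((consp n) (mapc #'walk (cddr n))))))
      (walk node))))

(defun table-rows (lhtml)
  "Return the first table in LHTML as a list of rows of trimmed cell strings."
  (let ((table (first (find-elements '(:table) lhtml))))
    (loop for row in (find-elements '(:tr) table)
          collect (loop for cell in (find-elements '(:td :th) row)
                        collect (string-trim *whitespace* (node-text cell))))))

(defun write-rows (rows stream)
  "Write each row to STREAM as one line, with cells separated by |."
  (dolist (row rows)
    (format stream "~{~A~^|~}~%" row)))

(defun extract-table (url filename)
  "Fetch URL and write the cells of its first table to FILENAME."
  (let* ((html (drakma:http-request url))
         (lhtml (chtml:parse html (chtml:make-lhtml-builder))))
    (with-open-file (out filename :direction :output :if-exists :supersede)
      (write-rows (table-rows lhtml) out))))
```

With this loaded, (extract-table "http://www.mypage.com" "output.txt") would be called exactly as in the question. Searching recursively for :tr elements rather than looking only at the table's direct children means the code works whether or not the parser wraps the rows in an implied TBODY element.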
