How parsing works

2023-01-17 15:18 Asked by:

I am trying the sample code for the piracy report. The line of code:

for incident in soup('td', width="90%"):

searches the soup for a td element with the attribute width="90%", correct? It invokes the __init__ method of the BeautifulStoneSoup class, which eventually invokes SGMLParser.__init__(self).

Am I correct with the class flow above?

The soup looks like this in the report now:

<td class="fabrik_row___jos_fabrik_icc-ccs-piracymap2010___narrations" ><p>22.09.2010: 0236 UTC: Posn: 03:49.9N – 006:54.6E: Off Bonny River: Nigeria.<p/>
<p>About 21 armed pirates in three crafts boarded a pipe layer crane vessel undertow. All crew locked themselves in accommodations. Pirates were able to take one crewmember as hostage. Master called Nigerian naval vessel in vicinity. Later pirates released the crew and left the vessel. All crew safe.<p/></td>

There is no width markup in the text. I changed the line of code that is searching:

for incident in soup('td', class="fabrik_row___jos_fabrik_icc-ccs-piracymap2010___narrations"):

It appears that class is a reserved word, maybe?

How do I get the current example code to run, and has more changed in the application than just the HTML output?

The URL I am using:

urllib2.urlopen("http://www.icc-ccs.org/index.php?option=com_fabrik&view=table&tableid=534&calculations=0&Itemid=82")

There must be a better way....

import urllib2
from BeautifulSoup import BeautifulSoup

page = urllib2.urlopen("http://www.icc-ccs.org/index.php?option=com_fabrik&view=table&tableid=534&calculations=0&Itemid=82")
soup = BeautifulSoup(page)
# class is a Python keyword, so the attribute filter is passed as a dict.
table = soup.find("table", {"class": "fabrikTable"})
list1 = table.findAll('p', limit=50)
# The <p> elements come in groups of four: the time/position line is
# first and the incident narrative is third.
i = 0
while i + 2 < len(list1):
    Itime = list1[i]
    Incident = list1[i + 2]
    print "Time    ", Itime
    print "Incident", Incident
    print " "
    i = i + 4
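
The loop above walks the paragraphs in groups of four and keeps the first and third item of each group. As a rough sketch of the same idea without manual index arithmetic, the snippet below uses list slicing. It is written for Python 3 rather than the Python 2 of this thread, and the placeholder strings are made up; they only stand in for the parsed <p> elements.

```python
# Placeholder strings stand in for the parsed <p> elements (hypothetical data).
paragraphs = [
    "time A", "", "incident A", "",
    "time B", "", "incident B", "",
    "time C", "",  # incomplete trailing group
]

# Every fourth item starting at index 0 is a time line;
# every fourth item starting at index 2 is a narrative.
times = paragraphs[0::4]
incidents = paragraphs[2::4]

# zip stops at the shorter list, so an incomplete final group is
# skipped instead of raising IndexError.
reports = list(zip(times, incidents))

for time_line, incident in reports:
    print("Time    ", time_line)
    print("Incident", incident)
    print(" ")
```

Because zip stops at the shorter of the two slices, a trailing group that lacks a narrative is dropped quietly. Whether that is the right behaviour depends on how the page is laid out.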

class is a reserved word in Python, so it cannot be passed as a keyword argument to that method.
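
To see why the keyword form fails before BeautifulSoup is even involved, here is a small Python 3 sketch. The find_all function is a hypothetical stand-in written for this example, not the library's method. It only shows that the keyword form is rejected by the compiler, while a dictionary key named "class" is accepted.

```python
def find_all(name, attrs=None, **kwargs):
    """Hypothetical stand-in for a search method that takes attribute filters."""
    filters = dict(attrs or {})
    filters.update(kwargs)
    return name, filters


# Writing class=... as a keyword argument is rejected at compile time,
# so the call never reaches the function.
try:
    compile('find_all("td", class="narrations")', "<example>", "eval")
    keyword_form_compiles = True
except SyntaxError:
    keyword_form_compiles = False

# Passing the attribute name as a dictionary key avoids the problem.
via_dict = find_all("td", {"class": "narrations"})

# Unpacking a dictionary also works, because the key is only a string.
via_unpack = find_all("td", **{"class": "narrations"})

print(keyword_form_compiles)
print(via_dict == via_unpack)
```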

This call works, but find returns only the first match rather than a list:

soup.find("tr", { "class" : "fabrik_row___jos_fabrik_icc-ccs-piracymap2010___narrations" })

I also confirmed the class flow for the parse. The example still runs, but the HTML must be searched differently because the width="90%" attribute is no longer in the HTML.
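
As a rough illustration of that flow (not BeautifulSoup's actual code), the toy class below is built on Python 3's standard html.parser module. MiniSoup, its methods, and the sample markup are all invented for this sketch. The parser is set up and the document is parsed once, in the constructor; calling the object afterwards only searches what was already parsed. In BeautifulSoup 3, calling the soup object is likewise a shortcut for findAll.

```python
from html.parser import HTMLParser


class MiniSoup(HTMLParser):
    """Toy document object: parses on construction, searches when called."""

    def __init__(self, markup):
        super().__init__()          # set up the underlying parser first
        self.cells = []             # (attrs, text) for each <td> seen
        self._current_cell = None   # the <td> currently being read, if any
        self.feed(markup)           # parsing happens here, once
        self.close()

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._current_cell = (dict(attrs), [])

    def handle_data(self, data):
        if self._current_cell is not None:
            self._current_cell[1].append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._current_cell is not None:
            attrs, parts = self._current_cell
            self.cells.append((attrs, "".join(parts).strip()))
            self._current_cell = None

    def find_all(self, attrs):
        """Return the text of every <td> whose attributes include attrs."""
        return [text for cell_attrs, text in self.cells
                if all(cell_attrs.get(k) == v for k, v in attrs.items())]

    def find(self, attrs):
        """Return only the first match, or None if nothing matches."""
        matches = self.find_all(attrs)
        return matches[0] if matches else None

    # Calling the object is shorthand for find_all.
    __call__ = find_all


# Made-up markup: two cells carry a class, one still has the old width attribute.
markup = (
    '<table><tr>'
    '<td class="narrations"><p>First report</p></td>'
    '<td width="90%"><p>Old layout</p></td>'
    '<td class="narrations"><p>Second report</p></td>'
    '</tr></table>'
)

soup = MiniSoup(markup)                            # constructor runs the parser
all_reports = soup({"class": "narrations"})        # call syntax returns a list
first_report = soup.find({"class": "narrations"})  # find returns one item

print(all_reports)
print(first_report)
```

The same split explains the behaviour seen in the thread: a filter on width="90%" matches only cells that still carry that attribute, and find gives back a single result where the call form gives a list.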

Still working on the proper methods; will post back when I get it working.
