Python ClientForm Error

2022-12-10 03:37 Q&A author:

import ClientForm
from urllib2 import urlopen

page = urlopen('http://garciainteractive.com/blog/topic_view/topics/content/')
form = ClientForm.ParseResponse(page, backwards_compat=False)
print form[0]

The problem is that ClientForm parses the first HTML form as follows:

<POST http://garciainteractive.com/blog/topic_view/topics/content/ application/x-www-form-urlencoded
  <HiddenControl(ACT=1) (readonly)>
  <HiddenControl(RET=http://garciainteractive.com/blog/topic_view/topics/content/) (readonly)>
  <HiddenControl(URI=/blog/topic_view/topics/content/) (readonly)>
  <HiddenControl(PRV=) (readonly)>
  <HiddenControl(XID=d840927d4eaf95cef7aeca789009fb3991f574da) (readonly)>
  <HiddenControl(entry_id=42) (readonly)>
  <HiddenControl(site_id=1) (readonly)>
  <CheckboxControl(save_info=[yes])>
  <CheckboxControl(notify_me=[yes])>
  <TextControl(captcha=)>
  <SubmitControl(submit=Submit) (readonly)>>

As a result, it does not find the name, email and url inputs. How can I fix this? TIA

Update: I'm actually not using ClientForm on its own but as part of mechanize, so I would prefer a solution that doesn't require rewriting mechanize code.

The problem is likely that the HTML itself is invalid. For example, it reuses id="comment_form" over and over, even though an id is supposed to be unique within a document.
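To confirm this kind of diagnosis on a page, a quick check for repeated id values can be sketched with Python 3's standard-library html.parser. The sample markup is made up to mimic the reported problem (it is not the real page), and IdCounter and duplicate_ids are hypothetical names.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCounter(HTMLParser):
    """Count how often each id attribute value occurs in a document."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        # Self-closing tags such as <input id="x" /> also reach this method,
        # because the default handle_startendtag calls handle_starttag.
        for name, value in attrs:
            if name == "id" and value:
                self.ids[value] += 1


def duplicate_ids(html):
    """Hypothetical helper: return {id: count} for ids used more than once."""
    counter = IdCounter()
    counter.feed(html)
    counter.close()
    return {id_: n for id_, n in counter.ids.items() if n > 1}


# Made-up markup that mimics the reported problem.
sample = """
<form id="comment_form" method="post"><input name="name"></form>
<form id="comment_form" method="post"><input name="email"></form>
<div id="footer"></div>
"""

print(duplicate_ids(sample))  # {'comment_form': 2}
```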

Your best solution would probably be to use BeautifulSoup to parse the page returned by urlopen first, then pretty-print it back into a string for ClientForm. This is likely to smooth out most of the rough edges and give ClientForm a better chance of parsing the form correctly.
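If installing BeautifulSoup is not an option, the same idea (re-parse the page leniently, then serialize it back to a string) can be roughly approximated with Python 3's standard-library html.parser. This is a minimal sketch, not what BeautifulSoup does internally: it only drops stray closing tags, closes elements left open, and discards comments and the doctype. Rebalancer and rebalance are hypothetical names.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}
# Elements whose text content is emitted verbatim (not HTML-escaped).
RAW_TEXT_TAGS = {"script", "style"}


class Rebalancer(HTMLParser):
    """Re-serialize HTML so every non-void element is closed exactly once."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.stack = []  # currently open elements, innermost last

    def _start(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # boolean attribute, e.g. <input checked>
                parts.append(name)
            else:
                parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s>" % " ".join(parts))

    def handle_starttag(self, tag, attrs):
        self._start(tag, attrs)
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # <br/>, <input ... /> and the like: emit without the trailing slash.
        self._start(tag, attrs)
        if tag not in VOID_TAGS:
            self.out.append("</%s>" % tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag with no matching open tag: drop it
        # Close everything opened inside this element, then the element itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.stack and self.stack[-1] in RAW_TEXT_TAGS:
            self.out.append(data)
        else:
            self.out.append(escape(data, quote=False))


def rebalance(html):
    """Hypothetical helper: return html with every open element closed."""
    parser = Rebalancer()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    while parser.stack:  # close whatever was still open at end of input
        parser.out.append("</%s>" % parser.stack.pop())
    return "".join(parser.out)


print(rebalance('<form id="f"><div><input name="name"></span></form>'))
# <form id="f"><div><input name="name"></div></form>
```

The output is balanced, not necessarily valid HTML (for example, it does not auto-close a `<p>` when another `<p>` opens), which is the main gap compared with a real tag-soup parser.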

If this doesn't work, print out the pretty-printed result and work out what transform the HTML needs to make the form simple enough for ClientForm, for example by removing extraneous tags and cruft.
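One possible transform of that kind, sketched here with Python 3's standard-library html.parser: keep only the first form with a given id and, inside it, only the control tags, dropping all layout markup. The choice of which tags count as controls is an assumption, and FormExtractor, extract_form and the sample markup are hypothetical, made up for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Form-control tags to keep (an assumption about what the form parser needs).
CONTROL_TAGS = {"input", "textarea", "select", "option", "button"}
# Controls that have a closing tag (<input> does not).
CLOSABLE_TAGS = CONTROL_TAGS - {"input"}
# Controls whose text content carries a value worth keeping.
TEXT_VALUE_TAGS = {"textarea", "option"}


def _tag(tag, attrs):
    """Serialize a start tag from HTMLParser's (name, value) attribute list."""
    parts = [tag]
    for name, value in attrs:
        parts.append(name if value is None
                     else '%s="%s"' % (name, escape(value, quote=True)))
    return "<%s>" % " ".join(parts)


class FormExtractor(HTMLParser):
    """Keep the first <form> with the given id and only its control tags."""

    def __init__(self, form_id):
        super().__init__(convert_charrefs=True)
        self.form_id = form_id
        self.out = []
        self.inside = False     # currently inside the target form?
        self.done = False       # target form already captured?
        self.keep_text = False  # inside <textarea>/<option>?

    def handle_starttag(self, tag, attrs):
        if self.done:
            return  # ignore later forms that reuse the same id
        if not self.inside:
            if tag == "form" and dict(attrs).get("id") == self.form_id:
                self.inside = True
                self.out.append(_tag(tag, attrs))
        elif tag in CONTROL_TAGS:
            self.out.append(_tag(tag, attrs))
            self.keep_text = tag in TEXT_VALUE_TAGS

    def handle_endtag(self, tag):
        if not self.inside:
            return
        if tag == "form":
            self.out.append("</form>")
            self.inside = False
            self.done = True
        elif tag in CLOSABLE_TAGS:
            self.out.append("</%s>" % tag)
            self.keep_text = False

    def handle_data(self, data):
        if self.inside and self.keep_text:
            self.out.append(escape(data, quote=False))


def extract_form(html, form_id):
    """Hypothetical helper: minimal HTML holding just one form's controls."""
    parser = FormExtractor(form_id)
    parser.feed(html)
    parser.close()
    if parser.inside:  # the form was never closed in the source
        parser.out.append("</form>")
    return "".join(parser.out)


# Made-up markup: layout tags around the controls, plus a second form
# that reuses the same id.
sample = """
<form id="comment_form" action="/post"><div class="row"><p>Name</p>
<input type="text" name="name" value=""></div>
<span><input type="text" name="email"></span>
<textarea name="comment">Great post</textarea>
<input type="submit" name="submit" value="Submit">
</form>
<form id="comment_form"><input name="other"></form>
"""

print(extract_form(sample, "comment_form"))
```

The resulting string could then be handed to ClientForm, e.g. with `ClientForm.ParseFile(StringIO(cleaned), url, backwards_compat=False)`. Since the question uses mechanize, its documentation also shows replacing a response body with `response.set_data(...)` followed by `br.set_response(response)`, which may be a way to feed cleaned HTML to mechanize's form parsing without changing mechanize itself; check this against the mechanize version in use.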

As Richard suggested, use BeautifulSoup:

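from BeautifulSoup import BeautifulSoup, SoupStrainer
from StringIO import StringIO
from urllib2 import urlopen
import ClientForm

url = 'http://garciainteractive.com/blog/topic_view/topics/content/'

html = urlopen(url).read()
# Parse only the <form id="comment_form"> elements, ignoring the rest of the page
forms_filter = SoupStrainer('form', id="comment_form")
soup = BeautifulSoup(html, parseOnlyThese=forms_filter)
# Serialize the cleaned-up markup and give it to ClientForm as a file-like object;
# passing the page URL as base_uri lets relative form actions resolve correctly
forms = ClientForm.ParseFile(StringIO(str(soup)), url, backwards_compat=False)
forms[0]['name'] = 'Kalmi'
forms[0]['email'] = 'kalmi@..com'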
