Why is my code shown as messy while it isn't? [closed]

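from google.appengine.api import urlfetch
from google.appengine.ext import webapp

class sss(webapp.RequestHandler):
    def get(self):
        url = "http://www.google.com/"
        result = urlfetch.fetch(url)
        # Echo the fetched body only if the request succeeded.
        if result.status_code == 200:
            self.response.out.write(result.content)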

When I change the code to this:

if result.status_code == 200:
    self.response.out.write(result.content.decode('utf-8').encode('gb2312'))

The output looks garbled. What should I do?

When I use this:

self.response.out.write(result.content.decode('big5'))

The page looks different from the one I see at Google.com.

How do I get the same page I see at Google.com?


Google is probably serving you ISO-8859-1. At least, that is what they serve me for the User-Agent "AppEngine-Google; (+http://code.google.com/appengine)" (which urlfetch uses). The Content-Type header value is:

text/html; charset=ISO-8859-1

So you would use:

result.content.decode('ISO-8859-1')

If you check result.headers["Content-Type"], your code can adapt to changes on the other end. You can generally pass the charset (ISO-8859-1 in this case) directly to the Python decode method.
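As a rough illustration of adapting to the Content-Type header, the sketch below pulls the charset out of a header value and uses it to decode the body. The helper names charset_from_content_type and decode_body are hypothetical, and the ISO-8859-1 fallback is an assumption based on what Google served above, not something urlfetch does for you. It is written for Python 3's standard library; in a handler you would pass result.headers["Content-Type"] and result.content.

```python
from email.message import Message


def charset_from_content_type(content_type, default="ISO-8859-1"):
    """Return the charset parameter of a Content-Type value, or `default`."""
    if not content_type:
        return default
    # Let the standard library parse the header parameters.
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def decode_body(body, content_type):
    """Decode raw bytes using the charset the server declared."""
    charset = charset_from_content_type(content_type)
    try:
        return body.decode(charset)
    except LookupError:
        # Python does not know this charset name; fall back (assumption).
        return body.decode("ISO-8859-1")


if __name__ == "__main__":
    header = "text/html; charset=ISO-8859-1"
    print(charset_from_content_type(header))
    print(decode_body(b"caf\xe9", header))
```

Because the charset is read from the response each time, the same code keeps working if the server later switches to UTF-8.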


How do I get the same page I see at Google.com?

The page probably uses relative URLs for images, JavaScript, CSS, etc., which you are not rewriting into absolute URLs that point at Google's site. To confirm this, check your logs: they should show 404 ("page not found") errors as the browser, having received only the HTML from you, tries to load the relatively addressed resources that you are not serving.
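One way to rewrite those relative URLs is sketched below, under stated assumptions: the function name absolutize is hypothetical, and the regular expression only handles quoted src and href attributes, so unquoted attributes and url(...) references inside CSS are left untouched. A real HTML parser would be more robust. The code targets Python 3 (on Python 2, urljoin lives in the urlparse module).

```python
import re
from urllib.parse import urljoin

# Matches src="..." or href='...' and captures the prefix, quote and value.
_ATTR = re.compile(r'''(\b(?:src|href)\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)


def absolutize(html, base_url):
    """Rewrite quoted src/href values so they are absolute against base_url."""
    def repl(match):
        prefix, quote, value = match.group(1), match.group(2), match.group(3)
        # urljoin leaves already-absolute URLs unchanged.
        return prefix + quote + urljoin(base_url, value) + quote
    return _ATTR.sub(repl, html)


if __name__ == "__main__":
    page = '<img src="/images/logo.png"><link href=\'style.css\'>'
    print(absolutize(page, "http://www.google.com/"))
```

With this applied to the decoded page before writing the response, the browser would request images and stylesheets from the original site rather than from your app.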
