Why is my code shown as messy while it isn't? [closed]
from google.appengine.api import urlfetch
from google.appengine.ext import webapp

class sss(webapp.RequestHandler):
    def get(self):
        url = "http://www.google.com/"
        result = urlfetch.fetch(url)
        if result.status_code == 200:
            self.response.out.write(result.content)
When I change the code to this:
        if result.status_code == 200:
            self.response.out.write(result.content.decode('utf-8').encode('gb2312'))
It shows something strange. What should I do?
When I use this:
self.response.out.write(result.content.decode('big5'))
The page looks different from the one I see at Google.com.
How can I get the Google.com page that I saw?
Google is probably serving you ISO-8859-1. At least, that is what they serve me for the User-Agent "AppEngine-Google; (+http://code.google.com/appengine)" (which urlfetch uses). The Content-Type header value is:
text/html; charset=ISO-8859-1
So you would use:
result.content.decode('ISO-8859-1')
If you check result.headers["Content-Type"], your code can adapt to changes on the other end. You can generally pass the charset (ISO-8859-1 in this case) directly to the Python decode method.
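As a rough sketch of that idea (written for Python 3 rather than the Python 2 App Engine runtime the question uses), the charset can be read from the Content-Type value and passed to decode. The helper names charset_from_content_type and decode_body are made up for this illustration, and falling back to ISO-8859-1 is an assumption, not something urlfetch does for you:

```python
from email.message import Message


def charset_from_content_type(content_type, default="ISO-8859-1"):
    """Return the charset parameter of a Content-Type value, or `default`."""
    # Hypothetical helper: the standard library's header parser handles
    # quoting and parameter splitting for us.
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def decode_body(body, content_type):
    """Decode raw response bytes using the charset the server declared."""
    charset = charset_from_content_type(content_type)
    try:
        return body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Assumption: fall back to ISO-8859-1, which can decode any byte
        # sequence, when the declared charset is unknown or wrong.
        return body.decode("ISO-8859-1")
```

In the handler from the question this would correspond to something like decode_body(result.content, result.headers["Content-Type"]). The decoded text then has to be re-encoded in whatever charset your own response declares.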
How can I get the Google.com page that I saw?
The page is probably using relative URLs for images, JavaScript, CSS, etc., and you're not rewriting them into absolute URLs that point to Google's site. To confirm this, check your logs: they should show 404 ("page not found") errors as the browser, which you're serving just the HTML, tries to load those relatively addressed resources from your app, which doesn't supply them.
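One possible way to do that rewriting, as a minimal Python 3 sketch: resolve every src and href attribute against the URL the page was fetched from. The function name absolutize_urls is hypothetical, and the regular expression only handles quoted attributes, so treat this as an illustration rather than a robust HTML rewriter (a real HTML parser would be safer):

```python
import re
from urllib.parse import urljoin

# Matches src="..." or href="..." with either quote style (quoted values only).
_URL_ATTR = re.compile(r'(\b(?:src|href)\s*=\s*)(["\'])(.*?)\2', re.IGNORECASE)


def absolutize_urls(html, base_url):
    """Rewrite relative src/href values in `html` as absolute URLs."""
    def fix(match):
        prefix, quote, url = match.groups()
        # urljoin leaves already-absolute URLs unchanged.
        return prefix + quote + urljoin(base_url, url) + quote
    return _URL_ATTR.sub(fix, html)
```

For example, with base_url set to "http://www.google.com/", an attribute such as src="/images/logo.png" (a made-up path) would become src="http://www.google.com/images/logo.png", so the browser requests it from Google instead of from your app.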