How do I get a web page's encoding in Python 3.2.2?
I fetch and parse the page as follows:
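import urllib.request

f = urllib.request.urlopen('http://news.163.com/')
print(f.info())
# Take everything after the first '=' in Content-Type as the charset
contype = f.headers['Content-Type']
pos = contype.find('=')
if -1 != pos:
    contype = contype[pos+1:len(contype)]
print(contype)
# MyHTMLParser is defined elsewhere (not shown)
my = MyHTMLParser()
data = f.read().decode(contype, 'ignore')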
E:\codes\komodoprj>c:\python32\python.exe temp.py
http://news.163.com
Server: nginx
Date: Wed, 21 Dec 2011 01:28:57 GMT
Content-Type: text/html; charset=GBK
Content-Length: 330690
Last-Modified: Wed, 21 Dec 2011 01:28:02 GMT
Vary: Accept-Encoding
Expires: Wed, 21 Dec 2011 01:30:17 GMT
Cache-Control: max-age=80
Via: zw5136
Accept-Ranges: bytes
Age: 41
Powered-By-ChinaCache: HIT from CHN-HZ-8-3P2
Connection: close
GBK
==================================================
http://news.qq.com
Server: nginx/1.0.6
Date: Wed, 21 Dec 2011 01:29:38 GMT
Content-Type: text/html; charset=GB2312
Transfer-Encoding: chunked
Connection: close
Vary: Accept-Encoding
Expires: Wed, 21 Dec 2011 01:44:38 GMT
Cache-Control: max-age=900
Vary: Accept-Encoding
Content-Encoding: gzip
X-Cache: HIT from rainny.qq.com
GB2312
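For reference, here is a minimal sketch of a somewhat more defensive way to pick the charset, using only the standard library. It is one possible approach, not necessarily the right one for every site. The helper names (`charset_from_header`, `detect_charset`, `decode_body`), the 2048-byte sniffing window, and the `utf-8` default are assumptions made for illustration. The sketch reads the `charset` parameter from `Content-Type`, falls back to a `<meta>` tag in the page body, and decompresses the body first when `Content-Encoding` is `gzip`, as in the second response above.

```python
import gzip
import re
from email.message import Message

# Hypothetical pattern: finds charset=... inside a <meta> tag near the top of the page.
META_CHARSET = re.compile(br'<meta[^>]*charset=["\']?([A-Za-z0-9_-]+)', re.IGNORECASE)

def charset_from_header(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg['Content-Type'] = content_type
    return msg.get_content_charset()

def detect_charset(content_type, body, default='utf-8'):
    """Prefer the HTTP header, then a <meta> tag, then an assumed default."""
    charset = charset_from_header(content_type)
    if charset:
        return charset
    match = META_CHARSET.search(body[:2048])  # only sniff the start of the page
    if match:
        return match.group(1).decode('ascii').lower()
    return default

def decode_body(body, content_type, content_encoding=None):
    """Decompress a gzip body if needed, then decode it with the detected charset."""
    if content_encoding == 'gzip':
        body = gzip.decompress(body)
    return body.decode(detect_charset(content_type, body), 'ignore')
```

With a response `f` from `urllib.request.urlopen`, this could be called as `decode_body(f.read(), f.headers.get('Content-Type'), f.headers.get('Content-Encoding'))`. Since `f.headers` is itself an `email.message.Message` subclass, `f.headers.get_content_charset()` should also return the header charset directly.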