[python] Scraped page source prints as "?"
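Pages on the same website behave differently: for some I get the source normally, while for others I either get nothing or get content that shows up as "?" when I print it.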
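One common way for text to come out as "?" (an assumption about this case, not a confirmed diagnosis) is that the text being printed contains characters that the console encoding cannot represent, and the encoder substitutes "?" for them. The function in the post already targets cp936 (GBK), the usual encoding of a Chinese Windows console. The sketch below, written for Python 3, reproduces that substitution; `to_console` is a hypothetical helper, not part of the original code.

```python
def to_console(text: str, console_encoding: str = "cp936") -> bytes:
    """Encode text for a console that uses `console_encoding`.

    With errors="replace", every character the target encoding cannot
    represent becomes b"?", which is how unprintable text can end up
    showing as question marks.
    """
    return text.encode(console_encoding, errors="replace")


if __name__ == "__main__":
    # Chinese characters exist in cp936, so they encode without loss.
    print(to_console("下载失败"))
    # An emoji has no cp936 mapping, so it is replaced by "?".
    print(to_console("ok \U0001F600"))  # b'ok ?'
```

If the page bytes are decoded with the wrong charset in the first place, the damage happens earlier, so it is worth checking which charset the server actually declares.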
Here is the function that fetches the page source:
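```python
# -*- coding: utf-8 -*-
import httplib

def GetMainUrlSource(self, urlTail):
    """
    Fetch the page source.
    urlTail: path of the page on the site
    self.m_main_URL: host name of the site
    """
    Gconn = httplib.HTTPConnection(self.m_main_URL)
    r1 = None
    floopNum = 0
    while floopNum < 10:  # give up after ten failed attempts
        try:
            Gconn.request("GET", urlTail)
            r1 = Gconn.getresponse()
            break
        except Exception:
            # The connection may have dropped: reconnect and retry.
            # Message means "Download failed, retry #%d..", encoded for the cp936 console.
            print unicode("下载失败,重试第%d次..", "utf-8").encode("cp936") % (floopNum + 1)
            Gconn.close()
            Gconn = httplib.HTTPConnection(self.m_main_URL)
            floopNum += 1
    if r1 is None or r1.status != 200:
        return ""
    try:
        # Raw bytes, in whatever charset (and compression) the server used
        return r1.read()
    except Exception:
        return ""
```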
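A sketch of the same fetch in Python 3 (`http.client` is the Python 3 name of `httplib`), assuming the "?" comes from the body being compressed or encoded in a charset other than the one used to display it. Instead of raw bytes it returns decoded text: it undoes gzip if the server sent `Content-Encoding: gzip` and decodes with the charset from the `Content-Type` header, falling back to UTF-8 (an arbitrary default chosen here). `fetch_page` and `decode_body` are hypothetical names, not the original author's code, and pages that declare their charset only in an HTML `<meta>` tag are not handled.

```python
import gzip
import http.client
from email.message import Message
from typing import Optional


def decode_body(raw: bytes, content_type: Optional[str] = None,
                content_encoding: Optional[str] = None,
                default_charset: str = "utf-8") -> str:
    """Turn a response body into text using the response headers."""
    # Some servers compress the body; undo gzip before decoding.
    if (content_encoding or "").strip().lower() == "gzip":
        raw = gzip.decompress(raw)
    # Parse the charset parameter of Content-Type, e.g. "text/html; charset=GBK".
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or default_charset
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:
        # The server named a charset Python does not know.
        return raw.decode(default_charset, errors="replace")


def fetch_page(host: str, path: str, retries: int = 10) -> str:
    """GET http://host/path, retrying on connection errors; "" on failure."""
    for attempt in range(1, retries + 1):
        conn = http.client.HTTPConnection(host, timeout=10)
        try:
            conn.request("GET", path)
            resp = conn.getresponse()
            raw = resp.read()
            if resp.status != 200:
                return ""
            return decode_body(raw, resp.getheader("Content-Type"),
                               resp.getheader("Content-Encoding"))
        except (OSError, http.client.HTTPException) as exc:
            print(f"Download failed ({exc}), retry {attempt}...")
        finally:
            conn.close()
    return ""
```

Usage might look like `html = fetch_page("www.example.com", "/index.html")`. Writing the result to a file opened with `encoding="utf-8"` lets you inspect the page without depending on the console encoding.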