
Python: scraping web page data to build a ranking

2013-10-22 

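The target is the ranking in the middle section of http://ulive.univs.cn/event/event/template/index/265.shtml. The script scrapes each website's name, follows its link to read the vote count shown there, and builds a table that pairs each website with its vote count.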


The requests and BeautifulSoup modules do the work, and the whole run is very fast.
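The link-collection step can also be sketched without third-party packages. This is a minimal sketch that swaps BeautifulSoup for the standard library's html.parser; it assumes each ranking entry is an <a> tag that carries both a title and an href attribute. The class RankingLinkParser, the function extract_links and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin

HOST = "http://ulive.univs.cn"


class RankingLinkParser(HTMLParser):
    """Collect (link text, absolute URL) for every <a> that has both title and href.

    Hypothetical helper: assumes entries look like <a title="..." href="...">name</a>.
    """

    def __init__(self, base: str = HOST) -> None:
        super().__init__()
        self.base = base
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None  # absolute URL of the <a> currently open
        self._text: List[str] = []        # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "title" in attrs and attrs.get("href"):
            # urljoin also copes with hrefs that are already absolute
            self._href = urljoin(self.base, attrs["href"])
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html: str, base: str = HOST) -> List[Tuple[str, str]]:
    """Parse html and return the (name, URL) pairs of all titled links."""
    parser = RankingLinkParser(base)
    parser.feed(html)
    parser.close()
    return parser.links
```

Links without a title attribute are ignored, mirroring the script's findAll(attrs={'title': True}) filter.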




To save the output, just redirect it on the command line:

python QueryVote > 20131021.txt
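A redirect leaves the output encoding to the interpreter and the shell. As an alternative, the script can open the output file itself with an explicit encoding. This is a minimal Python 3 sketch; the function names and the tab-separated layout are hypothetical, and only the YYYYMMDD.txt naming follows the command above.

```python
import time
from typing import Iterable, Tuple


def format_ranking(rows: Iterable[Tuple[str, int]]) -> str:
    """One 'rank<TAB>name<TAB>votes' line per (name, votes) row, in the given order."""
    return "".join(
        f"{rank}\t{name}\t{votes}\n" for rank, (name, votes) in enumerate(rows, start=1)
    )


def default_filename() -> str:
    """Date-stamped name in the same YYYYMMDD.txt pattern as the example command."""
    return time.strftime("%Y%m%d") + ".txt"


def write_ranking(rows: Iterable[Tuple[str, int]], path: str) -> None:
    # An explicit encoding makes the file independent of the console or
    # locale settings that a shell redirect would inherit.
    with open(path, "w", encoding="utf-8") as f:
        f.write(format_ranking(rows))
```

Usage: write_ranking(schools, default_filename()), where schools is an already sorted list of (name, votes) tuples.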


However, this is where you run into an encoding problem.

Exception: 'ascii' codec can't encode characters

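This is a character-set problem. Add these two lines near the top of the file (they work under Python 2, and sys must already be imported):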

reload(sys) 
sys.setdefaultencoding( "utf-8" ) 
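The builtin reload() and sys.setdefaultencoding exist only in Python 2. Under Python 3.7 and later, a comparable fix is to switch the output stream itself to UTF-8, or to set the environment variable PYTHONIOENCODING=utf-8 before running the redirect. This is a minimal sketch; the function names are hypothetical, and the stream is passed in as a parameter so the behaviour can be checked without touching the real sys.stdout.

```python
import io


def ascii_error_message(text: str) -> str:
    """Return the UnicodeEncodeError message raised when text is encoded as ASCII ('' if none)."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)
    return ""


def force_utf8(stream: io.TextIOWrapper) -> None:
    """Python 3.7+: make a text stream such as sys.stdout encode its output as UTF-8."""
    stream.reconfigure(encoding="utf-8")
```

In a script, calling force_utf8(sys.stdout) once before the first print is enough.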

Then add one more line, Schools.sort(key = lambda x: x[1]), and the list is sorted by vote count in ascending order.
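The key function picks the vote count, index 1 of each (name, votes) tuple. This is a minimal sketch that sorts in descending order directly with reverse=True and attaches rank numbers; operator.itemgetter(1) is equivalent to lambda x: x[1]. The function name rank_by_votes and the sample data are hypothetical.

```python
from operator import itemgetter
from typing import List, Tuple


def rank_by_votes(schools: List[Tuple[str, int]]) -> List[Tuple[int, str, int]]:
    """Return (rank, name, votes) rows, highest vote count first."""
    ordered = sorted(schools, key=itemgetter(1), reverse=True)
    return [(rank, name, votes) for rank, (name, votes) in enumerate(ordered, start=1)]
```

Because Python's sort is stable (also with reverse=True), schools with equal vote counts keep the order in which they were scraped.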


The final code is:

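#coding:utf-8
# Python 2 script: scrape the ranking on the event page, follow each entry's
# link to read its vote count, and print the schools from most to fewest votes.
from bs4 import BeautifulSoup
import requests
import sys
import time

# Avoid "'ascii' codec can't encode characters" when the output is redirected.
reload(sys)
sys.setdefaultencoding("utf-8")

Target = 'http://ulive.univs.cn/event/event/template/index/265.shtml'
Host = 'http://ulive.univs.cn'

s = requests.session()
r1 = s.get(Target)
soup = BeautifulSoup(r1.text, "html.parser")

Schools = []
# Ranking entries are the elements with a title attribute; the last two
# matches are skipped (presumably they are not ranking entries).
Ar = soup.findAll(attrs={'title': True})
Ar = Ar[:-2]
for oc in Ar:
    # The link may sit on the element itself or on one of its children.
    link = oc if oc.has_attr('href') else oc.find(href=True)
    if link is None:
        print oc.text + " has no link"
        continue
    newURL = Host + link['href']
    # Assumes the page URL ends in a 6-digit vote id followed by ".shtml".
    voteId = newURL[-12:-6]
    newRequests = s.get(newURL)
    # A requests Response is truthy when its status code is below 400.
    if newRequests:
        newSoup = BeautifulSoup(newRequests.text, "html.parser")
        QueryPiao = newSoup.find("span", id="voteNum265-" + voteId)
        if QueryPiao:
            # int() instead of eval(): the span should hold only the vote count.
            Schools.append((oc.text, int(QueryPiao.text.strip())))
        else:
            print oc.text + " QueryPiao is error"
    else:
        print oc.text + " is error"

# Sort by vote count, ascending.
Schools.sort(key=lambda x: x[1])

print "Current Time is : "
print time.strftime('%Y-%m-%d', time.localtime())

# Walk the ascending list backwards so the highest vote count is printed first.
for t, (name, votes) in enumerate(reversed(Schools), 1):
    print ''
    print t
    print name
    print votes
    print ''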
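Two details of the script are fragile: the slice newURL[-12:-6] only works for 6-digit ids, and the vote text must be purely numeric. This is a minimal Python 3 sketch of more tolerant helpers, assuming page URLs end in "<digits>.shtml" and vote spans use the id pattern voteNum265-<voteId> seen in the script. The function names and the sample URL path are hypothetical.

```python
import re
from typing import Optional

# Assumed URL shape: the page URL ends in "<digits>.shtml".
VOTE_ID_RE = re.compile(r"(\d+)\.shtml$")


def vote_id_from_url(url: str) -> Optional[str]:
    """Return the trailing numeric id of a .shtml URL, or None if there is none."""
    match = VOTE_ID_RE.search(url)
    return match.group(1) if match else None


def vote_span_id(vote_id: str, event_id: str = "265") -> str:
    """Build the id of the <span> holding the vote count, e.g. voteNum265-123456."""
    return "voteNum{}-{}".format(event_id, vote_id)


def parse_votes(text: str) -> Optional[int]:
    """Convert span text to an int without eval(); tolerates spaces and thousands separators."""
    cleaned = text.strip().replace(",", "")
    try:
        return int(cleaned)
    except ValueError:
        return None
```

Returning None instead of raising lets the scraping loop log a bad entry and move on to the next school.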

