Using the BeautifulSoup module

2013-01-09



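# Python 2 code: urllib.urlopen and the print statement are not available in Python 3
import urllib
from bs4 import BeautifulSoup

# Download the page and collect every <a> tag
html_s = urllib.urlopen("http://www.baidu.com").read()
li = BeautifulSoup(html_s).findAll('a')
for i in li:
    print i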

Output:
<a href="http://www.hao123.com">hao123</a>
<a href="http://www.baidu.com/more/">更多>></a>
<a href="/" id="seth" onclick="h(this)" onmousedown="return ns_c({'fm':'behs','tab':'homepage','pos':0})">把百度设为主页</a>
<a href="http://ir.baidu.com" onmousedown="return ns_c({'fm':'behs','tab':'btlink','pos':5})">About Baidu</a>
<a href="/duty/" onmousedown="return ns_c({'fm':'behs','tab':'btlink','pos':6})">使用百度前必读</a>
<a href="http://www.miibeian.gov.cn" onmousedown="return ns_c({'fm':'behs','tab':'btlink','pos':7})" target="_blank">京ICP证030173号</a>

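I want what is inside href="", i.e. just the URL, but I don't know how to get it.
[Solution]
Read the manual first. The function arguments accept regular expressions, so it is not hard to filter out what you want...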
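
A minimal sketch of what this reply seems to describe, assuming bs4's find_all (the newer name for findAll) and a compiled pattern passed as an attribute filter. The sample markup and the names absolute_links and urls are made up for illustration; the markup is trimmed from the output above so the example runs without a network connection.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page
html_s = '''
<a href="http://www.hao123.com">hao123</a>
<a href="http://www.baidu.com/more/">more</a>
<a href="/duty/">terms</a>
<a name="top">no href here</a>
'''

soup = BeautifulSoup(html_s, "html.parser")

# Keep only <a> tags whose href value starts with "http://"
absolute_links = soup.find_all("a", href=re.compile(r"^http://"))

# Read the attribute value off each matching tag
urls = [a["href"] for a in absolute_links]
for url in urls:
    print(url)
```

The pattern only decides which tags are returned; the URL itself still comes from indexing the tag with "href".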
[Solution]
You still have to use regular expressions.
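
For the question as asked, a regular expression may not be strictly necessary, since bs4 tags expose their attributes directly. The following is a sketch under that assumption, not the approach of either reply; the sample markup and the name hrefs are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page
html_s = '''
<a href="http://www.hao123.com">hao123</a>
<a href="/duty/">terms</a>
<a name="top">no href here</a>
'''

soup = BeautifulSoup(html_s, "html.parser")

# href=True matches only tags that actually carry an href attribute
hrefs = [a.get("href") for a in soup.find_all("a", href=True)]
for href in hrefs:
    print(href)
```

Here a.get("href") returns None when the attribute is missing, whereas a["href"] raises KeyError, which is why the href=True filter is applied first.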
