A question about BeautifulSoup

2012-03-23


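BeautifulSoup question
<html>
  <li>
  <span class = "moken08">*****</span>
  <span class = "list">*****</span>
  </li>
  <li>
  <span class = "moken09">*****</span>
  <span class = "list">*****</span>
  </li>
</html>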

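In each loop iteration I want to find the <span> tag whose class starts with "moken". Is there any way to do this?
With soup.find("span",{"class":""}), can the attribute value inside the braces match only the first few letters?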

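[Solution]
1. Learn to read the documentation.

2. When there is no complete documentation, remember that Python is an interactive language with good reflection support. Here is how I found the answer by experimenting: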

Python code

>>> from BeautifulSoup import BeautifulSoup
>>> html = """<html>
...   <li>
...   <span class = "moken08">*****</span>
...   <span class = "list">*****</span>
...   </li>
...   <li>
...   <span class = "moken09">*****</span>
...   <span class = "list">*****</span>
...   </li>
... </html>"""
>>> root = BeautifulSoup(html)
>>> test = root.find("span", attrs = {"class", "moken*"})
Traceback (most recent call last):
  File "<pyshell#47>", line 1, in <module>
    test = root.find("span", attrs = {"class", "moken*"})
  File "/usr/lib/python2.7/dist-packages/BeautifulSoup.py", line 826, in find
    l = self.findAll(name, attrs, recursive, text, 1, **kwargs)
  File "/usr/lib/python2.7/dist-packages/BeautifulSoup.py", line 846, in findAll
    return self._findAll(name, attrs, text, limit, generator, **kwargs)
  File "/usr/lib/python2.7/dist-packages/BeautifulSoup.py", line 362, in _findAll
    found = strainer.search(i)
  File "/usr/lib/python2.7/dist-packages/BeautifulSoup.py", line 963, in search
    found = self.searchTag(markup)
  File "/usr/lib/python2.7/dist-packages/BeautifulSoup.py", line 928, in searchTag
    for attr, matchAgainst in self.attrs.items():
AttributeError: 'set' object has no attribute 'items'
>>> test = root.find("span", attrs = {"class": "moken*"})
>>> len(test)
Traceback (most recent call last):
  File "<pyshell#49>", line 1, in <module>
    len(test)
TypeError: object of type 'NoneType' has no len()
>>> test = root.find("span", attrs = {"class": re.compile("moken*")})
Traceback (most recent call last):
  File "<pyshell#50>", line 1, in <module>
    test = root.find("span", attrs = {"class": re.compile("moken*")})
NameError: name 're' is not defined
>>> import re
>>> test = root.find("span", attrs = {"class": re.compile("moken*")})
>>> len(test)
1
>>> test[0]
Traceback (most recent call last):
  File "<pyshell#54>", line 1, in <module>
    test[0]
  File "/usr/lib/python2.7/dist-packages/BeautifulSoup.py", line 601, in __getitem__
    return self._getAttrMap()[key]
KeyError: 0
>>> test
<span class="moken08">*****</span>
>>> test = root.findAll("span", attrs = {"class": re.compile("moken*")})
>>> test
[<span class="moken08">*****</span>, <span class="moken09">*****</span>]
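For anyone on a current setup, the same idea can be sketched roughly as follows. This assumes the newer bs4 package (beautifulsoup4) on Python 3 rather than the BeautifulSoup 3 module used in the session above; the helper name moken_spans_per_li and the span texts "first"/"second" are made up for illustration. One detail worth noting: as a regular expression, "moken*" means "moke" followed by zero or more "n", and it is searched anywhere in the class value, so the sketch anchors the pattern with "^" to get a real prefix match.

```python
import re

# Assumption: the beautifulsoup4 package (bs4), not the BeautifulSoup 3 module from the thread.
from bs4 import BeautifulSoup

# Same shape as the markup in the question; the span texts are illustrative placeholders.
HTML = """<html>
  <li>
  <span class="moken08">first</span>
  <span class="list">a</span>
  </li>
  <li>
  <span class="moken09">second</span>
  <span class="list">b</span>
  </li>
</html>"""


def moken_spans_per_li(markup, prefix="moken"):
    """For each <li>, return the text of its first <span> whose class starts with `prefix`.

    Hypothetical helper for illustration; returns None for an <li> with no matching span.
    """
    soup = BeautifulSoup(markup, "html.parser")
    # "^" anchors the match at the start of the class value; re.escape guards the prefix.
    pattern = re.compile("^" + re.escape(prefix))
    results = []
    for li in soup.find_all("li"):
        span = li.find("span", class_=pattern)
        results.append(span.get_text() if span is not None else None)
    return results


print(moken_spans_per_li(HTML))
```

Calling find on each li, rather than on the whole document, is what gives one result per loop iteration, which is what the question asked for.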
[Solution]
For this kind of thing, people usually just go with a regular expression.
Python code

Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import re
>>> s = '''<html>
  <li>
  <span class = "moken08">*****</span>
  <span class = "list">*****</span>
  </li>
  <li>
  <span class = "moken09">*****</span>
  <span class = "list">*****</span>
  </li>
</html>'''
>>> res = r'<span class = "moken.*?>.*?<\/span>'
>>> match = re.findall(res,s,re.S)
>>> len(match)
2
>>> match
['<span class = "moken08">*****</span>', '<span class = "moken09">*****</span>']
>>> for m in match:
    print m

<span class = "moken08">*****</span>
<span class = "moken09">*****</span>
>>>
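A slightly more forgiving variant of the regex approach might look like the sketch below, written for Python 3 with only the standard library. The pattern and the name find_moken_spans are illustrative assumptions, not something from the thread: the pattern tolerates any amount of whitespace around "=", and it captures the class value and the inner text separately. Regexes over HTML stay fragile (attribute order, single quotes, nested spans), so this is only reasonable for markup as regular as the sample.

```python
import re

# The sample markup from the thread, unchanged.
HTML = '''<html>
  <li>
  <span class = "moken08">*****</span>
  <span class = "list">*****</span>
  </li>
  <li>
  <span class = "moken09">*****</span>
  <span class = "list">*****</span>
  </li>
</html>'''

# Illustrative pattern: optional whitespace around "=", class value must start with "moken".
# Group 1 is the full class value, group 2 is the text between the tags.
SPAN_RE = re.compile(
    r'<span\s+class\s*=\s*"(moken[^"]*)"\s*>(.*?)</span>',
    re.S,  # let "." match newlines in case the span text spans lines
)


def find_moken_spans(markup):
    """Return a list of (class_value, inner_text) tuples for matching spans."""
    return SPAN_RE.findall(markup)


for class_value, text in find_moken_spans(HTML):
    print(class_value, text)
```

Capturing the groups avoids a second pass to pull the class name or text back out of the matched string.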
