BeautifulSoup question
<html>
<li>
<span class = "moken08">*****</span>
<span class = "list">*****</span>
</li>
<li>
<span class = "moken09">*****</span>
<span class = "list">*****</span>
</li>
</html>
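In each loop iteration I want to find the <span class = "moken**"> tags. Is there a way to do this?
With soup.find("span", {"class": ""}), can the attribute value inside the braces match only the first few letters?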
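[Solution]
1. Learn to read the documentation.
2. When complete documentation isn't available, remember that Python is an interactive language with good reflection support. Here is how I found the answer by experimenting: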
>>> from BeautifulSoup import BeautifulSoup
>>> html = """<html>
... <li>
... <span class = "moken08">*****</span>
... <span class = "list">*****</span>
... </li>
... <li>
... <span class = "moken09">*****</span>
... <span class = "list">*****</span>
... </li>
... </html>"""
>>> root = BeautifulSoup(html)
>>> test = root.find("span", attrs = {"class", "moken*"})
Traceback (most recent call last):
  File "<pyshell#47>", line 1, in <module>
    test = root.find("span", attrs = {"class", "moken*"})
  File "/usr/lib/python2.7/dist-packages/BeautifulSoup.py", line 826, in find
    l = self.findAll(name, attrs, recursive, text, 1, **kwargs)
  File "/usr/lib/python2.7/dist-packages/BeautifulSoup.py", line 846, in findAll
    return self._findAll(name, attrs, text, limit, generator, **kwargs)
  File "/usr/lib/python2.7/dist-packages/BeautifulSoup.py", line 362, in _findAll
    found = strainer.search(i)
  File "/usr/lib/python2.7/dist-packages/BeautifulSoup.py", line 963, in search
    found = self.searchTag(markup)
  File "/usr/lib/python2.7/dist-packages/BeautifulSoup.py", line 928, in searchTag
    for attr, matchAgainst in self.attrs.items():
AttributeError: 'set' object has no attribute 'items'
>>> test = root.find("span", attrs = {"class": "moken*"})
>>> len(test)
Traceback (most recent call last):
  File "<pyshell#49>", line 1, in <module>
    len(test)
TypeError: object of type 'NoneType' has no len()
>>> test = root.find("span", attrs = {"class": re.compile("moken*")})
Traceback (most recent call last):
  File "<pyshell#50>", line 1, in <module>
    test = root.find("span", attrs = {"class": re.compile("moken*")})
NameError: name 're' is not defined
>>> import re
>>> test = root.find("span", attrs = {"class": re.compile("moken*")})
>>> len(test)
1
>>> test[0]
Traceback (most recent call last):
  File "<pyshell#54>", line 1, in <module>
    test[0]
  File "/usr/lib/python2.7/dist-packages/BeautifulSoup.py", line 601, in __getitem__
    return self._getAttrMap()[key]
KeyError: 0
>>> test
<span class="moken08">*****</span>
>>> test = root.findAll("span", attrs = {"class": re.compile("moken*")})
>>> test
[<span class="moken08">*****</span>, <span class="moken09">*****</span>]
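One detail in the session above: re.compile("moken*") does not literally mean "starts with moken". In a regular expression, * repeats the preceding character, so the pattern is "moke" followed by zero or more "n", and because it is not anchored it can also match in the middle of a value. It happens to work on this sample. Anchoring with ^ states the prefix requirement exactly. The check below is a minimal Python 3 sketch using only the standard re module (no BeautifulSoup); the extra values "moke" and "smoken1" are made up purely to show where the two patterns differ.

```python
import re

# Pattern from the session: "moke" followed by zero or more "n", unanchored.
loose = re.compile("moken*")
# Anchored prefix: the value must begin with "moken".
prefix = re.compile(r"^moken")

# Made-up attribute values for illustration only.
values = ["moken08", "moken09", "list", "moke", "smoken1"]

loose_hits = [v for v in values if loose.search(v)]
prefix_hits = [v for v in values if prefix.search(v)]

print(loose_hits)   # ['moken08', 'moken09', 'moke', 'smoken1']
print(prefix_hits)  # ['moken08', 'moken09']
```

Applied to the call from the session, the tighter version would read root.findAll("span", attrs = {"class": re.compile(r"^moken")}).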
[Solution]
For this kind of thing people usually just use a regular expression:
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import re
>>> s = '''<html>
<li>
<span class = "moken08">*****</span>
<span class = "list">*****</span>
</li>
<li>
<span class = "moken09">*****</span>
<span class = "list">*****</span>
</li>
</html>'''
>>> res = r'<span class = "moken.*?>.*?<\/span>'
>>> match = re.findall(res,s,re.S)
>>> len(match)
2
>>> match
['<span class = "moken08">*****</span>', '<span class = "moken09">*****</span>']
>>> for m in match: print m

<span class = "moken08">*****</span>
<span class = "moken09">*****</span>
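The regex above hard-codes the exact spacing class = " from the sample, so a tag written as <span class="moken10"> would not match. As an alternative that neither answer used, the standard library's HTML parser handles attribute syntax for you. The sketch below is Python 3 (html.parser.HTMLParser); the class name PrefixSpanCollector is hypothetical, it treats the class attribute as one string (so class="list moken08" would not count as a prefix match), and it assumes the matched spans hold plain text with no nested <span>.

```python
from html.parser import HTMLParser


class PrefixSpanCollector(HTMLParser):
    """Hypothetical helper: collect (class, text) for <span> tags whose
    class attribute starts with a given prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.matches = []        # list of (class value, text) tuples
        self._open_class = None  # class of the matching span we are inside
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            # attrs is a list of (name, value); value may be None.
            cls = dict(attrs).get("class") or ""
            if cls.startswith(self.prefix):
                self._open_class = cls
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a matching span.
        if self._open_class is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._open_class is not None:
            self.matches.append((self._open_class, "".join(self._text)))
            self._open_class = None


html = """<html>
<li>
<span class = "moken08">*****</span>
<span class = "list">*****</span>
</li>
<li>
<span class = "moken09">*****</span>
<span class = "list">*****</span>
</li>
</html>"""

parser = PrefixSpanCollector("moken")
parser.feed(html)
parser.close()
print(parser.matches)  # [('moken08', '*****'), ('moken09', '*****')]
```

The prefix test is a plain str.startswith, so no regex is involved; spacing around = and the choice of single or double quotes no longer matter.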