from html.parser import HTMLParser


class MyHTMLParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []  # collected href values

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this start tag
        if tag == "a":
            for variable, value in attrs:
                if variable == "href":
                    self.links.append(value)


if __name__ == "__main__":
    html_code = """
    <a href="www.google.com"> google.com</a>
    """
    hp = MyHTMLParser()
    hp.feed(html_code)
    hp.close()
    print(hp.links)
The output is:

    ['www.google.com']
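To see exactly what handle_starttag receives, here is a small sketch (assuming Python 3's html.parser module; the class name AttrDumper and its seen list are hypothetical) that simply records every start tag. Tag and attribute names arrive lowercased, so tag == "a" also matches <A>, and attrs is a list of (name, value) tuples that is empty when the tag has no attributes, so looping over it is safe either way.

```python
from html.parser import HTMLParser


class AttrDumper(HTMLParser):
    """Hypothetical helper: record (tag, attrs) for every start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names are already lowercased by the parser
        self.seen.append((tag, attrs))


if __name__ == "__main__":
    p = AttrDumper()
    p.feed('<A HREF="www.google.com" title=x>google.com</A><a>no href</a>')
    p.close()
    print(p.seen)
    # [('a', [('href', 'www.google.com'), ('title', 'x')]), ('a', [])]
```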
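To extract image links such as

    <img src='http://www.google.com/intl/zh-CN_ALL/images/logo.gif' />

override handle_startendtag(tag, attrs), which the parser calls for XHTML-style self-closing tags like this one (its default implementation simply forwards to handle_starttag() and handle_endtag()).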
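A minimal sketch of that override, assuming Python 3's html.parser module; the class name MyImageParser and its images list are hypothetical, modeled on MyHTMLParser. Because the override does not forward to handle_starttag, a plain <img src="..."> written without the trailing slash is not collected here; it would arrive through handle_starttag instead.

```python
from html.parser import HTMLParser


class MyImageParser(HTMLParser):
    """Hypothetical parser: collect src values of self-closing <img ... /> tags."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_startendtag(self, tag, attrs):
        # Called for XHTML-style empty tags such as <img ... />.
        # Overriding it replaces the default forwarding to
        # handle_starttag()/handle_endtag() for these tags.
        if tag == "img":
            for name, value in attrs:
                if name == "src":
                    self.images.append(value)


if __name__ == "__main__":
    html_code = """
    <img src='http://www.google.com/intl/zh-CN_ALL/images/logo.gif' />
    """
    ip = MyImageParser()
    ip.feed(html_code)
    ip.close()
    print(ip.images)
```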