VB.NET 2008: how to scrape a website
In VB.NET 2008, how do I scrape a website? How do I fetch a page's HTML source, parse the HTML, and pull out the content I need?
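[Solution]
// C#; requires: using System.Net; using System.Text;
WebClient client = new WebClient();
client.Encoding = Encoding.GetEncoding("UTF-8"); // must match the page's actual charset
string html = client.DownloadString(""); // the URL was left blank in the post; pass the page address here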
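Since the question is about VB.NET, the same call might look roughly like this in VB.NET. This is only a minimal sketch: the URL http://example.com/ is a placeholder I made up, and the UTF-8 setting is an assumption about the target page.

```vbnet
Imports System
Imports System.Net
Imports System.Text

Module DownloadExample
    Sub Main()
        ' Placeholder URL (assumption); replace it with the page you want to fetch.
        Dim url As String = "http://example.com/"
        Using client As New WebClient()
            ' Assumes the page is UTF-8; use the page's real charset (for example "gb2312") otherwise.
            client.Encoding = Encoding.GetEncoding("UTF-8")
            Dim html As String = client.DownloadString(url)
            ' Print only the length so the console is not flooded with markup.
            Console.WriteLine(html.Length)
        End Using
    End Sub
End Module
```

DownloadString throws a WebException on network or HTTP errors, so real code would probably wrap the call in Try/Catch.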
[Solution]
Here is a page-fetching function I wrote a long time ago; see if it helps.
' Requires at the top of the file: Imports System.IO and Imports System.Net
' Downloads the page at urlS and returns its HTML, or String.Empty on any failure.
Public Shared Function getHtml(ByVal urlS As String) As String
    If String.IsNullOrEmpty(urlS) Then
        Return String.Empty
    End If
    Dim myRequest As HttpWebRequest = Nothing
    Dim myResponse As HttpWebResponse = Nothing
    Try
        myRequest = CType(WebRequest.Create(urlS), HttpWebRequest)
        myRequest.Timeout = 20000 ' milliseconds
        ' Must be set before GetResponse to have any effect; redirects are not followed.
        myRequest.AllowAutoRedirect = False
        myResponse = CType(myRequest.GetResponse(), HttpWebResponse)
        If myResponse.StatusCode = HttpStatusCode.BadRequest Then
            myResponse.Close()
            myRequest.Abort()
            Return String.Empty
        End If
        ' Encoding.Default is the system ANSI code page. If the text comes back garbled,
        ' pass the page's real charset instead (myResponse.CharacterSet reports the declared one).
        Dim sr As New StreamReader(myResponse.GetResponseStream(), System.Text.Encoding.Default)
        Dim html As String = sr.ReadToEnd().Trim()
        sr.Close()
        myResponse.Close()
        Return html
    Catch
        ' Any failure (bad URL, timeout, HTTP error status) ends up here.
        If Not IsNothing(myRequest) Then
            myRequest.Abort()
        End If
        If Not IsNothing(myResponse) Then
            myResponse.Close()
        End If
        Return String.Empty
    End Try
End Function
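The function above decodes the response with the system default encoding, which can garble pages served in a different charset. One possible refinement, sketched here under assumptions, is to decode with the charset the server declares in HttpWebResponse.CharacterSet. EncodingFor and ReadBody are hypothetical helper names, not part of the original post, and the UTF-8 fallback is my own choice.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module CharsetExample
    ' Hypothetical helper: maps a declared charset name to an Encoding,
    ' falling back to UTF-8 when the name is missing or not recognised.
    Function EncodingFor(ByVal charsetName As String) As Encoding
        If String.IsNullOrEmpty(charsetName) Then
            Return Encoding.UTF8
        End If
        Try
            Return Encoding.GetEncoding(charsetName)
        Catch ex As ArgumentException
            Return Encoding.UTF8
        End Try
    End Function

    ' Hypothetical helper: reads the whole response body using the declared charset.
    Function ReadBody(ByVal response As HttpWebResponse) As String
        Using sr As New StreamReader(response.GetResponseStream(), EncodingFor(response.CharacterSet))
            Return sr.ReadToEnd()
        End Using
    End Function

    Sub Main()
        ' Offline check of the fallback logic; no network access needed.
        Console.WriteLine(EncodingFor("utf-8").WebName)
        Console.WriteLine(EncodingFor("no-such-charset").WebName)
    End Sub
End Module
```

Inside getHtml, ReadBody(myResponse) could stand in for the StreamReader lines. Keep in mind that the declared charset is not always right; some pages state their encoding only in a meta tag.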
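[Solution]
Step 1: Download all the pages to local disk.
Step 2: Analyze the page structure.
Step 3: Use regular expressions to strip out the useless content bit by bit, find the pattern in what is left, and turn the data into a two-dimensional array.
Step 4: Once you have the two-dimensional array, you have everything you need.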
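Steps 3 and 4 above can be sketched roughly as follows. This is an illustration under assumptions, not a general HTML parser: the sample HTML is made up, ExtractLinks is a hypothetical helper name, and the pattern only handles double-quoted href attributes with plain text inside the link.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ExtractExample
    ' Hypothetical helper: returns one row per <a> tag.
    ' Column 0 is the href value, column 1 is the link text.
    Function ExtractLinks(ByVal html As String) As String(,)
        ' Step 3a: strip content we do not care about (here: script blocks).
        Dim cleaned As String = Regex.Replace(html, "<script[^>]*>.*?</script>", "", _
                                              RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        ' Step 3b: match the repeating structure. Simplified pattern (assumption):
        ' double-quoted href, no nested tags inside the link text.
        Dim pattern As String = "<a\s[^>]*href=""([^""]*)""[^>]*>([^<]*)</a>"
        Dim matches As MatchCollection = Regex.Matches(cleaned, pattern, RegexOptions.IgnoreCase)
        ' Step 4: copy the captured groups into a two-dimensional array.
        Dim rows(matches.Count - 1, 1) As String
        For i As Integer = 0 To matches.Count - 1
            rows(i, 0) = matches(i).Groups(1).Value
            rows(i, 1) = matches(i).Groups(2).Value.Trim()
        Next
        Return rows
    End Function

    Sub Main()
        ' Made-up sample standing in for a downloaded page.
        Dim html As String = "<script>var x = 1;</script>" & _
                             "<ul><li><a href=""/a.html"">First</a></li>" & _
                             "<li><a class=""x"" href=""/b.html""> Second </a></li></ul>"
        Dim rows As String(,) = ExtractLinks(html)
        For i As Integer = 0 To rows.GetUpperBound(0)
            Console.WriteLine(rows(i, 0) & " -> " & rows(i, 1))
        Next
    End Sub
End Module
```

Regular expressions work when the page layout is regular and stable, which is what step 2 is meant to establish. For messy or deeply nested markup, a real HTML parser is usually the safer choice.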