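After fetching a web page's source and saving it as an HTML file, why does opening the file show garbled text?
I am using VB.NET 2005 and have found quite a few ways online to fetch a page's source code.
For example:
1) Using the WebBrowser control, with the following code: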
RichTextBox1.Text = WebBrowser1.Document.Body.InnerHtml
(The following saves it in .html format)
Me.SaveFileDialog1.Filter = "Web archive, single file (*.mht)|*.mht|Web page, HTML only (*.htm,*.html)|*.html"
Me.SaveFileDialog1.ShowDialog()
Me.SaveFileDialog1.FileName = Me.WebBrowser1.Document.Title.ToString
Dim path As String = Me.SaveFileDialog1.FileName + ".html"
Dim fs As New StreamWriter(path)
fs.Write(RespHTML)
Why is the file a pile of garbled text when I open it?
2) Using HttpWebRequest from the System.Net namespace, with the following code:
Dim url As String="http://www.163.com/"
Dim httpReq As System.Net.HttpWebRequest
Dim httpResp As System.Net.HttpWebResponse
Dim httpURL As New System.Uri(url)
httpReq = CType(WebRequest.Create(httpURL), HttpWebRequest)
httpReq.Method = "GET"
httpResp = CType(httpReq.GetResponse(), HttpWebResponse)
httpReq.KeepAlive = False
Dim reader As StreamReader =New StreamReader(httpResp.GetResponseStream,System.Text.Encoding.GetEncoding(-0))
Dim respHTML As String = reader.ReadToEnd()
(The following saves it in .html format)
Me.SaveFileDialog1.Filter = "Web archive, single file (*.mht)|*.mht|Web page, HTML only (*.htm,*.html)|*.html"
Me.SaveFileDialog1.ShowDialog()
Me.SaveFileDialog1.FileName = Me.WebBrowser1.Document.Title.ToString
Dim path As String = Me.SaveFileDialog1.FileName + ".html"
Dim fs As New StreamWriter(path)
fs.Write(RespHTML)
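As I understand it, System.Text.Encoding.GetEncoding(-0) is supposed to adjust to the page's encoding automatically, yet the result is still garbled text.
I have tried many approaches by now and they all give the same result. I am really at a loss... Could someone please point out the errors in the code above, or post another way to fetch a page's source? It gets frustrating: even VB.NET 2003 code found online produces errors when used in VB.NET 2005, and the error correction feature in VB.NET 2005 cannot fix them either... I have spent a whole month on this one problem without getting it to work. It is really annoying...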
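[Solution]
Ugh, I'm sleepy.
I'll take a look tomorrow.
Reading Body.InnerHtml from the WebBrowser control only gives you the HTML inside the body element, not the entire page source.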
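
On the garbled output itself, one likely cause (a guess from the posted code, not something confirmed in this thread) is an encoding mismatch when saving. GetEncoding(-0) is the same call as GetEncoding(0), which returns the system default code page; it does not detect the page's encoding. New StreamWriter(path) then writes UTF-8, while the saved HTML may still declare a different charset in its meta tag, so the browser decodes the bytes incorrectly. The posted code also never closes the StreamWriter, so buffered text may not reach the file. Below is a minimal sketch of two ways around this. The module name, the helper names, the output file names, and the "gb2312" charset are all assumptions made for illustration.

```vbnet
Imports System.IO
Imports System.Net
Imports System.Text

Module SavePageSketch

    ' Option A: download the raw bytes and write them back unchanged.
    ' Nothing is decoded, so the charset declared inside the page stays valid.
    Sub SaveRawBytes(ByVal url As String, ByVal path As String)
        Using client As New WebClient()
            Dim data As Byte() = client.DownloadData(url)
            File.WriteAllBytes(path, data)
        End Using
    End Sub

    ' Option B, step 1: decode the response into a String with an explicit encoding.
    Function DownloadText(ByVal url As String, ByVal enc As Encoding) As String
        Dim req As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        req.Method = "GET"
        req.KeepAlive = False ' configure the request before calling GetResponse
        Dim resp As HttpWebResponse = CType(req.GetResponse(), HttpWebResponse)
        Try
            Using reader As New StreamReader(resp.GetResponseStream(), enc)
                Return reader.ReadToEnd()
            End Using
        Finally
            resp.Close()
        End Try
    End Function

    ' Option B, step 2: write the String with the SAME encoding it was read with.
    ' End Using closes the writer, which flushes the buffered text to disk.
    Sub SaveText(ByVal path As String, ByVal html As String, ByVal enc As Encoding)
        Using writer As New StreamWriter(path, False, enc)
            writer.Write(html)
        End Using
    End Sub

    Sub Main()
        Dim url As String = "http://www.163.com/"

        ' Byte-for-byte copy of what the server sent.
        SaveRawBytes(url, "page_raw.html")

        ' ASSUMPTION: the page is served as gb2312; check its meta tag or
        ' Content-Type header and substitute the real charset.
        Dim enc As Encoding = Encoding.GetEncoding("gb2312")
        SaveText("page_text.html", DownloadText(url, enc), enc)
    End Sub

End Module
```

Option A is the simpler choice when the goal is only to keep a copy of the page, since no charset has to be guessed. If the page has to come from the WebBrowser control, WebBrowser1.DocumentText returns the whole document rather than just the body; in that case writing it with Encoding.GetEncoding(WebBrowser1.Document.Encoding) should keep the file consistent with the charset the control detected, though that part is untested here.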