asp数据采集
想做一个网站,而且网站的部分内容是从别的网站抓取数据的,请那个高手给一个实例!
目前我做出来!能给我发的我的邮箱里 吗?谢谢了!zhang_qi_ao@126.com
[解决办法]
time1=timerdim reg,vUrl,VBody,code,time1,time2,titlevUrl=inputbox("请输入有效网址,必须以http://开头","请输入网址","")path=left(Wscript.ScriptFullName,len(Wscript.ScriptFullName)-len(Wscript.ScriptName)) str=""reg="\<meta.+ charset= {0,}([^\"" \>\/]*).+\/{0,1}\>"VBody=GetResStr(vUrl)code=GetCode(VBody,reg)title=GetCode(VBody,"\<title\>(.*)\<\/title\>")time2=timertim=formatnumber((time2-time1)*1000,2)&"MS"str=str&"页面标题:"&title&vbcrlfstr=str&"使用时间:"&tim&vbcrlffname=path&"\"&getname()&".html"WritFile replaceStr(VBody),fname'WritFile VBody,fname '若不清除img标签请使用这个str=str&"文件已经成功保存到"&fnameWScript.echo strFunction WritFile(str,file) '写入文件函数 SavePath=file Set objAso = CreateObject("ADODB.Stream") objAso.Type=2 objAso.Mode = 3 objAso.Open objAso.Position =0 objAso.Writetext str objAso.SaveToFile SavePath,2 objAso.Close set objAso=nothingEnd Function'函数名:GetResStr'作用:获取指定URL的HTML代码'参数:URL-要获取的URLfunction GetResStr(URL)err.cleardim ResBody,ResStr,PageCode,ReturnStrSet Http=createobject("MiCROSOFT.XMLHTTP") Http.open "GET",URL,False Http.Send() If Http.Readystate =4 Then If Http.status=200 Then ResStr=http.responseText ResBody=http.responseBody PageCode=GetCode(ResStr,reg) ReturnStr=BytesToBstr(http.responseBody,PageCode) GetResStr=ReturnStr End If End If End Function'函数名:BytesToBstr'作用:转换二进制数据为字符'参数:Body-二进制数据,Cset-文本编码方式Function BytesToBstr(Body,Cset) Dim Objstream Set Objstream = CreateObject("adodb.stream") objstream.Type = 1 objstream.Mode =3 objstream.Open objstream.Write body objstream.Position = 0 objstream.Type = 2 objstream.Charset =Cset BytesToBstr = objstream.ReadText objstream.Close set objstream = nothing End Function '函数名:GetCode'作用:转换二进制为字符'参数:str-待查询字符串,regstr-正则表达式Function GetCode(str,regstr)Dim Reg,serStrset Reg= new RegExpReg.IgnoreCase = TrueReg.MultiLine = TrueReg.Pattern =regstrif Reg.test(str) then '若查询到匹配项 Set Cols = Reg.Execute(str) serStr=Cols(0).SubMatches(0) '使用匹配到的第一个匹配项else '否则给个默认值gb2312,有点省懒法,如果页面没给出编码格式,想知道确实有点麻烦 serStr="gb2312"end ifGetCode=serStrend function'函数名:getname'作用:获得随机文件名'参数:无function getname() dim y,m,d,h,mm,s,r Randomize() y=year(now) m=month(now):if m<10 then m="0"&m d=day(now):if d<10 then d="0"&d h=hour(now):if h<10 then h="0"&h mm=minute(now):if mm<10 then mm="0"&mm s=second(now):if s<10 then s="0"&s r=cint(rnd()*10):if r<10 then r="0"&r getname=y&m&d&h&mm&s&rend function'函数名:replaceStr'作用:替换指定字符'参数:strcontent-待替换字符串Function replaceStr(strcontent) dim re Set re=new RegExp re.IgnoreCase =true re.Global=True re.pattern="\<img[^\<\>\/].*(\/|)\>" '祛除所有img标签 strcontent=re.replace(strcontent,"") set re=Nothing replaceStr=strcontent End Function