
How to process this kind of web page source with regular expressions

2012-01-19 

How can I process this kind of web page source with a regular expression?
VB6. Two text boxes and one command button.
The web page source in Text1.Text:

计算:分数 </DIV> </DIV>"> 
</form> 
<div class="content"> 
<div class="pic"> <img src="http://bbs.sina.com/and/di.djd?iid=12345566666666666" alt="英语" width="131" height="176" > </div> 
<div class="info"> 
<p class="title">英语 <a href="http://bbs.sa.com/and/drr=1234556retertretr6666666" onclick="asdsfdsffddffdf;"> <img src=images2009/buyebook.gif style='vertical-align:middle;margin-left:15px'> </a> <a href="http://b.a.com/ad"> <img src=http://b.a.com/ad.ja style='vertical-align:middle;margin-left:15px'> </a> </p> 
<p class="st">计算:分数 <span>| </span>听力:98 <span>| </span>写作:99 

In the end I want Text2.Text to show:
英语|计算:分数|听力:98|写作:99 


I am looking for the exact regular expression.

For part of the source code, see
http://topic.csdn.net/u/20100307/05/c0dcf6b9-4848-49d1-96c7-e9573b9b77ae.html


[Solution]
It really just comes down to wrapping the part you want to extract in (), so that it is exposed through SubMatches.

VB code
' Requires a project reference to "Microsoft VBScript Regular Expressions 5.5".
Private Sub Form_Load()
    MsgBox (RegExpTest2(">([^<]+)<", Text1.Text))   ' the content between > and <
End Sub

Function RegExpTest2(patrn, strng)
  Dim regEx, Match As Match, Matches      ' Create variable.
  Dim RetStr As String, strSubValue
  Set regEx = New RegExp         ' Create a regular expression.
  regEx.Pattern = patrn         ' Set pattern.
  regEx.IgnoreCase = True         ' Set case insensitivity.
  regEx.Global = True         ' Set global applicability.
  Set Matches = regEx.Execute(strng)   ' Execute search.
  For Each Match In Matches      ' Iterate Matches collection.
    strSubValue = Trim$(Match.SubMatches(0))    ' strip leading and trailing spaces
    strSubValue = Replace$(strSubValue, vbCrLf, "")   ' remove carriage return / line feed pairs
    If (Len(strSubValue) > 0) Then  ' there is real content
'        RetStr = RetStr & "Match found at position "
'        RetStr = RetStr & Match.FirstIndex & ". Value is '"
'        RetStr = RetStr & strSubValue & "'." & vbCrLf
        RetStr = RetStr & strSubValue
    End If
  Next
  RegExpTest2 = RetStr
End Function
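A variation on the same SubMatches idea, offered only as a rough sketch: it joins the captured text runs with "|" to get closer to the format asked for in the question. ExtractTextRuns is a hypothetical helper name, not something from the thread. The sketch assumes the page's own "|" separators sit in their own tags, as in the posted fragment, and it uses late binding so no project reference is needed.

```vb
' Sketch only: ExtractTextRuns is a hypothetical helper, not code from the thread.
Function ExtractTextRuns(ByVal html As String) As String
    Dim re As Object, m As Object
    Dim piece As String, result As String

    Set re = CreateObject("VBScript.RegExp")   ' late binding
    re.Global = True
    re.Pattern = ">([^<]+)<"                   ' group 1 = text between ">" and "<"

    For Each m In re.Execute(html)
        piece = m.SubMatches(0)
        piece = Replace$(piece, vbCr, "")      ' drop line breaks before trimming
        piece = Replace$(piece, vbLf, "")
        piece = Trim$(piece)
        ' Skip empty runs and the page's own "|" separators.
        If Len(piece) > 0 And piece <> "|" Then
            If Len(result) > 0 Then result = result & "|"
            result = result & piece
        End If
    Next
    ExtractTextRuns = result
End Function
```

It could be called from the button handler, for example Text2.Text = ExtractTextRuns(Text1.Text) inside Command1_Click. Any other text that happens to sit between tags on the real page would be picked up as well, so the pattern may still need narrowing.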
[Solution]
Try this:
VB code
Private Sub Form_Load()
    Dim s As String, ss As String
    s = "<dslkjfljfjfk><aklsdjljfdjfoiuofiufoiuofiuofiuofiuofuofuoi??>sdk<dkjofjoifuoiufoiufoufoiuofuofiu><jdoiufoiufouffoiufoiu>地方军阀三剑客《<dlkfjjfsjfl>"
    Do
        ss = CutString(1, s, "<", ">")      ' text inside the first <...> pair
        s = Replace(s, ss, "")              ' remove that text everywhere it occurs
        s = Replace(s, "<>", "", , 1)       ' remove the first emptied "<>"
    Loop While CutString(1, s, "<", ">") <> ""
    Debug.Print s
End Sub

' Text-cutting function: returns the text between LeftString and RightString
Public Function CutString(StartNum As Long, InPutString As String, LeftString As String, RightString As String)
    On Error Resume Next
    Dim StrLine As Long, StrLine2 As Long
    StrLine = InStr(StartNum, InPutString, LeftString) + Len(LeftString)
    StrLine2 = InStr(StrLine, InPutString, RightString)
    CutString = Mid(InPutString, StrLine, StrLine2 - StrLine)
End Function
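Since the question asks for a regular expression, the same tag stripping can also be sketched with RegExp.Replace instead of the InStr/Mid loop. This is an illustrative alternative, not the poster's method; StripTags is a hypothetical name, and the sketch assumes no attribute value contains a ">" character.

```vb
' Sketch only: StripTags is a hypothetical helper, not code from the thread.
Function StripTags(ByVal html As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")   ' late binding
    re.Global = True                           ' replace every tag, not just the first
    re.Pattern = "<[^>]*>"                     ' "<", any run of non-">" characters, ">"
    StripTags = re.Replace(html, "")
End Function
```

Unlike the loop above, this removes only the tags themselves; text outside the tags is left alone even when it happens to equal a tag's contents.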
[Solution]
Quoted post:
The source above did not display correctly; use the one below.
HTML code
计算:分数</DIV></DIV>">
</form>
<div class="content">
<div class="ttt"><img src="http://ddddd.dddd.com/ddddd/ddddd.dll?ddd=D98313330373ttt63162626ttt6160tttt659" alt="英语" width="131" height="176"></div>
<div class="ttt">
<p class="title">英语<a href="http://ttttt.tttttttt.com/ttttinfo.tttt?tttttttttttt" onclick="tttVitttnfo(t);"><ttt src=tttttttttt/butttttt.gif style='vertical-align:middle;margin-left:15px'></a><a href="http://tttt.tttttttt.com/ttttttttpingtttt.aspx?tttt=tttttttt"><ttt src=tttttttttt/ttttt_ttttt_-if.gif style='vertical-align:middle;margin-left:15px'></a></p>
<p class="tt">计算:分数<span>|</span>>听力:98<span>|</span>>写作:99<



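[Solution]
If you stick with your original approach, you do not actually need to replace those tags at all; you are drifting further and further from the goal.
The incomplete replacement above may simply be because the fragment you extracted is not well-formed; with a complete page it might work fine.
Try the code below; it should do what you want.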

VB code
Private Sub Form_Load()
    Dim strData$, strResult$
    Dim reg As Object
    Dim matchs As Object, match As Object
    Set reg = CreateObject("vbscript.regexp")

    strData = "网页代码"    ' placeholder ("web page source"): put the page's HTML here
    reg.Global = True
    reg.IgnoreCase = True
    reg.Pattern = "[\u4e00-\u9fa5:]+\d*"    ' a run of Chinese characters or colons, then optional digits

    Set matchs = reg.Execute(strData)

    For Each match In matchs
        ' Append each match once, followed by "|"
        If InStr(strResult, match & "|") = 0 Then strResult = strResult & match & "|"
    Next
    ' Drop the trailing "|"
    If strResult <> "" Then strResult = Left(strResult, Len(strResult) - 1)

    MsgBox strResult
End Sub
