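How can I process web page source like the following with a regular expression?
VB6. Two text boxes and one command button.
The page source held in Text1.Text: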
计算:分数 </DIV> </DIV>">
</form>
<div class="content">
<div class="pic"> <img src="http://bbs.sina.com/and/di.djd?iid=12345566666666666" alt="英语" width="131" height="176" > </div>
<div class="info">
<p class="title">英语 <a href="http://bbs.sa.com/and/drr=1234556retertretr6666666" onclick="asdsfdsffddffdf;"> <img src=images2009/buyebook.gif style='vertical-align:middle;margin-left:15px'> </a> <a href="http://b.a.com/ad"> <img src=http://b.a.com/ad.ja style='vertical-align:middle;margin-left:15px'> </a> </p>
<p class="st">计算:分数 <span>| </span>听力:98 <span>| </span>写作:99
What I ultimately want to display in Text2.Text is:
英语|计算:分数|听力:98|写作:99
I am looking for a regular expression that does this accurately.
For part of the source code, see
http://topic.csdn.net/u/20100307/05/c0dcf6b9-4848-49d1-96c7-e9573b9b77ae.html
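[Solution]
It is really just a matter of adding () to mark the part you want to extract, which then becomes available through SubMatches.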
' Needs a project reference to "Microsoft VBScript Regular Expressions"
' because of the early-bound RegExp and Match types.
Private Sub Form_Load()
    MsgBox (RegExpTest2(">([^<]+)<", Text1.Text)) ' the text between > and <
End Sub

Function RegExpTest2(patrn, strng)
    Dim regEx, Match As Match, Matches ' Create variable.
    Dim RetStr As String, strSubValue
    Set regEx = New RegExp ' Create a regular expression.
    regEx.Pattern = patrn ' Set pattern.
    regEx.IgnoreCase = True ' Set case insensitivity.
    regEx.Global = True ' Set global applicability.
    Set Matches = regEx.Execute(strng) ' Execute search.
    For Each Match In Matches ' Iterate Matches collection.
        strSubValue = Trim$(Match.SubMatches(0)) ' strip leading/trailing spaces
        strSubValue = Replace$(strSubValue, vbCrLf, "") ' strip CR/LF pairs
        If (Len(strSubValue) > 0) Then ' there is usable content
            ' RetStr = RetStr & "Match found at position "
            ' RetStr = RetStr & Match.FirstIndex & ". Value is '"
            ' RetStr = RetStr & strSubValue & "'." & vbCrLf
            RetStr = RetStr & strSubValue
        End If
    Next
    RegExpTest2 = RetStr
End Function
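The captured pieces above are concatenated with no separator, and the trailing `<` in the pattern means a final field with no tag after it is missed. A minimal sketch of one way to get closer to the requested `a|b|c` output follows. The name `ExtractFields`, the `<p class="title">` start marker, and the `Command1` / `Text2` wiring are assumptions for illustration, not part of the original answer. It uses late binding, so no project reference is required.

```vb
' Sketch only: ExtractFields and the start marker are illustrative assumptions.
Public Function ExtractFields(ByVal html As String) As String
    Dim re As Object, m As Object
    Dim startPos As Long
    Dim piece As String, result As String

    ' Skip everything before the title paragraph, if that marker is present.
    startPos = InStr(1, html, "<p class=""title"">", vbTextCompare)
    If startPos > 0 Then html = Mid$(html, startPos)

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.Pattern = ">([^<>]+)" ' text that follows a tag, up to the next tag or the end

    For Each m In re.Execute(html)
        piece = m.SubMatches(0)
        piece = Replace$(piece, vbCr, "")
        piece = Replace$(piece, vbLf, "")
        piece = Trim$(piece) ' Trim$ removes spaces only, not tabs
        ' Drop empty pieces and the page's own "|" separators.
        If Len(piece) > 0 And piece <> "|" Then
            If Len(result) > 0 Then result = result & "|"
            result = result & piece
        End If
    Next

    ExtractFields = result
End Function

Private Sub Command1_Click()
    Text2.Text = ExtractFields(Text1.Text)
End Sub
```

If the real page has more text after the score line, it would also be appended, so the input may need to be cut at an end marker as well.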
[Solution]
Try this:
Private Sub Form_Load()
    Dim s As String, ss As String
    s = "<dslkjfljfjfk><aklsdjljfdjfoiuofiufoiuofiuofiuofiuofuofuoi??>sdk<dkjofjoifuoiufoiufoufoiuofuofiu><jdoiufoiufouffoiufoiu>地方军阀三剑客《<dlkfjjfsjfl>"
    Do
        ss = CutString(1, s, "<", ">") ' contents of the first <...> pair
        s = Replace(s, ss, "") ' remove those contents
        s = Replace(s, "<>", "", , 1) ' remove the now-empty tag
    Loop While CutString(1, s, "<", ">") <> ""
    Debug.Print s
End Sub

' Text-extraction function: returns the text between LeftString and RightString
Public Function CutString(StartNum As Long, InPutString As String, LeftString As String, RightString As String)
    On Error Resume Next
    Dim StrLine As Long, StrLine2 As Long
    StrLine = InStr(StartNum, InPutString, LeftString) + Len(LeftString)
    StrLine2 = InStr(StrLine, InPutString, RightString)
    CutString = Mid(InPutString, StrLine, StrLine2 - StrLine)
End Function
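For comparison, the same tag stripping can probably be done in one pass with a regular-expression replace. This is a sketch rather than a drop-in replacement, and the function name `StripTags` is made up for illustration. Unlike the loop above, it does not delete text outside a tag that happens to equal a tag's contents.

```vb
' Sketch only: StripTags is a hypothetical helper name.
Public Function StripTags(ByVal s As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.Pattern = "<[^>]*>" ' a "<", any run of non-">" characters, then ">"
    StripTags = re.Replace(s, "")
End Function
```

Calling `Debug.Print StripTags(s)` with the test string from the answer above should leave only the text that sits outside the angle brackets.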
[Solution]
Private Sub Form_Load()
    Dim strData$, strResult$
    Dim reg As Object
    Dim matchs As Object, match As Object
    Set reg = CreateObject("vbscript.regexp")
    strData = "网页代码" ' placeholder: put the page source here
    reg.Global = True
    reg.IgnoreCase = True
    reg.Pattern = "[\u4e00-\u9fa5:]+\d*" ' Chinese characters and colons, then optional digits
    Set matchs = reg.Execute(strData)
    For Each match In matchs
        ' Append each match once, followed by "|"
        If InStr(strResult, match & "|") = 0 Then strResult = strResult & match & "|"
    Next
    ' Drop the trailing "|"
    If strResult <> "" Then strResult = Left(strResult, Len(strResult) - 1)
    MsgBox strResult
End Sub
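One thing to watch with the pattern above: because `:` is inside the character class, a bare colon in a URL (`http://`) or in a style attribute (`margin-left:15px`) also matches. A hedged variant is sketched below. It requires each match to start with a Chinese character and compares whole fields when de-duplicating. The name `ExtractLabels` is an assumption, and the sketch assumes the page uses an ASCII colon as in the sample.

```vb
' Sketch only: ExtractLabels is a hypothetical helper name.
Public Function ExtractLabels(ByVal html As String) As String
    Dim re As Object, m As Object
    Dim result As String

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    ' One or more Chinese characters, optionally followed by ":" and a value
    ' made of Chinese characters or digits.
    re.Pattern = "[\u4e00-\u9fa5]+(?::[\u4e00-\u9fa5\d]+)?"

    For Each m In re.Execute(html)
        ' Compare whole "|"-delimited fields so a short match is not
        ' mistaken for the tail of a longer one.
        If InStr(1, "|" & result & "|", "|" & m.Value & "|", vbBinaryCompare) = 0 Then
            If Len(result) > 0 Then result = result & "|"
            result = result & m.Value
        End If
    Next

    ExtractLabels = result
End Function
```

Fields come out in order of first appearance, so any Chinese text earlier in the page (for example the stray fragment at the top of the sample) would appear before the title unless the input is trimmed first.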