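Reading a table with a regex
I have read an HTML file into a variable.
How can I extract the information from it with a regex or some other approach? The output should look like this: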
id image
1 t3.jpg
2 t4.jpg
The HTML file is as follows:
<table border="2" width=100% > <th><p align="center">Id</p></th> <th><p align="center">image</p></th> <tr> <td>1</td> <td><a href="i3.jpg" target="_blank"><img src = "t3.jpg"></a></td></tr> <tr> <td>2</td> <td><a href="i3.jpg" target="_blank"><img src = "t4.jpg"></a></td></tr></table>
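// LINQPad-style script; elsewhere, wrap it in a class and import System and System.Text.RegularExpressions.
void Main()
{
    var html = @"<table border=""2"" width=100% > <th><p align=""center"">Id</p></th> <th><p align=""center"">image</p></th> <tr> <td>1</td> <td><a href=""i3.jpg"" target=""_blank""><img src = ""t3.jpg""></a></td></tr> <tr> <td>2</td> <td><a href=""i3.jpg"" target=""_blank""><img src = ""t4.jpg""></a></td></tr></table>";

    var i = 0;
    // The lookbehind accepts text that directly follows '>' (cell text) or '<img src = "' (image name);
    // the lookahead requires the match to end just before '<' or '"'.
    foreach (Match m in Regex.Matches(html, @"(?i)(?<=(?:>|<img\ssrc\s*=\s*""))[^<>\s]+(?=<|"")"))
    {
        i++;
        Console.Write("{0}\t", m.Value);
        // The table has two columns, so break the line after every second value.
        if (i % 2 == 0)
            Console.WriteLine();
    }

    /* Output:
    Id	image
    1	t3.jpg
    2	t4.jpg
    */
}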
[Solution]
string str = @"<table border=""2"" width=100% > <th><p align=""center"">Id</p></th> <th><p align=""center"">image</p></th> <tr> <td>1</td> <td><a href=""i3.jpg"" target=""_blank""><img src = ""t3.jpg""></a></td></tr> <tr> <td>2</td> <td><a href=""i3.jpg"" target=""_blank""><img src = ""t4.jpg""></a></td></tr></table>";

// One match per data row: "num" captures the Id cell, "url" captures the quoted img src value.
Regex reg = new Regex("(?is)<tr[^>]*?>.*?<td>(?<num>\\d+)</td>.*?<td><a[^>]*?><img[^>]*?\"(?<url>.*?)\"></a></td>.*?</tr>");
foreach (Match item in reg.Matches(str))
{
    Response.Write(string.Format("num:{0},url:{1}<hr/>", item.Groups["num"].Value, item.Groups["url"].Value));
}

Response.Write("--------------------------Column names below-----------------------------<br/>");

// One match per header cell: "column" captures the text inside <th><p ...>...</p></th>.
foreach (Match item in Regex.Matches(str, "(?is)<th><p[^>]*?>(?<column>.*?)</p></th>"))
{
    Response.Write(string.Format("column:{0}<hr/>", item.Groups["column"].Value));
}
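For a console program that prints exactly the layout asked for in the question, the two answers above can be combined roughly as follows. This is only a sketch: the class name TableReader, the method ExtractRows and the group names id and src are made up for illustration, and the pattern assumes every data row has a numeric first cell followed by an img tag with a double-quoted src attribute, as in the sample HTML.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TableReader
{
    // Hypothetical pattern: one match per <tr> whose first cell is a number.
    // "id" captures that number, "src" captures the quoted src of the first <img> after it.
    private static readonly Regex RowPattern = new Regex(
        @"<tr[^>]*>\s*<td[^>]*>\s*(?<id>\d+)\s*</td>.*?<img[^>]*?\bsrc\s*=\s*""(?<src>[^""]*)""",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns (id, image) pairs in document order.
    public static List<KeyValuePair<string, string>> ExtractRows(string html)
    {
        var rows = new List<KeyValuePair<string, string>>();
        foreach (Match m in RowPattern.Matches(html))
        {
            rows.Add(new KeyValuePair<string, string>(
                m.Groups["id"].Value, m.Groups["src"].Value));
        }
        return rows;
    }

    public static void Main()
    {
        var html = @"<table border=""2"" width=100% > <th><p align=""center"">Id</p></th> <th><p align=""center"">image</p></th> <tr> <td>1</td> <td><a href=""i3.jpg"" target=""_blank""><img src = ""t3.jpg""></a></td></tr> <tr> <td>2</td> <td><a href=""i3.jpg"" target=""_blank""><img src = ""t4.jpg""></a></td></tr></table>";

        // Header line is written by hand here rather than parsed from the <th> cells.
        Console.WriteLine("id\timage");
        foreach (var row in ExtractRows(html))
        {
            Console.WriteLine("{0}\t{1}", row.Key, row.Value);
        }
    }
}
```

With the sample HTML this prints the header line followed by "1 t3.jpg" and "2 t4.jpg", tab-separated. Because the pattern uses a lazy .*? in Singleline mode, a row without an img tag would pair its id with the image of a later row, so for less regular markup a real HTML parser is probably the safer choice.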