
Scraping HTML tables in .NET and taking care of colspans

I am trying to scrape HTML tables in my .NET application; however, I came across tables that make heavy use of the colspan and rowspan attributes on cells, which is giving me a headache. Is there a library that can convert a table into an array of strings and take care of colspan? For example, if a TD element has colspan=5, its value should fill the next 5 elements. Given this table:

<table>
<tr>
 <td>1</td>
 <td>2</td>
 <td>3</td>
 <td>4</td>
 <td>5</td>
</tr>
<tr>
 <td colspan=4>1</td>
 <td>2</td>
</tr>
</table>

the output would be the following arrays:

[1,2,3,4,5] [1,1,1,1,2]
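A minimal sketch of the usual grid-filling approach, written in Python with only the standard library `html.parser` so it runs as-is; the logic ports directly to C#. It remembers which (row, column) slots are already occupied, skips them, and copies each cell's text into colspan × rowspan slots, so rowspan is covered as well as colspan. The names `TableGridParser`, `expand_table` and `scrape_table` are hypothetical, and the sketch assumes a single, non-nested table whose `<tr>`/`<td>` tags are explicitly closed.

```python
from html.parser import HTMLParser


class TableGridParser(HTMLParser):
    """Collects the cells of one <table> as (text, colspan, rowspan) tuples per row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of (text, colspan, rowspan) per <tr>
        self._cell = None  # [text_parts, colspan, rowspan] while inside <td>/<th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            a = dict(attrs)
            # Missing or valueless spans default to 1; 0 is clamped to 1 in this sketch.
            colspan = max(1, int(a.get("colspan") or 1))
            rowspan = max(1, int(a.get("rowspan") or 1))
            self._cell = [[], colspan, rowspan]

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            text = "".join(self._cell[0]).strip()
            self.rows[-1].append((text, self._cell[1], self._cell[2]))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)


def expand_table(rows):
    """Returns a rectangular list of rows in which spanned cells repeat their text."""
    grid = {}  # (row, col) -> text of the cell covering that slot
    for r, cells in enumerate(rows):
        c = 0
        for text, colspan, rowspan in cells:
            while (r, c) in grid:  # skip slots taken by a rowspan from an earlier row
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    height = len(rows)  # rowspans reaching past the last <tr> are clipped
    width = max((col for row, col in grid if row < height), default=-1) + 1
    return [[grid.get((r, c), "") for c in range(width)] for r in range(height)]


def scrape_table(html):
    parser = TableGridParser()
    parser.feed(html)
    parser.close()
    return expand_table(parser.rows)
```

Fed the table from the question (unquoted `colspan=4` is accepted by `html.parser`), `scrape_table` returns `[['1', '2', '3', '4', '5'], ['1', '1', '1', '1', '2']]`. A cell with `rowspan=2` is repeated downward into the next row in the same way, and later cells in that row shift right past it.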


You may be able to use ParseControl, which would make the whole thing fairly trivial, since you could then read each cell's ColSpan property.


You could load it into an XmlDocument and then loop through it. I'm not sure that's the best solution, but it works. Maybe LINQ to XML?
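The XML route can be sketched with Python's standard `xml.etree.ElementTree` standing in for .NET's XmlDocument or LINQ to XML; this is an illustrative substitute, not the .NET API, and `rows_from_xml_table` is a hypothetical name. One catch: the question's `<td colspan=4>` is valid HTML but not well-formed XML, because the attribute value is unquoted, so an XML parser rejects it. The sketch assumes the markup has already been made well-formed (here, `colspan="4"`), and it handles colspan only, not rowspan.

```python
import xml.etree.ElementTree as ET


def rows_from_xml_table(xml_text):
    """Parses a well-formed <table> as XML and repeats each cell's text colspan times.

    rowspan is deliberately ignored in this sketch.
    """
    table = ET.fromstring(xml_text)  # raises ET.ParseError on markup that is not well-formed XML
    rows = []
    for tr in table.iter("tr"):  # also finds rows nested under <thead>/<tbody>
        row = []
        for cell in tr:
            if cell.tag not in ("td", "th"):
                continue
            text = "".join(cell.itertext()).strip()
            span = max(1, int(cell.get("colspan", "1")))
            row.extend([text] * span)
        rows.append(row)
    return rows
```

With `colspan="4"` quoted, this returns the same two rows as in the question; the unquoted original raises `ET.ParseError`. Real-world HTML (unclosed tags, entities such as `&nbsp;`) tends to fail the same way, which is why a tolerant HTML parser is usually the safer front end.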
