Scraping HTML tables in .NET and handling colspans
I am trying to scrape HTML tables in my .NET application, but I have run into tables that make heavy use of the colspan and rowspan attributes on cells, which is giving me a headache. Is there a library that can convert a table into an array of strings while taking colspan into account? For example, if a TD element has colspan=5, its value would be repeated for the next 5 elements. Given this table:
<table>
<tr>
 <td>1</td>
 <td>2</td>
 <td>3</td>
 <td>4</td>
 <td>5</td>
</tr>
<tr>
  <td colspan=4>1</td>
  <td>2</td>
</tr></table>
the output would be the following arrays:
[1,2,3,4,5] [1,1,1,1,2]
You may be able to use ParseControl, which would make the whole thing fairly trivial, since you can then access the ColSpan property of each cell.
You could load it into an XmlDocument and then loop through it. I'm not sure that's the best solution, but it works. Maybe LINQ to XML?
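To make the LINQ to XML idea concrete, here is a minimal sketch, not a definitive implementation. It assumes the table markup is well-formed XML, so attribute values must be quoted (colspan="4" rather than colspan=4), and that there are no nested tables. The names TableExpander, Expand and ReadSpan are made up for illustration. Real-world pages are often not well-formed, in which case the markup would need to go through an HTML parser first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TableExpander
{
    // Hypothetical helper: expands colspan/rowspan so that every spanned
    // slot repeats the cell's text. Assumes well-formed XML input.
    public static string[][] Expand(string tableXml)
    {
        XElement table = XElement.Parse(tableXml);
        List<XElement> rows = table.Descendants()
            .Where(e => e.Name.LocalName == "tr")
            .ToList();

        // grid[r][c] is the text at row r, column c; null means "not filled yet".
        var grid = new List<List<string>>();
        for (int i = 0; i < rows.Count; i++) grid.Add(new List<string>());

        for (int r = 0; r < rows.Count; r++)
        {
            int col = 0;
            var cells = rows[r].Elements()
                .Where(e => e.Name.LocalName == "td" || e.Name.LocalName == "th");

            foreach (XElement cell in cells)
            {
                // Skip slots already filled by a rowspan from a previous row.
                while (col < grid[r].Count && grid[r][col] != null) col++;

                int colspan = ReadSpan(cell, "colspan");
                // Clip rowspan so it never runs past the last row of the table.
                int rowspan = Math.Min(ReadSpan(cell, "rowspan"), rows.Count - r);
                string text = cell.Value.Trim();

                for (int dr = 0; dr < rowspan; dr++)
                {
                    List<string> target = grid[r + dr];
                    while (target.Count < col + colspan) target.Add(null);
                    for (int dc = 0; dc < colspan; dc++) target[col + dc] = text;
                }
                col += colspan;
            }
        }

        // Any slot that was never filled becomes an empty string.
        return grid.Select(row => row.Select(v => v ?? "").ToArray()).ToArray();
    }

    // Reads a span attribute; a missing or invalid value counts as 1.
    private static int ReadSpan(XElement cell, string name)
    {
        XAttribute attr = cell.Attribute(name);
        int n;
        return attr != null && int.TryParse(attr.Value, out n) && n > 0 ? n : 1;
    }
}
```

Called on the table from the question (with the attribute quoted), Expand returns two rows: 1,2,3,4,5 and 1,1,1,1,2. Filling a grid instead of appending to each row is what lets rowspan work, because a cell can reserve slots in rows that have not been visited yet.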
 