Get all html between two elements
Problem:
Extract all HTML between two headers, including the headers' own HTML. The header text is known, but not the formatting, tag name, etc. They are not within the same parent and might (almost certainly will) have nested children of their own. To clarify: a header could be inside an <h1>, a <div>, or any other tag. It may also be wrapped in <b>, <i>, <font>, or more <div> tags. The key is: the only text within the element is the header text.
The tools I have available are C# 3.0 with a WebBrowser control, or jQuery/JS.
I've taken the jQuery route, traversing the DOM, but I've run into the issue of children and adding them appropriately. Here is the code so far:
function getAllBetween(firstEl, lastEl) {
    var collection = new Array(); // Collection of Elements
    var fefound = false;
    $('body').find('*').each(function() {
        var curEl = $(this);
        if (curEl.text() == firstEl)
            fefound = true;
        if (curEl.text() == lastEl)
            return false;
        // need something to add children children
        // otherwise we get <table></table><tbody></tbody><tr></tr> etc
        if (fefound)
            collection.push(curEl);
    });
    var div = document.createElement("DIV");
    for (var i = 0, len = collection.length; i < len; i++) {
        $(div).append(collection[i]);
    }
    return $(div).html();
}
Should I continue down this road, with some sort of recursive function checking/handling children, or would a whole new approach be better suited?
For the sake of testing, here is some sample markup:
<body>
<div>
<div>Start</div>
<table><tbody><tr><td>Oops</td></tr></tbody></table>
</div>
<div>
<div>End</div>
</div>
</body>
Any suggestions or thoughts are greatly appreciated!
My thought is a regex, something along the lines of
.*<(?<tag>.+)>Start</\1>(?<found_data>.+)<\1>End</\1>.*
should get you everything between the Start and end div tags.
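As a rough illustration of the regex idea in JavaScript, here is a minimal sketch run against the sample markup as a string. The helper name betweenHeaders is made up for this example, and it differs from the pattern above in two assumed ways: the middle part is lazy, and the End header is allowed its own tag name (captured as endTag) since the question says the tags are unknown.

```javascript
// Sample markup from the question, as a plain string.
const html =
  '<body><div><div>Start</div>' +
  '<table><tbody><tr><td>Oops</td></tr></tbody></table></div>' +
  '<div><div>End</div></div></body>';

// Hypothetical helper: returns the substring running from the start header's
// opening tag through the end header's closing tag, or null if not found.
function betweenHeaders(source, startText, endText) {
  // Escape regex metacharacters in the header text.
  const esc = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const pattern = new RegExp(
    '<(?<tag>\\w+)[^>]*>\\s*' + esc(startText) + '\\s*</\\k<tag>>' +
    '[\\s\\S]*?' + // lazy: stop at the first End header
    '<(?<endTag>\\w+)[^>]*>\\s*' + esc(endText) + '\\s*</\\k<endTag>>'
  );
  const match = pattern.exec(source);
  return match ? match[0] : null;
}

const between = betweenHeaders(html, 'Start', 'End');
console.log(between);
```

Note that the result is a raw substring, so on the sample it includes the stray </div><div> that separates the two parent divs. It also only matches the innermost tag directly around the header text, not outer <b> or <font> wrappers.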
Here's an idea:
$(function() {
    // Get the parent div start is in:
    var $elie = $("div:contains(Start)").eq(0), htmlArr = [];
    // Push HTML of that div to the HTML array
    htmlArr.push($('<div>').append( $elie.clone() ).html());
    // Keep moving along and adding to array until we hit END
    // (the length check stops the loop if we run out of siblings first)
    while ($elie.length && $elie.find("div:contains(End)").length != 1) {
        $elie = $elie.next();
        htmlArr.push($('<div>').append( $elie.clone() ).html());
    }
    // htmlArr now has the HTML
    // let's see what it is:
    alert(htmlArr.join(""));
});
Try it out with this jsFiddle example
This takes the entire parent div that Start is in. I'm not sure that's what you want, though. The outerHTML is obtained with $('<div>').append( element.clone() ).html(), since outerHTML support is not cross-browser yet. All the HTML is stored in an array; you could also just store the elements in the array.
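If taking the whole parent div is too coarse, the recursive idea raised in the question can be sketched as a pre-order walk that collects whole subtrees and only descends into nodes that contain a header. The sketch below is not DOM code: it uses plain objects of an assumed shape ({ tag, text, children }) so it can run anywhere, and el, textOf and toHtml are hypothetical helpers standing in for element creation, jQuery's .text(), and outerHTML.

```javascript
// Plain-object stand-in for a DOM element (assumed shape, not a real DOM node).
function el(tag, content) {
  return Array.isArray(content)
    ? { tag: tag, text: '', children: content }
    : { tag: tag, text: content, children: [] };
}

// Concatenated text of a node and its descendants, similar to jQuery's .text().
function textOf(node) {
  return node.text + node.children.map(textOf).join('');
}

// Serialize a node back to markup, similar to outerHTML.
function toHtml(node) {
  return '<' + node.tag + '>' + node.text +
    node.children.map(toHtml).join('') + '</' + node.tag + '>';
}

// Pre-order walk: once the first header is seen, collect each node as a whole
// subtree, descending only into nodes whose text contains the last header.
function getAllBetween(root, firstText, lastText) {
  var collected = [], started = false, done = false;

  function visit(node) {
    if (done) return;
    var text = textOf(node);
    if (!started) {
      if (text === firstText) { started = true; collected.push(node); }
      else if (text.indexOf(firstText) !== -1) node.children.forEach(visit);
      return; // nodes before the first header are skipped
    }
    if (text === lastText) { collected.push(node); done = true; }
    else if (text.indexOf(lastText) !== -1) node.children.forEach(visit);
    else collected.push(node); // whole subtree, children are not visited again
  }

  visit(root);
  return collected.map(toHtml).join('');
}

// The sample markup from the question, rebuilt as plain objects.
var body = el('body', [
  el('div', [
    el('div', 'Start'),
    el('table', [el('tbody', [el('tr', [el('td', 'Oops')])])])
  ]),
  el('div', [el('div', 'End')])
]);

var result = getAllBetween(body, 'Start', 'End');
console.log(result);
```

Because a matching node is taken at the outermost level whose only text is the header, wrappers such as <b> or an extra <div> around a header come along with it. The containment check is a plain substring test, so in-between content that merely mentions the end header text would need stricter handling.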