
Get all HTML between two elements

Problem:

Extract all the HTML between two headers, including the headers' own HTML. The header text is known, but not the formatting, tag name, etc. The headers are not within the same parent, and the elements between them will almost certainly have nested children of their own.

To clarify: headers could be inside a <h1> or <div> or any other tag. They may also be surrounded by <b>, <i>, <font> or more <div> tags. The key is: the only text within the element is the header text.

The tools I have available are: C# 3.0 utilizing a WebBrowser control, or jQuery/JavaScript.

I've taken the jQuery route, traversing the DOM, but I've run into the issue of handling children and adding them appropriately. Here is the code so far:

function getAllBetween(firstEl,lastEl) {
    var collection = new Array(); // Collection of Elements
    var fefound =false;
    $('body').find('*').each(function(){
        var curEl = $(this);
        if($(curEl).text() == firstEl) 
            fefound=true;
        if($(curEl).text() == lastEl) 
            return false;

        // need something to add children children
        // otherwise we get <table></table><tbody></tbody><tr></tr> etc
        if (fefound)
            collection.push(curEl);
    });
    var div = document.createElement("DIV");
    for (var i=0,len=collection.length;i<len;i++){
        $(div).append(collection[i]);
    }
    return($(div).html());
}

Should I continue down this road, with some sort of recursive function checking and handling children, or would a whole new approach be better suited?

For the sake of testing, here is some sample markup:

<body>
<div>
<div>Start</div>
<table><tbody><tr><td>Oops</td></tr></tbody></table>
</div>
<div>
<div>End</div>
</div>
</body>

Any suggestions or thoughts are greatly appreciated!


My thought is a regex, something along the lines of

.*<(?<tag>.+)>Start</\1>(?<found_data>.+)<\1>End</\1>.*

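should get you everything between the Start and End div tags. Note that . does not match newlines by default, so multi-line markup needs the Singleline option, and the pattern only matches when both headers use the same tag with no attributes.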
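
Since jQuery/JavaScript is also available, the same idea can be sketched in plain JavaScript. This is only a rough illustration under some assumptions: the markup is available as a string, the header text is the sole content of its innermost tag, and the runtime supports named capture groups (ES2018). The names getHtmlBetween, escapeRegExp and sampleHtml are hypothetical. Unlike the pattern above, it lets the two headers use different tags and allows attributes.

```javascript
// Sample markup from the question, kept as a plain string.
const sampleHtml = [
  "<body>",
  "<div>",
  "<div>Start</div>",
  "<table><tbody><tr><td>Oops</td></tr></tbody></table>",
  "</div>",
  "<div>",
  "<div>End</div>",
  "</div>",
  "</body>"
].join("\n");

// Escape characters that have a special meaning inside a regex,
// so the header text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns the markup from the start header
// through the end header (both included), or null if not found.
function getHtmlBetween(html, startText, endText) {
  const pattern = new RegExp(
    // Innermost tag whose only content is the start header text.
    "<(?<startTag>\\w+)[^>]*>" + escapeRegExp(startText) + "</\\k<startTag>>" +
    // Anything in between, newlines included, as little as possible.
    "[\\s\\S]*?" +
    // Innermost tag whose only content is the end header text.
    "<(?<endTag>\\w+)[^>]*>" + escapeRegExp(endText) + "</\\k<endTag>>"
  );
  const match = pattern.exec(html);
  return match ? match[0] : null;
}

const between = getHtmlBetween(sampleHtml, "Start", "End");
console.log(between);
```

On the sample markup the result starts at <div>Start</div> and ends at <div>End</div>, but it also contains the first parent's closing </div> and the second parent's opening <div>, so the fragment is not balanced HTML. That is a general limit of matching markup as text, and a reason the DOM-based approach may be the safer one.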


Here's an idea:

$(function() {
      // Get the parent div start is in:
    var $elie = $("div:contains(Start)").eq(0), htmlArr = [];

      // Push HTML of that div to the HTML array
    htmlArr.push($('<div>').append( $elie.clone() ).html());

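      // Keep moving along and adding to the array until we hit End.
      // The length check stops the loop if we run out of siblings
      // before End is found (otherwise it would never terminate).
    while ($elie.length && $elie.find("div:contains(End)").length === 0) {
        $elie = $elie.next();
        htmlArr.push($('<div>').append( $elie.clone() ).html());
    }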

      // htmlArr now has the HTML
      // let's see what it is:
    alert(htmlArr.join(""));
});

Try it out with this jsFiddle example


This takes the entire parent div that Start is in; I'm not sure that's what you want, though. The outer HTML is built with $('<div>').append( element.clone() ).html(), since outerHTML is not supported in every browser yet. All the HTML is stored in an array; you could also just store the elements themselves in the array.
