
Find specific div with RegEx and print content

I'm trying to pull some text from an external website using this script.

It works, but it fetches the entire page. I only want the content inside a specific div with the class 'content'. The entire page is stored in the variable 'data', which is then passed to this function to strip some tags:

function filterData(data){
  data = data.replace(/<?\/body[^>]*>/g,'');
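  data = data.replace(/[\r\n]+/g,'');                                // remove line breaks
  data = data.replace(/<!--[\s\S]*?-->/g,'');                         // remove HTML comments
  data = data.replace(/<noscript[^>]*>[\S\s]*?<\/noscript>/g,'');     // remove <noscript> blocks
  data = data.replace(/<script[^>]*>[\S\s]*?<\/script>/g,'');         // remove <script> blocks
  data = data.replace(/<script[^>]*\/>/g,'');                         // remove self-closing <script ... /> tags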
  return data;
}
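Two of these tag-stripping patterns are easy to get wrong. Inside a character class, | is a literal character rather than "or", so [\r|\n] also deletes every pipe on the page; and HTML comments open with <!--, so a pattern starting with <-- never matches one. A small comparison on made-up strings:

```javascript
// Compare the original and corrected line-break patterns.
// Inside [...], "|" is a literal character, not "or".
const withPipe = "a|b\r\nc";
const buggyBreaks = withPipe.replace(/[\r|\n]+/g, ""); // also deletes the "|"
const fixedBreaks = withPipe.replace(/[\r\n]+/g, "");  // deletes only \r and \n

// HTML comments open with "<!--", so "<--" never matches one.
const withComment = "x<!-- note -->y";
const buggyComments = withComment.replace(/<--[\S\s]*?-->/g, "");  // unchanged
const fixedComments = withComment.replace(/<!--[\s\S]*?-->/g, ""); // comment removed

console.log(buggyBreaks);   // abc
console.log(fixedBreaks);   // a|bc
console.log(buggyComments); // x<!-- note -->y
console.log(fixedComments); // xy
```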

How would I go about finding the div with the class 'content' and only viewing the content inside that?

UPDATE: Sorry about the RegEx approach. Can you help me get the content without using RegEx? This is my HTML file:

<a href="http://www.eurest.dk/kantiner/228/all.asp?a=9" class="ajaxtrigger">erg</a>
<div id="target" style="width:200px;height:500px;"></div>
<div id="code" style="width:200px;height:200px;"></div>
<script src="http://code.jquery.com/jquery.min.js"></script>
<script>
$(document).ready(function () {
    var container = $('#target');

    $('.ajaxtrigger').click(function () {
        doAjax($(this).attr('href'));
        return false; // stop the browser from following the link
    });

    function doAjax(url) {
        if (url.match('^http')) {
            // External URL: fetch the page through YQL as JSONP
            $.getJSON("http://query.yahooapis.com/v1/public/yql?" +
                      "q=select%20*%20from%20html%20where%20url%3D%22" +
                      encodeURIComponent(url) +
                      "%22&format=xml&callback=?",
                function (data) {
                    if (data.results[0]) {
                        // string2dom() is not defined in this file
                        var tree = string2dom(data.results[0]);
                        container.html($("div.content", tree.doc));
                        tree.destroy();
                    } else {
                        var errormsg = '<p>Error: could not load the page.</p>';
                        container.html(errormsg);
                    }
                }
            );
        } else {
            // Relative URL: load it directly into #target
            $('#target').load(url);
        }
    }

    function filterData(data) {
        // Unfinished stub: never called, and `tree` is not defined in this scope
        return tree;
    }
});
</script>
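doAjax() hand-encodes most of the YQL query and runs only the page URL through encodeURIComponent. A minimal sketch, assuming the same endpoint and parameters as the script above (buildYqlUrl is a hypothetical name), that escapes the whole query in one call and produces the same URL string. It only builds the string; it does not call the service:

```javascript
// Hypothetical helper: builds the YQL request URL used by doAjax(),
// escaping the whole query with encodeURIComponent instead of by hand.
function buildYqlUrl(pageUrl) {
  const query = 'select * from html where url="' + pageUrl + '"';
  return "http://query.yahooapis.com/v1/public/yql?q=" +
    encodeURIComponent(query) +
    // jQuery's getJSON replaces "callback=?" with a generated JSONP callback name
    "&format=xml&callback=?";
}

console.log(buildYqlUrl("http://www.eurest.dk/kantiner/228/all.asp?a=9"));
```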


Try something like this:

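function getContent(data) {
    // [^<]* only matches a div whose content contains no other tags
    var matches = data.match(/<div class="content">([^<]*)<\/div>/);
    if (matches)
        return matches[1]; // div content
    return null;
}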
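The [^<]* in this pattern only matches a div whose content is plain text: a single child tag such as <br> makes the whole match fail. A sketch comparing it with a lazy [\s\S]*? group on made-up markup (both function names are hypothetical):

```javascript
// Original answer's pattern: [^<]* cannot cross any "<", so inner tags break it.
function contentTextOnly(html) {
  const m = html.match(/<div class="content">([^<]*)<\/div>/);
  return m ? m[1] : null;
}

// Lazy variant: [\s\S]*? accepts tags and line breaks, up to the first </div>.
function contentLazy(html) {
  const m = html.match(/<div class="content">([\s\S]*?)<\/div>/);
  return m ? m[1] : null;
}

const plain = '<div class="content">Soup</div>';
const withTag = '<div class="content">Menu:<br>\nSoup</div>';

console.log(contentTextOnly(plain));              // Soup
console.log(contentTextOnly(withTag));            // null
console.log(JSON.stringify(contentLazy(withTag))); // "Menu:<br>\nSoup"
```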


Try this:

<div\b[^>]*class="content"[^>]*>([\s\S]*?)<\/div>
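This pattern tolerates extra attributes around class="content", and the lazy ([\s\S]*?) group spans line breaks, but it ends at the first </div> it meets, including one that closes a nested div. A sketch wrapping it in a hypothetical firstContentDiv() function:

```javascript
// The answer's pattern wrapped in a (hypothetical) helper.
function firstContentDiv(html) {
  const m = html.match(/<div\b[^>]*class="content"[^>]*>([\s\S]*?)<\/div>/);
  return m ? m[1] : null;
}

// Extra attributes before and after class="content" are fine...
const flat = '<div id="menu" class="content" style="color:red">Soup</div>';
// ...but the lazy group ends at the first </div>, even a nested one.
const nestedDivs = '<div class="content"><div>Soup</div>Salad</div>';

console.log(firstContentDiv(flat));       // Soup
console.log(firstContentDiv(nestedDivs)); // <div>Soup
```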


Here, try this:

<div[^>]*?class='content'[^>]*?>(.*?)</div>

Capture group 1 (\1) will hold your content, although you shouldn't be doing this with regexes :)
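In JavaScript, capture group 1 is the element at index 1 of the array returned by match(). Note also that . does not match line breaks, so this pattern fails on content that spans lines unless the line breaks are removed first (filterData() does that) or the s (dotAll) flag is added. A sketch with made-up markup:

```javascript
// The answer's pattern as a JavaScript literal ("/" in </div> must be escaped).
const oneLine = /<div[^>]*?class='content'[^>]*?>(.*?)<\/div>/;
// Same pattern with the s (dotAll) flag, so "." also matches line breaks.
const anyLines = /<div[^>]*?class='content'[^>]*?>(.*?)<\/div>/s;

const markup = "<div class='content'>Soup\nSalad</div>";

const noMatch = markup.match(oneLine); // null: "." stops at the "\n"
const found = markup.match(anyLines);
// match() returns an array; capture group 1 is at index 1.
console.log(noMatch === null);         // true
console.log(JSON.stringify(found[1])); // "Soup\nSalad"
```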


This may help you:

    var divtxt = data.match(/<div[^>]*class="content"[^>]*>.*<\/div>/);

but it may stop at the wrong </div>: the greedy .* runs on to the last </div> it can find.

You should use jQuery or Prototype to turn the string into a DOM object and use selectors to find the right div. With jQuery you would do something like:

    var divtxt = $(data).find(".content").first().html();

Remember to load the jQuery library first.
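Where jQuery or another DOM parser isn't available, the risk of stopping at the wrong </div> can be reduced by counting div nesting depth instead of relying on a single regex. This is a swapped-in technique, not what the answers above propose: a minimal sketch, assuming reasonably well-formed markup with no div tags inside comments, scripts or attribute values. innerHtmlOfContentDiv is a hypothetical name; like the .content selector, it matches class lists that contain content as a whole word.

```javascript
// Hypothetical helper using nesting-depth counting (not one of the answers above).
// Returns the inner HTML of the first <div> whose class list contains "content",
// or null if there is no such div or no matching </div>.
// Assumes reasonably well-formed markup: div tags inside comments, scripts or
// attribute values are not handled.
function innerHtmlOfContentDiv(html) {
  // Opening tag whose class attribute has "content" as a whole, space-separated word.
  const open = /<div\b[^>]*\bclass\s*=\s*["'](?:[^"']*\s)?content(?:\s[^"']*)?["'][^>]*>/i;
  const start = html.match(open);
  if (!start) return null;

  const bodyStart = start.index + start[0].length;
  const divTag = /<(\/?)div\b[^>]*>/gi; // matches both <div ...> and </div>
  divTag.lastIndex = bodyStart;         // begin scanning after the opening tag

  let depth = 1;
  let m;
  while ((m = divTag.exec(html)) !== null) {
    depth += m[1] ? -1 : 1;             // m[1] is "/" for a closing tag
    if (depth === 0) return html.slice(bodyStart, m.index);
  }
  return null;                          // ran out of tags before depth reached 0
}

const pageHtml =
  '<body><div class="header">Top</div>' +
  '<div class="menu content"><div>Soup</div>Salad</div><p>after</p></body>';

console.log(innerHtmlOfContentDiv(pageHtml)); // <div>Soup</div>Salad
```

It returns null both when no matching div exists and when the div is never closed, so callers can treat either case as "not found".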
