Optimal way to extract a URL from a web page loaded via XMLHttpRequest?
Problem Overview
- I have a dynamically produced web page, X, which consists of search results linking to web pages Y1, Y2, Y3, etc. Y1 contains a resource URL R1, Y2 contains a resource URL R2, and so on.
- I would like to dynamically enhance page X with links to resources R1, R2, etc.
Possible Solution
I'm currently thinking of using JavaScript and XMLHttpRequest to retrieve the HTML from web pages Y1, Y2, etc., and to then use a regular expression to extract the URL.
Pages Y1, Y2, etc. are in the region of 30-100 KB of HTML each.
Does this sound like a good plan? Or would I be better retrieving each web page in JSON format and extracting the resource URL from there? If HTML is the way to go, do you have any suggested optimisations/short cuts for searching 30-100 KB of text?
You don't want to use regex to extract the URL. I suggest using jQuery to perform the AJAX request, and then use jQuery to parse and filter out the URLs from the HTML that is returned from the server.
jQuery.ajax({
    url: "http://my.url.here",
    dataType: "html",
    ...
    success: function(data) {
        // data is the returned HTML string; visit every anchor found in it
        jQuery("a", data).each(function() {
            var $link = jQuery(this);
            ...
            ...
        });
    }
    ...
});
If jQuery is not an option, you can do something like this when you get your response back:
var html = XHR.responseText;
var div = document.createElement("div");
div.innerHTML = html;
//you can now search for nodes inside your div.
//The following gives you all the anchor tags
div.getElementsByTagName('a');
...
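Once you have the anchors, you still need to pick out the resource URL and make it absolute. One thing to watch for: anchors inside a div created by page X resolve relative links against X, not against the page Y they came from, so it is usually safer to read the raw href attribute and resolve it yourself. The following is only a rough sketch under that assumption. `findResourceUrl`, its parameters, and the `.pdf` test are hypothetical names chosen for illustration, not part of any library.

```javascript
// Hypothetical helper: resolve raw href values against the page they were
// scraped from and return the first absolute URL accepted by isResource.
function findResourceUrl(hrefs, pageUrl, isResource) {
    for (const href of hrefs) {
        let absolute;
        try {
            // The URL constructor resolves relative hrefs against pageUrl
            absolute = new URL(href, pageUrl).href;
        } catch (e) {
            continue; // skip hrefs that cannot be parsed
        }
        if (isResource(absolute)) {
            return absolute;
        }
    }
    return null; // nothing on the page matched
}

// Example with made-up data: the second link is the resource we want.
const found = findResourceUrl(
    ["/about", "files/r1.pdf"],
    "http://example.com/y1/page.html",
    function (url) { return url.endsWith(".pdf"); }
);
console.log(found);
```

In the browser you would build `hrefs` from the div above, for example by looping over `div.getElementsByTagName('a')` and calling `getAttribute('href')` on each anchor (skipping anchors where it returns null), and pass the address of page Y as `pageUrl`.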