Fetch multiple, external URLs with GM_xmlhttpRequest, add page <H1> to links?
SOLVED thanks to Hellion's help!
Here is the code:
// ==UserScript==
// @name Facebook Comment Moderation Links
// @description Appends story titles to Facebook Comment Moderation "Visit Website" links
// @include http*://developers.facebook.com/tools/*
// ==/UserScript==
var allLinks, thisLink, expr, pageTitle, myURL, myPage, pageContent, title;
// grabbing URLs
function fetchPage(myPage, targetLink) {
GM_xmlhttpRequest({
method: 'GET',
url: myPage,
onload: function(response){
// get the HTML content of the page
pageContent = response.responseText;
// use regex to extract its h1 tag
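var headings = pageContent.match(/<h1.*?>(.*?)<\/h1>/g);
// stop here if the page has no h1 tag, since match() returns null in that case
if (!headings) { return; }
pageTitle = headings[0];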
// strip html tags from the result
pageTitle = pageTitle.replace(/<.*?>/g, '');
// append headline to Visit Website link
title = document.createElement('div');
title.style.backgroundColor = "yellow";
title.style.color = "#000";
title.appendChild(document.createTextNode(pageTitle));
targetLink.parentNode.insertBefore(title, targetLink.nextSibling);
}
});
}
function processLinks() {
// define which links to look for
expr = "//a[contains (string(), 'Visit Website')]";
allLinks = document.evaluate(
expr,
document,
null,
XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
null);
// loop through the links
for (var i = 0; i < allLinks.snapshotLength; i++) {
thisLink = allLinks.snapshotItem(i);
myURL = thisLink.getAttribute('href');
// follow Visit Website link and attach corresponding headline
fetchPage(myURL, thisLink);
}
}
// get the ball rolling
processLinks();
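The extraction step inside onload can also be pulled out into a small helper, which makes it easier to try on sample HTML. The following is only a sketch: extractHeadline is a hypothetical name that is not part of the script above, and the regex is a slightly different one that also accepts an h1 spanning several lines and returns null when a page has no h1.

```javascript
// Hypothetical helper (not part of the original script): returns the text of
// the first <h1> in an HTML string, or null when the page has no <h1>.
function extractHeadline(html) {
  // [\s\S] also matches newlines, so an <h1> that spans lines is still found
  var match = /<h1\b[^>]*>([\s\S]*?)<\/h1>/i.exec(html);
  if (!match) {
    return null;
  }
  // strip nested tags, then collapse runs of whitespace into single spaces
  return match[1].replace(/<[^>]*>/g, '').replace(/\s+/g, ' ').trim();
}

// Example call on a small sample page
var sampleHeadline = extractHeadline(
  '<html><body><h1 class="t">Hello <em>World</em></h1></body></html>'
);
```

Inside onload this would be called as extractHeadline(response.responseText), skipping the link when the result is null. A regex remains a rough tool for HTML, so unusual markup may still slip through.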
--- EARLIER STUFF BELOW ---
I am trying to make a Greasemonkey script that fetches the URL from each of a set of links and appends the contents of the page's h1 tag to the end of the link.
So far, I can get it to show the URL itself, which doesn't require a page request, but not the page's h1 tag contents, which does.
I understand from other questions on this site that GM_xmlhttpRequest is asynchronous, and I am pretty sure this is at least part of the cause. However, I cannot find a solution to this specific problem.
Below is the code I have so far. It is for Facebook's website comment moderation tool -- in the Moderator View, each comment has a link, "Visit Website," that takes you to the article the comment is on.
As it is written right now, it would append the HTTP status code, not the page title, and then the URL to each "Visit Website" link. The status code part is just a placeholder. I plan on adding the HTML parsing, etc. to get the h1 tag later.
Right now I am just trying to get the GM_xmlhttpRequest and the content insertion to match up.
Any help in sorting this out would be greatly appreciated. Thank you!
var allLinks, thisLink, expr, pageTitle, myURL, pageContent, title;
// define which links to process
expr = "//a[contains (string(), 'Visit Website')]";
allLinks = document.evaluate(
expr,
document,
null,
XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
null);
// cycle through links
for (var i = 0; i < allLinks.snapshotLength; i++) {
thisLink = allLinks.snapshotItem(i);
myURL = thisLink.getAttribute('href');
GM_xmlhttpRequest({
method: 'GET',
url: myURL,
onload: function(responseDetails){
pageTitle = responseDetails.status;
}
});
// append info to end of each link
title = document.createElement('div');
title.style.backgroundColor = "yellow";
title.style.color = "#000";
title.appendChild(document.createTextNode(
' [' + pageTitle + ' - ' + thisLink.getAttribute('href') + ']'));
thisLink.parentNode.insertBefore(title, thisLink.nextSibling);
}
As written, yes, you are running into the asynchronous nature of the GM_xmlhttpRequest() call. The loop fires off all the requests but continues immediately without waiting for them to complete, so pageTitle has not been assigned yet (it is still undefined) when you use it for the text node.
The first step to rectify the situation is to move everything that currently follows the GM_xmlhttpRequest() call inside the onload: function() definition. Then each link is modified only after its page has been retrieved. (There may be other issues with needing to pass in or reacquire the thisLink value too, I'm not sure.)
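To illustrate the ordering problem and the thisLink concern, here is a minimal sketch that runs outside Greasemonkey. It makes several assumptions: fakeRequest is a made-up stand-in for GM_xmlhttpRequest that only queues the onload callback, deliverResponses is a made-up way to simulate the responses arriving later, and the links are plain objects rather than DOM nodes.

```javascript
// Stand-in for GM_xmlhttpRequest (an assumption for this demo): it only
// queues the onload callback, so nothing has "loaded" when it returns.
var pending = [];
function fakeRequest(details) {
  pending.push(function () {
    details.onload({ status: 200, responseText: 'body of ' + details.url });
  });
}

// Deliver all queued responses, as the browser would do some time later.
function deliverResponses() {
  while (pending.length > 0) {
    pending.shift()();
  }
}

var labels = [];

// Each call gets its own url and link parameters, so the callback still
// sees the right link when its response finally arrives.
function fetchAndLabel(url, link) {
  fakeRequest({
    method: 'GET',
    url: url,
    onload: function (response) {
      // work that depends on the response belongs in here
      labels.push(link.name + ' [' + response.status + ' - ' + url + ']');
    }
  });
}

var links = [
  { name: 'first', href: 'http://example.com/a' },
  { name: 'second', href: 'http://example.com/b' }
];
for (var i = 0; i < links.length; i++) {
  fetchAndLabel(links[i].href, links[i]);
}

var labelsBeforeDelivery = labels.length; // no callback has run yet
deliverResponses();
```

Right after the loop nothing has been labelled; the labels appear only once the responses are delivered, and each one is attached to its own link because the link was passed in as a function argument instead of being read from a shared loop variable.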
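You can replace the following three statements with the single jQuery line shown after them (this assumes jQuery is loaded in the script, for example via @require):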
// get the HTML content of the page
pageContent = response.responseText;
// use regex to extract its h1 tag
pageTitle = pageContent.match(/<h1.*?>(.*?)<\/h1>/g)[0];
// strip html tags from the result
pageTitle = pageTitle.replace(/<.*?>/g, '');
pageTitle = $('h1', response.responseText).text();