
Decode TinyURL in R to get the full URL path?

Is there a way to decode TinyURL links in R so that I can see which web pages they actually refer to?


Below is a quick-and-dirty solution, but it should get the job done:

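library(RCurl)

decode.short.url <- function(u) {
  # Request only the headers and do not follow the redirect,
  # so the Location header of the first response can be read.
  x <- try( getURL(u, header = TRUE, nobody = TRUE, followlocation = FALSE) )
  if (inherits(x, "try-error")) {
    # The request failed: return the input unchanged.
    return(u)
  } else {
    # Take the text after "Location: " up to the end of that header line.
    # This is NA when the response has no Location header.
    x <- strsplit(x, "Location: ")[[1]][2]
    return(strsplit(x, "\r")[[1]][1])
  }
}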

The variable 'u' below contains one shortened URL and one regular URL.

u <- c("http://tinyurl.com/adcd", "http://www.google.com") 

You can then get the expanded results by doing the following.

 sapply(u, decode.short.url) 

I think the above should work for most URL-shortening services, not just TinyURL.

HTH

Tony Breyal


I don't know R, but in general you need to make an HTTP request to the TinyURL address. You should get back a 301 response whose Location header contains the actual URL.
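
As a rough illustration of that idea in R, here is a minimal sketch that pulls the redirect target out of a block of raw response headers. The function name extract.location and the sample header text are made up for this example. The header name is matched case-insensitively, on the assumption that some servers send it as "location:" in lower case.

```r
# Hypothetical helper: return the value of the Location header found in a
# block of raw HTTP response headers, or NA if there is none.
extract.location <- function(headers) {
  lines <- strsplit(headers, "\r?\n")[[1]]
  hit <- grep("^location:", lines, ignore.case = TRUE, value = TRUE)
  if (length(hit) == 0) return(NA_character_)
  # Drop the header name and any surrounding whitespace.
  trimws(sub("^location:", "", hit[1], ignore.case = TRUE))
}

# Made-up response headers, shaped like a redirect and a normal reply.
redirect <- "HTTP/1.1 301 Moved Permanently\r\nLocation: http://example.com/long/path\r\nContent-Type: text/html\r\n\r\n"
plain <- "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"

extract.location(redirect)
extract.location(plain)
```

The header text itself could come from a call such as getURL(u, header = TRUE, nobody = TRUE, followlocation = FALSE), as used in the other answers here.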


I used Tony Breyal's code, but the function returned NA values for those URLs where there was no URL redirection. Even though Tony listed "google.com" in his example, I think Google redirects you in any case to some sort of localized version of google.com.

Here is how I modified Tony's code to deal with that:

library(RCurl)

decode.short.url <- function(u) {
  x <- try( getURL(u, header = TRUE, nobody = TRUE, followlocation = FALSE) )
  if (inherits(x, "try-error")) {
    print(paste("***", u, "--> ERROR!!!!"))
    return(u)
  } else {
    x <- strsplit(x, "Location: ")[[1]][2]
    x.2 <- strsplit(x, "\r")[[1]][1]
    if (is.na(x.2)) {
      # No Location header, so the URL was not redirected.
      print(paste("***", u, "--> No change."))
      return(u)
    } else {
      # Report the original URL and the address it redirects to.
      print(paste("***", u, "--> resolved in -->", x.2))
      return(x.2)
    }
  }
}


u <- list("http://www.amazon.com", "http://tinyurl.com/adcd") 
urls <- sapply(u, decode.short.url)
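
Since a target can itself redirect again (as with the localized Google pages mentioned above), one hop may not be enough. The following is only a sketch of one way to repeat the lookup until the URL stops changing. The names follow.redirects, resolve.once, max.hops, fake.table and fake.resolver are all invented here, and the example runs against a made-up lookup table instead of the network.

```r
# Hypothetical helper: apply a one-hop resolver repeatedly until the URL
# stops changing, the resolver returns NA, or max.hops is reached.
follow.redirects <- function(u, resolve.once, max.hops = 5) {
  for (i in seq_len(max.hops)) {
    nxt <- resolve.once(u)
    if (is.na(nxt) || identical(nxt, u)) break
    u <- nxt
  }
  u
}

# Offline stand-in for a network lookup; these URLs are invented.
fake.table <- c("http://short.example/a" = "http://short.example/b",
                "http://short.example/b" = "http://www.example.com/page")
fake.resolver <- function(u) {
  if (u %in% names(fake.table)) unname(fake.table[u]) else u
}

follow.redirects("http://short.example/a", fake.resolver)
```

With a network connection, decode.short.url could be passed as resolve.once in place of fake.resolver. The max.hops limit is there so that a redirect loop cannot run forever.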


( u <- c("http://tinyurl.com/adcd", "http://tinyurl.com/fnqsh") )
( sapply(u, decode.short.url) )