Download tarball from repository

2023-03-07 18:28 Q&A author:

I am currently working on a project for scraping source code from SourceForge. I would like to download the tarball from the code repository.

An example link is given below: http://wurfl.cvs.sourceforge.net/viewvc/wurfl/?view=tar

The problem I faced is that I am unable to download the file using conventional APIs such as URLConnection, HttpClient, HtmlUnit, or Jsoup. The specified link does not contain a filename or extension, which makes the download even more complicated.

Can you suggest a way to download a set of tarball links, passed as parameters, to my disk? I was able to download the file using wget. Is there a way to do it programmatically in Java on Windows?

Before you go any further with your efforts, carefully read the SourceForge Terms of Use page. If you don't understand the ToS, contact SourceForge and ask them if you are allowed to do what you are proposing.

"The problems I faced while downloading is that, I am unable to use conventional url, http, htmlunit, jsoup apis etc to download the file."

Your assumption is incorrect.

You CAN use APIs such as the standard HttpURLConnection API or the Apache HttpClient APIs to do this kind of thing. If it is not working, it is because either:

- you are doing something the wrong way (e.g. you haven't configured your Java app to use your local HTTP proxy), or
- SourceForge is using some technical means to stop you doing this; see the ToS.

If you post some details on what is happening when you try these approaches, maybe we can help you.
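For the proxy case mentioned above, a minimal sketch of routing a standard URLConnection through a local HTTP proxy might look like the following. The class name, the host `proxy.example.local`, and port 8080 are placeholders, not values from the question; substitute whatever your network actually uses. The same effect can usually be had without code changes by starting the JVM with `-Dhttp.proxyHost=... -Dhttp.proxyPort=...`.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;

public class ProxyExample {

    /** Builds an HTTP proxy description; the host name is not resolved until a connection is made. */
    public static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    /** Opens (but does not yet connect) a connection that will go through the given proxy. */
    public static URLConnection open(String link, Proxy proxy) throws IOException {
        return new URL(link).openConnection(proxy);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical proxy address: replace with your own.
        Proxy proxy = httpProxy("proxy.example.local", 8080);
        URLConnection conn = open("http://wurfl.cvs.sourceforge.net/viewvc/wurfl/?view=tar", proxy);
        // Reading a header triggers the actual request.
        System.out.println("Content type: " + conn.getContentType());
    }
}
```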

(HtmlUnit and Jsoup are probably inappropriate because they target HTML content.)

"The specified link does not contain any filename or extension, this makes the download process even more complicated."

You can get the source filename and / or content type from the response headers. Refer to the HTTP specifications for details.
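As a rough illustration of that approach, the sketch below downloads each link given on the command line with HttpURLConnection and derives the local file name from the `Content-Disposition` and `Content-Type` response headers. The class name, the `downloads` directory, the `tarball-N` fallback name, and the gzip-to-`.tar.gz` mapping are assumptions made for the example. Whether SourceForge actually sends these headers for this URL is not confirmed here, which is why a fallback name is supplied.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TarballDownloader {

    // Matches filename="x" or filename=x inside a Content-Disposition header value.
    private static final Pattern FILENAME =
            Pattern.compile("filename\\s*=\\s*\"?([^\";]+)\"?", Pattern.CASE_INSENSITIVE);

    /** Picks a local file name from the response headers, or returns the fallback. */
    public static String fileNameFrom(String contentDisposition, String contentType, String fallback) {
        if (contentDisposition != null) {
            Matcher m = FILENAME.matcher(contentDisposition);
            if (m.find()) {
                // Keep only the last path segment so the server cannot pick a directory.
                String name = m.group(1).trim().replace('\\', '/');
                name = name.substring(name.lastIndexOf('/') + 1);
                if (!name.isEmpty()) {
                    return name;
                }
            }
        }
        // Assumed mapping: treat any gzip content type as a compressed tarball.
        if (contentType != null && contentType.toLowerCase().contains("gzip")) {
            return fallback + ".tar.gz";
        }
        return fallback;
    }

    /** Downloads one link into targetDir and returns the path of the saved file. */
    public static Path download(String link, Path targetDir, String fallback) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(link).openConnection();
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + status + " for " + link);
            }
            String name = fileNameFrom(conn.getHeaderField("Content-Disposition"),
                    conn.getContentType(), fallback);
            Path target = targetDir.resolve(name);
            try (InputStream in = conn.getInputStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            return target;
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get("downloads"); // assumed output directory
        Files.createDirectories(dir);
        int n = 0;
        for (String link : args) {
            System.out.println("Saved " + download(link, dir, "tarball-" + (n++)));
        }
    }
}
```

Run it with the tarball links as arguments, for example `java TarballDownloader "http://wurfl.cvs.sourceforge.net/viewvc/wurfl/?view=tar"`.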

If you really DO want to risk violating SourceForge's ToS, then this may help.

You need wget.exe, as you wanted.

// wget options start with two dashes: "--no-proxy", not "no-proxy"
ProcessBuilder pb = new ProcessBuilder("wget.exe", "--no-proxy", "http://wurfl.cvs.sourceforge.net/viewvc/wurfl/?view=tar");
Process p = pb.start();

This will work as long as wget.exe can be found when the process starts, for example in the Java process's working directory or in a directory on the PATH.

You may also want to check whether the file DOES exist, in which case you would do something along the lines of:
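If you would rather not depend on where wget.exe happens to be found, one option is to pass its absolute path and an explicit output directory. The sketch below assumes a hypothetical install location `C:\tools\wget.exe` and a `downloads` directory; the class and method names are made up for the example. It uses wget's `-P` option to set the output directory and `--content-disposition` so that wget takes the file name from the response header when the server provides one.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class WgetLauncher {

    /** Builds (but does not start) a wget invocation for one link. */
    public static ProcessBuilder wgetCommand(File wgetExe, File outputDir, String link) {
        List<String> cmd = new ArrayList<>();
        cmd.add(wgetExe.getAbsolutePath());   // absolute path, so no lookup is needed
        cmd.add("--no-proxy");
        cmd.add("--content-disposition");     // use the server-supplied file name if any
        cmd.add("-P");                        // directory to save files into
        cmd.add(outputDir.getAbsolutePath());
        cmd.add(link);
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.inheritIO();                       // show wget's progress in this console
        return pb;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        File wget = new File("C:\\tools\\wget.exe"); // hypothetical location
        File out = new File("downloads");            // assumed output directory
        for (String link : args) {
            int exit = wgetCommand(wget, out, link).start().waitFor();
            System.out.println(link + " -> exit code " + exit);
        }
    }
}
```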

// Needs: import java.io.BufferedReader; import java.io.InputStreamReader;
String url = "http://wurfl.cvs.sourceforge.net/viewvc/wurfl/?view=tar";

// First pass: --spider checks the URL without saving it, and
// --server-response prints the HTTP status line so it can be inspected.
ProcessBuilder check = new ProcessBuilder("wget.exe", "--spider", "--server-response", "--no-proxy", url);
check.redirectErrorStream(true); // wget logs to stderr; merge it into stdout
Process checkProcess = check.start();

// Read the output BEFORE waitFor(), otherwise a full pipe buffer can block wget.
StringBuilder sb = new StringBuilder();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(checkProcess.getInputStream()))) {
        String line;
        while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
        }
}
int exitValue = checkProcess.waitFor();
System.out.println(sb);

if (exitValue == 0 && sb.indexOf(" 200 ") != -1) {
        // file exists, download it
        System.out.println("File exists, downloading...");
        Process download = new ProcessBuilder("wget.exe", "--no-proxy", url).inheritIO().start();
        download.waitFor();
} else {
        // e.g. a 404 response, or wget could not connect
        System.out.println("File does not exist, or access is denied");
}

But I recommend NOT scraping SourceForge, unless it's your own code that you are scraping (I did that once as an updater program). If you do, and my example helps, please kindly don't mention me. =]

Hope I helped!
