Apache HTTPClient returns an empty page

Posted 2023-01-01 06:05

I am using Apache HttpClient for Java and I'm facing a really strange issue. Sometimes when I request a dynamically generated page it returns the actual content, but other times (with a different parameter) all I get is a short sequence of \t, \r and \n characters.

How can I track what is going on in the different cases in order to find where the bug is?

My usage of the library is pretty straightforward; all I do is make these few calls on an initialized HttpClient object:

String content = "/pageIwant.jsp?parameter=10101010";
HttpGet request = new HttpGet(content);
HttpResponse response = client.execute(targetHost, request);
HttpEntity entity = response.getEntity();
String page = EntityUtils.toString(entity);

The way I would approach this is to start by attempting to fetch the same page using a web browser. If you cannot get that to work, it is probably safe to conclude that the real problem is with the server, and you'll need to talk to the server's support staff.

If the browser works, try to repeat the process using the wget utility. If wget gives you problems, go back to your browser, find out exactly what headers it is sending in the HTTP request, and try to get wget to send the same headers. Once you've got wget to work, make a note of the headers.

Finally, return to your Java code and modify it so that the HTTP request headers it sends are the same as those that work for wget.
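While comparing the browser, wget and the Java client, it may help to make the "empty" response visible in your logs, since a body made only of \t, \r and \n looks blank when printed. The following is a minimal sketch of such a helper; the class name `ResponseDebug` and its methods are hypothetical names chosen for illustration, and it uses only the JDK.

```java
/** Hypothetical debugging helper; the class and method names are illustrative. */
final class ResponseDebug {

    /** Returns the body with tab, CR and LF replaced by visible escape sequences. */
    static String visible(String body) {
        StringBuilder out = new StringBuilder();
        for (char c : body.toCharArray()) {
            switch (c) {
                case '\t': out.append("\\t"); break;
                case '\r': out.append("\\r"); break;
                case '\n': out.append("\\n"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** True when the body holds nothing but whitespace, the symptom described in the question. */
    static boolean isEffectivelyEmpty(String body) {
        return body.trim().isEmpty();
    }

    /** One-line summary that can be logged next to the request that produced it. */
    static String summarize(String url, int status, String body) {
        return url + " -> status " + status + ", " + body.length() + " chars, "
                + (isEffectivelyEmpty(body) ? "whitespace only: " + visible(body) : "has content");
    }
}
```

With the code from the question, you could pass `page` as the body and `response.getStatusLine().getStatusCode()` as the status, and log one summary per parameter value to see which requests come back blank.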

Yes, I have to authenticate through my university's proxy, and then I am able to access all the data. The proxy authentication is working flawlessly for the 'journal page' and even for other sites, so I'd rule it out as the cause of the problem.

I think you may have excluded the real problem. @BalasC is not talking about proxy authentication. Rather, he is talking about authentication at the IEEE site. And just because one part of the site appears to work without authentication does not mean that all of it will. (However, I'd have thought that the site would respond with a "FORBIDDEN" or "AUTHORIZATION REQUIRED" error rather than delivering strange content.)

Another possibility is that the site is trying to prevent "screen scraping" of its content by automatic tools. Check the site's "Terms of Service" to see if what you are trying to do is allowed. (You may choose to ignore the ToS and circumvent the technical measures, but then you might find yourself or your organization IP-blocked, or you might be on the receiving end of cease-and-desist letters talking about copyright violation.)

I found the solution to my problem: I was missing some header information that is apparently required by only part of the dynamic page.

To solve my issue I first used Wireshark to watch the communication between the browser and the server, and then I added all the headers I was missing.

I found out that in my case I needed to specify the 'Accept-Language' header.
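To illustrate the effect, here is a small self-contained sketch that reproduces the symptom locally. It rests on several assumptions: it swaps Apache HttpClient for the JDK's own `java.net.http.HttpClient` (Java 11+) so that it runs without extra dependencies, the local server is only a stand-in that imitates the behaviour I saw (it is not the real site), and the class name `AcceptLanguageDemo` and the header value `en-US,en;q=0.5` are examples rather than values taken from my capture.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class AcceptLanguageDemo {

    /** Stand-in server: returns only whitespace unless Accept-Language is present. */
    static HttpServer startServer() {
        try {
            // Port 0 asks the OS for any free port.
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/pageIwant.jsp", exchange -> {
                boolean hasLanguage = exchange.getRequestHeaders().getFirst("Accept-Language") != null;
                byte[] body = (hasLanguage ? "<html>real content</html>" : "\r\n\t\r\n")
                        .getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Fetches the page; the header is added only when acceptLanguage is non-null. */
    static String fetch(int port, String acceptLanguage) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(
                URI.create("http://127.0.0.1:" + port + "/pageIwant.jsp?parameter=10101010"));
        if (acceptLanguage != null) {
            builder.header("Accept-Language", acceptLanguage);
        }
        HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
        try {
            return client.send(builder.build(), HttpResponse.BodyHandlers.ofString()).body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Returns { body without the header, body with the header }. */
    static String[] run() {
        HttpServer server = startServer();
        try {
            int port = server.getAddress().getPort();
            return new String[] { fetch(port, null), fetch(port, "en-US,en;q=0.5") };
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) {
        String[] bodies = run();
        System.out.println("without header, blank: " + bodies[0].trim().isEmpty());
        System.out.println("with header: " + bodies[1]);
    }
}
```

With Apache HttpClient 4.x, as used in my original code, the corresponding change is a single call on the request before executing it: `request.setHeader("Accept-Language", "en-US,en;q=0.5");`.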
