How to get hidden data from a website in Java [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center. Closed 11 years ago.

I am developing software in Java and I want to get some text from a website. The problem is that the text is shown in the browser but is missing when I fetch the page through code.

update: I am reading the page through an InputStreamReader. The comments field is not in what I read, and it is also not in the source code of the page. When I open the page in the browser, the comments field is there and publicly available.

update: The URL is http://www.alarabiya.net/articles/2011/07/20/158410.html


Exactly which comments are you not seeing? The following code gets the comments as far as I can tell:

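import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Fetch the page and print the raw HTML to standard output.
URL url = new URL("http://www.alarabiya.net/articles/2011/07/20/158410.html");
HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
urlConnection.setRequestMethod("GET");
urlConnection.connect();
InputStream in = urlConnection.getInputStream();
byte[] data = new byte[8192];
int length;
// Copy the bytes unchanged so that a multi-byte character split across two reads is not corrupted.
while ((length = in.read(data)) != -1) {
    System.out.write(data, 0, length);
}
System.out.flush();
in.close();
urlConnection.disconnect();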

Note: the above code isn't production grade--just an example.
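
If you want the page as a String for further processing rather than printed, one option is to buffer all the bytes first and decode them once. This is only a sketch: StreamText and readAll are made-up names, and the charset is an assumption you would have to take from the page's Content-Type header or meta tag.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StreamText {
    // Hypothetical helper: buffers the whole stream, then decodes it in one step
    // so that characters spanning two reads are decoded correctly.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] data = new byte[8192];
        int length;
        while ((length = in.read(data)) != -1) {
            buffer.write(data, 0, length);
        }
        return new String(buffer.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for urlConnection.getInputStream(); UTF-8 is assumed here.
        byte[] sample = "<p>sample page</p>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(sample), StandardCharsets.UTF_8));
    }
}
```

With the connection code above you would call it as readAll(urlConnection.getInputStream(), charset), where charset is whatever encoding the site actually uses.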


Here is a blog post describing how to get HTML from a URL using the Java SDK or Apache Commons HttpClient. Once you have the HTML, there is a lot you can do with it.

  1. Extract the Text from the Markup
  2. Extract Links
  3. Change Links
  4. Collect Email Addresses
  5. Collect Images
  6. Add Syntax Highlighting
  7. Diff Two Sources

READ HTML WITH JAVA – THEN 7 FUN THINGS TO DO TO IT
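
To give a rough idea of the first two items, here is a small sketch using only java.util.regex. It is not taken from the blog post; HtmlSketch, extractLinks and extractText are made-up names. Regular expressions are not a real HTML parser, so this assumes simple markup with double-quoted href attributes and will give poor results on malformed pages or on script and style content.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HtmlSketch {
    // Matches the value of a double-quoted href attribute inside an <a ...> tag.
    private static final Pattern HREF =
            Pattern.compile("<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);
    // Matches any tag, such as <p> or </a>.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Returns the href values in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Replaces tags with spaces, then collapses runs of whitespace.
    static String extractText(String html) {
        return TAG.matcher(html).replaceAll(" ").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<p>Hello <a href=\"http://example.com/a\">world</a></p>";
        System.out.println(extractLinks(html));
        System.out.println(extractText(html));
    }
}
```

For real pages a proper HTML parser is the safer choice; the regex version is only meant to show the shape of the task.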


If you are building a desktop application, you can use XULRunner and inject JavaScript to show the result. I did a project that worked with malformed web pages. If you use JDOM, you will get plenty of errors, but XULRunner is very good at handling these pages.

An easier way to do the same thing is by using JavaScript Bookmarklets. For example: http://www.mattcutts.com/blog/javascript-bookmarklet-basics/

Embed your JavaScript in the URL and send the result via AJAX to your Java server.
