How can I get all page content?
I want to get the full content of a web page, for example: http://academic.research.microsoft.com/Author/1789765/hoang-kiem?query=hoang%20kiem
I used this code:
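import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

String getResults(URL source) throws IOException {
    // try-with-resources closes the stream even if reading fails
    try (InputStream in = source.openStream()) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1) {
            out.write(buffer, 0, bytesRead);
        }
        // Decode all bytes at once instead of casting each byte to char;
        // UTF-8 is an assumption about the page's encoding
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}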
But the result is missing some information, such as the details about the author.
Can you give me some advice? Thanks!
The author details are loaded by Ajax calls (open the "Net" tab in Firebug and reload the page to see them). If you want to get these details, you will have to load the page in an environment that executes JavaScript (i.e., a browser).
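If running a full browser is not an option, one alternative that may work is to request the Ajax URL directly, once you have found it in the Net tab. The following is only a minimal sketch under assumptions: the class and method names (AjaxFetch, openAsAjax, readAll, fetch) are made up for illustration, the response is assumed to be UTF-8, and the X-Requested-With header is something many JavaScript libraries send, which this particular site may or may not check.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; none of these names come from the site or from a library.
final class AjaxFetch {

    // Prepares (but does not yet send) a request that carries the header
    // many JavaScript libraries add to their Ajax calls.
    static HttpURLConnection openAsAjax(URL endpoint) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) endpoint.openConnection();
        conn.setRequestProperty("X-Requested-With", "XMLHttpRequest");
        return conn;
    }

    // Reads the whole stream and decodes the bytes with the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), charset);
    }

    // Sends the request and returns the response body as text.
    // Assumption: the server answers in UTF-8.
    static String fetch(URL endpoint) throws IOException {
        HttpURLConnection conn = openAsAjax(endpoint);
        try (InputStream in = conn.getInputStream()) {
            return readAll(in, StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

You would call AjaxFetch.fetch with the request URL copied from the Net tab. The response is often JSON or an HTML fragment rather than a full page, so it may need further parsing, and the request may still fail if the site expects cookies or other headers that the browser sends.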
I am pretty sure this content is loaded into the page by JavaScript, and there's not really anything you can do about that when retrieving the page text from Java. You'll probably want to use a browser plugin instead (Firefox has the largest repository of add-ons).