How can I get all of a page's content?

I want to get all of the page content of a website. Example: http://academic.research.microsoft.com/Author/1789765/hoang-kiem?query=hoang%20kiem

I used this code:

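    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.Reader;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    String getResults(URL source) throws IOException {
        StringBuilder sb = new StringBuilder();
        // Decode the stream as UTF-8 (an assumption about the page's encoding)
        // rather than casting each byte to a char, which garbles non-ASCII text.
        try (Reader in = new InputStreamReader(source.openStream(), StandardCharsets.UTF_8)) {
            char[] buffer = new char[256];
            int charsRead;
            while ((charsRead = in.read(buffer)) != -1) {
                sb.append(buffer, 0, charsRead);
            }
        }
        return sb.toString();
    }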

But the result is missing some information, such as some of the details about the author.

How can I get all of the page content?

Can you give me some advice? Thanks!


The author details are loaded by Ajax calls (open the "Net" tab in Firebug and reload the page to see them). If you want to get these details, you will have to load the page in an environment that executes JavaScript (i.e., a browser).
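If Firebug is not at hand, a rough check can also be done from Java on the raw HTML that was downloaded. The sketch below is only a heuristic, not a real HTML parser: the class name `StaticHtmlCheck`, its methods, the sample markup, and the `/js/author.js` path are all made up for illustration. It lists the external scripts a page references and reports whether a piece of text you expected is absent from the static HTML even though scripts are present, which suggests the text is filled in later by JavaScript.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
class StaticHtmlCheck {

    // Matches <script ... src="..."> or src='...' case-insensitively.
    // A rough heuristic only; it does not handle every valid HTML form.
    private static final Pattern SCRIPT_SRC = Pattern.compile(
            "<script\\b[^>]*\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Returns the src value of every external script found in the raw HTML. */
    static List<String> scriptSources(String html) {
        List<String> sources = new ArrayList<>();
        Matcher m = SCRIPT_SRC.matcher(html);
        while (m.find()) {
            sources.add(m.group(1));
        }
        return sources;
    }

    /** True if the page contains scripts but the expected text is not in the raw HTML. */
    static boolean probablyLoadedByScript(String html, String expectedText) {
        boolean hasScript = html.toLowerCase().contains("<script");
        return hasScript && !html.contains(expectedText);
    }

    public static void main(String[] args) {
        // Made-up sample page: an empty placeholder div plus an external script.
        String html = "<html><head><script src=\"/js/author.js\"></script></head>"
                + "<body><div id=\"details\"></div></body></html>";
        System.out.println(scriptSources(html));
        System.out.println(probablyLoadedByScript(html, "Publications"));
    }
}
```

In practice you would pass in the string returned by `getResults` together with a phrase you can see in the browser but not in the downloaded text. A positive result only hints at dynamic loading; the browser's network view remains the reliable way to confirm it.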


I am pretty sure this content is loaded into the page by JavaScript, and there is not really anything you can do about that when retrieving the page text with plain Java. You'll probably want to use a browser plugin instead (Firefox has the largest repository of add-ons).
