Fetching a webpage using the Java Socket class

Good evening, everyone.

I want to fetch a webpage using the Socket class in Java. This is what I have so far:

import java.net.*;
import java.io.*;

class htmlPageFetch{
        public static void main(String[] args){
                try{
                        Socket s = new Socket("127.0.0.1", 80);
                        DataInputStream dIn = new DataInputStream(s.getInputStream());
                        DataOutputStream dOut = new DataOutputStream(s.getOutputStream());
                        dOut.write("GET /index.php HTTP/1.0\n\n".getBytes());
                        boolean more_data = true;
                        String str;
                        while(more_data){
                                str = dIn.readLine();
                                if(str==null)
                                        more_data = false;
                                System.out.println(str);
                        }
                }catch(IOException e){

                }
        }
}

But it is just printing nulls.

Output

HTTP/1.1 302 Found
Date: Wed, 01 Dec 2010 13:49:02 GMT
Server: Apache/2.2.11 (Unix) DAV/2 mod_ssl/2.2.11 OpenSSL/0.9.8k PHP/5.2.9 mod_apreq2-20051231/2.6.0 mod_perl/2.0.4 Perl/v5.10.0
X-Powered-By: PHP/5.2.9
Location: http://localhost/xampp/
Content-Length: 0
Content-Type: text/html

null


I'm not sure if this is causing your problem, but HTTP expects carriage return and line feed for a newline:

dOut.write("GET /index.php HTTP/1.0\r\n\r\n".getBytes());

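Also, it wouldn't hurt to flush your DataOutputStream after writing the request, and to close it once you have finished reading the response (closing a socket's output stream also closes the socket itself):

dOut.flush();
// ... read the response ...
dOut.close();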

If you plan on doing anything more with this code than connecting to simple test cases, I'd recommend using HttpURLConnection instead of implementing HTTP over a socket yourself. Otherwise, the result will contain more than just the web page: it will also contain the HTTP response status line and headers, and your code would need to parse them.
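To give an idea of what that parsing involves, here is a minimal sketch that splits a raw response into status code, headers, and body. The class name RawHttpResponse and its fields are made up for illustration, and it assumes the whole response has already been read into one String with CRLF line endings. It does not handle folded headers, repeated headers, or chunked bodies.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the JDK: holds the pieces of a raw HTTP response.
class RawHttpResponse {
    final int statusCode;
    final Map<String, String> headers; // header names stored in lower case
    final String body;

    private RawHttpResponse(int statusCode, Map<String, String> headers, String body) {
        this.statusCode = statusCode;
        this.headers = headers;
        this.body = body;
    }

    // Assumes 'raw' is a complete response using CRLF line endings.
    static RawHttpResponse parse(String raw) {
        // A blank line (CRLF CRLF) separates the head from the body.
        int split = raw.indexOf("\r\n\r\n");
        String head = split >= 0 ? raw.substring(0, split) : raw;
        String body = split >= 0 ? raw.substring(split + 4) : "";

        String[] lines = head.split("\r\n");
        // The status line looks like "HTTP/1.1 302 Found"; the code is the second token.
        int status = Integer.parseInt(lines[0].trim().split(" ")[1]);

        Map<String, String> headers = new LinkedHashMap<>();
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':'); // first colon ends the header name
            if (colon > 0) {
                String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
                headers.put(name, lines[i].substring(colon + 1).trim());
            }
        }
        return new RawHttpResponse(status, headers, body);
    }

    public static void main(String[] args) {
        String raw = "HTTP/1.1 302 Found\r\n"
                + "Location: http://localhost/xampp/\r\n"
                + "Content-Length: 0\r\n"
                + "\r\n";
        RawHttpResponse r = parse(raw);
        System.out.println(r.statusCode);
        System.out.println(r.headers.get("location"));
    }
}
```

Header names are lower-cased here because HTTP header names are case-insensitive, so a lookup by "location" works regardless of how the server spelled it.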

Update:

Looking at the response you added, the 302 status together with the Location: header indicates that the page you are looking for has moved to http://localhost/xampp/ (see HTTP 302), and there is no content at the original URL. HttpURLConnection, or another library such as Apache HttpClient, can be set to handle this automatically. With a raw socket, you will need to parse the status code, parse the headers, open a new socket to the host named in the Location header, and request the page from there. Depending on the exact requirements of your assignment, you will probably want to familiarize yourself with the HTTP 1.0 Specification, and the HTTP 1.1 Specification as well.
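As a rough sketch of the "open a new socket to the Location" step, the following turns a Location value into the host, port, and path needed for the follow-up request. The class RedirectTarget and its method names are hypothetical, and it assumes the Location header holds an absolute http:// URL, as in the output above. The Host header is added as a precaution for servers that use virtual hosts; HTTP/1.0 does not require it.

```java
import java.net.URI;

// Hypothetical helper: where to connect, and what to ask for, after a redirect.
class RedirectTarget {
    final String host;
    final int port;
    final String path;

    private RedirectTarget(String host, int port, String path) {
        this.host = host;
        this.port = port;
        this.path = path;
    }

    // Assumes 'location' is an absolute http:// URL.
    static RedirectTarget fromLocation(String location) {
        URI uri = URI.create(location);
        // getPort() returns -1 when the URL has no explicit port; fall back to 80.
        int port = uri.getPort() == -1 ? 80 : uri.getPort();
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (uri.getRawQuery() != null) {
            path += "?" + uri.getRawQuery();
        }
        return new RedirectTarget(uri.getHost(), port, path);
    }

    // Request text with CRLF line endings and the terminating blank line.
    String requestText() {
        return "GET " + path + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n"
                + "\r\n";
    }

    public static void main(String[] args) {
        RedirectTarget t = fromLocation("http://localhost/xampp/");
        System.out.println(t.host + ":" + t.port + " " + t.path);
        System.out.print(t.requestText());
    }
}
```

With these values you would create new Socket(t.host, t.port), write t.requestText() to its output stream, and read the response as before. A fuller version would also cap the number of redirects it follows to avoid loops.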


I think the code is working, except maybe that you don't see the output because it's swamped by the nulls you print. You should stop the while loop at the first null, before printing it. More generally, DataInputStream and DataOutputStream are not the right classes for this job (DataInputStream.readLine() is deprecated). Try this code.

public static void main(String[] args) throws IOException {
    Socket s = new Socket("127.0.0.1", 80);
    BufferedReader dIn = new BufferedReader(new InputStreamReader(s.getInputStream()));
    PrintStream dOut = new PrintStream(s.getOutputStream());
    // HTTP expects CRLF line endings; println() would use the platform separator.
    dOut.print("GET /index.php HTTP/1.0\r\n\r\n");
    dOut.flush();
    String str;
    // readLine() returns null at end of stream, so stop before printing it.
    while ((str = dIn.readLine()) != null) {
        System.out.println(str);
    }
    s.close();
}


Why are you using a socket directly to make an HTTP connection? It is a fine exercise, but it requires detailed knowledge of the HTTP protocol. Why not just use the URL and URLConnection classes?

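// An InputStream must be wrapped in an InputStreamReader before BufferedReader can read it.
BufferedReader dIn = new BufferedReader(new InputStreamReader(
        new URL("http://127.0.0.1:80").openConnection().getInputStream()));
String str;
// Stop at the first null (end of stream) instead of printing it.
while ((str = dIn.readLine()) != null) {
    System.out.println(str);
}
dIn.close();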