
HTTPS page fetched on Android gets truncated

I am fetching a web page on Android using HTTPS (ignoring the certificate as it is both self-signed and outdated, as seen here - don't ask, it's not my server :)).

I've defined my own client class:

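import org.apache.http.conn.ClientConnectionManager;
import org.apache.http.conn.params.ConnManagerParams;
import org.apache.http.conn.scheme.PlainSocketFactory;
import org.apache.http.conn.scheme.Scheme;
import org.apache.http.conn.scheme.SchemeRegistry;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.conn.SingleClientConnManager;
import org.apache.http.params.HttpConnectionParams;
import org.apache.http.params.HttpParams;

public class MyHttpClient extends DefaultHttpClient {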
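    // Hypothetical value in milliseconds: the question does not show how REGISTRATION_TIMEOUT is defined.
    private static final int REGISTRATION_TIMEOUT = 30 * 1000;
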
    public MyHttpClient() {
        super();
        final HttpParams params = getParams();
        HttpConnectionParams.setConnectionTimeout(params,
                REGISTRATION_TIMEOUT);
        HttpConnectionParams.setSoTimeout(params, REGISTRATION_TIMEOUT);
        ConnManagerParams.setTimeout(params, REGISTRATION_TIMEOUT);
    }

    @Override
    protected ClientConnectionManager createClientConnectionManager() {
        SchemeRegistry registry = new SchemeRegistry();
        registry.register(new Scheme("http", PlainSocketFactory
                .getSocketFactory(), 80));
        registry.register(new Scheme("https", new UnsecureSSLSocketFactory(), 443));
        return new SingleClientConnManager(getParams(), registry);
    }
}

where UnsecureSSLSocketFactory is based on the suggestion given in the thread linked above.

I'm then using this class to fetch a page:

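import android.util.Log;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;

public class HTTPHelper {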

    private final static String TAG = "HTTPHelper";
    private final static String CHARSET = "ISO-8859-1";

    public static final String USER_AGENT = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 (.NET CLR 3.5.30729)";
    public static final String ACCEPT_CHARSET = "ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    public static final String ACCEPT = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";


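    /**
     * Fetches a page with an HTTP GET request.
     * @param url the URL of the page to fetch
     * @param post currently unused; the request is always a GET
     * @return the page source, decoded as CHARSET
     */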
    public String sendRequest(String url, String post) throws ConnectionException {

        MyHttpClient httpclient = new MyHttpClient();

        HttpGet httpget = new HttpGet(url);
        httpget.addHeader("User-Agent", USER_AGENT);
        httpget.addHeader("Accept", ACCEPT);
        httpget.addHeader("Accept-Charset", ACCEPT_CHARSET);

        HttpResponse response;
        try {
            response = httpclient.execute(httpget);
        } catch (Exception e) {
            throw new ConnectionException(e.getMessage());
        }

        HttpEntity entity = response.getEntity();
        String pageSource;

        try {
            pageSource = convertStreamToString(entity.getContent());
        } catch (Exception e) {
            throw new ConnectionException(e.getMessage());
        }
        finally {
            if (entity != null) {
                try {
                    entity.consumeContent();
                } catch (IOException e) {
                    throw new ConnectionException(e.getMessage());
                }
            }
        }

        httpclient.getConnectionManager().shutdown();
        return pageSource;

    }

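    /**
     * Reads a stream line by line into a string, closing the stream when done.
     * @param is the stream to read
     * @return the content, with every line terminated by "\n"
     */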
    private static String convertStreamToString(InputStream is) 
    {
        try {
            BufferedReader reader = new BufferedReader(new InputStreamReader(is, CHARSET));
            StringBuilder stringBuilder = new StringBuilder();
            String line = null;
            try {
                while ((line = reader.readLine()) != null) {
                    stringBuilder.append(line + "\n");
                }
            } catch (IOException e) {
                Log.d(TAG, "Exception in convertStreamToString", e);
            } finally {
                try {
                    is.close();
                } catch (IOException e) {}
            }
            return stringBuilder.toString();
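        } catch (UnsupportedEncodingException e) {
            // Only the InputStreamReader constructor throws this: CHARSET is not supported.
            throw new Error("Unsupported charset", e);
        }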
    }

}
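As a side note, readLine() discards the original line terminators and the loop above appends "\n" instead, so the returned string is not byte-for-byte the page source (this does not cause any truncation). If the exact content matters, a buffer-based read is an alternative. A minimal sketch, assuming Java 7 or later for try-with-resources (Android API 19+); the class name StreamReading and the method readFully are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class StreamReading {
    /** Reads the whole stream into a String, keeping line endings exactly as sent. */
    static String readFully(InputStream is, Charset charset) throws IOException {
        // try-with-resources closes the reader (and the wrapped stream) even on error.
        try (Reader reader = new InputStreamReader(is, charset)) {
            StringBuilder sb = new StringBuilder();
            char[] buffer = new char[8192];
            int n;
            // read() returns -1 only at end of stream, so nothing is dropped.
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Charset latin1 = Charset.forName("ISO-8859-1");
        byte[] body = "last_record\r\nvalue\n".getBytes(latin1);
        String text = readFully(new ByteArrayInputStream(body), latin1);
        System.out.println("read " + text.length() + " chars");
    }
}
```

In the helper above, the entity's stream would be passed as readFully(entity.getContent(), Charset.forName(CHARSET)).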

The page I get is truncated after a hundred or so lines. It's cut off at a precise point, where an '_' (underscore) character is followed by an 'r' character. It's not the first underscore in the page.

I thought it might be an encoding issue, so I tried both UTF-8 and ISO-8859-1, but the page is still truncated. If I open the page in Firefox, it reports the encoding as ISO-8859-1.

In case you are wondering, the web page is https://ricarichiamoci.dsu.pisa.it/ and it gets truncated at line 169:

function ChangeOffset(NewOffset) {
  document.mainForm.last

where it should instead be

function ChangeOffset(NewOffset) {
  document.mainForm.last_record.value = NewOffset;

Does anyone have an idea of why the page is truncated?


I figured out that the downloaded page is not truncated after all: the function I was using to print it, Log.d(), is what truncates the string.
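One way to confirm that the downloaded string is complete, without trusting what logcat displays, is to check it in code and log only a short summary. A minimal sketch; the class TruncationCheck and the method containsMarker are hypothetical, and the marker is the line the question expected to find near line 169:

```java
public class TruncationCheck {
    // Text the question expected near line 169 of the page. If the downloaded
    // String contains it, the download itself was not cut short there.
    static final String MARKER = "document.mainForm.last_record.value = NewOffset;";

    static boolean containsMarker(String pageSource) {
        return pageSource.contains(MARKER);
    }

    public static void main(String[] args) {
        String complete = "function ChangeOffset(NewOffset) {\n"
                + "  document.mainForm.last_record.value = NewOffset;\n";
        String cut = "function ChangeOffset(NewOffset) {\n  document.mainForm.last";
        // In the app, log short summaries like these instead of the whole page.
        System.out.println("complete: length=" + complete.length() + ", marker=" + containsMarker(complete));
        System.out.println("cut: length=" + cut.length() + ", marker=" + containsMarker(cut));
    }
}
```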

So the code that downloads the page source works fine; Log.d() simply is not meant to print that much text. Logcat caps each log entry at roughly 4 KB, so anything longer is cut off.
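A common workaround is to split a long string and log each piece separately. A minimal sketch, assuming a chunk size of 3000 characters (a chosen value, not a documented constant: it stays below the roughly 4 KB per-entry limit while leaving headroom for some multi-byte characters; mostly non-ASCII text may need a smaller value). LogChunker and chunks are hypothetical names; main prints with System.out so the sketch runs off-device, and in an app each piece would go to Log.d(TAG, piece):

```java
import java.util.ArrayList;
import java.util.List;

public class LogChunker {
    // Assumed safe chunk size in chars, kept well under logcat's ~4 KB entry limit.
    static final int MAX_CHUNK = 3000;

    /** Splits text into pieces of at most maxLen chars without cutting a surrogate pair. */
    static List<String> chunks(String text, int maxLen) {
        if (maxLen < 2) {
            throw new IllegalArgumentException("maxLen must be at least 2");
        }
        List<String> parts = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(start + maxLen, text.length());
            // Do not end a chunk between the two halves of a surrogate pair.
            if (end < text.length() && Character.isHighSurrogate(text.charAt(end - 1))) {
                end--;
            }
            parts.add(text.substring(start, end));
            start = end;
        }
        return parts;
    }

    public static void main(String[] args) {
        StringBuilder page = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            page.append("line ").append(i).append('\n');
        }
        List<String> parts = chunks(page.toString(), MAX_CHUNK);
        for (int i = 0; i < parts.size(); i++) {
            // On Android, replace this with: Log.d(TAG, parts.get(i));
            System.out.println("part " + (i + 1) + "/" + parts.size() + ": "
                    + parts.get(i).length() + " chars");
        }
    }
}
```

Logging a "part i/n" prefix, as main does, makes it easy to spot a missing or reordered piece when reassembling the output from logcat.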
