Jsoup Cookies for HTTPS scraping
To learn Jsoup and Android, I am experimenting with this site and trying to scrape my username from the welcome page. I am using the following code:
Connection.Response res = Jsoup.connect("http://www.mikeportnoy.com/forum/login.aspx")
    .data("ctl00$ContentPlaceHolder1$ctl00$Login1$UserName", "username", "ctl00$ContentPlaceHolder1$ctl00$Login1$Password", "password")
    .method(Method.POST)
    .execute();
String sessionId = res.cookie(".ASPXAUTH");
Document doc2 = Jsoup.connect("http://www.mikeportnoy.com/forum/default.aspx")
    .cookie(".ASPXAUTH", sessionId)
    .get();
My cookie (.ASPXAUTH) always ends up null. If I delete this cookie in a web browser, I lose my session, so I am sure it is the correct cookie. In addition, if I hard-code the cookie in the code
.cookie(".ASPXAUTH", "jkaldfjjfasldjf")
(using the correct value, of course), I am able to scrape my login name from this page. This also makes me think I have the correct cookie. So why does my cookie come back null? Are my username and password field names incorrect? Is it something else?
Thanks.
I know I'm kinda late by 10 months here. But a good option using Jsoup is to use this easy peasy piece of code:
// This will get you the response.
Response res = Jsoup
    .connect("url")
    .data("loginField", "login@login.com", "passField", "pass1234")
    .method(Method.POST)
    .execute();
// This will get you the cookies.
Map<String, String> cookies = res.cookies();
// And this is the easiest way I've found to remain in session.
Document doc = Jsoup.connect("url").cookies(cookies).get();
Though I'm still having trouble connecting to SOME websites, I connect to a whole lot of them with the same basic piece of code. Oh, and before I forget: I figured out that my problem is SSL certificates. You have to manage them properly, in a way I still haven't quite figured out.
I always do this in two steps (like a normal human would):
- Read login page (by GET, read cookies)
- Submit form and cookies (by POST, without cookie manipulation)
Example:
Connection.Response response = Jsoup.connect("http://www.mikeportnoy.com/forum/login.aspx")
    .method(Connection.Method.GET)
    .execute();
response = Jsoup.connect("http://www.mikeportnoy.com/forum/login.aspx")
    .data("ctl00$ContentPlaceHolder1$ctl00$Login1$UserName", "username")
    .data("ctl00$ContentPlaceHolder1$ctl00$Login1$Password", "password")
    .cookies(response.cookies())
    .method(Connection.Method.POST)
    .execute();
Document homePage = Jsoup.connect("http://www.mikeportnoy.com/forum/default.aspx")
    .cookies(response.cookies())
    .get();
And always pass the cookies from the previous response on to the next request using
.cookies(response.cookies())
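One thing the advice above leaves open is what happens when different responses set different cookies: passing only the latest response's cookies would drop anything set earlier. Below is a minimal sketch of one way to carry them all forward, assuming a small helper class. `SessionCookies`, `absorb`, and `asMap` are hypothetical names, not part of Jsoup, and the cookie names and values in `main` are made up for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Jsoup): accumulates cookies across responses.
final class SessionCookies {
    private final Map<String, String> jar = new LinkedHashMap<>();

    // Copy every cookie from one response into the jar.
    // A cookie that is set again replaces the older value.
    public SessionCookies absorb(Map<String, String> responseCookies) {
        jar.putAll(responseCookies);
        return this;
    }

    // Return a copy that can be handed to the next request.
    public Map<String, String> asMap() {
        return new LinkedHashMap<>(jar);
    }

    public static void main(String[] args) {
        SessionCookies session = new SessionCookies();
        // Illustrative values only: pretend the GET set a session cookie...
        session.absorb(Collections.singletonMap("ASP.NET_SessionId", "abc"));
        // ...and the login POST set the auth cookie.
        session.absorb(Collections.singletonMap(".ASPXAUTH", "token"));
        System.out.println(session.asMap().keySet());
    }
}
```

With Jsoup this would presumably be used as `session.absorb(response.cookies())` after each `execute()`, and `.cookies(session.asMap())` on each following request.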
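SSL is not important here. If you have problems with certificates, call this method to skip SSL certificate checks. Note that it turns off certificate and hostname validation for every HTTPS connection made through HttpsURLConnection in the process, so use it only for testing.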
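// Imports needed by trustEveryone():
import java.security.SecureRandom;
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSession;
import javax.net.ssl.X509TrustManager;

// Installs a hostname verifier and trust manager that accept everything.
public static void trustEveryone() {
    try {
        // Accept any hostname, whatever the certificate says.
        HttpsURLConnection.setDefaultHostnameVerifier(new HostnameVerifier() {
            public boolean verify(String hostname, SSLSession session) {
                return true;
            }
        });
        SSLContext context = SSLContext.getInstance("TLS");
        // Trust manager whose checks never throw, so every certificate chain passes.
        context.init(null, new X509TrustManager[]{new X509TrustManager() {
            public void checkClientTrusted(X509Certificate[] chain, String authType) throws CertificateException { }
            public void checkServerTrusted(X509Certificate[] chain, String authType) throws CertificateException { }
            public X509Certificate[] getAcceptedIssuers() {
                return new X509Certificate[0];
            }
        }}, new SecureRandom());
        HttpsURLConnection.setDefaultSSLSocketFactory(context.getSocketFactory());
    } catch (Exception e) { // should never happen
        e.printStackTrace();
    }
}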
What if you try fetching and passing all the cookies, without assuming anything, as in this question: "Sending POST request with username and password and save session cookie"?
If you still have problems, try looking into this one: "Issues with passing cookies to GET request (after POST)".