How can I extract some data from a website and email it?
As an example, suppose I want my program to:
- Visit Stack Overflow every day
- Find the top question in some tag for that day
- Format it and then send it to my email address

I don't know how to do this. I know PHP better, but I also have some understanding of Java, J2EE, and Spring MVC (though not Java network programming).

Any guidelines on how I should proceed?
I'd start by looking at the Stack Exchange API.
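For illustration, here is a minimal sketch of what a call to that API might look like in Java. The endpoint and query parameters below are assumptions based on the 2.x API docs (which serve gzip-compressed JSON), so verify them against the current documentation:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.util.zip.GZIPInputStream;

    public class TopQuestionFetcher {
        public static void main(String[] args) throws Exception {
            // Assumed endpoint/parameters: top-voted "java" questions.
            // The API serves gzip-compressed JSON, hence the GZIPInputStream.
            URL url = new URL("https://api.stackexchange.com/2.2/questions"
                    + "?order=desc&sort=votes&tagged=java&site=stackoverflow");
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    new GZIPInputStream(url.openStream()), "UTF-8"));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // raw JSON; feed this to a JSON parser
            }
            in.close();
        }
    }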
What you can do is extract the contents of the URL into a string and then use jsoup (a library for parsing HTML) to pull out the elements you want. I have a small sample which does exactly that: it reads all the contents of the URL into a string, then selects content by CSS class (in this case, question-hyperlink).
package com.tps.examples;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class URLGetcontent {

    public static void main(String[] args) {
        try {
            URL url = new URL("http://stackoverflow.com/questions");
            URLConnection conn = url.openConnection();
            // Note: no setDoOutput(true) here; that is only needed when writing
            // a request body, and on an HTTP connection it would turn this GET
            // into a POST.

            // Read the whole response into one string
            BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
            StringBuilder str = new StringBuilder();
            String line;
            while ((line = rd.readLine()) != null) {
                str.append(line);
            }
            rd.close();

            // Parse the HTML and select every element with class "question-hyperlink"
            Document doc = Jsoup.parse(str.toString());
            Elements content = doc.getElementsByClass("question-hyperlink");
            for (int i = 0; i < content.size(); i++) {
                System.out.println("Question " + (i + 1) + ": " + content.eq(i).text());
                System.out.println();
            }
            System.out.println("*********************************");
        } catch (Exception e) {
            e.printStackTrace(); // don't swallow errors silently
        }
    }
}
Once the data is extracted, you can use the JavaMail API to send the content by email.
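For example, a minimal JavaMail sketch; the SMTP host and addresses below are placeholders you would replace with your own:

    import java.util.Properties;
    import javax.mail.Message;
    import javax.mail.Session;
    import javax.mail.Transport;
    import javax.mail.internet.InternetAddress;
    import javax.mail.internet.MimeMessage;

    public class DigestMailer {
        public static void sendDigest(String body) throws Exception {
            // Placeholder SMTP host and addresses; substitute your own.
            Properties props = new Properties();
            props.put("mail.smtp.host", "smtp.example.com");

            Session session = Session.getInstance(props);
            Message msg = new MimeMessage(session);
            msg.setFrom(new InternetAddress("digest@example.com"));
            msg.setRecipient(Message.RecipientType.TO, new InternetAddress("me@example.com"));
            msg.setSubject("Today's top questions");
            msg.setText(body);
            Transport.send(msg);
        }
    }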
Since you want to retrieve data from a website (i.e. over HTTP), you probably want to look into using one of the many HTTP clients already written in Java or PHP.
Apache HTTP Client is a good Java client used by many people. If you're invoking a RESTful interface, Jersey has a nice client library.
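As a rough sketch of what fetching a page with Apache HttpClient 4.x looks like (assuming the 4.3+ fluent builders):

    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.util.EntityUtils;

    public class PageFetcher {
        public static void main(String[] args) throws Exception {
            CloseableHttpClient client = HttpClients.createDefault();
            try {
                HttpGet get = new HttpGet("http://stackoverflow.com/questions");
                CloseableHttpResponse response = client.execute(get);
                try {
                    // Turn the response body into one big HTML string
                    String html = EntityUtils.toString(response.getEntity());
                    System.out.println(html.length() + " characters fetched");
                } finally {
                    response.close();
                }
            } finally {
                client.close();
            }
        }
    }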
On the PHP side of things, someone already mentioned file_get_contents... but you could also look into the cURL library.
As far as emailing goes, there's the JavaMail API (I'll admit I'm not familiar with it), or depending on your email server you might jump through other hoops (for example, Exchange can send email through its SOAP interfaces).
With file_get_contents() in PHP you can also fetch files via HTTP:
$stackoverflow = file_get_contents("http://stackoverflow.com");
Then you have to parse the HTML yourself. Many sites also offer dedicated APIs that return JSON or XML, which is much easier to work with.
If you know shell scripting (that's the way I do it for many sites; it works great with a cronjob :)), then you can use sed, wget, w3m, grep, and mail to do it.
Stack Overflow and the other Stack Exchange sites provide a simple API (see stackapps). Please check it out.