
How can I extract some data from a website and email it?

As an example, suppose I want my program to:

  1. Visit Stack Overflow every day
  2. Find the top question in some tag for that day
  3. Format it and then send it to my email address

I don't know how to do this. I know PHP better, and I have some understanding of Java, J2EE, and Spring MVC, but not Java network programming.

Any guidelines on how I should proceed?


I'd start by looking at the Stack Exchange API.
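For example, a minimal sketch of asking the API for the top-voted question of the last day might look like this (this assumes the /2.3/questions endpoint and the java tag, both of which you'd adjust to taste; note that the API gzips every response):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

public class TopQuestionFetcher {

    public static void main(String[] args) throws Exception {
        // Ask for the single top-voted question tagged "java" from the last 24 hours.
        long now = System.currentTimeMillis() / 1000L;
        long oneDayAgo = now - 24 * 60 * 60;
        URL url = new URL("https://api.stackexchange.com/2.3/questions"
                + "?order=desc&sort=votes&tagged=java&site=stackoverflow"
                + "&fromdate=" + oneDayAgo + "&pagesize=1");

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();

        // The Stack Exchange API compresses its responses, so unwrap the gzip stream.
        BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(conn.getInputStream()), "UTF-8"));
        StringBuilder json = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            json.append(line);
        }
        in.close();

        // The payload is JSON; hand it to any JSON parser and pull out the
        // "title" and "link" fields of the first item.
        System.out.println(json);
    }
}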


What you can do is read the contents of the URL into a string buffer and then parse the HTML string using jsoup (a library for parsing HTML) to get the content of your choice. I have a small sample which does exactly that: it reads all the contents of the URL into a string and then parses the content based on the CSS class (in this case question-hyperlink).

package com.tps.examples;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class URLGetcontent {

    public static void main(String args[]) {

        try {

            // Plain GET request; no output stream is needed for reading a page.
            URL url = new URL("http://stackoverflow.com/questions");
            URLConnection conn = url.openConnection();

            // Read the whole response body into a single string.
            BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
            String line;
            StringBuilder str = new StringBuilder();

            while ((line = rd.readLine()) != null) {
                str.append(line);
            }
            rd.close();

            // Parse the HTML and select every element with class "question-hyperlink".
            Document doc = Jsoup.parse(str.toString());
            Elements content = doc.getElementsByClass("question-hyperlink");

            for (int i = 0; i < content.size(); i++) {
                System.out.println("Question: " + (i + 1) + ".  " + content.get(i).text());
                System.out.println("");
            }
            System.out.println("*********************************");

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Once the data is extracted, you can use the JavaMail API to send the content by email.
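A bare-bones sketch of mailing the extracted text with JavaMail might look like this (the SMTP host and the addresses here are placeholders; substitute your own server and accounts):

import java.util.Properties;

import javax.mail.Message;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;

public class QuestionMailer {

    public static void sendDigest(String body) throws Exception {
        // Point JavaMail at your outgoing mail server.
        Properties props = new Properties();
        props.put("mail.smtp.host", "smtp.example.com");
        props.put("mail.smtp.port", "25");

        Session session = Session.getInstance(props);

        Message msg = new MimeMessage(session);
        msg.setFrom(new InternetAddress("digest@example.com"));
        msg.setRecipient(Message.RecipientType.TO, new InternetAddress("you@example.com"));
        msg.setSubject("Today's top Stack Overflow question");
        msg.setText(body);

        Transport.send(msg);
    }
}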


Since you want to retrieve data from a website (i.e., over HTTP), you probably want to look into one of the many HTTP clients already written in Java or PHP.

Apache HTTP Client is a good Java client used by many people. If you're invoking a RESTful interface, Jersey has a nice client library.
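Fetching the questions page with Apache HTTP Client could look something like this (a sketch against the 4.x API; adjust for whichever version you pick up):

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HttpClientFetch {

    public static void main(String[] args) throws Exception {
        // try-with-resources closes both the client and the response for us.
        try (CloseableHttpClient client = HttpClients.createDefault();
             CloseableHttpResponse response =
                     client.execute(new HttpGet("http://stackoverflow.com/questions"))) {
            // Drain the response body into a string, ready for parsing.
            String html = EntityUtils.toString(response.getEntity());
            System.out.println(html.length() + " characters fetched");
        }
    }
}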

On the PHP side of things, someone already mentioned file_get_contents... but you could also look into the cURL library.

As far as emailing goes, there's the JavaMail API (I'll admit I'm not familiar with it), or depending on your email server you might jump through other hoops (for example, Exchange can send email through its SOAP interfaces).


With file_get_contents() in PHP you can also fetch files via HTTP:

$stackoverflow = file_get_contents("http://stackoverflow.com");

Then you have to parse the result. Many sites offer dedicated APIs which you can query for JSON or XML instead.

If you know shell scripting (that's the way I do it for many sites; it works great with a cron job :)), you can use sed, wget, w3m, grep, and mail to do it.


Stack Overflow and the other Stack Exchange sites provide a simple API (see Stack Apps). Please check it out.
