Extract a portion of a webpage

I'm developing an application for Android.

I have the content of a web page (all the HTML) in a String, and I need to extract all the text inside the paragraphs (p elements) with class="content".

Example:

<p class="content">La la la</p>
<p class="another">Le le le</p>
<p class="content">Li li li</p>

Result:

La la la
Li li li

What is the best approach to do this?


A regular expression would be your best bet.

http://download-llnw.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html
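As a rough sketch of what that could look like for the example in the question: the class name ContentExtractor and the method extractContent are made up for illustration, and the pattern assumes the attribute is written exactly as class="content" with double quotes and that the paragraphs are not nested. A regex is fragile on real-world HTML, so treat this as a starting point only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class ContentExtractor {

    // Matches <p ... class="content" ...>text</p> and captures the text.
    // [^>]* keeps the attribute match inside a single opening tag.
    // Assumption: the class attribute is exactly "content" in double quotes.
    private static final Pattern CONTENT_P = Pattern.compile(
            "<p\\b[^>]*\\bclass\\s*=\\s*\"content\"[^>]*>(.*?)</p>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the inner text of every matching paragraph, in document order.
    static List<String> extractContent(String html) {
        List<String> result = new ArrayList<String>();
        Matcher m = CONTENT_P.matcher(html);
        while (m.find()) {
            result.add(m.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<p class=\"content\">La la la</p>\n"
                + "<p class=\"another\">Le le le</p>\n"
                + "<p class=\"content\">Li li li</p>";
        for (String text : extractContent(html)) {
            System.out.println(text); // prints "La la la", then "Li li li"
        }
    }
}
```

If the paragraphs can carry several classes (for example class="content intro") or contain nested tags, the pattern would need adjusting, and an HTML parser may be the safer choice.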


import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class Test {

    // Reads the page from the server and prints it line by line.
    void readScreen() {
        try {
            // Open the URL
            URL url = new URL("http://somewebsite.com");

            // Note: a more portable URL:
            // url = new URL(getCodeBase().toString() + "/ToDoList/ToDoList.txt");

            URLConnection urlConn = url.openConnection();
            urlConn.setDoInput(true);
            urlConn.setUseCaches(false);

            // BufferedReader replaces the deprecated DataInputStream.readLine().
            // UTF-8 is an assumption; use the page's actual encoding.
            // try-with-resources closes the stream even if reading fails.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(urlConn.getInputStream(), "UTF-8"))) {
                String s;
                while ((s = reader.readLine()) != null) {
                    System.out.println(s); // one line of the page's HTML
                }
            }
        } catch (MalformedURLException mue) {
            System.err.println("Invalid URL: " + mue.getMessage());
        } catch (IOException ioe) {
            System.err.println("Failed to read page: " + ioe.getMessage());
        }
    }

    public static void main(String[] args) {
        Test thisTest = new Test();
        thisTest.readScreen();
    }
}