Extract portion of a webpage
I'm writing an Android application.
I have the content of a web page (all the HTML) in a String, and I need to extract all the text inside the paragraphs (p elements) with class="content".
Example:
<p class="content">La la la</p>
<p class="another">Le le le</p>
<p class="content">Li li li</p>
Result:
La la la
Li li li
What is the best approach to do this?
A regular expression would be your best bet.
http://download-llnw.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html
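To make the regex suggestion concrete, here is a minimal sketch using java.util.regex. The class name ContentExtractor and the method extractContent are made up for illustration, and the pattern assumes the attribute is written exactly as class="content" (double quotes, a single class name) and that p elements are not nested. Regular expressions are brittle on real-world HTML, so treat this as a starting point rather than a robust parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class ContentExtractor {

    // Matches <p ... class="content" ...>text</p> and captures the inner text.
    // DOTALL lets a paragraph span several lines; CASE_INSENSITIVE accepts <P>.
    private static final Pattern CONTENT_P = Pattern.compile(
            "<p\\s+(?:[^>]*?\\s)?class\\s*=\\s*\"content\"[^>]*>(.*?)</p>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed inner text of every matching paragraph, in document order.
    // Any tags nested inside the paragraph are returned as-is, not stripped.
    static List<String> extractContent(String html) {
        List<String> result = new ArrayList<String>();
        Matcher m = CONTENT_P.matcher(html);
        while (m.find()) {
            result.add(m.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<p class=\"content\">La la la</p>\n"
                + "<p class=\"another\">Le le le</p>\n"
                + "<p class=\"content\">Li li li</p>";
        for (String text : extractContent(html)) {
            System.out.println(text);
        }
    }
}
```

Run against the example from the question, this prints La la la and Li li li, skipping the paragraph with class="another".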
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class Test {
    // Downloads the page from the server and prints its HTML line by line.
    void readScreen() {
        try {
            // Open the URL
            URL url = new URL("http://somewebsite.com");
            // Note: a more portable URL:
            //url = new URL(getCodeBase().toString() + "/ToDoList/ToDoList.txt");
            URLConnection urlConn = url.openConnection();
            urlConn.setDoInput(true);
            urlConn.setUseCaches(false);
            // BufferedReader replaces the deprecated DataInputStream.readLine().
            // UTF-8 is an assumption; use the page's actual encoding if it differs.
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(urlConn.getInputStream(), "UTF-8"));
            try {
                String s;
                while ((s = reader.readLine()) != null) {
                    System.out.println(s); // each line of the page's HTML
                }
            } finally {
                reader.close();
            }
        } catch (MalformedURLException mue) {
            mue.printStackTrace(); // report the bad URL instead of silently ignoring it
        } catch (IOException ioe) {
            ioe.printStackTrace(); // report network or read failures
        }
    }

    public static void main(String[] args) {
        Test thisTest = new Test();
        thisTest.readScreen();
    }
}