Jsoup image tag extraction
I need to extract an image tag using jsoup from this HTML:
<div class="picture">
<img src="http://asdasd/aacb.jpgs" title="picture" alt="picture" />
</div>
I need to extract the src of this img tag. I am using the code below, but I am getting a null value:
Element masthead2 = doc.select("div.picture").first();
String linkText = masthead2.outerHtml();
Document doc1 = Jsoup.parse(linkText);
Element masthead3 = doc1.select("img[src]").first();
String linkText1 = masthead3.html();
Here's an example to get the image source attribute:
public static void main(String... args) {
    Document doc = Jsoup.parse("<div class=\"picture\"><img src=\"http://asdasd/aacb.jpgs\" title=\"picture\" alt=\"picture\" /></div>");
    Element img = doc.select("div.picture img").first();
    String imgSrc = img.attr("src");
    System.out.println("Img source: " + imgSrc);
}
The div.picture img selector finds the image element under the div.
The main extract methods on an element are:
attr(name), which gets the value of an element's attribute;
text(), which gets the text content of an element (e.g. in <p>Hello</p>, text() is "Hello");
html(), which gets an element's inner HTML (for <div><img></div>, html() is <img>); and
outerHtml(), which gets an element's full HTML (for <div><img></div>, outerHtml() is <div><img></div>).
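As a rough illustration of those four methods, here is a minimal sketch that runs them against the markup from the question plus a <p>Hello</p> paragraph. The class name ExtractMethodsDemo and its helper methods are made up for this example; only the jsoup calls (Jsoup.parse, select, first, attr, text, html, outerHtml) come from the library.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class; the names here are illustrative only.
public class ExtractMethodsDemo {
    // Markup from the question, plus a paragraph so text() has something to return.
    static final String HTML =
        "<div class=\"picture\"><img src=\"http://asdasd/aacb.jpgs\" title=\"picture\" alt=\"picture\" /></div>"
        + "<p>Hello</p>";

    static Document parse() {
        return Jsoup.parse(HTML);
    }

    // attr(name): the value of one attribute on the element.
    static String imgSrc() {
        return parse().select("div.picture img").first().attr("src");
    }

    // text(): the text content of the element.
    static String paragraphText() {
        return parse().select("p").first().text();
    }

    // html(): the markup inside the element, without its own tags.
    static String divInnerHtml() {
        return parse().select("div.picture").first().html();
    }

    // outerHtml(): the element's own tags plus everything inside them.
    static String divOuterHtml() {
        return parse().select("div.picture").first().outerHtml();
    }

    public static void main(String... args) {
        System.out.println("attr(\"src\"): " + imgSrc());
        System.out.println("text():      " + paragraphText());
        System.out.println("html():      " + divInnerHtml());
        System.out.println("outerHtml(): " + divOuterHtml());
    }
}
```

The exact whitespace in the html() and outerHtml() output depends on jsoup's pretty-printing settings, so compare on content rather than on exact strings.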
You don't need to reparse the HTML as you do in your current example. Either select the correct element in the first place using a more specific selector, or call element.select(String) on an element you already have to winnow down.
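Both options can be sketched roughly as follows, using the markup from the question. The class NarrowSelectDemo and its two methods are hypothetical names for this example. The null checks are there because first() returns null when nothing matches.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

// Hypothetical demo class; the names here are illustrative only.
public class NarrowSelectDemo {
    static final String HTML =
        "<div class=\"picture\"><img src=\"http://asdasd/aacb.jpgs\" title=\"picture\" alt=\"picture\" /></div>";

    // Option 1: one specific selector that goes straight to the img.
    static String srcViaSpecificSelector(String html) {
        Element img = Jsoup.parse(html).select("div.picture img[src]").first();
        return img == null ? null : img.attr("src");
    }

    // Option 2: select the div first, then call select() on that element.
    // No second Jsoup.parse() call is needed.
    static String srcViaNestedSelect(String html) {
        Element div = Jsoup.parse(html).select("div.picture").first();
        if (div == null) {
            return null; // no matching div in this document
        }
        Element img = div.select("img[src]").first();
        return img == null ? null : img.attr("src");
    }

    public static void main(String... args) {
        System.out.println(srcViaSpecificSelector(HTML));
        System.out.println(srcViaNestedSelect(HTML));
    }
}
```

The nested form is mostly useful when you already hold the div for other reasons; otherwise the single selector is shorter.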
<tr> <td class="blackNoLine" nowrap="nowrap" valign="top" width="25" align="left"><b>CAST: </b></td> <td class="blackNoLine" valign="top" width="416">Jay, Shazahn Padamsee </td> </tr>
You can use:
Document doc = Jsoup.parse(...);
Elements els = doc.select("td[class=blackNoLine]");
Element el = els.get(1);
String castName = el.text();
With the following code I can extract the image correctly:
Document doc = Jsoup.parse("<div class=\"picture\"> <img src=\"http://asdasd/aacb.jpgs\" title=\"picture\" alt=\"picture\" /> </div>");
Element elem = doc.select("div.picture img").first();
System.out.println("elem: " + elem.attr("src"));
I'm using jsoup release 1.2.2, the latest one.
Maybe you're trying to print the inner HTML of an empty tag like img, which has no content to return.
From the documentation: "html() - Retrieves the element's inner HTML".
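To make that concrete, here is a small sketch comparing html() and attr("src") on the img element from the question. EmptyInnerHtmlDemo and its methods are made-up names for this example. Under these assumptions, html() comes back as an empty string because img has no children, while the URL is in the src attribute.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

// Hypothetical demo class; the names here are illustrative only.
public class EmptyInnerHtmlDemo {
    static Element firstImg() {
        return Jsoup.parse(
            "<div class=\"picture\"><img src=\"http://asdasd/aacb.jpgs\" title=\"picture\" alt=\"picture\" /></div>")
            .select("img[src]").first();
    }

    // Inner HTML of an element with no child nodes.
    static String innerHtml() {
        return firstImg().html();
    }

    // The attribute that actually carries the image URL.
    static String src() {
        return firstImg().attr("src");
    }

    public static void main(String... args) {
        System.out.println("html() is empty: " + innerHtml().isEmpty());
        System.out.println("src: " + src());
    }
}
```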
For the second snippet of HTML you can use:
Document doc2 = Jsoup.parse("<tr> <td class=\"blackNoLine\" nowrap=\"nowrap\" valign=\"top\" width=\"25\" align=\"left\"><b>CAST: </b></td> <td class=\"blackNoLine\" valign=\"top\" width=\"416\">Jay, Shazahn Padamsee </td> </tr>");
Elements trElems = doc2.select("tr");
for (Element element : trElems) { // select() returns an empty list, never null
    Elements tds = element.select("td");
    if (tds.size() > 1) { // skip rows that have no second cell
        System.out.println("name: " + tds.get(1).text());
    }
}
which prints "name: Jay, Shazahn Padamsee".