Java - Parsing HTML - get text

2023-03-25 11:26 问答作者：

I am tring to get text from a website; when you c开发者_Go百科hange the language the html url have an "/en" inside, but the page that have the information that i want don't have.

http://www.wippro.at/module/gallery/index.php?limitstart=0&picno=0&gallery_key=92

html tags: (the text contains the description of the photo)
<div id="redx_gallery_pic_title"> text text </div>

The problem is that the website is in german and i want the text in english, and my script gets only the german version

Any ideas how can i do it?

java code:
...
URL oracle = new URL(x);
BufferedReader in = new BufferedReader(new InputStreamReader(oracle.openStream()));
    String inputLine=null;
    StringBuffer theText = new StringBuffer();
    while ((inputLine = in.readLine()) != null)
            theText.append(inputLine+"\n");
    String html = theText.toString();
    in.close();

String[] name = StringUtils.substringsBetween(html, "redx_gallery_pic_title\">", "</div>");

That site is internationalized with German as default. You need to tell the server what language you're accepting by specifying the desired ISO 639-1 language code in the Accept-Language request header.

URLConnection connection = new URL(url).openConnection();
connection.setRequestProperty("Accept-Language", "en");
InputStream input = connection.getInputStream();
// ...

Unrelated to the concrete problem, may I suggest you to have a look at Jsoup as a HTML parser? It's much more convenient with its jQuery-like CSS selector syntax and therefore much less bloated than your attempt as far:

String url = "http://www.wippro.at/module/gallery/index.php?limitstart=0&picno=0&gallery_key=92";
Document document = Jsoup.connect(url).header("Accept-Language", "en").get();
String title = document.select("#redx_gallery_pic_title").text();
System.out.println(title); // Beech, glazing V3

That's all.

继续阅读：parsing

Java - Parsing HTML - get text

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？