Modify large string

I have a large string in the following format -

   <a href="12345.html"><a href="12345.html"><a href="12345.html"><a href="12345.html">
   <a href="12345.html"><a href="12345.html"><a href="12345.html"><a href="12345.html">

I'd like to store all occurrences of the value that occurs before .html, so that the HTML above becomes something like 12345.html,12345.html,12345.html,12345.html,12345.html,12345.html,12345.html,12345.html

Do I need a regular expression, or some kind of replace method?

Thanks


You don't actually need a regular expression, but you could use the underlying Matcher class:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

final String searchString = "12345.html";
final String txt =
"<a href=\"12345.html\"><a href=\"12345.html\"><a href=\"12345.html\"><a href=\"12345.html\">\n"
+ "<a href=\"12345.html\"><a href=\"12345.html\"><a href=\"12345.html\"><a href=\"12345.html\">";
// Pattern.LITERAL makes the search string match literally, so '.' is not a wildcard
final Matcher matcher = Pattern.compile(searchString, Pattern.LITERAL).matcher(txt);
final StringBuilder sb = new StringBuilder();
while (matcher.find()) {
    // separate every match after the first with a comma
    if (sb.length() > 0) sb.append(',');
    sb.append(matcher.group());
}
System.out.println(sb.toString());

Output:

12345.html,12345.html,12345.html,12345.html,12345.html,12345.html,12345.html,12345.html
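
If the file names are not known in advance, the same Matcher loop can be used with a real regular expression and a capturing group. The following is only a minimal sketch: the class name HtmlNameFinder, the method findNames and the pattern itself are assumptions made for illustration, and the pattern assumes the names contain no double quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlNameFinder {
    // Assumed pattern: a run of non-quote characters followed by a literal ".html".
    // Group 1 captures only the part before ".html".
    private static final Pattern HTML_NAME = Pattern.compile("([^\"]+)\\.html");

    // Hypothetical helper: returns the part before ".html" for every match.
    static List<String> findNames(String txt) {
        List<String> names = new ArrayList<>();
        Matcher matcher = HTML_NAME.matcher(txt);
        while (matcher.find()) {
            names.add(matcher.group(1)); // use matcher.group() to keep ".html"
        }
        return names;
    }

    public static void main(String[] args) {
        String txt = "<a href=\"12345.html\"><a href=\"12345.html\">\n"
                   + "<a href=\"12345.html\"><a href=\"12345.html\">";
        System.out.println(String.join(",", findNames(txt)));
    }
}
```

This is fine for a fixed, simple format like the one in the question; for arbitrary HTML a parser is the safer choice.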


You can use an HTML parser like Jsoup.

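import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.parse(yourString);
Elements els = doc.select("a");
for (Element el : els) {
    String href = el.attr("href");
    // This is only needed if you want the number without the ".html" suffix;
    // otherwise just use href directly.
    int idx = href.indexOf(".html");
    if (idx >= 0) {
        // substring avoids split(), which treats ".html" as a regex where '.' matches any character
        System.out.println(href.substring(0, idx));
    }
}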

Don't use regex to parse HTML.


If you are accessing this string inside Java code, you can split the string on the "=" delimiter. This will result in a bunch of strings. One string will look like "

So the steps are: 1. Split the string, which will result in a string array. 2. Iterate over the resulting array and look for the pattern ">
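
A minimal sketch of these two steps, assuming the delimiter is = and that each value is wrapped in double quotes and followed by ">. The class name HrefSplitter and the method extractHrefs are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class HrefSplitter {
    // Hypothetical helper: collects the quoted value that follows each '='.
    static List<String> extractHrefs(String html) {
        List<String> result = new ArrayList<>();
        // Step 1: split on '=' to get a string array
        String[] pieces = html.split("=");
        // Step 2: pieces[0] is the text before the first '=', so start at index 1
        for (int i = 1; i < pieces.length; i++) {
            String piece = pieces[i];
            int end = piece.indexOf("\">");
            // keep only pieces that start with a quote and contain the closing pattern
            if (piece.startsWith("\"") && end > 0) {
                result.add(piece.substring(1, end));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String txt = "<a href=\"12345.html\"><a href=\"12345.html\">\n"
                   + "<a href=\"12345.html\"><a href=\"12345.html\">";
        System.out.println(String.join(",", extractHrefs(txt)));
    }
}
```

This only works as long as = never appears inside the values themselves.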
