Modify large string
I have a large string in the following format -
<a href="12345.html"><a href="12345.html"><a href="12345.html"><a href="12345.html">
<a href="12345.html"><a href="12345.html"><a href="12345.html"><a href="12345.html">
I'd like to store all occurrences of the value that appears before .html, so that the HTML above becomes something like 12345.html,12345.html,12345.html,12345.html,12345.html,12345.html,12345.html,12345.html
Do I need a regular expression, or some kind of replace method?
Thanks
You don't actually need a regular expression, but you could use the underlying Matcher class:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Pattern.LITERAL treats the search string as plain text, so '.' is not a wildcard.
final String searchString = "12345.html";
final String txt =
    "<a href=\"12345.html\"><a href=\"12345.html\"><a href=\"12345.html\"><a href=\"12345.html\">\n"
    + "<a href=\"12345.html\"><a href=\"12345.html\"><a href=\"12345.html\"><a href=\"12345.html\">";
final Matcher matcher = Pattern.compile(searchString, Pattern.LITERAL).matcher(txt);
final StringBuilder sb = new StringBuilder();
while (matcher.find()) {
    // Separate every match after the first with a comma.
    if (sb.length() > 0) sb.append(',');
    sb.append(matcher.group());
}
System.out.println(sb.toString());
Output:
12345.html,12345.html,12345.html,12345.html,12345.html,12345.html,12345.html,12345.html
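The literal search above only works when the file name is known up front. If the names vary, a capturing group can pull out the part before .html. This is a minimal sketch, assuming every link is written as a double-quoted href ending in .html; the class name HrefValues and the method extract are made up for illustration, and for arbitrary real-world HTML a parser is the more robust choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefValues {
    // Assumption: links look like href="NAME.html" with double quotes.
    // Group 1 captures NAME, the part before ".html".
    private static final Pattern HREF =
        Pattern.compile("href=\"([^\"]*)\\.html\"");

    // Hypothetical helper: returns every captured NAME in document order.
    static List<String> extract(String txt) {
        List<String> values = new ArrayList<>();
        Matcher matcher = HREF.matcher(txt);
        while (matcher.find()) {
            values.add(matcher.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String txt = "<a href=\"12345.html\"><a href=\"678.html\">";
        // Join the values with commas, as in the question.
        System.out.println(String.join(",", extract(txt))); // prints 12345,678
    }
}
```

To keep the .html suffix in the result, use matcher.group(1) + ".html", or move the closing parenthesis of the group after the extension.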
You can use an HTML parser like Jsoup.
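import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.parse(yourString);
Elements els = doc.select("a");
for (Element el : els) {
    // Strip the extension only if you need the number without ".html";
    // otherwise el.attr("href") already holds the full value.
    String href = el.attr("href");
    if (href.contains(".html")) {
        // split() takes a regex, so the dot must be escaped.
        String[] parts = href.split("\\.html");
        System.out.println(parts[0]);
    }
}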
Don't use regex to parse HTML.
If you are accessing this string inside Java code, you can split the string on the "=" delimiter. This will result in a number of strings. One string will look like "
So the steps are: 1. Split the string, which results in a string array. 2. Iterate over the resulting array and look for the pattern ">
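The two steps can be sketched roughly as follows. The exact pattern this answer had in mind did not survive in the text, so the sketch assumes that each piece after a split on "=" begins with the double-quoted href value and keeps whatever sits between the quotes if it ends in .html. The names SplitHrefs and extract are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitHrefs {
    // Hypothetical helper implementing the split-and-scan idea.
    static List<String> extract(String txt) {
        List<String> values = new ArrayList<>();
        // Step 1: split on '='; each piece after the first starts with "value">...
        String[] pieces = txt.split("=");
        // Step 2: scan the pieces and keep the quoted values ending in .html.
        for (String piece : pieces) {
            if (!piece.startsWith("\"")) {
                continue; // e.g. the leading "<a href" piece
            }
            int close = piece.indexOf('"', 1);
            if (close < 0) {
                continue; // no closing quote, not a complete value
            }
            String value = piece.substring(1, close);
            if (value.endsWith(".html")) {
                values.add(value);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String txt = "<a href=\"12345.html\"><a href=\"12345.html\">";
        System.out.println(String.join(",", extract(txt))); // prints 12345.html,12345.html
    }
}
```

This approach breaks as soon as an href value itself contains "=" (for example a query string), which is one reason a parser is the safer option for anything beyond this fixed format.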