Regex to remove spaces from file name
I have some HTML strings that contain images. I need to remove spaces from the image names because some tablets do not accept them. (I have already renamed all the image resources.) I think the only part that needs fixing is ...
src="file:///android_asset/images/ ?? ?? .???"
because those links are valid links.
I have spent half a day on this and am still struggling with performance. The following code works, but it is really slow:
public static void main(String[] args) {
    String str = "<IMG height=286 alt=\"eye_anatomy 1.jpg\" src=\"file:///android_asset/images/eye_anatomy 1 .jpg\" width=350 border=0></P> fd ssda f \r\n"
            + "fd <P align=center><IMG height=286 alt=\"eye_anatomy 1.jpg\" src=\"file:///android_asset/images/ eye_anato my 1 .bmp\" width=350 border=0></P>\r\n"
            + "\r\n<IMG height=286 alt=\"eye_anatomy 1.jpg\" src=\"file:///android_asset/images/eye_anatomy1.png\" width=350 border=0>\r\n";
    Pattern p = Pattern.compile("(.*?)(src=\"file:///android_asset/images/)(.*?\\s+.*?)(\")", Pattern.DOTALL);
    Matcher m = p.matcher(str);
    StringBuilder sb = new StringBuilder("");
    int i = 0;
    while (m.find()) {
        sb.append(m.group(1)).append(m.group(2)).append(m.group(3).replaceAll("\\s+", "")).append(m.group(4));
        i = m.end();
    }
    sb.append(str.substring(i, str.length()));
    System.out.println(sb.toString());
}
So the real question is: how can I remove spaces from the image names efficiently using regex?
Thank you.
Regex is as regex does. :-) Seriously, regex is great for really particular cases, but for stuff like this I find myself writing lower-level code. So the following isn't a regex; it's a function. But it does what you want and does it much faster than your regex. (That said, if someone does come up with a regex that fits the bill and performs well, I'd love to see it.)
The following function segments the source string using spaces as delimiters, then recognizes and cleans up your alt and src attributes by not appending spaces while assembling the result. I did the alt attribute only because you were putting file names there too. One side effect is that this will collapse multiple spaces into one space in the rest of the markup, but browsers do that anyway. You can optimize the code a bit by re-using a StringBuilder. It presumes double-quotes around attributes.
I hope this helps.
private String removeAttrSpaces(final String str) {
    final StringBuilder sb = new StringBuilder(str.length());
    boolean inAttribute = false;
    for (final String segment : str.split(" ")) {
        // A segment that opens an alt or src attribute starts a "no spaces" run.
        if (segment.startsWith("alt=\"") || segment.startsWith("src=\"")) {
            inAttribute = true;
        }
        // A segment ending in a double quote closes the attribute value.
        if (inAttribute && segment.endsWith("\"")) {
            inAttribute = false;
        }
        sb.append(segment);
        // Put the delimiter back only outside alt/src values.
        if (!inAttribute) {
            sb.append(' ');
        }
    }
    return sb.toString();
}
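If collapsing spaces in the rest of the markup, or touching the alt text, is a concern, one variant is to look for the src prefix with indexOf and strip whitespace only up to the closing quote. This is a sketch rather than a drop-in replacement: the class name SrcSpaceStripper and the method stripSrcSpaces are made up for illustration, and it assumes the prefix from the question and double-quoted attributes.

```java
final class SrcSpaceStripper {
    // Prefix taken from the question; the file name runs up to the next double quote.
    private static final String PREFIX = "src=\"file:///android_asset/images/";

    static String stripSrcSpaces(String html) {
        StringBuilder sb = new StringBuilder(html.length());
        int pos = 0;
        while (true) {
            int start = html.indexOf(PREFIX, pos);
            if (start < 0) {
                break; // no more image sources
            }
            int nameStart = start + PREFIX.length();
            int nameEnd = html.indexOf('"', nameStart);
            if (nameEnd < 0) {
                break; // unterminated attribute: leave the rest untouched
            }
            // Copy everything up to and including the prefix unchanged.
            sb.append(html, pos, nameStart);
            // Copy the file name, skipping whitespace characters.
            for (int i = nameStart; i < nameEnd; i++) {
                char c = html.charAt(i);
                if (!Character.isWhitespace(c)) {
                    sb.append(c);
                }
            }
            pos = nameEnd; // the closing quote is copied by the next pass or the tail
        }
        sb.append(html, pos, html.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        String in = "<IMG alt=\"eye 1.jpg\" src=\"file:///android_asset/images/ eye_anato my 1 .bmp\" width=350>";
        System.out.println(stripSrcSpaces(in));
    }
}
```

Everything outside the matched src values is copied through as-is, so the alt text keeps its spaces and runs of spaces elsewhere are not collapsed.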
Here's a function that should be faster http://ideone.com/vlspF:
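private static String removeSpacesFromImages(String aText) {
    // The lookbehind pins the match right after the fixed prefix; [^"]* is the file name.
    Pattern p = Pattern.compile("(?<=src=\"file:///android_asset/images/)[^\"]*");
    StringBuffer result = new StringBuffer();
    Matcher matcher = p.matcher(aText);
    while (matcher.find()) {
        // quoteReplacement keeps a '$' or '\' in the file name from being read as a group reference.
        matcher.appendReplacement(result, Matcher.quoteReplacement(matcher.group(0).replaceAll("\\s+", "")));
    }
    matcher.appendTail(result);
    return result.toString();
}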
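For anyone who wants to try this on their own input, here is a self-contained sketch of the same idea. The class name ImageSpaceFixer and the sample string are made up for illustration. It compiles the pattern once instead of on every call, and it passes the cleaned name through Matcher.quoteReplacement so that a '$' or backslash in a file name is taken literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ImageSpaceFixer {
    // Compiled once. The lookbehind requires the fixed prefix just before the match,
    // and [^"]* consumes the file name up to (not including) the closing quote.
    private static final Pattern IMAGE_NAME =
            Pattern.compile("(?<=src=\"file:///android_asset/images/)[^\"]*");

    static String removeSpacesFromImages(String text) {
        Matcher m = IMAGE_NAME.matcher(text);
        StringBuffer out = new StringBuffer(text.length());
        while (m.find()) {
            String cleaned = m.group().replaceAll("\\s+", "");
            // Quote the replacement so '$' and '\' are not treated as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(cleaned));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<IMG alt=\"eye_anatomy 1.jpg\" src=\"file:///android_asset/images/ eye_anato my 1 .bmp\" width=350>\r\n"
                + "<IMG src=\"file:///android_asset/images/pri$ce 1.png\">";
        System.out.println(removeSpacesFromImages(html));
    }
}
```

Because the pattern only ever matches the text between the prefix and the next double quote, the alt attribute and the rest of the markup are left untouched.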