
Removing up to 4 spaces from a string

I have an array of Strings I'm looping through. For each string, I need to remove up to 4 spaces from the beginning. In other words, if there are only 2 spaces, I remove 2. If there are 6 spaces, I remove 4. How can I specify this in the loop?

for(int i=0; i<stringArray.length; i++) {
    newString = REMOVE UP TO 4 SPACES FROM stringArray[i];
    stringArray[i] = newString;
}

Thanks!


Try this:

stringArray[i] = stringArray[i].replaceFirst ("^ {0,4}", "");
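
For context, here is a minimal sketch of that one-liner placed inside the loop from the question. The class name LeadingSpaces, the helper stripUpToFour, and the sample strings are made up for illustration; only the replaceFirst call comes from the answer above.

```java
public class LeadingSpaces {
    // Hypothetical helper: removes at most four leading space characters.
    static String stripUpToFour(String s) {
        // ^ anchors at the start of the string; " {0,4}" matches zero to four spaces, greedily.
        return s.replaceFirst("^ {0,4}", "");
    }

    public static void main(String[] args) {
        // Sample data (an assumption): 2, 6, 0 and 4 leading spaces.
        String[] stringArray = { "  two", "      six", "none", "    four" };
        for (int i = 0; i < stringArray.length; i++) {
            stringArray[i] = stripUpToFour(stringArray[i]);
        }
        for (String s : stringArray) {
            System.out.println("[" + s + "]");
        }
        // [two]
        // [  six]
        // [none]
        // [four]
    }
}
```

Only the space character is matched here; leading tabs would be left alone.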


Regex is the best tool for this job. I will explain it step by step:

Introduction

System.out.println(
    "# one ## two ### three #### four"
        .replaceAll("##", "@@")
);
//  "# one @@ two @@# three @@@@ four"

The above snippet should give you a good idea of how replaceAll works: it replaces all occurrences of "##" with "@@".

Pattern

As it turns out, replaceAll is a regex-based method: the first argument is a special pattern string, and the second argument is a special replacement string. The next snippet illustrates the idea:

System.out.println(
    "# one ## two ### three #### four"
        .replaceAll("#{2}", "@@")
);
//  "# one @@ two @@# three @@@@ four"

Now we used "#{2}" as the first argument. Rather intuitively, in regex this means "# repeated exactly twice"; this is exactly the same pattern we had before, which is why we also get the same result.

Range

The bounded repetition syntax in regex is actually quite expressive: instead of exact repetition, we can also define a range as follows:

System.out.println(
    "# one ## two ### three #### four"
        .replaceAll("#{1,3}", ":")
);
//  ": one : two : three :: four"

Rather intuitively, #{1,3} means "# repeated between 1 and 3 times".

Greed

Now note that regex repetition by default is greedy: it tries to match more if possible. This is illustrated by the following:

System.out.println(
    "# one ## two ### three #### four"
        .replaceAll("#{2,3}", ":")
);
//  "# one : two : three :# four"

Note that #### becomes :#. The first replacement consumed three of the four # characters, leaving only one, which is too short to match #{2,3}. Had the pattern taken only two # the first time, the remaining two would have matched as well; but since repetition is greedy, it took three and left the last # with no chance to match!
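
To see the greedy behavior directly, the following sketch lists each piece of text the pattern actually matches, using java.util.regex.Matcher. The class name GreedyMatches and the helper allMatches are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyMatches {
    // Hypothetical helper: collects every substring the pattern matches, in order.
    static List<String> allMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(allMatches("#{2,3}", "# one ## two ### three #### four"));
        // [##, ###, ###]
    }
}
```

The lone # at the start is too short to match, and the run of four yields a single match of three, with the fourth # left over.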

First

Now let's try another example as follows:

System.out.println(
    "=====5====4===3==2=1"
        .replaceAll("={1,4}", ":")
);
//  "::5:4:3:2:1"

Now let's say that we only want the first ={1,4} match to be replaced with ":".

System.out.println(
    "=====5====4===3==2=1"
        .replaceFirst("={1,4}", ":")
);
//  ":=5====4===3==2=1"

Voila! Everything works as expected!

Anchor

Now let's look at the next example:

System.out.println(
    "0=====5====4===3==2=1"
        .replaceFirst("={1,4}", ":")
);
//  "0:=5====4===3==2=1"

The replacement is still doing what it's supposed to do, but let's suppose that we only want to match ={1,4} at the beginning of the string. Fortunately for us, regular expressions have a way to express this: we can anchor the pattern at the beginning of the string, which is denoted by ^.

System.out.println(
    "0=====5====4===3==2=1"
        .replaceFirst("^={1,4}", ":")
);
//  "0=====5====4===3==2=1"

System.out.println(
    "=====5====4===3==2=1"
        .replaceFirst("^={1,4}", ":")
);
//  ":=5====4===3==2=1"

System.out.println(
    "===3==2=1"
        .replaceFirst("^={1,4}", ":")
);
//  ":3==2=1"

Voila! Everything works as expected!


Going back to the answer

And now we have enough information to answer the original question!

stringArray[i] = stringArray[i].replaceFirst("^ {1,4}", "");

So the pattern ^ {1,4} means:

  • Anchored at the beginning of the string with ^...
  • ...between 1 and 4 space characters, taking more if possible

We then replace the first occurrence of such a match with the empty string, essentially removing it.
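
Since the same pattern is applied to every element, one possible variation is to compile it once and reuse it: String.replaceFirst compiles its regex argument on each call. The sketch below assumes that variation; the names PrecompiledStrip, LEADING_SPACES and stripAll are illustrative and not part of the answer.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PrecompiledStrip {
    // Compiled once and reused for every element (illustrative name).
    private static final Pattern LEADING_SPACES = Pattern.compile("^ {1,4}");

    // Hypothetical helper: strips up to four leading spaces from each element, in place.
    static void stripAll(String[] stringArray) {
        for (int i = 0; i < stringArray.length; i++) {
            stringArray[i] = LEADING_SPACES.matcher(stringArray[i]).replaceFirst("");
        }
    }

    public static void main(String[] args) {
        // Sample data (an assumption): 2, 6 and 0 leading spaces.
        String[] lines = { "  two", "      six", "none" };
        stripAll(lines);
        System.out.println(Arrays.toString(lines));
        // [two,   six, none]
    }
}
```

For a short array the difference is unlikely to matter; the one-liner above is simpler.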


More learning resources

That was a beginner's introduction to regular expression basics. There are still many aspects of this wonderful world that haven't been explored yet.

References

  • Java Tutorials/Essential Classes/Regular Expressions
  • regular-expressions.info
    • Repetition
    • Anchors



Do you care about trailing spaces?

You could check the string against a regular expression to see whether it starts with 4 or more spaces, and if so use substring to clip off the first 4. Otherwise, if there are fewer than 4 spaces in front (the regex does not match), just use trim() in Java. Keep in mind that trim() also removes trailing whitespace, hence the question above.
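
A rough sketch of the substring idea follows. It swaps in a different technique from the one described: instead of a regex check plus trim(), it counts leading spaces (capped at 4) and cuts them off with substring, so trailing spaces are preserved. The names SubstringStrip and stripUpToFour are hypothetical.

```java
public class SubstringStrip {
    // Hypothetical helper: counts up to four leading spaces, then drops them with substring.
    static String stripUpToFour(String s) {
        int n = 0;
        while (n < 4 && n < s.length() && s.charAt(n) == ' ') {
            n++;
        }
        return s.substring(n);
    }

    public static void main(String[] args) {
        System.out.println("[" + stripUpToFour("  two  ") + "]");   // [two  ]
        System.out.println("[" + stripUpToFour("      six") + "]"); // [  six]
        System.out.println("[" + stripUpToFour("none") + "]");      // [none]
    }
}
```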
