How to extract a substring using regex
I have a string that has two single quotes in it, the ' character. In between the single quotes is the data I want.
How can I write a regex to extract "the data i want" from the following text?
mydata = "some string with 'the data i want' inside";
Assuming you want the part between single quotes, use this regular expression with a Matcher:
"'(.*?)'"
Example:
String mydata = "some string with 'the data i want' inside";
Pattern pattern = Pattern.compile("'(.*?)'");
Matcher matcher = pattern.matcher(mydata);
if (matcher.find())
{
    System.out.println(matcher.group(1));
}
Result:
the data i want
You don't need regex for this.
Add Apache Commons Lang to your project (http://commons.apache.org/proper/commons-lang/), then use:
String dataYouWant = StringUtils.substringBetween(mydata, "'");
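If pulling in a library just for this is not an option, the same idea can be sketched with indexOf and substring. The class and method names below (QuoteExtractor, between) are made up for illustration and are not part of any library; returning null when the delimiter does not occur twice is a design choice for this sketch.

```java
class QuoteExtractor {
    // Hypothetical helper: returns the text between the first two
    // occurrences of tag, or null if tag does not occur twice.
    static String between(String text, String tag) {
        if (text == null || tag == null || tag.isEmpty()) {
            return null;
        }
        int start = text.indexOf(tag);
        if (start < 0) {
            return null; // no opening delimiter
        }
        int contentStart = start + tag.length();
        int end = text.indexOf(tag, contentStart);
        if (end < 0) {
            return null; // no closing delimiter
        }
        return text.substring(contentStart, end);
    }

    public static void main(String[] args) {
        String mydata = "some string with 'the data i want' inside";
        System.out.println(between(mydata, "'"));
    }
}
```

Checking the result for null before using it avoids surprises when the input contains no quotes.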
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile(".*'([^']*)'.*");
        String mydata = "some string with 'the data i want' inside";
        Matcher matcher = pattern.matcher(mydata);
        if (matcher.matches()) {
            System.out.println(matcher.group(1));
        }
    }
}
There's a simple one-liner for this:
String target = myData.replaceAll("[^']*(?:'(.*?)')?.*", "$1");
By making the matching group optional, this also caters for quotes not being found by returning a blank in that case.
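To see how the optional group behaves, here is a small sketch that wraps the one-liner in a method and runs it on inputs with and without quotes. The class and method names (OptionalGroupDemo, firstQuoted) are only illustrative.

```java
class OptionalGroupDemo {
    // Same expression as the one-liner above. The whole input is matched
    // and replaced by group 1, which is empty when no quoted pair exists.
    static String firstQuoted(String text) {
        return text.replaceAll("[^']*(?:'(.*?)')?.*", "$1");
    }

    public static void main(String[] args) {
        System.out.println("[" + firstQuoted("some string with 'the data i want' inside") + "]");
        System.out.println("[" + firstQuoted("no quotes here") + "]");
        System.out.println("[" + firstQuoted("only 'one quote") + "]");
    }
}
```

Note that an input with no quoted pair and an input with an empty pair ('') both give an empty string, so this approach cannot tell the two apart.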
Since Java 9
As of this version, you can use the new no-argument method Matcher::results, which returns a Stream<MatchResult>. MatchResult represents the result of a match operation and lets you read the matched groups and more (this interface has existed since Java 1.5).
String string = "Some string with 'the data I want' inside and 'another data I want'.";
Pattern pattern = Pattern.compile("'(.*?)'");
pattern.matcher(string)
       .results()                       // Stream<MatchResult>
       .map(mr -> mr.group(1))          // Stream<String> - the 1st group of each result
       .forEach(System.out::println);   // print them out (or process in other way...)
The code snippet above results in:
the data I want
another data I want
The biggest advantage is the ease of use when one or more results are available, compared to the procedural if (matcher.find()) and while (matcher.find()) checks and processing.
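If you need the matches as a collection rather than printed output, the same stream can be collected into a list. This is a minimal sketch that requires Java 9 or later; the names ResultsDemo and allQuoted are invented for the example.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ResultsDemo {
    private static final Pattern QUOTED = Pattern.compile("'(.*?)'");

    // Collects group 1 of every match; returns an empty list if nothing matches.
    static List<String> allQuoted(String text) {
        return QUOTED.matcher(text)
                     .results()
                     .map(mr -> mr.group(1))
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String string = "Some string with 'the data I want' inside and 'another data I want'.";
        System.out.println(allQuoted(string));
    }
}
```

An input without quotes simply yields an empty list, so no separate "found anything?" check is needed.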
Because you also ticked Scala, a solution without regex which easily deals with multiple quoted strings:
val text = "some string with 'the data i want' inside 'and even more data'"
text.split("'").zipWithIndex.filter(_._2 % 2 != 0).map(_._1)
res: Array[java.lang.String] = Array(the data i want, and even more data)
String dataIWant = mydata.replaceFirst(".*'(.*?)'.*", "$1");
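Two properties of this replaceFirst approach are worth knowing: the leading .* is greedy, so with several quoted pairs it captures the last one, and when nothing matches the input comes back unchanged. The sketch below demonstrates both; the class and method names are only illustrative.

```java
class GreedyPrefixDemo {
    // Same expression as above, wrapped in a method for experimenting.
    static String viaReplaceFirst(String text) {
        return text.replaceFirst(".*'(.*?)'.*", "$1");
    }

    public static void main(String[] args) {
        // Single quoted pair: the quoted text is returned.
        System.out.println(viaReplaceFirst("some string with 'the data i want' inside"));
        // Two quoted pairs: the greedy prefix swallows the first pair.
        System.out.println(viaReplaceFirst("a 'x' b 'y' c"));
        // No quotes: nothing is replaced, so the input is returned as-is.
        System.out.println(viaReplaceFirst("no quotes here"));
    }
}
```

If you want the first pair instead, making the prefix reluctant (.*?) or using [^']* in its place is one way to get there.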
In JavaScript:
mydata.match(/'([^']+)'/)[1]
The actual regexp is: /'([^']+)'/
If you use the non-greedy modifier (as per another post), it looks like this:
mydata.match(/'(.*?)'/)[1]
It is cleaner.
String dataIWant = mydata.split("'")[1];
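The split approach is short, but indexing with [1] throws an ArrayIndexOutOfBoundsException when the input has no quote at all. A guarded variant might look like the sketch below; SplitDemo and secondPiece are hypothetical names, and returning null for the no-quote case is just one possible choice.

```java
class SplitDemo {
    // Returns the piece after the first quote, or null if there is no quote.
    // Caveat: with an unclosed quote this returns everything after it.
    static String secondPiece(String text) {
        String[] parts = text.split("'");
        return parts.length > 1 ? parts[1] : null;
    }

    public static void main(String[] args) {
        System.out.println(secondPiece("some string with 'the data i want' inside"));
        System.out.println(secondPiece("no quotes here"));
        System.out.println(secondPiece("only 'one quote"));
    }
}
```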
In Scala,
val ticks = "'([^']*)'".r
ticks findFirstIn mydata match {
  case Some(ticks(inside)) => println(inside)
  case _ => println("nothing")
}
for (ticks(inside) <- ticks findAllIn mydata) println(inside) // multiple matches
val Some(ticks(inside)) = ticks findFirstIn mydata // may throw exception
val ticks = ".*'([^']*)'.*".r
val ticks(inside) = mydata // shorter, but throws a MatchError if there is no match; the greedy leading .* captures the last set of ticks
Apache Commons Lang provides a host of helper utilities for the java.lang API, most notably String manipulation methods. In your case, the start and end substrings are the same, so just call the following function.
StringUtils.substringBetween(String str, String tag)
Gets the String that is nested in between two instances of the same String.
If the start and the end substrings are different then use the following overloaded method.
StringUtils.substringBetween(String str, String open, String close)
Gets the String that is nested in between two Strings.
If you want all instances of the matching substrings, then use:
StringUtils.substringsBetween(String str, String open, String close)
Searches a String for substrings delimited by a start and end tag, returning all matching substrings in an array.
For the example in the question, to get all instances of the matching substring:
String[] results = StringUtils.substringsBetween(mydata, "'", "'");
You can use a while loop to store all matching substrings in a list. If you use
if (matcher.find())
{
    System.out.println(matcher.group(1));
}
you will get only one matching substring, so use the following to get all of them:
Matcher m = Pattern.compile("[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+").matcher(text);
// Matcher mat = pattern.matcher(text);
ArrayList<String> matchesEmail = new ArrayList<>();
while (m.find()) {
    String s = m.group();
    if (!matchesEmail.contains(s))
        matchesEmail.add(s);
}
Log.d(TAG, "emails: "+matchesEmail);
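The loop above keeps only distinct matches by calling contains on the list. One alternative is to let a LinkedHashSet do the de-duplication while preserving first-seen order. The sketch below is plain Java with no Android dependency; the class name, the method name, and the simplified email pattern are assumptions made for the example, not the pattern from the answer.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DistinctMatches {
    // Returns every distinct match of pattern in text, in first-seen order.
    static List<String> distinctMatches(Pattern pattern, String text) {
        Set<String> seen = new LinkedHashSet<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            seen.add(m.group()); // duplicates are ignored by the set
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Simplified, illustrative email pattern (an assumption for this sketch).
        Pattern email = Pattern.compile("[\\w.+-]+@[\\w-]+\\.[\\w.-]+");
        String text = "write to a@example.com or b@example.org, or again a@example.com";
        System.out.println(distinctMatches(email, text));
    }
}
```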
Add the Apache Commons Lang dependency to your pom.xml (StringUtils lives in commons-lang3, not commons-io):
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.12.0</version>
</dependency>
Then the code below works:
String dataYouWant = StringUtils.substringBetween(mydata, "'", "'");
Somehow group(1) didn't work for me (the pattern below has no capturing group, so only group(0), the whole match, is available). I used group(0) to find the URL version.
Pattern urlVersionPattern = Pattern.compile("\\/v[0-9][a-z]{0,1}\\/");
Matcher m = urlVersionPattern.matcher(url);
if (m.find()) {
    return StringUtils.substringBetween(m.group(0), "/", "/");
}
return "v0";
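For comparison, here is a sketch of the same lookup with a capturing group, so that group(1) returns the version directly and no second extraction step is needed. Calling group(1) on a pattern without capturing groups throws an IndexOutOfBoundsException, which is why the parentheses matter. The class name, method name, and sample URLs are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlVersion {
    // The parentheses define group 1: "v", one digit, and an optional letter.
    private static final Pattern VERSION = Pattern.compile("/(v[0-9][a-z]?)/");

    // Returns the version segment of the URL, or "v0" when none is found.
    static String versionOf(String url) {
        Matcher m = VERSION.matcher(url);
        return m.find() ? m.group(1) : "v0";
    }

    public static void main(String[] args) {
        System.out.println(versionOf("https://example.com/api/v2/users"));
        System.out.println(versionOf("https://example.com/users"));
    }
}
```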