Iterate over an array and test a regular expression against each value (Java)
I'm quite new to Java and I'm facing a problem I can't solve. I have some HTML code and I'm trying to run a regular expression against it and store all matches in an array. Here's my code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;
public class RegexMatch{
boolean foundMatch = false;
public String[] arrayResults;
public String[] TestRegularExpression(String sourceCode, String pattern){
try{
Pattern regex = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
Matcher regexMatcher = regex.matcher(sourceCode);
while (regexMatcher.find()) {
arrayResults[matches] = regexMatcher.group();
matches ++;
}
} catch (PatternSyntaxException ex) {
// Exception occurred
}
return arrayResults;
}
}
I'm passing a string containing HTML code and a regular expression pattern that should extract all meta tags and store them in the array. Here's how I call the method:
RegexMatch regex = new RegexMatch();
regex.TestRegularExpression(sourceCode, "<meta.*?>");
String[] META_TAGS = regex.arrayResults;
Any hint? Thanks!
Firstly, parsing HTML with regular expressions is a bad idea. There are libraries which will parse the HTML into a DOM instead; you should look into those.
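To make the "bad idea" point concrete, here is a minimal sketch of how the pattern from the question can go wrong. The class name `RegexHtmlPitfall`, the method `findMetaTags`, and the sample HTML are made up for illustration; only the `<meta.*?>` pattern comes from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexHtmlPitfall {
    // Hypothetical helper: applies the question's pattern and collects every match.
    public static List<String> findMetaTags(String html) {
        Matcher m = Pattern.compile("<meta.*?>", Pattern.CASE_INSENSITIVE).matcher(html);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Illustrative input: one tag inside a comment, one with '>' inside an attribute value.
        String html = "<!-- <meta name=\"old\"> -->\n"
                + "<meta name=\"description\" content=\"a > b\">";
        for (String tag : findMetaTags(html)) {
            System.out.println(tag);
        }
        // Prints:
        // <meta name="old">
        // <meta name="description" content="a >
    }
}
```

The first match is a tag that is commented out, and the second is cut off at the `>` inside the attribute value. A parser that understands HTML structure avoids both problems.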
Assuming you still want the "match multiple results" idea though, it seems to me that a List<E> of some form would be more useful than an array, so you don't need to know the size up-front. You can also build the list inside the method itself, rather than keeping state in a field. For example:
import java.util.*;
import java.util.regex.*;
public class Test
{
public static void main(String[] args)
throws PatternSyntaxException
{
// Want to get x10 and x5 from this
String text = "x10 y5 x5 xyz";
String pattern = "x\\d+";
List<String> matches = getAllMatches(text, pattern);
for (String match : matches) {
System.out.println(match);
}
}
public static List<String> getAllMatches(String text, String pattern)
throws PatternSyntaxException
{
Pattern regex = Pattern.compile(pattern);
List<String> results = new ArrayList<String>();
Matcher regexMatcher = regex.matcher(text);
while (regexMatcher.find()) {
results.add(regexMatcher.group());
}
return results;
}
}
It's possible that there's something similar to this within the Matcher class itself, but I can't immediately see it...
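For what it's worth, Java 9 and later do have something along these lines: `Matcher.results()` returns a stream of `MatchResult` objects, so the `while (regexMatcher.find())` loop can be collapsed. A minimal sketch, assuming Java 9 or newer; the class name `MatchResultsSketch` and the method `getAllMatchesAsArray` are hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class MatchResultsSketch {
    // Collects every match of the pattern into a List (Matcher.results() needs Java 9+).
    public static List<String> getAllMatches(String text, String pattern) {
        return Pattern.compile(pattern)
                .matcher(text)
                .results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    // Same idea, but returning a String[] as the question originally wanted.
    public static String[] getAllMatchesAsArray(String text, String pattern) {
        return Pattern.compile(pattern)
                .matcher(text)
                .results()
                .map(MatchResult::group)
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Same sample input as the example above; prints [x10, x5]
        System.out.println(getAllMatches("x10 y5 x5 xyz", "x\\d+"));
    }
}
```

On older Java versions the explicit loop with a List is still the way to go; if an array is really needed, `results.toArray(new String[0])` converts the list at the end.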
With Jsoup, you could do something as simple as...
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class GetMeta {
private static final String META_QUERY = "meta";
public static List<String> parseForMeta(String htmlText) {
Document jsDocument = Jsoup.parse(htmlText);
Elements metaElements = jsDocument.select(META_QUERY);
List<String> metaList = new ArrayList<String>();
for (Element element : metaElements) {
metaList.add(element.toString());
}
return metaList;
}
}
For example:
import java.io.IOException;
import java.net.*;
import java.util.*;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class GetMeta {
private static final String META_QUERY = "meta";
private static final String MAIN_URL = "http://www.yahoo.com";
public static void main(String[] args) {
// try-with-resources closes the Scanner (and the underlying stream) when done
try (Scanner scan = new Scanner(new URL(MAIN_URL).openStream())) {
StringBuilder sb = new StringBuilder();
while (scan.hasNextLine()) {
sb.append(scan.nextLine() + "\n");
}
List<String> metaList = parseForMeta(sb.toString());
for (String metaStr : metaList) {
System.out.println(metaStr);
}
} catch (MalformedURLException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
public static List<String> parseForMeta(String htmlText) {
Document jsDocument = Jsoup.parse(htmlText);
Elements metaElements = jsDocument.select(META_QUERY);
List<String> metaList = new ArrayList<String>();
for (Element element : metaElements) {
metaList.add(element.toString());
}
return metaList;
}
}