Regular expression to find a value in a webpage
I need a regular expression that pulls a value out of a table cell in an HTML document.
Example contents of this table cell are "Result: 40 mins".
I need a regular expression to match the actual number (40).
This is in Java, thanks in advance.
I've tried to do this with regular expressions before, and it is a pain in the hole.
It is MUCH easier to use something like an XPath expression, where you can specify the location by its place in the DOM hierarchy. The Apache libraries can do this (specifically Xalan), which can be found here: http://xml.apache.org/xalan-j/
You can use the Firefox addon XPath Checker to help you out with this.
The area you're talking about is called "web scraping" by the way, if you're looking for other tools/information.
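To illustrate the XPath approach, here is a minimal sketch. It uses the XPath API bundled with the JDK (javax.xml.xpath) rather than calling Xalan directly, and it assumes the page is well-formed XML; real-world HTML often is not and would need cleaning first. The class name, method name and sample markup are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathCellExample {
    // Returns the trimmed text of the first <td> whose text starts with "Result:",
    // or an empty string if no such cell exists.
    // Assumption: the input is well-formed XML (e.g. XHTML).
    static String findResultCell(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // evaluate(String, Object) returns the string value of the first matching node.
            return xpath.evaluate(
                    "//td[starts-with(normalize-space(.), 'Result:')]", doc).trim();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the document", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><table><tr><td>Name</td>"
                + "<td> Result: 40 mins </td></tr></table></body></html>";
        System.out.println(findResultCell(page));
    }
}
```

Once you have the cell text, pulling the number out of it is a trivial string or regex operation.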
You want to use DOM/XPath, but if you really need a regex for simple cases, try:
/\<\s*td[^\>]*\>\s*result: (\d+) mins\s*\<\/td\>/i
Again, this will probably work for most HTML, but a regex won't work for all HTML.
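Since the question is about Java, here is roughly how that pattern could look with java.util.regex. This is only a sketch: the /…/i delimiters become Pattern.CASE_INSENSITIVE, backslashes are doubled inside the string literal, and the class and method names are invented for the example.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TdResultRegex {
    // Same idea as the pattern above; "<" and ">" do not need escaping in Java regexes.
    private static final Pattern CELL = Pattern.compile(
            "<\\s*td[^>]*>\\s*result: (\\d+) mins\\s*</td>",
            Pattern.CASE_INSENSITIVE);

    // Returns the number inside the first matching cell, or empty if there is none.
    static OptionalInt minutes(String html) {
        Matcher m = CELL.matcher(html);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(minutes("<tr><td class=\"x\"> Result: 40 mins </td></tr>"));
    }
}
```

Note that it only matches when the cell contains nothing but the "Result: N mins" text; nested tags inside the cell would defeat it.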
If it's not a one-off situation, use XPath to retrieve the contents of a certain HTML element ("Result: 40 mins") and then a simple, case-insensitive regexp to get what you need: "result: (\d+) mins"
(to adapt what OverClocked wrote). If the HTML is (as is likely) malformed, you can clean it up with something like JTidy.
In the simplest case, you could simply look for the expression in the complete page: ".*result: (\d+) mins.*"
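One caveat with the ".*" wrapping, shown in the sketch below (names are made up for the example): in Java, "." does not match line breaks by default, so calling matches() with that pattern on a multi-line page fails unless Pattern.DOTALL is set. With find() the surrounding ".*" is not needed at all. The pattern is also lowercase, so matching "Result:" needs Pattern.CASE_INSENSITIVE.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholePageSearch {
    // find() looks for the pattern anywhere in the input, so no ".*" is required.
    private static final Pattern RESULT =
            Pattern.compile("result: (\\d+) mins", Pattern.CASE_INSENSITIVE);

    // Returns the captured digits, or null when the page has no such text.
    static String firstResult(String page) {
        Matcher m = RESULT.matcher(page);
        return m.find() ? m.group(1) : null;
    }

    // For comparison: the ".*"-wrapped form, which has to match the entire input.
    static boolean wrappedMatches(String page, int flags) {
        return Pattern.compile(".*result: (\\d+) mins.*", flags)
                .matcher(page)
                .matches();
    }

    public static void main(String[] args) {
        String page = "<html>\n<td>Result: 40 mins</td>\n</html>";
        System.out.println(firstResult(page));
        System.out.println(wrappedMatches(page, Pattern.CASE_INSENSITIVE));
        System.out.println(wrappedMatches(page, Pattern.CASE_INSENSITIVE | Pattern.DOTALL));
    }
}
```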
BTW, the web page you pointed to does not contain any kind of "Results": if you meant "Routes", you should be fine with something like this:
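import java.util.regex.Matcher;
import java.util.regex.Pattern;

String pageContent = ...
Pattern p = Pattern.compile("Route: ((\\d*) hour )*(\\d*) mins");
Matcher m = p.matcher(pageContent);
if (m.find()) {
    // group(1) is the whole optional "N hour " part; the numbers are in groups 2 and 3
    System.out.println(m.group(2)); // hours (null if the text has no "hour" part)
    System.out.println(m.group(3)); // minutes
}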