Java Regex Parse URL
I have an FTP URL and need to parse it to get the username, password, server name, and directory. What regular expression can do this?
Example: ftp://userName:password@someServer/directory-name
Use java.net.URI. It will be more robust, and will probably be faster.
The problems with using a regex include:
- it is either too simple to deal with edge cases, or too complicated / expensive because it deals with those cases, and
- it is unlikely to handle %-encoding correctly.
For example, the (original) regex tendered by @Larry doesn't deal with cases where the URL doesn't have userInfo, etcetera.
As the comments stated, a URL is a URI but not (necessarily) vice-versa. But the reasons that I recommend java.net.URI, not java.net.URL, are:
- it has a better parser, and
- it has a better API for examining the parts of the parsed URL.
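As a rough illustration of the java.net.URI approach, here is a minimal sketch. The class name FtpUriDemo and the parse helper are made up for this example; only URI.create, getUserInfo, getHost and getPath are real API calls. It assumes the user name itself contains no colon, and it returns null for user and password when the URL has no userInfo part.

```java
import java.net.URI;
import java.util.Arrays;

// Hypothetical demo class; not part of any library.
class FtpUriDemo {
    /**
     * Returns {user, password, host, path}. User and password are null
     * when the URL carries no userInfo part.
     */
    static String[] parse(String url) {
        // URI.create throws IllegalArgumentException on malformed input.
        URI uri = URI.create(url);
        String user = null;
        String password = null;
        // getUserInfo() returns the %-decoded userInfo, or null if absent.
        String userInfo = uri.getUserInfo();
        if (userInfo != null) {
            // Split on the first colon only, so the password may contain ':'.
            // Assumption: the user name itself contains no colon.
            String[] parts = userInfo.split(":", 2);
            user = parts[0];
            password = parts.length > 1 ? parts[1] : null;
        }
        // getPath() is also %-decoded and keeps its leading slash.
        return new String[] { user, password, uri.getHost(), uri.getPath() };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
            parse("ftp://userName:password@someServer/directory-name")));
        System.out.println(Arrays.toString(
            parse("ftp://someServer/directory-name")));
    }
}
```

Note that the path comes back with its leading slash, so strip it if you only want the directory name.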
Whenever I think of regexes, I think "Perl" and write a quick and dirty pattern (qr{xxx}x) and test it against test input.
In your case, assuming that user name, password, server, and directory name all need to be parsed out (and are mandatory), I'd use the following. Add question marks for "optional" parts of your pattern if you wish to modify this:
qr{
^ # Start of text
ftp: # Protocol
// # Double slash
([^:]+) # $1 = User Name
: # Colon
([^@]+) # $2 = Password
@ # AT sign
(.*?) # $3 = Server name
/ # Single slash
(.*?) # $4 = Directory name
(\?.*)? # Question mark ends URI
$ # End of text
}x;
Now that we have the pattern, simply double the backslash (in the "Question mark" portion), remove spaces and comments (if you wish), and place into a Java String:
"^ftp://([^:]+):([^@]+)@(.*?)/(.*?)(\\?.*)?$";
Use that with Pattern/Matcher and you should be able to extract things nicely.
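For instance, a minimal sketch of the Pattern/Matcher usage could look like the following. The class name FtpRegexDemo and the parse helper are hypothetical; the pattern is the one given above. This version returns null when the input does not match, which includes URLs without a user name and password, and it does no %-decoding of the captured groups.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of any library.
class FtpRegexDemo {
    // Same pattern as above: user, password, server, directory, optional query.
    private static final Pattern FTP_URL =
        Pattern.compile("^ftp://([^:]+):([^@]+)@(.*?)/(.*?)(\\?.*)?$");

    /** Returns {user, password, server, directory}, or null if there is no match. */
    static String[] parse(String url) {
        Matcher m = FTP_URL.matcher(url);
        if (!m.matches()) {
            return null; // e.g. the URL has no "user:password@" part
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] parts = parse("ftp://userName:password@someServer/directory-name");
        if (parts != null) {
            System.out.println(String.join(" | ", parts));
        }
    }
}
```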