
Java Regex Parse URL

I have an FTP URL and need to parse it to get the user name, password, server name and directory. What regular expression can I use to do it?

Example: ftp://userName:password@someServer/directory-name


Use java.net.URI. It will be more robust, and will probably be faster.
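A minimal sketch of the java.net.URI approach, using a URL in the question's format. The class name `FtpUriParts` and the `{user, password, host, path}` array layout are illustrative choices, not part of any API. Splitting the user info at the first ':' assumes the user name itself contains no colon.

```java
import java.net.URI;

public class FtpUriParts {
    /**
     * Splits an FTP URL into {user, password, host, path} using java.net.URI.
     * user and password are null when the URL has no user-info part.
     */
    public static String[] parse(String url) {
        URI uri = URI.create(url);             // throws IllegalArgumentException on malformed input
        String user = null;
        String password = null;
        String userInfo = uri.getUserInfo();   // already %-decoded; null if absent
        if (userInfo != null) {
            int colon = userInfo.indexOf(':'); // assumes the user name has no ':'
            if (colon >= 0) {
                user = userInfo.substring(0, colon);
                password = userInfo.substring(colon + 1);
            } else {
                user = userInfo;
            }
        }
        // getPath() keeps the leading '/', e.g. "/directory-name"
        return new String[] { user, password, uri.getHost(), uri.getPath() };
    }

    public static void main(String[] args) {
        String[] p = parse("ftp://userName:password@someServer/directory-name");
        System.out.println("user=" + p[0] + " password=" + p[1]
                + " host=" + p[2] + " path=" + p[3]);
    }
}
```

The same method also copes with URLs that have no credentials at all; it simply returns null for the user and password.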

The problems with using a regex include:

  • it is either too simple to deal with edge cases, or too complicated / expensive because it does deal with those cases, and

  • it is unlikely to handle %-encoding correctly.
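To illustrate the %-encoding point, a small sketch: in the hypothetical URL below, the password `p@ss` has its '@' encoded as `%40` so that it does not end the user-info part. The mandatory-credentials regex from the Perl-based answer on this page captures the raw `p%40ss`, whereas `URI.getUserInfo()` returns the decoded value. The class and method names are made up for this example.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PercentEncodingDemo {
    // Mandatory-credentials pattern from the Perl-based answer, as a Java string
    private static final Pattern FTP =
            Pattern.compile("^ftp://([^:]+):([^@]+)@(.*?)/(.*?)(\\?.*)?$");

    /** Password as captured by the regex: left exactly as written (still %-encoded). */
    public static String passwordViaRegex(String url) {
        Matcher m = FTP.matcher(url);
        return m.matches() ? m.group(2) : null;
    }

    /** Password taken from URI.getUserInfo(), which %-decodes the user-info component. */
    public static String passwordViaUri(String url) {
        String userInfo = URI.create(url).getUserInfo();
        if (userInfo == null) {
            return null;
        }
        int colon = userInfo.indexOf(':');
        return colon < 0 ? null : userInfo.substring(colon + 1);
    }

    public static void main(String[] args) {
        // Hypothetical credentials: password "p@ss", with '@' encoded as %40
        String url = "ftp://bob:p%40ss@someServer/dir";
        System.out.println("regex: " + passwordViaRegex(url)); // regex: p%40ss
        System.out.println("URI:   " + passwordViaUri(url));   // URI:   p@ss
    }
}
```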

For example, the (original) regex offered by @Larry does not handle URLs that have no userInfo part, among other cases.
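A quick check of the missing-userInfo case, as a sketch: an anonymous-FTP style URL with no `user:password@` part. The mandatory-credentials pattern from the Perl-based answer simply fails to match, while URI parses it and reports `null` user info. The class name `MissingUserInfoDemo` is hypothetical.

```java
import java.net.URI;
import java.util.regex.Pattern;

public class MissingUserInfoDemo {
    // Pattern with mandatory user name and password, as proposed in the other answer
    private static final Pattern FTP =
            Pattern.compile("^ftp://([^:]+):([^@]+)@(.*?)/(.*?)(\\?.*)?$");

    public static boolean regexMatches(String url) {
        return FTP.matcher(url).matches();
    }

    public static void main(String[] args) {
        String url = "ftp://someServer/directory-name"; // no user info at all
        URI uri = URI.create(url);
        System.out.println("regex matches: " + regexMatches(url)); // regex matches: false
        System.out.println("user info: " + uri.getUserInfo());     // user info: null
        System.out.println("host: " + uri.getHost());              // host: someServer
        System.out.println("path: " + uri.getPath());              // path: /directory-name
    }
}
```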


As the comments stated, a URL is a URI but not (necessarily) vice versa. But my reasons for recommending java.net.URI rather than java.net.URL are:

  • it has a better parser, and
  • it has a better API for examining the parts of the parsed URL.
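One concrete difference between the two classes, shown as a sketch: `java.net.URI` only checks syntax, whereas turning the string into a `java.net.URL` (here via `URI.toURL()`) requires a protocol handler for the scheme. The stock JDK ships a handler for `ftp` but not for `sftp`, so the `sftp` variant is rejected as a URL even though URI parses it fine. The `sftp` URL and the class name are illustrative assumptions.

```java
import java.net.MalformedURLException;
import java.net.URI;

public class UriVsUrlDemo {
    /** True if java.net.URL accepts the string, which needs a protocol handler for its scheme. */
    public static boolean urlAccepts(String s) {
        try {
            URI.create(s).toURL(); // throws MalformedURLException if no handler is found
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String s = "sftp://userName:password@someServer/directory-name";
        URI uri = URI.create(s); // URI only checks syntax, so any scheme parses
        System.out.println(uri.getScheme() + " " + uri.getUserInfo() + " " + uri.getHost());
        System.out.println("URL accepts sftp: " + urlAccepts(s));
        System.out.println("URL accepts ftp:  " + urlAccepts(s.substring(1))); // "ftp://..."
    }
}
```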


Whenever I think of regexes, I think "Perl": I write a quick-and-dirty pattern (qr{xxx}x) and test it against sample input.

In your case, assuming that the user name, password, server and directory name all need to be parsed out (and are mandatory), I'd use the following. If you want to make some parts optional, add question marks after the corresponding groups:

qr{
    ^           # Start of text
    ftp:        # Protocol
    //          # Double slash
    ([^:]+)     # $1 = User Name
    :           # Colon
    ([^@]+)     # $2 = Password
    @           # AT sign
    (.*?)       # $3 = Server name
    /           # Single slash
    (.*?)       # $4 = Directory name
    (\?.*)?     # Optional query string, starting with a question mark
    $           # End of text
}x;

Now that we have the pattern, double the backslash (in the question-mark group), remove the whitespace and comments (unless you compile with Pattern.COMMENTS, Java's equivalent of /x), and place it in a Java String:

String regex = "^ftp://([^:]+):([^@]+)@(.*?)/(.*?)(\\?.*)?$";

Use that with Pattern and Matcher, and the user name, password, server and directory will be in capture groups 1 to 4.
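A minimal Pattern/Matcher sketch built around that string. The class name and the array returned by `parse` are hypothetical. Note that the groups come back exactly as written in the URL, with no %-decoding, and that a URL without credentials does not match at all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FtpRegexParser {
    // The pattern from this answer; user name, password, server and directory are mandatory
    private static final Pattern FTP_URL =
            Pattern.compile("^ftp://([^:]+):([^@]+)@(.*?)/(.*?)(\\?.*)?$");

    /** Returns {user, password, server, directory}, or null if the URL does not match. */
    public static String[] parse(String url) {
        Matcher m = FTP_URL.matcher(url);
        if (!m.matches()) {
            return null;
        }
        // Group 5 (the optional "?..." query part) is ignored here
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] p = parse("ftp://userName:password@someServer/directory-name");
        if (p != null) {
            System.out.println("user=" + p[0] + " password=" + p[1]
                    + " server=" + p[2] + " directory=" + p[3]);
        }
    }
}
```

Because the server group is reluctant, it stops at the first '/', so a nested path such as `a/b` ends up whole in the directory group.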
