C++ - Splitting Filename and File Extension

Ok, first of all I don't want to use Boost, or any external libraries. I just want to use the C++ Standard Library. I can easily split strings with a given delimiter with my split() function:

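#include <sstream>
#include <string>
#include <vector>

// Split `input` on `delim`, appending each piece to `tokens`.
void split(const std::string &input, std::vector<std::string> &tokens, char delim) {
    std::string piece;
    std::stringstream stream(input);
    while (std::getline(stream, piece, delim))
        tokens.push_back(piece);
}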

I do this on filenames. But there's a problem. There are files that have extensions like tar.gz, tar.bz2, etc. Also, some filenames have extra dots, such as Some.file.name.tar.gz. I wish to separate Some.file.name and tar.gz. Note: The number of dots in a filename isn't constant.

I also tried PathFindExtension but no luck. Is this possible? If so, please enlighten me. Thank you.

Edit: I'm very sorry about not specifying the OS. It's Windows.


I think you could use std::string find_last_of to get the index of the last ., and substr to cut the string (although the "complex extensions" involving multiple dots will require additional work).
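A minimal sketch of that suggestion, using find_last_of and substr. The function name split_last_dot and the choice to treat a leading-dot name (such as .bashrc) as having no extension are assumptions made for illustration. As noted, it only handles the last dot, so Some.file.name.tar.gz comes back as Some.file.name.tar and gz.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: split a filename at its last '.'.
// Returns {name, extension}, with the extension given without its dot.
std::pair<std::string, std::string> split_last_dot(const std::string& filename) {
    const std::string::size_type dot = filename.find_last_of('.');
    // No dot at all, or only a leading dot: treat the whole thing as the name.
    if (dot == std::string::npos || dot == 0)
        return {filename, std::string()};
    return {filename.substr(0, dot), filename.substr(dot + 1)};
}
```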


There is no way of doing what you want that does not involve a database of extensions for your purpose. There's nothing magical about extensions, they are just part of a filename (if you gunzip foo.tar.gz you'll likely get a foo.tar, so for this application .gz actually is "the extension"). So, in order to do what you want, build a database of extensions that you want to look for and fall back on "last dot" if you don't find one.
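Something along these lines might work as a starting point. The three entries in the list are only examples, and the name split_known is made up for this sketch; a real list would hold whatever compound extensions matter for your application.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: try a small "database" of compound extensions first,
// then fall back on the last dot. Returns {name, extension-without-leading-dot}.
std::pair<std::string, std::string> split_known(const std::string& filename) {
    // Illustrative entries only; extend as needed.
    static const std::vector<std::string> known = {".tar.gz", ".tar.bz2", ".tar.xz"};

    for (const std::string& ext : known) {
        // Match only when the filename ends with ext and has a non-empty name before it.
        if (filename.size() > ext.size() &&
            filename.compare(filename.size() - ext.size(), ext.size(), ext) == 0) {
            return {filename.substr(0, filename.size() - ext.size()), ext.substr(1)};
        }
    }

    // Fallback: split at the last dot, if there is a usable one.
    const std::string::size_type dot = filename.rfind('.');
    if (dot == std::string::npos || dot == 0)
        return {filename, std::string()};
    return {filename.substr(0, dot), filename.substr(dot + 1)};
}
```

With this list, Some.file.name.tar.gz splits into Some.file.name and tar.gz, while photo.jpeg falls through to the last-dot rule.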


There's nothing in the C++ standard library for this (that is, it's not in the Standard), but every operating system I know of provides this functionality in a variety of ways.

In Windows you can use _splitpath(), and in Linux you can use dirname() and basename().
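For the POSIX side, a rough sketch of wrapping dirname() and basename() from <libgen.h> could look like the following. The wrapper names posix_dirname and posix_basename are invented here. Both POSIX functions are allowed to modify the buffer they are given, so each wrapper works on its own copy. Note that these calls separate the directory from the file name; they do not split off the extension. The Windows _splitpath() call is not shown, since this sketch only builds on POSIX systems.

```cpp
#include <string>
#include <vector>
#include <libgen.h>  // POSIX dirname() and basename(); not available on Windows

// Hypothetical wrapper: directory part of a path.
std::string posix_dirname(const std::string& path) {
    // dirname() may modify its argument, so pass a mutable, NUL-terminated copy.
    std::vector<char> buf(path.begin(), path.end());
    buf.push_back('\0');
    return std::string(dirname(buf.data()));
}

// Hypothetical wrapper: final component of a path.
std::string posix_basename(const std::string& path) {
    // basename() may also modify its argument, so copy here as well.
    std::vector<char> buf(path.begin(), path.end());
    buf.push_back('\0');
    return std::string(basename(buf.data()));
}
```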


The problem is indeed filenames like *.tar.gz, which cannot be split consistently, because (at least in Windows) the .tar part isn't part of the extension. You'll either have to keep a list for these special cases and use a one-dot string::rfind for the rest, or find some pre-implemented way. Note that the .tar.* extensions are finite in number and very much standardized (there's about ten of them, I think).
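One possible variation on that idea, sketched below: do the one-dot rfind first, then check whether the part before that dot ends in .tar and, if so, move the split point back. This treats every .tar.<something> as a compound extension instead of listing each one, which is an assumption that may be too broad for some uses. The name extension_start is hypothetical.

```cpp
#include <string>

// Hypothetical helper: index at which the extension (including its leading
// dot) starts, or std::string::npos if the filename has no extension.
std::string::size_type extension_start(const std::string& filename) {
    const std::string::size_type last = filename.rfind('.');
    if (last == std::string::npos || last == 0)
        return std::string::npos;

    // Special case: if the text just before the last dot is ".tar" (and some
    // name precedes it), treat ".tar.<x>" as a single extension.
    const std::string tar = ".tar";
    if (last > tar.size() &&
        filename.compare(last - tar.size(), tar.size(), tar) == 0)
        return last - tar.size();

    return last;
}
```

The caller can then take filename.substr(0, pos) as the name and filename.substr(pos) as the extension whenever pos is not npos.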


You could create a look-up table of file extensions that you think you might encounter, and add a command line option for adding a new one to the table whenever you encounter anything new. Then parse through the file name to see if any entry in the look-up table is a sub-string of the file name.
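A small sketch of such a table is below. The class name ExtensionTable and its add and match members are invented for illustration, and the command line handling is left out; add() is simply what such an option would call. One deliberate difference from a plain sub-string search: this version only matches at the end of the file name and prefers the longest entry, so that .tar.gz wins over .gz when both are registered.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical extensible look-up table of extensions (leading dot included).
class ExtensionTable {
public:
    // Register an extension such as ".tar.gz"; duplicates are ignored.
    void add(const std::string& ext) {
        if (std::find(exts_.begin(), exts_.end(), ext) == exts_.end())
            exts_.push_back(ext);
    }

    // Return the longest registered extension that the filename ends with,
    // or an empty string if none matches.
    std::string match(const std::string& filename) const {
        std::string best;
        for (const std::string& ext : exts_) {
            const bool is_suffix =
                filename.size() > ext.size() &&
                filename.compare(filename.size() - ext.size(), ext.size(), ext) == 0;
            if (is_suffix && ext.size() > best.size())
                best = ext;
        }
        return best;
    }

private:
    std::vector<std::string> exts_;
};
```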

EDIT: You can also refer to this question: C++/STL string: How to mimic regex like function with wildcards?
