C++ - Splitting Filename and File Extension
Ok, first of all I don't want to use Boost, or any external libraries. I just want to use the C++ Standard Library. I can easily split strings with a given delimiter with my split()
function:
void split(std::string &string, std::vector<std::string> &tokens, const char &delim) {
std::string ea;
std::stringstream stream(string);
while(getline(stream, ea, delim))
tokens.push_back(ea);
}
I do this on filenames. But there's a problem. There are files that have extensions like: tar.gz
, tar.bz2
, etc. Also there are some filenames that have extra dots. Some.file.name.tar.gz
. I wish to separate Some.fi开发者_如何学JAVAle.name
and tar.gz
Note: The number of dots in a filename isn't constant.
I also tried PathFindExtension
but no luck. Is this possible? If so, please enlighten me. Thank you.
Edit: I'm very sorry about not specifying the OS. It's Windows.
I think you could use std::string
find_last_of
to get the index of the last .
, and substr
to cut the string (although the "complex extensions" involving multiple dots will require additional work).
There is no way of doing what you want that does not involve a database of extensions for your purpose. There's nothing magical about extensions, they are just part of a filename (if you gunzip foo.tar.gz
you'll likely get a foo.tar, so for this application .gz actually is "the extension"). So, in order to do what you want, build a database of extensions that you want to look for and fall back on "last dot" if you don't find one.
There's nothing in the C++ standard library -- that is, it's not in the Standard --, but every operating system I know of provides this functionality in a variety of ways.
In Windows you can use _splitpath(), and in Linux you can use dirname() & basename()
The problem is indeed filenames like *.tar.gz
, which can not be split consistently, due to the fact that (at least in Windows) the .tar
part isn't part of the extension. You'll either have to keep a list for these special cases and use a one-dot string::rfind
for the rest or find some pre-implemented way. Note that the .tar.*
extensions aren't infinite, and very much standardized (there's about ten of them I think).
You could create a look-up table of file extensions that you think you might encounter. And also add a command line option to add a new one to the look-up table if you encounter anything new. Then parse through the file name to see if it any entry in the look-up table is a sub-string in the file name.
EDIT: You can also refer to this question: C++/STL string: How to mimic regex like function with wildcards?
精彩评论