How do I print the string which __FILE__ expands to correctly?
Consider this program:
#include <stdio.h>
int main() {
printf("%s\n", __FILE__);
return 0;
}
Depending on the name of the file, this program works - or not. The issue I'm facing is that I'd like to print the name of the current file in an encoding-safe way. However, in case the file has funny characters which cannot be represented in the current code page, the compiler yields a warning (rightfully so):
?????????.c(3) : warning C4566: character represented by universal-character-name '\u043F' cannot be represented in the current code page (1252)
How do I tackle this? I'd like to store the string given by __FILE__ in e.g. UTF-16 so that I can properly print it on any other system at runtime (by converting the stored UTF-16 representation to whatever the runtime system uses). To do so, I need to know:
- What encoding is used for the string given by __FILE__? It seems that, at least on Windows, the current system code page (in my case, Windows-1252) is used - but this is just guessing. Is this true?
- How can I store the UTF-8 (or UTF-16) representation of that string in my source code at build time?
My real life use case: I have a macro which traces the current program execution, writing the current sourcecode/line number information to a file. It looks like this:
struct LogFile {
// Write message to file. The file should contain the UTF-8 encoded data!
void writeMessage( const std::string &msg );
};
// Global function which returns a pointer to the 'active' log file.
LogFile *activeLogFile();
#define TRACE_BEACON activeLogFile()->writeMessage( __FILE__ );
This breaks in case the current source file has a name which contains characters which cannot be represented by the current code page.
You can use the token pasting operator, like this:
#include <wchar.h>
#define WIDEN2(x) L ## x
#define WIDEN(x) WIDEN2(x)
#define WFILE WIDEN(__FILE__)
int main() {
    /* wprintf needs a wide format string; %ls prints a wide string. */
    wprintf(L"%ls\n", WFILE);
    return 0;
}
__FILE__ will always expand to a character string literal, so in essence it is compatible with char const*. This means that a compiler implementation has little choice other than to use the raw byte representation of the source file name as it presents itself at compile time.
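To see which bytes your compiler actually stored, you can dump the literal in hex. This is only a rough sketch: hexBytes is a name I made up for illustration, and the static_assert merely confirms that __FILE__ is an array of const char.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// __FILE__ expands to a narrow string literal, i.e. an array of const char.
static_assert(
    std::is_same<
        std::remove_extent<std::remove_reference<decltype(__FILE__)>::type>::type,
        const char>::value,
    "__FILE__ should be a narrow string literal");

// Hypothetical helper: render the raw bytes of a narrow string as hex pairs.
std::string hexBytes(const char *s) {
    assert(s != nullptr);
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (; *s != '\0'; ++s) {
        unsigned char b = static_cast<unsigned char>(*s);
        if (!out.empty()) out += ' ';
        out += digits[b >> 4];    // high nibble
        out += digits[b & 0x0F];  // low nibble
    }
    return out;
}
```

Printing hexBytes(__FILE__) and comparing the result with the bytes you expect for the file name in a given encoding tells you what the compiler did with it.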
Whether this is something sensible in the current locale doesn't matter. You could have a source file name that contains basically garbage, as long as your run time system and compiler accept it as a valid file name.
If you, as a user, have a different locale with a different encoding than the one used in your file system, you will see a lot of ???? or the like.
But if both your locales agree upon the encoding, a plain printf should suffice and your terminal (or whatever you use to look at the output) should be able to print the characters correctly.
So the short answer is, it will only work if your system is consistent w.r.t. encoding. Otherwise you're out of luck, since guessing encodings is quite a difficult task.
As for the encoding, I'm going to guess it's what's used by the filesystem, probably Unicode.
As for dealing with it, how about changing your code to something like:
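// Note: __FILE__ is a narrow literal; widen it (e.g. with an L prefix) before passing it here.
#define TRACE_BEACON activeLogFile()->write( FixThisString(__FILE__) );
std::string FixThisString(const wchar_t* bad_string) { ..... }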
(Implementation of FixThisString is left as an exercise for the student.)
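For what it's worth, here is one possible shape for that exercise. It is a sketch under assumptions: it presumes the function receives a wide string holding UTF-16 (16-bit wchar_t, as on Windows) or UTF-32 (32-bit wchar_t), and the name wideToUtf8 is made up here. Unpaired surrogates and out-of-range values are replaced with U+FFFD.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: encode a wide string (UTF-16 or UTF-32) as UTF-8.
std::string wideToUtf8(const std::wstring &in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(in[i]);
        // Combine a UTF-16 surrogate pair into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            unsigned long low = static_cast<unsigned long>(in[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        // Replace unpaired surrogates and out-of-range values with U+FFFD.
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) cp = 0xFFFD;
        assert(cp <= 0x10FFFF);
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The result can be handed to a log writer that expects UTF-8, provided the wide string really contains what the compiler was given as the file name.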
The best solution is to use source filenames in the portable filename character set [A-Za-z0-9._-]. Since Windows does not support UTF-8, there's no way for arbitrary non-ASCII characters to be represented in ordinary strings without dependence on your configured local language.
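If you want to enforce that rule, a check along these lines could run in a build script or a unit test. It is a sketch only: isPortableFilename is a made-up name, and it expects a bare file name without any directory part (path separators are rejected).

```cpp
#include <cassert>

// Hypothetical helper: true if every character of name is in [A-Za-z0-9._-].
// An empty name is rejected as well.
bool isPortableFilename(const char *name) {
    assert(name != nullptr);
    if (*name == '\0') return false;
    for (; *name != '\0'; ++name) {
        char c = *name;
        bool ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                  (c >= '0' && c <= '9') || c == '.' || c == '_' || c == '-';
        if (!ok) return false;
    }
    return true;
}
```

Note that __FILE__ may include a directory path depending on how the compiler was invoked, so you would have to strip that part before checking.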
gcc probably does not care; it treats all filenames as 8-bit strings, so if the filename is accessible to gcc, its name will be representable. (I know cygwin provides a UTF-8 environment by default, and modern *nix will normally be UTF-8.) For MSVC, you might be able to use the preprocessor to prepend L to the expansion of __FILE__ and use %ls to format it.
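Roughly what I mean, as an untested-on-MSVC sketch: the macro names WIDE_STR2, WIDE_STR and WIDE_FILE and the helper formatLocation are all made up for illustration. It formats into a buffer with the standard swprintf rather than printing, so the result can be converted or logged afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Two levels are needed so that __FILE__ is expanded before L is pasted on.
#define WIDE_STR2(x) L ## x
#define WIDE_STR(x) WIDE_STR2(x)
#define WIDE_FILE WIDE_STR(__FILE__)

// Hypothetical helper: build "file(line)", using %ls for the wide file name.
// Returns an empty string if formatting fails or the buffer is too small.
std::wstring formatLocation(const wchar_t *file, int line) {
    assert(file != nullptr);
    wchar_t buf[512];
    int n = std::swprintf(buf, sizeof buf / sizeof buf[0], L"%ls(%d)", file, line);
    if (n < 0) return std::wstring();
    return std::wstring(buf, static_cast<std::size_t>(n));
}
```

A call such as formatLocation(WIDE_FILE, __LINE__) then gives you a wide string for the current location.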
In MSVC, you can turn on Unicode and get UTF-16 encoded strings. The setting is somewhere in the project properties. In addition, you should just use wcout/cout, not printf/wprintf. Windows needed Unicode before Unicode existed, so it had a custom multi-byte character encoding, which is the default. However, Windows does support UTF-16; it's what C# uses, for example.
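#include <iostream>
// __WFILE__ is not a standard predefined macro; build a wide literal from __FILE__ instead.
#define WIDEN2(x) L ## x
#define WIDEN(x) WIDEN2(x)
#define WFILE WIDEN(__FILE__)
int main() {
    std::wcout << WFILE << std::endl;
}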