
Perl: managing path encodings on Windows

I am struggling to work with paths containing non-English characters (ActiveState Perl, Windows XP). How do I open, write, copy, etc. a file located in a path with, say, Greek, Russian or French accented characters? For example, the directory I want to copy my text.txt file to is: C:\Documents and Settings\στα\Desktop

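use File::Copy qw(copy);
use File::Spec;

# $mw is assumed to be the Tk MainWindow
my $save = File::Spec->canonpath( $mw->chooseDirectory() );
my $file = File::Spec->catfile($save, "renamed_text.txt");

# single quotes: inside double quotes "\t" would become a tab character
my $input = 'üüü\text.txt';
copy($input, $file) or die "File cannot be copied: $!";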


I don't have the privileges to upvote Chris Dolan's answer, but I have solved this problem for path names here in Japan with the same solution, based on Win32::Codepage.

This probably needs confirmation, but I think Perl assumes UTF-8 for all non-ASCII path names. On Linux and OS X this works fine, because the OS path names are encoded in UTF-8. But on older versions of Windows (pre-Windows 7?), path names are encoded in the locale's code page (e.g. Shift-JIS here in Japan), so every Perl call that returns a path name with non-ASCII characters returns it garbled.

The solution I used was to find the locale encoding with Win32::Codepage, then decode path names from that encoding into Perl (UTF-8) strings when reading them, and encode them back into the locale encoding when writing (or updating) files.
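
A minimal sketch of that round trip with the core Encode module. The encoding is hard-coded to cp932 (Shift-JIS) purely as an assumption; in the setup described above it would come from Win32::Codepage at runtime. The helper names from_os and to_os are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: the ANSI code page is Shift-JIS (cp932). In practice the name
# would be looked up at runtime, e.g. with Win32::Codepage::get_encoding().
my $os_encoding = 'cp932';

# Hypothetical helper: bytes obtained from the OS (e.g. readdir, glob)
# -> Perl character string
sub from_os {
    my ($bytes) = @_;
    return decode($os_encoding, $bytes);
}

# Hypothetical helper: Perl character string -> bytes to pass to
# open, -e, unlink, File::Copy::copy, ...
sub to_os {
    my ($name) = @_;
    return encode($os_encoding, $name);
}
```

For example, to_os("\x{65E5}\x{672C}.txt") (the characters 日本 followed by .txt) yields the Shift-JIS bytes "\x93\xFA\x96\x7B.txt", and from_os turns those bytes back into the original character string.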


I had this same problem in a project a few years back (our PAR-packed GUI app had to work under Shift-JIS encoding). I tried LOTS of techniques to make Perl 5.8 handle this automatically. In the end, my tedious-but-effective solution was to encode EVERY filename just before passing it to the builtins.

First, set up the utility function:

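use Encode;
use Win32::Codepage;

# Ask Win32::Codepage for the current encoding (e.g. "cp932" for Shift-JIS)
# and normalise it to a name that Encode recognises.
my $encoding = Win32::Codepage::get_encoding() || q{};
if ($encoding) {
    $encoding = Encode::resolve_alias($encoding) || q{};
}

# Convert a Perl (Unicode) file name into the bytes Windows expects;
# if no encoding could be determined, pass the name through unchanged.
sub encode_filename {
    my ($filename) = @_;
    return $encoding ? encode($encoding, $filename) : $filename;
}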

Then, use it everywhere:

next if (! -d encode_filename($tmpldir));
my $file = SWF::File->new(encode_filename($dest));
@entries = File::Slurp::read_dir(encode_filename($srcdir));
etc...
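
Applied to the copy from the question, the pattern might look like the sketch below. It assumes a fixed Greek code page (cp1253) in place of whatever Win32::Codepage would report, and copy_to_dir is a hypothetical helper; the point is only that every path is encoded right before it reaches a file-system function.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Copy qw(copy);
use File::Spec;

# Assumption: a fixed Greek code page stands in for the value that
# Win32::Codepage::get_encoding() would report on a Greek Windows system.
my $encoding = 'cp1253';

# Same idea as encode_filename above: characters -> code-page bytes.
sub encode_filename {
    my ($filename) = @_;
    return $encoding ? encode($encoding, $filename) : $filename;
}

# Hypothetical helper: copy $src into $dir as $name, encoding every path
# right before it is handed to a file-system function.
sub copy_to_dir {
    my ($src, $dir, $name) = @_;
    my $dest = File::Spec->catfile($dir, $name);
    copy(encode_filename($src), encode_filename($dest))
        or die "File cannot be copied: $!";
    return $dest;
}
```

Path manipulation (catfile, catdir) stays on character strings; only the final call into the OS sees bytes, which keeps the encoding step in one place per call.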

I even wrote a little checker to make sure I used it everywhere!

egrep "\-[a-zA-Z] |open[^_]|[^ ]parse|unlink|symlink|mkdir[^_]|mkpath|rename[^\']|File::Copy::copy|rmtree|getTemplate[^D]|write_file|read_file|read_dir" *.pl `find lib -name '*.pm'` | grep -v encode_filename | egrep -v '^[^:]+: *(\#|_announce|debug)'

If you miss even one, you'll get a "Wide character" warning at runtime...


Perl's built-in functions cannot be used in this case. Use the functions in the Win32 module, which support Unicode characters. Win32 was first released with Perl v5.8.7.
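
A minimal sketch of that idea for the copy in the question, assuming a Win32 module recent enough to provide Win32::GetANSIPathName (check its documentation for your Perl). The helpers ansi_path and copy_into_dir are hypothetical names; off Windows, ansi_path returns its argument unchanged so the sketch still runs.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Spec;

# Convert an existing path (file or directory) into a form Perl's
# byte-oriented builtins can open. On Windows this uses
# Win32::GetANSIPathName, which may return the 8.3 short name when the
# long name cannot be represented in the ANSI code page.
# Elsewhere the path is returned unchanged.
sub ansi_path {
    my ($path) = @_;
    return $path unless $^O eq 'MSWin32';
    require Win32;
    return Win32::GetANSIPathName($path);
}

# Copy $src into the existing directory $dir under the (ASCII) name $name.
# Only existing paths go through ansi_path, because a short name can only
# be looked up for something that already exists.
sub copy_into_dir {
    my ($src, $dir, $name) = @_;
    my $dest = File::Spec->catfile(ansi_path($dir), $name);
    copy(ansi_path($src), $dest) or die "File cannot be copied: $!";
    return $dest;
}
```

With the question's example, copy_into_dir($input, $save, 'renamed_text.txt') would copy into the Greek-named Desktop folder through its ANSI (possibly 8.3 short) name. Keep in mind that 8.3 short names can be disabled on NTFS volumes, in which case a name that does not fit the code page cannot be converted this way.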


I discovered I had to disable UAC (User Account Control) on Microsoft Windows Vista before I could successfully install either Win32::Locale or Win32::Codepage. (Thank you, Chris Dolan, for writing the latter module.)


I also had problems with UAC (User Account Control) on Windows 7 and newer. I finally found out that, since Windows Vista, the required registry key can only be opened with read access. You can easily patch Win32::Codepage to work without administrative privileges: open the file in your favourite editor and replace:

  $codekey = Win32::TieRegistry->new($CODEPAGE_REGISTRY_KEY,
                                     { Delimiter => "/" }
                                    );

with:

  $codekey = Win32::TieRegistry->new($CODEPAGE_REGISTRY_KEY,
                                     { Access=>"KEY_READ", Delimiter => "/" }
                                    );

This has helped on my installation.
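
A different technique, not part of the answer above: if your Win32 module is recent enough to provide Win32::GetACP() (an assumption worth checking against its documentation), the ANSI code page number can be obtained without opening the registry at all and mapped to an Encode name. A minimal sketch; encoding_for_codepage and system_encoding are hypothetical helper names.

```perl
use strict;
use warnings;
use Encode ();

# Map a Windows code page number (e.g. 932, 1253) to an Encode name,
# or return '' if Encode does not know it.
sub encoding_for_codepage {
    my ($codepage) = @_;
    return Encode::resolve_alias("cp$codepage") || q{};
}

# Assumption: Win32::GetACP() is available and returns the ANSI code page,
# so no registry access (and no administrative privilege) is involved.
sub system_encoding {
    return q{} unless $^O eq 'MSWin32';
    require Win32;
    return encoding_for_codepage(Win32::GetACP());
}
```

On a Japanese system system_encoding() would be expected to return "cp932"; on other operating systems it returns an empty string, which matches the || q{} fallback used with encode_filename.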
