Perl UTF8 CGI and DBI ... what's the correct workflow?

2023-02-03 20:01

I am having the pleasure of rebuilding a Perl-based web framework to support UTF-8. I took the following steps:

for the main script:

use open IO => ":utf8",":std";

use utf8;

for the DBI Adapter:

$self->{dbh}->{'mysql_enable_utf8'} = 1;

and in my request parser for POST and GET, based on CGI:

foreach (@val) { $_ = decode("UTF-8",$_); }

This, as far as I can tell, works just fine on my local Ubuntu with Perl 5.10.1, but on the webserver which runs 5.10, decoding POST or GET will mess up the text.

I must admit, I am very confused by the whole UTF-8 thing. I need to:

Read Templates

Get data from mySQL

Process POST and GET, insert into mySQL

write Templates

Is there anything I'm forgetting here? What could cause the inconsistent behaviour? Does every module I use in the main script need its own "use utf8", or is it enough if the main script has it?

Thanks for any hints,

thomas

use utf8; is, as several people have said, a no-op as far as your i/o problems are concerned: all it says is 'treat my source code as utf8 encoded'.
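To make that distinction concrete, here is a minimal sketch using only core modules. It assumes the script file itself is saved as UTF-8; the helper name lengths is made up for illustration. The literal becomes a single character because of use utf8, but it only turns into UTF-8 bytes when something encodes it, either an explicit encode call or an I/O layer on the handle.

```perl
use strict;
use warnings;
use utf8;                          # source literals below are read as UTF-8
use Encode qw(encode_utf8);

# With "use utf8" this literal is one character (U+00E9), not two bytes.
my $text = "é";

# Characters are not bytes: something still has to encode them for output.
my $bytes = encode_utf8($text);

# Hypothetical helper: character length versus encoded byte length.
sub lengths { return (length($text), length($bytes)) }

# An output layer does that encoding for every print on this handle.
binmode(STDOUT, ':encoding(UTF-8)');
print "$text\n";
```

Here lengths() returns 1 for the character string and 2 for its encoded form, which is the part use utf8 has no say over.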

Your MySQL/DBI approach is bang on the money.

For CGI, update to a recent CGI and set $CGI::PARAM_UTF8=1 and it'll do the decode() for you. (As a general tip, BTW, decode_utf8() is considerably faster!)
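If you keep decoding by hand instead, a strict decode can help you spot the case where one environment has already decoded the values for you. The following is only a sketch: decode_params is a hypothetical helper, and it assumes the values arrive as raw byte strings. With FB_CROAK, input that is not valid UTF-8 (including text that has already been decoded once) dies instead of being silently mangled.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: decode raw request values (byte strings) exactly once.
sub decode_params {
    my @raw = @_;
    return map {
        my $value = $_;    # work on a copy; decode may modify its argument
        # FB_CROAK makes malformed input die rather than be replaced.
        decode('UTF-8', $value, FB_CROAK);
    } @raw;
}
```

Wrapping the call in eval lets you log the offending value rather than abort the whole request.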

As for the other problem, you may want to compare your Apache server configs to see if AddDefaultCharset is set to some non-helpful value.

Also, see my talk at last year's London Perl Workshop for a more detailed look at Perl and Unicode.

The solution here is the ordering.

$dbh->{mysql_enable_utf8} = 1;
$dbh->connect ...
$dbh->do('SET NAMES \'utf8\';') || die;

Enjoy :)

Thomas,

At the risk of extra negative points: I don't know if this is still needed, but in the past I had to make sure my DBI behaved properly with UTF-8 by doing:

my $dbh = DBI->connect(...);
$dbh->{mysql_enable_utf8} = 1;
$dbh->do("set names 'utf8';");
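One way to sidestep the ordering question is to pass the flag as a connect attribute, so it is in place from the first statement. This is a sketch, not a tested recipe for your setup: connect_utf8 is a hypothetical helper, and $dsn, $user and $pass are placeholders you would fill in yourself. It needs DBI and DBD::mysql installed.

```perl
use strict;
use warnings;

# Hypothetical helper; $dsn, $user and $pass are placeholders.
sub connect_utf8 {
    my ($dsn, $user, $pass) = @_;
    require DBI;    # loaded lazily so the sketch compiles without a database

    my $dbh = DBI->connect($dsn, $user, $pass, {
        RaiseError        => 1,    # die on any DBI error, including connect
        mysql_enable_utf8 => 1,    # treat text columns as UTF-8
    });

    # Tell the server which character set this connection speaks.
    $dbh->do("SET NAMES 'utf8'");
    return $dbh;
}
```

The explicit SET NAMES may be redundant with newer DBD::mysql versions, but it mirrors what both snippets in this thread do.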

Maybe it can be of help

First of all, my condolences on your Latin-1 to UTF-8 job. I did that for a large application a few years back and the wrinkles it gave me still haven't worn off.

What I recommend is to turn everything into UTF-8 and not try to convert back and forth between encodings; that will definitely go wrong somewhere. Storing UTF-8 data in a Latin-1 table is a recipe for disaster. I remember at one point having double and triple encoded UTF-8 strings in my database and no way to tell how to get back the original string.
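For anyone who has not seen double encoding up close, this small sketch reproduces it with core Encode only. The names hex_bytes and byte_lengths are made up for the illustration. The mistake is that already-encoded bytes get treated as Latin-1 characters and are encoded a second time.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $char = "\x{e9}";                  # one character: e with acute accent
my $once = encode('UTF-8', $char);    # correct encoding: bytes C3 A9

# The mistake: read those bytes as Latin-1 text, then encode again.
my $twice = encode('UTF-8', decode('ISO-8859-1', $once));

# Hypothetical helpers for inspecting the results.
sub hex_bytes    { return join ' ', map { sprintf '%02X', ord } split //, shift }
sub byte_lengths { return (length($once), length($twice)) }

print hex_bytes($once),  "\n";    # C3 A9
print hex_bytes($twice), "\n";    # C3 83 C2 A9
```

Each extra round roughly doubles the bytes per accented character, and nothing in the data records how many rounds were applied.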

The steps you should take:

Create a secondary database structure with UTF-8 collated tables instead of Latin-1 ones.
Extract everything from your primary database and insert it into the new database (hoping you haven't stored any UTF-8 strings in there yet).
Make sure the MIME headers your application sends to the browser specify UTF-8 as the encoding; data submitted back from those pages automatically uses the encoding of the page itself.
Cross your fingers and take a vacation...
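The header step could look roughly like this in a plain CGI script. It is a sketch under assumptions: utf8_response is a hypothetical helper, and it takes the page as a character string and returns the whole response as bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: build a complete CGI response as bytes.
sub utf8_response {
    my ($html) = @_;                      # page content as characters
    my $body = encode('UTF-8', $html);    # encode exactly once

    # Declare the charset so the browser sends form data back as UTF-8.
    return "Content-Type: text/html; charset=utf-8\r\n"
         . "Content-Length: " . length($body) . "\r\n\r\n"
         . $body;
}
```

Because the result is already bytes, print it to a handle without an encoding layer (for example after binmode(STDOUT)); printing it through a :utf8 layer would encode it a second time.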

You shouldn't have to change much in your application since the DBI utf8 handling is fairly good at this time.

Good luck!

Rob

Have a look at this. It is fairly general, but it will get your vocabulary straight, and though many examples are in Python, Perl is also there. BTW, if you try to stuff Latin-1 (or other) encoded data in without decoding and re-encoding it, disaster will ensue.

For more help, post specifics.

Cheers

You'll find a complete (and tested) guide here.
It misses nothing out; Perl, DBI and MySQL. All utf8'd.
I had similar pain but got it all done in the end.
