
Can the French and Spanish special chars be held in a varchar?

French and Spanish have special chars in them that are not used in normal English (accented vowels and such).

Are those chars supported in a varchar? Or do I need an nvarchar for them?

(NOTE: I do NOT want a discussion on if I should use nvarchar or varchar.)


What SQL Implementation(s) are you talking about?

I can speak about Microsoft SQL Server; other SQL implementations, not so much.

For Microsoft SQL Server, the default collation on a US English installation is SQL_Latin1_General_CP1_CI_AS (Latin 1 General, code page 1252, case-insensitive, accent-sensitive). It allows the round-trip representation of most western European languages in single-byte form (varchar) rather than double-byte form (nvarchar).

It's built on the "Windows 1252" code page. That code page is effectively ISO-8859-1 with the code point range 0x80–0x9F being represented by an alternate set of glyphs, including the Euro symbol at 0x80. ISO-8859-1 specifies that code point range as control characters, which have no graphical representation.
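The difference between the two encodings can be checked outside the database. The following is a minimal sketch using Python's built-in "cp1252" and "latin-1" codecs as stand-ins for the code pages; it says nothing about how SQL Server itself stores the bytes.

```python
# Byte values in the 0x80-0x9F range decode differently under the two encodings.
euro_byte = b"\x80"

as_cp1252 = euro_byte.decode("cp1252")   # Windows-1252: a printable glyph
as_latin1 = euro_byte.decode("latin-1")  # ISO-8859-1: a C1 control character

print(repr(as_cp1252))  # '€'
print(repr(as_latin1))  # '\x80'

# Outside that range the two encodings agree; 0xE9 is é in both.
same_e_acute = b"\xe9".decode("cp1252") == b"\xe9".decode("latin-1") == "é"
print(same_e_acute)     # True
```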

ISO-8859-1 consists of the first 256 characters of Unicode's Basic Multilingual Plane, covering the entire domain of an 8-bit character (0x00–0xFF). For details and comparison, see:

  • Unicode C0 Controls and Basic Latin
  • Unicode C1 Controls and Latin-1 Supplement
  • Windows-1252 Code Page
  • ISO-8859-1
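The relationship between ISO-8859-1, Unicode and Windows-1252 can be verified with a short script. This is only a sketch that relies on Python's bundled codec tables; note that Python's "cp1252" codec leaves a handful of byte values undefined, which the loop treats as "different".

```python
# Decode every possible byte value as ISO-8859-1 and compare each result
# with the Unicode code point of the same number.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")
matches = all(ord(ch) == i for i, ch in enumerate(decoded))
print(matches)  # True: byte N maps to code point U+00NN

# Collect the byte values where Windows-1252 departs from that identity mapping.
differing = []
for i in range(256):
    try:
        ch = bytes([i]).decode("cp1252")
    except UnicodeDecodeError:
        ch = None  # byte value left undefined by the codec
    if ch != chr(i):
        differing.append(i)

print(hex(min(differing)), hex(max(differing)))  # 0x80 0x9f
```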

European languages that will have a hard time with this collating sequence include (but aren't necessarily limited to) Latvian, Lithuanian, Polish, Czech and Slovak, because their alphabets use letters that code page 1252 does not contain. If you need to support those, you'll either need to use a different collation (SQL Server offers a plethora of collations), or move to using nvarchar.
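To see which code page a given language needs, one can simply try encoding a sample word. The sketch below uses Python's "cp1252", "cp1250" (Central European) and "cp1257" (Baltic) codecs; the helper name fits_in and the sample words are made up for illustration, and picking the matching SQL Server collation is a separate step not shown here.

```python
def fits_in(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given code page."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Hypothetical sample words, one per language.
samples = {
    "Polish": "Łódź",
    "Czech": "Černý",
    "Latvian": "Rīga",
    "Lithuanian": "ačiū",
}

for language, word in samples.items():
    results = {cp: fits_in(word, cp) for cp in ("cp1252", "cp1250", "cp1257")}
    print(language, word, results)
```

None of the four words fits in cp1252; the Polish and Czech words fit in cp1250, and the Latvian and Lithuanian words fit in cp1257.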

One should note that mixing collations within a database tends to cause problems. Deviating from the default collation should be done only when necessary and with an understanding of how you can shoot yourself in the foot with it.

I suspect Oracle and DB2 provide similar support. I don't know about MySQL or other implementations.


You have to use nvarchar.

http://theniceweb.com/archives/156

Most of the characters will fit in varchar but some won't, so why take the risk?

Related Question

When must we use NVARCHAR/NCHAR instead of VARCHAR/CHAR in SQL Server?


The characters that can be stored in a varchar field depend entirely on what code page is defined for that particular field. If there are specific characters that you want to store, then you can choose a code page that will store those characters, and it should work. Badly.

My advice is to always use nvarchar to store strings in a SQL database. In fact, I would consider non-Unicode character encodings to be a bug, whether it is in a database or anywhere else.

Your operating system uses Unicode internally (whether Windows, Mac, Linux, or whatever). The JVM and the .NET Framework use Unicode internally. There is simply no point to doing code page conversions every time you query a database. There is no point to doing code page conversions every time you write to a database. Just use an nvarchar column, and your strings will go straight from your application to the database untouched: no character conversion lookups, no fallback encoding error handlers, no weird characters or unexpected question marks.

By using nvarchar for all of your string data in your databases—and Unicode in general everywhere—you can stop concerning yourself with encodings and focus on the core functionality of your application, now and forever.

Today is the day to abandon legacy character encodings.

Do it for the maintainers who are coming after you. Do it for your children. Do it for yourself.


I'm not sure, but one of these character sets and collations may fit both Spanish and French. This would have to be researched, though.

http://dev.mysql.com/doc/refman/5.5/en/charset-charsets.html


Some excellent information, particularly from Nicholas Carey, but nobody directly gave a yes/no answer to your question...

Yes, you can use varchar to handle a mix of French and Spanish, provided your character set is Windows-1252 (or a similar modern superset of ISO-8859-1 with a few extra characters like the Euro symbol). In SQL Server, the character set is chosen by setting the collation (server-wide, per database or per column): Windows-1252 is used by the *Latin1* collations. In MySQL, Windows-1252 is called latin1.
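As a quick sanity check of that claim, here is a sketch that round-trips the accented letters of both languages through Python's "cp1252" codec. The letter lists are my own selection rather than an official inventory, and the script only tests the code page, not a real varchar column.

```python
# Accented letters and punctuation used in French and Spanish (assumed lists).
french = "àâæçéèêëîïôœùûüÿ ÀÂÆÇÉÈÊËÎÏÔŒÙÛÜŸ"
spanish = "áéíóúüñ ÁÉÍÓÚÜÑ ¿¡"

for label, text in (("French", french), ("Spanish", spanish)):
    encoded = text.encode("cp1252")          # raises if a character is missing
    round_trip = encoded.decode("cp1252") == text
    one_byte_each = len(encoded) == len(text)
    print(label, round_trip, one_byte_each)  # both True for each language

# The French ligature œ is one of the Windows-1252 extras: plain ISO-8859-1
# cannot represent it.
try:
    "œ".encode("latin-1")
    oe_in_latin1 = True
except UnicodeEncodeError:
    oe_in_latin1 = False
print(oe_in_latin1)  # False
```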

Note that if you try to store a character outside the repertoire of the chosen character set, the system may throw an error, or silently munge the character into a similar one from its repertoire. E.g. SQL Server will munge a Polish Ł to a simple L, but throw an error for a Japanese character.
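The same two outcomes (an error versus a silent substitution) can be illustrated with Python's codec error handlers. This is only an analogy: Python substitutes a question mark rather than a look-alike letter, so it does not reproduce SQL Server's mapping of Ł to L.

```python
text = "Łódź 日本"

# Strict mode: refuse characters that the code page does not contain.
try:
    text.encode("cp1252")
    strict_ok = True
except UnicodeEncodeError as exc:
    strict_ok = False
    print("strict encode failed at index", exc.start)  # index 0, the letter Ł

# Replace mode: silently substitute '?' for every unsupported character.
lossy = text.encode("cp1252", errors="replace").decode("cp1252")
print(lossy)  # ?ód? ??
```

Either way the original text is not recoverable from the stored bytes, which is why it pays to know the repertoire of the chosen character set up front.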
