Lazarus. Equivalent to Chr() for Unicode symbols
Is there any function in freepascal to show the Unicode symbol by its code (e.g. U+1D15E)? Unfortunately Chr()
works only with ANSI symbols (with codes less than 127).
?
or something else because they are absent in system fonts).Take a look at this page. I assume that Freepascal either uses UTF-16, in which it becomes a surrogate pair of two WideChars (see table) or UTF-8, in which it becomes a sequence of byte values (see table again).
UTF-8:
const
HalfNoteString = UTF8String(#$F0#$9D#$85#$9E);
UTF-16:
const
HalfNoteString = UnicodeString(#$D834#$DD5E);
The names of the string types may differ, as I don't know FreePascal very well. Perhaps AnsiString and WideString.
I have never used Free Pascal, but if I were you, I'd try
var
s: char;
begin
s := char($222b); // Just cast a word
or, if the compiler is really stubborn,
var
s: char;
begin
PWord(@s)^ := $222b; // Forcibly write a word
Current unicode status of FPC to my best knowledge
- The codepage of literals can be set with $codepage http://www.freepascal.org/docs-html/prog/progsu81.html
- FPC 2.4.x+ does have unicodestring (since it is +/- Kylix widestring) but only basic routine support. (pos and copy, not routines like format), but the "record" misses the codepage field.
- Lazarus widgets expect UTF8 in normal ansistrings (D7..D2007 ansistrings without codepage data), and programmers must manually insert conversions if necessary. So on Windows the widgets ARE mostly using unicode (-W) calls, but take ansistrings with UTF8 in it.
- FPC doesn't follow the utf8 in ansistring scheme , so for some string accepting routines in sysutils, there are special routines in Lazarus that assume UTF8 that call -W variants)
- FPC ansistring is the system default 1-byte encoding. ansi on Windows, utf8 on most other platforms.
- Trunk, 2.7.1, provides support for the new D2009+ ansistring (with codepages).
- There has been no discussion yet how to deal with the default stringtype (e.g. will "string" be utf8string on *nix and unicodestring on Windows, or unicodestring or utf8string everywhere?)
- Other unicodestring related enhancement (like encoding parameters to tstringlist.savetofile) are not implemented. Likewise for the pseudo objects (like TCharacter which are afaik mostly static)
Update: 2.7.1 has a variable encoding ansistring type, and lazarus has been fixed to keep working. Nothing is really taking advantage from it yet though, e.g. most of the RTL still uses -A calls, and prototypes of sysutils and system procedures that takes strings haven't changed to rawbytestring yet.
I assume the problem is to convert from UCS4 encoding (which is actually a Unicode codepoint number) to UTF16.
In Delphi, you can use UCS4StringToUnicodeString
function.
Warning: Be careful with UCS4String
type. It is actually a zero-terminated dynamic array, not a string (that means it is zero-based).
var
S1: UCS4String;
S: string;
begin
SetLength(S1, 2);
S1[0]:= UCS4Char($1D15E);
S1[1]:= UCS4Char(0);
S:= UCS4StringToUnicodeString(S1);
ShowMessage(Format('%d, %x, %x', [Length(S), Ord(S[1]), Ord(S[2])]));
end;
精彩评论