TIdHTTP Get method shows garbled characters when loading an XML file (Russian characters not displayed)

I am using TIdHTTP to download an XML file (currency values) from the URL http://nbt.tj/?c=4&id=28&lg=ru&d=21-02-2011&export=xmlout, and it shows me garbled characters.

Here is my code:

UnicodeString s = serv->Get("http://nbt.tj/?c=4&id=28&lg=ru&d=13-10-2009&export=xmlout");
cxMemo1->Text = s;

I tried setting the TIdHTTP charset property to windows-1251, but the result is the same. Here is the output:

<?xml version="1.0" encoding="windows-1251" ?>
<ValCurs Date="13/10/2009" name="Êîòèðîâêè âàëþò óñòàíàâëèâàåìûå åæåäíåâíî">
 <Valute ID="036">
   <CharCode>AUD</CharCode> 
   <Nominal>1</Nominal> 
   <Name>Àâñòðàëèéñêèé äîëëàð</Name> 
   <Value>3,9651</Value> 
  </Valute>
 <Valute ID="944">
   <CharCode>AZN</CharCode> 
   <Nominal>1</Nominal> 
   <Name>Àçåðáàéäæàíñêèé ìàíàò</Name> 
   <Value>5,4526</Value> 
  </Valute>
 <Valute ID="826">
   <CharCode>GBP</CharCode> 
   <Nominal>1</Nominal> 
   <Name>Àíãëèéñêèé ôóíò ñòåðëèíãîâ</Name> 
   <Value>6,9160</Value> 
  </Valute>
 <Valute ID="051">
   <CharCode>AMD</CharCode> 
   <Nominal>100</Nominal> 
   <Name>Àðìÿíñêèõ äðàìîâ</Name> 
   <Value>1,1353</Value> 
  </Valute>
 <Valute ID="971">
   <CharCode>AFN</CharCode> 
   <Nominal>10</Nominal> 
   <Name>Àôãàíñêèõ àôãàíè</Name> 
   <Value>0,8816</Value> 
  </Valute>
 <Valute ID="974">
   <CharCode>BYR</CharCode> 
   <Nominal>100</Nominal> 
   <Name>Áåëîðóññêèõ ðóáëåé</Name> 
   <Value>0,1598</Value> 
  </Valute>
 <Valute ID="981">
   <CharCode>GEL</CharCode> 
   <Nominal>1</Nominal> 
   <Name>Ãðóçèíñêèé ëàðè</Name> 
   <Value>2,6111</Value> 
  </Valute>
 <Valute ID="208">
   <CharCode>DKK</CharCode> 
   <Nominal>1</Nominal> 
   <Name>Äàòñêàÿ êðîíà</Name> 
   <Value>0,8680</Value> 
  </Valute>
 <Valute ID="784">
   <CharCode>AED</CharCode> 
   <Nominal>1</Nominal> 
   <Name>Äèðõàì  ÎÀÝ</Name> 
   <Value>1,1926</Value> 
  </Valute>
 <Valute ID="840">
   <CharCode>USD</CharCode> 
   <Nominal>1</Nominal> 
   <Name>Äîëëàð ÑØÀ</Name> 
   <Value>4,3806</Value> 
  </Valute>
 <Valute ID="978">
   <CharCode>EUR</CharCode> 
   <Nominal>1</Nominal> 
   <Name>ÅÂÐÎ</Name> 
   <Value>6,4705</Value> 
  </Valute>
 <Valute ID="356">
   <CharCode>INR</CharCode> 
   <Nominal>10</Nominal> 
   <Name>Èíäèéñêèõ ðóïèé</Name> 
   <Value>0,9428</Value> 
  </Valute>
 <Valute ID="364">
   <CharCode>IRR</CharCode> 
   <Nominal>1000</Nominal> 
   <Name>Èðàíñêèõ ðèàëîâ</Name> 
   <Value>0,4413</Value> 
  </Valute>
 <Valute ID="352">
   <CharCode>ISK</CharCode> 
   <Nominal>10</Nominal> 
   <Name>Èñëàíäñêèõ êðîí</Name> 
   <Value>0,3504</Value> 
  </Valute>
 <Valute ID="398">
   <CharCode>KZT</CharCode> 
   <Nominal>10</Nominal> 
   <Name>Êàçàõñêèõ òåíãå</Name> 
   <Value>0,2906</Value> 
  </Valute>
 <Valute ID="124">
   <CharCode>CAD</CharCode> 
   <Nominal>1</Nominal> 
   <Name>Êàíàäñêèé äîëëàð</Name> 
   <Value>4,2337</Value> 
  </Valute>
 <Valute ID="417">
   <CharCode>KGS</CharCode> 
   <Nominal>10</Nominal> 
   <Name>Êèðãèçñêèõ ñîìîâ</Name> 
   <Value>1,0032</Value> 
  </Valute>
 <Valute ID="156">
   <CharCode>CNY</CharCode> 
   <Nominal>1</Nominal> 
   <Name>Êèòàéñêèé þàíü</Name> 
   <Value>0,6420</Value> 
  </Valute>
 <Valute ID="414">
   <CharCode>KWD</CharCode> 
   <Nominal>1</Nominal> 
   <Name>Êóâåéòñêèé äèíàð</Name> 
   <Value>15,2794</Value> 
  </Valute>
 <Valute ID="428">
   <CharCode>LVL</CharCode> 
   <Nominal>1</Nominal> 
   <Name>Ëàòâèéñêèé ëàò</Name> 
   <Value>9,1054</Value> 
  </Valute>
 <Valute ID="440">
   <CharCode>LTL</CharCode> 
   <Nominal>1</Nominal> 
   <Name>Ëèòîâñêèé ëèò</Name> 
   <Value>1,8712</Value> 
  </Valute>
 <Valute ID="458">
   <CharCode>MYR</CharCode> 
   <Nominal>1</Nominal> 
   <Name>Ìàëàéçèéñêèé ðèíããèò</Name> 
   <Value>1,2881</Value> 
  </Valute>
 <Valute ID="498">
   <CharCode>MDL</CharCode> 
   <Nominal>1</Nominal> 
   <Name>Ìîëäàâñêèé ëåé</Name> 
   <Value>0,3932</Value> 
  </Valute>
 <Valute ID="949">
   <CharCode>TRY</CharCode> 
   <Nominal>1</Nominal> 
   <Name>Íîâàÿ òóðåöêàÿ ëèðà</Name> 
   <Value>2,9934</Value> 
  </Valute>
 <Valute ID="578">
   <CharCode>NOK</CharCode> 
   <Nominal>1</Nominal> 
   <Name>Íîðâåæñêàÿ êðîíà</Name> 
   <Value>0,7749</Value> 
  </Valute>
 <Valute ID="586">
   <CharCode>PKR</CharCode> 
   <Nominal>10</Nominal> 
   <Name>Ïàêèñòàíñêèõ ðóïèé</Name> 
   <Value>0,5260</Value> 
  </Valute>
 <Valute ID="985">
   <CharCode>PLN</CharCode> 
   <Nominal>1</Nominal> 
   <Name>Ïîëüñêèé çëîòûé</Name> 
   <Value>1,5182</Value> 
  </Valute>
 <Valute ID="682">
   <CharCode>SAR</CharCode> 
   <Nominal>1</Nominal> 
   <Name>Ðèàë Ñàóäîâñêîé Àðàâèè</Name> 
   <Value>1,1681</Value> 
  </Valute>
 <Valute ID="810">
   <CharCode>RUB</CharCode> 
   <Nominal>10</Nominal> 
   <Name>Ðîññèéñêèõ ðóáëåé</Name> 
   <Value>1,4814</Value> 
  </Valute>
 <Valute ID="960">
   <CharCode>XDR</CharCode> 
   <Nominal>1</Nominal> 
   <Name>ÑÄÐ</Name> 
   <Value>6,9556</Value> 
  </Valute>
 <Valute ID="702">
   <CharCode>SGD</CharCode> 
   <Nominal>1</Nominal> 
   <Name>Ñèíãàïóðñêèé äîëëàð</Name> 
   <Value>3,1326</Value> 
  </Valute>
 <Valute ID="764">
   <CharCode>THB</CharCode> 
   <Nominal>1</Nominal> 
   <Name>Òàèëàíäñêèé áàò</Name> 
   <Value>0,1314</Value> 
  </Valute>
 <Valute ID="795">
   <CharCode>TMM</CharCode> 
   <Nominal>1000</Nominal> 
   <Name>Òóðêìåíñêèõ ìàíàòîâ</Name> 
   <Value>0,3074</Value> 
  </Valute>
 <Valute ID="860">
   <CharCode>UZS</CharCode> 
   <Nominal>100</Nominal> 
   <Name>Óçáåêñêèõ ñóìîâ</Name> 
   <Value>0,2918</Value> 
  </Valute>
 <Valute ID="980">
   <CharCode>UAH</CharCode> 
   <Nominal>1</Nominal> 
   <Name>Óêðàèíñêàÿ ãðèâíà</Name> 
   <Value>0,5313</Value> 
  </Valute>
 <Valute ID="752">
   <CharCode>SEK</CharCode> 
   <Nominal>1</Nominal> 
   <Name>Øâåäñêàÿ êðîíà</Name> 
   <Value>0,6266</Value> 
  </Valute>
 <Valute ID="756">
   <CharCode>CHF</CharCode> 
   <Nominal>1</Nominal> 
   <Name>Øâåéöàðñêèé ôðàíê</Name> 
   <Value>4,2555</Value> 
  </Valute>
 <Valute ID="233">
   <CharCode>EEK</CharCode> 
   <Nominal>1</Nominal> 
   <Name>Ýñòîíñêàÿ êðîíà</Name> 
   <Value>0,4130</Value> 
  </Valute>
 <Valute ID="392">
   <CharCode>JPY</CharCode> 
   <Nominal>10</Nominal> 
   <Name>ßïîíñêèõ èåí</Name> 
   <Value>0,4850</Value> 
  </Valute>
</ValCurs>

What do I need to do? Any suggestions?


You are using the version of TIdHTTP::Get() that returns a UTF-16 encoded UnicodeString. That version of Get() decodes the raw bytes of the received content from its specified charset to UTF-16. TIdHTTP recognizes various XML-based Content-Type values; when it detects one, it extracts the charset directly from the XML prolog (windows-1251 in this case), regardless of the charset the HTTP server reported. What you are seeing in your TMemo are decoded Unicode characters, not the original encoded ANSI octets.

In general, XML should not be treated this way. Proper byte encoding is important to XML. You should use the version of Get() that downloads the data to a TStream instead of a UnicodeString. Then you can use the original undecoded bytes as needed, such as passing them to a real XML parser like TXMLDocument, e.g.:

TMemoryStream *XML = new TMemoryStream;
serv->Get("http://nbt.tj/?c=4&id=28&lg=ru&d=13-10-2009&export=xmlout", XML);
XML->Position = 0;
// use XML as needed...
// ASize = -1 reads everything left in the stream; the 8-bit encoding maps each byte to one character
cxMemo1->Text = ReadStringFromStream(XML, -1, Indy8BitEncoding());
delete XML;
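
To illustrate what working with the undecoded bytes can look like, here is a minimal sketch in standard C++ (not C++Builder or Indy code) that turns windows-1251 octets into UTF-8. The helper names appendUtf8 and cp1251ToUtf8 are hypothetical. As a simplifying assumption, only ASCII and the Russian letters are mapped, and every other byte becomes U+FFFD; a full code-page conversion or a real XML parser would likely be the better tool in an application.

```cpp
#include <cstdint>
#include <string>

// Append one Unicode code point (below U+10000) to a UTF-8 string.
static void appendUtf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: decode windows-1251 bytes to UTF-8.
// Assumption: only ASCII and the Russian letters are handled;
// any other byte is replaced with U+FFFD.
std::string cp1251ToUtf8(const std::string& bytes) {
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            appendUtf8(out, b);                    // ASCII is unchanged
        } else if (b >= 0xC0) {
            appendUtf8(out, 0x0410 + (b - 0xC0));  // 0xC0..0xFF -> U+0410..U+044F
        } else if (b == 0xA8) {
            appendUtf8(out, 0x0401);               // capital Yo
        } else if (b == 0xB8) {
            appendUtf8(out, 0x0451);               // small yo
        } else {
            appendUtf8(out, 0xFFFD);               // not mapped in this sketch
        }
    }
    return out;
}
```

For example, the bytes C4 EE EB EB E0 F0, which appear as Äîëëàð when each byte is shown as a Latin-1 character, decode to Доллар.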


Try "cp1251" or just "1251" as the charset property. Alternatively, try this function:

// Assumes a Unicode-enabled Delphi, where string is UnicodeString and Char is WideChar.
function RussianToUnicode(const S: string): string;
var
  I: Integer;
begin
  Result := S;
  // Indexing instead of taking @S[1] keeps the function safe for an empty string.
  for I := 1 to Length(Result) do
    case Ord(Result[I]) of
      // windows-1251 byte values $C0..$FF map to U+0410..U+044F
      $C0..$FF: Result[I] := Char(Ord(Result[I]) + $0350);
    end;
end;

Use it like this:

text:= RussianToUnicode(IdHTTP.Get('url'));
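
For readers working in C++ rather than Delphi, the same remapping could look roughly like the sketch below. It is standard C++ with std::u16string standing in for UnicodeString, and the name russianToUnicode is hypothetical. The handling of Ё and ё (0xA8 and 0xB8 in windows-1251) is an addition that the Delphi function above does not have. Like the original, it assumes each UTF-16 code unit holds one windows-1251 byte value, so it would also alter genuine Latin-1 letters such as Ä.

```cpp
#include <string>

// Hypothetical C++ counterpart of RussianToUnicode: remaps UTF-16 code units
// that carry windows-1251 byte values onto the Cyrillic block.
std::u16string russianToUnicode(std::u16string s) {
    for (char16_t& ch : s) {
        if (ch >= 0x00C0 && ch <= 0x00FF) {
            ch = static_cast<char16_t>(ch + 0x0350);  // U+0410..U+044F
        } else if (ch == 0x00A8) {
            ch = 0x0401;                              // capital Yo (added here)
        } else if (ch == 0x00B8) {
            ch = 0x0451;                              // small yo (added here)
        }
    }
    return s;
}
```

Passing the garbled string ÅÂÐÎ from the output above through this function yields ЕВРО.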


Thanks to @remy-lebeau-teamb my problem has been solved!!!

TMemoryStream *XML = new TMemoryStream;
serv->Get("http://nbt.tj/?c=4&id=28&lg=ru&d=13-10-2009&export=xmlout", XML);
XML->Position = 0;
// use XML as needed...
AnsiString s = "";
ReadStringFromStream(XML, s);
cxMemo1->Text = s;
delete XML;