RTF CP1252 to Text UTF-8
Here is a file that I need to convert to plain text in zsh on Mac OS X: http://narod.ru/disk/6431540001/Test_rtf.rtf.html
I've tried unrtf, rtf2txt and rtf2html, all with no result. They can't convert ru_cp1252. I've also tried
unrtf file.rtf | iconv -f cp1252 -t UTF-8
with no result either.
I'll be happy with any solution: shell, Perl, Python or Ruby.
If you don't want to download the file, here is part of the RTF file as I see it in zsh with cat:
{\rtf1\adeflang1025\ansi\ansicpg10000\uc1\adeff0\deff0\stshfdbch0\stshfloch0\stshfhich0\stshfbi0\deflang1033\deflangfe1033\themelang1033\themela ngfe0\themelangcs0{\fonttbl{\f0\fbidi \fnil\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f1\fbidi \fnil\fcharset0\fprq2{\*\ panose 020b0604020202020204}Arial;}^M{\f1\fbidi \fnil\fcharset0\fprq2{\*\panose 020b0604020202020204}Arial;}{\flomajor\f31500\fbidi \fnil\fchars et0\fprq2{\*\panose 020b0604020202020204}Arial;}{\fdbmajor\f31501\fbidi \fnil\fcharset78\fprq2 \'82\'6c\'82\'72 \'83\'53\'83\'56\'83\'62\'83\'4e ;}^M{\fhimajor\f31502\fbidi \fnil\fcharset0\fprq2{\*\panose 020f0502020204030204}Calibri;}{\fbimajor\f31503\fbidi \fnil\fcharset0\fprq2{\*\panos e 02020603050405020304}Times New Roman;}^M{\flominor\f31504\fbidi \fnil\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\fdbmin or\f31505\fbidi \fnil\fcharset78\fprq2 \'82\'6c\'82\'72 \'96\'be\'92\'a9;}^M{\fhiminor\f31506\fbidi \fnil\fcharset0\fprq2{\*\panose 020405030504 06030204}Cambria;}{\fbiminor\f31507\fbidi \fnil\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f487\fbidi \fnil\fcharset238\f prq2 Times New Roman CE;}^M{\f488\fbidi \fnil\fcharset204\fprq2 Times New Roman Cyr;}{\f490\fbidi \fnil\fcharset161\fprq2 Times New Roman Greek; }{\f491\fbidi \fnil\fcharset162\fprq2 Times New Roman Tur;}{\f492\fbidi \fnil\fcharset177\fprq2 Times New Roman (Hebrew);}^M{\f493\fbidi \fnil\f charset178\fprq2 Times New Roman (Arabid);}{\f494\fbidi \fnil\fcharset186\fprq2 Times New Roman Baltic;}{\f495\fbidi \fnil\fcharset87\fprq2 Time s New Roman (That);}{\f497\fbidi \fnil\fcharset238\fprq2 Arial CE;}^M{\f498\fbidi \fnil\fcharset204\fprq2 Arial Cyr;}{\f500\fbidi \fnil\fcharset 161\fprq2 Arial Greek;}{\f501\fbidi \fnil\fcharset162\fprq2 Arial Tur;}{\f502\fbidi \fnil\fcharset177\fprq2 Arial (Hebrew);}{\f503\fbidi \fnil\f charset178\fprq2 Arial (Arabid);}^M{\f504\fbidi \fnil\fcharset186\fprq2 Arial Baltic;}{\f505\fbidi 
\fnil\fcharset87\fprq2 Arial (That);}{\f497\f bidi \fnil\fcharset238\fprq2 Arial CE;}{\f498\fbidi \fnil\fcharset204\fprq2 Arial Cyr;}{\f500\fbidi \fnil\fcharset161\fprq2 Arial Greek;}^M{\f50 1\fbidi \fnil\fcharset162\fprq2 Arial Tur;}{\f502\fbidi \fnil\fcharset177\fprq2 Arial (Hebrew);}{\f503\fbidi \fnil\fcharset178\fprq2 Arial (Arab id);}{\f504\fbidi \fnil\fcharset186\fprq2 Arial Baltic;}{\f505\fbidi \fnil\fcharset87\fprq2 Arial (That);}^M{\flomajor\f31508\fbidi \fnil\fchars et238\fprq2 Arial CE;}{\flomajor\f31509\fbidi \fnil\fcharset204\fprq2 Arial Cyr;}{\flomajor\f31511\fbidi \fnil\fcharset161\fprq2 Arial Greek;}{\ flomajor\f31512\fbidi \fnil\fcharset162\fprq2 Arial Tur;}^M{\flomajor\f31513\fbidi \fnil\fcharset177\fprq2 Arial (Hebrew);}{\flomajor\f31514\fbi di \fnil\fcharset178\fprq2 Arial (Arabid);}{\flomajor\f31515\fbidi \fnil\fcharset186\fprq2 Arial Baltic;}{\flomajor\f31516\fbidi \fnil\fcharset8 7\fprq2 Arial (That);}^M{\fdbmajor\f31520\fbidi \fnil\fcharset0\fprq2 \'82\'6c\'82\'72 \'83\'53\'83\'56\'83\'62\'83\'4e Western;}{\fdbmajor\f315 18\fbidi \fnil\fcharset238\fprq2 \'82\'6c\'82\'72 \'83\'53\'83\'56\'83\'62\'83\'4e CE;}^M{\fdbmajor\f31519\fbidi \fnil\fcharset204\fprq2 \'82\'6 c\'82\'72 \'83\'53\'83\'56\'83\'62\'83\'4e Cyr;}{\fdbmajor\f31521\fbidi \fnil\fcharset161\fprq2 \'82\'6c\'82\'72 \'83\'53\'83\'56\'83\'62\'83\'4 e Greek;}^M{\fdbmajor\f31522\fbidi \fnil\fcharset162\fprq2 \'82\'6c\'82\'72 \'83\'53\'83\'56\'83\'62\'83\'4e Tur;}{\fdbmajor\f31525\fbidi \fnil\ fcharset186\fprq2 \'82\'6c\'82\'72 \'83\'53\'83\'56\'83\'62\'83\'4e Baltic;}^M{\fhimajor\f31528\fbidi \fnil\fcharset238\fprq2 Calibri CE;}{\fhim ajor\f31529\fbidi \fnil\fcharset204\fprq2 Calibri Cyr;}{\fhimajor\f31531\fbidi \fnil\fcharset161\fprq2 Calibri Greek;}{\fhimajor\f31532\fbidi \f nil\fcharset162\fprq2 Calibri Tur;}^M{\fhimajor\f31535\fbidi \fnil\fcharset186\fprq2 Calibri Baltic;}{\fhimajor\f31536\fbidi \fnil\fcharset87\fp rq2 Calibri (That);}{\fbimajor\f31538\fbidi 
\fnil\fcharset238\fprq2 Times New Roman CE;}^M{\fbimajor\f31539\fbidi \fnil\fcharset204\fprq2 Times New Roman Cyr;}{\fbimajor\f31541\fbidi \fnil\fcharset161\fprq2 Times New Roman Greek;}{\fbimajor\f31542\fbidi \fnil\fcharset162\fprq2 Times New Roman Tur;}^M{\fbimajor\f31543\fbidi \fnil\fcharset177\fprq2 Times New Roman (Hebrew);}{\fbimajor\f31544\fbidi \fnil\fcharset178\fprq2 Times New Roman (Arabid);}{\fbimajor\f31545\fbidi \fnil\fcharset186\fprq2 Times New Roman Baltic;}^M{\fbimajor\f31546\fbidi \fnil\fcharset87\fprq2 Times New Roman (That);}{\flominor\f31548\fbidi \fnil\fcharset238\fprq2 Times New Roman CE;}{\flominor\f31549\fbidi \fnil\fcharset204\fprq2 Times New Roman Cyr;}^M{\flominor\f31551\fbidi \fnil\fcharset161\fprq2 Times New Roman Greek;}{\flominor\f31552\fbidi \fnil\fcharset162\fprq2 Times New Ro man Tur;}{\flominor\f31553\fbidi \fnil\fcharset177\fprq2 Times New Roman (Hebrew);}^M{\flominor\f31554\fbidi \fnil\fcharset178\fprq2 Times New R oman (Arabid);}{\flominor\f31555\fbidi \fnil\fcharset186\fprq2 Times New Roman Baltic;}{\flominor\f31556\fbidi \fnil\fcharset87\fprq2 Times New Roman (That);}^M{\fdbminor\f31560\fbidi \fnil\fcharset0\fprq2 \'82\'6c\'82\'72 \'96\'be\'92\'a9 Western;}{\fdbminor\f31558\fbidi \fnil\fcharset2 ...................... }
My tip is to use a text editor that can handle different character sets: open the file and then save it as UTF-8.
I often use jEdit for similar tasks; see the section on character encodings in jEdit's manual.
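
If an editor is not an option, the same re-encoding idea can be sketched in Python. This is only a rough sketch, not a full RTF converter: it assumes the body stores its non-ASCII text as \'hh hex escapes (as the font table in the question does) and it leaves control words and groups untouched. The function name decode_rtf_hex_escapes and the default codec cp1251 are my assumptions. The body of the file is not shown in the question, so the right codec is a guess; the font table lists "Cyr" fonts with \fcharset204, so cp1251 or mac_cyrillic may be worth trying before cp1252.

```python
import re

# One or more consecutive RTF hex escapes, e.g. \'cf\'f0
HEX_RUN = re.compile(r"(?:\\'[0-9a-fA-F]{2})+")


def decode_rtf_hex_escapes(rtf_text, encoding="cp1251"):
    """Replace runs of \\'hh escapes with text decoded from `encoding`.

    The encoding is an assumption supplied by the caller; bytes that
    are invalid in that encoding become U+FFFD instead of raising.
    """
    def replace_run(match):
        # Strip the \' prefixes so only the hex digits remain.
        hex_digits = match.group(0).replace("\\'", "")
        return bytes.fromhex(hex_digits).decode(encoding, errors="replace")

    return HEX_RUN.sub(replace_run, rtf_text)


if __name__ == "__main__":
    # Hypothetical sample, not taken from the file in the question.
    sample = r"\'cf\'f0\'e8\'e2\'e5\'f2"
    print(decode_rtf_hex_escapes(sample, "cp1251"))
```

To try it on a real file, read the file with encoding="latin-1" so every byte survives, pass the text through the function with each candidate codec, and write the result out with encoding="utf-8". The RTF control words would still need to be stripped separately.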