Issues with Tcl encoding on Eggdrop
I have installed the latest version of Eggdrop, with Tcl 8.5, on a new Debian server. Unfortunately, my script mishandles special characters such as é and the apostrophe in J'aime.
An example shows the problem best:
13:41 <@me> test
13:41 <@me> !tr nl This is a test
13:41 < bot> Dit is een test
13:41 <@me> !tr fr I am a stranger
13:41 < bot> Je suis un étranger
13:41 <@me> !tr fr I love you
13:42 < bot> Je t'aime
I have added the encoding convertto utf-8 line, and Eggdrop itself is running in UTF-8 as well. That made étranger readable in my IRC client, but most other characters (Chinese, Arabic) are still nowhere near correct. The Tcl code is as follows:
namespace eval gTranslator {
    bind pub - !tr gTranslator::translate
    proc translate { nick uhost handle chan text } {
        package require http
        package require json
        set lngto [string tolower [lindex [split $text] 0]]
        set text [::http::formatQuery q [join [lrange [split $text] 1 end]]]
        set dturl "http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&q=$text"
        set res [::json::json2dict [::http::data [::http::geturl $dturl]]]
        set lng [dict get $res responseData language]
        if { $lng == $lngto } {
            putserv "PRIVMSG $chan :\002Error\002 translating $lng to $lngto."
            return 0
        }
        set trurl "http://ajax.googleapis.com/ajax/services/language/translate?v=1.0&langpair=$lng%7c$lngto&$text"
        putlog $trurl
        set res [::json::json2dict [::http::data [::http::geturl $trurl]]]
        putlog $res
        #putserv "PRIVMSG $chan :Language detected: $lng"
        set translated [dict get $res responseData translatedText]
        putserv "PRIVMSG $chan :[encoding convertto utf-8 $translated]"
    }
}
Connecting via telnet gave the following additional information:
*** Me joined the party line.
[13:49:34] http://ajax.googleapis.com/ajax/services/language/translate?v=1.0&langpair=en%7cfr&q=I%20like%20cookies
[13:49:34] responseData {translatedText {J'aime les cookies}} responseDetails null responseStatus 200
[13:50:11] http://ajax.googleapis.com/ajax/services/language/translate?v=1.0&langpair=en%7cfr&q=I%20am%20a%20stranger
[13:50:11] responseData {translatedText {Je suis un étranger}} responseDetails null responseStatus 200
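In the logged URLs, the q=... part is the output of http::formatQuery q ..., which the script stores back into $text. The detect URL, however, puts its own q= in front of $text, so the detect request actually carries q=q=I%20like%20cookies. One way to avoid this kind of slip is to let http::formatQuery build the whole query string. The sketch below does that; the proc names detectUrl and translateUrl and the ::apiBase variable are hypothetical, while the endpoint paths and parameter names are simply copied from the script.

```tcl
package require http

# Base URL copied from the script; the proc names below are hypothetical.
set ::apiBase http://ajax.googleapis.com/ajax/services/language

# Build the detect URL; formatQuery encodes both v and q.
proc detectUrl {phrase} {
    return "$::apiBase/detect?[http::formatQuery v 1.0 q $phrase]"
}

# Build the translate URL; formatQuery also percent-encodes the "|"
# separator in the langpair value, so no hand-written %7c is needed.
proc translateUrl {from to phrase} {
    set pair $from|$to
    return "$::apiBase/translate?[http::formatQuery v 1.0 langpair $pair q $phrase]"
}

puts [detectUrl "I am a stranger"]
puts [translateUrl en fr "I am a stranger"]
```

Because every parameter goes through the same encoder, the phrase appears exactly once per URL and the langpair separator is escaped the same way as everything else.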
There are several issues here. First, Google returns strings with HTML entity encoding (decimal character references such as &#39; for an apostrophe) applied on top of the JSON encoding, so you have to decode those entities yourself. Second, you have a memory leak: the tokens returned by http::geturl must be released manually with http::cleanup. The best way to handle that is to write a helper procedure:
namespace eval gTranslator {
    # Factor this out into a helper
    proc getJson url {
        set tok [http::geturl $url]
        set res [json::json2dict [http::data $tok]]
        # Release the token's state; skipping this leaks memory on every request
        http::cleanup $tok
        return $res
    }

    # How to decode _decimal_ entities; WARNING: high magic factor within!
    proc decodeEntities str {
        # Escape brackets, dollar signs and backslashes so subst leaves them alone...
        set str [string map {\[ {\[} \] {\]} \$ {\$} \\ \\\\} $str]
        # ...then rewrite each &#NNN; as a [format %c NNN] call that subst evaluates
        subst [regsub -all {&#(\d+);} $str {[format %c \1]}]
    }

    bind pub - !tr gTranslator::translate
    proc translate { nick uhost handle chan text } {
        package require http
        package require json
        set lngto [string tolower [lindex [split $text] 0]]
        # $text now holds "q=<url-encoded phrase>", so append it as-is below
        set text [http::formatQuery q [join [lrange [split $text] 1 end]]]
        set dturl "http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&$text"
        set lng [dict get [getJson $dturl] responseData language]
        if { $lng eq $lngto } {
            putserv "PRIVMSG $chan :\002Error\002 translating $lng to $lngto."
            return 0
        }
        set trurl "http://ajax.googleapis.com/ajax/services/language/translate?v=1.0&langpair=$lng%7c$lngto&$text"
        putlog $trurl
        set res [getJson $trurl]
        putlog $res
        #putserv "PRIVMSG $chan :Language detected: $lng"
        set translated [decodeEntities [dict get $res responseData translatedText]]
        putserv "PRIVMSG $chan :[encoding convertto utf-8 $translated]"
    }
}
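The subst-based decodeEntities handles only decimal references. If the API also returns hexadecimal references (&#x27;) or named ones (&amp;, &quot;), a loop over regexp -indices can decode them without evaluating any of the text. This is a sketch, not the method used above: the proc name decodeEntitiesSafe and the small table of named entities are assumptions about what the service might send. It also leaves references above U+FFFF untouched, since Tcl 8.5 strings only hold Basic Multilingual Plane characters.

```tcl
# Hypothetical alternative to decodeEntities: decodes decimal (&#39;),
# hexadecimal (&#x27;) and a few named entities without using subst.
proc decodeEntitiesSafe {str} {
    # Assumed subset of named entities; extend as needed.
    set named [dict create amp & lt < gt > quot \" apos ']
    set re {&(#[0-9]+|#[xX][0-9a-fA-F]+|[a-zA-Z]+);}
    set out ""
    while {[regexp -indices $re $str whole body]} {
        lassign $whole start end
        set name [string range $str [lindex $body 0] [lindex $body 1]]
        # Unknown or undecodable entities are copied through unchanged.
        set repl [string range $str $start $end]
        set code -1
        if {[string match {#[xX]*} $name]} {
            scan [string range $name 2 end] %x code
        } elseif {[string index $name 0] eq "#"} {
            scan [string range $name 1 end] %d code
        } elseif {[dict exists $named $name]} {
            set repl [dict get $named $name]
        }
        # Only emit characters that fit in the BMP (U+0000 to U+FFFF).
        if {$code >= 0 && $code <= 0xFFFF} {
            set repl [format %c $code]
        }
        # Copy the text before the entity, then the replacement, and move on.
        append out [string range $str 0 [expr {$start - 1}]] $repl
        set str [string range $str [expr {$end + 1}] end]
    }
    return $out$str
}

puts [decodeEntitiesSafe "J&#39;aime les cookies &amp; le th&#xE9;"] ;# J'aime les cookies & le thé
```

To try it in the bot, you would call decodeEntitiesSafe instead of decodeEntities on the set translated line. Because the input is never passed to subst, brackets, dollar signs and backslashes in a translation are always treated as plain text.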
(You already apply encoding convertto utf-8 to work around Eggdrop's lack of proper encoding support.)
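For illustration, here is what encoding convertto utf-8 produces: each character becomes its UTF-8 byte sequence, returned as a string with one Tcl character per byte. Accented Latin and Arabic letters take two bytes and most Chinese characters three. Assuming, as the workaround implies, that Eggdrop passes each byte-valued character to the server unchanged, the client then receives UTF-8; converting twice would instead turn é into Ã©. The proc name utf8Report is hypothetical, and the sketch uses only core Tcl commands.

```tcl
# Hypothetical helper: report each character's code point and UTF-8 byte count.
proc utf8Report {str} {
    set report {}
    foreach ch [split $str ""] {
        # convertto yields one Tcl character per UTF-8 byte
        set bytes [encoding convertto utf-8 $ch]
        lappend report [list [format U+%04X [scan $ch %c]] [string length $bytes]]
    }
    return $report
}

# é (French), meem (Arabic), zhong (Chinese)
puts [utf8Report "\u00e9\u0645\u4e2d"] ;# {U+00E9 2} {U+0645 2} {U+4E2D 3}

set s "Je suis un \u00e9tranger"
set wire [encoding convertto utf-8 $s]
puts "[string length $s] chars, [string length $wire] bytes" ;# 19 chars, 20 bytes
# convertfrom undoes the conversion and restores the original string
puts [expr {[encoding convertfrom utf-8 $wire] eq $s}] ;# 1
```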
I checked the result of a query that returns Arabic, and it appears to be correct UTF-8. So any problems you still see with Arabic text are in your client. (Some Chinese characters may still cause trouble, because Tcl currently handles only the Basic Multilingual Plane of Unicode, the BMP, which covers code points up to U+FFFF. This is a known issue.)
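To check whether the bytes leaving the bot are valid UTF-8, you could log them in hex. A minimal sketch, assuming a helper like the hypothetical utf8Hex below is added to the script and called from putlog on $translated next to the putserv line; binary scan with the H* format produces lowercase hex digits.

```tcl
# Hypothetical diagnostic: return the UTF-8 bytes of str as space-separated hex.
proc utf8Hex {str} {
    binary scan [encoding convertto utf-8 $str] H* hex
    # Insert a space after every two hex digits for readability.
    return [string trim [regsub -all {..} $hex {& }]]
}

puts [utf8Hex "\u00e9"] ;# c3 a9
puts [utf8Hex "\u0645"] ;# d9 85 (Arabic letter meem)
puts [utf8Hex "\u4e2d"] ;# e4 b8 ad (Chinese zhong)
```

If the log shows well-formed sequences like these while the channel shows garbage, the problem lies in the client's decoding rather than in the bot.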