Use Google Translate to translate printf-based strings

I'm trying to use the web-based Google Translate to translate my English files into another language. They contain format specifiers such as %s and %d. Is there a way to protect these from being erroneously translated?

For instance, the text:

Athlete already exists with number %s

is translated to:

Athlète existe déjà avec nombre% s

while I would expect it to be translated to:

Athlète existe déjà avec nombre %s

(I'm processing the input and output, so I could add characters around the %s and %d strings to 'escape' them. I have already thought of replacing %s with some word I'm sure Google will not try to translate itself, but I hope there is a nicer solution.)


Strange idea, but..

Replace each format specifier with a unique number between underscores (or anything else that survives translation unchanged and does not interfere with your use of numerals), like:

Athlete already exists with number %s => Athlete already exists with number _001_

Translated to Chinese: 運動員已經存在的號碼 _001_

If a format string contains several specifiers, check after translation that the numbers are still in the same order, because printf matches arguments to specifiers by position. If they are, replace each number with its original specifier.
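The token scheme above could be sketched in Python roughly as follows. This is only an illustration under assumptions: the helper names `protect` and `restore` and the regular expressions are made up for this example, and the specifier pattern covers common printf conversions rather than every dialect.

```python
import re

# "%%" (literal percent) or one printf-style conversion such as %s, %5d, %-10.2f, %1$s.
SPEC_RE = re.compile(r"%%|%(?:\d+\$)?[-+ 0#]*\d*(?:\.\d+)?[bcdeEufFgGosxX]")
# A token such as _001_; stray spaces inside the underscores are tolerated.
TOKEN_RE = re.compile(r"_\s*(\d{3})\s*_")


def protect(text):
    """Replace each specifier with _001_, _002_, ... and return (text, specifiers)."""
    specs = []

    def to_token(match):
        specs.append(match.group(0))
        return "_%03d_" % len(specs)

    return SPEC_RE.sub(to_token, text), specs


def restore(translated, specs):
    """Put the specifiers back; fail if tokens were lost, duplicated or reordered."""
    found = [int(number) for number in TOKEN_RE.findall(translated)]
    if found != list(range(1, len(specs) + 1)):
        raise ValueError("placeholders changed during translation: %r" % (found,))
    return TOKEN_RE.sub(lambda match: specs[int(match.group(1)) - 1], translated)
```

Calling `protect` before sending the text to the translator and `restore` on the result gives the round trip; a `ValueError` marks strings that need manual review, for example because the target language reordered the arguments.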


A comment on the PHP manual page for sprintf also provides a neat solution to this problem:

http://www.php.net/manual/en/function.sprintf.php#93552

/**
 * Converts any sprintf to a Google Translate suitable string.
 */
function _toTranslateSafeString($original)
{
    $pattern = '/(?:%%|%(?:[0-9]+\$)?[+-]?(?:[ 0]|\'.)?-?[0-9]*(?:\.[0-9]+)?[bcdeufFosxX])/';       
    $escapeString = '<span class="notranslate">$0</span>';
    return preg_replace($pattern, $escapeString, $original);
}

/**
 * Converts any Google Translate suitable string to a sprintf string.
 */
function _fromTranslateSafeString($translated)
{
    $escapePattern = '/<span class="notranslate">([^<]*)<\/span>/';
    return preg_replace($escapePattern, '$1', $translated);
}
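For code bases that are not written in PHP, the same wrap-and-unwrap idea could be ported along these lines. This is a hedged Python sketch, not part of the original comment: the function names are invented for the example, the pattern is copied from the PHP functions above, and it assumes the translation service really does leave `notranslate` spans untouched.

```python
import re

# Same pattern as the PHP version: "%%" or a single sprintf-style conversion.
SPEC_RE = re.compile(
    r"%%|%(?:[0-9]+\$)?[+-]?(?:[ 0]|'.)?-?[0-9]*(?:\.[0-9]+)?[bcdeufFosxX]"
)
SPAN_RE = re.compile(r'<span class="notranslate">([^<]*)</span>')


def to_translate_safe(original):
    """Wrap every format specifier in a notranslate span."""
    return SPEC_RE.sub(
        lambda match: '<span class="notranslate">' + match.group(0) + "</span>",
        original,
    )


def from_translate_safe(translated):
    """Strip the notranslate spans again, leaving the bare specifiers."""
    return SPAN_RE.sub(lambda match: match.group(1), translated)
```

Like the PHP original, the unwrap step only matches spans that come back byte-for-byte identical, so a translator that alters the markup would leave the span in the output where it is easy to spot.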


Have you restructured your program to use the msgcat package to handle the strings yet? The documentation for it covers most of the salient points, including how to handle varying order of replacement. The only vaguely tricky bit is that you'll need to handle the way that % symbols get moved around; if the amount of text being processed is small enough, you could even do that by hand or with a little mechanical assistance (vi, emacs and eclipse can all do the sort of match/replace required; other editors probably can too, but I don't use those).


I would recommend translating each part of the string individually, and then adding the format tokens back. You may get a less accurate translation, but that's the risk in using automated translators.

And there are always beta testers :)
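Translating the parts separately might look something like the sketch below. The `translate` argument is a hypothetical callable (string in, string out) standing in for whatever translation service is used; neither it nor the function name comes from the answer.

```python
import re

# One capturing group, so re.split keeps the specifiers in the result list.
SPEC_RE = re.compile(r"(%%|%(?:\d+\$)?[-+ 0#]*\d*(?:\.\d+)?[bcdeEufFgGosxX])")


def translate_in_parts(text, translate):
    """Translate the text between format specifiers; keep the specifiers verbatim."""
    parts = SPEC_RE.split(text)  # even indexes: plain text, odd indexes: specifiers
    result = []
    for index, part in enumerate(parts):
        if index % 2 == 1 or not part.strip():
            result.append(part)  # specifier, empty or whitespace-only: keep as is
            continue
        # Translate only the words and keep the surrounding whitespace in place.
        lead = part[: len(part) - len(part.lstrip())]
        trail = part[len(part.rstrip()):]
        result.append(lead + translate(part.strip()) + trail)
    return "".join(result)
```

The specifiers never reach the translator, so they cannot be mangled, but each fragment is translated without its context and word order across fragments stays fixed, which is where the loss of accuracy comes from.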

Or a better idea: change %d to an arbitrary integer, %s to an arbitrary Latin string that will not get translated by Google (a rare family name usually does the trick), and so on.
