Translating LOCs

Has anybody come across a situation where an existing code base written in (say) Java by (say) French programmers had to be converted into code that English-speaking programmers could understand? The problem here is that variable, method, and class names, comments, etc. would all be in that particular language.

Is there any automated solution available already?

(I used the word "translate" in the title, but obviously I don't mean porting the code to another programming language, nor do I mean i18n.)

Regards,


Well, this certainly is a non-trivial task.

My first idea was

  • Get a tool (a parser) that parses your source code into an XML file (or an AST).
  • Do the translations on that intermediate format; with XML you can, for example, use XPath to find the comments, variable names, etc.
  • The tool must then, of course, support converting the XML file back into Java source code.
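
To make the first step concrete, here is a minimal sketch of pulling the translatable pieces out of a source file. It uses a plain regular expression as a stand-in for a real parser, so treat it as an illustration only: the class name `SourceSplitter` and the "KIND:text" output format are made up for this example, keywords such as `int` are reported as identifiers, and char literals are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a regex stand-in for the "parser" step, not a real Java parser.
class SourceSplitter {
    // One alternative per token kind. Order matters: comments and string literals
    // are consumed first so the identifier alternative never sees their contents.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<comment>//[^\\n]*|/\\*.*?\\*/)"
        + "|(?<string>\"(?:\\\\.|[^\"\\\\])*\")"
        + "|(?<ident>[\\p{L}_$][\\p{L}\\p{N}_$]*)",
        Pattern.DOTALL);

    /** Returns "KIND:text" entries for every comment, string literal and identifier. */
    public static List<String> split(String source) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            if (m.group("comment") != null) {
                out.add("COMMENT:" + m.group("comment"));
            } else if (m.group("string") != null) {
                out.add("STRING:" + m.group("string"));
            } else {
                out.add("IDENT:" + m.group("ident"));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String src = "int compteur = 0; // nombre de clients\nString s = \"bonjour\";";
        split(src).forEach(System.out::println);
    }
}
```

A real tool would get the same three categories from an AST, which also tells you whether an identifier is a declaration, a reference, or a library type.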

Problems:

  • Bad translations (the translation program has no domain knowledge and almost surely can't correctly translate computer/programming terms, acronyms, mistyped words, camelCase method names, etc.)
  • You can't just translate blindly; ideally you would need to refactor. Otherwise you might end up with source code that is no longer valid, for example because the translation maps several words to a single translation, which could leave classes/variables/methods with the same name.
  • How do you determine what not to translate (e.g. Java standard library class names and so on)?
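
The collision and "what not to translate" problems can be shown with a small sketch. Everything here is assumed for illustration: the glossary entries, the protected-name set, and the class `IdentifierTranslator` are hypothetical, and a real glossary would need review by someone who knows the domain.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical dictionary-based renamer, used only to show where it goes wrong.
class IdentifierTranslator {
    // Made-up French-to-English glossary; note two French words map to "customer".
    static final Map<String, String> GLOSSARY = Map.of(
        "compteur", "counter",
        "client", "customer",
        "acheteur", "customer",
        "liste", "list",
        "calculer", "compute",
        "total", "total");

    // Names that must never be renamed (JDK types, well-known methods, ...).
    static final Set<String> PROTECTED = Set.of("String", "List", "toString", "main");

    /** Translates one camelCase identifier word by word; unknown words are kept as-is. */
    public static String translate(String ident) {
        if (ident.isEmpty() || PROTECTED.contains(ident)) {
            return ident;
        }
        StringBuilder out = new StringBuilder();
        // Split before each upper-case letter that follows a lower-case letter or digit.
        for (String word : ident.split("(?<=[\\p{Ll}\\p{N}])(?=\\p{Lu})")) {
            String english = GLOSSARY.getOrDefault(word.toLowerCase(Locale.ROOT), word);
            boolean capital = Character.isUpperCase(word.charAt(0));
            out.append(capital
                ? Character.toUpperCase(english.charAt(0)) + english.substring(1)
                : english);
        }
        return out.toString();
    }

    /** Groups distinct identifiers by translated name and keeps only the clashes. */
    public static Map<String, List<String>> collisions(List<String> identifiers) {
        Map<String, List<String>> byTranslation = new LinkedHashMap<>();
        for (String id : identifiers) {
            byTranslation.computeIfAbsent(translate(id), k -> new ArrayList<>()).add(id);
        }
        byTranslation.values().removeIf(originals -> originals.size() < 2);
        return byTranslation;
    }

    public static void main(String[] args) {
        List<String> names = List.of("listeClient", "listeAcheteur", "calculerTotal", "toString");
        names.forEach(n -> System.out.println(n + " -> " + translate(n)));
        System.out.println("Collisions: " + collisions(names));
    }
}
```

Here `listeClient` and `listeAcheteur` both become `listCustomer`, which is exactly the kind of clash that has to be resolved by a human refactoring rather than by the dictionary.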


And remember that class path names are very sensitive in Java (and some other languages). Doing a global "find and replace", which is what this sounds like, would most probably break important aspects of the software.

For example, I worked with a product for many years that still kicked out stack traces that included the Java class path names from previous iterations of the company name (the company had been acquired and/or spun off at least 4x). Seeing an error that included "iConclude" indicated parts of the code that dated from its first name iteration (though it didn't guarantee the code would be "old"). Likewise, seeing an error that included "Opsware" indicated parts of the code that dated from after the first acquisition (or, at least, code that was added under the previous naming scheme).
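
One way this bites: class names also live as plain strings (config files, serialized data, reflection calls), and those strings have to stay in sync with the actual classes. The sketch below is only an illustration with a made-up nested class `Facture`; it shows that a name rewritten by text replacement no longer loads when the class itself was not renamed to match.

```java
// Hypothetical example: a class name stored as text versus the class that exists.
class ReflectiveLookup {
    // Made-up French-named class ("facture" = invoice).
    static class Facture {}

    /** Reports whether a class with the given binary name can be loaded. */
    public static boolean canLoad(String binaryName) {
        try {
            Class.forName(binaryName);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The name as it might be stored in a config file or serialized data.
        String stored = Facture.class.getName();
        // What a blind text replacement would turn that stored name into.
        String replaced = stored.replace("Facture", "Invoice");
        System.out.println(stored + " loads: " + canLoad(stored));
        System.out.println(replaced + " loads: " + canLoad(replaced));
    }
}
```

The compiler can't catch this kind of mismatch, because the name is just text until the lookup runs.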


I don't think you can simply run this through some sort of "translation" software that does a dictionary-based replacement of the variable names and comments. At the very least you'd need translation software that parses Java well enough to separate out the comments, variable names, class names and potentially the messages, and only then applies a dictionary-based translation. Even in that case I doubt the result would be very appealing, given that such software most likely lacks the domain knowledge you'd need to translate the terms idiomatically.

I'm afraid the only solution that is going to produce something useful is to engage a programmer who is fluent in both natural languages and familiar with the problem domain to rewrite the software. Everything else is likely to create a big mess.
