A different approach to translation of web applications
I have done a few websites and applications with support for multiple languages where I've used language XML files and a keyword inside my code. For my web app, I believe that method sucks. I love to read and understand my HTML code. This code doesn't make any sense:
<h1><?= translate('main_headline'); ?></h1>
It looks good at the beginning but just ends up with unhappy programmers since the process of adding new features now requires putting stuff in XMLs all the time.
My solution (I just wrote a simple test PHP parser an hour ago, do you think this will work on a large scale project?)
I'm using English as my base language, and this is how my source file looks:
<h1>{{ I love colors }}</h1>
My parser will use the text (the real text) as the key for a dictionary array. In this example, I'm translating from US English to UK English.
$dictionary['I love colors']['en_GB'] = 'I love colours';
No dictionary is required for my base language since it's already in the source file.
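For reference, the lookup described above can be sketched in a few lines. This is only a minimal illustration, not the actual parser from the question: the function name render_phrases and the exact regular expression are assumptions, and it ignores caching and dictionary storage entirely.

```php
<?php
// Hypothetical dictionary: source phrase => locale => translation.
$dictionary = [
    'I love colors' => ['en_GB' => 'I love colours'],
];

/**
 * Replace every {{ phrase }} marker with its translation for $locale.
 * Falls back to the base-language phrase when no entry exists.
 * (render_phrases is an illustrative name, not part of the original parser.)
 */
function render_phrases(string $template, array $dictionary, string $locale): string
{
    $result = preg_replace_callback(
        '/\{\{\s*(.+?)\s*\}\}/s',
        function (array $m) use ($dictionary, $locale): string {
            $phrase = $m[1];
            return $dictionary[$phrase][$locale] ?? $phrase;
        },
        $template
    );

    // preg_replace_callback returns null on a regex error; keep the source then.
    return $result ?? $template;
}

echo render_phrases('<h1>{{ I love colors }}</h1>', $dictionary, 'en_GB'), "\n";
echo render_phrases('<h1>{{ I love colors }}</h1>', $dictionary, 'en_US'), "\n";
```

The second call has no en_US entry, so the text from the source file is used unchanged, which is the "no dictionary for the base language" behaviour.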
There is a lot more to it, like caching, fallbacks, dictionary storage etc. Do you think it will work in a large-scale project? Is there something I haven't considered?
One flaw with this is that two different parts of the application may require different translations for the same word/phrase. The most obvious example is a homograph, e.g. "close" (nearby) and "close" (shut), but there are other possibilities too.
A contrived phrase example is:
In one part, "I love my colors" just refers to literal colors.
In another, it means "I love my flag"
Should:
$dictionary['I love my colors']['es_ES']
be "Me encantan mis colores" or "Me encanta mi bandera"? It has to somehow be both.
That is why either a unique ID or line number is typically used in the message catalog.
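If you want to keep the readable phrase as the key, one possible middle ground is to add a context label next to it, similar in spirit to gettext's msgctxt. The sketch below is only an assumption about how that could look; the context names and the function translate_in_context are made up for illustration.

```php
<?php
// Hypothetical catalog keyed by context first, then phrase, then locale.
$catalog = [
    'paint_shop' => [
        'I love my colors' => ['es_ES' => 'Me encantan mis colores'],
    ],
    'patriotism' => [
        'I love my colors' => ['es_ES' => 'Me encanta mi bandera'],
    ],
];

/**
 * Look up a phrase within a named context.
 * Falls back to the source phrase when the context, phrase or locale is missing.
 */
function translate_in_context(array $catalog, string $context, string $phrase, string $locale): string
{
    return $catalog[$context][$phrase][$locale] ?? $phrase;
}

echo translate_in_context($catalog, 'paint_shop', 'I love my colors', 'es_ES'), "\n";
echo translate_in_context($catalog, 'patriotism', 'I love my colors', 'es_ES'), "\n";
```

The same English text now maps to two different Spanish strings, at the cost of having to pick a context for every phrase.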
Some considerations and ideas.
Keep phrase reuse to a minimum. In my experience, that makes maintaining the translations a lot easier.
The syntax must be language agnostic, since you'll likely translate PHP, JS, HTML etc., each with its own file type. In other words, not just PHP templates need parsing; .js files will probably contain text too.
{{ <img src="heading-en.png" alt="Heading" /> }}
alert('{{ some text }}');
The alert example above would break if the translated text contained a ' character, so that should be handled somehow.
You must allow for variable data in the translations in some way. Please consider the example below.
{{ <?= $num ?> apples cost <span class="price"><?= $price ?></span> with <?= $discount ?>% discount }}
This would probably not work well, or at least not allow for variable name changing or inline expressions. The example below would be better.
{{ %num% apples cost <span class="price">%price%</span> with %discount%% discount } num:<?= $num ?> , price$:<?= $price*$discount ?> , discount:<?= round($discount*100) ?> }
...where price$ could imply that the value is a price and should be converted to the correct currency.
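The %name% placeholder idea can be sketched roughly as follows. This is a simplified assumption, not a full implementation of the syntax above: it takes the values as a plain PHP array instead of parsing the "name:value" list, it does nothing special for price$, and fill_placeholders is a made-up name.

```php
<?php
/**
 * Substitute %name% placeholders in a (translated) phrase.
 * Values are HTML-escaped because the phrase is assumed to end up in HTML.
 * strtr() does not rescan replaced text, so a value containing "%num%"
 * is not substituted a second time.
 */
function fill_placeholders(string $text, array $values): string
{
    $pairs = [];
    foreach ($values as $name => $value) {
        $pairs['%' . $name . '%'] = htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
    }

    return $pairs === [] ? $text : strtr($text, $pairs);
}

$values = ['num' => 3, 'price' => '1.50', 'discount' => 10];

// Source phrase, as in the example above.
echo fill_placeholders(
    '%num% apples cost <span class="price">%price%</span> with %discount%% discount',
    $values
), "\n";

// A translation is free to reorder the placeholders.
echo fill_placeholders('With %discount%% off, %num% apples cost %price%', $values), "\n";
```

Because the placeholders are named, the translator can move them around without touching the PHP expressions that produce the values.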
Currency should be handled.
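On the quoting problem in the alert example: one way to handle it is to escape the translated text differently depending on where it lands. The sketch below assumes the marker is written without surrounding quotes in JavaScript, i.e. alert({{ some text }}), because json_encode() emits its own double quotes. The function names are hypothetical.

```php
<?php
/**
 * Encode a translated string as a JavaScript string literal.
 * The JSON_HEX_* flags also make the result safe inside an inline <script>.
 */
function js_string_literal(string $text): string
{
    return json_encode(
        $text,
        JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_AMP | JSON_THROW_ON_ERROR
    );
}

/** Escape a translated string for use in HTML text or attribute values. */
function html_text(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$translated = "Don't panic";

echo 'alert(' . js_string_literal($translated) . ');', "\n";
echo '<p title="' . html_text($translated) . '">' . html_text($translated) . '</p>', "\n";
```

The parser would need to know which context each file (or each part of a file) is in to pick the right escaping, which is one more reason the syntax has to work across file types.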
Just a couple of things that sprang to mind. Good luck ;-)
Yes that's a good approach.
We use something like: ||4332||I love colors||
Then you can just parse your file, extract all the IDs (4332) and look up the translation in the database.
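Roughly, the parsing step could look like the sketch below. It is an illustration only: a plain array stands in for the database, and render_ids and the regular expression are assumptions rather than the code we actually run.

```php
<?php
// Stand-in for the database table: id => locale => translation.
$translationsById = [
    4332 => ['en_GB' => 'I love colours'],
];

/**
 * Replace ||id||default text|| markers with the translation for $locale.
 * Falls back to the default text embedded in the marker.
 */
function render_ids(string $template, array $translationsById, string $locale): string
{
    $result = preg_replace_callback(
        '/\|\|(\d+)\|\|(.*?)\|\|/s',
        function (array $m) use ($translationsById, $locale): string {
            $id = (int) $m[1];
            return $translationsById[$id][$locale] ?? $m[2];
        },
        $template
    );

    return $result ?? $template;
}

echo render_ids('<h1>||4332||I love colors||</h1>', $translationsById, 'en_GB'), "\n";
echo render_ids('<h1>||4332||I love colors||</h1>', $translationsById, 'fr_FR'), "\n";
```

Since the ID is the key, the English text next to it can be edited freely without breaking the lookup.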
EDIT: others' responses are now better than mine :)
I don't know of any problem with the second option (but I also have no experience working with I18N).
The only potential problem I see with the second option is reversibility. If someone goes back and changes the text to say "I like colors", then someone always has to go back to the translations and change the English key. However, the fact that you're doing i18n makes me assume that there is already somebody whose job is to deal with the tedium of translation, so I wouldn't foresee that as a problem.
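A small audit script could take some of that tedium away by listing phrases that were edited in the source but not in the dictionary. This is just a sketch under the assumption that the {{ ... }} syntax from the question is used; the function names are invented for the example.

```php
<?php
/** Collect the distinct {{ phrase }} keys used in a template. */
function extract_phrases(string $template): array
{
    preg_match_all('/\{\{\s*(.+?)\s*\}\}/s', $template, $m);
    return array_values(array_unique($m[1]));
}

/**
 * Compare the phrases found in the source with the dictionary keys.
 * "missing"  = used in the source but not translated for $locale.
 * "orphaned" = still in the dictionary but no longer used in the source.
 */
function audit_dictionary(array $phrases, array $dictionary, string $locale): array
{
    $missing = [];
    foreach ($phrases as $phrase) {
        if (!isset($dictionary[$phrase][$locale])) {
            $missing[] = $phrase;
        }
    }
    $orphaned = array_values(array_diff(array_keys($dictionary), $phrases));

    return ['missing' => $missing, 'orphaned' => $orphaned];
}

// Someone changed "love" to "like" in the template but not in the dictionary.
$template = '<h1>{{ I like colors }}</h1>';
$dictionary = ['I love colors' => ['en_GB' => 'I love colours']];

print_r(audit_dictionary(extract_phrases($template), $dictionary, 'en_GB'));
```

An orphaned key paired with a new missing key is a strong hint that a phrase was reworded rather than added.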
I prefer to use made-up tags for this sort of thing. That way I can include notes on context and meaning, which are of great use to translators. For example:
<h1><l10n id="blah" notes="This is a header for a section on blah blah, title case">Blah Blah</l10n></h1>
Similarly, you can use made-up attributes for alt and title text.
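To give an idea of how the tags could be processed, here is a rough sketch. It is regex-based and assumes the id attribute comes first, notes second, and that tags are not nested; a real tool would likely want a proper parser. The function names are made up.

```php
<?php
/**
 * Build a catalog for translators from <l10n> tags:
 * id => ['notes' => ..., 'source' => ...].
 * Assumes attributes appear in the order id, notes.
 */
function extract_l10n(string $html): array
{
    preg_match_all(
        '/<l10n\s+id="([^"]*)"\s+notes="([^"]*)">(.*?)<\/l10n>/s',
        $html,
        $matches,
        PREG_SET_ORDER
    );

    $entries = [];
    foreach ($matches as $m) {
        $entries[$m[1]] = ['notes' => $m[2], 'source' => $m[3]];
    }

    return $entries;
}

/** Replace each <l10n> element with its translation, or its own content as fallback. */
function render_l10n(string $html, array $translations): string
{
    $result = preg_replace_callback(
        '/<l10n\s+id="([^"]*)"[^>]*>(.*?)<\/l10n>/s',
        function (array $m) use ($translations): string {
            return $translations[$m[1]] ?? $m[2];
        },
        $html
    );

    return $result ?? $html;
}

$html = '<h1><l10n id="blah" notes="This is a header for a section on blah blah, title case">Blah Blah</l10n></h1>';

print_r(extract_l10n($html));
echo render_l10n($html, ['blah' => 'Bla Bla']), "\n";
```

The extracted notes travel with the source text to the translator, and the made-up tag never reaches the browser.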
You need to be careful about the different contexts (HTML, <script>, <style>, PHP, various template languages, ...) though. You also need to be careful about word order and gender issues, but those are standard L10N problems.
Then, you can preprocess all the translated files into separate directories (one per language) and avoid the overhead of producing translations on the fly.
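The preprocessing step might look something like the sketch below. It is only an outline under several assumptions: templates are passed in as an array of file name to source text, the markers are <l10n id="..."> tags, translations are keyed by locale and id, and build_locale_tree is a hypothetical name.

```php
<?php
/**
 * Write one translated copy of every template into $outDir/<locale>/.
 * $templates:    file name => source text containing <l10n id="..."> tags
 * $translations: locale => id => translated text
 */
function build_locale_tree(array $templates, array $translations, string $outDir): void
{
    foreach ($translations as $locale => $byId) {
        $dir = $outDir . '/' . $locale;
        if (!is_dir($dir) && !mkdir($dir, 0777, true) && !is_dir($dir)) {
            throw new RuntimeException("Cannot create directory: $dir");
        }

        foreach ($templates as $name => $source) {
            $translated = preg_replace_callback(
                '/<l10n\s+id="([^"]*)"[^>]*>(.*?)<\/l10n>/s',
                function (array $m) use ($byId): string {
                    // Fall back to the source text inside the tag.
                    return $byId[$m[1]] ?? $m[2];
                },
                $source
            );
            file_put_contents($dir . '/' . $name, $translated ?? $source);
        }
    }
}

$templates = ['index.html' => '<h1><l10n id="blah" notes="header">Blah Blah</l10n></h1>'];
$translations = [
    'en' => [],                      // base language: tags are simply stripped
    'xx' => ['blah' => 'Bla Bla'],   // placeholder locale and text for the example
];

$outDir = sys_get_temp_dir() . '/l10n_build_' . uniqid();
build_locale_tree($templates, $translations, $outDir);

echo file_get_contents($outDir . '/xx/index.html'), "\n";
```

At request time the application then only has to pick the right directory, with no lookups per phrase.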