Book translation data format

Posted 2023-02-20 14:20

I'm thinking of translating a book from English to my native language. I can translate just fine, and I'm happy with vim as a text editor. My problem is that I'd like to somehow preserve the semantics, i.e. which parts of my translation correspond to the original.

I could create a simple XML-based markup language that would look something like this:

<book>
  <chapter>
    <paragraph>
      <sentence>
        <original>This is an example sentence.</original>
        <translation lang="fi">Tämä on esimerkkilause.</translation>
      </sentence>
    </paragraph>
  </chapter>
</book>

Now, that would probably have its benefits but I don't think editing that would be very fun.
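For what it's worth, reading such a file back programmatically would be straightforward. A minimal sketch, assuming the element names from the example above; the function name sentence_pairs is made up for illustration, and only Python's standard xml.etree.ElementTree module is used:

```python
import xml.etree.ElementTree as ET

# The example document from above, embedded so the sketch is self-contained.
SAMPLE = """<book>
  <chapter>
    <paragraph>
      <sentence>
        <original>This is an example sentence.</original>
        <translation lang="fi">Tämä on esimerkkilause.</translation>
      </sentence>
    </paragraph>
  </chapter>
</book>"""


def sentence_pairs(xml_text, lang):
    """Yield (original, translation) for each <sentence> translated into `lang`."""
    root = ET.fromstring(xml_text)
    for sentence in root.iter("sentence"):
        original = sentence.findtext("original")
        for translation in sentence.findall("translation"):
            if translation.get("lang") == lang:
                yield original, translation.text


if __name__ == "__main__":
    for original, translated in sentence_pairs(SAMPLE, "fi"):
        print(original, "->", translated)
```

So the format is easy to process; the pain is purely on the editing side.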

Another possibility that I can think of would be to keep the original and translation in separate files. If I add a newline after each translation chunk and keep line numbering consistent, editing would be easy and I'd be able to programmatically match the original and translation.

original.txt:
  This is an example sentence.
  In this format editing is easy.

translation-fi.txt:
  Tämä on esimerkkilause.
  Tässä muodossa muokkaaminen on helppoa.
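
The matching step could be sketched roughly like this in Python. The function names pair_lines and load_pairs are hypothetical, and the sketch assumes both files are UTF-8 with exactly one chunk per line:

```python
from pathlib import Path


def pair_lines(original_lines, translated_lines):
    """Pair chunks by line number; refuse to pair if the line counts differ."""
    if len(original_lines) != len(translated_lines):
        raise ValueError(
            f"line count mismatch: {len(original_lines)} original "
            f"vs {len(translated_lines)} translated"
        )
    return list(zip(original_lines, translated_lines))


def load_pairs(original_path, translation_path):
    """Read both files (assumed UTF-8, one chunk per line) and pair them."""
    original = Path(original_path).read_text(encoding="utf-8").splitlines()
    translated = Path(translation_path).read_text(encoding="utf-8").splitlines()
    return pair_lines(original, translated)
```

Note that this check only catches a differing number of lines. If one line is accidentally deleted and another added elsewhere, the counts still match and the pairs in between are silently shifted.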

However, this doesn't seem very robust. It would be easy to mess up. Probably someone has better ideas. Thus the question:

What would be the best data format for making a book translation with a text editor?

EDIT: added tag vim, since I'd prefer to do this with vim and believe that some vim guru might have ideas.

EDIT2: started a bounty on this. I'm currently leaning toward the second idea I describe, but I hope to get something about as easy to edit (and quite easy to implement) but more robust.

One thought: if you keep each translatable chunk (one or more sentences) on its own line, vim's scrollbind and cursorbind options plus a simple vertical split would help you keep the chunks "synchronized". It looks very much like what vimdiff does by default. The files should then have the same number of lines, and you don't even need to switch windows!

But this isn't quite perfect, because wrapped lines tend to mess things up a little. If your translation wraps over two or three more screen lines than the original text, the visual correlation fades because the lines no longer match one-to-one. I couldn't find a solution or a script for fixing that behavior.

Another suggestion I would propose is to interlace the translation with the original. This comes close to the diff method in Benoit's suggestion. After the original is split up into chunks (one chunk per line), I would prepend >> or similar to every line. You would begin translating a chunk by pressing o, which opens a new line below it. The file would look like this:

  >> This is an example sentence.
  Tämä on esimerkkilause.
  >> In this format editing is easy.
  Tässä muodossa muokkaaminen on helppoa.

And I would enhance the readability by doing a :match Comment /^>>.*$/ or similar, whatever looks nice with your colorscheme. Probably it would be worthwhile to write a :syn region that disables spell checking for the original text. Finally, as a detail, I'd bind <C-j> to do 2j and <C-k> to 2k to allow easy jumping between the parts that matter.

Another pro of this latter approach is that you could wrap things at 80 columns, if you feel like I do :) It would still be trivial to write <C-j> and <C-k> mappings that jump between translations.

Cons: buffer completion suffers, as it now completes both original and translated words. Hopefully English words don't occur in the translations that often! :) But this is as robust as it gets. A simple grep will peel the original text off after you are done.
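
If you later want the pairs rather than just the translation, the interlaced file is easy to parse. A rough Python sketch, assuming every line starting with >> opens a new chunk and all following unmarked lines (possibly wrapped) belong to its translation; the name parse_interlaced is made up for illustration:

```python
def parse_interlaced(text, marker=">>"):
    """Return (original, translation) pairs from an interlaced file.

    Assumes each marker line starts a new chunk and every following
    unmarked line, up to the next marker, is part of its translation.
    """
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no information here
        if line.startswith(marker):
            chunks.append([line[len(marker):].strip(), []])
        elif chunks:
            chunks[-1][1].append(line)
        else:
            raise ValueError("translation line before any original chunk")
    # Wrapped translation lines are joined with a single space;
    # an untranslated chunk ends up with an empty string.
    return [(original, " ".join(parts)) for original, parts in chunks]


if __name__ == "__main__":
    sample = """\
>> This is an example sentence.
Tämä on esimerkkilause.
>> In this format editing is easy.
Tässä muodossa muokkaaminen on helppoa.
"""
    for original, translated in parse_interlaced(sample):
        print(original, "->", translated)
```

If you also wrap the original at 80 columns, this sketch would need a different rule, since consecutive >> lines would then belong to the same chunk.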

Why not use a simplified diff format?

It is linewise, which is suitable for whole sentences.
The first character is significant (space, special, + or -).
It will be quite compact.
Maybe you don't need those @@ parts.
Vim will support it and color the English sentence and the Finnish sentence in distinct colors.
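
To illustrate, here is a rough Python sketch of reading such a file back. It assumes a convention the points above only hint at: a line starting with - holds the original sentence, the following line starting with + holds its translation, and everything else (context lines, @@ headers) is skipped. The name parse_simple_diff is hypothetical, and real unified-diff file headers (--- and +++) are not handled:

```python
def parse_simple_diff(text):
    """Return (original, translation) pairs from a simplified diff.

    '-' lines are originals, '+' lines are translations; a '-' line
    with no following '+' line is reported with translation None.
    """
    pairs = []
    pending = None  # original sentence still waiting for its translation
    for line in text.splitlines():
        if line.startswith("-"):
            if pending is not None:
                pairs.append((pending, None))
            pending = line[1:].strip()
        elif line.startswith("+"):
            if pending is None:
                raise ValueError("'+' line without a preceding '-' line")
            pairs.append((pending, line[1:].strip()))
            pending = None
        # Any other line (context, @@ header, blank) is ignored.
    if pending is not None:
        pairs.append((pending, None))
    return pairs


if __name__ == "__main__":
    sample = """\
-This is an example sentence.
+Tämä on esimerkkilause.
-In this format editing is easy.
"""
    for original, translated in parse_simple_diff(sample):
        print(original, "->", translated)
```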

Assuming you want to keep the one-to-one relationship between the original text and the translated text, a database table makes the most sense.

You'd have one table with the following columns:

id - Integer - Autonum
original_text - Text - Not null
translated_text - Text - Nullable

You'd need a process to load the original text, and a process to show you one line of the original text and allow you to type the translated text. Perhaps the second process could show you 5 lines (2 before, the line you want to translate, and 2 after) to give you context.
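
A minimal sketch of that table and the two processes, using Python's built-in sqlite3 module. The table name chunks and all function names are assumptions for illustration, and the actual prompt-and-type loop is left out:

```python
import sqlite3


def create_db(path=":memory:"):
    """Open the database and create the table described above."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " original_text TEXT NOT NULL,"
        " translated_text TEXT)"
    )
    return conn


def load_original(conn, lines):
    """First process: load the original text, one row per line."""
    conn.executemany(
        "INSERT INTO chunks (original_text) VALUES (?)",
        [(line,) for line in lines],
    )
    conn.commit()


def next_untranslated(conn):
    """Return the id of the first row without a translation, or None."""
    row = conn.execute(
        "SELECT id FROM chunks WHERE translated_text IS NULL "
        "ORDER BY id LIMIT 1"
    ).fetchone()
    return row[0] if row else None


def context(conn, chunk_id, radius=2):
    """Second process, display part: the row plus `radius` rows either side."""
    return conn.execute(
        "SELECT id, original_text, translated_text FROM chunks "
        "WHERE id BETWEEN ? AND ? ORDER BY id",
        (chunk_id - radius, chunk_id + radius),
    ).fetchall()


def save_translation(conn, chunk_id, text):
    """Second process, input part: store what the translator typed."""
    conn.execute(
        "UPDATE chunks SET translated_text = ? WHERE id = ?",
        (text, chunk_id),
    )
    conn.commit()
```

With radius=2 the context query returns up to five rows, matching the "2 before, 2 after" idea; fewer come back near the start or end of the book.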
