
How to break a page into 5000-character pieces for Google translation?

I am trying to translate a page using the Google API. There is a 5000-character limit on the data you can send to Google in one request, so I am trying to break the page into pieces of at most 5000 characters. While doing this, the HTML markup has to stay intact; if a tag gets cut in half, you will not get the desired results.

For example, you have to send this:

<a href="#" class="myclass">Link</a>

Instead of this:

<a href="#" class="myclas

I have more or less solved it (though probably not perfectly) by checking whether a "<" appears after the last ">" in the piece. If it does, a tag has been cut, so I go back to the position of that ">" and cut the string there.
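The cut-at-a-tag-boundary idea described above can be sketched roughly as follows. This is only an illustration in Python, not the asker's actual code: the function name split_html, the TOKEN_RE pattern, and the default limit are assumptions. It treats everything between "<" and the next ">" as one tag, so it will be fooled by a ">" inside an attribute value, a comment, or a script block, and a single tag longer than the limit is still emitted whole.

```python
import re

# A token is a complete tag (<...>), a run of text between tags,
# or a stray "<" that never gets closed.
TOKEN_RE = re.compile(r"<[^>]*>|[^<]+|<")

def split_html(html, limit=5000):
    """Split html into chunks of at most `limit` characters
    without cutting a tag in half (hypothetical helper)."""
    chunks, current = [], ""
    for token in TOKEN_RE.findall(html):
        # Plain text longer than the limit is sliced; a tag is never sliced.
        if not token.startswith("<") and len(token) > limit:
            pieces = [token[i:i + limit] for i in range(0, len(token), limit)]
        else:
            pieces = [token]
        for piece in pieces:
            # Start a new chunk when the next piece would overflow this one.
            if current and len(current) + len(piece) > limit:
                chunks.append(current)
                current = ""
            current += piece
    if current:
        chunks.append(current)
    return chunks

if __name__ == "__main__":
    page = '<p>Hello</p><a href="#" class="myclass">Link</a>'
    for chunk in split_html(page, limit=30):
        print(len(chunk), chunk)
```

Joining the chunks gives back the original string, so the translated pieces can be concatenated in order. Slicing long text runs by character count may still split a word or an HTML entity, so splitting at sentence ends would be a sensible refinement.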

Anyway, I am still having some problems with the HTML formatting and want to know how to do this efficiently. Is there a parser available that would solve this problem?

Thanks


I had a very similar problem with a small automatic translation I had to do, and I solved it by replacing all HTML expressions with short placeholders. For example,

<a href="#" class="myclass">Link</a>

would become [0]link[0], and I'd store somewhere that [0] stands for a href.... To find the HTML expressions, you should use regular expressions. That helped me at the time; I hope it helps you too.
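A minimal sketch of this placeholder approach, in Python, might look like the following. The names protect_tags and restore_tags are made up for illustration, and unlike the [0]link[0] scheme above, this version gives every tag its own number (so the example becomes [0]Link[1]), which keeps the restore step a simple lookup. It assumes the translation service returns the bracketed numbers unchanged and that the page text contains no literal "[0]"-style strings of its own.

```python
import re

# Treats everything from "<" to the next ">" as one tag (an assumption
# that breaks on ">" inside attribute values or comments).
TAG_RE = re.compile(r"<[^>]*>")

def protect_tags(html):
    """Replace each tag with a numbered placeholder such as [0].
    Returns the placeholder text and the list of stored tags."""
    tags = []

    def stash(match):
        tags.append(match.group(0))
        return "[%d]" % (len(tags) - 1)

    return TAG_RE.sub(stash, html), tags

def restore_tags(text, tags):
    """Put the stored tags back in place of their placeholders."""
    def lookup(match):
        index = int(match.group(1))
        # Leave unknown numbers alone instead of raising IndexError.
        return tags[index] if index < len(tags) else match.group(0)

    return re.sub(r"\[(\d+)\]", lookup, text)

if __name__ == "__main__":
    text, tags = protect_tags('<a href="#" class="myclass">Link</a>')
    print(text)                                # text to send for translation
    print(restore_tags("[0]Lien[1]", tags))    # after a pretend translation
```

Because the placeholders are much shorter than the tags they replace, more real text fits into each 5000-character request.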

David
