Natural Language Processing - Word Alignment

2022-12-22 20:00

I am looking for word alignment tools and algorithms.

I am dealing with bilingual English-Hindi text, and am currently working with:

DTW (Dynamic Time Warping) algorithm
CLA (Competitive Linking Algorithm)
NATools
Giza++

Could you please suggest any other algorithm or tool that is language independent and can perform statistical word alignment on parallel English-Hindi corpora, as well as a way to evaluate the resulting alignments?

I have heard that some tools work best for certain languages. Could you please tell me how true that is and, if so, give an example of what would be better suited to Asian languages like Hindi? Counter-examples of what I shouldn't use for such languages are also welcome.

I have heard a bit about the Uplug word aligner. Could someone tell me whether this tool is useful for my purpose?

Thank you.. :)

The Berkeley Aligner is very good. By doing joint training of the IBM word alignment models, it's able to get a much lower alignment error rate (AER) than older packages like GIZA++.
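Since AER is the metric being compared here, a minimal sketch of how it is usually computed may help with the evaluation part of the question. It assumes the common definition AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|), where A is the aligner's output, S the "sure" gold links and P the "possible" gold links (with S counted as part of P). The function name and the toy links are made up for illustration; this is not code from any of the tools mentioned.

```python
def alignment_error_rate(hypothesis, sure, possible):
    """Compute AER from sets of (source_index, target_index) links.

    hypothesis: links proposed by the aligner (A)
    sure:       gold links annotators were certain about (S)
    possible:   gold links marked as merely acceptable (P); S is added to P
    """
    a = set(hypothesis)
    s = set(sure)
    p = set(possible) | s  # every sure link also counts as possible
    if not a and not s:
        return 0.0  # nothing to align and nothing proposed
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Toy example (invented links, not real annotation data)
gold_sure = {(0, 0), (1, 1)}
gold_possible = {(2, 2)}
system_output = {(0, 0), (1, 1), (2, 3)}
aer = alignment_error_rate(system_output, gold_sure, gold_possible)
print(f"AER = {aer:.3f}")
```

Lower is better: an aligner that returns exactly the sure links gets an AER of 0.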

It also supports some more advanced features such as syntactic distortion (i.e., using parse tree information to get better alignments). For this, you only need parse trees for one of the two languages in the pair. So you should be okay doing Hindi<->English, since there are plenty of good, freely available English parsers.

If you decide not to go with the Berkeley Aligner, you should probably just use GIZA++. For years, it has been essentially the standard word aligner in the machine translation community.
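To give a feel for what this family of aligners does internally, here is a minimal sketch of textbook IBM Model 1 trained with EM, the simplest of the IBM models that GIZA++ builds on. It is not GIZA++'s actual code, and the function names and the tiny romanized English-Hindi corpus are invented for illustration. Nothing in it depends on the language pair, which is why these models are usually considered language independent.

```python
from collections import defaultdict

NULL = "<NULL>"  # empty source word that target words may align to


def train_ibm_model1(bitext, iterations=10):
    """bitext: list of (english_tokens, foreign_tokens) pairs.
    Returns a dict t[(f, e)] approximating P(f | e)."""
    foreign_vocab = {f for _, fs in bitext for f in fs}
    uniform = 1.0 / len(foreign_vocab)
    t = defaultdict(lambda: uniform)  # start from a uniform table
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e being linked
        for es, fs in bitext:
            es_null = [NULL] + list(es)
            for f in fs:
                # E-step: share one unit of probability mass for f
                # among all candidate English words in the sentence
                z = sum(t[(f, e)] for e in es_null)
                for e in es_null:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize expected counts into probabilities
        t = defaultdict(float, {(f, e): c / total[e]
                                for (f, e), c in count.items()})
    return t


def best_alignment(es, fs, t):
    """Link each foreign word to its most probable English word,
    leaving it unaligned when NULL scores highest."""
    links = set()
    for j, f in enumerate(fs):
        best_i, best_p = None, t[(f, NULL)]
        for i, e in enumerate(es):
            if t[(f, e)] > best_p:
                best_i, best_p = i, t[(f, e)]
        if best_i is not None:
            links.add((best_i, j))
    return links


# Invented toy corpus (romanized Hindi), only to exercise the code
toy_bitext = [
    (["red", "house"], ["laal", "ghar"]),
    (["red", "book"], ["laal", "kitaab"]),
    (["new", "book"], ["nayi", "kitaab"]),
]
table = train_ibm_model1(toy_bitext)
print(sorted(best_alignment(["red", "house"], ["laal", "ghar"], table)))
```

Real toolkits add the higher IBM models and an HMM model on top of this, which account for word position and fertility, so treat the sketch as a starting point for understanding rather than a replacement.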

Uplug is a great tool; I have been using it for aligning English<->Macedonian texts. It essentially builds on GIZA++ by adding so-called clue alignments. Its advanced setting combines the clue alignments with GIZA++ and performs three such iterations. The more clues (POS tags, lemmas, ...) you provide, the better the results will be. But I have to mention that you should not expect fundamentally different results than you would get by just using GIZA++.

Anyway, if you plan to study SMT seriously, I suggest that you read the PhD thesis about Uplug; it will be very beneficial for you.

Moses is a statistical machine translation suite you might want to take a look at. Its word alignment component is built on GIZA++ but may be tweaked to work better with certain language pairs than pure GIZA++. Their mailing list and the resources you can find on http://www.statmt.org/ may also be a better place to ask questions on this topic than SO. One thing you didn't say anything about but which I would consider even more problematic is where to get a parallel corpus Hindi <-> English.
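One of the things that can be tuned in a Moses-style setup is how the two directional GIZA++ alignments (English to Hindi and Hindi to English) are merged into one. The sketch below shows only the two simplest options, intersection (high precision) and union (high recall); the heuristics Moses offers, such as grow-diag-final-and, start from the intersection and add selected links from the union. The function name and the toy links are assumptions for illustration, not part of Moses.

```python
def symmetrize(src_to_tgt, tgt_to_src):
    """Combine two directional word alignments.

    src_to_tgt: set of (source_index, target_index) links
    tgt_to_src: set of (target_index, source_index) links from the
                aligner run in the opposite direction
    Returns (intersection, union), both as (source_index, target_index).
    """
    forward = set(src_to_tgt)
    # Flip the reverse-direction links so both sets use the same order
    backward = {(s, t) for (t, s) in tgt_to_src}
    return forward & backward, forward | backward


# Invented toy links for a three-word sentence pair
en_to_hi = {(0, 0), (1, 1), (2, 1)}
hi_to_en = {(0, 0), (1, 1), (2, 2)}
both, either = symmetrize(en_to_hi, hi_to_en)
print("intersection:", sorted(both))
print("union:", sorted(either))
```

Which end of that precision/recall range works best tends to depend on the language pair and corpus size, so it is worth evaluating a few settings against hand-aligned data.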

You have a vague and broad question.

Try: http://scholar.google.com/scholar?q=algorithm+language+independent+statistical+word+alignment&hl=en&safe=off&client=firefox-a&hs=hJt&rls=com.ubuntu:en-US:official&um=1&ie=UTF-8&oi=scholart

for a list of papers in this area.
