Segmentation rules for non-Latin-based languages like Chinese and Japanese
While exploring globalsight.com, I came across the segmentation rules (link). They use the full stop (.) as a segment delimiter. Which segmentation rules can we use to segment non-Latin-based languages in which a dot (.) means something other than a delimiter, or languages that don't have any delimiters? Examples: Chinese, Japanese, and Korean.
What segmentation rules are used for these “non-Latin” languages (Chinese, Japanese)? How are the segmentation rules developed?
Thanks in advance, Manjushree
Japanese uses kinsoku shori. Not sure about the other two though.
Trados, the leading translation memory application, uses the following segmentation rules:
For Japanese and Chinese:
Full Stop: 。
Colons: : :
Punctuation: ? ! ? !
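As a rough illustration of how a delimiter list like the one above could be applied, here is a minimal Python sketch. It is not Trados's or GlobalSight's actual implementation; the function name `segment` and the constant `TERMINATORS` are made up for this example, and treating both the fullwidth and the ASCII forms of the colon, question mark, and exclamation mark as delimiters is an assumption.

```python
import re

# Segment-ending characters based on the list above. Including both the
# fullwidth and the ASCII forms is an assumption made for this sketch.
TERMINATORS = "。::??!!"

# A segment is either a run of non-terminators followed by any terminators
# that close it, or a stray run of terminators on its own.
_SEGMENT_RE = re.compile(
    r"[^{0}]+[{0}]*|[{0}]+".format(re.escape(TERMINATORS))
)


def segment(text):
    """Split text into segments, keeping each delimiter with its segment."""
    return [s.strip() for s in _SEGMENT_RE.findall(text) if s.strip()]


if __name__ == "__main__":
    for s in segment("今天天气很好。你去哪里?我不知道!"):
        print(s)
```

Note that the rule does not rely on whitespace after the delimiter, since Chinese and Japanese text normally has none. Splitting on the ASCII colon will also break strings such as times (10:30), so a real rule set would likely need exceptions for cases like that.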