OpenOffice hyphenation algorithm: what do the parameters mean?

I am looking at the hyphenation algorithm downloaded from the OpenOffice site, but even after reading the comment I can't work out what the parameters rep, pos, and cut are for. Could someone who knows the code tell me what these parameters do? Here is the comment.

From the example, it seems like it's saying ff can be replaced with a single f, but what does that have to do with hyphenation?

Thanks,


/*

int hnj_hyphen_hyphenate2(): non-standard hyphenation.

(It supports Catalan, Dutch, German, Hungarian, Norwegian, Swedish
 etc. orthography, see documentation.)

input data:
word:            input word
word_size:       byte length of the input word

hyphens:         allocated character buffer (size = word_size + 5)
hyphenated_word: allocated character buffer (size ~ word_size * 2) or NULL
rep, pos, cut:   pointers (point to the allocated and zeroed buffers
                 (size=word_size) or with NULL value) or NULL

output data:
hyphens:         hyphenation vector (hyphenation points signed with odd numbers)
hyphenated_word: hyphenated input word (hyphens signed with '='),
                 optional (NULL input)
rep:             NULL (only standard hyph.), or replacements (hyphenation points
                 signed with '=' in replacements);
pos:             NULL, or difference of the actual position and the beginning
                 positions of the change in input words;
cut:             NULL, or counts of the removed characters of the original words
                 at hyphenation,

Note: rep, pos, cut are complementary arrays to the hyphens, indexed with the
      character positions of the input word.

For example:
Schiffahrt -> Schiff=fahrt,
pattern: f1f/ff=f,1,2
output: rep[5]="ff=f", pos[5] = 1, cut[5] = 2

Note: hnj_hyphen_hyphenate2() can allocate rep, pos, cut (word_size
      length arrays):

char ** rep = NULL;
int * pos = NULL;
int * cut = NULL;
char hyphens[MAXWORDLEN];
hnj_hyphen_hyphenate2(dict, "example", 7, hyphens, NULL, &rep, &pos, &cut);

See example in the source distribution.

*/

int hnj_hyphen_hyphenate2 (HyphenDict *dict,
        const char *word, int word_size, char * hyphens,
        char *hyphenated_word, char *** rep, int ** pos, int ** cut);


I believe you are referring to the following comment:

// For example:
//  Schiffahrt -> Schiff=fahrt,
//  pattern: f1f/ff=f,1,2
//  output: rep[5]="ff=f", pos[5] = 1, cut[5] = 2

The example refers to German hyphenation rules as they stood before the spelling reform of the 1990s. Compound nouns in German are written as one word. Under the old rules, when a compound brought three identical consonants together and a vowel followed, the third consonant was omitted: 'Schifffahrt' (consisting of 'Schiff' and 'Fahrt') was written as 'Schiffahrt'. The omitted letter was still written, however, when the word was hyphenated at that point.

So the meaning of the example is not that 'ff' can be replaced with a single 'f', but rather that 'ff' can be replaced with 'ff-f'.

The meaning of the parameters therefore would be:

  • rep: contains the replacement 'ff=f' (that is, 'ff-f', with '=' marking the hyphen), which is used instead of 'ff'
  • pos: a value of 1 means that the replacement starts one letter before the hyphenation position of 5
  • cut: a value of 2 means that 2 characters need to be removed from the input word.

These parameters only seem to be used for the rare case that a word is spelled differently when hyphenated.
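To make the reading in the list above concrete, here is a minimal sketch in C that applies a single rep/pos/cut entry to a word. It is not the library's own code: apply_replacement is a hypothetical helper, and the indexing (the change starts pos characters before the given index, taken as a 0-based offset into the word) is an assumption chosen because it reproduces the numbers in the comment's example. The library itself may count positions differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the hyphen library.
 * Applies one non-standard hyphenation entry to `word`:
 *   - the change starts `pos` characters before `index`
 *     (assumed to be a 0-based offset into the word),
 *   - `cut` characters of the original word are dropped there,
 *   - `rep` is written in their place ('=' marks the hyphen).
 * Returns 0 on success, -1 if the arguments are inconsistent
 * or the output buffer is too small.
 */
int apply_replacement(const char *word, size_t index,
                      const char *rep, int pos, int cut,
                      char *out, size_t out_size)
{
    assert(word != NULL && rep != NULL && out != NULL);

    size_t len = strlen(word);
    size_t rep_len = strlen(rep);

    if (pos < 0 || cut < 0 || (size_t)pos > index)
        return -1;

    size_t start = index - (size_t)pos;   /* first replaced character */
    if (start > len || (size_t)cut > len - start)
        return -1;

    size_t tail = start + (size_t)cut;    /* first character kept after the cut */
    size_t need = start + rep_len + (len - tail) + 1;
    if (need > out_size)
        return -1;

    memcpy(out, word, start);                       /* unchanged prefix */
    memcpy(out + start, rep, rep_len);              /* replacement text */
    memcpy(out + start + rep_len, word + tail,
           len - tail + 1);                         /* suffix plus '\0' */
    return 0;
}
```

With the values from the comment, apply_replacement("Schiffahrt", 5, "ff=f", 1, 2, buf, sizeof buf) starts at offset 4, drops the two characters "ff", writes "ff=f" in their place, and leaves "Schiff=fahrt" in buf.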
