
Ignoring apostrophes in sphinx indexes

In my sphinx config file, I have the following:

ignore_chars: "U+0027"
charset_table: "0..9, a..z, _, A..Z->a..z, U+00C0->a, U+00C1->a,
  U+00C2->a, U+00C3->a, U+00C4->a, U+00C5->a, U+00C7->c, U+00C8->e,
  U+00C9->e, U+00CA->e, U+00CB->e, U+00CC->i, U+00CD->i, U+00CE->i [SNIP]"

(The charset_table entry is from here: http://speeple.com/unicode-maps.txt)

The expected result is that querying kyles will return all records matching kyles and/or kyle's, since I'm telling Sphinx to exclude ' (the single quote/apostrophe) from the index (ab'cd -> abcd). In practice, however, this is not happening.

I believe adding it to ignore_chars has the opposite of the desired effect. It tells Sphinx not to split on that character; instead, Sphinx collapses the word around the ignored character. So kyle's becomes kyles instead of kyle and s.
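To make the difference concrete, here is a toy model in Python of the two behaviours described above. It is not Sphinx's actual tokenizer; the function name tokenize and its parameters are made up for illustration, and the word characters are assumed to be just 0-9, a-z and _ rather than the full charset_table.

```python
import re

def tokenize(text, ignore_chars="", word_chars="0-9a-z_"):
    """Toy model: drop ignored characters, then split on anything
    that is not a word character (hypothetical helper, not Sphinx code)."""
    text = text.lower()
    for ch in ignore_chars:
        # An ignored character is removed, joining the text around it.
        text = text.replace(ch, "")
    # Any character outside word_chars acts as a separator.
    return [t for t in re.split(f"[^{word_chars}]+", text) if t]

# Apostrophe listed as an ignored character: the word collapses.
print(tokenize("kyle's", ignore_chars="'"))  # ['kyles']

# Apostrophe neither ignored nor a word character: it splits the word.
print(tokenize("kyle's"))  # ['kyle', 's']
```

Under this model, a query for kyles only matches the indexed form kyles if the apostrophe is ignored at both index time and query time.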

The solution I just tried for this issue, which seems to have worked, was to add s to my list of stopwords (it might need 's in there as well; I can't remember). Sphinx seems to split kyle's into the words kyle and 's. Because match-all mode is on, some documents fail on the match for 's. Adding it to the stopwords seems to have the desired effect.
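A rough sketch of why the stopword helps, again as a toy model in Python rather than Sphinx itself. The helpers tokens and match_all are hypothetical, and the sketch assumes the apostrophe acts as a separator and that stopwords are dropped from both the query and the document.

```python
import re

def tokens(text, stopwords=frozenset()):
    """Split into lowercase word tokens and drop stopwords (toy model)."""
    words = re.findall(r"[0-9a-z_]+", text.lower())
    return [w for w in words if w not in stopwords]

def match_all(query, document, stopwords=frozenset()):
    """Match-all: every query token must appear in the document."""
    doc_tokens = set(tokens(document, stopwords))
    return all(t in doc_tokens for t in tokens(query, stopwords))

document = "kyle has a car"

# Without the stopword, the stray "s" token has no match in the document.
print(match_all("kyle's car", document))  # False

# With "s" as a stopword, only "kyle" and "car" need to match.
print(match_all("kyle's car", document, stopwords=frozenset({"s"})))  # True
```

The trade-off is that a legitimate one-letter term s can no longer be searched for once it is a stopword.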

It seems like the normal stemming should take care of this, though, so maybe we're both doing something wrong...






