Looking for a TokenFilter
I am indexing some files written in Spanish in Solr, and sometimes characters like ¿D é .... appear.
I wonder if there is a TokenFilter to avoid these characters when the text has accents (á, é, í, ó...) or the letter ñ. Thanks.
I added it where all the other filters are:
<fieldType name="textTight" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    ....
    <!-- Filter to remove accents and ñ -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    ....
  </analyzer>
</fieldType>
Of course, I rebuilt my index after that.
(I am adding this as an answer because it wasn't clear enough in the comment.)
If you need it for a Latin-script language, an easier solution is to use
solr.ASCIIFoldingFilterFactory
as in:
<fieldType name="text" class="solr.TextField" omitNorms="false">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="Romanian" />
</analyzer>
</fieldType>
See http://wiki.apache.org/solr/LanguageAnalysis for more advanced usage.
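For the Spanish text in the question, the same idea might be sketched as follows. This is only an illustration: the field type name text_es_folded is made up, and it assumes a Solr version that ships the factories used in the example above. ASCIIFoldingFilterFactory folds á, é, í, ó, ú to their unaccented letters and ñ to n, so accented and unaccented spellings end up as the same token.

```xml
<!-- Hypothetical field type name; adjust to your own schema. -->
<fieldType name="text_es_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split the text into tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercase first so "Ñ" and "ñ" are treated alike -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Fold accented Latin characters to their ASCII equivalents -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Because no separate index and query analyzers are declared, the same chain applies at both index and query time, so a search for "nino" would match "niño". As with the other answer, the index has to be rebuilt after changing the field type.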