Solr indexing HTML entities

I am indexing documents with Solr that were scraped from the web. The documents contain HTML entities (such as &pound; or &#163;). Most of the documents contain Central European characters. Is there a char filter for this task? I know solr.MappingCharFilterFactory, but using it would mean that I have to define the mappings myself. I would be happier with a shared solution maintained by a community. Thanks for your help!


There is solr.HTMLStripCharFilterFactory, which converts HTML entities, but it also strips HTML tags.
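
As a rough sketch of how this could be wired into the schema, the char filter goes at the start of an analyzer chain, before the tokenizer. The field type name text_html, the choice of tokenizer, and the lowercase filter are assumptions for illustration only and are not from the question:

```xml
<!-- Hypothetical field type; the name "text_html" is an assumption. -->
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Decodes HTML entities and strips HTML tags before tokenizing. -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- Tokenizer and filter below are illustrative choices, not requirements. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note that a char filter only changes the text that is analyzed for the index. The stored field value returned in search results still contains the original entities, so if the decoded text should also be displayed, the documents would need to be cleaned before they are sent to Solr.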
