Lucene Search with Unicode Characters

I have indexed a database of texts, and the texts are Unicode-encoded. When I search for an English word with Lucene, everything works. But when I use a non-English query like "تو", it throws the following exception:

Exception in thread "main" org.apache.lucene.queryParser.ParseException: Cannot parse '??': '*' or '?' not allowed as the first character in WildcardQuery
        at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:187)
        at Search.main(Search.java:151)
Caused by: org.apache.lucene.queryParser.ParseException: '*' or '?' not allowed as first character in WildcardQuery
        at org.apache.lucene.queryParser.QueryParser.getWildcardQuery(QueryParser.java:923)
        at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1347)
        at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1250)
        at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1178)
        at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1167)
        at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:182)
        ... 1 more

What should I do?

Thank you.


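Two points here. The '??' in the exception message suggests the query characters were replaced by '?' before they reached the parser, which points to an encoding mismatch rather than a Lucene problem.

  • What is the encoding of your source file (*.java)? Make sure it is UTF-8.
  • Java's default encoding is likely to be something other than UTF-8. Make sure you specify the encoding explicitly when reading text, like:

    new InputStreamReader(new FileInputStream(filename), "UTF-8");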
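To illustrate both points, here is a minimal sketch using only the JDK, not Lucene. The class name QueryEncodingDemo and its two helper methods are made up for this example. It assumes Java 7 or later for StandardCharsets; on older versions the string form "UTF-8" shown above does the same job. The first helper decodes input as UTF-8 explicitly, and the second shows how text pushed through a charset that cannot represent it comes out as '?' characters, which would match the '??' in the exception.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class QueryEncodingDemo {

    // Reads one line of query text, decoding the bytes as UTF-8 explicitly
    // instead of relying on the platform default charset.
    public static String readQuery(InputStream in) {
        try {
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8));
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Simulates text passing through a charset that cannot represent it:
    // String.getBytes replaces each unmappable character with '?'.
    public static String throughAscii(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // The query from the question, written as Unicode escapes so the
        // result does not depend on how this source file is saved.
        String query = "\u062A\u0648";

        byte[] utf8 = query.getBytes(StandardCharsets.UTF_8);
        String decoded = readQuery(new ByteArrayInputStream(utf8));
        System.out.println(decoded.equals(query)); // prints "true"

        System.out.println(throughAscii(query));   // prints "??"
    }
}
```

If the query is a string literal in the source file, compiling with `javac -encoding UTF-8` is one way to make sure the compiler reads the file as UTF-8.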
