Lucene Search with Unicode Characters
I have indexed a database of texts, and the texts are Unicode-encoded. When I search for an English word with Lucene, everything works. But when I use a non-English query like "تو", I get the following exception:
Exception in thread "main" org.apache.lucene.queryParser.ParseException: Cannot parse '??': '*' or '?' not allowed as the first character in WildcardQuery
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:187)
at Search.main(Search.java:151)
Caused by: org.apache.lucene.queryParser.ParseException: '*' or '?' not allowed as first character in WildcardQuery
at org.apache.lucene.queryParser.QueryParser.getWildcardQuery(QueryParser.java:923)
at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1347)
at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1250)
at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1178)
at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1167)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:182)
... 1 more
What should I do?
Thank you.
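The '??' in the exception message shows that the query had already been turned into question marks before it reached the QueryParser, so this looks like an encoding problem rather than a Lucene problem. Two points to check:
- What is the encoding of your source file (*.java)? If the query is a string literal in the code, make sure the file is saved as UTF-8 and compiled as UTF-8 (for example with javac -encoding UTF-8).
- Java's default encoding is likely to be something other than UTF-8. Wherever you read text (the query or the data you index), specify the encoding explicitly, like:
new InputStreamReader(new FileInputStream(filename), "UTF-8");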
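To make the two points above concrete, here is a minimal sketch (not taken from the asker's code) of how a non-Latin query can become '??' and how decoding explicitly as UTF-8 avoids it. The class name EncodingDemo and the helpers roundTrip and readQuery are hypothetical, and ISO-8859-1 only stands in for whatever the platform default happens to be. The query is written with Unicode escapes so the example does not depend on the encoding of the source file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class EncodingDemo {

    // Encodes the text to bytes and decodes it again with the same charset.
    // Characters the charset cannot represent are replaced during encoding.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    // Reads one line from the stream, decoding the bytes explicitly as UTF-8
    // instead of relying on the platform default encoding.
    static String readQuery(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // The query from the question (U+062A U+0648), written as escapes.
        String query = "\u062A\u0648";

        // Assumption: ISO-8859-1 stands in for a non-UTF-8 default charset.
        // It cannot represent Arabic letters, so each becomes '?'.
        String lossy = roundTrip(query, StandardCharsets.ISO_8859_1);
        System.out.println(lossy.equals("??")); // prints true

        // Decoding UTF-8 bytes as UTF-8 preserves the query.
        byte[] utf8Bytes = query.getBytes(StandardCharsets.UTF_8);
        String read = readQuery(new ByteArrayInputStream(utf8Bytes));
        System.out.println(read.equals(query)); // prints true
    }
}
```

If the first check matches what you see, the string handed to QueryParser.parse is already damaged, and the fix belongs wherever the query text enters the program (source literal, file, or console input) rather than in the Lucene call.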