Nutch crawler is crawling ' as ’
The Nutch crawler is crawling Let's as Let’s.
Why? Is there any setting to change this charset?
’ is what you see when the UTF-8 encoding of the single closing quote (U+2019, not the ASCII apostrophe) is interpreted as Windows-1252. You need to decode the content with the right encoding (UTF-8). This link may help.
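To illustrate the mismatch, here is a minimal Python sketch (not Nutch code; the variable names are just for illustration). It encodes a curly quote as UTF-8, decodes those bytes as Windows-1252 to reproduce the garbled text, then reverses the mistake:

```python
# Illustration only: reproduce and undo the UTF-8 / Windows-1252 mix-up.
original = "Let\u2019s"                 # "Let’s" with U+2019, the closing single quote

utf8_bytes = original.encode("utf-8")   # U+2019 becomes the three bytes E2 80 99
garbled = utf8_bytes.decode("cp1252")   # wrong decoder: each byte becomes its own character

# Undo the damage: turn the characters back into bytes, then decode correctly.
repaired = garbled.encode("cp1252").decode("utf-8")

print(utf8_bytes.hex(" "))
print(garbled)
print(repaired)
```

The repair step only works while the garbled text still contains all three characters, so it is better to fix the decoding at the source than to patch the output afterwards.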
I haven't used Nutch myself, but this page looks like it's relevant:
To enable passing of UTF-8 characters, edit $TOMCAT/conf/server.xml. Locate the <Connector> tag for the web (look for "8080") and insert this parameter assignment: URIEncoding="UTF-8" as explained in Tomcat 5 FAQ at http://tomcat.apache.org/faq/connectors.html#utf8
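As a rough sketch of what that edit might look like, assuming a stock HTTP connector on port 8080 (the other attributes shown are typical defaults and may differ in your server.xml; only URIEncoding is the addition):

```xml
<!-- $TOMCAT/conf/server.xml: existing HTTP connector with URIEncoding added -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8" />
```

Restart Tomcat after changing the file so the connector picks up the new setting.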