Converting MS Word "curly" quotes and apostrophes
How do I convert the MS Word quotes and apostrophes to regular quote and apostrophe characters in Java? What's the Unicode number for these characters?
“how are you doing?”
‘howdy’
to
"how are you doing?"
'howdy'
Going off Thomas's answer, the code is:
return text.replaceAll("[\\u2018\\u2019]", "'")
.replaceAll("[\\u201C\\u201D]", "\"");
Here's a very useful link for everyone dealing with Unicode: Unicode codepoint lookup/search tool.
Searching for "quotation mark" gives
‘ (U+2018) LEFT SINGLE QUOTATION MARK
’ (U+2019) RIGHT SINGLE QUOTATION MARK
“ (U+201C) LEFT DOUBLE QUOTATION MARK
” (U+201D) RIGHT DOUBLE QUOTATION MARK
There are several other quote-like symbols that you might consider replacing.
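If you are not sure which of these characters a pasted string actually contains, a small helper can list the non-ASCII code points it finds. This is only a sketch; the class and method names (CodePointDump, nonAscii) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointDump {
    // Returns one "U+XXXX" label for every non-ASCII code point in the input,
    // in the order the characters appear.
    static List<String> nonAscii(String text) {
        List<String> found = new ArrayList<>();
        text.codePoints()
            .filter(cp -> cp > 0x7F)
            .forEach(cp -> found.add(String.format("U+%04X", cp)));
        return found;
    }

    public static void main(String[] args) {
        // The two sample lines from the question, written with escapes.
        String sample = "\u201Chow are you doing?\u201D \u2018howdy\u2019";
        System.out.println(nonAscii(sample));
        // prints [U+201C, U+201D, U+2018, U+2019]
    }
}
```

Anything it reports that is not in your replacement list is a candidate for another rule.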
Thanks to Nick van Esch at "C# How to replace Microsoft's Smart Quotes with straight quotation marks?"
Here is the code ('\u2019' is ’ in MS Word). It's useful because it covers the problematic Word characters.
if (buffer.IndexOf('\u2013') > -1) buffer = buffer.Replace('\u2013', '-');
if (buffer.IndexOf('\u2014') > -1) buffer = buffer.Replace('\u2014', '-');
if (buffer.IndexOf('\u2015') > -1) buffer = buffer.Replace('\u2015', '-');
if (buffer.IndexOf('\u2017') > -1) buffer = buffer.Replace('\u2017', '_');
if (buffer.IndexOf('\u2018') > -1) buffer = buffer.Replace('\u2018', '\'');
if (buffer.IndexOf('\u2019') > -1) buffer = buffer.Replace('\u2019', '\'');
if (buffer.IndexOf('\u201a') > -1) buffer = buffer.Replace('\u201a', ',');
if (buffer.IndexOf('\u201b') > -1) buffer = buffer.Replace('\u201b', '\'');
if (buffer.IndexOf('\u201c') > -1) buffer = buffer.Replace('\u201c', '\"');
if (buffer.IndexOf('\u201d') > -1) buffer = buffer.Replace('\u201d', '\"');
if (buffer.IndexOf('\u201e') > -1) buffer = buffer.Replace('\u201e', '\"');
if (buffer.IndexOf('\u2026') > -1) buffer = buffer.Replace("\u2026", "...");
if (buffer.IndexOf('\u2032') > -1) buffer = buffer.Replace('\u2032', '\'');
if (buffer.IndexOf('\u2033') > -1) buffer = buffer.Replace('\u2033', '\"');
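Since the question asks about Java, here is a rough Java equivalent of the C# list above. It is a sketch, not a drop-in from any library: the names WordCharCleaner and clean are invented here, and the mapping simply mirrors the replacements in that list. It walks the string once with a StringBuilder instead of calling replace for each character.

```java
class WordCharCleaner {
    // Single pass over the input; each Word-style character is mapped to the
    // same ASCII replacement used in the C# list. Everything else is copied as-is.
    static String clean(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\u2013': // en dash
                case '\u2014': // em dash
                case '\u2015': // horizontal bar
                    out.append('-');
                    break;
                case '\u2017': // double low line
                    out.append('_');
                    break;
                case '\u2018': // left single quote
                case '\u2019': // right single quote
                case '\u201B': // single high-reversed-9 quote
                case '\u2032': // prime
                    out.append('\'');
                    break;
                case '\u201A': // single low-9 quote
                    out.append(',');
                    break;
                case '\u201C': // left double quote
                case '\u201D': // right double quote
                case '\u201E': // double low-9 quote
                case '\u2033': // double prime
                    out.append('"');
                    break;
                case '\u2026': // horizontal ellipsis becomes three dots
                    out.append("...");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }
}
```

For example, WordCharCleaner.clean("\u201Cwait\u2026\u201D") returns "wait..." wrapped in straight double quotes. All of the characters handled here are in the Basic Multilingual Plane, so iterating by char is safe for this particular list.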