
Converting MS Word "curly" quotes and apostrophes

How do I convert the MS Word quotes and apostrophes to regular quote and apostrophe characters in Java? What are the Unicode code points for these characters?

“how are you doing?”

‘howdy’

Since Stack Overflow autofixes them, here's how they appear in an editor

[Image: how the curly quotes appear in an editor]

to

"how are you doing?"

'howdy'


Going off Thomas's answer, the code is:

return text.replaceAll("[\\u2018\\u2019]", "'")
           .replaceAll("[\\u201C\\u201D]", "\"");
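Since each pattern above only ever matches a single character, the same substitution can probably be written without regular expressions by using String.replace(char, char). A minimal sketch, assuming a hypothetical helper class QuoteFixer (the class and method names are mine, not from Thomas's answer):

```java
// Hypothetical helper; the class and method names are illustrative only.
class QuoteFixer {
    // Replaces curly single and double quotes with their ASCII equivalents.
    static String straighten(String text) {
        return text.replace('\u2018', '\'')
                   .replace('\u2019', '\'')
                   .replace('\u201C', '"')
                   .replace('\u201D', '"');
    }

    public static void main(String[] args) {
        // Input contains the curly quotes from the question.
        System.out.println(straighten("\u201Chow are you doing?\u201D \u2018howdy\u2019"));
    }
}
```

Either form should give the same result for these four characters; the regex version is shorter, while this one avoids compiling a pattern on every call.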


Here's a very useful link for everyone dealing with Unicode: Unicode codepoint lookup/search tool.

Searching for "quotation mark" gives

‘ (U+2018) LEFT SINGLE QUOTATION MARK
’ (U+2019) RIGHT SINGLE QUOTATION MARK
“ (U+201C) LEFT DOUBLE QUOTATION MARK
” (U+201D) RIGHT DOUBLE QUOTATION MARK

There are several other quote-like symbols that you might consider replacing.
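If you'd rather check which of these characters a pasted string actually contains, Java can report the code point and Unicode name itself. A rough sketch, assuming Java 8 or later; CodePointDump and describeNonAscii are made-up names for illustration:

```java
// Hypothetical utility for inspecting pasted text; the names are illustrative only.
class CodePointDump {
    // Returns one line per non-ASCII code point: its U+XXXX number and Unicode name.
    static String describeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints()
            .filter(cp -> cp > 0x7F) // skip plain ASCII
            .forEach(cp -> out.append(
                String.format("U+%04X %s%n", cp, Character.getName(cp))));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describeNonAscii("\u201Chow are you doing?\u201D"));
    }
}
```

Running it on text copied from Word is a quick way to find quote-like characters that the replacement list does not cover yet.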


Thanks to Nick van Esch's answer at "C# How to replace Microsoft's Smart Quotes with straight quotation marks?"

Here is the code ('\u2019' is ’ in MS Word). It's useful because it also covers other problematic Word characters such as dashes and the ellipsis.

if (buffer.IndexOf('\u2013') > -1) buffer = buffer.Replace('\u2013', '-');
if (buffer.IndexOf('\u2014') > -1) buffer = buffer.Replace('\u2014', '-');
if (buffer.IndexOf('\u2015') > -1) buffer = buffer.Replace('\u2015', '-');
if (buffer.IndexOf('\u2017') > -1) buffer = buffer.Replace('\u2017', '_');
if (buffer.IndexOf('\u2018') > -1) buffer = buffer.Replace('\u2018', '\'');
if (buffer.IndexOf('\u2019') > -1) buffer = buffer.Replace('\u2019', '\'');
if (buffer.IndexOf('\u201a') > -1) buffer = buffer.Replace('\u201a', ',');
if (buffer.IndexOf('\u201b') > -1) buffer = buffer.Replace('\u201b', '\'');
if (buffer.IndexOf('\u201c') > -1) buffer = buffer.Replace('\u201c', '\"');
if (buffer.IndexOf('\u201d') > -1) buffer = buffer.Replace('\u201d', '\"');
if (buffer.IndexOf('\u201e') > -1) buffer = buffer.Replace('\u201e', '\"');
if (buffer.IndexOf('\u2026') > -1) buffer = buffer.Replace("\u2026", "...");
if (buffer.IndexOf('\u2032') > -1) buffer = buffer.Replace('\u2032', '\'');
if (buffer.IndexOf('\u2033') > -1) buffer = buffer.Replace('\u2033', '\"');
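Since the question asks about Java, the list above could be ported roughly as follows. This is only a sketch under my own assumptions: WordCharCleaner and clean are hypothetical names, and a lookup table replaces the chain of IndexOf/Replace calls so the text is scanned once.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical Java port of the C# list above; class and method names are illustrative.
class WordCharCleaner {
    private static final Map<Character, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put('\u2013', "-");   // en dash
        REPLACEMENTS.put('\u2014', "-");   // em dash
        REPLACEMENTS.put('\u2015', "-");   // horizontal bar
        REPLACEMENTS.put('\u2017', "_");   // double low line
        REPLACEMENTS.put('\u2018', "'");   // left single quotation mark
        REPLACEMENTS.put('\u2019', "'");   // right single quotation mark
        REPLACEMENTS.put('\u201A', ",");   // single low-9 quotation mark
        REPLACEMENTS.put('\u201B', "'");   // single high-reversed-9 quotation mark
        REPLACEMENTS.put('\u201C', "\"");  // left double quotation mark
        REPLACEMENTS.put('\u201D', "\"");  // right double quotation mark
        REPLACEMENTS.put('\u201E', "\"");  // double low-9 quotation mark
        REPLACEMENTS.put('\u2026', "..."); // horizontal ellipsis
        REPLACEMENTS.put('\u2032', "'");   // prime
        REPLACEMENTS.put('\u2033', "\"");  // double prime
    }

    // Copies the text, swapping each mapped character for its ASCII replacement.
    static String clean(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String replacement = REPLACEMENTS.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Calling WordCharCleaner.clean(text) on a string leaves every character that is not in the table unchanged.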