
Accented characters display incorrectly (the accent appears after the character) after pasting

Not sure if this is the appropriate place (or doctype) to ask, but I'm going to ask anyway... I've been working with a German client, and this very weird problem started showing up.

While populating the content for the website, I copy/paste from their PDF sheet into my editor (Espresso). The weird thing is that all the text looks pristine in the editor, but when I open the page in the browser, the accents on accented characters get pushed forward. So an Ö shows up as O", and so on.

I thought it was a Unicode problem, but the site is declared as UTF-8, and no rich text or anything like that is being entered; it's just raw text from the editor. So it is really mind-boggling. If the client edits the files directly, everything appears correctly. And I found that if I delete and retype the accented characters manually, it's fine too.
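One way to see why a retyped character behaves differently from a pasted one is to compare their code points. The sketch below assumes, hypothetically, that the pasted Ö arrived as a plain O followed by U+0308 COMBINING DIAERESIS, a decomposed form some PDF viewers can produce on copy. Both strings may look identical in an editor, yet they are different text, and some font/browser combinations may draw the combining mark beside the letter rather than above it. The `typed` and `pasted` values and the `describe` helper are illustrative, not taken from the client's PDF.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return one 'U+XXXX NAME' entry per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


# Hypothetical samples: a single precomposed code point vs. letter + combining accent.
typed = "\u00d6"      # Ö as one code point
pasted = "O\u0308"    # O followed by COMBINING DIAERESIS

print(typed == pasted)   # False: same glyph on screen, different code points
print(describe(typed))   # ['U+00D6 LATIN CAPITAL LETTER O WITH DIAERESIS']
print(describe(pasted))  # ['U+004F LATIN CAPITAL LETTER O', 'U+0308 COMBINING DIAERESIS']
```

Pasting a suspicious string from the page source into `describe` shows which of the two forms the file actually contains.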

Has anyone had a similar experience, or found a solution?

I'd have thought there shouldn't be a localization/font problem, since these are essentially Latin characters? (Correct me if I'm wrong.)

You can't reliably copy and paste from a PDF. The internal format is not what it appears to be. :-)

PDFs may use special encodings internally to make the printed page look correct, but that does not mean the underlying text will survive copy and paste intact.

Here is a StackExchange question that has a little bit of background: https://tex.stackexchange.com/questions/22213/how-to-get-accented-unicode-characters-that-can-be-copy-pasted.

While it's not quite the same question as yours, it does show that how the PDF is produced matters. It is possible that some Latin-1 (accented) characters are encoded within the PDF not as characters with the expected Unicode code point(s), but as drawing instructions that make the character appear correctly on the page.
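If the pasted text turns out to contain a base letter followed by a combining accent, Unicode normalization to NFC can usually recombine the pair into the single precomposed character a keyboard would produce. The sketch below is one possible cleanup step, assuming the site's files are saved as UTF-8; the helper names (`to_nfc`, `has_combining_marks`, `normalize_file`) are hypothetical. It will not help if the PDF yielded a spacing accent such as U+00A8 DIAERESIS, or if the accent was only drawn as a graphic; in those cases retyping the characters, as you found, or getting a better-made PDF from the client is still needed.

```python
import sys
import unicodedata


def to_nfc(text: str) -> str:
    """Recombine base letters and combining accents into precomposed characters (NFC)."""
    return unicodedata.normalize("NFC", text)


def has_combining_marks(text: str) -> bool:
    """True if any code point is a non-spacing combining mark (category Mn), e.g. U+0308."""
    return any(unicodedata.category(ch) == "Mn" for ch in text)


def normalize_file(path: str) -> bool:
    """Rewrite a UTF-8 file in NFC; return True if its contents changed.

    newline="" on both read and write keeps the file's original line endings.
    """
    with open(path, encoding="utf-8", newline="") as f:
        original = f.read()
    fixed = to_nfc(original)
    if fixed != original:
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write(fixed)
    return fixed != original


if __name__ == "__main__":
    # Normalize every file path given on the command line.
    for path in sys.argv[1:]:
        if normalize_file(path):
            print(f"normalized {path}")
```

Saved as, say, `nfc_fix.py` (a hypothetical name), it could be run as `python nfc_fix.py index.html about.html` after pasting content in.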

Perhaps this product can help you. I have not used it, so I cannot recommend it, but a little searching may turn up something you can use. (This one claims to support German.)






