Smarter character replacement using Ruby gsub and regexp
I'm trying to create permalink-like behavior for some article titles, and I don't want to add a new DB field for the permalink. So I decided to write a helper that will convert my article title from:
"O "focoasă" a pornit cruciada, împotriva bărbaţilor zgârciţi" to "o-focoasa-a-pornit-cruciada-impotriva-barbatilor-zgarciti".
I've figured out how to replace spaces with hyphens and remove all other special characters (except -) using:
title.gsub(/\s/, "-").gsub(/[^\w-]/, '').downcase
I'm wondering whether there is a way to replace each character with a specific other character in a single .gsub call, so I don't have to chain title.gsub("ă", "a") calls for every UTF-8 special character in my locale.
I was thinking of building a hash with all the special characters and their counterparts, but I haven't yet figured out how to use variables in regexps.
What I was looking for is something like:
title.gsub(/\s/, "-").gsub(*replace character goes here*).gsub(/[^\w-]/, '').downcase
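The hash idea can work in one call: since Ruby 1.9, String#gsub accepts a Hash as its second argument, and each match is replaced by the value stored under the matched text. Regexp.union turns the hash keys into an alternation pattern (escaping them as needed), so no hand-built regexp is required. A minimal sketch, assuming a hand-written Romanian table (REPLACEMENTS and permalink_with_hash are hypothetical names) and Ruby 2.4+ so that downcase also folds uppercase diacritics before the lookup:

```ruby
# Hypothetical mapping for Romanian diacritics (cedilla and comma-below forms).
# Only lowercase keys are needed because the title is downcased first.
REPLACEMENTS = {
  'ă' => 'a', 'â' => 'a', 'î' => 'i',
  'ş' => 's', 'ș' => 's', 'ţ' => 't', 'ț' => 't'
}.freeze

# Regexp.union escapes each key and joins them with |.
DIACRITICS = Regexp.union(REPLACEMENTS.keys)

# Downcase (Unicode-aware on Ruby 2.4+), turn whitespace into hyphens,
# map each diacritic through the hash, then drop anything that is not
# an ASCII word character or a hyphen (Ruby's \w is ASCII-only).
def permalink_with_hash(title)
  title.downcase
       .gsub(/\s/, '-')
       .gsub(DIACRITICS, REPLACEMENTS)
       .gsub(/[^\w-]/, '')
end
```

With the title from the question, this should yield "o-focoasa-a-pornit-cruciada-impotriva-barbatilor-zgarciti".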
Thanks!
I solved this in my application by using the Unidecoder gem:
require 'unidecode'
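def uninternationalize(str)
  # Unidecoder.decode transliterates to ASCII; characters it cannot map come
  # out as "[?]", which are stripped. Backticks are normalized to apostrophes.
  Unidecoder.decode(str).gsub("[?]", "").gsub(/`/, "'").strip
end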
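If adding a gem is not an option, a rough standard-library approximation (a different technique from Unidecoder, not equivalent to it) is Unicode decomposition: String#unicode_normalize(:nfd) (Ruby 2.2+) splits a letter like "ă" into "a" plus a combining mark, and the marks can then be deleted. The helper name strip_diacritics is hypothetical. Note its limits: "ö" becomes "o" rather than "oe", and letters with no decomposition, such as "ß", pass through unchanged.

```ruby
# Hypothetical helper: decompose to NFD, then remove nonspacing marks
# (\p{Mn}), e.g. U+0306 COMBINING BREVE from "ă" or U+0327 COMBINING
# CEDILLA from "ţ", leaving only the base letters.
def strip_diacritics(str)
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end
```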
If you only want to transliterate one character to another, you can use the String#tr method, which does exactly the same thing as the Unix tr command: it replaces every character in the first list with the character at the same position in the second list:
'Ünicöde'.tr('ÄäÖöÜüß', 'AaOoUus') # => "Unicode"
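Applied to the question's helper, a single tr call can replace the whole chain of per-character gsub calls. A sketch, assuming the Romanian letters below are the only ones that need mapping (ROMANIAN_FROM, ROMANIAN_TO and permalink_with_tr are hypothetical names); both strings must have the same length so that each character lines up with its replacement:

```ruby
# Hypothetical Romanian tables: position i in FROM maps to position i in TO.
# Includes both cedilla (ş ţ) and comma-below (ș ț) variants.
ROMANIAN_FROM = 'ăâîşșţțĂÂÎŞȘŢȚ'
ROMANIAN_TO   = 'aaissttAAISSTT'

# Transliterate first, then apply the question's original steps, except that
# runs of whitespace collapse into one hyphen (/\s+/ instead of /\s/).
def permalink_with_tr(title)
  title.tr(ROMANIAN_FROM, ROMANIAN_TO)
       .gsub(/\s+/, '-')
       .gsub(/[^\w-]/, '')
       .downcase
end
```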
However, I agree with @Daniel Vandersluis: it would probably be a good idea to use a more specialized library. Stuff like this can get really tedious, really fast. Also, many of those characters actually have standardized transliterations (ä → ae, ö → oe, ..., ß → ss), and users may expect those transliterations to be correct. (I certainly don't like being called Jorg; if you really must, you may call me Joerg, but I very much prefer Jörg.) If you have a library that provides those transliterations, why not use it? Note also that many transliterations are not single characters and thus can't be expressed with String#tr anyway.
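Multi-character transliterations are out of reach for String#tr, but String#gsub with a Hash replacement (Ruby 1.9+) has no such restriction, because each hash value can be a string of any length. A sketch with a hypothetical, deliberately incomplete German table (GERMAN and transliterate_german are illustrative names, not from any library):

```ruby
# Hypothetical German table: values may be longer than one character,
# which String#tr cannot express.
GERMAN = {
  'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
  'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
  'ß' => 'ss'
}.freeze

# Regexp.union builds /ä|ö|ü|.../ from the keys; gsub looks up each match
# in the hash and substitutes the corresponding value.
def transliterate_german(str)
  str.gsub(Regexp.union(GERMAN.keys), GERMAN)
end
```

For example, transliterate_german('Jörg') should return "Joerg" and transliterate_german('Straße') should return "Strasse".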