I am trying to create a 'normalized' copy of a string, to help reduce duplicate names in a database. The names contain many international characters (i.e. accented letters), and I want to create a copy with the accents removed.
I came across the method below, but cannot get it to work. I can't find out what the Unicode Hacks plugin is.
```ruby
# Utility method that returns an ASCIIfied, downcased, and sanitized string.
# It relies on the Unicode Hacks plugin by means of String#chars. We assume
# $KCODE is 'u' in environment.rb. By now we support a wide range of latin
# accented letters, based on the Unicode Character Palette bundled in Macs.
def self.normalize(str)
  n = str.chars.downcase.strip.to_s
  n.gsub!(/[àáâãäåāă]/u,    'a')
  n.gsub!(/æ/u,             'ae')
  n.gsub!(/[ďđ]/u,          'd')
  n.gsub!(/[çćčĉċ]/u,       'c')
  n.gsub!(/[èéêëēęěĕė]/u,   'e')
  n.gsub!(/ƒ/u,             'f')
  n.gsub!(/[ĝğġģ]/u,        'g')
  n.gsub!(/[ĥħ]/u,          'h')
  n.gsub!(/[ìíîïīĩĭ]/u,     'i')
  n.gsub!(/[įıĳĵ]/u,        'j')
  n.gsub!(/[ķĸ]/u,          'k')
  n.gsub!(/[łľĺļŀ]/u,       'l')
  n.gsub!(/[ñńňņŉŋ]/u,      'n')
  n.gsub!(/[òóôõöøōőŏ]/u,   'o')
  n.gsub!(/œ/u,             'oe')
  n.gsub!(/ą/u,             'q')
  n.gsub!(/[ŕřŗ]/u,         'r')
  n.gsub!(/[śšşŝș]/u,       's')
  n.gsub!(/[ťţŧț]/u,        't')
  n.gsub!(/[ùúûüūůűŭũų]/u,  'u')
  n.gsub!(/ŵ/u,             'w')
  n.gsub!(/[ýÿŷ]/u,         'y')
  n.gsub!(/[žżź]/u,         'z')
  n.gsub!(/\s+/,            '')
  n.gsub!(/[^\sa-z0-9_-]/,  '')
  n
end
```
Do I need to 'require' a particular library/gem? Or maybe someone could recommend another way to go about this.
I am not using Rails, nor do I plan on doing so.
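For comparison, here is a minimal sketch of one plain-Ruby alternative, assuming Ruby 2.4 or later (where `String#downcase` handles non-ASCII letters) and no gems. Instead of listing every accented letter, it decomposes the string with `String#unicode_normalize(:nfd)` (so `é` becomes `e` plus a combining accent) and then strips the combining marks with `/\p{Mn}/`. Letters that NFD does not decompose, such as `æ`, `ø`, `ł` or `ß`, still need an explicit table. The names `normalize_name` and `SPECIAL_LETTERS` are hypothetical, and the table is illustrative rather than exhaustive.

```ruby
# Sketch only: assumes Ruby 2.4+ (Unicode-aware String#downcase) and a
# UTF-8 source file. normalize_name / SPECIAL_LETTERS are hypothetical names.

# Letters that NFD does not split into base letter + combining mark,
# so they need an explicit mapping (illustrative, not exhaustive).
SPECIAL_LETTERS = {
  'æ' => 'ae', 'œ' => 'oe', 'ß' => 'ss', 'ø' => 'o', 'ł' => 'l',
  'đ' => 'd', 'ħ' => 'h', 'ı' => 'i', 'ƒ' => 'f'
}.freeze

def normalize_name(str)
  n = str.downcase.strip
  # Decompose accented letters (é -> e + U+0301), then drop the
  # nonspacing combining marks (general category Mn).
  n = n.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
  # Replace the letters that have no decomposition.
  n = n.gsub(Regexp.union(SPECIAL_LETTERS.keys), SPECIAL_LETTERS)
  # Collapse whitespace, then drop anything outside [a-z0-9_-] and spaces.
  n = n.gsub(/\s+/, ' ')
  n.gsub(/[^\sa-z0-9_-]/, '')
end

# Usage: group names that normalize to the same key to spot likely duplicates.
names = ['José', 'Jose', 'JOSÉ ', 'François', 'Francois']
p names.group_by { |s| normalize_name(s) }
```

Grouping on the normalized key leaves the original names untouched, so the database can keep the accented spelling for display while using the normalized copy only for comparison.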