I have a Ruby on Rails project with a MySQL table/column that uses the utf8 character set (collation utf8_unicode_ci), and I want to keep it that way for now. I wrote some code to strip characters that this 3-byte utf8 charset cannot store from strings before saving records to the database, to avoid "Incorrect string value" errors:
value.each_char.select{|c| c.bytes.count < 4 }.join('')
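The filter above can be wrapped in a small helper and checked against a mixed string. This is only a sketch: the method name `strip_4byte_chars` is hypothetical, `bytesize` stands in for `bytes.count` (same result per character), and it assumes `value` is already tagged as UTF-8.

```ruby
# Hypothetical helper: keep only characters that fit in MySQL's 3-byte "utf8".
# Assumes +value+ is a String whose encoding is UTF-8.
def strip_4byte_chars(value)
  value.each_char.select { |c| c.bytesize < 4 }.join
end

# "a", "→" (3 bytes), an emoji (4 bytes), "b"; \u escapes make the literal UTF-8
sample = "a\u2192\u{1F600}b"
puts strip_4byte_chars(sample) # prints a→b
```

The escapes are used instead of literal characters so the example does not depend on the source file's encoding, which matters on Ruby 1.9.3 without a magic comment.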
This works fine for emojis (which are 4 bytes in UTF-8). However, I found a case where the character is only 3 bytes and the database still raises the error: \xE2\x86\x92, which is the character →.
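A quick way to confirm the byte counts in the question: → (U+2192) is the three bytes E2 86 92, so a "fewer than 4 bytes" filter keeps it. The last lines also show a side check that may be worth doing: `each_char` splits according to the string's encoding label, so if a string is tagged ASCII-8BIT (binary) instead of UTF-8, every "character" is a single byte. The emoji U+1F600 is just an arbitrary 4-byte example.

```ruby
arrow = "\u2192"     # "→", written as an escape so the literal is UTF-8
emoji = "\u{1F600}"  # a 4-byte character, for comparison

# Byte-level view of each character
puts arrow.bytes.to_a.map { |b| format("%02X", b) }.join(" ") # prints E2 86 92
puts arrow.bytesize  # prints 3: passes a "fewer than 4 bytes" filter
puts emoji.bytesize  # prints 4: removed by that filter

# The same emoji bytes labelled as binary: each_char now yields single bytes,
# so a per-character byte-count filter never sees a 4-byte character.
binary = emoji.dup.force_encoding("ASCII-8BIT")
puts binary.each_char.map { |c| c.bytesize }.inspect # prints [1, 1, 1, 1]
```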
I also tried "→".force_encoding("UTF-8"), but it returns the string unchanged.
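For reference, `force_encoding` only changes the encoding label attached to the bytes; it neither converts nor validates them, which is consistent with it returning the string unchanged. `String#valid_encoding?` is what reports whether the bytes are well-formed in the labelled encoding. A minimal sketch, where the binary-tagged input is a hypothetical stand-in for data read without an encoding:

```ruby
# Bytes of "→" labelled as binary (hypothetical input).
# force_encoding modifies and returns its receiver, hence the .dup calls.
raw = "\xE2\x86\x92".dup.force_encoding("ASCII-8BIT")
puts raw.length                          # prints 3: three one-byte characters

utf8 = raw.dup.force_encoding("UTF-8")   # relabels only
puts utf8.bytes.to_a == raw.bytes.to_a   # prints true: no bytes changed
puts utf8.valid_encoding?                # prints true: E2 86 92 is well-formed UTF-8
puts utf8.length                         # prints 1: now read as a single character

bad = "\xE2\x86".dup.force_encoding("UTF-8") # truncated multi-byte sequence
puts bad.valid_encoding?                 # prints false
```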
How can I detect these invalid characters in Ruby? I am using Ruby 1.9.3.