I'm parsing a file that contains both plain ASCII strings and Unicode (UTF-8) strings containing IPA pronunciations.
I want to get the last character of a string, but sometimes that character is made up of two code points (a base letter followed by a combining diacritic), e.g.:
```python
syl = 'tyl'          # plain ASCII
last_char = syl[-1]  # last char is 'l'

syl = 'tl̩'           # contains an IPA character
last_char = syl[-1]  # last char is wrongly '̩', the diacritical mark on the l
                     # I want the whole character 'l̩'
```
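Printing the code points shows why `syl[-1]` misbehaves: Python's `str` indexes by code point, not by what is displayed as one character. This sketch assumes the mark in `l̩` is U+0329 COMBINING VERTICAL LINE BELOW (the usual IPA syllabicity diacritic) and writes it as an escape so it can't get lost in copy and paste.

```python
import unicodedata

# Assumption: the syllabic mark is U+0329, written as an escape here.
syl = 'tl\u0329'

# Show each code point with its Unicode name and canonical combining class.
for ch in syl:
    print(f'U+{ord(ch):04X}', unicodedata.name(ch), unicodedata.combining(ch))

# Three code points, even though it displays as two characters.
print(len(syl))
```

The last code point is a combining mark (non-zero combining class), which is exactly what `syl[-1]` returns.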
If I try calling `.decode()` on the string, it fails with `'str' object has no attribute 'decode'`.
If I try `.encode().decode()` instead, I'm right back where I started: I still get only the diacritical mark instead of the full character.
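A minimal check of both attempts, under the same U+0329 assumption: in Python 3 a `str` is already decoded text, so it has no `.decode()` method (only `bytes` does), and encoding to UTF-8 then decoding again just returns an identical string. NFC normalization doesn't help either, because Unicode has no precomposed code point for an l with a vertical line below.

```python
import unicodedata

syl = 'tl\u0329'  # assumed: 'l' followed by U+0329

# In Python 3, str is already text; only bytes has a .decode() method.
print(hasattr(syl, 'decode'))  # False

raw = syl.encode('utf-8')          # bytes: b'tl\xcc\xa9'
print(raw.decode('utf-8') == syl)  # True: the round trip changes nothing

# NFC cannot merge these two code points into one, so the length stays 3.
print(len(unicodedata.normalize('NFC', syl)))  # 3
```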
How can I get the last character of a Unicode/UTF-8 string when I don't know in advance whether it's plain ASCII or not?
I guess I could use a lookup table of known characters and, if the lookup fails, fall back to grabbing `syl[-2:]`. Is there an easier way?
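One stdlib-only sketch (not the only option): walk backwards from the end of the string, skipping code points whose Unicode general category is a mark (Mn, Mc or Me), then include the base letter in front of them. This approximates a grapheme cluster; the third-party `regex` module can match extended grapheme clusters with `\X` if you'd rather not hand-roll it. The `extra_modifiers` parameter is a hypothetical addition: symbols like `ɜ˞` (rhotic hook, U+02DE) or `ə:` end in spacing characters, not combining marks, so a purely Unicode-based rule will not attach them to the preceding letter.

```python
import unicodedata


def last_char(s: str, extra_modifiers: str = '') -> str:
    """Return the last base character of s plus any marks that follow it.

    A code point counts as a trailing mark if its general category is a
    mark (Mn, Mc, Me) or if it appears in extra_modifiers (a hypothetical
    hook for spacing modifiers such as U+02DE or ':').
    """
    i = len(s)
    # Step back over trailing marks; stop at the first non-mark code point.
    while i > 0 and (unicodedata.category(s[i - 1]).startswith('M')
                     or s[i - 1] in extra_modifiers):
        i -= 1
    # Include the base character itself, if there is one.
    return s[max(i - 1, 0):]


print(last_char('tyl'))                        # l
print(last_char('tl\u0329') == 'l\u0329')      # True
print(last_char('\u025c\u02de', '\u02de:'))    # ɜ˞
```

Without `extra_modifiers`, `last_char('\u025c\u02de')` returns only the hook, since U+02DE is a spacing modifier rather than a combining mark.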
Edit: in response to some comments, here is the complete list of IPA characters I've collected so far:
a, b, d, e, f, f̩, g, h, i, i̩, i̬, j, k, l, l̩, m, n, n̩, o, p, r, s, s̩, t, t̩, t̬, u, v, w, x, z, æ, ð, ŋ, ɑ, ɑ̃, ɒ, ɔ, ə, ə:, ɚ, ɚ:, ɛ, ɜ, ɜ˞, ɝ, ɡ, ɪ, ɵ, ɹ, ɹ:, ɾ, ʃ, ʃ̩, ʊ, ʌ, ʒ, ʤ, θ, ∅,
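The lookup-table idea also works, and it has the advantage of grouping exactly the symbols in this list, including ones like `ə:` whose second character is not a combining mark. A sketch, assuming the diacritics are U+0329 (syllabic), U+032C (voiced) and U+0303 (nasalized), and that the length mark is the plain `:` shown above; `INVENTORY` and `last_symbol` are hypothetical names. Only the multi-code-point entries need to be in the set, since everything else falls back to the last code point.

```python
# Hypothetical inventory: the multi-code-point symbols from the list,
# written with escapes so the combining marks are explicit (assumed code points).
INVENTORY = {
    'f\u0329', 'i\u0329', 'i\u032c', 'l\u0329', 'n\u0329', 's\u0329',
    't\u0329', 't\u032c', '\u0251\u0303', '\u0259:', '\u025a:',
    '\u025c\u02de', '\u0279:', '\u0283\u0329',
}
MAX_LEN = max(len(sym) for sym in INVENTORY)


def last_symbol(s: str) -> str:
    """Return the longest inventory entry that ends s, else its last code point."""
    # Try the longest possible suffix first, down to two code points.
    for n in range(min(MAX_LEN, len(s)), 1, -1):
        if s[-n:] in INVENTORY:
            return s[-n:]
    # Single code point (or '' for an empty string).
    return s[-1:]


print(last_symbol('tyl'))                     # l
print(last_symbol('tl\u0329') == 'l\u0329')   # True
```

Checking the longest candidates first means a two-code-point symbol wins over its bare base letter. An unlisted diacritic still comes back on its own, which can double as a signal that the inventory is missing something.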