Channel: Active questions tagged utf-8 - Stack Overflow

Korean characters encoded in UTF-8 that look the same (using print) in Jupyter Notebook, but are actually different [duplicate]


I have a local folder called '안녕'

os.listdir("/Users/mac/Desktop/folder1")

['.DS_Store', '안녕']

I tried to draw the folder name '안녕' onto an image with Pillow, but the text came out with broken glyphs. (image: broken font)

After a few attempts, I realized the problem was that the '안녕' in ['.DS_Store', '안녕'] is different from the '안녕' I typed manually.

if '안녕' != '안녕':    # I typed the left '안녕' and copied the right '안녕' from ['.DS_Store', '안녕']
    print("they are different!")

they are different!

Q: Why are they different?

What I've tried:

  1. I first thought they might have different encodings -> they both used utf-8
import chardet
print(chardet.detect('안녕'.encode()))
print('from os.listdir\n')
print(chardet.detect('안녕'.encode()))
print('manually typed')

{'encoding': 'utf-8', 'confidence': 0.7525, 'language': ''}
from os.listdir

{'encoding': 'utf-8', 'confidence': 0.99, 'language': ''}
manually typed
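Encoding detection cannot separate the two strings, because both are valid UTF-8. What can differ is the Unicode normalization form. The sketch below is one way to check that, assuming Python 3.8+ (for unicodedata.is_normalized) and assuming the name returned by os.listdir is stored as decomposed jamo. The variables typed and listed are illustrative stand-ins written as escape sequences, not the actual strings from my machine.

```python
import unicodedata

# Hypothetical stand-ins for the two strings:
# precomposed Hangul syllables vs. conjoining jamo.
typed = "\uc548\ub155"                            # 2 code points
listed = "\u110b\u1161\u11ab\u1102\u1167\u11bc"   # 6 code points

for label, s in (("typed", typed), ("listed", listed)):
    # Report the length and whether the string is already in NFC / NFD.
    print(label,
          len(s),
          unicodedata.is_normalized("NFC", s),
          unicodedata.is_normalized("NFD", s))
```

Under those assumptions, the typed string should report as NFC and the listed one as NFD, even though both print as '안녕'.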

  2. -> but they encoded to different byte sequences
print('안녕'.encode())
print('manually typed\n\n')
print('안녕'.encode())
print('from os.listdir')

b'\xec\x95\x88\xeb\x85\x95'
manually typed

b'\xe1\x84\x8b\xe1\x85\xa1\xe1\x86\xab\xe1\x84\x82\xe1\x85\xa7\xe1\x86\xbc'
from os.listdir
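The two byte sequences look like the precomposed (NFC) and decomposed (NFD) spellings of the same text: two syllable code points in one case, six conjoining jamo in the other. A minimal sketch of comparing them after normalization, using only the standard unicodedata module. The helper name nfc is made up for illustration, and the byte literals are assumed to match the two outputs above.

```python
import unicodedata

# Bytes assumed to correspond to the two outputs above.
typed = b"\xec\x95\x88\xeb\x85\x95".decode("utf-8")
listed = (b"\xe1\x84\x8b\xe1\x85\xa1\xe1\x86\xab"
          b"\xe1\x84\x82\xe1\x85\xa7\xe1\x86\xbc").decode("utf-8")


def nfc(name: str) -> str:
    """Hypothetical helper: return the NFC (composed) form of a string."""
    return unicodedata.normalize("NFC", name)


print(typed == listed)            # False: different code point sequences
print(nfc(typed) == nfc(listed))  # True: same text once both are composed

# List the individual code points of the decomposed string.
for ch in listed:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

If this is the cause, passing nfc(folder_name) to Pillow instead of the raw os.listdir result may also fix the rendering, since a font may not have glyphs for conjoining jamo.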

