Channel: Active questions tagged utf-8 - Stack Overflow

Why does Python 3 use the 'surrogatepass' file-system error handler on Windows?

On Windows, Python 3 (3.8 and previous versions back to 3.6) uses surrogatepass as the default file-system error handler.

This can cause problems for users whose file paths contain bytes that are not valid UTF-8.

Why does Windows use surrogatepass instead of surrogateescape, which other platforms (Linux, macOS) use and which can handle these bytes? For example:

    >>> import sys
    >>> sys.getfilesystemencoding(), sys.getfilesystemencodeerrors()
    ('utf-8', 'surrogateescape')
    >>>
    >>> # This raises an error:
    >>>
    >>> b'C:\\Users\\me\\OneDrive\\\xe0\xcd\xa1\xca\xd2\xc3\\my.txt'.decode('utf-8', errors="surrogatepass")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 24: invalid continuation byte
    >>> # Compared to:
    >>> b'C:\\Users\\me\\OneDrive\\\xe0\xcd\xa1\xca\xd2\xc3\\my.txt'.decode('utf-8', errors="surrogateescape")
    'C:\\Users\\me\\OneDrive\\\udce0͡\udcca\udcd2\udcc3\\my.txt'
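
The same comparison can be sketched as a small script that runs on any platform. It is only an illustration: it takes the six problem bytes from the path above, and `try_decode` is a hypothetical helper introduced here, not part of Python's API. It shows that surrogateescape maps each undecodable byte to a lone surrogate that encodes back to the original byte, while surrogatepass rejects the same input.

```python
# The six bytes from the directory name above. Only b'\xcd\xa1' is a
# valid UTF-8 sequence (U+0361); the other four bytes are not.
raw = b'\xe0\xcd\xa1\xca\xd2\xc3'


def try_decode(data: bytes, errors: str):
    """Hypothetical helper: return the decoded str, or None on failure."""
    try:
        return data.decode('utf-8', errors=errors)
    except UnicodeDecodeError:
        return None


# surrogateescape turns each undecodable byte 0xNN into U+DCNN.
escaped = try_decode(raw, 'surrogateescape')

# surrogatepass only accepts UTF-8-style encodings of surrogate code
# points, so arbitrary invalid bytes still raise UnicodeDecodeError.
passed = try_decode(raw, 'surrogatepass')

# Encoding with the same handler restores the original bytes exactly.
round_trip = escaped.encode('utf-8', errors='surrogateescape')

print(ascii(escaped))
print(passed)
print(round_trip == raw)
```

The lossless round trip is the property that lets Linux and macOS carry arbitrary byte file names through `str`.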

My guess is that this is necessary because the underlying NTFS file system stores names as UTF-16 rather than as null-terminated bytes, which places constraints on Python's file-system encoding that are not present on Linux/macOS.
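
To make that guess concrete, here is a minimal sketch, assuming that a UTF-16 file name can contain an unpaired surrogate. The name `bad\ud800name` is made up for illustration, and `can_encode` is a hypothetical helper. It shows the opposite case from the bytes example: a `str` holding a lone surrogate can be converted to UTF-8 bytes and back with surrogatepass, but not with strict or surrogateescape.

```python
# Hypothetical file name containing a lone high surrogate (U+D800), as
# could come from a sequence of 16-bit units that is not valid UTF-16.
name = 'bad\ud800name'


def can_encode(text: str, errors: str) -> bool:
    """Hypothetical helper: report whether text encodes to UTF-8."""
    try:
        text.encode('utf-8', errors=errors)
        return True
    except UnicodeEncodeError:
        return False


strict_ok = can_encode(name, 'strict')           # lone surrogates are rejected
escape_ok = can_encode(name, 'surrogateescape')  # only handles U+DC80..U+DCFF
pass_ok = can_encode(name, 'surrogatepass')      # surrogates are encoded as-is

# surrogatepass writes the surrogate as a three-byte sequence and reads
# it back unchanged, so the name survives the str -> bytes -> str trip.
encoded = name.encode('utf-8', errors='surrogatepass')
decoded = encoded.decode('utf-8', errors='surrogatepass')

print(strict_ok, escape_ok, pass_ok)
print(encoded)
print(decoded == name)
```

Under this assumption the two handlers address different problems: surrogateescape preserves arbitrary bytes inside a `str`, while surrogatepass preserves arbitrary 16-bit code units inside UTF-8 bytes.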

