Quantcast
Channel: Active questions tagged utf-8 - Stack Overflow
Viewing all articles
Browse latest Browse all 1060

Encoding problem on MySQL: Why some non-ASCII characters get encoded on more than 4 bytes?

$
0
0

I've encountered a problem using MySQL on Docker. When I directly insert non-ASCII characters in the database using the initialization sql script, the characters are correctly shown on MySQL's console, but their encodings are wrong.

I coded a MySQL container with a minimal sql script to reproduce the problem.

Here's the structure of my directory:

.├── docker-compose.yml└── my-sql├── Dockerfile└── ddl└── mySQL.sql

docker-compose.yml

version: '3.8'services:  mysql:     build:       context: ./my-sql/       dockerfile: Dockerfile     container_name: mysql     expose:       - 3306     ports:       - 3306:3306     environment:       MYSQL_ROOT_PASSWORD: test       MYSQL_USER: test       MYSQL_PASSWORD: test       MYSQL_DATABASE: test     volumes:       - "mysql:/var/lib/mysql"volumes: mysql:

Dockerfile

FROM mysql:8.2.0USER 999:999COPY ./ddl /docker-entrypoint-initdb.d/

mySQL.sql

CREATE TABLE IF NOT EXISTS `test`(    `test` VARCHAR(100)) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;INSERT INTO `test` (`test`) VALUES ("🤮");INSERT INTO `test` (`test`) VALUES ("平");ALTER DATABASE test CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; # Does not work

When I use MySQL using the console and that I select the test column of the test table, I get this:

mysql> SELECT `test` FROM `test`;+------+| test |+------+| 🤮 || 平  |+------+
mysql> SELECT HEX(`test`) FROM `test`;+------------------+| HEX(`test`)      |+------------------+| C3B0C5B8C2A4C2AE || C3A5C2B9C2B3     |+------------------+

I did some research to find the correct encoding of these characters in various encodings and I didn't see these encodings, and as far as my knowledges go, maximum size for UTF-8 character is 4 bytes. I also noticed that the "|" alignment of what MySQL prints is wrong (and proportionally wronger to the size of the character hexadecimal encoding).

I looked at the hexadecimal encoding of the .sql script using VScode and, at least, the emoji is correctly encoded (F0 9F A4 AE).

I also tried another MySQL version (8.0.36), but it still doesn't work.

Thanks in advance


Viewing all articles
Browse latest Browse all 1060

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>