Channel: Active questions tagged utf-8 - Stack Overflow

Java/terminal issues when getting and printing out utf-8 encoded strings


I wanted to make a chatroom that supports UTF-8, but it failed for some specific UTF-8 characters. After a week of pure frustration, I narrowed it down to something being wrong with my user input handling. I asked ChatGPT and read countless forums, but I couldn't figure it out.

I'm on Windows and use an up-to-date VS Code. The terminal there uses UTF-8: I checked with chcp, which returns 65001, and the same goes for cmd, so I don't think it's a problem with the terminal. I also tried forcing Java's System.out to UTF-8, which didn't fix it:

System.setOut(new PrintStream(System.out, true, StandardCharsets.UTF_8));

I don't have a problem when I preset a String containing UTF-8 characters and print it out, for example:

String random = "háló";
System.out.println(random);

returns: háló

I've tried Scanner, BufferedReader, InputStreamReader, and converting to bytes:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        try {
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(System.in, StandardCharsets.UTF_8)
            );
            System.out.println("Enter some text (UTF-8 characters supported):");
            String userInput = reader.readLine();
            // Print the user input to verify
            System.out.println("You entered: " + userInput);
            byte[] bytes = userInput.getBytes(StandardCharsets.UTF_8);
            System.out.println(Arrays.toString(bytes));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

returns:

Enter some text (UTF-8 characters supported):
háló
You entered: hl
[104, 0, 108, 0]
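
One way to check whether the zeros are already present before any decoding happens is to dump the raw bytes that arrive on System.in, bypassing InputStreamReader entirely. This is only a diagnostic sketch; the class name RawStdinDump and the helper hexOfFirstLine are made up for illustration and are not part of the original program.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class RawStdinDump {
    // Reads bytes up to the first '\n' (or end of stream) and formats them as hex.
    // No charset is involved, so this shows exactly what the terminal delivered.
    static String hexOfFirstLine(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try {
            int b;
            while ((b = in.read()) != -1 && b != '\n') {
                if (b == '\r') {
                    continue; // skip the CR of a Windows line ending
                }
                sb.append(String.format("%02X ", b));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println("Type a line, then press Enter:");
        System.out.println(hexOfFirstLine(System.in));
    }
}
```

If typing háló gives 68 C3 A1 6C C3 B3, the terminal is handing over valid UTF-8 and the problem would be in a later step. If it gives 68 00 6C 00, the characters are lost before Java decodes anything.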

Note: I'm a beginner Java dev, coming from Python.

EDIT: New findings, plus things I left out earlier. I previously ran a Python chatroom that could handle UTF-8 characters and print them correctly. As I mentioned, in Java I can print them too if I preset them in a variable, but not when they come from user input.

I downloaded IntelliJ and tried more terminals, and found that it works correctly in IntelliJ and in bash (with Windows Subsystem for Linux):

IntelliJ:

Enter some text (UTF-8 characters supported):
háló
You entered: háló
[104, -61, -95, 108, -61, -77]

bash:

root@DESKTOP:/mnt/x/javaProjects/UTF 8 SUFFERING# java Main
Enter some text (UTF-8 characters supported):
háló
You entered: háló
[104, -61, -95, 108, -61, -77]
root@DESKTOP:/mnt/x/javaProjects/UTF 8 SUFFERING# java -version
openjdk version "17.0.11" 2024-04-16
OpenJDK Runtime Environment (build 17.0.11+9-Ubuntu-122.04.1)
OpenJDK 64-Bit Server VM (build 17.0.11+9-Ubuntu-122.04.1, mixed mode, sharing)

It is interesting that it prints a different byte array. I then tried cmd, PowerShell, Git Bash, and the VS Code terminal, and none of those work. I also tried different fonts, as suggested by SedJ601. That was interesting, because by default I used Consolas in cmd, which supports UTF-8 characters (so it recognised characters like "á" and displayed them correctly, but when Java returned them from user input it didn't work):

Enter some text (UTF-8 characters supported):
háló
You entered: h l
[104, 0, 108, 0]

But when I tried a different font, for example SimSun-ExtB, I got different byte arrays and results:

Enter some text (UTF-8 characters supported):
háló
You entered: h�l�
[104, -17, -65, -67, 108, -17, -65, -67]

So there is something wrong with how Java and my terminal interact.

I have a recent version of Java:
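
The three byte arrays above can be decoded to see what each one holds. Java bytes are signed, so -61, -95 is C3 A1 (the UTF-8 encoding of á) and -17, -65, -67 is EF BF BD (the UTF-8 encoding of U+FFFD, the replacement character). A small sketch follows; the class name DecodeBytes and the helper describe are illustrative only.

```java
import java.nio.charset.StandardCharsets;

class DecodeBytes {
    // Decodes the bytes as UTF-8 and lists the resulting code points as U+XXXX.
    static String describe(byte[] bytes) {
        String text = new String(bytes, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] working = {104, -61, -95, 108, -61, -77};            // IntelliJ and WSL bash
        byte[] zeros = {104, 0, 108, 0};                            // cmd with Consolas
        byte[] replaced = {104, -17, -65, -67, 108, -17, -65, -67}; // cmd with SimSun-ExtB

        System.out.println(describe(working));  // U+0068 U+00E1 U+006C U+00F3
        System.out.println(describe(zeros));    // U+0068 U+0000 U+006C U+0000
        System.out.println(describe(replaced)); // U+0068 U+FFFD U+006C U+FFFD
    }
}
```

Only the first array contains á and ó. In the other two, the accented letters have already become NUL or the replacement character by the time getBytes runs, which suggests the loss happens on the input side rather than when printing.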

X:\javaProjects\UTF 8 SUFFERING>java -version
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
openjdk version "17.0.11" 2024-04-16
OpenJDK Runtime Environment Temurin-17.0.11+9 (build 17.0.11+9)
OpenJDK 64-Bit Server VM Temurin-17.0.11+9 (build 17.0.11+9, mixed mode, sharing)

I tried setting environment variables, which didn't fix anything:

set JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8
set JAVA_OPTS=-Dfile.encoding=UTF-8

And Java is set to UTF-8, so I'm even more confused:

X:\javaProjects\UTF 8 SUFFERING>java -XshowSettings:properties -version
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Property settings:
    file.encoding = UTF-8
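
file.encoding is only one of several encoding-related settings, and the test program already passes UTF-8 explicitly to InputStreamReader, so it may not be the deciding factor here. A sketch that prints a few more values from inside the JVM could help compare a working terminal with a failing one. It assumes Java 17 or later (Console.charset() and the native.encoding property were added in 17). The sun.* properties are internal and may be null, and the class name EncodingReport is made up for illustration.

```java
import java.io.Console;
import java.nio.charset.Charset;

class EncodingReport {
    // Collects encoding-related settings as "name = value" lines.
    static String report() {
        StringBuilder sb = new StringBuilder();
        String[] keys = {"file.encoding", "native.encoding", "sun.stdout.encoding", "sun.jnu.encoding"};
        for (String key : keys) {
            // Internal or unset properties simply show up as null.
            sb.append(key).append(" = ").append(System.getProperty(key)).append('\n');
        }
        sb.append("defaultCharset = ").append(Charset.defaultCharset()).append('\n');

        // System.console() is null when stdin/stdout are redirected or no console is attached.
        Console console = System.console();
        sb.append("console charset = ")
          .append(console == null ? "no console attached" : console.charset().toString())
          .append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Running this in cmd and again in IntelliJ or WSL bash would show whether the JVM sees the two environments differently, or whether the difference lies outside Java.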

Maybe Java and my terminals use different lookup tables for special characters; I have no idea. I even changed the system locale to my country, and that didn't help either. I'm clueless, so any help would be appreciated!

