Channel: Active questions tagged utf-8 - Stack Overflow

Java/terminal issues when getting and printing out utf-8 encoded strings


I wanted to make a chatroom that supports UTF-8, but it failed for some specific UTF-8 characters. After a week of pure frustration I narrowed it down to something being wrong with my user input handling. I asked ChatGPT and read countless forum threads, but I couldn't figure it out.

I'm on Windows and use an up-to-date VS Code. Its terminal uses UTF-8 (I checked with chcp, which returns 65001), and the same goes for cmd, so I don't think the terminal is the problem. I also tried forcing Java's System.out to UTF-8, which didn't fix it:

System.setOut(new PrintStream(System.out, true, StandardCharsets.UTF_8));

I don't have a problem when I hard-code a String containing non-ASCII characters and print it, for example:

String random = "háló";
System.out.println(random);

prints: háló

I've tried Scanner, BufferedReader, InputStreamReader, and converting to bytes:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        try {
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(System.in, StandardCharsets.UTF_8)
            );
            System.out.println("Enter some text (UTF-8 characters supported):");
            String userInput = reader.readLine();
            // Print the user input to verify
            System.out.println("You entered: " + userInput);
            byte[] bytes = userInput.getBytes(StandardCharsets.UTF_8);
            System.out.println(Arrays.toString(bytes));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

returns:

Enter some text (UTF-8 characters supported):
háló
You entered: hl
[104, 0, 108, 0]
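One way to narrow this down is to look at the bytes that arrive on System.in before any charset decoding happens. The following is a minimal diagnostic sketch, not a fix; the class name RawStdinDump and its helper methods are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class RawStdinDump {
    // Reads raw bytes up to (not including) the first '\n' or end of stream.
    // No Reader or charset is involved, so nothing is decoded here.
    static byte[] readRawLine(InputStream in) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            int b;
            while ((b = in.read()) != -1 && b != '\n') {
                if (b != '\r') {
                    buf.write(b); // drop the CR of a Windows CRLF line ending
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Formats bytes as two-digit uppercase hex values separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte x : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", x & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("Type a word with accented letters and press Enter:");
        System.out.println(toHex(readRawLine(System.in)));
    }
}
```

For the input háló, valid UTF-8 would show up as 68 C3 A1 6C C3 B3. If the dump shows 68 00 6C 00 instead, the zeros were already present in the byte stream handed to the JVM, and the InputStreamReader decoding step is not what introduces them.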

Note: I'm a beginner Java dev, coming from Python.

EDIT: New findings, and things I previously left out: I was able to run a Python chatroom before that handled UTF-8 characters and printed them correctly. As I mentioned, in Java I can print them too if I hard-code them in a variable, but not when I get them from user input.

I downloaded IntelliJ and tried out more terminals, and found that it works correctly in IntelliJ and in bash (with Windows Subsystem for Linux):

IntelliJ:

Enter some text (UTF-8 characters supported):
háló
You entered: háló
[104, -61, -95, 108, -61, -77]

bash:

root@DESKTOP:/mnt/x/javaProjects/UTF 8 SUFFERING# java Main
Enter some text (UTF-8 characters supported):
háló
You entered: háló
[104, -61, -95, 108, -61, -77]
root@DESKTOP:/mnt/x/javaProjects/UTF 8 SUFFERING# java -version
openjdk version "17.0.11" 2024-04-16
OpenJDK Runtime Environment (build 17.0.11+9-Ubuntu-122.04.1)
OpenJDK 64-Bit Server VM (build 17.0.11+9-Ubuntu-122.04.1, mixed mode, sharing)
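For reference, the array printed in IntelliJ and bash is what the UTF-8 encoding of háló looks like when shown as signed Java bytes: á (U+00E1) becomes C3 A1, which is -61, -95, and ó (U+00F3) becomes C3 B3, which is -61, -77. A small sketch to confirm that independently of any terminal input (the class name is hypothetical, and the string uses Unicode escapes so the source file encoding cannot interfere):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ExpectedUtf8Bytes {
    // Encodes a string as UTF-8 and returns the raw (signed) bytes.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String word = "h\u00e1l\u00f3"; // "háló" written with escapes
        System.out.println(Arrays.toString(utf8(word)));
        // prints [104, -61, -95, 108, -61, -77]
    }
}
```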

It is interesting that it prints a different byte array. I then tried cmd, PowerShell, Git Bash, and the VS Code terminal, and none of those work. I also tried different fonts, as suggested by SedJ601. That was interesting too: by default I used Consolas in cmd, which supports these characters (so it recognised characters like "á" and displayed them correctly, but when Java returned them from user input it didn't work):

Enter some text (UTF-8 characters supported):
háló
You entered: h l
[104, 0, 108, 0]

But when I tried a different font, for example SimSun-ExtB, I got different results and byte arrays:

Enter some text (UTF-8 characters supported):
háló
You entered: h�l�
[104, -17, -65, -67, 108, -17, -65, -67]
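The triple -17, -65, -67 (EF BF BD in hex) is the UTF-8 encoding of U+FFFD, the replacement character that Java's decoder substitutes for bytes that are not valid UTF-8. So in this run the original characters were already lost at decode time. The sketch below reproduces the same array under one assumed scenario: that the input arrived as single-byte code page values (0xE1 and 0xF3 are á and ó in windows-1250 and windows-1252). Whether that is what the terminal actually sent is an assumption, not something the output proves; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ReplacementCharDemo {
    // Decodes the bytes as UTF-8 (malformed input becomes U+FFFD),
    // then encodes the resulting String back to UTF-8.
    static byte[] decodeThenEncodeUtf8(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed input: 'h', 0xE1, 'l', 0xF3 (single-byte code page bytes, not UTF-8)
        byte[] raw = {0x68, (byte) 0xE1, 0x6C, (byte) 0xF3};
        System.out.println(Arrays.toString(decodeThenEncodeUtf8(raw)));
        // prints [104, -17, -65, -67, 108, -17, -65, -67]

        // U+FFFD on its own encodes to the same three bytes
        System.out.println(Arrays.toString("\uFFFD".getBytes(StandardCharsets.UTF_8)));
        // prints [-17, -65, -67]
    }
}
```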

So there is something wrong with how Java and my terminal interact.

I have a recent version of Java:

X:\javaProjects\UTF 8 SUFFERING>java -version
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
openjdk version "17.0.11" 2024-04-16
OpenJDK Runtime Environment Temurin-17.0.11+9 (build 17.0.11+9)
OpenJDK 64-Bit Server VM Temurin-17.0.11+9 (build 17.0.11+9, mixed mode, sharing)

I tried setting environment variables, which didn't fix anything:

set JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8
set JAVA_OPTS=-Dfile.encoding=UTF-8

And Java is set to UTF-8, so I'm even more confused:

X:\javaProjects\UTF 8 SUFFERING>java -XshowSettings:properties -version
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Property settings:
    file.encoding = UTF-8
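Note that file.encoding only sets the JVM's default charset. The test program already passes StandardCharsets.UTF_8 explicitly to InputStreamReader, so changing the default would not be expected to alter how that reader decodes. To see what else the JVM reports about its environment, a diagnostic along these lines may help. It is a sketch only: the class name is hypothetical, the properties other than file.encoding are JDK-specific and may be absent (shown as null), and Console.charset() requires Java 17 or later.

```java
import java.io.Console;
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

class EncodingReport {
    // Collects encoding-related settings the running JVM reports.
    // Properties that are not set show up as the string "null".
    static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<>();
        String[] keys = {
            "file.encoding", "native.encoding", "sun.jnu.encoding",
            "sun.stdout.encoding", "sun.stderr.encoding"
        };
        for (String key : keys) {
            report.put(key, String.valueOf(System.getProperty(key)));
        }
        report.put("Charset.defaultCharset()", Charset.defaultCharset().name());
        // System.console() is null when stdin/stdout are redirected or in some IDEs
        Console console = System.console();
        report.put("Console.charset()",
                console == null ? "no console attached" : console.charset().name());
        return report;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Comparing this report between a terminal where input works (IntelliJ, WSL bash) and one where it does not (cmd, PowerShell) shows whether the JVM sees any difference between the two environments.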

Maybe Java and my terminals use different lookup tables for special characters, or something else I have no idea about. I even changed the system locale to my country, and that didn't help either. I'm clueless; any help would be appreciated!

