Channel: Active questions tagged utf-8 - Stack Overflow

Recursively reading files in directories and adding the file names to a ListView results in question marks �


After investigating, I found out that the most performant way to read the files in a directory and its sub-directories is the Files.walk() method.

    private void selectPrimaryFolder(ResourceBundle resourceBundle, ListView<String> lvFileList) {
        DirectoryChooser chooser = new DirectoryChooser();
        chooser.setTitle(resourceBundle.getString("selectPrimaryFolder"));
        File selectedDirectory = chooser.showDialog((Stage) btnSelectPrimaryFolder.getScene().getWindow());
        System.out.println("SELECTED DIRECTORY: " + selectedDirectory.toPath());
        // TODO
        try (Stream<Path> stream = Files.walk(selectedDirectory.toPath()).sorted()) {
            stream.forEach(path -> addFileToList(path.toFile(), lvFileList));
        } catch (IOException ioException) {
            ioException.getMessage();
        }
        /*System.out.println(listFileTree(selectedDirectory));
        for (File f: listFileTree(selectedDirectory)){
            lvFileList.getItems().add(f.getName());
        }*/
    }

    private static void addFileToList(File file, ListView<String> lvFileList) {
        if (file.isDirectory()) {
            System.out.println("Directory: " + file.getAbsolutePath());
        } else {
            System.out.println("File: " + file.getAbsolutePath());
            lvFileList.getItems().add(file.getName());
        }
    }

    public static Collection<File> listFileTree(File dir) {
        Set<File> fileTree = new HashSet<File>();
        if(dir==null||dir.listFiles()==null){
            return fileTree;
        }
        for (File entry : dir.listFiles()) {
            if (entry.isFile()) fileTree.add(entry);
            else fileTree.addAll(listFileTree(entry));
        }
        return fileTree;
    }
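For comparison, here is a minimal sketch of the same traversal written against Path only, without converting each entry to File. The class name WalkNames and the helper regularFileNames are hypothetical, and the sketch leaves out the JavaFX parts; the returned list could be passed to lvFileList.getItems().addAll(...).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the original project.
public class WalkNames {

    // Walks root recursively and returns the names of all regular files,
    // ordered by their full path.
    static List<String> regularFileNames(Path root) throws IOException {
        // Files.walk holds open directory handles, so close it with try-with-resources.
        try (Stream<Path> stream = Files.walk(root)) {
            return stream
                    .filter(Files::isRegularFile)
                    .sorted()
                    .map(path -> path.getFileName().toString())
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the directory to scan is passed as the first argument.
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        regularFileNames(root).forEach(System.out::println);
    }
}
```

This only removes the Path/File mixing; it does not by itself change how the JVM decodes the names it reads from the file system.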

Unfortunately, in the IntelliJ console, for any Cyrillic characters, it outputs ????, while in JavaFX's ListView, I get �����.

The listFileTree() method, which I found here (Recursively list files in Java), is another way I tried to get the list of files, but weirdly enough, it straight up ignores files that contain any Cyrillic characters.

According to a post here (both File.isFile() and File.isDirectory() is returning false), isFile() might be having an issue with the encoding.

The Encoding

From what I found, this might be an encoding issue.

I checked my IntelliJ's file encoding settings and they're all set to UTF-8 (it's also set to create UTF-8 files with no BOM), including the console's default encoding.

I've set both of these in the VMOptions:

    -Dconsole.encoding=UTF-8
    -Dfile.encoding=UTF-8
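To check what the running JVM actually picked up, a small report like the one below can be printed at startup. This is a sketch, not a definitive diagnostic: the class name EncodingReport is hypothetical, some of the listed properties exist only on newer JDKs, and the sun.* ones are internal and unsupported. As far as I understand OpenJDK on Linux, file names are decoded with the charset reported in sun.jnu.encoding, which is derived from the process locale rather than from -Dfile.encoding; treat that as an assumption to verify on your JDK.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of the original project.
public class EncodingReport {

    // Collects encoding-related settings; anything not set on this JVM maps to null.
    static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<>();
        report.put("defaultCharset", Charset.defaultCharset().name());
        String[] properties = {
            "file.encoding", "console.encoding", "native.encoding",
            "sun.jnu.encoding", "stdout.encoding", "sun.stdout.encoding"
        };
        for (String key : properties) {
            report.put(key, System.getProperty(key));
        }
        // Locale variables the JVM was started with (relevant on Linux).
        for (String key : new String[] {"LANG", "LC_ALL", "LC_CTYPE"}) {
            report.put("env " + key, System.getenv(key));
        }
        return report;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

If the locale variables are empty or set to C/POSIX when the application is launched from the IDE, that would be worth comparing against a launch from a terminal.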

Byte vs Character Streams

Something interesting I ran into is that if I use FileChooser and select a single file, the characters are properly displayed in the ListView (albeit in the console they still show as question marks).

As per the documentation, Files.walk() returns a stream of paths (Stream<Path>), which I'm assuming is read as a single-byte stream (I admit, I got a bit lost in the documentation), because according to Ted Hopp's comment in Byte Stream vs Character Stream in Java, it should be able to read the Cyrillic if it were a "Character Stream" (assuming the original file is encoded in UTF-8, of course).

...To test this, try a file that contains something that requires more than one byte to represent (such as Greek, Cyrillic, or Arabic characters). With a byte-oriented stream, these won't work. With a character-oriented stream, the characters will be preserved as long as both the streams are using encodings that supports those characters (such as UTF-8) and the input file was stored in the encoding used for the input stream...
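The quoted comment is about decoding file contents. Stream<Path> is a java.util.stream.Stream and is unrelated to the java.io byte and character streams. Still, the two symptoms seen here can be reproduced with plain String conversions, which may help tell them apart. A minimal sketch (the class name ReplacementDemo is hypothetical; the Cyrillic text is written with \u escapes so the result does not depend on how the source file is compiled):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original project.
public class ReplacementDemo {

    // "Колега", written with escapes to be independent of the source file encoding.
    static final String NAME = "\u041A\u043E\u043B\u0435\u0433\u0430";

    // Encoding into a charset that lacks the characters: each character becomes '?'.
    static String encodeLossy(String s) {
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    // Decoding UTF-8 bytes with the wrong charset: each undecodable byte becomes U+FFFD.
    static String decodeWrong(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(encodeLossy(NAME).length());  // 6: one '?' per character
        System.out.println(decodeWrong(NAME).length());  // 12: one U+FFFD per UTF-8 byte
    }
}
```

Question marks therefore point to a lossy encoding step (for example when writing to the console), while U+FFFD points to a decoding step that used a charset that could not interpret the bytes.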

Figuring out the original file encoding

I thought I'd just look for a way to detect the encoding and set up a condition, but after reading many, many, many answers, it turns out it's not something that can be done. The best option one has is to try and "guess" it. Which I did.

Using new String(bytes, charset) / String.getBytes() mentioned in Encoding conversion in java, I tested the most common encodings, as per this question (What is the most common encoding of each language?), on the names returned by Files.walk().
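One possible reason such a conversion cannot help: if the name has already been decoded into replacement characters by the time it reaches the String, the original bytes are gone and no pair of charsets can bring them back. The sketch below illustrates this under that assumption and contrasts it with a conversion that is still recoverable. The class name LossyRoundTrip and its helpers are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical demo class, not part of the original project.
public class LossyRoundTrip {

    // "Колега", written with escapes to be independent of the source file encoding.
    static final String NAME = "\u041A\u043E\u043B\u0435\u0433\u0430";

    static final List<Charset> CANDIDATES = List.of(
            StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1,
            StandardCharsets.US_ASCII, StandardCharsets.UTF_16);

    // UTF-8 bytes decoded as US-ASCII: every byte is replaced by U+FFFD (information lost).
    static String replaced(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.US_ASCII);
    }

    // UTF-8 bytes decoded as ISO-8859-1: garbled, but every byte is preserved.
    static String mojibake(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Tries every (encode, decode) pair and reports whether any restores the original.
    static boolean recoverable(String original, String damaged) {
        for (Charset from : CANDIDATES) {
            for (Charset to : CANDIDATES) {
                if (new String(damaged.getBytes(from), to).equals(original)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(recoverable(NAME, replaced(NAME)));  // false
        System.out.println(recoverable(NAME, mojibake(NAME)));  // true
    }
}
```

If this is what happens, the fix has to be applied where the names are first decoded, not afterwards on the resulting strings.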

NONE OF THEM WORKED! Every single one resulted in question marks, in the console and in the ListView.

I thought my files might be bugged but VLC doesn't seem to have any issues with encoding. All the text is presented as it should.

Question

How do I go about this? Is there a recursive character stream alternative to Files.walk()? Have I missed something else?

P.S. I have not tried any non-recursive solutions as recursion is a requirement.

P.S. 2. I've also tried the ones mentioned in io-recurse-tests, but alas, nothing.

EDIT 1 - Additional information

@Basil Bourque

Post an example of a file name with the problematic Cyrillic characters.

Cyrillic file name examples:

Ъпсурт - Колега.mp3

Kingsize - Оставамсебеси.mp3

@Basil Bourque

What is “VLC” you mentioned?

I was referring to VLC Media Player.

@Basil Bourque

What file system? What is the host operating system?

I'm using Fedora, which uses Btrfs: the b-tree filesystem with UTF-8 as default system encoding.

@g00se

Not quite sure why a) you're mixing Path with File (perhaps because of your positive results with FileChooser or something?)...

    ListView<String> lvPrimaryList = new ListView<>();
    Button btn = new Button("Select file");
    btn.setOnMouseClicked(new EventHandler<MouseEvent>() {
        @Override
        public void handle(MouseEvent mouseEvent) {
            FileChooser chooser = new FileChooser();
            chooser.setTitle(resourceBundle.getString("selectPrimaryFolder"));
            File selectedFile = chooser.showOpenDialog((Stage) btn.getScene().getWindow());
            System.out.println("SELECTED FILE: " + selectedFile.getAbsolutePath());
            lvPrimaryList.getItems().add(selectedFile.getAbsolutePath());
        }
    });

The above code successfully adds /home/user/music/Kingsize - Оставамсебеси.mp3 to the ListView.

But if I try to get a list of all of the files in a directory using the Files.walk() code mentioned above, it ends up as /home/user/music/Kingsize - �������������.mp3.

@g00se

...b) why you think recursion/non-recursion has any bearing on this...

I mentioned recursion in case there are solutions that work but only get a list of files in the current directory and I need to be able to get any possible files within any sub-directories.

@jewelsea

Create an App with a label that displays the hardcoded text of one of the problem file names: new Scene(new Label("problem text")). That is all it needs to do. Nothing else. Does the text display correctly?

Using Label does work.

    ListView<Label> lvPrimaryList = new ListView<>();
    Button btn = new Button("Select file");
    btn.setOnMouseClicked(new EventHandler<MouseEvent>() {
        @Override
        public void handle(MouseEvent mouseEvent) {
            lvPrimaryList.getItems().add(new Label("Ъпсурт - Колега.mp3"));
        }
    });

Hard coding the text via the following code also works:

    ListView<String> lvPrimaryList = new ListView<>();
    Button btn = new Button("Select file");
    btn.setOnMouseClicked(new EventHandler<MouseEvent>() {
        @Override
        public void handle(MouseEvent mouseEvent) {
            lvPrimaryList.getItems().add("Ъпсурт - Колега.mp3");
        }
    });

@g00se

new Scene(new Label("\u041E\u0434\u0438\u043D"));

    ListView<Label> lvPrimaryList = new ListView<>();
    Button btn = new Button("Select file");
    btn.setOnMouseClicked(new EventHandler<MouseEvent>() {
        @Override
        public void handle(MouseEvent mouseEvent) {
            lvPrimaryList.getItems().add(new Label("\u041E\u0434\u0438\u043D"));
        }
    });

This does work: I get Один.

@Basil Bourque

You may need to specify a font you know to have glyphs for your desired characters. Other Questions have pointed to JavaFX failing to run through fonts properly to find one with needed glyphs.

I'm using IBM Plex Sans font.

In my Main.java file, I've added it like so:

    Font.loadFont(Objects.requireNonNull(getClass().getResource("fonts/IBMPlexSans-Light.ttf")).toExternalForm(), 14);
    Font.loadFont(Objects.requireNonNull(getClass().getResource("fonts/IBMPlexSans-Medium.ttf")).toExternalForm(), 14);
    Font.loadFont(Objects.requireNonNull(getClass().getResource("fonts/IBMPlexSans-Regular.ttf")).toExternalForm(), 14);
    Font.loadFont(Objects.requireNonNull(getClass().getResource("fonts/IBMPlexSans-SemiBold.ttf")).toExternalForm(), 14);

In my style.css file, I've loaded it via:

    .root {
        -fx-font-family: "IBM Plex Sans";
    }

I reviewed the font glyphs and it does include Cyrillic, as mentioned in the specifications.

I did remove it, just in case, but I still get ����� in the ListView when using Files.walk().

EDIT 2 - Further testing

Using FileChooser and selecting multiple files produces the desired results in the ListView, but the code lacks recursion.

    ListView<String> lvPrimaryList = new ListView<>();
    Button btn = new Button("Select file");
    btn.setOnMouseClicked(new EventHandler<MouseEvent>() {
        @Override
        public void handle(MouseEvent mouseEvent) {
            FileChooser chooser = new FileChooser();
            chooser.setTitle("Select files");
            List<File> selectedFiles = chooser.showOpenMultipleDialog((Stage) btn.getScene().getWindow());
            for (File file : selectedFiles) {
                System.out.println("SELECTED FILE: " + file.getAbsolutePath());
                lvPrimaryList.getItems().add(file.getAbsolutePath());
            }
        }
    });

It's also a bit less user friendly when there are hundreds of files and the user would have to select all of them. It'd be much better to prompt the user to select a single directory and leave the program to do the "heavy lifting".

EDIT 3 - Minimal reproducible example

In the process of creating a full MRE, I ran into an issue where, after creating the files myself, their names were all: ?????? - ??????.mp3

Files with question marks in their names.
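To tell whether a displayed name contains literal question marks (U+003F) or replacement characters (U+FFFD), the code points of the string can be dumped before it is printed or added to the ListView. A minimal sketch; the class name CodePoints and the helper describe are hypothetical:

```java
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the original project.
public class CodePoints {

    // Renders each code point of s as U+XXXX, separated by spaces.
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Assumption: the name to inspect is passed as the first argument.
        String name = args.length > 0 ? args[0] : "?";
        System.out.println(describe(name));
    }
}
```

Calling describe(path.getFileName().toString()) inside the walk would show what the JVM actually holds, independently of the console and the font.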

Main.java

    public class Main extends Application {
        @Override
        public void start(Stage stage) throws IOException {
            FXMLLoader fxmlLoader = new FXMLLoader(Main.class.getResource("main.fxml"));
            Scene scene = new Scene(fxmlLoader.load(), 800, 600);
            scene.getStylesheets().add(getClass().getResource("style/style.css").toExternalForm());
            stage.setTitle("My Program");
            stage.setScene(scene);
            stage.show();
        }

        public static void main(String[] args) {
            launch();
        }
    }

MainController.java

    public class MainController implements Initializable {
        @FXML
        private ListView<String> lvPrimaryList;
        @FXML
        private Button btnSelectFile;

        @Override
        public void initialize(URL url, ResourceBundle resourceBundle) {
            // Select primary list
            btnSelectFile.setOnMouseClicked(new EventHandler<MouseEvent>() {
                @Override
                public void handle(MouseEvent mouseEvent) {
                    initTest();
                }
            });
        }

        private void initTest() {
            System.out.println(java.nio.charset.Charset.defaultCharset());
            // Edit these to match your operating system
            String osDelimiter = "/";
            String rootString = "/home/user/Documents/init-test" + osDelimiter;
            Path rootPath = Paths.get(rootString);
            try {
                Files.createDirectory(rootPath);
                File songOne = new File(rootString +"Ъпсурт - Колега.mp3");
                songOne.createNewFile();
                File songTwo = new File(rootString +"Kingsize - Оставамсебеси.mp3");
                songTwo.createNewFile();
                //System.setOut(new PrintStream(new FileOutputStream(rootString +"Ъпсурт - Колега.mp3"), true, StandardCharsets.UTF_8));
                //System.setOut(new PrintStream(new FileOutputStream(rootString +"Kingsize - Оставамсебеси.mp3"), true, StandardCharsets.UTF_8));
                //System.setOut(new PrintStream(new FileOutputStream(rootString +"Ъпсурт - Колега.mp3"), true, "Cp1252"));
                //System.setOut(new PrintStream(new FileOutputStream(rootString +"Kingsize - Оставамсебеси.mp3"), true, "Cp1252"));
            } catch (IOException exception) {
                exception.printStackTrace();
            }
        }
    }

init-test.fxml

    <?xml version="1.0" encoding="UTF-8"?>
    <?import javafx.geometry.Insets?>
    <?import javafx.scene.control.Button?>
    <?import javafx.scene.control.Label?>
    <?import javafx.scene.control.ListView?>
    <?import javafx.scene.layout.VBox?>
    <?import javafx.scene.text.Font?>
    <VBox alignment="CENTER" maxHeight="-Infinity" maxWidth="-Infinity" minHeight="-Infinity" minWidth="-Infinity" spacing="5.0" xmlns="http://javafx.com/javafx/21" xmlns:fx="http://javafx.com/fxml/1" fx:controller="com.project.test.MainController">
        <children>
            <Label alignment="CENTER" maxWidth="1.7976931348623157E308" text="Init Test">
                <font>
                    <Font name="System Bold" size="13.0" />
                </font>
            </Label>
            <ListView fx:id="lvPrimaryList" />
            <Button fx:id="btnSelectFile" mnemonicParsing="false" text="Select file" />
        </children>
        <padding>
            <Insets bottom="5.0" left="5.0" right="5.0" top="5.0" />
        </padding>
    </VBox>

I also ran into this post (How to support Cyrillic alphabet in Eclipse?).

It explains that the symbol has nothing to do with the encoding; it appears when the font lacks the glyphs needed for Cyrillic, as mentioned by Basil Bourque. That is weird, because the experiments above succeeded when I selected the files manually.

I'll keep updating as I keep investigating.

