Channel: Active questions tagged utf-8 - Stack Overflow

Java UTF-8 filenames with IBM JVM (AIX)

I'm having trouble understanding the way the IBM JVM's implementation of java.io.File deals with UTF-8 on AIX on the JFS2 filesystem. I suspect there's a system property that I'm overlooking, but I have not yet been able to find it.

Let's assume I have a file named othér (where é is U+00E9, or UTF-8 bytes 0xc3 0xa9). The filename is encoded in UTF-8, and was created by a C program:

char filename[] = { 'o', 't', 'h', 0xc3, 0xa9, 'r', 0 };
open(filename, O_RDWR|O_CREAT, 0666);

If I create a Unicode string in Java that represents the filename, java.io.File fails to find the file. Further, if I use File.listFiles() in Java, it insists on treating the name as a Latin1 string. For example:

String expectedName = new String(new char[] { 'o', 't', 'h', 0xe9, 'r' });
File expected = new File(expectedName);

if (expected.exists())
    System.out.println(expectedName + " exists");
else
    System.out.println(expectedName + " DOES NOT exist");

for (File child : new File(".").listFiles())
{
    System.out.println(child.getName());
    System.out.print("Chars:");
    for (char c : child.getName().toCharArray())
        System.out.print(" 0x" + Integer.toHexString((int)c));
    System.out.println();
}

The results of this program are:

% java -Dfile.encoding=UTF8 FileTest
othér DOES NOT exist
othér
Chars: 0x6f 0x74 0x68 0xc3 0xa9 0x72

So it appears that my filenames are getting treated as Latin1. I've tried setting the file.encoding system property to UTF8 and the client.encoding.override system property to UTF-8 to no avail. My LANG and LC_ALL settings are en_US.UTF-8:

% echo $LANG
en_US.UTF-8
% echo $LC_ALL
en_US.UTF-8
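
As a cross-check of the Latin1 hypothesis, here is a minimal standalone sketch that never touches the filesystem. It decodes the six bytes the C program wrote as ISO-8859-1 and prints the resulting char values, which match the output above. Re-encoding that string as ISO-8859-1 and decoding it as UTF-8 gives back the five-character name I expected. The class and method names (MojibakeDemo, decodeAsLatin1, repair, hex) are made up for illustration. This only demonstrates the decoding mismatch; it does not make File.exists() find the file on J9.

```java
import java.io.UnsupportedEncodingException;

public class MojibakeDemo {
    // Decode raw filename bytes as ISO-8859-1: every byte becomes exactly one char.
    static String decodeAsLatin1(byte[] raw) throws UnsupportedEncodingException {
        return new String(raw, "ISO-8859-1");
    }

    // Undo the mis-decoding: recover the original bytes, then decode them as UTF-8.
    static String repair(String misdecoded) throws UnsupportedEncodingException {
        return new String(misdecoded.getBytes("ISO-8859-1"), "UTF-8");
    }

    // Format each char as " 0x..", the same way the test program above does.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray())
            sb.append(" 0x").append(Integer.toHexString((int) c));
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The bytes written by the C program, without the trailing NUL.
        byte[] onDisk = { 'o', 't', 'h', (byte) 0xc3, (byte) 0xa9, 'r' };

        String seen = decodeAsLatin1(onDisk);
        System.out.println("Chars:" + hex(seen));         // six chars, as listFiles() reports

        String repaired = repair(seen);
        System.out.println("Chars:" + hex(repaired));     // five chars, with 0xe9 for é
    }
}
```

The charset-name overloads are used rather than java.nio.charset.StandardCharsets so that the sketch also compiles on the 1.5 runtime described below.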

My system's "Primary Language Environment", as configured by SMIT, is "ISO8859-1". I don't really know the full impact this setting has, but I cannot change it. I suspect that if I could change this to "UTF8 English" then that may fix the problem, but since JFS2 stores filenames in Unicode and Java operates in Unicode internally, I feel like there should be a more general solution to the problem.

Is there another J9 system property that I can set to force it to use UTF-8 filenames regardless of my SMIT setting?

AIX version is 5.2, Java version is IBM J9 (1.5.0), filesystem is JFS2:

rs6000% uname -a
AIX rs6000 2 5 000A9B7C4C00
rs6000% java -version
java version "1.5.0"
Java(TM) 2 Runtime Environment, Standard Edition (build pap32dev-20091106a (SR11 ))
IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 AIX ppc-32 j9vmap3223-20091104 (JIT enabled)
J9VM - 20091103_45935_bHdSMr
JIT  - 20091016_1845_r8
GC   - 20091026_AA)
JCL  - 20091106
rs6000% mount|grep /home
         /dev/hd1         /home            jfs2   Jun 27 16:02 rw,log=/dev/hd8

Update: this still occurs on Java 6:

% java -version
java version "1.6.0"
Java(TM) SE Runtime Environment (build pap3260sr11-20120806_01(SR11))
IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 AIX ppc-32 jvmap3260sr11-20120801_118201 (JIT enabled, AOT enabled)
J9VM - 20120801_118201
JIT  - r9_20120608_24176ifx1
GC   - 20120516_AA)
JCL  - 20120713_01
