Channel: Active questions tagged utf-8 - Stack Overflow

Why the need to `decode_utf8` LDAP attribute values in a UTF-8 environment?

I wrote a Perl program in a UTF-8 environment (LC_CTYPE="en_US.UTF-8", Emacs displays "UUU") that reads a UTF-8 encoded file, inserts some attribute values read from an OpenLDAP server, and then writes the result into a UTF-8 encoded file.

It seems I had to use

    use utf8;
    use open IO => ':locale';
    use open ':std' => ':locale';
    use feature qw(unicode_strings);

in addition to binmode $fh, ":utf8"; before writing the output to $fh.

With these additions the string lengths inside Perl seem correct, and the output file, too.
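The effect of an output layer on string lengths can be checked in isolation. The following is a minimal sketch, not the program from the question: it writes to an in-memory buffer instead of a real file, and it uses the stricter `:encoding(UTF-8)` layer in place of `:utf8`. The variable names are made up for illustration.

```perl
use strict;
use warnings;

# "B\x{F6}ser" is "Böser"; the escape keeps this sketch independent of the
# encoding the source file is saved in.
my $text = "B\x{F6}ser Umlaut";

# Write through an encoding layer into an in-memory buffer (assumption:
# a scalar reference stands in for the real output file).
my $buffer = '';
open(my $fh, '>:encoding(UTF-8)', \$buffer) or die "open failed: $!";
print {$fh} $text;
close($fh) or die "close failed: $!";

# length() counts characters for $text and bytes for $buffer.
printf "characters: %d, bytes written: %d\n", length($text), length($buffer);
```

The character string has one element per character, while the buffer holds the encoded bytes, so the umlaut accounts for the one-byte difference.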

However, as I found out recently, attribute values from LDAP (the result of some $ldap->search()) that contain non-ASCII characters (like a German umlaut) are not correct. As they looked "double-encoded" to me, I added code to decode them like this (creating a hash reference for each entry, keyed by DN):

    $data{$entry->dn} = {
        map {
            $_ => [map { decode_utf8($_) } $entry->get_value($_)]
        } $entry->attributes(NLK_NO_OPTIONS => 1)
    };

While this works, it looks strange, and I wonder whether there is a more elegant solution (after all, I don't really understand what's going on internally).
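One way to make the decoding step less opaque is to move it into a small named helper. The sketch below is only an illustration under stated assumptions: `MockEntry` is a hypothetical stand-in for a Net::LDAP::Entry so that the code runs without a server, `decoded_attributes` is an invented name, and the strict `decode('UTF-8', ..., FB_CROAK)` call is swapped in for `decode_utf8` so that malformed input fails loudly instead of being silently repaired.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical stand-in for a Net::LDAP::Entry; only the three methods
# used below are mimicked.
package MockEntry {
    sub new        { my ($class, %args) = @_; return bless {%args}, $class }
    sub dn         { $_[0]{dn} }
    sub attributes { sort keys %{ $_[0]{attrs} } }
    sub get_value  { @{ $_[0]{attrs}{ $_[1] } } }
}

# Decode every value of every attribute from UTF-8 octets into
# Perl character strings; dies on octets that are not valid UTF-8.
sub decoded_attributes {
    my ($entry) = @_;
    my %attrs;
    for my $name ($entry->attributes) {
        $attrs{$name} = [
            map { decode('UTF-8', $_, FB_CROAK | LEAVE_SRC) }
                $entry->get_value($name)
        ];
    }
    return \%attrs;
}

# The umlaut is given as raw UTF-8 octets (C3 B6), as an undecoded value would be.
my $entry = MockEntry->new(
    dn    => 'uid=testuser,ou=people,dc=company,dc=org',
    attrs => {
        displayName => ["B\xC3\xB6ser Umlaut im Namen"],
        uid         => ['testuser'],
    },
);

my %data = ($entry->dn => decoded_attributes($entry));
printf "%d characters\n", length $data{ $entry->dn }{displayName}[0];
```

Strict decoding would die on genuinely binary attributes (for example a photo), so such attributes would need to be skipped. The Net::LDAP documentation also describes a `raw` constructor option that, as far as I understand it, makes the library return decoded strings for all attributes not matching a given regex; whether that fits this setup is an assumption that would need checking.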

For testing I had created a user like this:

    dn: uid=testuser,ou=people,dc=company,dc=org
    cn: User Test
    gidNumber: 54321
    givenName: User
    homeDirectory: /tmp/testuser
    loginShell: /bin/bash
    sn: Test
    uid: testuser
    uidNumber: 54321
    objectClass: top
    objectClass: posixAccount
    objectClass: inetOrgPerson
    displayName:: QsO2c2VyIFVtbGF1dCBpbSBOYW1lbg==

Without the fix the displayName would be displayed as "BÃ¶ser Umlaut im Namen", and after the fix it's displayed as "Böser Umlaut im Namen".
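The "double-encoded" symptom can be reproduced without an LDAP server by starting from the base64 displayName of the test entry. This is a sketch under one assumption: that the LDAP library hands back attribute values as undecoded UTF-8 octets, and that the output layer then encodes every string it is given. The explicit `encode` calls only simulate that output layer.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use MIME::Base64 qw(decode_base64);

# displayName from the test entry, as octets (assumed to be what
# the search result contains before any decoding).
my $octets = decode_base64('QsO2c2VyIFVtbGF1dCBpbSBOYW1lbg==');

# Decoding turns the UTF-8 octets into a Perl character string.
my $chars = decode('UTF-8', $octets);

# Simulate an output layer that encodes each string to UTF-8.
# Undecoded octets are treated as Latin-1 characters and encoded again.
my $out_raw     = encode('UTF-8', $octets);   # double-encoded
my $out_decoded = encode('UTF-8', $chars);    # round-trips to the original octets

printf "octets: %d, characters: %d\n", length($octets), length($chars);
printf "written without decoding: %d bytes, with decoding: %d bytes\n",
    length($out_raw), length($out_decoded);
```

Without decoding, the two octets of "ö" (C3 B6) are each encoded as a separate character, which yields four bytes that a UTF-8 viewer shows as "Ã¶". With decoding, the written bytes are identical to the octets stored in the directory.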

