I wrote a Perl program in a UTF-8 environment (LC_CTYPE="en_US.UTF-8", Emacs displays "UUU") that reads a UTF-8 encoded file, inserts some attribute values read from an OpenLDAP server, and then writes the result to a UTF-8 encoded file.
It seems I had to use

    use utf8;
    use open IO => ':locale';
    use open ':std' => ':locale';
    use feature qw(unicode_strings);

in addition to binmode $fh, ":utf8"; before writing the output to $fh.
With these additions the string lengths inside Perl seem correct, and the output file, too.
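To make the effect of the I/O layers concrete, here is a minimal sketch that uses in-memory filehandles instead of real files. It is an illustration only: the sample string and the explicit `:encoding(UTF-8)` layer are assumptions chosen for the demonstration, not the `:locale` and `:utf8` layers used in the program above.

```perl
use strict;
use warnings;

# UTF-8 octets for "B", o-umlaut, "s", "e", written with escapes so that
# this source file stays pure ASCII (hypothetical sample data).
my $bytes = "B\xC3\xB6se\n";

# Read without a decoding layer: every octet counts as one character.
open my $raw, '<:raw', \$bytes or die "open raw: $!";
my $undecoded = <$raw>;
chomp $undecoded;
close $raw;

# Read with a decoding layer: the two octets of the umlaut become one character.
open my $in, '<:encoding(UTF-8)', \$bytes or die "open decoded: $!";
my $decoded = <$in>;
chomp $decoded;
close $in;

# Write the decoded string through an encoding layer: the original octets come back.
my $out_bytes = '';
open my $out, '>:encoding(UTF-8)', \$out_bytes or die "open out: $!";
print {$out} "$decoded\n";
close $out;

printf "undecoded length: %d\n", length $undecoded;
printf "decoded length:   %d\n", length $decoded;
printf "round trip ok:    %s\n", $out_bytes eq $bytes ? 'yes' : 'no';
```

This would explain why string lengths look right only once the input has been decoded, and why the output layer expects character strings rather than already-encoded octets.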
However, as I found out recently, attribute values from LDAP (the result of some $ldap->search()) that contain non-ASCII characters (like a German umlaut) are not correct. As they looked "double-encoded" to me, I added decoding like this (to create a hash reference for each entry, keyed by DN):

    $data{$entry->dn} = {
        map { $_ => [map { decode_utf8($_) } $entry->get_value($_)] }
            $entry->attributes(NLK_NO_OPTIONS => 1)
    };

While this works, it looks strange, and I wonder whether there is a more elegant solution (after all, I don't really understand what's going on internally).
For testing I had created a user like this:

    dn: uid=testuser,ou=people,dc=company,dc=org
    cn: User Test
    gidNumber: 54321
    givenName: User
    homeDirectory: /tmp/testuser
    loginShell: /bin/bash
    sn: Test
    uid: testuser
    uidNumber: 54321
    objectClass: top
    objectClass: posixAccount
    objectClass: inetOrgPerson
    displayName:: QsO2c2VyIFVtbGF1dCBpbSBOYW1lbg==

Without the fix, the displayName would be displayed as "BÃ¶ser Umlaut im Namen", and after the fix it's displayed as "Böser Umlaut im Namen".
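The symptom can be reproduced without an LDAP server. The sketch below assumes that the search result hands back the attribute value as undecoded UTF-8 octets (which is what the symptoms suggest) and uses `encode_utf8` to stand in for what a `:utf8` output layer does on write. The base64 string is the displayName value from the test entry; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);
use MIME::Base64 qw(decode_base64);

# Raw octets of the displayName value, as stored in the directory.
my $octets = decode_base64('QsO2c2VyIFVtbGF1dCBpbSBOYW1lbg==');

# Decoding turns the UTF-8 octets into a Perl character string.
my $chars = decode_utf8($octets);

# Stand-in for the output layer: encode whatever string is written.
my $double = encode_utf8($octets);   # undecoded octets get encoded a second time
my $single = encode_utf8($chars);    # decoded characters get encoded exactly once

printf "octets: %d, characters: %d\n", length $octets, length $chars;
printf "written without decode: %d bytes\n", length $double;
printf "written with decode:    %d bytes\n", length $single;
```

Without the decode step, each of the two octets of the umlaut is treated as a separate character and encoded again, which would produce the "Ã¶" sequence seen in the output.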