[Kolab-devel] [SOLVED] Re: ASCII representation of unicode characters
Jeroen van Meeuwen (Kolab Systems)
vanmeeuwen at kolabsys.com
Mon Dec 5 15:20:59 CET 2011
On 2011-12-05 11:51, Mathieu Parent wrote:
> Hi,
>
> 2011/12/5 Jeroen van Meeuwen (Kolab Systems)
> <vanmeeuwen at kolabsys.com>:
>> On 2011-12-05 9:40, Aleksander Machniak wrote:
>>> On 05.12.2011 10:31, Jeroen van Meeuwen (Kolab Systems) wrote:
>>>
>>>> I have created a table of characters going from through to
>>>> Ѐ[2] (there's more[3]) and I am seeking a logical, codified
>>>> approach to "normalizing" as much of the unicode to ascii. I would
>>>> appreciate your help in outlining what the rules would need to
>>>> be(come).
>>>
>>> You could try using iconv with //TRANSLIT.
>>>
>>>
>>>
>>> http://stackoverflow.com/questions/4910627/php-iconv-translit-for-removing-accents-not-working-as-excepted
>>
>> I don't think this is satisfactory, iconv() outputs, given the
>> following code-snippet:
>>
>> $original = 'Ü';
>> $translated = iconv('UTF-8', 'ASCII//TRANSLIT', $original);
>> print "$original\t$translated\n";
>>
>> $ php unicode-to-ascii.php
>> Ü U
>> ü u
>> $
>>
>> We currently use 'ue' as the substitute for 'ü' however
>> (bruederli at kolabsys.com for Thomas, for example).
>>
>
> It seems that setting LC_ALL to the intended language does
> transliteration right:
> http://php.net/manual/en/function.iconv.php#105507
>
You're right, it does.

So it seems we want to set the locale / language depending on the
account being created or managed, so that the account details are
filled in the way the recipient user expects.

The example case is our own systems, which use en_US.UTF-8 but hold
many German / Swiss names.

I suppose we can use preferredLanguage in LDAP[1], whose valid values
are described in [2], and fall back to the active system language or
to a specifically configured preferred language, perhaps.

[1] http://tools.ietf.org/html/rfc2798#section-2.7
[2] http://tools.ietf.org/html/rfc2068#section-14.13
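
A minimal sketch of that fallback, in Python 3 and under a few
assumptions: preferredLanguage is taken to hold an Accept-Language
style value such as "de-CH, de;q=0.8"; the function name
locale_for_language and the "en_US" default are hypothetical; and
locale.normalize() is used only to check whether Python knows an alias
for the language, not whether the locale is installed.

```python
import locale


def locale_for_language(preferred_language, fallback="en_US"):
    """Map an Accept-Language style value to a normalized locale name.

    Hypothetical helper: returns the acceptable language with the
    highest q-value that locale.normalize() can expand to a name with a
    charset, or the normalized fallback if none qualifies.
    """
    candidates = []
    for position, item in enumerate((preferred_language or "").split(",")):
        parts = [part.strip() for part in item.split(";")]
        tag = parts[0]
        if not tag or tag == "*":
            continue

        # A missing q parameter means full preference (1.0).
        quality = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        if quality <= 0:
            continue

        # Sort by descending quality, then by original position.
        candidates.append((-quality, position, tag))

    for _, _, tag in sorted(candidates):
        # "de-CH" (language tag) becomes "de_CH" (locale name).
        normalized = locale.normalize(tag.replace("-", "_"))
        # normalize() returns unknown names unchanged, i.e. without a
        # ".charset" suffix, so a dot signals a known alias.
        if "." in normalized:
            return normalized

    return locale.normalize(fallback)


if __name__ == "__main__":
    print(locale_for_language("en;q=0.5, de-CH"))
    print(locale_for_language(None))
```

The returned name could then be handed to iconv through LANG. Whether
that locale is actually generated on the system still has to be
checked separately.
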
Here's how a simple routine for the transliteration itself would look
in Python (the language this part of the code is primarily implemented
in):

#!/usr/bin/python
# -*- coding: utf-8 -*-

# On the command line, this would look as follows:
#
#   $ echo "Brüderli" | env LANG=de_CH.ISO8859-1 \
#         iconv -f 'UTF-8' -t 'ASCII//TRANSLIT' -s

import locale
import subprocess

# normalize() expands 'de_CH' to a full locale name including a charset
(locale_name, locale_charset) = locale.normalize('de_CH').split('.')
locale.setlocale(locale.LC_ALL, (locale_name, locale_charset))

command = [ '/usr/bin/iconv',
        '-f', 'UTF-8',
        '-t', 'ASCII//TRANSLIT',
        '-s' ]

# iconv takes its transliteration rules from the locale in its
# environment, so LANG is set for the child process
process = subprocess.Popen(command, stdout=subprocess.PIPE,
        stdin=subprocess.PIPE, stderr=subprocess.PIPE,
        env={'LANG': locale.normalize('de_CH')})

# Feed the name to iconv on stdin
print >> process.stdin, "Brüderli\n"

# communicate() closes stdin and returns (stdout, stderr)
print process.communicate()[0].strip()

That'll settle it, I think.
Kind regards,
Jeroen van Meeuwen
--
Senior Engineer, Kolab Systems AG
e: vanmeeuwen at kolabsys.com
t: +44 144 340 9500
m: +44 74 2516 3817
w: http://www.kolabsys.com
pgp: 9342 BF08