[Kolab-devel] ASCII representation of unicode characters

Mathieu Parent math.parent at gmail.com
Mon Dec 5 12:51:38 CET 2011


Hi,

2011/12/5 Jeroen van Meeuwen (Kolab Systems) <vanmeeuwen at kolabsys.com>:
> On 2011-12-05 9:40, Aleksander Machniak wrote:
>> On 05.12.2011 10:31, Jeroen van Meeuwen (Kolab Systems) wrote:
>>
>>> I have created a table of characters going from  through to
>>> Ѐ[2] (there's more[3]) and I am seeking a logical, codified
>>> approach to "normalizing" as much of the unicode to ascii. I would
>>> appreciate your help in outlining what the rules would need to
>>> be(come).
>>
>> You could try using iconv with //TRANSLIT.
>>
>>
>> http://stackoverflow.com/questions/4910627/php-iconv-translit-for-removing-accents-not-working-as-excepted
>
> I don't think this is satisfactory. Given the following code
> snippet, iconv() produces the output shown below it:
>
>     $original = 'Ü';
>     $translated = iconv('UTF-8', 'ASCII//TRANSLIT', $original);
>     print "$original\t$translated\n";
>
> $ php unicode-to-ascii.php
> Ü       U
> ü       u
> $
>
> However, we currently use 'ue' as the substitute for 'ü'
> (bruederli at kolabsys.com for Thomas, for example).
>

It seems that setting LC_ALL to the locale of the intended language
makes iconv() transliterate the way you want. This example comes from
a comment in the PHP manual:
http://php.net/manual/en/function.iconv.php#105507

<?php
//some German
$utf8_sentence = 'Weiß, Goldmann, Göbel, Weiss, Göthe, Goethe und Götz';

//UK
setlocale(LC_ALL, 'en_GB');

//transliterate
$trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);

//gives [Weiss, Goldmann, Gobel, Weiss, Gothe, Goethe und Gotz]
//which is our original string flattened into 7-bit ASCII as
//an English speaker would do it (i.e. simply remove the umlauts)
echo $trans_sentence . PHP_EOL;

//Germany
setlocale(LC_ALL, 'de_DE');

$trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);

//gives [Weiss, Goldmann, Goebel, Weiss, Goethe, Goethe und Goetz]
//which is exactly how a German would transliterate those
//umlauted characters if forced to use 7-bit ASCII!
//(because really ä = ae, ö = oe and ü = ue)
echo $trans_sentence . PHP_EOL;

---------------
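
One caveat: this only works if the requested locale is actually
installed on the server, since setlocale() returns false when it is
not. Below is a minimal sketch of a guard with an explicit fallback
table for German. The helper name ascii_de() and the sample input are
made up for illustration; they are not part of Kolab or of the manual
comment above, and the iconv() result still depends on the platform's
iconv implementation.

```php
<?php
// Hypothetical helper (illustration only): locale-independent
// fallback for German, using an explicit replacement table.
function ascii_de($text)
{
    $map = array(
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
        'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
        'ß' => 'ss',
    );
    // strtr() with an array replaces whole byte sequences, so the
    // multi-byte UTF-8 keys are matched as complete characters.
    return strtr($text, $map);
}

// setlocale() accepts several candidates and returns false if none
// of them is available on this system.
$locale = setlocale(LC_CTYPE, 'de_DE.UTF-8', 'de_DE');
$input  = 'Weiß, Göbel, Götz, Brüderli';

// Use iconv() when a German locale could be set, else the table.
$ascii = ($locale !== false)
    ? iconv('UTF-8', 'ASCII//TRANSLIT', $input)
    : ascii_de($input);

echo $ascii . PHP_EOL;
```

The table only covers the German case, so other languages would need
their own rules, but it gives the same result on every machine.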

Regards
-- 
Mathieu Parent



