[Kolab-devel] ASCII representation of unicode characters

Jeroen van Meeuwen (Kolab Systems) vanmeeuwen at kolabsys.com
Mon Dec 5 10:31:00 CET 2011


Hello,

I'm hoping you can help me solve the following problem, or get closer 
to a solution.

When, for example, a recipient policy^1 translates a givenName and sn 
(surname) into an email address, the two names can contain characters 
from virtually any script, but the email address must be in ASCII.

^1: A recipient policy composes other attributes for an entity from the 
entity's name data, given a template. For example, with the template 
'%(givenname)s.%(sn)s@%(domain)s', "Jeroen" (givenName) and 
"van Meeuwen" (surname) become the email address 
'jeroen.vanmeeuwen at kolabsys.com'.
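As a minimal sketch of the template expansion described in the footnote, 
assuming plain Python '%' formatting and assuming that values are 
lowercased and stripped of spaces first (the function name 
compose_address and the attribute dictionary are hypothetical, not taken 
from any Kolab code):

```python
def compose_address(template, attrs):
    """Expand a recipient-policy template with an entity's attributes.

    Assumption: values are lowercased and spaces are removed before
    substitution, which matches the 'van Meeuwen' -> 'vanmeeuwen' example.
    """
    cleaned = {key: value.lower().replace(" ", "") for key, value in attrs.items()}
    return template % cleaned


address = compose_address(
    "%(givenname)s.%(sn)s@%(domain)s",
    {"givenname": "Jeroen", "sn": "van Meeuwen", "domain": "kolabsys.com"},
)
print(address)
```

This only works as-is while the attribute values are already ASCII, which 
is exactly the limitation the rest of this message is about.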

Examples are names with accents (grave, acute, circumflex), German 
names (umlauts, ß[1]), and of course names written in entirely non-Latin 
alphabets such as Cyrillic and Greek.

My problem is translating these characters from the input value into 
the output value. I only speak/understand a limited number of languages, 
but from what I understand, non-ASCII characters are mostly translated 
into a 'phonetically equivalent representation'. 'Ü' usually becomes 
'Ue', for example. For some characters, or some occurrences of them, 
however, I think it's not safe to translate them mechanically. 'ß', for 
example, can I believe become 'ss' or 'sz', depending on a couple of 
rules that humans understand but that are hard to codify.
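One possible shape for such a translation, sketched here under stated 
assumptions rather than as a finished rule set: an explicit override 
table for characters with a conventional spelling, with Unicode 
decomposition (NFKD) as a fallback that strips accents. The names 
OVERRIDES and to_ascii are hypothetical, the table is deliberately tiny, 
and mapping 'ß' to 'ss' is an assumption that ignores the 'sz' case.

```python
import unicodedata

# Hand-written rules take priority. Assumption: German conventions apply,
# and 'ß' always becomes 'ss'.
OVERRIDES = {
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ä": "ae", "ö": "oe", "ü": "ue",
    "ß": "ss",
}


def to_ascii(text, overrides=OVERRIDES):
    """Return an ASCII approximation of text.

    Characters with neither an override nor an ASCII base letter
    (for example Cyrillic or Greek) are silently dropped.
    """
    # Compose first so that 'u' + combining diaeresis matches the 'ü' override.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        if ch in overrides:
            out.append(overrides[ch])
            continue
        # Split into base letter plus combining marks, then drop the marks.
        decomposed = unicodedata.normalize("NFKD", ch)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        out.append(base.encode("ascii", "ignore").decode("ascii"))
    return "".join(out)


print(to_ascii("Müller"), to_ascii("Straße"), to_ascii("René"))
```

Note that the fallback does nothing useful for non-Latin alphabets; those 
would need per-script transliteration tables, and the override table 
would arguably need to vary by language, since 'ü' is not 'ue' everywhere.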

I have created a table of characters going from  through to 
Ѐ[2] (there's more[3]) and I am seeking a logical, codified 
approach to "normalizing" as much of Unicode as possible to ASCII. I 
would appreciate your help in outlining what the rules would need to be.
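To get a feel for how much of such a table could be covered 
automatically, a rough sketch like the following splits a range of code 
points into those that reduce to ASCII by decomposition alone and those 
that would need a hand-written rule. The function name classify and the 
range used in the example (U+00C0 through U+0400) are assumptions chosen 
for illustration.

```python
import unicodedata


def classify(start, end):
    """Split code points in [start, end] into two groups.

    automatic: characters whose NFKD base form is pure ASCII
    manual:    characters that need a hand-written rule
    """
    automatic, manual = {}, []
    for cp in range(start, end + 1):
        ch = chr(cp)
        # Skip control, format and unassigned code points (C*) and
        # combining marks (M*), which are not letters to transliterate.
        if unicodedata.category(ch)[0] in "CM":
            continue
        decomposed = unicodedata.normalize("NFKD", ch)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        if base and base.isascii():
            automatic[ch] = base
        else:
            manual.append(ch)
    return automatic, manual


automatic, manual = classify(0x00C0, 0x0400)
print(len(automatic), "automatic;", len(manual), "need a rule")
```

Characters such as 'ß', 'Ø' and 'Æ' have no decomposition and end up in 
the manual list, as does everything Greek and Cyrillic. The automatic 
group only gives accent-stripped forms ('Ü' becomes 'U'), so language 
conventions like 'Ue' would still have to be layered on top.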

Thanks in advance!

[1] http://en.wikipedia.org/wiki/%C3%9F
[2] http://hosted.kolabsys.com/~vanmeeuwen/unicodechars.htm
[3] http://ascii-table.com/unicode.php (17 times 65536)

Kind regards,

Jeroen van Meeuwen

-- 
Senior Engineer, Kolab Systems AG

e: vanmeeuwen at kolabsys.com
t: +44 144 340 9500
m: +44 74 2516 3817
w: http://www.kolabsys.com

pgp: 9342 BF08



