Imposing a rational structure on an existing system that has evolved over thousands of years is almost certainly impossible. Nevertheless, the Unicode Consortium is engaged in just such a task: providing a "universal, efficient, uniform, and unambiguous" encoding of all the various 'alphabetic' and other printing characters from the world's major writing systems, whilst maintaining backwards compatibility with existing Standards, such as 7-bit ASCII and 8-bit ISO Latin-1. These goals cannot all be satisfied simultaneously, which makes the actual achievement of this work all the more remarkable.
Just think of the problems they have had to address: different alphabets sharing some but not all of their characters, accents and doubled letters changing collation order, left-to-right or right-to-left printing, Greek letters in the Greek alphabet and in mathematics, Latin letters in the Latin alphabets and also as Roman numerals, dingbats, special computer control codes, ...
The Standard contains pages and pages of characters, with the various Chinese and Japanese scripts dominating, naturally, and also including Latin, Greek, Cyrillic, Armenian, Hebrew, Arabic, Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Thai, Lao, Tibetan, Georgian, ..., and symbols. Of more interest to the general reader are some fascinating chapters on the problems faced, and the design solutions and compromises chosen, all lucidly explained. For example, the majority of accents are defined as 'combining characters', which follow a plain letter and combine with it to produce an accented letter (with an amazing aside on how to combine several combining characters...). But since the Latin-1 supplement (characters 0x80 to 0xFF), which contains explicit accented characters sufficient for most western European languages, is an existing commonly used standard, it gets included, too.
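The two ways of writing an accented letter can be seen side by side in a short sketch. It uses Python's standard `unicodedata` module and the NFC/NFD normalization forms, which were specified in later versions of the Standard than the 2.0 edition reviewed here, so treat it as an illustration of the idea rather than of anything in the book itself. The variable names are my own.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points, one letter on screen
combining = "e\u0301"
# U+00E9 LATIN SMALL LETTER E WITH ACUTE, the explicit character from the Latin-1 supplement
precomposed = "\u00e9"

# The two spellings look alike but are different code point sequences
print(len(combining), len(precomposed))
print(combining == precomposed)

# Normalization converts between the two spellings
composed = unicodedata.normalize("NFC", combining)      # compose where possible
decomposed = unicodedata.normalize("NFD", precomposed)  # split into base + marks
print(composed == precomposed)
print(decomposed == combining)

# Several combining characters can follow a single base letter
stacked = "a\u0308\u0304"  # a + combining diaeresis + combining macron
for ch in stacked:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Comparing strings without normalizing them first treats the two spellings as different, which is one practical cost of keeping the Latin-1 characters alongside the combining ones.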
I started looking at Unicode as part of the Z language Standardisation effort (unbelievably, not all of Z's character set is in the vastness of Unicode 2.0), and I thought that, like many Standards, it would be a fairly turgid read. But I became fascinated.
1999: Not everyone has my high opinion of the success of Unicode, however. I saw a recent email on the subject, which I paraphrase and summarise as:
Unicode unifies the 'Chinese characters' that occur in Chinese, Japanese and Korean. But this unification is too crude, and is quite unsatisfactory for the Japanese language.
The crudeness can be explained by analogy: unify A in the Latin alphabet with Greek capital alpha and the first letter of the Cyrillic alphabet, because they all look the same. And similarly unify any other letters of these three alphabets that resemble one another.