[Web4lib] What is the microkey?

Lars Aronsson lars at aronsson.se
Thu Jun 23 04:52:46 EDT 2005


Bob Rasmussen wrote:
> 1. From the charmap utility, select it and copy it to the 
> clipboard. Then in whatever program, Paste it in.

Most operating systems and web browsers should also be able to 
copy-and-paste from 
http://en.wikipedia.org/wiki/Table_of_Unicode_characters%2C_128_to_999
or
http://unicode.coeurlumiere.com/

If you forget these links, just google for "unicode table".

Fifteen years ago everybody around me was using 7 bit ASCII (or 
ISO 646) codes, which allowed for 95 printable characters 
(counting the space), i.e. the English alphabet A-Z, lower case a-z, 
digits 0-9 and the usual !"#$%&/()[]{}+-*=.  Each European language 
had its own national version of ISO 646, where some characters from 
U.S. ASCII were replaced with the necessary national letters.  
Britain only needed to substitute # with £.  For Sweden it meant 
an Ä (A-umlaut) typed on a Swedish keyboard would show up as a [ 
on a U.S. screen, and any Pascal or C program would look very 
strange on a Swedish printer, since a[4] would come out as aÄ4Å.  
On a Danish screen or printer the [ would instead show up as an Æ 
(AE ligature), which is fine, since Danes have little use for Ä 
anyway.  Since both the Swedish Ä and Danish Æ sound somewhat like 
an E and the [ looks somewhat like an E, most programmers would 
learn to read text like R[KSM\RG]S just like RÄKSMÖRGÅS.  There is 
a lot of programmer folklore about this.  For example, RÄKSMÖRGÅS 
(shrimp sandwich) is a good test word that contains all three 
special characters used in Swedish.  You smile when your local 
store or airline check-in desk prints receipts with U.S. [\] 
instead of the Swedish ÄÖÅ, because you know the reason.
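To make the substitution concrete, here is a minimal Python sketch that renders 7-bit text the way a Swedish or a Danish ISO 646 device would. The two tables are simplified assumptions covering only the three bracket positions discussed above (the real national variants also redefine {, |, } and a few other codes), and the names SWEDISH_646, DANISH_646 and show_as are made up for illustration.

```python
# Simplified national ISO 646 tables (hypothetical names): only the [ \ ]
# positions are remapped here; the real standards also redefine {, |, }
# and a few other codes.
SWEDISH_646 = str.maketrans({"[": "Ä", "\\": "Ö", "]": "Å"})
DANISH_646 = str.maketrans({"[": "Æ", "\\": "Ø", "]": "Å"})


def show_as(text: str, table: dict) -> str:
    """Render 7-bit text the way a device using `table` would display it."""
    return text.translate(table)


if __name__ == "__main__":
    print(show_as("a[4]", SWEDISH_646))         # aÄ4Å (a C array index)
    print(show_as("R[KSM\\RG]S", SWEDISH_646))  # RÄKSMÖRGÅS
    print(show_as("R[KSM\\RG]S", DANISH_646))   # RÆKSMØRGÅS (same bytes)
```

The bytes never change; only the table in the reader's head (or terminal) does, which is exactly why programmers learned to read R[KSM\RG]S fluently.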

Then in the years around 1990 three things happened: the Soviet 
Union fell apart, the Internet expanded beyond academia, and 
7 bit ASCII was being replaced by the 8 bit ISO 8859 standard.  
This too was a framework standard with lots of versions, but one 
of them, called ISO 8859-1 or Latin-1, had 191 printable characters 
covering the "Western and Northern European languages", i.e. both 
Ä and Æ as well as [ and ].  This became the default for Unix, 
Linux, HTML pages, even for Apple Macintoshes sold in Iceland.  
Microsoft, in Windows 3.1, adopted its own extended version called 
Windows-1252.
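A small Python sketch of what Latin-1 bought us, using only the standard "latin-1" codec: Swedish Ä, Danish Æ and the brackets each get a single byte of their own. The helper name latin1_codes is hypothetical.

```python
def latin1_codes(chars: str) -> dict:
    """Return the single ISO 8859-1 byte value for each character (hypothetical helper)."""
    codes = {}
    for ch in chars:
        encoded = ch.encode("latin-1")  # raises UnicodeEncodeError outside Latin-1
        codes[ch] = encoded[0]          # Latin-1 always uses exactly one byte
    return codes


if __name__ == "__main__":
    for ch, value in latin1_codes("ÄÆ[]").items():
        print(f"{ch} -> 0x{value:02X}")
    # Ä -> 0xC4, Æ -> 0xC6, [ -> 0x5B, ] -> 0x5D
```

Because Latin-1 occupies exactly the first 256 Unicode code points, each byte value also equals ord(ch). A Cyrillic letter such as Ж has no slot at all, which is why Latin-1 alone could not serve Eastern Europe.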

Of course, these changes were interrelated.  The Internet's 
expansion required that Swedes and Danes could easily communicate 
by e-mail without confusing Ä for Æ, so we clearly needed codes 
for both.  But as the Soviet Union fell apart, we also started to 
need codes for Cyrillic and all the accent marks used in Eastern 
Europe.  These were not present in ISO 8859-1 but in -2, -3, -4, 
etc., and there was no easy way to switch between them.  The new 
solution was Unicode or ISO 10646, originally designed as a 16 bit 
code with room for 65,536 characters (it has since been extended 
to more than a million code points), including those mentioned 
before plus Chinese, Japanese, Korean (CJK), Thai, Arabic, Hebrew, 
Greek, phonetic, and what have you.  Well, some medieval European 
ligatures are still missing, so that needs to be fixed in the 
future.  But I guess that Unicode will do for the next fifteen 
years anyway.
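The "no easy way to switch" problem can be sketched in Python with the standard codecs: the same byte 0xC6 means a different letter under each ISO 8859 part, and nothing in the byte says which part was meant. The helper decode_byte_everywhere and the choice of byte 0xC6 are illustrative assumptions; the last lines show that Unicode code points go beyond the original 16 bit range.

```python
def decode_byte_everywhere(value: int, codecs: tuple) -> dict:
    """Decode one byte under several ISO 8859 parts (hypothetical helper)."""
    raw = bytes([value])
    return {codec: raw.decode(codec) for codec in codecs}


if __name__ == "__main__":
    parts = ("latin-1", "iso8859-2", "iso8859-5")  # Western, Central European, Cyrillic
    for codec, ch in decode_byte_everywhere(0xC6, parts).items():
        print(f"{codec:10} 0xC6 -> {ch} (U+{ord(ch):04X})")

    # With Unicode every character has its own code point, so one string can
    # mix Swedish, Danish, Polish and Russian; UTF-8 stores it unambiguously.
    mixed = "RÄKSMÖRGÅS Æ Ć Ц"
    assert mixed.encode("utf-8").decode("utf-8") == mixed

    # Code points above U+FFFF exist too; in UTF-16 they take two 16-bit units.
    beyond_bmp = chr(0x20000)  # a CJK ideograph outside the original 16-bit range
    assert len(beyond_bmp.encode("utf-16-le")) == 4
```

The receiver of a lone 0xC6 has to guess the character set; a Unicode-aware program only needs to know the encoding form (here UTF-8), which covers every script at once.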

Many web and software projects are now converting and adapting to 
Unicode, including Wikipedia, Red Hat Linux from version 9, the 
MySQL database from version 4.1, the Perl programming language 
from version 5.8, the Pine mail reader from version 4.61, etc.  
But most users are still unaware of this big change, which will 
take several years to complete.  All major web browsers (but not 
Lynx, I'm told) already support Unicode.

Of course, most people will never have keyboards with 65,000 keys.  
But all software will need to handle cut-and-paste, and screens 
and printers will need fonts that can reproduce them all. 
Fortunately, typewheel printers are long since gone, so we don't 
need to imagine what they would have looked like.

ISO 8859-1 already contains µ, but Unicode has the entire 
αβγδεζηθικλμνξοπρςστυφχψω.  If that looks like Greek to you, it 
means things are working as they should.  If instead it looks like 
garbage, either you or someone along the line should consider 
upgrading to Unicode-enabled software.
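Since the original question was about the micro key, note that the Latin-1 µ and the Greek μ are two different Unicode characters. This Python sketch, using only the standard unicodedata module, shows that they differ, that NFKC normalization folds the micro sign into Greek mu, and what "garbage" typically looks like: UTF-8 bytes for the Greek alphabet misread as Latin-1. The decoding mistake is staged deliberately for illustration.

```python
import unicodedata

micro = "\u00b5"     # MICRO SIGN: the µ that ISO 8859-1 already had (byte 0xB5)
greek_mu = "\u03bc"  # GREEK SMALL LETTER MU

print(unicodedata.name(micro))     # MICRO SIGN
print(unicodedata.name(greek_mu))  # GREEK SMALL LETTER MU
# Two distinct characters that look alike; NFKC normalization folds µ into μ.
assert micro != greek_mu
assert unicodedata.normalize("NFKC", micro) == greek_mu

# The Greek lowercase letters, final sigma included, form one contiguous run.
greek = "".join(chr(cp) for cp in range(0x03B1, 0x03CA))
assert greek == "αβγδεζηθικλμνξοπρςστυφχψω"

# "Garbage": the UTF-8 bytes misread as Latin-1 (a deliberately staged mistake).
mojibake = greek.encode("utf-8").decode("latin-1")
print(mojibake[:4])  # Î±Î²
# Reversing the mistake recovers the original text.
assert mojibake.encode("latin-1").decode("utf-8") == greek
```

If the Greek line above shows up as Î±Î²Î³..., some program along the way decoded UTF-8 as Latin-1 (or Windows-1252).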


-- 
  Lars Aronsson (lars at aronsson.se)
  Aronsson Datateknik - http://aronsson.se

