[Web4lib] What is the microkey?
Lars Aronsson
lars at aronsson.se
Thu Jun 23 04:52:46 EDT 2005
Bob Rasmussen wrote:
> 1. From the charmap utility, select it and copy it to the
> clipboard. Then in whatever program, Paste it in.
Most operating systems and web browsers should also be able to
copy-and-paste from
http://en.wikipedia.org/wiki/Table_of_Unicode_characters%2C_128_to_999
or
http://unicode.coeurlumiere.com/
If you forget these links, just google for "unicode table".
Fifteen years ago everybody around me was using 7-bit ASCII (or
ISO 646) codes, which allowed for 95 different printable
characters (counting the space): the English alphabet A-Z, lower
case a-z, digits 0-9 and the usual !"#$%&()[]{}+-*/=. Each
European language had its own national version of ISO 646, in
which some U.S. ASCII characters were replaced by the necessary
national letters.
Britain only needed to replace # with £. For Sweden it meant that
an Ä (A-umlaut) typed on a Swedish keyboard would show up as a [
on a U.S. screen, and any Pascal or C program would look very
strange on a Swedish printer, since a[4] would come out as aÄ4Å.
On a Danish screen or printer the [ would instead show up as an Æ
(AE-ligature), which is fine, since Danes have little use for Ä
anyway. Since both the Swedish Ä and the Danish Æ sound somewhat
like an E, and the [ looks somewhat like an E, most programmers
learned to read text like R[KSM\RG]S as easily as RÄKSMÖRGÅS. There is
a lot of programmer folklore about this. For example, RÄKSMÖRGÅS
(shrimp sandwich) is a good test word that contains all three
special characters used in Swedish. You smile when your local
store or airline check-in desk prints receipts with U.S. [\]
instead of the Swedish ÄÖÅ, because you know the reason.
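The substitution described above is a per-character lookup: the
7-bit code stays the same and only the glyph drawn for it changes.
A minimal Python sketch of that lookup, covering only the three
code positions discussed here (the real Swedish and Danish ISO 646
variants also redefined other positions, which are left out; the
helper name as_national is hypothetical):

```python
# Partial ISO 646 national-variant tables: only the three positions
# discussed in the text (0x5B '[', 0x5C '\', 0x5D ']'). The real
# standards redefined further positions, which are omitted here.
SWEDISH = str.maketrans({"[": "Ä", "\\": "Ö", "]": "Å"})
DANISH = str.maketrans({"[": "Æ", "\\": "Ø", "]": "Å"})


def as_national(ascii_text: str, table: dict) -> str:
    """Show how 7-bit text would be drawn by a device using a national variant."""
    return ascii_text.translate(table)


if __name__ == "__main__":
    print(as_national("a[4]", SWEDISH))         # aÄ4Å
    print(as_national("R[KSM\\RG]S", SWEDISH))  # RÄKSMÖRGÅS
    print(as_national("R[KSM\\RG]S", DANISH))   # RÆKSMØRGÅS
```

The same bytes therefore read as RÄKSMÖRGÅS on a Swedish printer,
RÆKSMØRGÅS on a Danish one, and R[KSM\RG]S on a U.S. one.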
Then, in the years around 1990, three things happened: the Soviet
Union fell apart, the Internet expanded beyond academia, and 7-bit
ASCII was replaced by the 8-bit ISO 8859 standard.
This too was a framework standard with many parts, but one of
them, ISO 8859-1 or Latin-1, contained 191 printable characters
covering the "Western and Northern European languages", so that
Ä, Æ, [ and ] finally had separate codes. This became the default
for Unix, Linux, and HTML pages, and even for Apple Macintoshes
sold in Iceland. Microsoft adopted its own extended version,
Windows-1252, in Windows 3.1.
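In ISO 8859-1 the bracket characters keep their ASCII codes, and
the national letters get codes of their own in the upper half of
the 8-bit table. A short Python sketch using the standard
"latin-1" codec (the sample list and the helper latin1_codes are
illustrative, not part of any standard):

```python
# Look up the single byte that ISO 8859-1 (Python codec "latin-1")
# assigns to each character. The sample list is illustrative.
SAMPLE = ["#", "[", "\\", "]", "£", "Ä", "Å", "Æ", "Ö", "Ø"]


def latin1_codes(chars):
    """Return {character: byte value} using the ISO 8859-1 encoding."""
    return {c: c.encode("latin-1")[0] for c in chars}


if __name__ == "__main__":
    for char, code in latin1_codes(SAMPLE).items():
        print(f"{char} -> 0x{code:02X}")
```

Every character maps to a different byte, which is exactly what
the 7-bit national variants could not offer.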
Of course, these changes were interrelated. The Internet's
expansion required that Swedes and Danes could e-mail each other
without confusing Ä with Æ, so we clearly needed separate codes
for both. But as the Soviet Union fell apart, we also started to
need codes for Cyrillic and all the accent marks used in Eastern
Europe. These were not present in ISO 8859-1, only in other parts
(-2, -3, -4, etc.), and there was no easy way to switch between
them. The new
solution was Unicode, or ISO 10646. Unicode was first designed as
a 16-bit code with room for some 65,000 characters; since version
2.0 (1996) it has room for over a million. Its first 65,536 codes,
the Basic Multilingual Plane, already cover those mentioned before
plus Chinese, Japanese, Korean (CJK), Thai, Arabic, Hebrew, Greek,
phonetic symbols, and what have you. Well, some medieval European
ligatures are still missing, so that needs to be fixed in the
future. But I guess Unicode will do for the next fifteen years
anyway.
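Each ISO 8859 part is a separate table of at most 256 codes, so no
single part covers all of these languages at once. A small Python
check of which encodings can represent a few sample words, using
Python's built-in codec names (the words are illustrative choices,
and can_encode is a hypothetical helper):

```python
# Which encodings can represent which sample words?
# Codec names are Python's built-in ones; the words are illustrative.
CODECS = ["latin-1", "iso8859_2", "iso8859_5", "utf-8"]
SAMPLES = {
    "Swedish": "RÄKSMÖRGÅS",
    "Danish": "RÆKSMØRGÅS",
    "Czech": "Dvořák",
    "Russian": "Москва",
}


def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for name, word in SAMPLES.items():
        ok = [c for c in CODECS if can_encode(word, c)]
        print(f"{name:8} {word:12} {', '.join(ok)}")
```

Only UTF-8, one of the encoding forms of Unicode, accepts every
sample; each single-byte part handles some languages and not
others.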
Many web and software projects are now converting to Unicode,
including Wikipedia, Red Hat Linux from version 9, the MySQL
database from version 4.1, the Perl programming language from
version 5.8, the Pine mail reader from version 4.61, etc. But most
users are still unaware of this big change, which will take
several years to complete. All major web browsers (but not Lynx,
I'm told) already support Unicode.
Of course, most people will never have keyboards with 65,000 keys.
But all software will need to handle cut-and-paste, and screens
and printers will need fonts that can reproduce all of these
characters. Fortunately, typewheel printers are long gone, so we
don't need to imagine what a Unicode typewheel would have looked
like.
Already ISO 8859-1 contains µ, but Unicode has the entire
αβγδεζηθικλμνξοπρςστυφχψω. If that looks like Greek to you, it
means things are working as they should. If instead it looks like
garbage, either you or someone along the line should consider
upgrading to Unicode-enabled software.
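One common way Unicode text turns into garbage is that the
sender's software writes UTF-8 while the reader's software assumes
ISO 8859-1. A minimal Python simulation, assuming exactly that
pair of charsets (real mail readers and browsers may guess other
charsets; the helper names misread and repair are made up for
illustration):

```python
# Simulate garbled text: the sender encodes as UTF-8, but the
# receiving program wrongly assumes ISO 8859-1 (Python's "latin-1").
MICRO = "\u00b5"                     # MICRO SIGN, already in ISO 8859-1
GREEK = "αβγδεζηθικλμνξοπρςστυφχψω"  # the Greek letters quoted above


def misread(text: str) -> str:
    """Encode as UTF-8, then decode the bytes as Latin-1 (the wrong charset)."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    """Undo misread(): recover the original bytes, then decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    print(misread(MICRO))                   # Âµ
    print(misread(GREEK[:3]))               # Î±Î²Î³
    print(repair(misread(GREEK)) == GREEK)  # True
```

Because Latin-1 assigns a character to every byte, the wrong
decoding never fails loudly; it just shows garbage, and the
original bytes can still be recovered.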
--
Lars Aronsson (lars at aronsson.se)
Aronsson Datateknik - http://aronsson.se