[WEB4LIB] macrons in html?

Bob Rasmussen ras at anzio.com
Tue May 23 14:42:36 EDT 2000


On Tue, 23 May 2000, Kenneth R. Irwin wrote:

> Hi folks -- am I crazy, or is there no way to do a useful macron in HTML?
> 
> there's the &macr; entity for a macron by itself, but there's no &emacr; or
> &macre. 

Some more background, beyond what's already been offered:

The "named" character entities (such as "&macr;") generally cover only the
characters in the Latin-1 character set, which does not include e-macron. For
other characters, you can code the value as a decimal or hex numeric character
reference. These numbers are Unicode code points: e-macron is U+0113, so it can
be written as &#275; or &#x113;.
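As a small illustration (my own sketch, not something from the thread), the numeric reference for any character can be computed directly from its Unicode code point. The function names `numeric_refs` and `hex_ref` are hypothetical, chosen just for this example.

```python
def numeric_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


def hex_ref(ch: str) -> str:
    """Return the hexadecimal numeric character reference for a single character."""
    return f"&#x{ord(ch):X};"


print(numeric_refs("M\u0113n\u016b"))  # M&#275;n&#363;
print(hex_ref("\u0113"))               # &#x113;
print(numeric_refs("\u0416\u0443\u043a"))  # Cyrillic is handled the same way
```

Python's standard library can do the decimal form in one call, `text.encode("ascii", "xmlcharrefreplace")`, and `html.unescape("&#x113;")` goes the other way.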

Note that this is not limited to accented characters. It can also be used to
represent Chinese, Cyrillic, Hebrew, and more, even Burmese and Syriac (as of
Unicode 3.0). As others have pointed out, later versions of web browsers handle
these much better than earlier versions do.

Note that the 30 "combining diacritics" defined in USMARC (now MARC 21, I
believe) all exist in Unicode. A character table is available at
   http://lcweb.loc.gov/marc/specifications/speccharlatin.html

Unicode defines many diacritics in both a "spacing" form and a "combining"
form. For example, the spacing macron is hex AF (U+00AF), and the combining
macron is hex 304 (U+0304). In Unicode, a combining character combines with the
character BEFORE it (the opposite of MARC, where the diacritic precedes the
letter it modifies).
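To make the ordering difference concrete, here is a rough sketch (hypothetical, not from the post) that moves combining marks from before their base letter (MARC order) to after it (Unicode order). It assumes the text has already been decoded to Unicode code points, so only the order differs; it does not handle MARC-8 byte codes. It also treats "has a nonzero canonical combining class" as the test for a combining mark, which is a simplification.

```python
import unicodedata

SPACING_MACRON = "\u00AF"    # U+00AF MACRON: a standalone, spacing character
COMBINING_MACRON = "\u0304"  # U+0304 COMBINING MACRON: attaches to the preceding character


def marc_order_to_unicode(text: str) -> str:
    """Move combining marks from before their base character to after it.

    Stacked diacritics on one base keep their relative order. Marks left
    over at the end with no following base are kept rather than dropped.
    """
    out = []
    pending = []  # combining marks seen but not yet attached to a base
    for ch in text:
        if unicodedata.combining(ch):
            pending.append(ch)
        else:
            out.append(ch)
            out.extend(pending)
            pending = []
    out.extend(pending)
    return "".join(out)


marc_style = COMBINING_MACRON + "e"
print([hex(ord(c)) for c in marc_order_to_unicode(marc_style)])  # ['0x65', '0x304']
```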

When you want a web page to display a character/diacritic combination, IF that
combination is defined in Unicode as a single (precomposed) character, it is
generally preferable and more reliable to use that character. However, many
combinations do NOT exist in Unicode, because they "do not appear in nature";
that is, they are not used in any actual written language, only in Romanized
(transliterated) forms. So if you want to display a 'v' with an acute accent,
for instance, you would have to use a 'v' followed by a combining acute (hex
301). This gives browsers (and other software) even MORE chances to get it
wrong, and the result depends on the design of the font being used.
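One way to follow the "use the precomposed character if it exists" advice automatically is Unicode NFC normalization, which composes a base letter and a combining mark into a single character when Unicode defines one and leaves them as two code points otherwise. This is a sketch using Python's `unicodedata` module; the helper names are hypothetical, and NFC is my choice of technique, not something the post prescribes.

```python
import unicodedata


def compose_for_web(base: str, mark: str) -> str:
    """Return the precomposed character if Unicode defines one,
    otherwise the base letter followed by the combining mark."""
    return unicodedata.normalize("NFC", base + mark)


def to_numeric_refs(text: str) -> str:
    """Encode every non-ASCII code point as a hex numeric character reference."""
    return "".join(c if ord(c) < 128 else f"&#x{ord(c):X};" for c in text)


for base, mark in [("e", "\u0304"), ("v", "\u0301")]:
    composed = compose_for_web(base, mark)
    print(len(composed), to_numeric_refs(composed))
# 1 &#x113;
# 2 v&#x301;
```

The e-macron collapses to one character (U+0113), while the v-acute stays as 'v' plus U+0301, since Unicode has no precomposed form for it.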

All of which brings me to a "pitch". There is a great deal of work in
international character sets going on in the Unicode organization
(www.unicode.org). Their semi-annual conferences are a great place to learn,
discuss, and affect development. The presentations are quite fascinating. And
yet I rarely see more than two or three people from the library industry
there. Please check out their web site and consider attending.

(Disclosure: I'll be speaking at the next Unicode conference.)

-- 
Regards,
....Bob Rasmussen,   President,   Rasmussen Software, Inc.

personal e-mail: ras at anzio.com
 company e-mail: rsi at anzio.com 
          voice: (US) 503-624-0360 (9:00-6:00 Pacific Time)
            fax: (US) 503-624-0760
            web: http://www.anzio.com         


