[Web4lib] RSS and diacritics

Andrew Cunningham andrewc at vicnet.net.au
Tue Nov 27 19:09:30 EST 2007



Jonathan Gorman wrote:
> There's a lot of software and fonts that don't have very complete
> character sets.  Arial Unicode so far has the most complete that I know
> of. People using a browser will have to have it set to use a unicode
> font to see unicode characters correctly.  On top of that, there's a lot
> of software that mishandles combining diacritics (IE 6 is one example,
> if I recall correctly) and will never display them correctly.
> 

There are a few common misconceptions here.

All modern web browsers are Unicode based at the core; older 8-bit 
legacy encodings are supported by transcoding to Unicode on the fly.

This has been the case since Netscape 4 and IE 3/4.
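
As a rough illustration of what "transcoding to Unicode on the fly" 
means, here is a minimal Python sketch. The sample bytes and variable 
names are made up for the example; a browser does the equivalent 
internally with its own decoders.

```python
# Bytes as they might arrive from a page served in a legacy 8-bit
# encoding. The sample word is an illustrative assumption.
legacy_bytes = b"caf\xe9"  # "café" when read as ISO-8859-1

# Transcoding: each legacy byte is mapped to a Unicode code point.
text = legacy_bytes.decode("iso-8859-1")

# The same byte means something else under a different legacy encoding,
# which is why the declared charset of the page matters.
greek = legacy_bytes.decode("iso-8859-7")

print(text, [hex(ord(c)) for c in text])
print(greek, [hex(ord(c)) for c in greek])
```

After decoding, everything downstream (layout, font selection, 
searching) works on Unicode code points, whatever the original encoding 
was.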

All core operating system fonts (Windows and Mac OS) are Unicode based, 
even the core fonts on Windows 98.

There are no pan-Unicode fonts. There are too many characters in Unicode 
for a single font to support them all. Fonts have physical limits: a 
TrueType or OpenType font can hold at most 65,535 glyphs.

Arial Unicode MS only supports a very old version of Unicode, and that 
incompletely. It is useful for characters with diacritics when those 
characters are precomposed characters. It is not suitable for combining 
diacritics. It doesn't have the required mark and mkmk OpenType features 
for the Latin script.
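
The difference between precomposed characters and combining diacritics 
can be seen with Python's standard unicodedata module. This is only a 
sketch; the sample characters are my own choice, not taken from any 
particular feed.

```python
import unicodedata

precomposed = "\u00e9"   # é as a single precomposed code point
combining = "e\u0301"    # e followed by COMBINING ACUTE ACCENT

# NFC normalization folds the sequence into the precomposed character,
# so a font with only precomposed glyphs can display it.
nfc = unicodedata.normalize("NFC", combining)
print(nfc == precomposed, len(combining), len(nfc))

# Open e with tilde has no precomposed form in Unicode, so NFC leaves
# the combining mark in place. Rendering it well depends on the font
# positioning the mark over the base letter.
open_e_tilde = "\u025b\u0303"  # ɛ followed by COMBINING TILDE
nfc_open = unicodedata.normalize("NFC", open_e_tilde)
print(len(nfc_open), [unicodedata.name(c) for c in nfc_open])
```

Normalizing to NFC helps only where a precomposed character exists. For 
everything else the font and the rendering system have to do the work.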

Combining diacritic support on the Windows platform requires:

1) an appropriate font, and
2) an appropriate font rendering system

For Windows this means:

a) using Windows Vista, or
b) using Windows XP (Service Pack 2), installing an appropriate font 
(only a small number are available) and enabling complex script 
support.

IE6 will display combining diacritics correctly on Windows XP SP2 (with 
complex script support enabled) if you are using an appropriate 
font, e.g. Doulos SIL, Charis SIL, the Gentium Book beta, 
African/Aboriginal Sans, African/Aboriginal Serif, Code 2000, and 
possibly the latest DejaVu fonts.


> Other issues like bi-directionality are ambiguous and not clear even now.  For example, if you have Korean and English in one document, it's not clear what layer of the software is required to do the work necessary so each can be read in the right direction.
> 

Korean doesn't require bidi support. I think you are thinking of 
vertical text layout here, not bidi support.
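
One quick way to check this is to look at the bidirectional class the 
Unicode Character Database assigns to a character. A small sketch using 
Python's unicodedata module follows; the sample characters and 
dictionary keys are arbitrary picks for illustration.

```python
import unicodedata

# One representative character per script (illustrative choices).
samples = {
    "Korean (Hangul)": "\ud55c",  # HANGUL SYLLABLE HAN
    "English (Latin)": "A",
    "Hebrew": "\u05d0",           # HEBREW LETTER ALEF
    "Arabic": "\u0627",           # ARABIC LETTER ALEF
}

# "L" is left-to-right; "R" and "AL" are the right-to-left classes.
classes = {name: unicodedata.bidirectional(ch) for name, ch in samples.items()}

for name, bidi_class in classes.items():
    print(f"{name}: {bidi_class}")
```

Hangul and Latin both come out as left-to-right, so a mixed Korean and 
English document never triggers bidi reordering. Hebrew and Arabic do.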

Also, in XML, a schema or DTD should either define its own mechanisms 
for handling bidi support or reference the ITS namespace. The RSS 
schemas and DTDs do neither. Lack of bidi support in RSS has been a 
long-standing issue.
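
To give an idea of what referencing the ITS namespace could look like, 
here is a hypothetical sketch that builds a feed fragment with an 
its:dir attribute on a title. RSS itself defines nothing of the kind, 
and feed readers are not known to honour it; the element layout and the 
sample Hebrew title are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C Internationalization Tag Set (ITS).
ITS_NS = "http://www.w3.org/2005/11/its"
ET.register_namespace("its", ITS_NS)

# Hypothetical feed fragment: its:version on the root, its:dir on the
# element whose text is right-to-left.
rss = ET.Element("rss", {"version": "2.0", f"{{{ITS_NS}}}version": "1.0"})
channel = ET.SubElement(rss, "channel")
item = ET.SubElement(channel, "item")
title = ET.SubElement(item, "title", {f"{{{ITS_NS}}}dir": "rtl"})
title.text = "\u05e9\u05dc\u05d5\u05dd"  # sample Hebrew text

xml_text = ET.tostring(rss, encoding="unicode")
print(xml_text)
```

Markup like this only helps if the consuming software understands ITS, 
which is exactly the gap in RSS described above.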


-- 
Andrew Cunningham
Research and Development Coordinator (Vicnet)
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000
Australia

Email: andrewc@vicnet.net.au
Alt. email: lang.support@gmail.com

Ph: +613-8664-7430                    Fax:+613-9639-2175
Mob: 0421-450-816

http://www.slv.vic.gov.au/            http://www.vicnet.net.au/
http://www.openroad.net.au/           http://www.mylanguage.gov.au/
http://home.vicnet.net.au/~andrewc/

