[Web4lib] RSS and diacritics
Andrew Cunningham
andrewc at vicnet.net.au
Tue Nov 27 19:09:30 EST 2007
Jonathan Gorman wrote:
> There's a lot of software and fonts that don't have very complete character sets. Arial Unicode so far has the most complete that I know of. People using a browser will have to have it set to use a unicode font to
> see unicode characters correctly. On top of that, there's a lot of
> software that mishandles combining diacritics (IE 6 is one example, if I
> recall correctly) and will never display them correctly.
>
There are a few common misconceptions here.
All modern web browsers are Unicode based at the core; older 8-bit
legacy encodings are supported by transcoding to Unicode on the fly.
This has been the case since Netscape 4 and IE 3/4.
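To make "transcoding on the fly" concrete, here is a minimal sketch in Python. It is not how any particular browser implements it; the function name to_unicode and the sample byte strings are made up for illustration. The point is only that legacy 8-bit bytes plus a declared charset are turned into Unicode code points before anything else happens.

```python
# Sample data (hypothetical): the same kind of text in two legacy 8-bit encodings.
latin1_bytes = b"caf\xe9"          # ISO-8859-1: byte 0xE9 is e with acute
cp1252_bytes = b"\x93quoted\x94"   # Windows-1252: 0x93 and 0x94 are curly quotes


def to_unicode(raw: bytes, declared_encoding: str) -> str:
    """Transcode legacy bytes to a Unicode string using the declared charset."""
    return raw.decode(declared_encoding)


text1 = to_unicode(latin1_bytes, "iso-8859-1")
text2 = to_unicode(cp1252_bytes, "cp1252")

# After transcoding, both strings are plain sequences of Unicode code points.
print([f"U+{ord(ch):04X}" for ch in text1])
print([f"U+{ord(ch):04X}" for ch in text2])
```

Once the text is in this form, the original encoding no longer matters to the rest of the software; only the font and rendering system decide what gets displayed.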
All core operating system fonts (on Windows and Mac OS) are Unicode based,
even the core fonts on Windows 98.
There are no pan-Unicode fonts. There are too many characters in Unicode
for a single font to support them all. Fonts have physical limits.
Arial Unicode MS only supports a very old version of Unicode, and that
incompletely. It is useful for characters with diacritics when those
characters are precomposed characters. It is not suitable for combining
diacritics. It doesn't have the required mark and mkmk OpenType features
for the Latin script.
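The difference between precomposed characters and combining diacritics can be seen with Python's standard unicodedata module. This is only a sketch: the helper has_combining_marks is a name I made up, and the open e example is just one of many sequences that have no precomposed form and so always depend on the font's mark positioning.

```python
import unicodedata

precomposed = "\u00e9"           # LATIN SMALL LETTER E WITH ACUTE (one code point)
combining = "e\u0301"            # 'e' + COMBINING ACUTE ACCENT (two code points)
no_precomposed = "\u025b\u0301"  # LATIN SMALL LETTER OPEN E + COMBINING ACUTE ACCENT


def has_combining_marks(text: str) -> bool:
    """True if any character has a non-zero canonical combining class."""
    return any(unicodedata.combining(ch) != 0 for ch in text)


# NFC composes where a precomposed character exists; NFD decomposes.
nfc = unicodedata.normalize("NFC", combining)
nfd = unicodedata.normalize("NFD", precomposed)

# Open e with acute has no precomposed code point, so NFC leaves the
# combining mark in place and the renderer has to position it.
still_combining = unicodedata.normalize("NFC", no_precomposed)

print(len(nfc), len(nfd), len(still_combining))
print(has_combining_marks(nfc), has_combining_marks(still_combining))
```

Normalising to NFC helps with a font that only covers precomposed characters, but it cannot help with sequences like the last one.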
Combining diacritic support on the Windows platform requires:
1) an appropriate font, and
2) an appropriate font rendering system
For Windows this means:
a) using Windows Vista, or
b) using Windows XP (Service Pack 2), installing an appropriate font
(only a small number are available), and enabling complex script
support.
IE6 will display combining diacritics correctly on Windows XP SP2 (with
complex script support enabled) if you are using an appropriate
font, e.g. Doulos SIL, Charis SIL, the Gentium Book beta,
African/Aboriginal Sans, African/Aboriginal Serif, Code 2000, and
possibly the latest DejaVu fonts.
> Other issues like bi-directionality are ambiguous and not clear even now. For example, if you have Korean and English in one document, it's not clear what layer of the software is required to do the work necessary so each can be read in the right direction.
>
Korean doesn't require bidi support. I think you are thinking of
vertical text layout here, not bidi support.
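One way to check this is to look at the Unicode bidirectional class of the characters involved, which Python's unicodedata module exposes. In this sketch the helper names bidi_classes and needs_bidi are my own, and needs_bidi is a deliberate simplification that only looks for strong right-to-left characters (classes R and AL), not the full bidi algorithm.

```python
import unicodedata

korean = "\ud55c\uad6d\uc5b4"            # Hangul syllables
hebrew = "\u05e2\u05d1\u05e8\u05d9\u05ea"  # Hebrew letters
arabic = "\u0639\u0631\u0628\u064a"      # Arabic letters


def bidi_classes(text: str) -> set:
    """Return the set of bidirectional classes used by non-space characters."""
    return {unicodedata.bidirectional(ch) for ch in text if not ch.isspace()}


def needs_bidi(text: str) -> bool:
    """Simplified check: does the text contain strong right-to-left characters?"""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


print(bidi_classes(korean), needs_bidi(korean + " and English"))
print(bidi_classes(hebrew), needs_bidi(hebrew + " and English"))
print(bidi_classes(arabic), needs_bidi(arabic + " and English"))
```

Hangul has the same left-to-right class as Latin letters, so mixing Korean and English in one line involves no reordering at all.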
Also, in XML a schema or DTD should define mechanisms for handling bidi
support or should reference the ITS namespace. The RSS schemas/DTDs do not.
Lack of bidi support in RSS has been a long-standing issue.
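For illustration only, this is roughly what an ITS directionality annotation on a feed item could look like if a format did reference the ITS namespace. It is a sketch under assumptions: the helper make_item is hypothetical, the item is a bare fragment rather than a valid feed, and nothing here implies that any RSS reader would honour the attribute.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C Internationalization Tag Set (ITS) 1.0.
ITS_NS = "http://www.w3.org/2005/11/its"
ET.register_namespace("its", ITS_NS)


def make_item(title: str, direction: str) -> ET.Element:
    """Build an <item> whose <title> carries an its:dir annotation.

    Hypothetical helper; direction would be one of the ITS values
    such as "ltr" or "rtl".
    """
    item = ET.Element("item")
    title_el = ET.SubElement(item, "title")
    title_el.text = title
    title_el.set(f"{{{ITS_NS}}}dir", direction)
    return item


item = make_item("\u05e2\u05d1\u05e8\u05d9\u05ea", "rtl")  # a Hebrew title
xml_text = ET.tostring(item, encoding="unicode")
print(xml_text)
```

Without something like this defined in the schema, a feed has no markup-level way to say which direction a title or description runs in.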
--
Andrew Cunningham
Research and Development Coordinator (Vicnet)
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000
Australia
Email: andrewc@vicnet.net.au
Alt. email: lang.support@gmail.com
Ph: +613-8664-7430 Fax:+613-9639-2175
Mob: 0421-450-816
http://www.slv.vic.gov.au/ http://www.vicnet.net.au/
http://www.openroad.net.au/ http://www.mylanguage.gov.au/
http://home.vicnet.net.au/~andrewc/