[Web4lib] RSS and diacritics

Andrew Cunningham andrewc at vicnet.net.au
Thu Nov 29 16:29:25 EST 2007


Hi Bob

Bob Rasmussen wrote:
 > On Thu, 29 Nov 2007, Thomas Dowling wrote:
 >
 >> The more adept browsers out there figured this out quite a while ago.
 >> If the font they're using doesn't have a glyph for the character
 >> requested, they pull the correct glyph from a font that does have it.
 >> Awkwardly, there's a less adept browser that fails to do this, that
 >> has about 80% market share...
 >>
 >> CSS2 requires that browsers work their way down the list of specified
 >> fonts to find the right glyph, not just find a matching font name.
 >> IIRC, Gecko-based browsers and Opera go beyond that to find any
 >> system font with the right glyph.
 >
 > As an aside, that is precisely the approach taken by Anzio, our
 > terminal emulation package, and Print Wizard, our printing utility.
 > These programs also take many steps to handle combining diacritics
 > well, including raising the "above" diacritics where necessary to
 > avoid collision with the base character.
 >
 > My perception of the most common issues in regards to library systems
 > displaying (and printing) diacritics and non-Latin characters:
 >
 > 1) Very few fonts have the combining double tilde and combining
 > double ligature marks, used mostly with transliterated Russian.
 >

Try the SIL fonts. Charis SIL and Doulos SIL display those diacritics 
correctly. Hopefully Gentium Book, when it's finally released, will 
also support these diacritics.

But it also depends on the font rendering technology in use within 
Windows: either the latest Uniscribe, or Graphite.

My gut reaction, though, is that the core limitation is going to be not 
the fonts or the font rendering system, but the web pages generated by 
the vendors. Well-structured content following web internationalization 
and accessibility best practice would be a breeze to tweak so that all 
languages display fine.
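By "well structured" I mean, at a minimum, a declared encoding, 
language tags and a sensible font stack, so the fallback behaviour 
Thomas describes has something to work with. A purely illustrative 
fragment (the class name and font choices are just examples, not 
anything a vendor actually ships):

   <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
   ...
   <span lang="ru-Latn" class="translit">transliterated title here</span>

and in the stylesheet:

   .translit { font-family: "Charis SIL", "Doulos SIL",
               "Arial Unicode MS", serif; }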

 > 2) Software does not correctly combine combining diacritics.

This is simply poor software internationalization. On the right 
operating system there is no excuse for diacritics not displaying 
properly: if the default rendering of the operating system supports 
them, a well-internationalised application has no excuse for not 
supporting them either.

Personally, I think vendors are let off too lightly. Generally, they say 
they support Unicode, but never spell out what parts they support and 
what parts they don't.

From the perspective of my workplace, all our web interfaces should 
comply with our state government's web standards. I doubt there is a 
single vendor solution in use in our state that meets those standards.

 > 3) Fonts are inconsistent in the way they specify the X-location of
 > combining diacritics.

A font should use the mark and mkmk features in the GPOS table to 
indicate the placement of a diacritic relative to a specific base 
character or relative to another diacritic.

But currently few do.

And my current complaint about the Vista core fonts is that they 
consistently position combining diacritics at a different height than 
the diacritics of precomposed glyphs, which makes for ugly text when 
mixing precomposed and decomposed forms, as may be necessary in some 
languages.
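
For example, U+0113 (e with macron) exists as a precomposed character, 
but there is no precomposed n with macron, so a word needing both has 
to mix the two forms. An illustrative test fragment:

   <!-- precomposed e-macron next to n + U+0304; in the Vista core fonts
        the two macrons sit at visibly different heights -->
   <p>&#x0113;n&#x0304;</p>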

 > 4) Library software I have worked with does not give the browsers
 > information about the language contained in a particular section of
 > text. Thus the browser can not take advantage of the user's
 > language-specific font preferences. This is especially a problem in
 > rendering Han characters, which could be part of a Japanese, Korean,
 > Simplified Chinese, or Traditional Chinese title, for instance. With
 > IE, this seems to force the user to use one super-font, which
 > inevitably has shortcomings.
 >

Yes, and in this scenario different browsers will have different 
responses. Richard Ishida (W3C) put together a test of HTML CJK data 
that wasn't language tagged. Some browsers will default to displaying 
CJK data with a Japanese font, others will use a Simplified Chinese 
font, and in at least one case an older version of Opera defaulted to 
a Korean font.
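
Tagging the language and giving the stylesheet per-language font 
preferences takes the guesswork away. A sketch only, with typical 
Windows fonts standing in for whatever the user actually prefers:

   <cite lang="ja">...</cite>
   <cite lang="ko">...</cite>
   <cite lang="zh-CN">...</cite>
   <cite lang="zh-TW">...</cite>

   cite:lang(ja)    { font-family: "MS Mincho", serif; }
   cite:lang(ko)    { font-family: Batang, serif; }
   cite:lang(zh-CN) { font-family: SimSun, serif; }
   cite:lang(zh-TW) { font-family: PMingLiU, serif; }

(IE doesn't support the :lang() selector, so there you'd need to fall 
back to class-based selectors, but the lang attribute is still worth 
having for the browsers that do use it.)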

 > Finally, Andrew Cunningham mentioned Font Linking. According to MS's
 > documentation, this should make it possible to define a large virtual
 > font by linking together multiple fonts, without physically combining
 > the files. So theoretically I could create a font with the missing
 > ligature marks (see 1 above), and link it to Arial Unicode, for
 > instance. However, I have never succeeded in this in regards to IE.
 > Has anyone succeeded in doing this?

Not quite that simple.

To support the missing ligature marks, you'd be better off with a whole 
new OpenType font.

To handle combining diacritics properly, especially the double 
diacritics, you need to treat the Latin script as a complex script, 
which for Windows means dealing with Uniscribe. And a lot of the font 
linking smarts Microsoft uses in its applications are script dependent 
and built into Uniscribe; often this is a fallback mechanism.


If you are on Windows XP SP2 or Vista, download the Charis SIL font at 
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=CharisSIL_download, 
it's released under the OFL, so it can be redistributed or modified 
under that license.
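
Once it is installed, putting it at the front of the font stack is 
enough for the combining diacritics to pick up its mark positioning; 
something along these lines (an illustration only, not the stylesheet 
used on the test page below):

   body { font-family: "Charis SIL", "Doulos SIL", "Gentium", serif; }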

Then have a look at http://www.openroad.net.au/test/sample.html

Andrew

-- 
Andrew Cunningham
Research and Development Coordinator (Vicnet)
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000
Australia

Email: andrewc@vicnet.net.au
Alt. email: lang.support@gmail.com

Ph: +613-8664-7430                    Fax:+613-9639-2175
Mob: 0421-450-816

http://www.slv.vic.gov.au/            http://www.vicnet.net.au/
http://www.openroad.net.au/           http://www.mylanguage.gov.au/
http://home.vicnet.net.au/~andrewc/


