[Web4lib] RSS and diacritics
Andrew Cunningham
andrewc at vicnet.net.au
Thu Nov 29 16:29:25 EST 2007
Hi Bob
Bob Rasmussen wrote:
> On Thu, 29 Nov 2007, Thomas Dowling wrote:
>
>> The more adept browsers out there figured this out quite a while
>> ago. If the font they're using doesn't have a glyph for the
>> character requested, they pull the correct glyph from a font that
>> does have it. Awkwardly, there's a less adept browser that fails to
>> do this, that has about 80% market share...
>>
>> CSS2 requires that browsers work their way down the list of
>> specified fonts to find the right glyph, not just find a matching
>> font name. IIRC, Gecko-based browsers and Opera go beyond that to
>> find any system font with the right glyph.
>
> As an aside, that is precisely the approach taken by Anzio, our
> terminal emulation package, and Print Wizard, our printing utility.
> These programs also take many steps to handle combining diacritics
> well, including raising the "above" diacritics where necessary to
> avoid collision with the base character.
>
> My perception of the most common issues in regards to library
> systems displaying (and printing) diacritics and non-Latin
> characters:
>
> 1) Very few fonts have the combining double tilde and combining
> double ligature marks, used mostly with transliterated Russian.
>
Try the SIL fonts. Charis SIL and Doulos SIL have had those diacritics
displaying correctly. Hopefully Gentium Book, when it's finally
released, will also support these diacritics.
But it also depends on the font rendering technology in use: on
Windows, either the latest Uniscribe or Graphite.
My gut reaction, though, is that the core limitation is going to be
not the fonts or the font rendering system, but the web pages
generated by the vendors. Well-structured content following web
internationalization and accessibility best practice would be a breeze
to tweak to get all languages displaying correctly.
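As a sketch of what I mean (the class name, sample field and font
choices here are illustrative, not taken from any actual vendor
system), a record field marked up along those lines might look like:

```html
<!-- The lang attribute tells the browser which language the text is
     in, so it can choose an appropriate font and rendering; the CSS
     font stack lists fonts known to carry the needed combining marks
     before any generic fallback. -->
<style>
  .title {
    font-family: "Charis SIL", "Doulos SIL", "Arial Unicode MS", serif;
  }
</style>
<p class="title" lang="ru-Latn">Fedorovski&#x012D;, N. M.</p>
```

With markup like this in place, fixing a display problem becomes a
one-line stylesheet tweak rather than a change to the vendor's page
generation.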
> 2) Software does not correctly combine combining diacritics.
This is simply poor software internationalization. On the right
operating system there is no excuse for diacritics not displaying
properly. If the default rendering of the operating system supports
it, there is no excuse for a well-internationalised application not to
support it.
Personally, I think vendors are let off too lightly. Generally, they say
they support Unicode, but never spell out what parts they support and
what parts they don't.
From the perspective of my workplace, all our web interfaces should
support our state government's web standards. I doubt there is a
single vendor solution in use in our state that meets those standards.
> 3) Fonts are inconsistent in the way they specify the X-location of
> combining diacritics.
A font should use the mark and mkmk features in the GPOS table to
indicate the placement of a diacritic relative to a specific base
character or relative to another diacritic.
But currently few do.
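For those who haven't worked with GPOS, the idea in AFDKO feature-file
syntax looks roughly like this (the glyph names and anchor coordinates
are made up for illustration):

```
# Define a class of "above" marks, each with its attachment point.
markClass [gravecomb acutecomb] <anchor 0 700> @MARKS_ABOVE;

feature mark {
    # mark: attach an above mark to the base letter's anchor,
    # so its X-position follows the base glyph, not a fixed offset.
    position base [a e o] <anchor 250 650> mark @MARKS_ABOVE;
} mark;

feature mkmk {
    # mkmk: stack a second above mark relative to the first one,
    # which is what raises diacritics clear of each other.
    position mark @MARKS_ABOVE <anchor 0 850> mark @MARKS_ABOVE;
} mkmk;
```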
And my current complaint about the Vista core fonts is that they
position combining diacritics consistently at a different height than
the diacritic placement on precomposed glyphs, which makes for ugly
text when using a mix of precomposed and composed forms, as may be
necessary in some languages.
> 4) Library software I have worked with does not give the browsers
> information about the language contained in a particular section of
> text. Thus the browser can not take advantage of the user's
> language-specific font preferences. This is especially a problem in
> rendering Han characters, which could be part of a Japanese, Korean,
> Simplified Chinese, or Traditional Chinese title, for instance. With
> IE, this seems to force the user to use one super-font, which
> inevitably has shortcomings.
>
Yes, and in this scenario different browsers will have different
responses. Richard Ishida (W3C) put together a test of HTML CJK data
that wasn't language tagged. Some browsers default to displaying CJK
data with a Japanese font, others use a Simplified Chinese font, and
in at least one case an older version of Opera defaulted to a Korean
font.
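The fix is for the software to tag each field with its language (the
titles below are just illustrative words, not real catalogue records):

```html
<!-- The same Han character can need different glyphs per language;
     a lang attribute on each title lets the browser apply the user's
     language-specific font preferences instead of one super-font. -->
<ul>
  <li lang="ja">図書館</li>       <!-- Japanese -->
  <li lang="zh-Hans">图书馆</li>  <!-- Simplified Chinese -->
  <li lang="zh-Hant">圖書館</li>  <!-- Traditional Chinese -->
  <li lang="ko">도서관</li>       <!-- Korean -->
</ul>
```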
> Finally, Andrew Cunningham mentioned Font Linking. According to MS's
> documentation, this should make it possible to define a large
> virtual font by linking together multiple fonts, without physically
> combining the files. So theoretically I could create a font with the
> missing ligature marks (see 1 above), and link it to Arial Unicode,
> for instance. However, I have never succeeded in this in regards to
> IE. Has anyone succeeded in doing this?
Not quite that simple.
To support the missing ligature marks, you'd be better off with a whole
new OpenType font.
To properly handle combining diacritics, especially the double
diacritics, you need to treat the Latin script as a complex script,
which on Windows means dealing with Uniscribe. And a lot of the
font-linking smarts Microsoft uses in its applications are script
dependent and built into Uniscribe. Often this is a fallback.
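For reference, the system-wide font linking Bob is describing lives
under the FontLink\SystemLink registry key. A sketch of what an entry
looks like (the linked font file and face name here are hypothetical,
and as noted, IE's use of these entries is inconsistent):

```
Windows Registry Editor Version 5.00

; Values under SystemLink are REG_MULTI_SZ: the value name is the
; base font, each data line is "fontfile.ttf,Face Name" to link in.
; Hypothetical example -- link a custom marks font after Arial Unicode:
;   Value name:  Arial Unicode MS
;   Value data:  MYMARKS.TTF,My Ligature Marks
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\FontLink\SystemLink]
```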
If you are on Win XP SP2 or Vista, download the Charis SIL font at
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=CharisSIL_download;
it's released under the OFL, so it can be redistributed or modified
under that license.
Then have a look at http://www.openroad.net.au/test/sample.html
Andrew
--
Andrew Cunningham
Research and Development Coordinator (Vicnet)
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000
Australia
Email: andrewc at vicnet.net.au
Alt. email: lang.support at gmail.com
Ph: +613-8664-7430 Fax:+613-9639-2175
Mob: 0421-450-816
http://www.slv.vic.gov.au/ http://www.vicnet.net.au/
http://www.openroad.net.au/ http://www.mylanguage.gov.au/
http://home.vicnet.net.au/~andrewc/