[WEB4LIB] Re: Java and e-resource vendors

Walt_Crawford at notes.rlg.org Walt_Crawford at notes.rlg.org
Mon Sep 24 10:47:28 EDT 2001


Richard,

Responding only to issues of nonroman material in "Westernized" interfaces
(I don't know much about XML):

First, I would note that RLG was the first to offer cataloging tools in not
only CJK but also Cyrillic, Hebrew, and Arabic... ("JACKPHY" is what LC
calls this expanded cluster.)

Second, Eureka--our end-user search system--has provided full JACKPHY
display and retrieval since December 2000, using Unicode, _not_ using any
special downloaded software on the client side. (RLG is a founding member
of the Unicode Consortium, the only original member within the library
community.)

Unicode display works very well in Internet Explorer 5 and above, including
directionality, which IE takes care of. It's actually fairly spectacular to
see cataloging records fully displayed in Arabic, with left-aligned
romanized and right-aligned Arabic field pairs displayed (and with
positional characteristics handled properly). Some of us were astonished to
see just what a large proportion of RLG Union Catalog records actually
included nonroman text when we started to display it (I believe it's more
than three million records).

(We haven't fully tested Netscape 6.x, but earlier Netscape versions simply
ignore directionality flags, so Hebrew and Arabic are reversed on the
screen. Opera 5, for all its virtues, doesn't understand Unicode at all.
Sigh.)
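
To illustrate the directionality point, here is a minimal Python sketch
(not anything Eureka or a browser actually runs). It reads the
bidirectional class that Unicode assigns to each character, which is the
information a renderer needs to lay out Hebrew or Arabic right to left.
The helper names (bidi_classes, has_rtl, visual_order_for_rtl_run) are
hypothetical, and the last one is a deliberately simplified model that
assumes a single right-to-left run with no digits or combining marks.

```python
import unicodedata

# Bidirectional classes that mark a character as right-to-left:
# "R" (for example Hebrew letters) and "AL" (Arabic letters).
RTL_CLASSES = {"R", "AL"}

def bidi_classes(text):
    """Return the Unicode bidirectional class of each character."""
    return [unicodedata.bidirectional(ch) for ch in text]

def has_rtl(text):
    """True if any character in text belongs to a right-to-left class."""
    return any(cls in RTL_CLASSES for cls in bidi_classes(text))

def visual_order_for_rtl_run(text):
    """Simplified model (an assumption, not the full bidi algorithm):
    for one pure right-to-left run, the left-to-right screen order is
    the stored order reversed."""
    return text[::-1]

# Sample strings written as escapes so the source itself stays ASCII.
hebrew = "\u05e9\u05dc\u05d5\u05dd"   # Hebrew "shalom", stored in reading order
arabic = "\u0633\u0644\u0627\u0645"   # Arabic "salam", stored in reading order
latin = "abc"
```

A renderer that ignores these classes paints the stored order left to
right, so the word comes out mirrored compared with what
visual_order_for_rtl_run models; that is the "reversed on the screen"
effect described above.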

I'm not sure I understand the problem of _displaying_ nonroman text within
a "Westernized" application, any more than we've ever had a problem
displaying records in Latin or German or French even though the interface
is in English.

Searching is a slightly different beast, for two reasons: (1) without
client software, composing Unicode characters outside the standard Latin-1
set is non-trivial (we do support cut-and-paste searching and nonroman
hyperlinks from access points), and (2) there's no good way to sort
headings or records across character sets, or even necessarily within a
character set, given that Unicode represents shapes rather than meanings.
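
Both points can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not a description of how Eureka
searches or sorts: fits_latin1 is a hypothetical helper, the headings are
made-up sample data, and plain sorted() is used because it compares
strings by code point.

```python
def fits_latin1(text):
    """True if every character can be encoded in Latin-1 (ISO 8859-1)."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

# Sample headings, written as escapes so the source stays ASCII.
paris = "Paris"
moscow = "\u041c\u043e\u0441\u043a\u0432\u0430"              # Cyrillic
jerusalem = "\u05d9\u05e8\u05d5\u05e9\u05dc\u05d9\u05dd"     # Hebrew
cairo = "\u0627\u0644\u0642\u0627\u0647\u0631\u0629"         # Arabic
tokyo = "\u6771\u4eac"                                       # CJK

headings = [tokyo, moscow, cairo, paris, jerusalem]

# sorted() orders strings by code point, so the headings group by where
# each script sits in Unicode (Latin, Cyrillic, Hebrew, Arabic, CJK),
# not by any order a reader of all five would consider meaningful.
by_code_point = sorted(headings)

# Even within Latin script, code-point order puts every uppercase letter
# before every lowercase one, and accented letters after both.
latin_sorted = sorted(["apple", "Zebra", "\u00c4rger"])
```

Here fits_latin1 returns False for every nonroman heading, which is why
such characters cannot simply be typed into a Latin-1 search form, and
by_code_point shows that a naive sort merely reflects code-chart layout.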

As to multilingual _interfaces_, VTLS probably knows more about that field
than most...

NOTE: My comments only, based on overseeing our implementation of Unicode
within Eureka--but I'm no expert on script issues and this shouldn't be
taken as an official RLG statement!

-walt crawford, still senior analyst for Eureka at RLG-


