[Web4lib] Authors name standardization. Your opinion?

Jakob Voss jakob.voss at gbv.de
Tue Nov 14 09:31:52 EST 2006

Thomas Krichel wrote (on the Web4lib mailing list):

>>  Well, this could be a "solution" for Spanish (and Portuguese, I
>> guess) speaking countries --although it will be difficult to
>> convince some people to follow this rule at the first try-.
>   It is quite useless to want to impose rules on the consistency
>   of name writing. A consistent rule for all languages would be
>   very hard for anyone to remember, and could therefore
>   not be adequately implemented. 
>   You may want to have a look at the ACIS project 
>   http://acis.openlib.org. This builds software to run portals
>   where authors can register, and maintain a name variations
>   profile. Software is implemented for the RePEc digital
>   library at the RePEc author service, see
>   http://authors.repec.org.

Can you download the database of authors and name variations? Is it open
content? If every institution has to maintain their own authority
control service then the problem will only get worse. We need an Open
Authority File Initiative like the Open Archives Initiative (and don't
forget to mention "Semantic Web" in your research proposals because
authority control is exactly what the Semantic Web misses ;-). Metadata
needs to be accessible to everyone to be usable! By the way, the Open
Archives Initiative also has problems with different spellings of names -
I wonder how their Object Reuse and Exchange (ORE) initiative is supposed
to work when I look at the dirty metadata that is listed in DOAJ.

Karen Coyle wrote:

> An author ID would solve part of the problem -- that is, it could make
> it explicit that author 1234567 wrote both article A and article P, at
> least in a situation where the author ID was attached to both
> articles.

That's exactly what the German National Library, the Library of Congress,
and OCLC do in the VIAF (Virtual International Authority File) project:

Such author IDs have been used in libraries and other institutions for
years. The problem is that these institutions do not redistribute their
IDs; you can only get the names!
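
To illustrate what a shared ID buys you, here is a minimal Python sketch of
an authority file that maps name variants to one author ID. The ID "1234567"
is Karen's example; the record layout and the names normalize, AUTHORITY,
INDEX and lookup are my own assumptions, not how VIAF or any library system
stores its data.

```python
import unicodedata

def normalize(name):
    """Fold case, accents and punctuation so that variants compare equal."""
    name = name.replace("ß", "ss")
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = stripped.replace(",", " ").replace(".", " ").lower()
    return " ".join(cleaned.split())

# Hypothetical authority file: author ID -> known spellings of the name
AUTHORITY = {
    "1234567": ["Voß, Jakob", "Jakob Voss", "J. Voß"],
}

# Reverse index: normalized spelling -> author ID
INDEX = {normalize(variant): author_id
         for author_id, variants in AUTHORITY.items()
         for variant in variants}

def lookup(name):
    """Return the author ID for a spelling, or None if it is unknown."""
    return INDEX.get(normalize(name))
```

A call such as lookup("Voss, Jakob") finds the same ID as lookup("J. Voß"),
but only if somebody publishes the table of IDs in the first place.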

Lars Aronsson wrote:
> Julie Nye wrote:
>> Isn't there a "party identifier" scheme under discussion now that
>> would do essentially this, for personal and corporate authors?
>> http://collectionscanada.ca/iso/tc46sc9/27729.htm
> Interesting.  Even though it isn't spelled out, I predict that the
> proposed ISPI scheme will be financed with fees from publishers,
> much like the existing ISBN and DOI schemes.  The purpose seems to
> be to make the job easier for RIAA, IFPI etc. to collect royalties
> from sales.

Thanks for yet another reason why the ISO approach is crap. ISO is slow
and you even have to pay for the description of the standard! This is
ridiculous, we are in the age of Web 2.0! If you really want
identifiers for people then write an RFC and register a URN namespace or
use the info-URI namespace.


Look: there already *is* a namespace for Library of Congress authority


> If I go to http://catalog.loc.gov/ or http://authorities.loc.gov/
> and do a search for Karel Capek, the name I find with 279 titles
> is "Čapek, Karel, 1890-1938", as distinguished from "Čapek,
> Karel, 1930-" and other people.  The LCCN for the authority record
> seems to be "n 50035042".  But is there any way I can make a web
> link based on such a name?

No. And you cannot get a full list of all LCCN records. And the info-URI
is pointless because there is no service that understands
"info:lccn/n50035042". So it's totally useless and damages the
reputation of the whole URI/info-URI system. There should be a policy
that you only get a namespace if you provide your identifiers to the public.
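
Building the identifier is the easy part. Here is a short Python sketch that
turns the LCCN as displayed in the catalog into the info-URI form. The
function name is made up, and the normalization steps (drop blanks, cut at a
slash, zero-pad the part after a hyphen to six digits) are my reading of the
info:lccn rules, so treat them as an assumption.

```python
def lccn_to_info_uri(lccn):
    """Normalize a displayed LCCN and put it into the info:lccn/ namespace."""
    value = lccn.replace(" ", "")        # drop all blanks
    value = value.split("/", 1)[0]       # drop a slash and everything after it
    if "-" in value:
        prefix, serial = value.split("-", 1)
        value = prefix + serial.zfill(6)  # pad the serial part to six digits
    return "info:lccn/" + value

print(lccn_to_info_uri("n 50035042"))    # info:lccn/n50035042
```

This is purely syntactic. Nothing out there resolves the resulting string to
a record, which is exactly the complaint.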

Lars also mentions the efforts at authority control in Wikipedia. My
first paper on this is already some months old but still contains the
most important parts of the concept:


By the way, you can also generate a dictionary of names in different
languages from Wikipedia articles and the links between the different
language versions. Here is the data:


Have a look at the files "page.sql.gz" and "langlinks.sql.gz" in each
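
A rough Python sketch of how the two tables combine, assuming the rows have
already been extracted from the SQL dumps into tuples. The column order
(page_id, page_namespace, page_title) and (ll_from, ll_lang, ll_title)
follows the MediaWiki schema as I know it; the function name and the sample
rows are made up for illustration and not taken from a real dump.

```python
from collections import defaultdict

def name_dictionary(pages, langlinks, source_lang):
    """Map each article title to its titles in other language versions.

    pages:     iterable of (page_id, page_namespace, page_title)
    langlinks: iterable of (ll_from, ll_lang, ll_title)
    """
    # Keep only articles (namespace 0); dump titles use "_" for spaces.
    titles = {page_id: title.replace("_", " ")
              for page_id, namespace, title in pages if namespace == 0}
    names = defaultdict(dict)
    for ll_from, ll_lang, ll_title in langlinks:
        if ll_from in titles and ll_title:
            entry = names[titles[ll_from]]
            entry[source_lang] = titles[ll_from]
            entry[ll_lang] = ll_title
    return dict(names)

# Made-up sample rows (hypothetical, for illustration only)
pages = [(1, 0, "Karel_Čapek"), (2, 1, "Karel_Čapek")]
langlinks = [(1, "ru", "Чапек, Карел"), (1, "eo", "Karel Čapek"),
             (2, "de", "Diskussion")]
result = name_dictionary(pages, langlinks, "en")
```

Here result holds one entry for "Karel Čapek" with its "en", "ru" and "eo"
forms; the row from the talk namespace is skipped. Restricting the pages to
articles about people is a separate step that this sketch leaves out.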

We don't need no new concepts, standards and complex systems - we just
need a library 2.0 spirit of sharing and collaboration.


Jakob Voß
Jakob Voss
Voß, Jakob
Voss, Jakob
J. Voß
J. Voss
