[WEB4LIB] Making the Invisible Web more Visible

Avi Rappoport avi-list at searchtools.com
Fri Jun 7 19:37:39 EDT 2002

The place to start is the Open Archives Initiative.

The major search engines tend to be iffy about metadata because they 
can't trust many of their sources -- too much spamming and scamming. 
Cliff Lynch has written some good stuff about trust and credibility, 
but it's not a trivial problem.
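Here is a rough illustration of the Open Archives Initiative approach. Its Protocol for Metadata Harvesting (OAI-PMH) has a repository answer plain HTTP requests such as verb=ListRecords with XML records, commonly in unqualified Dublin Core (metadataPrefix=oai_dc). The Python sketch below builds such a request URL and pulls identifiers, titles and subjects out of a response. The repository URL, the sample records and the helper names are hypothetical. A real harvester would also have to fetch the URL and follow resumptionToken paging, which is left out here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces used by OAI-PMH 2.0 responses and the oai_dc metadata format.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, trimmed-down ListRecords response: one live record, one deleted.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier><datestamp>2002-06-07</datestamp></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample record</dc:title>
          <dc:subject>Libraries</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header status="deleted"><identifier>oai:example.org:2</identifier><datestamp>2002-06-07</datestamp></header>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for a (hypothetical) repository."""
    return base_url + "?" + urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})


def parse_records(xml_text: str) -> list[dict]:
    """Extract identifier, title and subjects from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:  # deleted records carry a header but no metadata
            continue
        records.append({
            "identifier": ident,
            "title": dc.findtext("dc:title", default="", namespaces=NS),
            "subjects": [s.text for s in dc.findall("dc:subject", NS)],
        })
    return records
```

The search engines' trust problem is untouched by this. The protocol only makes the metadata easy to collect. Whether to believe it is still up to the harvester.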


At 3:34 PM -0700 6/7/02, Hanan Cohen wrote:
>I have an idea, and I thought WEB4LIB would be the best place to share it
>and see whether there is something in it. I am not a librarian, so
>please excuse me for using the wrong terms to express the wrong ideas
>(or vice versa), excuse me if WEB4LIB is the wrong place for this kind
>of message, and excuse me if what I suggest has already been done.
>The problem
>We all know that a lot (if not most) of the information available on the
>Internet is invisible to indexing robots. They know how to index
>information presented as HTML, and only recently has Google been able to show
>us content stored in DOC, RTF, PPT and PDF files. What's missing? Databases.
>What we have today are manually collected database directories. The
>databases are collected manually because there is no automatic way to
>index their content or their meta-data.
>Search robots cannot index information stored in databases because each
>database has its own query syntax. Search robots are only able to index
>the HTML pages leading to those databases.
>It would be very good if there were an agreed-upon standard for
>"exposing" ALL the information to indexing robots, but we know it's very
>The solution
>What I suggest is something simpler. Creating a standard for making the
>METADATA on the databases available for automatic indexing.
>Publishers would publish an XML file with a standard structure
>describing what's in their database.
>Indexing robots would find the standard XML file and index it in a
>special index. Google (or any other search facility) would have a
>"databases" tab on its interface and users would be able to search for
>databases containing the information they need.
>I am not sure which standards body should take on the mission of
>developing such a standard, but I think it's essential.
>Thanks for listening.
>Hanan Cohen - http://www.info.org.il/english/
>***Love and Peace***
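Hanan's proposal could be sketched roughly as follows. Nothing here is an existing standard. The <databaseDescription> element and its <name>, <url>, <subject> and <description> children are hypothetical, and so are the function and class names. A robot parses the published file and adds it to a separate inverted index that holds only database descriptions. A "databases" tab would search that index.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical example of the file a publisher might expose; the element
# names are illustrative only and do not come from any existing standard.
SAMPLE_DESCRIPTION = """<databaseDescription>
  <name>Example Manuscripts Catalog</name>
  <url>http://db.example.org/manuscripts</url>
  <subject>manuscripts</subject>
  <subject>Hebrew</subject>
  <description>Catalog of digitized manuscripts.</description>
</databaseDescription>"""


def parse_description(xml_text: str) -> dict:
    """Read one (hypothetical) database description file into a dict."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("name", default="").strip(),
        "url": root.findtext("url", default="").strip(),
        "subjects": [(s.text or "").strip() for s in root.findall("subject")],
        "description": root.findtext("description", default="").strip(),
    }


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class DatabaseIndex:
    """A separate inverted index that holds only database descriptions."""

    def __init__(self) -> None:
        self._postings = defaultdict(set)  # term -> URLs of matching databases
        self._entries = {}                 # URL -> parsed description

    def add(self, desc: dict) -> None:
        """Index the name, description and subjects of one database."""
        self._entries[desc["url"]] = desc
        text = " ".join([desc["name"], desc["description"], *desc["subjects"]])
        for term in tokenize(text):
            self._postings[term].add(desc["url"])

    def search(self, query: str) -> list[str]:
        """Return names of databases whose description contains every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(self._entries[url]["name"] for url in hits)
```

A separate index, rather than one mixed in with ordinary pages, is what lets a search interface offer databases as their own tab. It does not address the credibility problem. A publisher can still describe its database misleadingly.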

Complete Guide to Search Engines for Web Sites and Intranets
