[WEB4LIB] RE: Seattletimes.com: Public to taste life without its libraries
gary
gprice at gwu.edu
Tue Aug 20 21:31:25 EDT 2002
Nancy:
Google captures a copy of each page* it finds during its crawl and makes that copy
available via the Google Cache. If a web site owner doesn't want its pages cached
(The Washington Post is one example of a site that opts out), the owner either needs
to contact Google or place the proper file on the server.
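As I understand it, the "proper file" is the standard robots exclusion mechanism: a
robots.txt file keeps a page out of the crawl entirely, while a NOARCHIVE robots meta
tag lets Google index a page but suppresses the cached copy. Just as a rough sketch,
and with the spl.org paths below used purely as illustrations I haven't verified, here
is how a webmaster could check in Python whether their robots.txt blocks Googlebot
from a given page:

import urllib.robotparser

# Fetch and parse the site's robots exclusion file.
# (URLs are illustrative only; I haven't looked at spl.org's actual robots.txt.)
rp = urllib.robotparser.RobotFileParser()
rp.set_url("http://www.spl.org/robots.txt")
rp.read()

# can_fetch() returns False when the rules disallow the named user agent from
# the URL, i.e. Googlebot shouldn't crawl the page, so no cached copy appears.
page = "http://www.spl.org/selectedsites/subscriptions.html"
print("Googlebot allowed:", rp.can_fetch("Googlebot", page))

# A page that should stay in the index but out of the cache would instead
# carry <meta name="robots" content="noarchive"> in its HTML head.

If can_fetch() comes back False for Googlebot, the page shouldn't be crawled at all,
and therefore no cached copy should turn up.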
At the moment, Google has approx. 1300 pages from the www.spl.org domain in its
database.
I browsed through a few pages of results and all had cached versions available.
So, will SPL ask Google to purge the cache? It's a good question.
Another question: if a web searcher were to access the cached copy of SPL's page of
links to its remotely accessible subscription databases
(http://216.239.51.100/search?q=cache:9xLUIuY--C8C:www.spl.org/selectedsites/subscriptions.html+&hl=en&ie=UTF-8),
will those links be disconnected? What about the OPAC?
Finally, other search engines are caching pages. The very new Gigablast also
caches content. http://www.gigablast.com
Example of Cache:
http://www.gigablast.com/cgi/0.cgi?n=10&ns=2&sd=0&q=%22seattle+public+library%22
*Google crawls and caches the first 110k of a web page. If a page is longer, it's
truncated at the 110k mark. According to Greg Notess, Google truncates most
PDF files at "about 120k". http://www.searchengineshowdown.com/new.shtml#may18
cheers,
gary
--
Gary D. Price, MLIS
Librarian
Gary Price Library Research and Internet Consulting
gary at freepint.com
The Virtual Acquisition Shelf and News Desk
http://resourceshelf.freepint.com