[Web4lib] google & library catalogs

Michael McCulley drweb at san.rr.com
Thu Apr 13 22:10:50 EDT 2006


Good post, Jim.. You wrote, in part: "You have a topic, you put in 1-2
words, and Google, thanks to full-text searching, page rank, and plenty of
Web content so that your particular terms are likely to get found,
miraculously gets you pretty good results."

I don't know if Gary Price would update this now, but.. on this page from
2003, 
http://www.virtualchase.com/howto/gg_tips.html
he notes..

"Some documents are not completely indexed by Google. Indexing of the text
in Web pages stops after 101kb (For PDF, it's 120kb.)"

Aside: it's different for different engines, and I don't know which engine
(right now) indexes "more" of a Web page or document than Google. At one
time, AltaVista held that distinction, but now they are powered by Yahoo!
and it's not the same indexing.

Most people think 1MB is large for a Web "page" or "document," and in
general it is. But for reports, government documents, books, etc., 1MB is
not that large, and it is well past the cutoffs quoted above. So in some
cases Google does not index the entire document, and you are *not*
searching the full text.
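
To put rough numbers on that, here is a minimal sketch in Python. It
assumes the 2003 cutoffs quoted above still hold, that 1kb means 1024
bytes, and that the cutoff applies to raw file size; none of that is
confirmed by Google, and the function name is just for illustration.

```python
# Illustrative only: the caps are the 2003 figures quoted above.
# Treating 1 KB as 1024 bytes and applying the cap to raw file size
# are assumptions, not documented Google behavior.
HTML_CAP_KB = 101
PDF_CAP_KB = 120


def indexed_fraction(size_kb: float, cap_kb: float) -> float:
    """Return the share of a document that falls under the indexing cap."""
    if size_kb <= 0:
        raise ValueError("size_kb must be positive")
    return min(1.0, cap_kb / size_kb)


one_mb_kb = 1024
print(f"1MB Web page: {indexed_fraction(one_mb_kb, HTML_CAP_KB):.1%} indexed")
print(f"1MB PDF:      {indexed_fraction(one_mb_kb, PDF_CAP_KB):.1%} indexed")
```

Under those assumptions, only about a tenth of a 1MB document would be
searchable, while anything under the cutoff is covered in full.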

Best,
DrWeb

-- 
P. Michael McCulley aka DrWeb
mailto:drweb at san.rr.com
San Diego, CA 
http://drweb.typepad.com/

Quote of the Moment:
 I don't see you, so don't pretend to be there.
Thursday, April 13, 2006 6:57:44 PM 
 
>-----Original Message-----
>From: web4lib-bounces at webjunction.org 
>[mailto:web4lib-bounces at webjunction.org] On Behalf Of Jim Campbell
>Sent: Thursday, April 13, 2006 7:56 AM
>To: web4lib at webjunction.org
>Subject: RE: [Web4lib] google & library catalogs
>
>Ross makes an important point here, that if you're trying to get users to
>notice your library materials when they're looking for something on Google,
>getting them to limit their search in advance defeats the point. I'm a
>little cynical though about how often they will find our books in a casual
>search.  When Open WorldCat first came up, I tried looking for some current
>titles. Using the typical Google search of one or two keywords, it was hard
>to find anything about a book, because most topics had a lot of linked Web
>pages and page rank pulled them up first. Searching on exact title typically
>got sites that mentioned the book and then 4-5 pages of bookstore listings
>before a library link appeared.
>
>That said, most of the discussion of Google and opacs in recent years has
>focused on discovery. You have a topic, you put in 1-2 words, and Google,
>thanks to full-text searching, page rank, and plenty of Web content so that
>your particular terms are likely to get found, miraculously gets you pretty
>good results. Opacs lack full-text, any sort of linking that can help
>determine relevance (though circulation might be some help), and they use a
>standardized vocabulary that may not be how you think of the question.  So
>they're crippled to begin with and a lot of the opac "solutions" we're
>seeing these days are like putting lipstick on a pig.
[remainder snipped]


