[Web4lib] How completely are you crawled?

Tim Spalding tim at librarything.com
Wed Jul 16 13:21:52 EDT 2008


>On the other hand, doing a "site:url" search tells me Google has indexed about 2100 URLs. Haven't figured that out yet.

Google is cagey about what it has and hasn't indexed, generally adding
a delay, to avoid tipping off Search Engine Optimization types.
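Another trick is to search for the site minus some word that exists
nowhere on it, so that the exclusion filters nothing out and the
count should cover every page Google has indexed for the site.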

E.g., "site:http://www.schenectadyhistory.org -dogslovemailmen"
(http://www.google.com/search?hl=en&safe=off&client=safari&rls=en-us&q=site%3Ahttp%3A%2F%2Fwww.schenectadyhistory.org+-dogslovemailmen&btnG=Search)
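If you want to build that kind of query URL for several sites, here is a rough Python sketch. The function name and the stripped-down URL (just the "q" parameter, without the hl/safe/client extras from my browser) are my own assumptions for illustration, not anything Google documents, and the nonsense word is only useful if it really appears nowhere on the site.

```python
from urllib.parse import urlencode

def site_minus_word_url(site, absent_word):
    """Build a Google search URL for 'site:<site> -<absent_word>'.

    Hypothetical helper: absent_word should be a string that appears
    nowhere on the site, so excluding it filters nothing out.
    """
    query = f"site:{site} -{absent_word}"
    # urlencode percent-escapes ':' and '/' and turns the space into '+'
    return "http://www.google.com/search?" + urlencode({"q": query})

url = site_minus_word_url("http://www.schenectadyhistory.org",
                          "dogslovemailmen")
print(url)
```

This only assembles the URL; you still open it in a browser and read the "about N" figure yourself.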

This is showing "about 2,100," but so is the straight site:url now anyway.

Tim



