[Web4lib] How completely are you crawled?

Roy Tennant tennantr at oclc.org
Wed Jul 16 12:26:20 EDT 2008


In an email conversation with Debbie Campbell of the National Library of
Australia, we found that although we both use Google Sitemaps[1] to expose
content that sits behind a database wall (often termed the "deep web") to
crawlers, neither of our sites was anywhere close to being fully crawled by
Google despite this effort. Debbie reported about 54% coverage of Picture
Australia's 1.5 million items, while my coverage appeared to be about 37% of
my 2,250 items. So collection size does not appear to determine the
percentage of coverage.
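For anyone who hasn't set one up, here is a minimal sketch of turning database records into sitemap files under the sitemaps.org protocol. It assumes each record has an ID and can be reached at a URL built from a pattern; the function names and the example.org URL are hypothetical and are not how either of our sites actually generates its sitemaps. The protocol caps a single sitemap file at 50,000 URLs, so a 1.5 million item collection needs at least 30 files, tied together by a sitemap index. Note that listing a URL in a sitemap only tells the crawler the page exists; it does not guarantee that the page gets crawled or indexed.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit in the sitemaps.org protocol


def build_sitemaps(record_ids, url_template, max_urls=MAX_URLS_PER_SITEMAP):
    """Return a list of <urlset> XML strings, each with at most max_urls URLs.

    url_template is a hypothetical pattern such as
    "http://example.org/record?id={id}"; each record ID is substituted in.
    """
    urls = [url_template.format(id=rid) for rid in record_ids]
    sitemaps = []
    for start in range(0, len(urls), max_urls):
        chunk = urls[start:start + max_urls]
        # escape() turns & < > into entities, which sitemap <loc> values require
        entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in chunk)
        sitemaps.append(
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            f'<urlset xmlns="{SITEMAP_NS}">\n{entries}\n</urlset>\n'
        )
    return sitemaps


def build_sitemap_index(sitemap_urls):
    """Return a <sitemapindex> XML string pointing at already-published sitemaps."""
    entries = "\n".join(
        f"  <sitemap><loc>{escape(u)}</loc></sitemap>" for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{SITEMAP_NS}">\n{entries}\n</sitemapindex>\n'
    )
```

For a collection the size of mine, build_sitemaps(range(1, 2251), "http://example.org/record?id={id}") yields a single file; you would publish it on your server and register its URL (or the index URL, for larger collections) in Webmaster Tools.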

This made me wonder what others have been experiencing regarding crawling
coverage of their database-driven sites, even when providing a Google
Sitemap. Can anyone else report their crawling statistics? If you've
registered your sitemap at Google Webmaster Tools[2], you can find the
appropriate statistic by selecting "Sitemap" from the menu on the left, then
clicking the "Details" link beside the appropriate sitemap.

Thanks,
Roy

[1] http://tinyurl.com/224cuu
[2] https://www.google.com/webmasters/tools/



