[Web4lib] Sitemap.xml
Marshall Breeding
marshall.breeding at vanderbilt.edu
Fri Mar 30 12:54:35 EDT 2007
I've been using sitemap files extensively for some of the Web sites
I manage. As far as I know, Google is the only major search engine
that uses them to facilitate the indexing of Web servers, though
Google has published the protocol as an open specification for
others to use.
The documentation for the protocol is here:
<https://www.google.com/webmasters/tools/docs/en/protocol.html>
Google only looks for a sitemap if you have registered it through
your Google Webmaster Tools account. So Web sites that haven't
been registered with Google Sitemaps will not see unsolicited
requests for the file.
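
To check your own logs for this, something like the following Python
sketch would do roughly what grep does, counting requests for
robots.txt and sitemap.xml. It assumes Common Log Format; the function
name and the sample log lines are made up for illustration and are
not taken from any real server.

```python
import re
from collections import Counter

# Matches the request portion of a Common Log Format line,
# e.g. "GET /robots.txt HTTP/1.1", and captures the path.
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')


def count_requests(lines, paths=("/robots.txt", "/sitemap.xml")):
    """Count how many log lines request each of the given paths."""
    counts = Counter({p: 0 for p in paths})
    for line in lines:
        m = REQUEST_RE.search(line)
        if m and m.group(1) in counts:
            counts[m.group(1)] += 1
    return counts


# Hypothetical log lines, for illustration only.
sample = [
    '192.0.2.1 - - [30/Mar/2007:12:00:01 -0400] "GET /robots.txt HTTP/1.1" 200 120',
    '192.0.2.1 - - [30/Mar/2007:12:00:02 -0400] "GET /index.html HTTP/1.1" 200 5120',
    '192.0.2.7 - - [30/Mar/2007:12:05:40 -0400] "GET /robots.txt HTTP/1.0" 200 120',
]
counts = count_requests(sample)
print(dict(counts))
```

In practice you would pass an open log file instead of the sample
list. A count of zero for /sitemap.xml is what I would expect on a
site that has not registered a sitemap.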
A simple example can be seen here:
<http://www.librarytechnology.org/sitemap.xml>
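
For anyone who wants to see the shape of such a file without
following the link, here is a minimal Python sketch that builds one.
It is not how the file above was produced. The example.org URLs and
the build_sitemap name are assumptions for illustration, and the
namespace is the one from the sitemaps.org 0.9 schema. The optional
changefreq element is the hint about how often a page changes.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org 0.9 schema.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as a string.

    entries is a list of (loc, lastmod, changefreq) tuples;
    lastmod and changefreq may be None, in which case they are omitted.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
        if changefreq:
            ET.SubElement(url, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, for illustration only.
example = build_sitemap([
    ("http://www.example.org/", "2007-03-30", "daily"),
    ("http://www.example.org/about.html", None, "monthly"),
])
print(example)
```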
XML sitemaps are especially useful for sites that have several
thousand Web pages or more, like our Vanderbilt Television News
Archive site, which includes over 850,000 pages we want indexed by
Google. <http://openweb.tvnews.vanderbilt.edu/sitemap.xml>
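
One practical point for sites of that size: the protocol limits a
single sitemap file to 50,000 URLs, so a large site needs several
sitemap files plus a sitemap index file that lists them. The Python
sketch below shows the general idea only and is not a description of
how the site above is built. The example.org URL pattern, the file
naming, and the function names are all assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50000  # per-file URL limit in the sitemap protocol


def chunk(urls, size=MAX_URLS_PER_FILE):
    """Yield successive lists of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_index(sitemap_urls):
    """Return sitemap index XML listing each sitemap file's URL."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = loc
    return ET.tostring(root, encoding="unicode")


# Hypothetical URL pattern standing in for 850,000 real pages.
urls = ["http://www.example.org/record/%d" % n for n in range(850000)]
parts = list(chunk(urls))

# One index entry per chunk; the file names are an assumed convention.
index_xml = build_index(
    "http://www.example.org/sitemap-%d.xml" % i
    for i in range(1, len(parts) + 1)
)
print(len(parts), "sitemap files needed")
```

Each chunk would then be written out as its own sitemap file, and
only the index needs to be registered.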
Hope this helps.
-marshall
--On Friday, March 30, 2007 12:38 PM -0400 "VanderHart, Robert"
<Robert.VanderHart at umassmed.edu> wrote:
> A speaker on SEO at the IA Summit earlier this week stated that
> it's very important to have a sitemap.xml file for your website
> to indicate to spiders how often to visit your site. I know from
> reviewing our server access logs that spiders should request a
> robots.txt file before indexing a site, and when I grep the logs
> I see plenty of requests for that file. But when I grep
> "sitemap.xml", I don't see a single request.
>
>
> So the question is, if a sitemap.xml file is so important, why
> aren't any spiders looking for the file? I didn't raise the
> question to the speaker because I couldn't view our log files
> while I was at the Summit, so I wasn't certain whether we were
> getting any requests for sitemap.xml or not.
>
> Robert Vander Hart
> Electronic Resources Librarian
> Lamar Soutter Library
> University of Massachusetts Medical School
> Worcester MA 01655
>
> Voice: 508-856-3290
> Email: Robert.VanderHart at umassmed.edu
> Web: http://library.umassmed.edu
>
> _______________________________________________
> Web4lib mailing list
> Web4lib at webjunction.org
> http://lists.webjunction.org/web4lib/
-----------------------------------------------------------------
Marshall Breeding
Director for Innovative Technologies and Research
Vanderbilt University Library
615-343-6094