[Web4lib] Sitemap.xml

Marshall Breeding marshall.breeding at vanderbilt.edu
Fri Mar 30 12:54:35 EDT 2007


I've been using sitemap files extensively for some of the Web sites 
I manage.  As far as I know, Google is the only major search engine 
that uses it to facilitate the indexing of Web servers, though they 
have put the protocol out as an open specification for others to 
use.

The documentation for the protocol is here:

<https://www.google.com/webmasters/tools/docs/en/protocol.html>

Google only requests a sitemap file if you have registered it through 
your Google Webmaster Tools account.  So Web sites that haven't 
registered a sitemap with Google will not see unsolicited requests 
for sitemap.xml in their logs.

A simple example can be seen here:

<http://www.librarytechnology.org/sitemap.xml>
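For anyone who wants to generate such a file from a list of pages, here 
is a minimal sketch using Python's standard ElementTree. It assumes the 
sitemaps.org 0.9 namespace and takes (loc, lastmod, changefreq) tuples; 
the function name build_sitemap is hypothetical. The changefreq element 
is what tells a crawler how often a page is likely to change, though 
crawlers treat it only as a hint.

```python
import xml.etree.ElementTree as ET

# Assumed namespace: the sitemaps.org 0.9 schema
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Return sitemap XML for a list of (loc, lastmod, changefreq) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq in pages:
        url = ET.SubElement(urlset, "url")
        # <loc> is required; ElementTree escapes characters such as "&"
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            # W3C date format, e.g. 2007-03-30
            ET.SubElement(url, "lastmod").text = lastmod
        if changefreq:
            # e.g. "daily" or "weekly"; a hint to crawlers, not a command
            ET.SubElement(url, "changefreq").text = changefreq
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```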

XML sitemaps are especially useful for sites with many thousands of 
Web pages, such as our Vanderbilt Television News Archive site, which 
includes over 850,000 pages we want Google to index: 
<http://openweb.tvnews.vanderbilt.edu/sitemap.xml>
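A site that large cannot fit in one file: the sitemap protocol limits 
each sitemap file to 50,000 URLs, so the pages have to be split across 
several files that are listed in a sitemap index file. Here is a minimal 
sketch of that split, again assuming the sitemaps.org 0.9 namespace. The 
function name and the file names sitemapN.xml and sitemap_index.xml are 
hypothetical, not the ones used on the site above.

```python
from xml.sax.saxutils import escape

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"  # assumed namespace
MAX_URLS = 50000  # per-file URL limit in the sitemap protocol

def split_sitemaps(urls, base_url, chunk_size=MAX_URLS):
    """Return {filename: xml} for chunked sitemap files plus an index file."""
    files = {}
    index_entries = []
    for n, start in enumerate(range(0, len(urls), chunk_size), start=1):
        name = f"sitemap{n}.xml"  # hypothetical naming scheme
        entries = "".join(
            f"<url><loc>{escape(u)}</loc></url>"
            for u in urls[start:start + chunk_size]
        )
        files[name] = (
            f'<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="{NS}">'
            + entries + "</urlset>"
        )
        # The index points crawlers at each chunk by absolute URL
        index_entries.append(f"<sitemap><loc>{escape(base_url + name)}</loc></sitemap>")
    files["sitemap_index.xml"] = (
        f'<?xml version="1.0" encoding="UTF-8"?><sitemapindex xmlns="{NS}">'
        + "".join(index_entries) + "</sitemapindex>"
    )
    return files
```

At 50,000 URLs per file, 850,000 pages come out to exactly 17 sitemap 
files plus the index, and only the index needs to be registered.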

Hope this helps.

-marshall

--On Friday, March 30, 2007 12:38 PM -0400 "VanderHart, Robert" 
<Robert.VanderHart at umassmed.edu> wrote:

> A speaker on SEO at the IA Summit earlier this week stated that
> it's very important to have a sitemap.xml file for your website
> to indicate to spiders how often to visit your site.  I know from
> reviewing our server access logs that spiders should request a
> robots.txt file before indexing a site, and when I grep the logs
> I see plenty of requests for that file.  But when I grep
> "sitemap.xml", I don't see a single request.
>
>
> So the question is, if a sitemap.xml file is so important, why
> aren't any spiders looking for the file?  I didn't raise the
> question to the speaker because I couldn't view our log files
> while I was at the Summit, so I wasn't certain whether we were
> getting any requests for sitemap.xml or not.
>
> Robert Vander Hart
> Electronic Resources Librarian
> Lamar Soutter Library
> University of Massachusetts Medical School
> Worcester  MA  01655
>
> Voice: 508-856-3290
> Email: Robert.VanderHart at umassmed.edu
> Web: http://library.umassmed.edu
>



-----------------------------------------------------------------
Marshall Breeding
Director for Innovative Technologies and Research
Vanderbilt University Library
615-343-6094
