[Web4lib] Spidering speed

Tim Spalding tim at librarything.com
Sat Mar 31 01:03:12 EDT 2007


Related to the sitemap issue...

Does anyone know what sort of tolerances OPACs have for spidering? Can
most handle the industry-standard 1 hit/second? (It may be industry
standard, but insofar as most OPACs are not spiderable by
conventional means, they may not be tested for it.) Can they handle
faster?

We're going to be introducing a bunch of library widgets soon,
providing LibraryThing functionality to conventional OPACs. Some of
the widgets require some data to be fetched off the OPAC, basically
through screen-scraping. We need to know how hard we can hit them.
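
A minimal sketch of what polite fetching at the 1 hit/second rate could look like, assuming Python and a list of OPAC record URLs supplied by the caller. The Throttle class and fetch_all helper are illustrative names only, not part of any LibraryThing or OPAC vendor code, and the 30-second timeout is an arbitrary assumption:

```python
import time
import urllib.request


class Throttle:
    """Enforce a minimum interval between requests to one host (hypothetical helper)."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between hits; 1.0 = 1 hit/second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_all(urls, throttle, fetch=None):
    """Fetch each URL in turn, pausing so the server sees at most one hit per interval."""
    if fetch is None:
        # Default fetcher: plain HTTP GET; the timeout value is an assumption.
        def fetch(url):
            with urllib.request.urlopen(url, timeout=30) as resp:
                return resp.read()
    pages = []
    for url in urls:
        throttle.wait()
        pages.append(fetch(url))
    return pages
```

Raising min_interval is the simplest way to back off if a particular OPAC turns out to be less tolerant than that.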

Tim

On 3/30/07, Cary Gordon <listuser at chillco.com> wrote:
> AFAIK, this just affects being found, perhaps on page 10,032.
>
> Google rank is determined by many things, but a big factor is the number of
> links into your site. That is why when you mistype a popular URL, you almost
> always get a page full of links. Domain squatters register these domains
> using domain kiting (so they never have to pay), then sell links through SEO
> companies.
>
> Cary Gordon
> The Cherry Hill Company
> Los Angeles, CA 90064
> 310-397-2999 (voice)
> 866-375-2191 (fax)
> http://www.chillco.com
>
>
> -----Original Message-----
> From: web4lib-bounces at webjunction.org
> [mailto:web4lib-bounces at webjunction.org] On Behalf Of Tim Spalding
> Sent: Friday, March 30, 2007 10:29 AM
> To: web4lib at webjunction.org
> Subject: Re: [Web4lib] Sitemap.xml
>
> It's really not that important. SEO people (I was one of them once, but a
> good one) tend to present what they do as black magic, and something you
> certainly need their help for. Coming up with a valid sitemap.xml is a
> service they sell. But, while it can help a bit on the margins, most
> high-ranking sites don't have one either, and your time is usually
> better spent making your site better in other, visitor-focused ways.
>
> Incidentally, you can change the spider rate by signing up for Google
> Webmaster Tools. Sitemaps allows you to control it area-by-area.
>
> The main reason libraries don't score well in search engines is the
> session-based URLs in their OPACs.
>
> Tim
>
> On 3/30/07, Thomas Dowling <tdowling at ohiolink.edu> wrote:
> > On 3/30/2007 12:38 PM, VanderHart, Robert wrote:
> >
> > > A speaker on SEO at the IA Summit earlier this week stated that it's
> > > very important to have a sitemap.xml file for your website to
> > > indicate to spiders how often to visit your site.  I know from
> > > reviewing our server access logs that spiders should request a
> > > robots.txt file before indexing a site, and when I grep the logs I
> > > see plenty of requests for that file.  But when I grep "sitemap.xml", I
> > > don't see a single request.
> > >
> > >
> > > So the question is, if a sitemap.xml file is so important, why
> > > aren't any spiders looking for the file?  I didn't raise the
> > > question to the speaker because I couldn't view our log files while
> > > I was at the Summit, so I wasn't certain whether we were getting any
> > > requests for sitemap.xml or not.
> >
> >
> > Unlike robots.txt, you have to explicitly tell the search engines
> > about your sitemap.xml files.
> >
> >
> > https://www.google.com/webmasters/tools/docs/en/sitemap-generator.html#submitting
> >
> > https://siteexplorer.search.yahoo.com/submit
> >
> >
> > --
> > Thomas Dowling
> > _______________________________________________
> > Web4lib mailing list
> > Web4lib at webjunction.org
> > http://lists.webjunction.org/web4lib/
> >

